Use scrapy.items.Item to clarify data export; preserve project name; ditch bite_size_bug_name #3

Merged 9 commits on May 22, 2012

Conversation

paulproteus

These commits clarify the data export interface -- see bugimporters/items.py for a spec of what data gets passed out through the data transport.

It also adds code that calculates a project name within the bug importer, rather than within the data transit as before. This code mostly passes the tests when run against oh-mainline; to make it pass fully you need a branch I'll be pushing momentarily. (Look for it on github.com/openhatch/oh-mainline.)

Note that calculating the project name within the bug importer is essential to fixing a problem where http://openhatch.org/search/ lists many tasks as being within "GNOME Bugzilla". (Those are supposed to use a custom bug parser, but that custom bug parser's project name was being ignored.)
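A rough sketch of the idea, not the actual bugimporters code: the importer chooses the project name itself, and the data transit passes the result through untouched. All function names, the `product` key, and the sample bug are assumptions made up for this example.

```python
def generic_project_name(raw_bug, tracker_name):
    # Default behaviour: file every bug under the tracker's own name.
    return tracker_name


def product_project_name(raw_bug, tracker_name):
    # Custom behaviour (hypothetical): prefer the bug's product,
    # falling back to the tracker name when no product is given.
    return raw_bug.get("product") or tracker_name


def parse_bug(raw_bug, tracker_name, name_project=generic_project_name):
    # The project name is decided here, inside the importer.
    return {
        "title": raw_bug["title"],
        "project_name": name_project(raw_bug, tracker_name),
    }


def data_transit(parsed):
    # The transit no longer recomputes the name; it forwards what it gets.
    return parsed


raw = {"title": "Menu is misaligned", "product": "Example App"}
default_result = data_transit(parse_bug(raw, "GNOME Bugzilla"))
custom_result = data_transit(
    parse_bug(raw, "GNOME Bugzilla", name_project=product_project_name)
)
```

With the naming done in the importer, a custom parser's choice survives the trip: `custom_result` carries the product-based name, while only the default path falls back to "GNOME Bugzilla".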

So, the questions really for this review are:

  • Is this a reasonable way to add scrapy as a dependency?
  • Do we agree that bugimporters/items.py is a reasonable spec for what data gets passed out through the data_transit?
  • Do we agree on ditching bite_size_bug_name? (BTW, that column was recently deleted from oh-mainline, so I think that's a fine plan.)

If so, please give me an ACK (:

@shawnl

shawnl commented May 22, 2012

Why do we have the same or a similar dict over and over in the code?

Why can't the get_parsed_data_dict() function be put in a utility file and made to work for all the scrapers?

@paulproteus
Author

I don't agree with the claims of "out of order", but I'm open to hearing how they are true.

get_parsed_data_dict() is functionality that varies between the different bug importers; it's their common API. Hope that helps.
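To illustrate why the method is hard to share, here is a small sketch with two made-up parsers. The class names, the raw field names, and the output keys are all hypothetical; the only thing taken from this discussion is that each importer exposes get_parsed_data_dict().

```python
class BugzillaLikeParser:
    """Hypothetical parser for a tracker with Bugzilla-style field names."""

    def __init__(self, raw):
        self.raw = raw

    def get_parsed_data_dict(self):
        return {
            "title": self.raw["short_desc"],
            "looks_closed": self.raw["bug_status"] in ("RESOLVED", "CLOSED"),
        }


class TracLikeParser:
    """Hypothetical parser for a tracker with Trac-style field names."""

    def __init__(self, raw):
        self.raw = raw

    def get_parsed_data_dict(self):
        return {
            "title": self.raw["summary"],
            "looks_closed": self.raw["status"] == "closed",
        }


def export_all(parsers):
    # The caller relies only on the shared method name and output keys.
    return [parser.get_parsed_data_dict() for parser in parsers]


rows = export_all([
    BugzillaLikeParser({"short_desc": "Crash on save", "bug_status": "RESOLVED"}),
    TracLikeParser({"summary": "Typo in docs", "status": "new"}),
])
```

The output dicts look alike, which is why the code reads as repetitive, but the mapping from each tracker's raw data differs, so the body of the method is the part that cannot be factored into one utility.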

I think the move toward more Scrapy-ness will dramatically improve the cleanliness of this; this is one micro step toward that.

@shawnl
Copy link

shawnl commented May 22, 2012

Fair enough, just thought it should be mentioned.

Otherwise, looks good.

@paulproteus paulproteus merged commit 9dfec28 into openhatch:master May 22, 2012