Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add full data dump framework and implement for roundup #47

Merged
merged 32 commits into from Apr 16, 2014
Merged

Add full data dump framework and implement for roundup #47

merged 32 commits into from Apr 16, 2014

Conversation

602p
Copy link

@602p 602p commented Apr 15, 2014

This adds the framework to use --extended-scrape to dump a copy of the raw scraped data. It also implements this feature for the Roundup issue importer.

@602p
Copy link
Author

602p commented Apr 16, 2014

Code has been merged and test have been run (except for those dumb 2.6 dependent ones >:( .)

# Store the tracker model
self.tm = tracker_model
# Store the reactor manager
self.rm = reactor_manager
# Store wether or not to scrape messages, keywords, etc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "wether" -> whether

try:
self.extended_scrape==False
except AttributeError:
self.extended_scrape=False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Assuming I understand the reason for this code---I think this pull request should include the addition of extended_scrape to all the importers, rather than catching this exception here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is here to make sure it exists because is sometimes disappears when called from tests (thru no lack of debugging of my own >:? .)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ehashman 's remark -- good spotting.

Also, PEP8 suggests using more spaces around these "=" characters. See http://legacy.python.org/dev/peps/pep-0008/ for more information on that.

@@ -130,13 +139,17 @@ def start_requests(self):
logging.error("FYI, this bug importer does not support "
"process_bugs(). Fix it.")

def __init__(self, input_filename=None):
def __init__(self, input_filename=None, extended_scrape="False"):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're using a string, "False", here, but can you use a boolean value, like False, instead? That would be much tidier, in my opinion.

This comment might not be super clear, so please feel free to ask me to clarify.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required because the value passed thru scrapy.cmdline.args get run thru str(), so for the sake of keeping the code consistent I made it a string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(However this has been cleaned up!)

@602p 602p changed the title Add full data dump framework and implemen for roundup Add full data dump framework and implement for roundup Apr 16, 2014
@paulproteus
Copy link

+1, merging now

@602p did you send an email with licensing info to the mailing list? If not, do that too.

paulproteus added a commit that referenced this pull request Apr 16, 2014
Add full data dump framework and implement for roundup
@paulproteus paulproteus merged commit bce528a into openhatch:master Apr 16, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants