New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add full data dump framework and implement for roundup #47
Conversation
Code has been merged and test have been run (except for those dumb 2.6 dependent ones >:( .) |
# Store the tracker model | ||
self.tm = tracker_model | ||
# Store the reactor manager | ||
self.rm = reactor_manager | ||
# Store wether or not to scrape messages, keywords, etc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: "wether" -> whether
try: | ||
self.extended_scrape==False | ||
except AttributeError: | ||
self.extended_scrape=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Assuming I understand the reason for this code---I think this pull request should include the addition of extended_scrape to all the importers, rather than catching this exception here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is here to make sure it exists because is sometimes disappears when called from tests (thru no lack of debugging of my own >:? .)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @ehashman 's remark -- good spotting.
Also, PEP8 suggests using more spaces around these "=" characters. See http://legacy.python.org/dev/peps/pep-0008/ for more information on that.
@@ -130,13 +139,17 @@ def start_requests(self): | |||
logging.error("FYI, this bug importer does not support " | |||
"process_bugs(). Fix it.") | |||
|
|||
def __init__(self, input_filename=None): | |||
def __init__(self, input_filename=None, extended_scrape="False"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're using a string, "False", here, but can you use a boolean value, like False, instead? That would be much tidier, in my opinion.
This comment might not be super clear, so please feel free to ask me to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required because the value passed thru scrapy.cmdline.args get run thru str(), so for the sake of keeping the code consistent I made it a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(However this has been cleaned up!)
+1, merging now @602p did you send an email with licensing info to the mailing list? If not, do that too. |
Add full data dump framework and implement for roundup
This adds the framework to use --extended-scrape to dump a copy of the raw scraped data. It also implements this feature for the Roundup issue importer.