This repository has been archived by the owner on Feb 2, 2022. It is now read-only.
Parse Age without Date of Birth #125
Merged
29 commits
ad536b6 use first item of deceased_field for First Name (mscarey)
65c9e87 make missing name data throw IndexError (mscarey)
9fb41cc add name with punctuation to parsing test (mscarey)
3698f8f move split method within deceased field parser (mscarey)
7ffadbb split deceased field on some slashes (mscarey)
510032e Merge https://github.com/scrapd/scrapd (mscarey)
7f40cde Merge remote-tracking branch 'origin' (mscarey)
4fa089c change parsing tests to expect datetimes (mscarey)
304879e change date string calculations to datetime (mscarey)
097aaf7 update more tests to expect datetime objects (mscarey)
9b7dc35 add function to parse deceased field without DOB (mscarey)
8ba1ae7 run yapf format (mscarey)
a2a1ec8 change datetime objects to date objects (mscarey)
efffc77 add tests for formatters (mscarey)
d61d162 import date_utils functions by name (mscarey)
cabdf73 add date serializing to formatters (mscarey)
fabe8cd run yapf format (mscarey)
c400595 remove ValueError for failed age parsing. (mscarey)
306b6fd slightly delay conversion of datetime to date (mscarey)
9ea122c add line breaks for formatting tests (mscarey)
394fbed rephrase first lines of docstrings for pydocstyle (mscarey)
6fbd7a8 use strftime, not isoformat, for date format (mscarey)
848debf add yapf line break (mscarey)
b10aeaa delete duplicate test (mscarey)
6b0cfbf change CSV formatter to month-first date format (mscarey)
a27a9e4 change to module-level import of date_utils (mscarey)
ec32be9 change DOB to datetime.date earlier (mscarey)
730c83b add yapf line breaks (mscarey)
ccd34d7 Merge branch 'master' of https://github.com/scrapd/scrapd into age (mscarey)
@@ -63,6 +63,17 @@ def print(self, results, **kwargs):
         formatter = self._get_formatter()
         formatter.printer(results, **kwargs)
 
+    def date_serialize(self, obj):
+        """
+        Convert date objects to string for serialization.
+
+        :rtype: str
+        """
+
+        if isinstance(obj, (datetime.datetime, datetime.date)):
+            return obj.strftime("%m/%d/%Y")
+        raise TypeError("Type %s not serializable" % type(obj))
+
     # pylint: disable=unused-argument
     def printer(self, results, **kwargs):
         """

@@ -72,6 +83,17 @@ def printer(self, results, **kwargs):
         """
         print(results, file=self.output)
 
+    def to_json_string(self, results):
+        """
+        Convert dict of parsed fields to JSON string.
+
+        :param results dict: results of scraping APD news site
+
+        :rtype: str
+        """
+
+        return json.dumps(results, sort_keys=True, indent=2, default=self.date_serialize)
+
 
 class PythonFormatter(Formatter):
     """

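As a rough illustration of how the two new methods above cooperate, the sketch below passes a date serializer to `json.dumps` through its `default` hook. It is a standalone approximation, not the project's code: the function is written at module level rather than as a method, and the record and its field names are hypothetical.

```python
import datetime
import json


def date_serialize(obj):
    """Convert date objects to a month-first string; reject anything else."""
    # datetime.datetime is a subclass of datetime.date, so one check covers both.
    if isinstance(obj, datetime.date):
        return obj.strftime("%m/%d/%Y")
    raise TypeError("Type %s not serializable" % type(obj))


# Hypothetical record; the field names are illustrative only.
results = [{"Age": 30, "Date": datetime.date(2019, 1, 16)}]

# json.dumps calls the default hook only for objects it cannot encode itself.
json_string = json.dumps(results, sort_keys=True, indent=2, default=date_serialize)
print(json_string)
```

Because the hook raises `TypeError` for anything that is not a date, unsupported types still fail loudly instead of being silently stringified.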
@@ -97,7 +119,8 @@ class JSONFormatter(Formatter):
     __format_name__ = 'json'
 
     def printer(self, results, **kwargs):  # noqa: D102
-        print(json.dumps(results, sort_keys=True, indent=2), file=self.output)
+        json_string = self.to_json_string(results)
+        print(json_string, file=self.output)
 
 
 class CSVFormatter(Formatter):

Review comment (on the added to_json_string call): You don't need to create an extra function here.
print(json.dumps(results, sort_keys=True, indent=2, default=self.date_serialize), file=self.output)

Reply: I wanted that line to be in a method with a return value for easier testing. I made more changes for the CSV formatter.

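The testing argument in the reply above can be sketched as follows. `SketchFormatter` is a hypothetical stand-in that mirrors only the methods shown in this diff, and the test names and sample record are assumptions, not the project's actual tests.

```python
import datetime
import io
import json


class SketchFormatter:
    """Hypothetical stand-in for the PR's Formatter; not the project's class."""

    def __init__(self, output):
        self.output = output

    def date_serialize(self, obj):
        if isinstance(obj, datetime.date):
            return obj.strftime("%m/%d/%Y")
        raise TypeError("Type %s not serializable" % type(obj))

    def to_json_string(self, results):
        return json.dumps(results, sort_keys=True, indent=2, default=self.date_serialize)

    def printer(self, results, **kwargs):
        print(self.to_json_string(results), file=self.output)


def test_to_json_string_serializes_dates():
    # The return value can be asserted on directly, with no stream capture.
    formatter = SketchFormatter(output=io.StringIO())
    json_string = formatter.to_json_string([{"Date": datetime.date(2019, 1, 16)}])
    assert json.loads(json_string) == [{"Date": "01/16/2019"}]


def test_printer_writes_json_to_output():
    # Testing the printing path needs a stream to capture what was written.
    stream = io.StringIO()
    SketchFormatter(output=stream).printer([{"Date": datetime.date(2019, 1, 16)}])
    assert json.loads(stream.getvalue()) == [{"Date": "01/16/2019"}]
```

Both approaches work; the first simply keeps the serialization test independent of where the output is written.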
@@ -110,6 +133,8 @@ class CSVFormatter(Formatter):
     __format_name__ = 'csv'
 
     def printer(self, results, **kwargs):  # noqa: D102
+        results = self.to_json_string(results)
+        results = json.loads(results)
         writer = csv.DictWriter(self.output, fieldnames=CSVFIELDS, extrasaction='ignore')
         writer.writeheader()
         writer.writerows(results)

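The CSV change above round-trips the results through JSON so that dates reach the CSV writer as already-formatted strings. Without that step, the `csv` module would fall back to `str()`, which renders a `datetime.date` in ISO form. The following is a minimal standalone sketch of the idea; the `CSVFIELDS` value, the `write_csv` helper, and the sample row are hypothetical.

```python
import csv
import datetime
import io
import json

# Hypothetical column list standing in for the project's CSVFIELDS constant.
CSVFIELDS = ["Case", "Date", "Age"]


def date_serialize(obj):
    """Convert date objects to a month-first string; reject anything else."""
    if isinstance(obj, datetime.date):
        return obj.strftime("%m/%d/%Y")
    raise TypeError("Type %s not serializable" % type(obj))


def write_csv(results, output):
    """Write rows to output after converting dates via a JSON round trip."""
    # Dumping and reloading turns every date into a plain string.
    rows = json.loads(json.dumps(results, default=date_serialize))
    # extrasaction="ignore" drops keys that are not listed in CSVFIELDS.
    writer = csv.DictWriter(output, fieldnames=CSVFIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


stream = io.StringIO()
sample = [{"Case": "example-1", "Date": datetime.date(2019, 1, 16), "Age": 30, "Notes": "not a CSV column"}]
write_csv(sample, stream)
csv_text = stream.getvalue()
```

The round trip costs an extra serialization pass, but it reuses the same date formatting as the JSON output instead of duplicating it for CSV.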
Review comment: This part should go in the sanitize_fatality_entity() function. The goal of the sanitizing function was to ensure that the final values had the right format, or that unnecessary or invalid values (like None or empty ones) were discarded.

Reply: Currently that part needs to happen before the deceased field is parsed, while sanitize_fatality_entity() has to happen afterwards.