Parse more deceased field formats #90

Merged
merged 1 commit into from Apr 25, 2019

@rgreinho
Member

commented Apr 24, 2019

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code cleanup / Refactoring

Description

Parse more deceased field formats

Enhances the parsers to be able to extract information from more formats
of the deceased fields.

It now handles the fields separated by:

  • pipes
  • spaces

It also fixes the case when multiple fatalities occurred in one crash by
picking up the details of the first victim.

Other improvements:

  • Create new function for parsing the FLEG.
  • Improve the Gender parsing.
  • Improve the Ethnicity parsing.
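As a rough illustration of what parsing the FLEG could look like, here is a minimal sketch. It assumes that "FLEG" stands for the "First name, Last name, Ethnicity, Gender" portion of the deceased field and that the tokens appear in that order; the function name `parse_fleg`, the sample value, and the exact rules are hypothetical and are not taken from this PR. Only the dictionary keys mirror the field names visible in the data set.

```python
def parse_fleg(fleg):
    """Parse a 'First [Middle] Last Ethnicity Gender' string (assumed layout)."""
    # Treat commas as separators, then split on any run of whitespace.
    tokens = fleg.replace(',', ' ').split()
    d = {}
    # Read from the end: gender is assumed to be the last token,
    # ethnicity the one before it.
    if tokens:
        d['Gender'] = tokens.pop().lower()
    if tokens:
        d['Ethnicity'] = tokens.pop().capitalize()
    # Of the remaining name tokens, keep the first and the last,
    # so that middle names are ignored.
    if tokens:
        d['First Name'] = tokens[0]
    if len(tokens) > 1:
        d['Last Name'] = tokens[-1]
    return d


print(parse_fleg('Jane Q Doe, White female'))
```

Reading from the end of the token list keeps middle names from being mistaken for the first name. This sketch does not handle name suffixes such as "Jr.", which it would report as the last name.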

Checklist:

  • I have updated the documentation accordingly
  • I have written unit tests

Fixes #80
Fixes #77
Fixes #51

@rgreinho rgreinho self-assigned this Apr 24, 2019

@rgreinho rgreinho requested a review from mrengler Apr 24, 2019

@rgreinho

Member Author

commented Apr 24, 2019

Here is the diff if I update the 2019 data set with this version. I think it looks legit.

diff --git a/datasets/fatalities-2019-raw.json b/datasets/fatalities-2019-raw.json
index 4919642..0012593 100644
--- a/datasets/fatalities-2019-raw.json
+++ b/datasets/fatalities-2019-raw.json
@@ -6,7 +6,7 @@
     "Date": "02/09/2019",
     "Ethnicity": "Black",
     "Fatal crashes this year": "7",
-    "First Name": "Zion",
+    "First Name": "Messiah",
     "Gender": "male",
     "Last Name": "Mouton",
     "Link": "http://austintexas.gov/news/traffic-fatality-7-4",
@@ -21,7 +21,7 @@
     "Date": "02/06/2019",
     "Ethnicity": "White",
     "Fatal crashes this year": "6",
-    "First Name": "James",
+    "First Name": "Trevor",
     "Gender": "male",
     "Last Name": "Ralston",
     "Link": "http://austintexas.gov/news/traffic-fatality-6-6",
@@ -81,7 +81,7 @@
     "Date": "01/15/2019",
     "Ethnicity": "White",
     "Fatal crashes this year": "1",
-    "First Name": "Hilburn",
+    "First Name": "David",
     "Gender": "male",
     "Last Name": "Sell",
     "Link": "http://austintexas.gov/news/traffic-fatality-1-4",
@@ -111,7 +111,7 @@
     "Date": "02/27/2019",
     "Ethnicity": "Hispanic",
     "Fatal crashes this year": "10",
-    "First Name": "Ni\u00f1o",
+    "First Name": "Javier",
     "Gender": "male",
     "Last Name": "Esparza",
     "Link": "http://austintexas.gov/news/traffic-fatality-10-4",
@@ -126,7 +126,7 @@
     "Date": "02/21/2019",
     "Ethnicity": "Hispanic",
     "Fatal crashes this year": "9",
-    "First Name": "\u201cRudy\u201d",
+    "First Name": "Rosbel",
     "Gender": "male",
     "Last Name": "Tamez",
     "Link": "http://austintexas.gov/news/traffic-fatality-9-4",
@@ -186,7 +186,7 @@
     "Date": "03/29/2019",
     "Ethnicity": "Hispanic",
     "Fatal crashes this year": "12",
-    "First Name": "Cardenas",
+    "First Name": "Carlos",
     "Gender": "male",
     "Last Name": "Jr.",
     "Link": "http://austintexas.gov/news/traffic-fatality-12-4",
@@ -201,7 +201,7 @@
     "Date": "03/28/2019",
     "Ethnicity": "White",
     "Fatal crashes this year": "11",
-    "First Name": "Rae",
+    "First Name": "Jessica",
     "Gender": "female",
     "Last Name": "Saathoff",
     "Link": "http://austintexas.gov/news/traffic-fatality-11-4",
@@ -211,8 +211,13 @@
   },
   {
     "Case": "19-0921776",
+    "DOB": "06/24/1991",
     "Date": "04/02/2019",
+    "Ethnicity": "White",
     "Fatal crashes this year": "15",
+    "First Name": "Garrett",
+    "Gender": "male",
+    "Last Name": "Davis",
     "Link": "http://austintexas.gov/news/traffic-fatality-15-4",
     "Location": "517 E. Slaughter Lane",
     "Notes": "This is Austin\u2019s 15th fatal traffic crash of 2019, resulting in 15 fatalities this year. At this time in 2018, there were 15 fatal traffic crashes and 16 traffic fatalities.",
@@ -220,8 +225,13 @@
   },
   {
     "Case": "19-0961200",
+    "DOB": "08/01/1949",
     "Date": "04/06/2019",
+    "Ethnicity": "Asian",
     "Fatal crashes this year": "17",
+    "First Name": "Wing",
+    "Gender": "male",
+    "Last Name": "Chou",
     "Link": "http://austintexas.gov/news/traffic-fatality-17-4",
     "Location": "14000 block of N. SH-45",
     "Notes": "This is Austin\u2019s 17th fatal traffic crash of 2019, resulting in 17 fatalities this year. At this time in 2018, there were 15 fatal traffic crashes and 16 traffic fatalities.",
@@ -229,8 +239,13 @@
   },
   {
     "Case": "19-0930132",
+    "DOB": "01/15/1994",
     "Date": "04/03/2019",
+    "Ethnicity": "White",
     "Fatal crashes this year": "16",
+    "First Name": "Hannah",
+    "Gender": "female",
+    "Last Name": "Jaggers",
     "Link": "http://austintexas.gov/news/traffic-fatality-16-4",
     "Location": "E. Wells Branch Parkway/S. Heatherwilde Boulevard",
     "Notes": "This is Austin\u2019s 16th fatal traffic crash of 2019, resulting in 16 fatalities this year. At this time in 2018, there were 15 fatal traffic crashes and 16 traffic fatalities.",
@@ -238,8 +253,13 @@
   },
   {
     "Case": "19-1080673",
+    "DOB": "08/23/1941",
     "Date": "04/18/2019",
+    "Ethnicity": "Hispanic",
     "Fatal crashes this year": "21",
+    "First Name": "Elvira",
+    "Gender": "female",
+    "Last Name": "Trujillo",
     "Link": "http://austintexas.gov/news/traffic-fatality-21-3",
     "Location": "FM 973 and Pearce Lane",
     "Notes": "This is Austin\u2019s 21st fatal traffic crash of 2019, resulting in 22 fatalities this year. At this time in 2018, there were 19 fatal traffic crashes and 20 traffic fatalities.",
@@ -247,8 +267,13 @@
   },
   {
     "Case": "19-1110655",
+    "DOB": "08/19/1972",
     "Date": "04/21/2019",
+    "Ethnicity": "Black",
     "Fatal crashes this year": "22",
+    "First Name": "Aric",
+    "Gender": "male",
+    "Last Name": "Maxwell",
     "Link": "http://austintexas.gov/news/traffic-fatality-22-3",
     "Location": "5300 blk N. IH-35 SB",
     "Notes": "This is Austin\u2019s 22nd fatal traffic crash of 2019, resulting in 23 fatalities this year. At this time in 2018, there were 19 fatal traffic crashes and 20 traffic fatalities.",
@@ -256,8 +281,11 @@
   },
   {
     "Case": "19-1080319",
+    "DOB": "04/19/2019",
     "Date": "04/18/2019",
+    "Ethnicity": "Hispanic",
     "Fatal crashes this year": "20",
+    "Gender": "male",
     "Link": "http://austintexas.gov/news/traffic-fatality-20-4",
     "Location": "8000 block of West U.S. 290",
     "Notes": "This is Austin\u2019s 20th fatal traffic crash of 2019, resulting in 21 fatalities this year. At this time in 2018, there were 19 fatal traffic crashes and 20 traffic fatalities.",
@@ -265,8 +293,13 @@
   },
   {
     "Case": "19-1070614",
+    "DOB": "06/25/1948",
     "Date": "04/17/2019",
+    "Ethnicity": "White",
     "Fatal crashes this year": "19",
+    "First Name": "Michael",
+    "Gender": "male",
+    "Last Name": "Cannatti",
     "Link": "http://austintexas.gov/news/traffic-fatality-19-5",
     "Location": "Jollyville Rd. and Balcones Woods Drive.",
     "Notes": "This is Austin\u2019s 19th fatal traffic crash of 2019, resulting in 20 fatalities this year. At this time in 2018, there were 19 fatal traffic crashes and 20 traffic fatalities.",
@@ -274,8 +307,13 @@
   },
   {
     "Case": "19-1070052",
+    "DOB": "05/17/1963",
     "Date": "04/17/2019",
+    "Ethnicity": "White",
     "Fatal crashes this year": "18",
+    "First Name": "James",
+    "Gender": "male",
+    "Last Name": "Bourgeois",
     "Link": "http://austintexas.gov/news/traffic-fatality-18-4",
     "Location": "5600 block of S. Congress Avenue",
     "Notes": "This is Austin\u2019s 18th fatal traffic crash of 2019, resulting in 19 fatalities this year. At this time in 2018, there were 19 fatal traffic crashes and 20 traffic fatalities.",

@rgreinho rgreinho force-pushed the rgreinho:issues/80/new-deceased-format branch from a18c3e1 to f127e4f Apr 24, 2019

Parse more deceased field formats
Enhances the parsers to be able to extract information from more formats
of the deceased fields.

It now handles the fields separated by:
* pipes
* spaces

It also fixes the case when multiple fatalities occurred in one crash by
picking up the details of the first victim.

Other improvements:
* Create new function for parsing the FLEG.
* Improve the `Gender` parsing.
* Improve the `Ethnicity` parsing.

Fixes #80
Fixes #77
Fixes #51

@rgreinho rgreinho force-pushed the rgreinho:issues/80/new-deceased-format branch from f127e4f to 71bcd3e Apr 25, 2019


def parse_space_delimited_deceased_field(deceased_field):

@mrengler

mrengler Apr 25, 2019

Contributor

So much delimiter support now 🎉 A lot of similar code between pipe and space, maybe these could be one function in the future with a delimiter parameter.
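A minimal sketch of the suggestion above, assuming the pipe and space parsers differ only in how they split the raw field. The function name `split_deceased_field`, its `delimiter` parameter, and the sample values are hypothetical and do not come from the scrapd code base.

```python
def split_deceased_field(deceased_field, delimiter=None):
    """Split a deceased field into clean, non-empty parts.

    With delimiter=None the field is split on any run of whitespace,
    otherwise on the given delimiter (for example '|').
    """
    if delimiter is None:
        return deceased_field.split()
    # Strip the padding around each part and drop empty ones.
    return [part.strip() for part in deceased_field.split(delimiter) if part.strip()]


# Pipe-delimited (assumed layout).
print(split_deceased_field('Jane Doe | White female | 01/02/1980', '|'))
# Space-delimited (assumed layout).
print(split_deceased_field('Jane Doe White female 01/02/1980'))
```

With the splitting isolated like this, the format-specific parsers would only need to interpret the resulting parts.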

try:
    d[Fields.GENDER] = fleg.pop().replace(',', '').lower()
    if d.get(Fields.GENDER) == 'f':

@mrengler

mrengler Apr 25, 2019

Contributor

Didn't even realize there were f's and m's...good catch
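For illustration, the abbreviation handling discussed here could be sketched as a small lookup. This is a hedged example rather than the PR's actual code: the names `GENDER_ABBREVIATIONS` and `normalize_gender` are hypothetical, and the sketch assumes the only abbreviations are single letters `f` and `m`.

```python
# Hypothetical mapping from single-letter abbreviations to full values.
GENDER_ABBREVIATIONS = {'f': 'female', 'm': 'male'}


def normalize_gender(token):
    """Lowercase a gender token, drop commas, and expand 'f'/'m'."""
    cleaned = token.replace(',', '').strip().lower()
    # Values that are not abbreviations are returned unchanged.
    return GENDER_ABBREVIATIONS.get(cleaned, cleaned)


print(normalize_gender('F,'))
print(normalize_gender('Male'))
```

A dictionary lookup keeps the mapping in one place, so another abbreviation could be added without extra branches.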

@mergify mergify bot merged commit 17fe417 into scrapd:master Apr 25, 2019

7 checks passed

  • Mergify: Summary, 1 rule matches
  • ci/circleci: docs. Your tests passed on CircleCI!
  • ci/circleci: format. Your tests passed on CircleCI!
  • ci/circleci: lint. Your tests passed on CircleCI!
  • ci/circleci: prepare. Your tests passed on CircleCI!
  • ci/circleci: test. Your tests passed on CircleCI!
  • coverage/coveralls. Coverage remained the same at 100.0%