This repository has been archived by the owner on Feb 2, 2022. It is now read-only.

Parse more deceased field formats #90

Merged 1 commit on Apr 25, 2019

Conversation

@rgreinho (Member) commented Apr 24, 2019

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code cleanup / Refactoring

Description

Parse more deceased field formats

Enhances the parsers so they can extract information from more formats
of the deceased field.

It now handles fields separated by:

  • pipes
  • spaces

It also fixes the case when multiple fatalities occurred in one crash by
picking up the details of the first victim.

Other improvements:

  • Create a new function for parsing the FLEG.
  • Improve the Gender parsing.
  • Improve the Ethnicity parsing.

Checklist:

  • I have updated the documentation accordingly
  • I have written unit tests

Fixes #80
Fixes #77
Fixes #51

@rgreinho rgreinho self-assigned this Apr 24, 2019
@rgreinho rgreinho requested a review from mrengler April 24, 2019 23:23
@rgreinho (Member, Author) commented:

Here is the diff if I update the 2019 data set with this version. I think it looks legit.

diff --git a/datasets/fatalities-2019-raw.json b/datasets/fatalities-2019-raw.json
index 4919642..0012593 100644
--- a/datasets/fatalities-2019-raw.json
+++ b/datasets/fatalities-2019-raw.json
@@ -6,7 +6,7 @@
     "Date": "02/09/2019",
     "Ethnicity": "Black",
     "Fatal crashes this year": "7",
-    "First Name": "Zion",
+    "First Name": "Messiah",
     "Gender": "male",
     "Last Name": "Mouton",
     "Link": "http://austintexas.gov/news/traffic-fatality-7-4",
@@ -21,7 +21,7 @@
     "Date": "02/06/2019",
     "Ethnicity": "White",
     "Fatal crashes this year": "6",
-    "First Name": "James",
+    "First Name": "Trevor",
     "Gender": "male",
     "Last Name": "Ralston",
     "Link": "http://austintexas.gov/news/traffic-fatality-6-6",
@@ -81,7 +81,7 @@
     "Date": "01/15/2019",
     "Ethnicity": "White",
     "Fatal crashes this year": "1",
-    "First Name": "Hilburn",
+    "First Name": "David",
     "Gender": "male",
     "Last Name": "Sell",
     "Link": "http://austintexas.gov/news/traffic-fatality-1-4",
@@ -111,7 +111,7 @@
     "Date": "02/27/2019",
     "Ethnicity": "Hispanic",
     "Fatal crashes this year": "10",
-    "First Name": "Ni\u00f1o",
+    "First Name": "Javier",
     "Gender": "male",
     "Last Name": "Esparza",
     "Link": "http://austintexas.gov/news/traffic-fatality-10-4",
@@ -126,7 +126,7 @@
     "Date": "02/21/2019",
     "Ethnicity": "Hispanic",
     "Fatal crashes this year": "9",
-    "First Name": "\u201cRudy\u201d",
+    "First Name": "Rosbel",
     "Gender": "male",
     "Last Name": "Tamez",
     "Link": "http://austintexas.gov/news/traffic-fatality-9-4",
@@ -186,7 +186,7 @@
     "Date": "03/29/2019",
     "Ethnicity": "Hispanic",
     "Fatal crashes this year": "12",
-    "First Name": "Cardenas",
+    "First Name": "Carlos",
     "Gender": "male",
     "Last Name": "Jr.",
     "Link": "http://austintexas.gov/news/traffic-fatality-12-4",
@@ -201,7 +201,7 @@
     "Date": "03/28/2019",
     "Ethnicity": "White",
     "Fatal crashes this year": "11",
-    "First Name": "Rae",
+    "First Name": "Jessica",
     "Gender": "female",
     "Last Name": "Saathoff",
     "Link": "http://austintexas.gov/news/traffic-fatality-11-4",
@@ -211,8 +211,13 @@
   },
   {
     "Case": "19-0921776",
+    "DOB": "06/24/1991",
     "Date": "04/02/2019",
+    "Ethnicity": "White",
     "Fatal crashes this year": "15",
+    "First Name": "Garrett",
+    "Gender": "male",
+    "Last Name": "Davis",
     "Link": "http://austintexas.gov/news/traffic-fatality-15-4",
     "Location": "517 E. Slaughter Lane",
     "Notes": "This is Austin\u2019s 15th fatal traffic crash of 2019, resulting in 15 fatalities this year. At this time in 2018, there were 15 fatal traffic crashes and 16 traffic fatalities.",
@@ -220,8 +225,13 @@
   },
   {
     "Case": "19-0961200",
+    "DOB": "08/01/1949",
     "Date": "04/06/2019",
+    "Ethnicity": "Asian",
     "Fatal crashes this year": "17",
+    "First Name": "Wing",
+    "Gender": "male",
+    "Last Name": "Chou",
     "Link": "http://austintexas.gov/news/traffic-fatality-17-4",
     "Location": "14000 block of N. SH-45",
     "Notes": "This is Austin\u2019s 17th fatal traffic crash of 2019, resulting in 17 fatalities this year. At this time in 2018, there were 15 fatal traffic crashes and 16 traffic fatalities.",
@@ -229,8 +239,13 @@
   },
   {
     "Case": "19-0930132",
+    "DOB": "01/15/1994",
     "Date": "04/03/2019",
+    "Ethnicity": "White",
     "Fatal crashes this year": "16",
+    "First Name": "Hannah",
+    "Gender": "female",
+    "Last Name": "Jaggers",
     "Link": "http://austintexas.gov/news/traffic-fatality-16-4",
     "Location": "E. Wells Branch Parkway/S. Heatherwilde Boulevard",
     "Notes": "This is Austin\u2019s 16th fatal traffic crash of 2019, resulting in 16 fatalities this year. At this time in 2018, there were 15 fatal traffic crashes and 16 traffic fatalities.",
@@ -238,8 +253,13 @@
   },
   {
     "Case": "19-1080673",
+    "DOB": "08/23/1941",
     "Date": "04/18/2019",
+    "Ethnicity": "Hispanic",
     "Fatal crashes this year": "21",
+    "First Name": "Elvira",
+    "Gender": "female",
+    "Last Name": "Trujillo",
     "Link": "http://austintexas.gov/news/traffic-fatality-21-3",
     "Location": "FM 973 and Pearce Lane",
     "Notes": "This is Austin\u2019s 21st fatal traffic crash of 2019, resulting in 22 fatalities this year. At this time in 2018, there were 19 fatal traffic crashes and 20 traffic fatalities.",
@@ -247,8 +267,13 @@
   },
   {
     "Case": "19-1110655",
+    "DOB": "08/19/1972",
     "Date": "04/21/2019",
+    "Ethnicity": "Black",
     "Fatal crashes this year": "22",
+    "First Name": "Aric",
+    "Gender": "male",
+    "Last Name": "Maxwell",
     "Link": "http://austintexas.gov/news/traffic-fatality-22-3",
     "Location": "5300 blk N. IH-35 SB",
     "Notes": "This is Austin\u2019s 22nd fatal traffic crash of 2019, resulting in 23 fatalities this year. At this time in 2018, there were 19 fatal traffic crashes and 20 traffic fatalities.",
@@ -256,8 +281,11 @@
   },
   {
     "Case": "19-1080319",
+    "DOB": "04/19/2019",
     "Date": "04/18/2019",
+    "Ethnicity": "Hispanic",
     "Fatal crashes this year": "20",
+    "Gender": "male",
     "Link": "http://austintexas.gov/news/traffic-fatality-20-4",
     "Location": "8000 block of West U.S. 290",
     "Notes": "This is Austin\u2019s 20th fatal traffic crash of 2019, resulting in 21 fatalities this year. At this time in 2018, there were 19 fatal traffic crashes and 20 traffic fatalities.",
@@ -265,8 +293,13 @@
   },
   {
     "Case": "19-1070614",
+    "DOB": "06/25/1948",
     "Date": "04/17/2019",
+    "Ethnicity": "White",
     "Fatal crashes this year": "19",
+    "First Name": "Michael",
+    "Gender": "male",
+    "Last Name": "Cannatti",
     "Link": "http://austintexas.gov/news/traffic-fatality-19-5",
     "Location": "Jollyville Rd. and Balcones Woods Drive.",
     "Notes": "This is Austin\u2019s 19th fatal traffic crash of 2019, resulting in 20 fatalities this year. At this time in 2018, there were 19 fatal traffic crashes and 20 traffic fatalities.",
@@ -274,8 +307,13 @@
   },
   {
     "Case": "19-1070052",
+    "DOB": "05/17/1963",
     "Date": "04/17/2019",
+    "Ethnicity": "White",
     "Fatal crashes this year": "18",
+    "First Name": "James",
+    "Gender": "male",
+    "Last Name": "Bourgeois",
     "Link": "http://austintexas.gov/news/traffic-fatality-18-4",
     "Location": "5600 block of S. Congress Avenue",
     "Notes": "This is Austin\u2019s 18th fatal traffic crash of 2019, resulting in 19 fatalities this year. At this time in 2018, there were 19 fatal traffic crashes and 20 traffic fatalities.",

@rgreinho rgreinho force-pushed the issues/80/new-deceased-format branch from a18c3e1 to f127e4f Compare April 24, 2019 23:33
Enhances the parsers to be able to extract information from more formats
of the deceased fields.

It now handles the fields separated by:
* pipes
* spaces

It also fixes the case when multiple fatalities occurred in one crash by
picking up the details of the first victim.

Other improvements:
* Create new function for parsing the FLEG.
* Improve the `Gender` parsing.
* Improve the `Ethnicity` parsing.

Fixes scrapd#80
Fixes scrapd#77
Fixes scrapd#51
@rgreinho rgreinho force-pushed the issues/80/new-deceased-format branch from f127e4f to 71bcd3e Compare April 25, 2019 01:06

def parse_space_delimited_deceased_field(deceased_field):
Contributor

So much delimiter support now 🎉 A lot of similar code between pipe and space, maybe these could be one function in the future with a delimiter parameter.
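
One way the suggestion above might look, as a rough sketch only: a single splitting helper that takes the delimiter as a parameter. The name `split_deceased_field`, its signature, and the sample inputs are assumptions for illustration; the real pipe and space parsers in this PR may differ in more than the delimiter.

```python
def split_deceased_field(deceased_field, delimiter=None):
    """Split a raw deceased field into cleaned tokens.

    delimiter=None splits on any run of whitespace; otherwise the field is
    split on the given delimiter (for example '|') and each part is stripped.
    """
    if delimiter is None:
        parts = deceased_field.split()
    else:
        parts = deceased_field.split(delimiter)
    # Trim surrounding whitespace and drop empty parts left by doubled
    # or trailing delimiters.
    return [part.strip() for part in parts if part.strip()]


# Made-up inputs; the real field layout may differ.
print(split_deceased_field("Jane Doe | White female | 01/02/1990", delimiter="|"))
print(split_deceased_field("Jane Doe  White female 01/02/1990"))
```

With the tokenizing step shared, only the interpretation of the resulting tokens would need to stay format-specific.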

try:
    d[Fields.GENDER] = fleg.pop().replace(',', '').lower()
    if d.get(Fields.GENDER) == 'f':
Contributor

Didn't even realize there were f's and m's...good catch
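
For readers following along, a small sketch of the kind of normalization the snippet above hints at: strip commas, lowercase, and expand one-letter values. The mapping `GENDER_ALIASES` and the function `normalize_gender` are hypothetical names, and expanding 'f'/'m' to 'female'/'male' is an assumption based on the values that appear in the dataset diff.

```python
# Assumed abbreviations; extend only if other values show up in the data.
GENDER_ALIASES = {"f": "female", "m": "male"}


def normalize_gender(raw):
    """Strip commas, lowercase, and expand one-letter abbreviations."""
    cleaned = raw.replace(",", "").strip().lower()
    # Unknown values pass through unchanged rather than being guessed.
    return GENDER_ALIASES.get(cleaned, cleaned)


print(normalize_gender("F,"))
print(normalize_gender("Male"))
```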
