### Los Angeles County Coroner Scraper

By: Shirsho Dasgupta (2019; last scrape: October 6, 2021)

##### Notes:

The code scrapes the L.A. County Coroner's website and stores the details of the deceased into a spreadsheet. 

An example of the output after scraping the first two pages (200 entries) is attached; see deathlist.csv.

### Importing libraries

In [1]:
import requests 
from bs4 import BeautifulSoup
import time
import re
import json
import pandas as pd

### Creating spreadsheet to be written into

In [2]:
with open("deathlist.csv", "w") as f:
    # the header has no spaces after the commas so that column names are read back without leading whitespace
    f.write("CaseNumber,Name,BirthDate,DeathDate,Age,Gender,DeathPlace,BodyStatus,Mode,Investigator,DepMedicalExaminer,CaseStatus,Race,CauseA,CauseB,CauseC,CauseD,CauseOther,toxstatus\n")
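
As a possible alternative to building the header as a hand-written string, Python's standard `csv` module can write both the header and the data rows, and it quotes any field that contains a comma instead of requiring commas to be replaced. The sketch below is illustrative only: `COLUMNS`, `write_header`, and `append_row` are hypothetical names that are not part of the original code, and the column list is copied from the header above.

```python
import csv

# Column names copied from the header row used in this notebook.
COLUMNS = [
    "CaseNumber", "Name", "BirthDate", "DeathDate", "Age", "Gender",
    "DeathPlace", "BodyStatus", "Mode", "Investigator",
    "DepMedicalExaminer", "CaseStatus", "Race", "CauseA", "CauseB",
    "CauseC", "CauseD", "CauseOther", "toxstatus",
]


def write_header(path):
    """Create (or overwrite) the CSV at `path` with only the header row."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(COLUMNS)


def append_row(path, values):
    """Append one row; fields containing commas are quoted automatically."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(values)
```

With this approach, `write_header("deathlist.csv")` would stand in for the cell above, and each case would be written with `append_row("deathlist.csv", [...])`, passing the 19 values in column order.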

### Defining function to extract data from each case and write it into spreadsheet

In [3]:
# defining function 
def pageDetails(url):   
    
    # takes the URL
    page = requests.get(url)
    # takes the JSON file from the URL
    data = page.json()
    # stores the data 
    records = data["cases"]
    
    # loops through each entry in the JSON
    for record in records:
            
            # stores the case number
            casenumber = record["CaseNum"]
            
            # stores the deathdate (MM/DD/YYYY) and formats it to YYYY-MM-DD 
            deathdate = record["DeathDate"]
            y = str(deathdate)
            deathdate_final = y[6:] + "-" + y[:2] + "-" + y[3:5]
            
            # checks if the person died after 2014; earlier deaths are not recorded online, so the loop stops (see the break below) at the first record dated 2014 or earlier
            if int(y[6:]) > 2014:
                
                # a short pause between requests avoids overloading the server and helps keep the connection open
                time.sleep(0.01)
                
                # stores the name
                name = str(record["NameFirst"]) + " " + str(record["NameLast"])
                name = name.replace(",", "-")
                
                # stores the birthdate (MM/DD/YYYY) and formats it to YYYY-MM-DD 
                birthdate = record["BirthDate"]
                x = str(birthdate)
                birthdate_final = x[6:] + "-" + x[:2] + "-" + x[3:5]
              
                # stores the age 
                age = record["Age"]
                # stores the gender
                gender = record["Gender"]
                # stores the place of death
                deathplace = record["DeathPlace"].replace(",", "-")
                # stores the status of the body
                bodystatus = record["BodyStatus"].replace(",", "-")
                # stores the mode of death
                mode = record["Mode"].replace(",", "-")
                # stores the name of the primary investigator
                investigator = record["Investigator"].replace(",", "-")
                # stores the name of the deputy medical examiner
                depmedexaminer = record["DME"].replace(",", "-")
                # stores the status of the case
                casestatus = record["CaseStatus"].replace(",", "-")
                
                ### REDIRECTS TO PAGE WITH MORE DETAILED DATA
                # stores URL for the JSON file that stores further details of each case
                newurl = "http://api.lacounty.gov/mecsearch/CaseInformationServlet?caseDetails=1&CaseNum=" + str(casenumber)
                # takes the URL 
                detailpage = requests.get(newurl)
                # takes the JSON file from the URL
                detdata = detailpage.json()
                # stores the data 
                details = detdata["caseDetail"]
                # stores the race
                race = details[0]["Race"].replace(",", "-")
                # stores each cause
                causeA = details[0]["CauseA"].replace(",", "-")
                causeB = details[0]["CauseB"].replace(",", "-")
                causeC = details[0]["CauseC"].replace(",", "-")
                causeD = details[0]["CauseD"].replace(",", "-")
                causeOther = details[0]["CauseOther"].replace(",", "-")
                # stores the toxicology status
                toxstatus = details[0]["ToxStatus"].replace(",", "-")
                
                # writes details into the CSV; str() guards against non-string values (for example, a numeric age)
                row = [casenumber, name, birthdate_final, deathdate_final, age, gender, deathplace, bodystatus, mode, investigator, depmedexaminer, casestatus, race, causeA, causeB, causeC, causeD, causeOther, toxstatus]
                with open("deathlist.csv", "a") as f:
                    f.write(",".join(str(value) for value in row) + "\n")
            else:
                break
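
The string slicing used in the function to reformat dates relies on every value being exactly `MM/DD/YYYY`. A hedged alternative is to parse the value with the standard `datetime` module, which rejects malformed input instead of silently producing a wrong string. `to_iso_date` is a hypothetical helper that is not part of the original code, and returning an empty string for unparseable values is an assumption about how missing dates should be handled.

```python
from datetime import datetime


def to_iso_date(raw):
    """Convert an 'MM/DD/YYYY' value to 'YYYY-MM-DD'.

    Returns an empty string when the value does not parse
    (an assumed convention for missing or malformed dates).
    """
    try:
        parsed = datetime.strptime(str(raw).strip(), "%m/%d/%Y")
    except ValueError:
        return ""
    return parsed.strftime("%Y-%m-%d")


print(to_iso_date("10/05/2021"))  # 2021-10-05
```

Inside `pageDetails`, `deathdate_final` and `birthdate_final` could then be computed as `to_iso_date(record["DeathDate"])` and `to_iso_date(record["BirthDate"])`.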

### Calling the function and running through each JSON page

In [4]:
# to run through the entire dataset, set the end-point to 1374: a total of 137359 deaths had been recorded at the time of coding and each page displays 100 results (137359 / 100, rounded up); adjust this as the total changes
## the end-point here is 2, so the code only runs through the first 2 pages (200 entries) as an example
for offset in range(0, 2):
    
    # builds the URL for the current page of results; the page number advances on each pass through the loop
    url = "http://api.lacounty.gov/mecsearch/CaseInformationServlet?pageNumber=" + str(offset) + "&pageSize=100&NameFirst=&NameLast=&BirthDate=&Age=&DeathDate=&CaseNum=&sortColumn=CaseNum&sortOrder=desc"
    
    # calls the function to write the details into the CSV
    pageDetails(url)
    time.sleep(0.01)
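
The end-point arithmetic described in the comments above, and the page URL itself, can be sketched with the standard library. This is a minimal illustration, not the original method: `page_count` and `build_page_url` are hypothetical names, and the base URL and query parameters are copied from the loop above.

```python
import math
from urllib.parse import urlencode

BASE_URL = "http://api.lacounty.gov/mecsearch/CaseInformationServlet"


def page_count(total_records, page_size=100):
    """Number of pages needed to cover `total_records` entries."""
    return math.ceil(total_records / page_size)


def build_page_url(page_number, page_size=100):
    """Build the search URL for one page, using the same parameters as the loop."""
    params = {
        "pageNumber": page_number,
        "pageSize": page_size,
        "NameFirst": "",
        "NameLast": "",
        "BirthDate": "",
        "Age": "",
        "DeathDate": "",
        "CaseNum": "",
        "sortColumn": "CaseNum",
        "sortOrder": "desc",
    }
    return BASE_URL + "?" + urlencode(params)


print(page_count(137359))  # 1374
```

A full run would then loop over `range(0, page_count(137359))` and pass `build_page_url(offset)` to `pageDetails`.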

### Importing and displaying the resulting spreadsheet

In [5]:
deaths = pd.read_csv("deathlist.csv")

# filling up empty cells with "N/A"
deaths.fillna("N/A", inplace = True)

deaths.head(5)

| Unnamed: 0 | CaseNumber | Name | BirthDate | DeathDate | Age | Gender | DeathPlace | BodyStatus | Mode | Investigator | DepMedicalExaminer | CaseStatus | Race | CauseA | CauseB | CauseC | CauseD | CauseOther | toxstatus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-11134 | BRUCE CORBIN | 1933-01-03 | 2021-10-05 | 88.0 | MALE | BATHROOM | HERE | | PECK | | INV. COMPLETE | CAUCASIAN | | | | | | NOT REQUESTED |
| 1 | 2021-11128 | FREDDY MACHUCA | 1987-09-11 | 2021-10-05 | 34.0 | MALE | SIDEWALK | HERE | | MUNOZ | | INV. ASSIGNED | HISPANIC/LATIN AMERICAN | | | | | | NOT REQUESTED |
| 2 | 2021-11125 | LORRAINE JOHNSON | 1976-02-19 | 2021-10-04 | 45.0 | FEMALE | RESIDENCE | EXAM SCHEDULED | | DARABEDYAN | | Exam Pending | HISPANIC/LATIN AMERICAN | | | | | | NOT REQUESTED |
| 3 | 2021-11115 | MIGUEL CONTRERAS HUIZAR | 1996-04-14 | 2021-10-04 | 25.0 | MALE | RESIDENCE | EXAM SCHEDULED | | TARDIE | | Exam Pending | HISPANIC/LATIN AMERICAN | | | | | | NOT REQUESTED |
| 4 | 2021-11106 | EDWARD MUNOZ | 1960-09-05 | 2021-10-04 | 61.0 | MALE | RESIDENCE | HERE | | PECK | | INV. COMPLETE | HISPANIC/LATIN AMERICAN | | | | | | NOT REQUESTED |
