# Introduction

**Incarceration & COVID-19: How Jails Respond to COVID**<br>

This project scrapes daily jail roster information to build a large dataset for analyzing how jail populations have fluctuated in response to COVID-19. The research centers on explaining why county jails in different parts of the United States have responded differently to the pandemic over time.

A separate but related use of this dataset is to analyze the impact of pandemic-related jail population declines on local crime. The focal variable is the daily jail roster population count, and the analysis uses group-based trajectory modeling. Our scraped data will fill gaps in the [Vera](https://github.com/vera-institute/jail-population-data) dataset.

We start by comparing Washington and New York because both states confronted COVID-19 at the early onset of the pandemic. Below are the data points to collect so that our data harmonizes with the Vera data:
- County Name
- State Name
- Daily Population Counts
- Reporting Jail Name
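These four fields correspond to the columns of the `county_jails` table created below. As a minimal sketch, one scraped observation can be held in a small dataclass that validates the count and emits its values in the column order of the notebook's insert statement (`Date, reporting_jurisdictions, county_name, state_name, jail_population`). `JailCount` and `as_sql_params` are hypothetical names, not part of the Vera data or any library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class JailCount:
    """One daily population observation for a single reporting jail (hypothetical schema)."""
    county_name: str        # e.g. "Whitman County"
    state_name: str         # two-letter postal code, e.g. "WA"
    jail_population: int    # number of people on the roster that day
    reporting_jail: str     # e.g. "Whitman County Jail"
    observed_on: date       # scrape date

    def __post_init__(self) -> None:
        # Reject obviously bad values before they reach the database
        if self.jail_population < 0:
            raise ValueError("jail_population must be non-negative")
        if len(self.state_name) != 2:
            raise ValueError("state_name should be a two-letter code")

    def as_sql_params(self) -> tuple:
        # Same order as the INSERT columns:
        # (Date, reporting_jurisdictions, county_name, state_name, jail_population)
        return (
            self.observed_on.isoformat(),
            self.reporting_jail,
            self.county_name,
            self.state_name,
            self.jail_population,
        )


row = JailCount("Whitman County", "WA", 22, "Whitman County Jail", date(2020, 9, 14))
print(row.as_sql_params())  # → ('2020-09-14', 'Whitman County Jail', 'Whitman County', 'WA', 22)
```

Keeping the tuple order in one method means a per-county cell cannot silently swap the jail and county names.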

# Imports

In [1]:
# Import libraries
import pandas as pd
from datetime import datetime

# Request
import requests
import re
import zlib
import urllib.request

# Selenium
from selenium import webdriver

# SQL
import mysql.connector
import os

# MySQL

In [2]:
db_user = os.getenv('db_user')
db_passwd = os.getenv('db_passwd')


mydb = mysql.connector.connect(host='localhost',
                               user=db_user,
                               passwd=db_passwd,
                               database='testdb')
print(mydb)

<mysql.connector.connection.MySQLConnection object at 0x7fe64c0316d0>
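Because `os.getenv` returns `None` for an unset variable, a missing credential only surfaces later as a connection error. A minimal sketch of failing early instead; the helper `db_config_from_env` is hypothetical, while the variable names `db_user` and `db_passwd` are the ones this notebook already reads.

```python
import os


def db_config_from_env(env=None, database="testdb", host="localhost"):
    """Build keyword arguments for mysql.connector.connect from environment variables.

    Raises KeyError naming every missing variable instead of passing None along.
    """
    env = os.environ if env is None else env
    required = ("db_user", "db_passwd")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "host": host,
        "user": env["db_user"],
        "passwd": env["db_passwd"],
        "database": database,
    }


# In the notebook: mydb = mysql.connector.connect(**db_config_from_env())
print(db_config_from_env({"db_user": "analyst", "db_passwd": "secret"}))
```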


In [3]:
# Create a cursor and list the available databases
mycursor = mydb.cursor()

# Run once to create the database, then leave commented out
# mycursor.execute("CREATE DATABASE testdb")

mycursor.execute("SHOW DATABASES")

for db in mycursor:
    print(db)

('information_schema',)
('mysql',)
('performance_schema',)
('sys',)
('testdb',)


In [4]:
# # Create the table (run once; keep for reference)

# mycursor.execute("CREATE TABLE county_jails\
#                  (reporting_jurisdictions VARCHAR(100),\
#                  county_name VARCHAR(100),\
#                  state_name VARCHAR(100),\
#                  Date VARCHAR(100),\
#                  jail_population INTEGER(255))")

mycursor.execute("SHOW TABLES")

for tb in mycursor:
    print(tb)

('county_jails',)


In [5]:
sqlFormula = "INSERT INTO county_jails (Date, reporting_jurisdictions, county_name, \
state_name, jail_population) VALUES (%s, %s, %s, %s, %s)"
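Every run of a county cell inserts a new row, so rerunning a cell on the same day duplicates that county-day, which is presumably why the export at the end reads the table with `SELECT DISTINCT`. An alternative, sketched here, is to let the schema enforce one row per jail per day. The sketch uses Python's built-in `sqlite3` as a stand-in so it runs without a MySQL server; in MySQL the equivalent would be a `UNIQUE KEY` on the same two columns plus `INSERT IGNORE`, with `%s` placeholders as in `sqlFormula`.

```python
import sqlite3

# In-memory stand-in for the county_jails table, with a uniqueness constraint
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE county_jails (
           reporting_jurisdictions TEXT,
           county_name TEXT,
           state_name TEXT,
           Date TEXT,
           jail_population INTEGER,
           UNIQUE (reporting_jurisdictions, Date)
       )"""
)

# Same column order as sqlFormula; sqlite3 uses ? placeholders instead of %s
insert_sql = (
    "INSERT OR IGNORE INTO county_jails "
    "(Date, reporting_jurisdictions, county_name, state_name, jail_population) "
    "VALUES (?, ?, ?, ?, ?)"
)

row = ("2020-09-14", "Whitman County Jail", "Whitman County", "WA", 22)
conn.execute(insert_sql, row)
conn.execute(insert_sql, row)  # accidental rerun of the same cell: ignored
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM county_jails").fetchone()[0]
print(count)  # → 1
```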

# States

Check for APIs in addition to scraping. States to cover: NY, WA, and FL.

## Washington

### Whitman

In [11]:
# Open the roster page with Selenium
url = "http://www.whitmancountyjail.org/"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

# Collect the <h4> elements; uncomment the loop to inspect them
listy = driver.find_elements_by_css_selector('h4')
# for x in listy[:50]:
#     if len(x.text) > 0:
#         print(x.text)

# Today's date and the population count: the number of <h4> elements
# minus 10 (the <h4> elements that are not inmate entries)
todays_date = datetime.now().strftime('%Y-%m-%d')
JPWhitman = len(listy) - 10
print('Date = ',todays_date)
print('jail_population = ',JPWhitman)


# Insert the row into MySQL (the same pattern is used for every county)
Whitman = (todays_date, "Whitman County Jail", "Whitman County", "WA", JPWhitman)
mycursor.execute(sqlFormula, Whitman)
mydb.commit()

# Close the browser window
driver.close()

Date =  2020-09-14
jail_population =  22
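The Whitman count is the number of `<h4>` elements minus a fixed offset for non-inmate headings. A minimal sketch of the same count on static HTML with the standard library's `html.parser`, which can help check the offset offline against a saved copy of the page; `count_tags` and the sample markup are hypothetical, and if the live page is rendered client-side, Selenium is still needed to obtain the HTML.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count elements with one tag name, optionally skipping those with no text."""

    def __init__(self, tag, skip_empty=True):
        super().__init__()
        self.tag = tag
        self.skip_empty = skip_empty
        self.count = 0
        self._inside = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._inside = True
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._inside:
            self._inside = False
            text = "".join(self._text).strip()
            # Mirrors the commented-out loop, which only prints non-empty headings
            if text or not self.skip_empty:
                self.count += 1


def count_tags(html, tag="h4", skip_empty=True):
    parser = TagCounter(tag, skip_empty)
    parser.feed(html)
    parser.close()
    return parser.count


sample = "<h4>Roster</h4><h4>DOE, JOHN</h4><h4>ROE, JANE</h4><h4></h4>"
print(count_tags(sample))      # → 3
print(count_tags(sample) - 1)  # minus one non-inmate heading → 2
```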


### Spokane

In [13]:
url = "https://www.spokanecounty.org/352/Inmate-Roster"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

# print(driver.page_source)

# Sinmate = driver.find_element_by_xpath('//*[@id="tblInmateRoster_info"]')

In [14]:
JPSpokane = 718  # Count read manually from the roster page

In [15]:
#USE FOR ALL COMMITS
Spokane = (todays_date, "Spokane County Jail", "Spokane County", "WA", JPSpokane)
mycursor.execute(sqlFormula, Spokane)
mydb.commit()

driver.close()

### Okanogan

Details can be found in the Daily Jail Inmate Log on the [Okanogan Sheriff website](https://okanogansheriff.org/).

In [16]:
url = "https://okanogansheriff.org/"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

# print(driver.page_source)

In [17]:
JPOkanogan = 93  # Count read manually from the Daily Jail Inmate Log

In [18]:
#USE FOR ALL COMMITS
Okanogan = (todays_date, "Okanogan County Jail", "Okanogan County", "WA", JPOkanogan)
mycursor.execute(sqlFormula, Okanogan)
mydb.commit()

driver.close()

### Jefferson

[Jefferson](https://co.jefferson.wa.us/174/Jail-Inmate-Search)<br> To view the full inmate roster, click the Clear button and then the Search button.

In [19]:
url = "https://co.jefferson.wa.us/174/Jail-Inmate-Search"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

# print(driver.page_source)

In [20]:
# The search fields are hidden inputs, so this lookup finds no elements

inmate = driver.find_elements_by_name('Name')
print(len(inmate))

0


In [21]:
JPJefferson = 19  # Count read manually from the search results

In [22]:
#USE FOR ALL COMMITS
Jefferson = (todays_date, "Jefferson County Jail", "Jefferson County", "WA", JPJefferson)
mycursor.execute(sqlFormula, Jefferson)
mydb.commit()

driver.close()

### Grant

[Grant](https://www.grantcountywa.gov/SHERIFF/Corrections/Inmate-Roster.htm): daily PDF

In [23]:
# Request the daily roster PDF with a browser-like User-Agent header
grant = urllib.request.Request(
    "https://www.grantcountywa.gov/SHERIFF/Corrections/Roster-InmateinmateRoster%20v%206.rpt.pdf",
    headers={'User-Agent': 'Chrome/41.0.2228.0'},
)

response = urllib.request.urlopen(grant)
the_page = response.read()
# print(the_page)

In [24]:
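# Decompress each FlateDecode stream in the PDF and print its contents
stream = re.compile(b'.*?FlateDecode.*?stream(.*?)endstream', re.S)

for s in re.findall(stream, the_page):
    s = s.strip(b'\r\n')
    try:
        print(zlib.decompress(s).decode('UTF-8'))
    except (zlib.error, UnicodeDecodeError):
        # Skip streams that are not valid zlib data or not UTF-8 text (e.g. images)
        pass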

(Output abridged: the decompressed content streams consist mostly of PDF drawing operators for the table layout. Their text operators give the title "Current in-custody list for the Grant County Jail and the Grant County Work Release", the timestamp "As of: 9/14/2020  2:42 pm", and the column headers Name, Age, and Location.)

In [25]:
JPGrant = 43  # Count read manually from the roster PDF

In [26]:
#USE FOR ALL COMMITS
Grant = (todays_date, "Grant County Jail", "Grant County", "WA", JPGrant)
mycursor.execute(sqlFormula, Grant)
mydb.commit()

# driver.close()
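The Grant count above is entered by hand after reading the PDF. In PDF content streams, text is shown by operators such as `(string) Tj` and `[(str) kern (str)] TJ`, so pulling those strings out is a first step toward counting roster rows automatically. A minimal sketch, assuming simple content streams: it ignores escaped parentheses, hex strings, and font encodings, so it is a starting point rather than a general PDF text extractor. `extract_pdf_text` and `text_from_content` are hypothetical helpers, and the demo uses a synthetic PDF fragment, not the real roster.

```python
import re
import zlib

# Data between "stream<EOL>" and "<EOL>endstream" of a FlateDecode object
STREAM_RE = re.compile(rb"FlateDecode.*?stream\r?\n(.*?)\r?\n?endstream", re.S)
# A (string) Tj operator, or a [(str) kern (str)] TJ array
TEXT_OP_RE = re.compile(rb"\(([^()]*)\)\s*Tj|\[([^\]]*)\]\s*TJ")
STRING_RE = re.compile(rb"\(([^()]*)\)")


def text_from_content(content: bytes) -> list[str]:
    """Return the literal strings shown by Tj/TJ operators, in order."""
    pieces = []
    for match in TEXT_OP_RE.finditer(content):
        if match.group(1) is not None:
            raw = match.group(1)
        else:
            # Drop the kerning numbers between the strings of a TJ array
            raw = b"".join(STRING_RE.findall(match.group(2)))
        pieces.append(raw.decode("latin-1"))
    return pieces


def extract_pdf_text(pdf_bytes: bytes) -> list[str]:
    """Decompress each FlateDecode stream and collect its text strings."""
    pieces = []
    for data in STREAM_RE.findall(pdf_bytes):
        try:
            content = zlib.decompress(data)
        except zlib.error:
            continue  # skip streams zlib cannot decompress
        pieces.extend(text_from_content(content))
    return pieces


# Synthetic PDF fragment (not the real Grant roster)
fake_content = b"BT (Current in-) Tj [(W)21(ork Release)] TJ (Name) Tj ET"
fake_pdf = (b"<< /Filter /FlateDecode >>\nstream\r\n"
            + zlib.compress(fake_content) + b"\r\nendstream")
print(extract_pdf_text(fake_pdf))  # → ['Current in-', 'Work Release', 'Name']
```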

### Grays Harbor

[Grays Harbor](http://ghlea.com/JailRosters/GHCJRoster.html)

In [27]:
url = "http://ghlea.com/JailRosters/GHCJRoster.html"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)


GHinmate = driver.find_elements_by_xpath('//*[@id="main-table"]/tbody/tr')
JPGray=(len(GHinmate))
print(JPGray)

#USE FOR ALL COMMITS

Gray = (todays_date, "Grays Harbor County Jail", "Grays Harbor County", "WA", JPGray)
mycursor.execute(sqlFormula, Gray)
mydb.commit()


driver.close()

208
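The Grays Harbor count is the number of `<tr>` rows under the table with `id="main-table"`. For offline checks against a saved snapshot, the standard library's `xml.etree.ElementTree` supports a small XPath subset that can express the same path. This sketch assumes well-formed (XHTML-like) markup; real pages often are not, in which case a forgiving HTML parser would be needed. The sample table is made up.

```python
import xml.etree.ElementTree as ET

# Made-up snapshot; ElementTree only parses well-formed markup
snapshot = """
<html><body>
  <table id="main-table">
    <thead><tr><th>Name</th><th>Booked</th></tr></thead>
    <tbody>
      <tr><td>DOE, JOHN</td><td>2020-09-01</td></tr>
      <tr><td>ROE, JANE</td><td>2020-09-10</td></tr>
    </tbody>
  </table>
</body></html>
"""

root = ET.fromstring(snapshot.strip())
# Same path as the Selenium XPath //*[@id="main-table"]/tbody/tr;
# the header row in <thead> is not counted
rows = root.findall(".//*[@id='main-table']/tbody/tr")
print(len(rows))  # → 2
```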


### Ferry

[Ferry](https://www.ferry-county.com/Courts%20and%20Law/Inmate%20Roster/Inmate_Roster_Page.html): the count appears in the section that reads, for example, "MAY 11, 2020 - 8 inmates".

In [31]:
url = "https://www.ferry-county.com/Courts%20and%20Law/Inmate%20Roster/Inmate_Roster_Page.html"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

# print(driver.page_source)


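Finmate = driver.find_element_by_xpath('//*[@id="mainContent3"]/p[9]').text
# Take the number before "inmates" (e.g. "MAY 11, 2020 - 8 inmates") instead of
# slicing fixed character positions, which shift with the length of the month name
JPFerry = int(re.search(r'(\d+)\s+inmates', Finmate, re.IGNORECASE).group(1))
print(JPFerry)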

18


In [32]:
#USE FOR ALL COMMITS

Ferry = (todays_date, "Ferry County Corrections", "Ferry County", "WA", JPFerry)
mycursor.execute(sqlFormula, Ferry)
mydb.commit()


driver.close()

### Clallam

[Clallam](https://websrv23.clallam.net/NewWorld.InmateInquiry/WA0050000/)

In [34]:
url = "https://websrv23.clallam.net/NewWorld.InmateInquiry/WA0050000/"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver') 
driver.implicitly_wait(3)
driver.get(url)

# print(driver.page_source)


Clallam_inmate = driver.find_elements_by_class_name('Name')
JPClallam = (len(Clallam_inmate))
print(JPClallam)


#USE FOR ALL COMMITS

Clallam = (todays_date, "Clallam County Jail", "Clallam County", "WA", JPClallam)
mycursor.execute(sqlFormula, Clallam)
mydb.commit()


driver.close()

66


### Adams

View the [jail roster and booking information](https://www.co.adams.wa.us/government/jail_roster_and_booking_information/index.php) page.

In [35]:
url = "https://www.co.adams.wa.us/jailrosterout.txt"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

# print(driver.page_source)

Adams_text=driver.find_element_by_xpath('/html/body/pre').text
JPAdams = Adams_text.count("Booking")
print('Jail Population = ',JPAdams)

#USE FOR ALL COMMITS

Adams = (todays_date, "Adams County Jail", "Adams County", "WA", JPAdams)
mycursor.execute(sqlFormula, Adams)
mydb.commit()

driver.close()

Jail Population =  18
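The Adams roster is a plain-text file, so a browser is not strictly required to read it. A sketch that fetches the file with `urllib.request` (already imported in this notebook) and counts the keyword exactly as the cell above does, assuming, as that cell does, that "Booking" appears once per person. The helper names and the `User-Agent` value are hypothetical choices; only the counting part runs here, since the fetch needs network access.

```python
import urllib.request

ADAMS_URL = "https://www.co.adams.wa.us/jailrosterout.txt"


def fetch_text(url, timeout=30):
    """Download a plain-text file and decode it (network access required)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def count_keyword(text, keyword="Booking"):
    """Count occurrences of the keyword, mirroring Adams_text.count("Booking")."""
    return text.count(keyword)


# Usage (not run here): JPAdams = count_keyword(fetch_text(ADAMS_URL))
sample = "Booking 1001 DOE, JOHN\nBooking 1002 ROE, JANE\n"
print(count_keyword(sample))  # → 2
```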


### Chelan

In [36]:
#  https://www.co.chelan.wa.us/regional-jail/inmate-list

### Cowlitz

In [37]:
url = "http://apps.co.cowlitz.wa.us/CCCD/Custody/default/Index.html"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

Cowlitz_text=driver.find_element_by_xpath('//*[@id="pcoded"]/div[2]/div/div/div/div/div/div/div[1]/div[1]/div[1]/a/div/div/h3').text
print(Cowlitz_text)

149


In [38]:
#USE FOR ALL COMMITS

Cowlitz = (todays_date, "Cowlitz County Jail", "Cowlitz County", "WA", int(Cowlitz_text))
mycursor.execute(sqlFormula, Cowlitz)
mydb.commit()

driver.close()
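Several counts in this notebook arrive as page text (the Cowlitz `<h3>`, the Ferry sentence) or are typed in by hand, and all of them go into an `INTEGER` column. A small guard, sketched here, converts the text to an integer and rejects implausible values before the insert; the function name and the upper bound are hypothetical choices, not thresholds from the project.

```python
def parse_population(raw, max_plausible=20000):
    """Turn scraped text like '149' or ' 1,149 ' into an int, or raise ValueError."""
    if isinstance(raw, int):
        value = raw
    else:
        cleaned = str(raw).strip().replace(",", "")
        if not cleaned.isdigit():
            raise ValueError(f"not a population count: {raw!r}")
        value = int(cleaned)
    # A negative or absurdly large count usually means the selector hit the wrong element
    if not 0 <= value <= max_plausible:
        raise ValueError(f"count out of range: {value}")
    return value


print(parse_population("149"))      # → 149
print(parse_population(" 1,149 "))  # → 1149
```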

### Franklin

In [39]:
# Franklin http://apps.co.cowlitz.wa.us/CCCD/Custody/default/Index.html
# TODO: this URL is the Cowlitz roster; replace it with the Franklin roster link
# PDF with no total count that I saw; inmates would need to be counted

### Kitsap

In [40]:
# Kitsap https://www.kitsapgov.com/sheriff/Pages/InCustody.aspx

### Kittitas

In [31]:
#  https://www.co.kittitas.wa.us/sheriff/roster.aspx
# (irregular, hard to count?)

### Lewis

In [32]:
#  https://jail.lewiscountywa.gov/

### Mason

In [33]:
#  https://so.co.mason.wa.us/documents/incustdy.pdf
# pdf, hard?

### Pierce

In [33]:
#  https://linxonline.co.pierce.wa.us/linxweb/Booking/GetJailRoster.cfm
# The first visit showed a login page; after going back and returning, the roster was visible without logging in

### Skagit

In [34]:
#  https://www.skagitcounty.net/Reporting/JailRoster/
# has the total #, easier?

### Skamania

In [35]:
#  (amazing name) http://skamaniasheriff.com/corrections/daily-population/

### Whatcom

In [36]:
#  https://apps1.whatcomcounty.us/jaildata/roster.html
# list, no summary #

### Yakima

In [37]:
#  http://www.yakimaco.us/inmatelookup/YcDocPublicIncarcerated.aspx
# hard one

### Benton

In [None]:
#  https://onedrive.live.com/?authkey=%21AJGFdAwyTKe%2DA7g&cid=A093CAACE69E1C91&id=A093CAACE69E1C91%214171&parId=root&o=OneUp

### Klickitat

In [39]:
#  https://www.klickitatcounty.org/DocumentCenter/View/1416/Booking-Roster-PDF
# pdf

### Stevens

In [40]:
#  https://sheriff.stevenscountywa.gov/jail/inmate-roster/
# list of pdfs

### Wahkiakum

In [41]:
#  http://jailviewer.co.wahkiakum.wa.us/Home/BookingSearchQuery?
# Search by name only, not a full roster; listed in case the population is visible on the back end

### Pacific

In [42]:
#  https://co.pacific.wa.us/sheriff/corrections/

### Thurston

In [43]:
#  https://www.co.thurston.wa.us/sheriff/bureau-corrections-roster-search.asp?mod=fourth

In [44]:
# WA Counties with No Website List
# Douglas
# Garfield
# Pend Oreille
# San Juan
# Asotin

# Export CSV

In [41]:
county_jail_df = pd.read_sql("SELECT DISTINCT * FROM county_jails", con=mydb)
county_jail_df.head()

| | reporting_jurisdictions | county_name | state_name | Date | jail_population |
|---|---|---|---|---|---|
| 0 | Whitman County Jail | Whitman County | WA | 2020-06-12 | 26 |
| 1 | Adams County Jail | Adams County | WA | 2020-06-12 | 22 |
| 2 | Clallam County Jail | Clallam County | WA | 2020-06-12 | 74 |
| 3 | Ferry County Corrections | Ferry County | WA | 2020-06-12 | 8 |
| 4 | Grays Harbor County Jail | Grays Harbor County | WA | 2020-06-12 | 172 |


In [42]:
county_jail_df.to_csv('County_Jail.csv', index=False)

In [43]:
county_jail_df.shape

(507, 5)
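For trajectory analysis, the long table (one row per jail per day) is usually reshaped into a panel with one row per jail and one column per date. A sketch with pandas on a small illustrative frame with the same columns as `county_jails`; dividing each jail's counts by its first observation puts small and large jails on a comparable scale. This indexing choice is an assumption for illustration, not the project's stated method.

```python
import pandas as pd

# Small illustrative frame with the same columns as the county_jails table
long_df = pd.DataFrame(
    {
        "reporting_jurisdictions": ["Adams County Jail", "Adams County Jail",
                                    "Ferry County Corrections", "Ferry County Corrections"],
        "county_name": ["Adams County", "Adams County", "Ferry County", "Ferry County"],
        "state_name": ["WA", "WA", "WA", "WA"],
        "Date": ["2020-06-12", "2020-09-14", "2020-06-12", "2020-09-14"],
        "jail_population": [22, 18, 8, 18],
    }
)

# Parse the VARCHAR dates and drop repeated jail-days before reshaping
long_df["Date"] = pd.to_datetime(long_df["Date"])
long_df = long_df.drop_duplicates(subset=["reporting_jurisdictions", "Date"])

# One row per jail, one column per date (columns come out sorted by date)
panel = long_df.pivot(index="reporting_jurisdictions", columns="Date", values="jail_population")

# Each jail's counts divided by its count on the first date;
# a jail with no first-date count gets NaN throughout
indexed = panel.div(panel.iloc[:, 0], axis=0)
print(indexed.round(2))
```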