# Introduction

**Incarceration & COVID-19: How Jails Respond to COVID**<br>

This project scrapes daily jail roster information to create a large dataset. This dataset is designed to analyze how jail populations have fluctuated in response to COVID-19. Research centers on explaining why county jails in different parts of the United States have responded differently to the pandemic over time. 

A separate but related use of this dataset is to analyze the impact of pandemic-related jail population declines on local crime. The focal variable is the daily jail roster population count, and the analysis uses group-based trajectory modeling. The scraped data are intended to fill gaps in the [Vera](https://github.com/vera-institute/jail-population-data) dataset.

We start by comparing Washington and New York because both states faced COVID-19 outbreaks early in the pandemic. To harmonize with the Vera data, we collect the following data points:
- County Name
- State Name
- Daily Population Counts
- Reporting Jail Name
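
One way to keep scraped rows aligned with these four fields is a small record type. The sketch below is an assumption, not the project's actual schema: the class name `JailRecord` and the method `to_row` are hypothetical, and the field names mirror the column names used in the `jails` table in this notebook.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class JailRecord:
    """One daily observation for one jail (hypothetical container)."""
    reporting_jurisdictions: str  # reporting jail name
    county_name: str
    state_name: str
    date: date                    # day the roster was scraped
    jail_population: int          # daily population count

    def to_row(self) -> dict:
        """Return a plain dict with an ISO-formatted date."""
        row = asdict(self)
        row["date"] = self.date.isoformat()
        return row


# Example values based on the Whitman County output recorded in this notebook
record = JailRecord("Whitman County Jail", "Whitman", "WA", date(2020, 6, 7), 28)
print(record.to_row())
```

A list of such dicts can be passed directly to `pd.DataFrame(...)` to build the daily panel.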

# Imports

In [3]:
# Import standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# API libraries
import re
import os
import time
import random
import requests
from os import system   
from math import floor
from copy import deepcopy

# Scraping libraries
from bs4 import BeautifulSoup
from time import sleep
from random import randint
import json
# Selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


import sqlite3
import mysql.connector

In [None]:
# MySQL

In [9]:
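# Read the password from the environment instead of hard-coding it
# (the variable name MYSQL_PASSWORD is an assumption)
mydb = mysql.connector.connect(host='localhost',
                               user='root',
                               passwd=os.environ['MYSQL_PASSWORD'],
                               database='testdb')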
print(mydb)

<mysql.connector.connection.MySQLConnection object at 0x7faf04fbedd0>


In [22]:
mycursor = mydb.cursor()
mycursor.execute("CREATE DATABASE IF NOT EXISTS testdb")

In [23]:
# List the available databases to confirm that testdb exists
mycursor.execute("SHOW DATABASES")
for db in mycursor:
    print(db)

In [25]:
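# A database must be selected first (database='testdb' in connect() or "USE testdb");
# otherwise MySQL raises error 1046.
mycursor.execute("""
    CREATE TABLE jails (
        reporting_jurisdictions VARCHAR(100),
        county_name VARCHAR(100),
        state_name VARCHAR(100),
        Date VARCHAR(100),
        jail_population INT
    )
""")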



ProgrammingError: 1046 (3D000): No database selected

# States

Check each jail's site for an API before falling back to scraping. The project will include NY, WA, and FL.

## Washington

### Whitman

In [3]:
url = "http://www.whitmancountyjail.org/"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

In [None]:
# Collect every <h4> element on the page
listy = driver.find_elements(By.CSS_SELECTOR, 'h4')

#view full list
# for x in listy[:50]:
#     if len(x.text) > 0:
#         print(x.text)

In [None]:
location = driver.find_element(By.XPATH, '//*[@id="form1"]/footer').text
location

In [5]:
from datetime import datetime  # current date and time on the local system

In [6]:
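# Fields are sliced from the footer text by fixed character positions
JWhitman = location[:19]                        # jail name
CWhitman = location[:7]                         # county name
SWhitman = location[47:49]                      # state abbreviation
DWhitman = datetime.now().strftime('%Y-%m-%d')  # scrape date
PWhitman = len(listy) - 10                      # h4 count minus 10; the offset presumably covers non-inmate headings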

print('reporting_jurisdictions = ',JWhitman)
print('county_name = ',CWhitman)
print('state_name = ',SWhitman)
print('Date = ',DWhitman)
print('jail_population = ',PWhitman)

Jail =  Whitman County Jail
County =  Whitman
State =   W
Date =  2020-06-07
Population =  28


In [2]:
# Define pipeline

# conn = sqlite3.connect('new.db')
# curr = conn.cursor()

# curr.execute("""create table(\
# date DWhitman,\
# jail_population PWhitman'\
# county_name CWhitman,\
# state_name SWhitman,\
# reporting_jurisdictions JWhitman\
# )""")
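
The commented draft above puts Python variable names where SQL column types belong, so it would not run as written. A minimal sketch of the same pipeline idea, assuming SQLite and the column names from the `jails` table; the function name `save_count` is hypothetical and the database is in-memory for illustration:

```python
import sqlite3


def save_count(conn, jail, county, state, date_str, population):
    """Create the jails table if needed and append one daily count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jails (
               reporting_jurisdictions TEXT,
               county_name TEXT,
               state_name TEXT,
               date TEXT,
               jail_population INTEGER
           )"""
    )
    # Placeholders (?) let sqlite3 handle quoting of the values
    conn.execute(
        "INSERT INTO jails VALUES (?, ?, ?, ?, ?)",
        (jail, county, state, date_str, population),
    )
    conn.commit()


# In-memory database for illustration; a file path such as 'new.db' would persist the rows
conn = sqlite3.connect(":memory:")
save_count(conn, "Whitman County Jail", "Whitman", "WA", "2020-06-07", 28)
rows = conn.execute("SELECT county_name, jail_population FROM jails").fetchall()
print(rows)
```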

In [None]:
driver.close()

### Spokane

#### Selenium

In [7]:
# url = "https://www.spokanecounty.org/352/Inmate-Roster"
# driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
# driver.implicitly_wait(3)
# driver.get(url)

# print(driver.page_source)

In [9]:
# Outer HTML: <h1 style="margin-top:0px;">Saturday, June 6, 2020</h1>

# date = driver.find_elements(By.XPATH, '//*[@id="aspnetForm"]/div[3]/h1')
# len(date)
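
The roster header shown above carries the date as text such as "Saturday, June 6, 2020". A small sketch of converting it to the `YYYY-MM-DD` format used elsewhere in this notebook; the function name `parse_roster_date` is hypothetical, and the format string assumes English month and weekday names:

```python
from datetime import datetime


def parse_roster_date(header_text):
    """Convert a header like 'Saturday, June 6, 2020' to 'YYYY-MM-DD'."""
    parsed = datetime.strptime(header_text.strip(), "%A, %B %d, %Y")
    return parsed.strftime("%Y-%m-%d")


print(parse_roster_date("Saturday, June 6, 2020"))
```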

In [10]:
# class PrisonSpider(scrapy.Spider):
#     name = 'Prison'

# #     def remove_characters(self, value):
# #         return value.strip('\xa0')
    
#     def start_requests(self):
#         yield SeleniumRequest(
#             url='https://www.spokanecounty.org/352/Inmate-Roster',
#             wait_time=3,
#             callback=self.parse
#         )

#     def parse(self, response):
#         products = response.xpath("//*")
#         for product in products:
#             yield {
#                 'Date': product.xpath('//*[@id="aspnetForm"]/div[3]/h1').get(),
#                 'County': product.xpath("/html/head/title").get()\
# #                 'State'
# #                 'Pop_Count'
# #                 'Jail'
#             }

In [11]:
driver.close()

### Okanogan

Details can be found in the Daily Jail Inmate Log on the [Okanogan Sheriff website](https://okanogansheriff.org/).

In [13]:
url = "https://okanogansheriff.org/"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

# print(driver.page_source)

In [14]:
# Need to read from a pdf name=CD6F9816E7949144F43AB92B6CCADAA8

# location = driver.find_element_by_xpath('/html/body/embed').text
# location

In [None]:
# print('reporting_jurisdictions = ',JWhitman)
# print('county_name = ',CWhitman)
# print('state_name = ',SWhitman)
# print('Date = ',DWhitman)
# print('jail_population = ',PWhitman)

In [15]:
driver.close()

### Jefferson

[Jefferson](https://co.jefferson.wa.us/174/Jail-Inmate-Search)<br> To view the full inmate roster click the Clear button then the Search button.

In [16]:
url = "https://co.jefferson.wa.us/174/Jail-Inmate-Search"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

# print(driver.page_source)

In [21]:
#Hidden input type

# inmate = driver.find_element_by_xpath('//*[@id="Inmate_Index"]').text
# inmate

inmate = driver.find_elements(By.NAME, 'Name')
print(len(inmate))

0


In [None]:
# print('reporting_jurisdictions = ',JWhitman)
# print('county_name = ',CWhitman)
# print('state_name = ',SWhitman)
# print('Date = ',DWhitman)
# print('jail_population = ',PWhitman)

driver.close()

### Grant

[Grant](https://www.grantcountywa.gov/SHERIFF/Corrections/Inmate-Roster.htm) posts the roster as a daily PDF.

In [23]:
url = "https://www.grantcountywa.gov/SHERIFF/Corrections/Inmate-Roster.htm"
driver = webdriver.Chrome('/Users/meaganlazer/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

print(driver.page_source)

In [None]:
# print('reporting_jurisdictions = ',JWhitman)
# print('county_name = ',CWhitman)
# print('state_name = ',SWhitman)
# print('Date = ',DWhitman)
# print('jail_population = ',PWhitman)

driver.close()

### Grays Harbor

[Grays Harbor](http://ghlea.com/JailRosters/GHCJRoster.html)

In [23]:
url = "http://ghlea.com/JailRosters/GHCJRoster.html"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

# print(driver.page_source)

<html lang="en"><head>
<meta http-equiv="Content-Type" content="text/html">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta charset="utf-8">
<meta http-equiv="refresh" content="1800">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Grays Harbor County, Cities of Aberdeen and City of Hoquiam Washington Jail Rosters, Bookings and Releases.">
<meta name="keywords" content="Grays Harbor County, Aberdeen, Hoquiam, Washington, County, City, County Jail, City Jail, Jail Roster, Roster, Jail Booking, Jail Releases, Sheriff, Grays Harbor County Jail">
<meta name="author" content="Grays Harbor County - Central Services">
<link rel="icon" href="/images/favicon.ico">
<title>Grays Harbor County Jail Roster</title>
<meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expir

In [26]:
# Each row in the body of the main table is one roster entry
inmate = driver.find_elements(By.XPATH, '//*[@id="main-table"]/tbody/tr')
print(len(inmate))

166
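
The same row count can be taken from saved page source without a live browser session. The sketch below uses only the standard library; the class name `RowCounter` and the sample HTML are hypothetical, and it assumes the markup contains an explicit `<tbody>`, as `driver.page_source` normally does but raw server HTML may not.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Count <tr> elements inside the <tbody> of one table, identified by its id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_tbody = False
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif tag == "tbody" and self.in_table:
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag == "table":
            self.in_table = False


# Made-up sample markup; header rows in <thead> are not counted
sample = """
<table id="main-table">
  <thead><tr><th>Name</th></tr></thead>
  <tbody>
    <tr><td>Inmate A</td></tr>
    <tr><td>Inmate B</td></tr>
  </tbody>
</table>
"""
counter = RowCounter("main-table")
counter.feed(sample)
print(counter.rows)
```

Nested tables are not handled; this is enough for a flat roster table.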


In [27]:
# print('reporting_jurisdictions = ',JWhitman)
# print('county_name = ',CWhitman)
# print('state_name = ',SWhitman)
# print('Date = ',DWhitman)
# print('jail_population = ',PWhitman)

driver.close()

### Ferry

[Ferry](https://www.ferry-county.com/Courts%20and%20Law/Inmate%20Roster/Inmate_Roster_Page.html): the count appears in the section that reads "MAY 11, 2020 - 8 inmates".

In [28]:
url = "https://www.ferry-county.com/Courts%20and%20Law/Inmate%20Roster/Inmate_Roster_Page.html"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

# print(driver.page_source)

In [39]:
Finmate = driver.find_element(By.XPATH, '//*[@id="mainContent3"]/p[9]').text
# Single character at index 16; only valid while the count is one digit
Finmate[16]

'8'
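
Indexing a single character stops working once the count reaches two digits or the date string changes length. A hedged alternative is to match the number that precedes the word "inmates"; the function name `parse_inmate_count` is hypothetical, and the second test string is made up to show a two-digit count.

```python
import re


def parse_inmate_count(text):
    """Return the integer that precedes the word 'inmate(s)', or None if absent."""
    match = re.search(r"(\d+)\s+inmates?", text, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


print(parse_inmate_count("MAY 11, 2020 - 8 inmates"))
print(parse_inmate_count("JUNE 7, 2020 - 12 inmates"))  # made-up example
```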

In [40]:
# print('reporting_jurisdictions = ',JWhitman)
# print('county_name = ',CWhitman)
# print('state_name = ',SWhitman)
# print('Date = ',DWhitman)
# print('jail_population = ',PWhitman)

driver.close()

### Clallam

[Clallam](https://websrv23.clallam.net/NewWorld.InmateInquiry/WA0050000/)

In [41]:
url = "https://websrv23.clallam.net/NewWorld.InmateInquiry/WA0050000/"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver') 
driver.implicitly_wait(3)
driver.get(url)

# print(driver.page_source)

In [42]:
# print('reporting_jurisdictions = ',JWhitman)
# print('county_name = ',CWhitman)
# print('state_name = ',SWhitman)
# print('Date = ',DWhitman)
# print('jail_population = ',PWhitman)

driver.close()

### Adams

[View](https://www.co.adams.wa.us/government/jail_roster_and_booking_information/index.php) Jail Roster Information

In [31]:
url = "https://www.co.adams.wa.us/government/jail_roster_and_booking_information/index.php"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

# print(driver.page_source)

In [None]:
# print('reporting_jurisdictions = ',JWhitman)
# print('county_name = ',CWhitman)
# print('state_name = ',SWhitman)
# print('Date = ',DWhitman)
# print('jail_population = ',PWhitman)

driver.close()