# Introduction

**Incarceration & COVID-19: How Jails Respond to COVID**

This project scrapes daily jail roster information to build a large dataset for analyzing how jail populations have fluctuated in response to COVID-19. The research centers on explaining why county jails in different parts of the United States have responded differently to the pandemic over time.

A separate but related use of this dataset is to analyze the impact of pandemic-related jail population declines on local crime. That analysis uses daily jail roster population counts as the focal variable and applies group-based trajectory modeling. Our scraped data will fill gaps in the [Vera](https://github.com/vera-institute/jail-population-data) dataset.

We start by comparing Washington and New York, two of the first states hit by COVID-19 at the onset of the pandemic. To harmonize with the Vera data, we collect the following data points:
- County Name
- State Name
- Daily Population Counts
- Reporting Jail Name
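
A minimal sketch of one harmonized record, assuming one row per jail per day. The class name `JailDailyCount` and the `report_date` field are hypothetical (the list above implies a date but does not name one), and the Vera dataset's actual column names should be checked before merging.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class JailDailyCount:
    """One jail's population on one day (hypothetical schema)."""
    county_name: str
    state_name: str
    jail_name: str
    report_date: date  # assumption: each count is tied to a scrape date
    population: int

    def __post_init__(self):
        # A negative headcount signals a scraping or parsing error
        if self.population < 0:
            raise ValueError("population must be non-negative")


# Usage with placeholder values (not real data)
record = JailDailyCount("Example", "WA", "Example County Jail", date(2020, 5, 11), 100)
print(asdict(record))
```

Freezing the dataclass keeps a scraped count from being edited after validation; `asdict` turns a list of records into rows that `pd.DataFrame` can load directly.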

# Imports

In [1]:
# Data-analysis and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Standard-library utilities and HTTP requests
import re
import os
import time
import random
import json
import requests
from os import system
from math import floor
from copy import deepcopy
from time import sleep
from random import randint

# HTML parsing
from bs4 import BeautifulSoup

# Selenium (browser automation for pages rendered with JavaScript)
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# States

Check each site for an API before scraping it. Coverage will include New York (NY), Washington (WA), and Florida (FL).
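
One quick way to tell whether an endpoint is an API is to look at what it returns: a JSON body suggests a machine-readable API, while HTML means the page has to be scraped. The helper below is a hypothetical sketch that classifies a response from its `Content-Type` header and body. It is a heuristic, not a guarantee, since some servers mislabel their content type.

```python
import json


def classify_response(content_type, body):
    """Return 'json', 'html', or 'other' for a fetched response (heuristic)."""
    ctype = (content_type or "").lower()
    if "json" in ctype:
        return "json"
    if "html" in ctype:
        return "html"
    # No useful header, so sniff the body instead
    try:
        parsed = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        parsed = None
    if isinstance(parsed, (dict, list)):
        return "json"
    if body.lstrip()[:20].lower().startswith((b"<!doctype html", b"<html")):
        return "html"
    return "other"


# The Spokane roster page's opening bytes, as printed later in this notebook
print(classify_response("", b'\r\n\r\n<!DOCTYPE html>\r\n<html lang="en">'))
```

With `requests`, the call would look like `classify_response(r.headers.get("Content-Type", ""), r.content)` for a response `r`.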

## Washington

### Whitman

In [53]:
# http://www.whitmancountyjail.org/
# Need to count unique IDs
# Check for an API
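
Counting unique IDs matters if a roster lists the same person on several rows (for example, one row per charge): the daily population is then the number of distinct IDs per scrape date, not the number of rows. A minimal pandas sketch, assuming hypothetical column names `scrape_date` and `inmate_id` and made-up illustrative rows:

```python
import pandas as pd

# Illustrative roster rows (made up); one person can appear more than once
roster = pd.DataFrame({
    "scrape_date": ["2020-05-11", "2020-05-11", "2020-05-11", "2020-05-12"],
    "inmate_id":   ["A1", "A1", "B2", "A1"],
})

# Daily population = number of distinct IDs seen on each scrape date
daily_population = roster.groupby("scrape_date")["inmate_id"].nunique()
print(daily_population)
```

Here `A1` appears twice on 2020-05-11 but is counted once, so that day's population is 2.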

### Spokane

#### Selenium

In [2]:
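# Fetch the roster page and parse it (the lxml parser must be installed)
response = requests.get('https://www.spokanecounty.org/352/Inmate-Roster')
bs = BeautifulSoup(response.content, 'lxml')
table = bs.table  # first <table> on the page, or None if there is none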
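
`bs.table` returns only the first `<table>` on the page, or `None` if the page has none. A hedged sketch of turning such a table into a list of row dictionaries keyed by the header cells; the helper name `table_to_records` is hypothetical, and the assumption that the first row holds the headers should be checked against each roster's markup.

```python
from bs4 import BeautifulSoup


def table_to_records(table):
    """Convert a BeautifulSoup <table> into a list of {header: cell} dicts."""
    if table is None:  # soup.table is None when the page has no <table>
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # Assumption: the first row contains the column headers
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        if cells:
            records.append(dict(zip(headers, cells)))
    return records


# Usage on a small inline snippet (placeholder values, not roster data)
sample = BeautifulSoup(
    "<table><tr><th>Name</th><th>Booked</th></tr>"
    "<tr><td>Doe, J</td><td>05/11/2020</td></tr></table>",
    "html.parser",
)
print(table_to_records(sample.table))
```

For well-formed tables, `pandas.read_html` is a shorter alternative that returns one DataFrame per table.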

In [8]:
bs.select('.container td')  # a CSS class selector needs a leading dot

[]

In [3]:
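from selenium.webdriver.chrome.service import Service

url = "https://www.spokanecounty.org/352/Inmate-Roster"
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver'))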

In [4]:
driver.get(url)
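
By default `driver.get` returns once the initial page load finishes, so content that the page fills in afterwards with JavaScript may not exist yet when the next cell runs. The `WebDriverWait`, `expected_conditions`, and `TimeoutException` imports at the top of the notebook support an explicit wait. A minimal sketch; the helper name `wait_for_elements` and the 10-second default are assumptions.

```python
def wait_for_elements(driver, css_selector, timeout=10):
    """Wait up to `timeout` seconds for elements matching `css_selector`.

    Returns the matching WebElements, or an empty list on timeout.
    """
    # Imported inside the function so this cell stands alone
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        # Polls find_elements until it returns a non-empty list
        return WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))
        )
    except TimeoutException:
        return []
```

For example, `wait_for_elements(driver, "table tr")` would wait for table rows; that selector is a placeholder until the roster markup has been inspected.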

In [9]:
listy = driver.find_elements(By.CSS_SELECTOR, 'h1')

In [10]:
for x in listy[:15]:
    if len(x.text) > 0:
        print(x.text)

Inmate Roster


In [66]:
lis = driver.find_elements(By.CSS_SELECTOR, 'div')
# find_elements returns a list of WebElements and print() returns None,
# so read .text from each element rather than from the list or print's result
for div in lis:
    if div.text:
        print(div.text)

In [None]:
STOP  # undefined name: deliberately raises NameError so "Run All" halts here

#### HTML

In [19]:
# Defining the url of the site
base_site = "https://www.spokanecounty.org/352/Inmate-Roster"

# Making a get request
response = requests.get(base_site)
response.status_code

200

In [20]:
# Extracting the HTML
html = response.content

# Confirm the response is HTML by inspecting its first 100 bytes
html[:100]

b'\r\n\r\n<!DOCTYPE html>\r\n<html lang="en">\r\n<head>\r\n\r\n\t<meta http-equiv="Content-type" content="text/html'

In [21]:
# Convert HTML to a BeautifulSoup object.
soup = BeautifulSoup(html, "html.parser")

In [22]:
# Exporting the HTML to a file
with open('Spokane_response.html', 'wb') as file:
    file.write(soup.prettify('utf-8'))

In [54]:
soup.find_all('title')

[<title>Inmate Roster | Spokane County, WA</title>,
 <title>Arrow Left</title>,
 <title>Arrow Right</title>,
 <title>Slideshow Left Arrow</title>,
 <title>Slideshow Right Arrow</title>]

In [55]:
# find_all returns a ResultSet (a list of tags), so read the text of each tag
# rather than calling .text on the ResultSet itself
[div.get_text(strip=True) for div in soup.find_all('div')][:10]

In [24]:
soup.find_all('div', class_='container')
# soup.find_all('li')

[]

In [52]:
mydivs = soup.find_all("div", {"class": "container"})
print(mydivs)

[]
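
Both searches return an empty list, which suggests the roster rows are not in the static HTML that `requests` downloads. Two common causes are content loaded by JavaScript (the case Selenium handles) and content embedded from another URL in an `<iframe>`. Which applies to this page has not been confirmed; the hypothetical helper below is a quick check for embedded frames.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iframe_sources(soup, base_url):
    """List absolute URLs of every <iframe src=...> in a parsed page."""
    return [urljoin(base_url, frame["src"]) for frame in soup.find_all("iframe", src=True)]


# Usage on an inline snippet with a placeholder path; on the real page,
# pass `soup` and `base_site` instead
snippet = BeautifulSoup('<div><iframe src="/roster/view"></iframe></div>', "html.parser")
print(iframe_sources(snippet, "https://www.spokanecounty.org/352/Inmate-Roster"))
```

If this returns URLs on the real page, requesting them directly may expose the roster table without a browser.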


### Okanogan

In [None]:
# https://okanogansheriff.org/
# (under Daily Jail Inmate Log)

### Jefferson

In [None]:
# https://co.jefferson.wa.us/174/Jail-Inmate-Search
# (To view the full inmate roster click the Clear button then the Search button.)
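
The Jefferson County note describes a two-click workflow: Clear, then Search. A hedged Selenium sketch of that sequence; the helper name `load_full_roster` is an assumption, and the CSS selectors are parameters because the real button and results selectors still need to be found by inspecting the page.

```python
def load_full_roster(driver, url, clear_selector, search_selector, results_selector, timeout=10):
    """Open the search page, click Clear, then Search, and return the resulting HTML."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    wait = WebDriverWait(driver, timeout)
    # Click Clear first so no leftover filters restrict the results
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, clear_selector))).click()
    # An empty search should then list the full roster, per the site's note
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, search_selector))).click()
    # Wait for the results to appear before reading the page source
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, results_selector)))
    return driver.page_source
```

The returned HTML can then go through `BeautifulSoup` as in the Spokane cells.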

### Grant

In [None]:
# https://www.grantcountywa.gov/SHERIFF/Corrections/Inmate-Roster.htm

### Grays Harbor

In [None]:
# http://ghlea.com/JailRosters/GHCJRoster.html

### Ferry

In [None]:
# https://www.ferry-county.com/Courts%20and%20Law/Inmate%20Roster/Inmate_Roster_Page.html
# (in the section that says "MAY 11, 2020 - 8 inmates")
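
The Ferry County note suggests the page reports the date and headcount together in a heading such as "MAY 11, 2020 - 8 inmates". If that format holds (an assumption based on the single example above), a regular expression can extract both values. The pattern and the helper name `parse_ferry_heading` are hypothetical.

```python
import re
from datetime import date, datetime

# Matches headings like "MAY 11, 2020 - 8 inmates" (hyphen or en dash)
HEADING_RE = re.compile(
    r"([A-Za-z]+ \d{1,2}, \d{4})\s*[-\u2013]\s*(\d+)\s+inmates?", re.IGNORECASE
)


def parse_ferry_heading(text):
    """Return (date, count) from a roster heading, or None if none is found."""
    match = HEADING_RE.search(text)
    if match is None:
        return None
    # .title() turns "MAY" into "May" so %B matches the English month name
    report_date = datetime.strptime(match.group(1).title(), "%B %d, %Y").date()
    return report_date, int(match.group(2))


print(parse_ferry_heading("MAY 11, 2020 - 8 inmates"))
```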

### Clallam

In [None]:
# https://websrv23.clallam.net/NewWorld.InmateInquiry/WA0050000/

### Adams

In [None]:
# https://www.co.adams.wa.us/government/jail_roster_and_booking_information/index.php
# (View Jail Roster Information)
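
Once several county URLs are known, a daily run can visit each one with a short random pause between requests, in the spirit of the `sleep` and `randint` imports at the top. A minimal sketch; `ROSTER_URLS` and `fetch_all` are hypothetical names, and `fetch` is any callable that takes a URL, passed in so the loop can be tested without network access.

```python
import random
import time

# Washington roster pages listed in this section
ROSTER_URLS = {
    "Whitman": "http://www.whitmancountyjail.org/",
    "Spokane": "https://www.spokanecounty.org/352/Inmate-Roster",
    "Okanogan": "https://okanogansheriff.org/",
    "Jefferson": "https://co.jefferson.wa.us/174/Jail-Inmate-Search",
    "Grant": "https://www.grantcountywa.gov/SHERIFF/Corrections/Inmate-Roster.htm",
    "Grays Harbor": "http://ghlea.com/JailRosters/GHCJRoster.html",
    "Ferry": "https://www.ferry-county.com/Courts%20and%20Law/Inmate%20Roster/Inmate_Roster_Page.html",
    "Clallam": "https://websrv23.clallam.net/NewWorld.InmateInquiry/WA0050000/",
    "Adams": "https://www.co.adams.wa.us/government/jail_roster_and_booking_information/index.php",
}


def fetch_all(urls, fetch, min_delay=2, max_delay=6, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random whole number of seconds between requests.

    Returns {name: result}; a failed fetch is stored as its exception so one
    unreachable site does not stop the whole run.
    """
    results = {}
    for i, (name, url) in enumerate(urls.items()):
        if i > 0:
            sleep(random.randint(min_delay, max_delay))
        try:
            results[name] = fetch(url)
        except Exception as exc:  # record the failure and move on
            results[name] = exc
    return results
```

For a real run, `fetch` could be `lambda u: requests.get(u, timeout=30)`; pages that need clicks or JavaScript would still go through Selenium.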