# Introduction

**Incarceration & COVID-19: How Jails Respond to COVID**<br>

This project scrapes daily jail rosters to build a dataset for analyzing how jail populations have fluctuated in response to COVID-19. The research centers on explaining why county jails in different parts of the United States have responded differently to the pandemic over time.

A separate but related use of this dataset is analyzing the impact of pandemic-related jail population declines on local crime. The focal variable is the daily jail roster population count, and the analysis uses group-based trajectory modeling. Our scraped data will address gaps in the [Vera](https://github.com/vera-institute/jail-population-data) dataset.

We start by comparing Washington and New York because both states dealt with COVID-19 at the early onset of the pandemic. Below is the list of data points to collect so that the scraped data can be harmonized with the Vera data.
- County Name
- State Name
- Daily Population Counts
- Reporting Jail Name
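
A minimal sketch of how the fields above, plus a date stamp for each daily count, could be held in one record and appended to a local SQLite table. The names `JailCount`, `jail_counts`, `init_db`, and `save_count` are hypothetical, the population numbers are made up, and the layout is an assumption rather than the Vera schema.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class JailCount:
    """One daily observation for one jail (hypothetical record layout)."""
    county: str
    state: str
    date: str        # ISO date, e.g. "2020-06-06"
    population: int
    jail: str


def init_db(conn: sqlite3.Connection) -> None:
    # One row per jail per day; the primary key prevents duplicate days.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS jail_counts (
            county TEXT NOT NULL,
            state TEXT NOT NULL,
            date TEXT NOT NULL,
            population INTEGER NOT NULL,
            jail TEXT NOT NULL,
            PRIMARY KEY (jail, state, date)
        )
        """
    )


def save_count(conn: sqlite3.Connection, record: JailCount) -> None:
    # INSERT OR REPLACE overwrites the row if the same jail/day is scraped twice.
    conn.execute(
        "INSERT OR REPLACE INTO jail_counts "
        "(county, state, date, population, jail) VALUES (?, ?, ?, ?, ?)",
        astuple(record),
    )
    conn.commit()


# Example with made-up numbers, stored in an in-memory database.
conn = sqlite3.connect(":memory:")
init_db(conn)
save_count(conn, JailCount("Whitman", "WA", "2020-06-06", 30, "Whitman County Jail"))
save_count(conn, JailCount("Whitman", "WA", "2020-06-06", 31, "Whitman County Jail"))
rows = conn.execute(
    "SELECT county, state, date, population FROM jail_counts"
).fetchall()
print(rows)
```

Keying on jail, state, and date means a second scrape on the same day replaces the earlier count instead of adding a duplicate row.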

# Imports

In [1]:
# Import standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# API libraries
import re
import os
import time
import random
import requests
from os import system   
from math import floor
from copy import deepcopy

# Scraping libraries
from bs4 import BeautifulSoup
from time import sleep
from random import randint
import json
# Selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Scrapy
import scrapy
from scrapy_selenium import SeleniumRequest

# Storage
import sqlite3

# States

Check each jail for an API before falling back to scraping. Coverage will include NY, WA, and FL.

## Washington

### Whitman

In [2]:
url = "http://www.whitmancountyjail.org/"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

In [3]:
# Roster entries appear as <h4> headings on the page
listy = driver.find_elements(By.CSS_SELECTOR, 'h4')

# Print the non-empty headings
for x in listy[:50]:
    if len(x.text) > 0:
        print(x.text)

COVID-19 Alerts
Visiting Closed
Work Release Shut-Down
ARLAND, ANDREW MICHAEL (34 years old) -- Inmate Number: 1569
CAMERON, ZANNAN ANTHONY (24 years old) -- Inmate Number: 465632
CARGILL, JOSHUA O'HARA (43 years old) -- Inmate Number: 459693
CRUZ BARRERA, JASON REY (22 years old) -- Inmate Number: 417523
DICKERSON, PATRICIA LYNN (36 years old) -- Inmate Number: 424584
GAUTHIER, JEREMY VICTOR (31 years old) -- Inmate Number: 466484
HAYES, PHILLIP ALAN (59 years old) -- Inmate Number: 16806
HINOJOS, ALEX JOSEPH DAVID (31 years old) -- Inmate Number: 80192
JOHNSON, EVERETT CLARK (65 years old) -- Inmate Number: 19821
MARCHINI, TIMOTHY PAUL (61 years old) -- Inmate Number: 458478
MARTINEZ GARZA, ZACARIAS (24 years old) -- Inmate Number: 463422
MEAD, MARVIN HOWARD (53 years old) -- Inmate Number: 156118
NANCE, KYLE BRANDT (23 years old) -- Inmate Number: 271494
NANIK, DANIEL DOUGLAS (36 years old) -- Inmate Number: 142629
PEARSON, DUSTIN DUANE (33 years old) -- Inmate Number: 133850
PEDERS

In [4]:
location = driver.find_element(By.XPATH, '//*[@id="form1"]/footer').text
location

"Whitman County Jail, 411 N. Mill Street, Colfax WA 99111.\nThis Website was created and is maintained by Lobo Public Safety Software.\nContact: web@lobopublicsafetysoftware.com.\nThis site is served by Lobo Webhosting.\n\nWhitman County Jail - Inmate Listing Website v1.1.5.0524 Copyright © 2020\nWhitman County Sheriff's Office | Disclaimer"

In [5]:
from datetime import datetime  # used to stamp each scrape with the current local date

In [1]:
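# The footer begins "Whitman County Jail, 411 N. Mill Street, Colfax WA 99111."
JWhitman = location[:19]      # jail name: "Whitman County Jail"
CWhitman = location[:7]       # county: "Whitman"
SWhitman = location[48:50]    # state: "WA"
DWhitman = datetime.now().strftime('%Y-%m-%d')  # scrape date
# Subtract the <h4> headings that are not inmates (alerts and, presumably, blanks);
# the 10 is hard-coded for this page
PWhitman = len(listy) - 10

print('Jail = ', JWhitman)
print('County = ', CWhitman)
print('State = ', SWhitman)
print('Date = ', DWhitman)
print('Population = ', PWhitman)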
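
The Whitman cell relies on fixed slice positions and a hard-coded offset of 10, both of which break if the page layout shifts. Below is a hedged sketch of a pattern-based alternative: it counts only headings shaped like roster entries and pulls the jail, county, and state out of the first footer line. The function names `count_inmates` and `parse_footer` are hypothetical, the sample inmate lines are made up, and the patterns assume the heading and footer formats shown in the outputs above.

```python
import re

# Roster headings look like "LAST, FIRST MIDDLE (34 years old) -- Inmate Number: 1569"
ROSTER_RE = re.compile(
    r"^(?P<name>.+?) \((?P<age>\d+) years old\) -- Inmate Number: (?P<number>\d+)$"
)

# First footer line looks like "Whitman County Jail, 411 N. Mill Street, Colfax WA 99111."
FOOTER_RE = re.compile(
    r"^(?P<jail>(?P<county>.+?) County Jail), .*\b(?P<state>[A-Z]{2}) \d{5}\.?$"
)


def count_inmates(headings):
    """Count headings that look like inmate entries; alerts and blanks are skipped."""
    return sum(1 for text in headings if ROSTER_RE.match(text.strip()))


def parse_footer(footer_text):
    """Return (jail, county, state) from the first line of the footer text."""
    first_line = footer_text.splitlines()[0]
    match = FOOTER_RE.match(first_line)
    if match is None:
        raise ValueError(f"Unrecognized footer: {first_line!r}")
    return match["jail"], match["county"], match["state"]


# Made-up headings in the same shape as the page output
sample_headings = [
    "COVID-19 Alerts",
    "Visiting Closed",
    "",
    "DOE, JANE ANN (34 years old) -- Inmate Number: 1234",
    "ROE, RICHARD LEE (59 years old) -- Inmate Number: 56789",
]
sample_footer = (
    "Whitman County Jail, 411 N. Mill Street, Colfax WA 99111.\n"
    "This Website was created and is maintained by Lobo Public Safety Software."
)

print(count_inmates(sample_headings))
print(parse_footer(sample_footer))
```

With the Selenium objects from the cells above, the inputs would be `[h.text for h in listy]` and `location`.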

### Spokane

#### Selenium

In [7]:
# url = "https://www.spokanecounty.org/352/Inmate-Roster"
# driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
# driver.implicitly_wait(3)
# driver.get(url)

# print(driver.page_source)

In [8]:

# keep this cell, has a long print out but want the code available for future use

In [9]:
# Outer HTML: <h1 style="margin-top:0px;">Saturday, June 6, 2020</h1>

# date = driver.find_elements_by_xpath('//*[@id="aspnetForm"]/div[3]/h1')
# len(date)
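
Once the `<h1>` text is in hand, it can be converted to the same `YYYY-MM-DD` format used for the Whitman date. A small sketch, assuming the heading always follows the "Saturday, June 6, 2020" pattern and that the locale uses English day and month names; `parse_roster_date` is a hypothetical helper name.

```python
from datetime import datetime


def parse_roster_date(heading_text):
    """Convert a heading like 'Saturday, June 6, 2020' to an ISO date string."""
    parsed = datetime.strptime(heading_text.strip(), "%A, %B %d, %Y")
    return parsed.date().isoformat()


print(parse_roster_date("Saturday, June 6, 2020"))  # 2020-06-06
```

Using the date printed on the roster, rather than `datetime.now()`, records the day the jail reported instead of the day the scraper ran.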

In [10]:
# class PrisonSpider(scrapy.Spider):
#     name = 'Prison'

# #     def remove_characters(self, value):
# #         return value.strip('\xa0')
    
#     def start_requests(self):
#         yield SeleniumRequest(
#             url='https://www.spokanecounty.org/352/Inmate-Roster',
#             wait_time=3,
#             callback=self.parse
#         )

#     def parse(self, response):
#         products = response.xpath("//*")
#         for product in products:
#             yield {
#                 'Date': product.xpath('//*[@id="aspnetForm"]/div[3]/h1').get(),
#                 'County': product.xpath("/html/head/title").get(),
# #                 'State'
# #                 'Pop_Count'
# #                 'Jail'
#             }

In [11]:
driver.close()

#### HTML

In [None]:
# # Defining the url of the site
# base_site = "https://www.spokanecounty.org/352/Inmate-Roster"

# # Making a get request
# response = requests.get(base_site)
# response.status_code

In [None]:
# # Extracting the HTML
# html = response.content

# # Checking that the reply is indeed HTML by inspecting the first 100 bytes
# html[:100]

In [None]:
# # Convert HTML to a BeautifulSoup object.
# soup = BeautifulSoup(html, "html.parser")

In [None]:
# # Exporting the HTML to a file
# with open('Spokane_response.html', 'wb') as file:
#     file.write(soup.prettify('utf-8'))

In [None]:
# soup.find_all('title')

In [None]:
# soup.find_all('div', class_ = 'container')
# # soup.find_all('li')

In [None]:
# mydivs = soup.find_all("div", {"class": "container"})
# print(mydivs)
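
The commented cells above look at the page `<title>` and the `div` elements with class `container`. For trying the same inspection offline, here is a sketch that swaps BeautifulSoup for the standard library's `html.parser`. The class name `RosterPageInspector` is hypothetical and `sample_html` is a made-up snippet, not the real Spokane page.

```python
from html.parser import HTMLParser


class RosterPageInspector(HTMLParser):
    """Collect the <title> text and count <div> elements with class 'container'."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.container_divs = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "div":
            # A class attribute may hold several space-separated class names
            classes = (dict(attrs).get("class") or "").split()
            if "container" in classes:
                self.container_divs += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


# Made-up HTML standing in for a saved response
sample_html = """
<html><head><title>Inmate Roster</title></head>
<body><div class="container main"><h1>Saturday, June 6, 2020</h1></div>
<div class="footer"></div></body></html>
"""
inspector = RosterPageInspector()
inspector.feed(sample_html)
print(inspector.title, inspector.container_divs)
```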

### Okanogan

Details can be found in the Daily Jail Inmate Log on the [Okanogan Sheriff website](https://okanogansheriff.org/).

In [2]:
url = "https://okanogansheriff.org/"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.implicitly_wait(3)
driver.get(url)

print(driver.page_source)

NameError: name 'webdriver' is not defined

### Jefferson

In [None]:
# https://co.jefferson.wa.us/174/Jail-Inmate-Search
# (To view the full inmate roster click the Clear button then the Search button.)

In [None]:
# url = "https://co.jefferson.wa.us/174/Jail-Inmate-Search"
# driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
# driver.implicitly_wait(3)
# driver.get(url)

# print(driver.page_source)

### Grant

In [None]:
# https://www.grantcountywa.gov/SHERIFF/Corrections/Inmate-Roster.htm

In [None]:
# url = "https://www.grantcountywa.gov/SHERIFF/Corrections/Inmate-Roster.htm"
# driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
# driver.implicitly_wait(3)
# driver.get(url)

# print(driver.page_source)

### Grays Harbor

In [None]:
# http://ghlea.com/JailRosters/GHCJRoster.html

In [None]:
# url = "http://ghlea.com/JailRosters/GHCJRoster.html"
# driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
# driver.implicitly_wait(3)
# driver.get(url)

# print(driver.page_source)

### Ferry

In [None]:
# https://www.ferry-county.com/Courts%20and%20Law/Inmate%20Roster/Inmate_Roster_Page.html
# (in the section that says "MAY 11, 2020 - 8 inmates")
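
Because the Ferry page states the count directly in a heading, the date and population can be read from that text without counting roster rows. A sketch, assuming the heading keeps the "MAY 11, 2020 - 8 inmates" shape with an English month name; `parse_count_heading` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Matches text such as "MAY 11, 2020 - 8 inmates"
HEADING_RE = re.compile(
    r"(?P<date>[A-Za-z]+ \d{1,2}, \d{4})\s*-\s*(?P<count>\d+) inmates?"
)


def parse_count_heading(text):
    """Return (ISO date, inmate count) from a heading like 'MAY 11, 2020 - 8 inmates'."""
    match = HEADING_RE.search(text)
    if match is None:
        raise ValueError(f"Unrecognized heading: {text!r}")
    # Normalize "MAY" to "May" before parsing the month name
    day = datetime.strptime(match["date"].title(), "%B %d, %Y").date()
    return day.isoformat(), int(match["count"])


print(parse_count_heading("MAY 11, 2020 - 8 inmates"))  # ('2020-05-11', 8)
```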

In [None]:
# url = "https://www.ferry-county.com/Courts%20and%20Law/Inmate%20Roster/Inmate_Roster_Page.html"
# driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
# driver.implicitly_wait(3)
# driver.get(url)

# print(driver.page_source)

### Clallam

In [None]:
# https://websrv23.clallam.net/NewWorld.InmateInquiry/WA0050000/

In [None]:
# url = "https://websrv23.clallam.net/NewWorld.InmateInquiry/WA0050000/"
# driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
# driver.implicitly_wait(3)
# driver.get(url)

# print(driver.page_source)

### Adams

In [None]:
# https://www.co.adams.wa.us/government/jail_roster_and_booking_information/index.php
# (View Jail Roster Information)

In [None]:
# url = "https://www.co.adams.wa.us/government/jail_roster_and_booking_information/index.php"
# driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
# driver.implicitly_wait(3)
# driver.get(url)

# print(driver.page_source)