# Introduction

**Incarceration & COVID-19: How Jails Respond to COVID**<br>

This project scrapes daily jail rosters to build a dataset for analyzing how jail populations have fluctuated in response to COVID-19. The research centers on explaining why county jails in different parts of the United States have responded differently to the pandemic over time.

A separate but related use of this dataset is to analyze the impact of pandemic-related jail population declines on local crime. The project uses daily jail population counts from the rosters as the focal variable, and the analysis uses group-based trajectory modeling. The scraped data will address gaps in the [Vera](https://github.com/vera-institute/jail-population-data) dataset.

We start by comparing Washington and New York because both states dealt with COVID-19 early in the pandemic. Below are the data points to collect so that the scraped data harmonizes with the Vera data:
- County Name
- State Name
- Daily Population Counts
- Reporting Jail Name
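
As a rough sketch, the data points above could be held as one record per jail per day. The field names below (`county_name`, `state_name`, `jail_name`, `roster_date`, `population`) are placeholders of our own, not the Vera column names, and the sample values are made up. `roster_date` is added on the assumption that each daily count needs a date attached.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class RosterRecord:
    """One daily population count for one reporting jail (hypothetical schema)."""
    county_name: str
    state_name: str
    jail_name: str
    roster_date: date   # assumed: the day the roster was scraped
    population: int     # daily population count


def to_row(record: RosterRecord) -> dict:
    """Flatten a record into a plain dict, e.g. for pandas.DataFrame(rows)."""
    row = asdict(record)
    row["roster_date"] = record.roster_date.isoformat()
    return row


# Illustrative values only, not real data
example = RosterRecord("Example County", "WA", "Example County Jail", date(2020, 6, 6), 0)
print(to_row(example))
```

Keeping one flat row per jail per day should make it straightforward to load the rows into a DataFrame and line them up against the Vera data later.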

# Imports

In [67]:
# Data analysis and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Standard library and HTTP utilities
import re
import os
import json
import time
import random
import requests
from os import system
from math import floor
from copy import deepcopy
from time import sleep
from random import randint

# Scraping libraries
from bs4 import BeautifulSoup

# Selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Scrapy with Selenium support
import scrapy
from scrapy_selenium import SeleniumRequest

# States

Check each jail's site for an API before resorting to scraping. Coverage will include NY, WA and FL.
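
One quick way to check for an API is to look at what a candidate endpoint returns: a JSON body suggests structured data that can be read directly instead of scraped. The helper below is a minimal sketch. `looks_like_json` is a hypothetical name, and it takes the `Content-Type` header and the body text (for example `response.headers.get('Content-Type', '')` and `response.text` from `requests`) so it can be tried without a network call.

```python
import json


def looks_like_json(content_type: str, body: str) -> bool:
    """Return True if a response appears to be JSON rather than an HTML page."""
    # Servers usually label JSON as application/json (possibly with a charset suffix)
    if "json" in content_type.lower():
        return True
    # Fall back to parsing the body, since some servers mislabel JSON
    try:
        json.loads(body)
    except ValueError:
        return False
    return True


print(looks_like_json("text/html; charset=UTF-8", "<!DOCTYPE html><html></html>"))
print(looks_like_json("application/json", '{"inmates": []}'))
```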

## Washington

### Whitman

In [69]:
# http://www.whitmancountyjail.org/
# Need to count unique IDs
# Check for an API
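
Counting unique IDs, rather than rows, guards against a person being listed on more than one roster row (for instance once per charge, which is an assumption about how this roster is laid out). A minimal sketch, assuming each scraped row is a dict with an ID field; the key name `inmate_id` and the sample rows are hypothetical:

```python
def count_unique_ids(rows, id_key="inmate_id"):
    """Count distinct, non-empty IDs across roster rows."""
    ids = {str(row[id_key]).strip() for row in rows if row.get(id_key)}
    return len(ids)


# Made-up rows: the first two share an ID
sample_rows = [
    {"inmate_id": "A100", "charge": "charge 1"},
    {"inmate_id": "A100", "charge": "charge 2"},
    {"inmate_id": "B200", "charge": "charge 1"},
]
print(count_unique_ids(sample_rows))  # prints 2
```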

### Spokane

#### Selenium

In [70]:
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}

In [71]:
url = "https://www.spokanecounty.org/352/Inmate-Roster"
driver = webdriver.Chrome('/Users/meaganrossi/Projects/Incarceration_COVID/chromedriver')
driver.get(url)

In [135]:
print(driver.page_source)
# Keep this cell: the printout is long, but the code should stay available for future use

<html lang="en" class=" js no-flexbox canvas canvastext webgl no-touch geolocation postmessage websqldatabase indexeddb hashchange history draganddrop websockets rgba hsla multiplebgs backgroundsize borderimage borderradius boxshadow textshadow opacity cssanimations csscolumns cssgradients cssreflections csstransforms csstransforms3d csstransitions fontface generatedcontent video audio localstorage sessionstorage webworkers applicationcache svg inlinesvg smil svgclippaths"><head>

	<meta http-equiv="Content-type" content="text/html; charset=UTF-8">

					<link rel="stylesheet" href="//fonts.googleapis.com/css?family=Bitter:700,italic,regular|Cabin:500,500italic,600,600italic,700,700italic,italic,regular|Montserrat:700,regular|Work+Sans:100,200,300,500,600,700,800,900,regular|" media="all"><script type="text/javascript" async="" defer="" src="//analytics.civicplus.com/piwik.js"></script><script type="text/javascript" async="" src="//siteimproveanalytics.com/js/siteanalyze_1303182.js"></

In [73]:
#<h1 style="margin-top:0px;">Saturday, June 6, 2020</h1>
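
If the roster date is taken from a heading like the one noted above, it can be turned into a proper date for the dataset. This is a sketch under two assumptions: that the heading text always follows the "Saturday, June 6, 2020" pattern, and that Python is running with an English locale. `parse_roster_date` is a hypothetical helper name.

```python
from datetime import datetime


def parse_roster_date(heading_text: str):
    """Parse a heading such as 'Saturday, June 6, 2020' into a date."""
    return datetime.strptime(heading_text.strip(), "%A, %B %d, %Y").date()


print(parse_roster_date("Saturday, June 6, 2020"))  # prints 2020-06-06
```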

In [114]:
form = driver.find_elements_by_name("text")
form

[]

In [98]:
# yield SeleniumRequest(url=url, callback=self.parse_result)

NameError: name 'info' is not defined

In [102]:
mc = driver.find_element_by_id("customHtmlb79d7d7c-5306-471f-a51e-aa7ff8e448af")
print(mc.text)




In [146]:
import scrapy
from scrapy_selenium import SeleniumRequest


class PrisonSpider(scrapy.Spider):
    name = 'Prison'

    def start_requests(self):
        # `script` expects JavaScript to run in the browser, not a resource URL,
        # so the WebResource.axd path previously passed here is omitted.
        yield SeleniumRequest(
            url='https://www.spokanecounty.org/352/Inmate-Roster',
            wait_time=3,
            callback=self.parse
        )

    def parse(self, response):
        # Placeholder until the roster selectors are known:
        # yield the text of every <h1> on the rendered page.
        for heading in response.xpath('//h1/text()').getall():
            yield {'heading': heading.strip()}

In [141]:
listy = driver.find_elements_by_css_selector('h1')
listy

[<selenium.webdriver.remote.webelement.WebElement (session="3812b17dcb0458dce9b0be3a80eadfbe", element="fe9f6545-2a6c-4e1c-a64d-75c46b2f08cf")>,
 <selenium.webdriver.remote.webelement.WebElement (session="3812b17dcb0458dce9b0be3a80eadfbe", element="5f2755dd-f3e6-4690-bef2-017f1d2415ef")>]

In [143]:
for x in listy[:50]:
    if len(x.text) > 0:
        print(x.text)

Inmate Roster


In [144]:
# A bare 'container' is read as a tag name and raised NoSuchElementException
# (Chrome 83.0.4103.97); '.container' selects by class instead.
lis = driver.find_element_by_css_selector('.container')


In [134]:
lis.get_attribute('innerHTML')

''

In [None]:
driver.close()

In [None]:
# Deliberate stop so that "Run All" halts before the HTML section
raise RuntimeError("STOP")

#### HTML

In [19]:
# Defining the url of the site
base_site = "https://www.spokanecounty.org/352/Inmate-Roster"

# Making a get request
response = requests.get(base_site)
response.status_code

200

In [20]:
# Extracting the HTML
html = response.content

# Check that the reply is HTML by inspecting its first 100 bytes
html[:100]

b'\r\n\r\n<!DOCTYPE html>\r\n<html lang="en">\r\n<head>\r\n\r\n\t<meta http-equiv="Content-type" content="text/html'

In [21]:
# Convert HTML to a BeautifulSoup object.
soup = BeautifulSoup(html, "html.parser")

In [22]:
# Exporting the HTML to a file
with open('Spokane_response.html', 'wb') as file:
    file.write(soup.prettify('utf-8'))

In [54]:
soup.find_all('title')

[<title>Inmate Roster | Spokane County, WA</title>,
 <title>Arrow Left</title>,
 <title>Arrow Right</title>,
 <title>Slideshow Left Arrow</title>,
 <title>Slideshow Right Arrow</title>]

In [55]:
soup.find_all('div')

AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

In [24]:
soup.find_all('div', class_='container')
# soup.find_all('li')

[]

In [52]:
mydivs = soup.find_all("div", {"class": "container"})
print(mydivs)

[]


### Okanogan

In [None]:
# https://okanogansheriff.org/
# (under Daily Jail Inmate Log)

### Jefferson

In [None]:
# https://co.jefferson.wa.us/174/Jail-Inmate-Search
# (To view the full inmate roster click the Clear button then the Search button.)

### Grant

In [None]:
# https://www.grantcountywa.gov/SHERIFF/Corrections/Inmate-Roster.htm

### Grays Harbor

In [None]:
# http://ghlea.com/JailRosters/GHCJRoster.html

### Ferry

In [None]:
# https://www.ferry-county.com/Courts%20and%20Law/Inmate%20Roster/Inmate_Roster_Page.html
# (in the section that says "MAY 11, 2020 - 8 inmates")
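
Where a page states the count in a line such as "MAY 11, 2020 - 8 inmates", the date and population can be pulled out with a regular expression once the text is in hand. A minimal sketch, assuming the line keeps that exact shape and that Python is running with an English locale; `COUNT_PATTERN` and `parse_count_line` are hypothetical names:

```python
import re
from datetime import datetime

# Matches text such as "MAY 11, 2020 - 8 inmates"
COUNT_PATTERN = re.compile(
    r"([A-Za-z]+ \d{1,2}, \d{4})\s*-\s*(\d+)\s+inmates?", re.IGNORECASE
)


def parse_count_line(text: str):
    """Return (date, population) from a line, or None if it does not match."""
    match = COUNT_PATTERN.search(text)
    if match is None:
        return None
    # Normalize "MAY" to "May" before parsing the month name
    roster_date = datetime.strptime(match.group(1).title(), "%B %d, %Y").date()
    return roster_date, int(match.group(2))


print(parse_count_line("MAY 11, 2020 - 8 inmates"))
```

Returning `None` on a non-match, rather than raising, makes it easy to log days when the page wording changes instead of silently recording a wrong count.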

### Clallam

In [None]:
# https://websrv23.clallam.net/NewWorld.InmateInquiry/WA0050000/

### Adams

In [None]:
# https://www.co.adams.wa.us/government/jail_roster_and_booking_information/index.php
# (View Jail Roster Information)