# Eviction Data Set
## KEY: Eviction1
## Source: Princeton University Eviction Lab

Required citation text: This research uses data from The Eviction Lab at Princeton University, a project directed by Matthew Desmond and designed by Ashley Gromis, Lavar Edmonds, James Hendrickson, Katie Krywokulski, Lillian Leung, and Adam Porton. The Eviction Lab is funded by the JPB, Gates, and Ford Foundations as well as the Chan Zuckerberg Initiative. More information is found at evictionlab.org.

Source: https://data-downloads.evictionlab.org/
Additional documentation: https://evictionlab.org/methods/ and the Methodology Report (stored in the DataSetDiscovery folder).

There are two sets of data in this document: one for cities (*va_eviction_cities.csv*) and one for the full state (*va_eviction_state.csv*). 

The first file I read in was va_eviction_data.csv. It combines multiple datasets and proved difficult to wrangle when I tested it. I suggest ingesting other Eviction Lab datasets (e.g., state-level data) separately via additional downloads.


In [2]:
# Import needed libraries (namespaced import avoids numpy names shadowing Python built-ins)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


In [3]:
# First load the cities data
dfEvictCit = pd.read_csv(r'../DataSet/va_eviction_cities.csv', header=0, encoding="ISO-8859-1")

In [4]:
# view first 5 results
dfEvictCit.head()

Unnamed: 0,GEOID,year,name,parent-location,population,poverty-rate,renter-occupied-households,pct-renter-occupied,median-gross-rent,median-household-income,...,pct-nh-pi,pct-multiple,pct-other,eviction-filings,evictions,eviction-rate,eviction-filing-rate,low-flag,imputed,subbed
0,5100148,2000,Abingdon,Virginia,7780.0,10.06,1398.75,41.94,440.0,30976.0,...,0.01,0.58,0.05,29.86,28.86,2.06,2.13,0,0,0
1,5100148,2001,Abingdon,Virginia,7780.0,10.06,1438.06,41.94,440.0,30976.0,...,0.01,0.58,0.05,38.5,33.8,2.35,2.68,0,0,0
2,5100148,2002,Abingdon,Virginia,7780.0,10.06,1474.88,41.94,440.0,30976.0,...,0.01,0.58,0.05,61.25,47.14,3.2,4.15,0,0,0
3,5100148,2003,Abingdon,Virginia,7780.0,10.06,1514.18,41.94,440.0,30976.0,...,0.01,0.58,0.05,53.0,31.0,2.05,3.5,0,0,0
4,5100148,2004,Abingdon,Virginia,7780.0,10.06,1552.0,41.94,440.0,30976.0,...,0.01,0.58,0.05,84.2,57.86,3.73,5.43,0,0,0


In [5]:
# Show the last 5 rows to check that the data loaded as expected
dfEvictCit.tail(5)

Unnamed: 0,GEOID,year,name,parent-location,population,poverty-rate,renter-occupied-households,pct-renter-occupied,median-gross-rent,median-household-income,...,pct-nh-pi,pct-multiple,pct-other,eviction-filings,evictions,eviction-rate,eviction-filing-rate,low-flag,imputed,subbed
10118,5188240,2012,Yorktown,Virginia,130.0,0.0,46.2,51.06,1000.0,62750.0,...,0.0,11.54,0.0,0.7,0.7,1.52,1.52,0,0,0
10119,5188240,2013,Yorktown,Virginia,130.0,0.0,46.9,51.06,1000.0,62750.0,...,0.0,11.54,0.0,0.7,0.0,0.0,1.49,0,0,0
10120,5188240,2014,Yorktown,Virginia,130.0,0.0,47.6,51.06,1000.0,62750.0,...,0.0,11.54,0.0,2.8,0.0,0.0,5.88,1,0,0
10121,5188240,2015,Yorktown,Virginia,130.0,0.0,49.0,51.06,1000.0,62750.0,...,0.0,11.54,0.0,2.8,0.0,0.0,5.71,1,0,0
10122,5188240,2016,Yorktown,Virginia,130.0,0.0,49.7,51.06,1000.0,62750.0,...,0.0,11.54,0.0,2.8,0.7,1.41,5.63,1,0,0


In [6]:
# Check column data types
dfEvictCit.dtypes

GEOID                           int64
year                            int64
name                           object
parent-location                object
population                    float64
poverty-rate                  float64
renter-occupied-households    float64
pct-renter-occupied           float64
median-gross-rent             float64
median-household-income       float64
median-property-value         float64
rent-burden                   float64
pct-white                     float64
pct-af-am                     float64
pct-hispanic                  float64
pct-am-ind                    float64
pct-asian                     float64
pct-nh-pi                     float64
pct-multiple                  float64
pct-other                     float64
eviction-filings              float64
evictions                     float64
eviction-rate                 float64
eviction-filing-rate          float64
low-flag                        int64
imputed                         int64
subbed      

In [7]:
# run describe to get basics
dfEvictCit.describe()

Unnamed: 0,GEOID,year,population,poverty-rate,renter-occupied-households,pct-renter-occupied,median-gross-rent,median-household-income,median-property-value,rent-burden,...,pct-nh-pi,pct-multiple,pct-other,eviction-filings,evictions,eviction-rate,eviction-filing-rate,low-flag,imputed,subbed
count,10123.0,10123.0,7859.0,7859.0,9232.0,7859.0,6970.0,7610.0,7511.0,7011.0,...,7859.0,7859.0,7859.0,9232.0,9232.0,9232.0,9232.0,10123.0,10123.0,10123.0
mean,5143502.0,2008.00573,10718.142893,10.737679,1172.603194,32.594347,854.081923,53783.204205,194453.4,28.546413,...,0.038169,1.668556,0.148226,190.725967,74.039021,4.220044,7.947898,0.208337,0.0,0.0
std,25203.76,4.901133,32860.178692,10.14402,4798.024399,17.205897,499.311236,32296.614073,147393.3,7.889964,...,0.193079,2.376365,0.390711,1222.358701,414.06779,31.20335,62.427726,0.406139,0.0,0.0
min,5100148.0,2000.0,29.0,0.0,0.38,0.0,0.0,4688.0,10500.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,5121312.0,2004.0,560.5,3.14,41.8875,20.25,506.0,32255.75,93100.0,23.7,...,0.0,0.08,0.0,1.05,0.76,1.39,2.0375,0.0,0.0,0.0
50%,5143392.0,2008.0,1804.0,8.51,132.97,31.23,695.0,42788.0,144800.0,27.4,...,0.0,1.13,0.0,4.755,3.35,2.65,3.85,0.0,0.0,0.0
75%,5165744.0,2012.0,7926.0,15.49,618.685,42.815,1034.0,64868.0,238350.0,32.3,...,0.0,2.41,0.16,30.195,18.0,4.63,7.32,0.0,0.0,0.0
max,5188240.0,2016.0,448290.0,82.14,62900.0,100.0,3501.0,234091.0,1103000.0,50.1,...,4.53,46.77,4.86,23285.0,6474.0,1700.0,3400.0,1.0,0.0,0.0


In [8]:
# Use groupby to count non-null values per city and column. Counts differ between cities and columns;
# 17 (one row per year, 2000-2016) and 7 are the most frequent values.
dfEvictCit.groupby(['name']).count()

Unnamed: 0_level_0,GEOID,year,parent-location,population,poverty-rate,renter-occupied-households,pct-renter-occupied,median-gross-rent,median-household-income,median-property-value,...,pct-nh-pi,pct-multiple,pct-other,eviction-filings,evictions,eviction-rate,eviction-filing-rate,low-flag,imputed,subbed
name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Abingdon,17,17,17,17,17,17,17,17,17,17,...,17,17,17,17,17,17,17,17,17,17
Accomac,17,17,17,17,17,17,17,17,17,17,...,17,17,17,17,17,17,17,17,17,17
Adwolf,17,17,17,17,17,17,17,17,17,17,...,17,17,17,17,17,17,17,17,17,17
Alberta,17,17,17,17,17,17,17,17,17,17,...,17,17,17,17,17,17,17,17,17,17
Alexandria,17,17,17,17,17,7,17,17,17,17,...,17,17,17,7,7,7,7,17,17,17
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Wyndham,17,17,17,17,17,17,17,17,17,17,...,17,17,17,17,17,17,17,17,17,17
Wytheville,17,17,17,17,17,17,17,17,17,17,...,17,17,17,17,17,17,17,17,17,17
Yogaville,17,17,17,7,7,12,7,7,7,7,...,7,7,7,12,12,12,12,17,17,17
Yorkshire,17,17,17,17,17,17,17,17,17,17,...,17,17,17,17,17,17,17,17,17,17
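
The coverage gaps visible in the groupby output can be summarized in one step. The following is a minimal sketch on a small made-up frame (the city names and values are illustrative, not taken from the file). It assumes the same `name` and `evictions` columns and would apply to `dfEvictCit` in the same way. Grouping on `GEOID` instead of `name` may be safer if two places share a name.

```python
import pandas as pd

# Illustrative rows only (hypothetical cities and values), using the dataset's column names.
sample = pd.DataFrame({
    "name": ["City A", "City A", "City A", "City B", "City B", "City B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "evictions": [1.0, 2.0, 3.0, None, 4.0, None],
})

# Number of years with a non-null eviction count, per city.
years_with_data = sample.groupby("name")["evictions"].count()

# How many cities share each coverage level (e.g. how many have 7 vs. 17 years).
coverage_distribution = years_with_data.value_counts().sort_index()

print(years_with_data)
print(coverage_distribution)
```

Cities whose count falls below the number of years in the file are the ones with missing eviction data.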


In [20]:
# There are zeros in a number of places in the dataframe, including in the eviction counts.
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Rows where the eviction count is exactly zero (stored for later inspection, not displayed here)
dfEvictCit_zeros = dfEvictCit[dfEvictCit['evictions'] == 0]

# Rows where population is missing
dfEvictCit[dfEvictCit['population'].isnull()]

#dfEvictCit[dfEvictCit['name'] == 'New Kent']

Unnamed: 0,GEOID,year,name,parent-location,population,poverty-rate,renter-occupied-households,pct-renter-occupied,median-gross-rent,median-household-income,median-property-value,rent-burden,pct-white,pct-af-am,pct-hispanic,pct-am-ind,pct-asian,pct-nh-pi,pct-multiple,pct-other,eviction-filings,evictions,eviction-rate,eviction-filing-rate,low-flag,imputed,subbed
85,5101256,2000,Allisonia,Virginia,,,9.19,,,,,,,,,,,,,,0.2,0.2,2.22,2.22,0,0,0
86,5101256,2001,Allisonia,Virginia,,,8.98,,,,,,,,,,,,,,0.41,0.41,4.55,4.55,0,0,0
87,5101256,2002,Allisonia,Virginia,,,8.98,,,,,,,,,,,,,,0.2,0.2,2.27,2.27,0,0,0
88,5101256,2003,Allisonia,Virginia,,,8.78,,,,,,,,,,,,,,0.2,0.2,2.33,2.33,0,0,0
89,5101256,2004,Allisonia,Virginia,,,8.58,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0,0,0
90,5101256,2005,Allisonia,Virginia,,,8.37,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0,0,0
91,5101256,2006,Allisonia,Virginia,,,8.37,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0,0,0
92,5101256,2007,Allisonia,Virginia,,,8.17,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0,0,0
93,5101256,2008,Allisonia,Virginia,,,7.96,,,,,,,,,,,,,,0.2,0.2,2.56,2.56,0,0,0
94,5101256,2009,Allisonia,Virginia,,,,,,,,,,,,,,,,,,,,,0,0,0
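
Zeros and blanks are different problems: a zero is a reported value, while a blank loads as `NaN`. The sketch below separates the two. It uses three of the Allisonia rows shown above (2000, 2004, 2009), reduced to four columns. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Three Allisonia rows from the output above, reduced to a few columns.
sample = pd.DataFrame({
    "name": ["Allisonia"] * 3,
    "year": [2000, 2004, 2009],
    "population": [np.nan, np.nan, np.nan],
    "evictions": [0.2, 0.0, np.nan],
})

# Count of missing values per column.
missing_per_column = sample.isna().sum()

# Reported zeros and missing values are kept apart so they can be handled differently.
zero_evictions = sample[sample["evictions"] == 0]
missing_evictions = sample[sample["evictions"].isna()]

print(missing_per_column)
print(zero_evictions["year"].tolist(), missing_evictions["year"].tolist())
```

The same three expressions could be run against `dfEvictCit` to size the problem before deciding how to treat each case.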


## Description of City-Level Data   
1. Annual data for cities for 2000-2016.
2. Missing data for some metrics, scattered throughout the dataset. 
3. GEOID is a Census location ID which may be used for matching or mapping.
4. Data subject matter generally includes the following topics:
    a. Location data
    b. Population data: Count and demographic information
    c. Renter data: Count and percent occupancy, median cost, burden(?)
    d. Property values (median)
    e. Eviction data: Filings, evictions, eviction rate, filing rate

### Specific Challenges of City-Level Data
1. There are zeros, blanks, and value count discrepancies across multiple fields (including eviction metrics) that we will need to determine how to address. These are also readily viewable in the CSV file.
2. We need to confirm whether we can easily join on city with other datasets. The GEOID is a US Census geographic identifier (see https://www.census.gov/programs-surveys/geography/guidance/geo-identifiers.html), so it may be the required key. GEOIDs may be matchable with FIPS and GNIS IDs if necessary.
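
A join on GEOID might look like the sketch below. The `other` frame and its `some_metric` column are hypothetical stand-ins for whatever dataset we join to. It is assumed to store place GEOIDs as 7-character text (2-digit state FIPS plus 5-digit place code), which is why the integer GEOID is converted first.

```python
import pandas as pd

# A few city-level keys from the output above.
evictions = pd.DataFrame({
    "GEOID": [5100148, 5100148, 5188240],
    "year": [2000, 2001, 2016],
    "name": ["Abingdon", "Abingdon", "Yorktown"],
})

# Hypothetical second dataset (made-up values), keyed on GEOID stored as text.
other = pd.DataFrame({
    "GEOID": ["5100148", "5199999"],
    "some_metric": [1.0, 2.0],
})

# Convert the integer GEOID to zero-padded text so both keys have the same type.
evictions["GEOID"] = evictions["GEOID"].astype(str).str.zfill(7)

# Left join keeps every eviction row; validate guards against duplicate keys in `other`.
merged = evictions.merge(other, on="GEOID", how="left",
                         validate="many_to_one", indicator=True)

# Cities with no match in the other dataset.
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].unique().tolist()
print(unmatched)
```

Checking `unmatched` after each join gives a quick read on whether GEOID alone is enough or whether a FIPS or GNIS crosswalk is needed.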

In [12]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Now read in the state-level data
dfEvictState = pd.read_csv(r'../DataSet/va_eviction_state.csv', header=0, encoding="ISO-8859-1")

# Only 16 rows, so inspect full df
dfEvictState

Unnamed: 0,GEOID,year,name,parent-location,population,poverty-rate,renter-occupied-households,pct-renter-occupied,median-gross-rent,median-household-income,median-property-value,rent-burden,pct-white,pct-af-am,pct-hispanic,pct-am-ind,pct-asian,pct-nh-pi,pct-multiple,pct-other,eviction-filings,evictions,eviction-rate,eviction-filing-rate,low-flag,imputed,subbed
0,51,2000,Virginia,USA,7078515.0,9.59,664083.0,31.91,650.0,46677.0,125400.0,24.5,70.15,19.44,4.66,0.26,3.66,0.05,1.61,0.17,77522.0,40945.0,6.17,11.67,0,0,0
1,51,2001,Virginia,USA,7078515.0,9.59,734527.0,31.91,650.0,46677.0,125400.0,24.5,70.15,19.44,4.66,0.26,3.66,0.05,1.61,0.17,83498.0,44970.0,6.12,11.37,0,0,0
2,51,2002,Virginia,USA,7078515.0,9.59,761196.0,31.91,650.0,46677.0,125400.0,24.5,70.15,19.44,4.66,0.26,3.66,0.05,1.61,0.17,123429.0,49799.0,6.54,16.22,0,0,0
3,51,2003,Virginia,USA,7078515.0,9.59,885087.0,31.91,650.0,46677.0,125400.0,24.5,70.15,19.44,4.66,0.26,3.66,0.05,1.61,0.17,127023.0,52790.0,5.96,14.35,0,0,0
4,51,2004,Virginia,USA,7078515.0,9.59,900780.0,31.91,650.0,46677.0,125400.0,24.5,70.15,19.44,4.66,0.26,3.66,0.05,1.61,0.17,153631.0,55869.0,6.2,17.06,0,0,0
5,51,2005,Virginia,USA,7721730.0,7.16,914428.0,30.84,931.0,60316.0,247100.0,28.7,66.98,19.31,6.67,0.23,4.78,0.07,1.7,0.25,156448.0,51745.0,5.66,17.11,0,0,0
6,51,2006,Virginia,USA,7721730.0,7.16,928075.0,30.84,931.0,60316.0,247100.0,28.7,66.98,19.31,6.67,0.23,4.78,0.07,1.7,0.25,158878.0,50920.0,5.49,17.12,0,0,0
7,51,2007,Virginia,USA,7721730.0,7.16,613794.0,30.84,931.0,60316.0,247100.0,28.7,66.98,19.31,6.67,0.23,4.78,0.07,1.7,0.25,66538.0,37676.0,6.14,10.84,0,0,0
8,51,2008,Virginia,USA,7721730.0,7.16,623822.0,30.84,931.0,60316.0,247100.0,28.7,66.98,19.31,6.67,0.23,4.78,0.07,1.7,0.25,65233.0,42797.0,6.86,10.46,0,0,0
9,51,2009,Virginia,USA,7721730.0,7.16,629765.0,30.84,931.0,60316.0,247100.0,28.7,66.98,19.31,6.67,0.23,4.78,0.07,1.7,0.25,62974.0,38357.0,6.09,10.0,0,0,0


## Description of State-Level Data
See city-level description above. Metrics are comparable.


### Specific Challenges of State-Level Data
No immediate observations. We will likely need to review the methodology to understand how aggregation was handled, which may provide clues for how to do it ourselves.
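
One check that may help here: in the state rows displayed above, the rate columns appear to equal the corresponding count divided by renter-occupied households, times 100. The sketch below verifies that on the 2000 and 2001 rows. This is an observation from the displayed values, not a confirmed description of the Eviction Lab method, and the `*_check` column names are illustrative.

```python
import pandas as pd

# Values copied from the 2000 and 2001 state rows shown above.
state = pd.DataFrame({
    "year": [2000, 2001],
    "renter-occupied-households": [664083.0, 734527.0],
    "eviction-filings": [77522.0, 83498.0],
    "evictions": [40945.0, 44970.0],
    "eviction-rate": [6.17, 6.12],
    "eviction-filing-rate": [11.67, 11.37],
})

households = state["renter-occupied-households"]

# Assumed formula: count per 100 renter-occupied households.
state["rate_check"] = (state["evictions"] / households * 100).round(2)
state["filing_rate_check"] = (state["eviction-filings"] / households * 100).round(2)

print(state[["year", "eviction-rate", "rate_check",
             "eviction-filing-rate", "filing_rate_check"]])
```

If the same relationship holds across all years, aggregating cities ourselves would mean summing counts and households first and computing rates afterwards, rather than averaging city-level rates.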
