# Project Name

## Group Members and Contributions

 - Chiehkun (Timo) Chen
 - Jordan Daley
 - Jacob Moul, PID: A13548393
 - Hannah Peterson
 - Yun (Denise) Tang
 - George Thomas


## Introduction and Background

### Background Research

### Research Question

Question: Did the 2011-19 drought in California disproportionately affect low-income communities?

This project examines the impacts of climate change on low-income communities, specifically through the climatic event of drought. We focus on California because it experienced a prolonged drought within the past decade (376 consecutive weeks, from December 2011 to March 2019). In particular, we will investigate whether the California drought had disproportionate negative effects on low-income communities compared to average- and high-income communities. This question is important because, as the effects of global warming become more severe, efforts must be made to protect the communities that are most vulnerable to them.

To answer this question, we plan to analyze different indicators of economic well-being and different effects of drought for various communities over time: before, during, and after the most recent drought. For example, we plan to investigate the relationships between community income and costs associated with the drought, such as utility rates, as well as potential health issues, such as respiratory illnesses, that are known to increase in conjunction with drought.

### Hypothesis

Hypothesis: The 2011-19 drought in California did disproportionately affect low-income communities.

We expect to find that low-income communities suffered more than relatively better-off communities because they have fewer safeguards against environmental events and fewer means to bear higher utility or healthcare costs, for example.

## Data

### Data Sets

**Community Economic Data**
 - Data Set Name: 'cbp[yr]co.txt' (for years 2012-2016)
      - We filtered these files to include only observations for California and saved the results as 'cbp[yr]co_mod.csv'.
 - Source: https://www.census.gov/programs-surveys/cbp/data/datasets.html
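
The California-only extraction described above could be done along these lines. This is a minimal sketch, not the exact procedure we used: it assumes the raw county file is comma-delimited with a lowercase `fipstate` column, and the function names are hypothetical.

```python
import pandas as pd

CALIFORNIA_FIPS = 6  # California's state FIPS code


def filter_to_state(cbp: pd.DataFrame, fips_state: int = CALIFORNIA_FIPS) -> pd.DataFrame:
    """Return only the rows of a CBP county table that belong to one state."""
    return cbp[cbp["fipstate"] == fips_state].reset_index(drop=True)


def make_state_extract(raw_path: str, out_path: str,
                       fips_state: int = CALIFORNIA_FIPS) -> pd.DataFrame:
    """Read a raw county file, keep one state, and save the result as CSV."""
    # Read NAICS codes as text so values such as "11----" are preserved.
    raw = pd.read_csv(raw_path, dtype={"naics": str})
    subset = filter_to_state(raw, fips_state)
    # index=False keeps the row index from being written as an extra column.
    subset.to_csv(out_path, index=False)
    return subset
```

For example, `make_state_extract('cbp12co.txt', 'Data/cbp12co_mod.csv')` would produce the 2012 file, assuming the raw file sits in the working directory.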

> These are County Business Patterns (CBP) data sets, provided with the description: “This series includes the number of establishments, employment during the week of March 12, first quarter payroll, and annual payroll. This data is useful for studying the economic activity of small areas; analyzing economic changes over time; and as a benchmark for other statistical series, surveys, and databases between economic censuses”. After filtering to the state of California, the 2016 data set (one of several yearly files) contains 36,616 observations of 26 variables, several of which are identifiers such as state or county code. It also contains values for first quarter payroll, annual payroll, and number of employees, among other variables, for different industries in each county of California. The data come from the US Census Bureau, and all of the data sets are downloadable in CSV format.
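
One way these variables might feed an economic indicator is sketched below. In CBP files the NAICS code `------` marks the total across all sectors, so each county has one such row. Using annual payroll per employee as a rough income proxy is our assumption for illustration, not a settled choice, and `county_totals` is a hypothetical helper.

```python
import pandas as pd

ALL_SECTORS = "------"  # NAICS code for the all-sector total row


def county_totals(cbp: pd.DataFrame) -> pd.DataFrame:
    """Keep the all-sector row for each county and add average annual pay."""
    totals = cbp.loc[cbp["naics"] == ALL_SECTORS, ["fipscty", "emp", "ap"]].copy()
    # ap is reported in $1,000, so convert to dollars per mid-March employee.
    # Counties with zero reported employees get a missing value instead.
    pay = totals["ap"] * 1000 / totals["emp"]
    totals["pay_per_employee"] = pay.where(totals["emp"] > 0)
    return totals.set_index("fipscty")
```

Rows whose employment is suppressed (reported as 0 with a data suppression flag) would need separate handling before drawing conclusions from this ratio.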


## Data Cleaning/Pre-Processing

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns

In [2]:
cbp12 = pd.read_csv('Data/cbp12co_mod.csv')
cbp13 = pd.read_csv('Data/cbp13co_mod.csv')
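
Since the files share one naming pattern, the remaining years could be loaded in a loop rather than one variable per year. The sketch below assumes every file follows the `cbp[yr]co_mod.csv` pattern inside `Data/`; the helper names are hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_cbp_years(years, data_dir="Data"):
    """Load one modified CBP file per year into a dict keyed by two-digit year."""
    frames = {}
    for yr in years:
        path = Path(data_dir) / f"cbp{yr}co_mod.csv"
        # Read NAICS codes as text so values such as "11----" are preserved.
        frames[yr] = pd.read_csv(path, dtype={"naics": str})
    return frames


def stack_years(frames):
    """Concatenate the yearly tables, tagging each row with its four-digit year."""
    return pd.concat(
        [df.assign(year=2000 + yr) for yr, df in frames.items()],
        ignore_index=True,
    )
```

With all five files present, `stack_years(load_cbp_years(range(12, 17)))` would give a single table covering 2012-2016.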

In [10]:
{
    'FIPSTATE': 'FIPS State Code',
    'FIPSCTY': 'FIPS County Code',
    'NAICS': 'Industry Code',
    'EMPFLAG': 'Data Suppression Flag',
    'EMP_NF': 'Total Mid-March Employees Noise Flag',
    'EMP': 'Total Mid-March Employees with Noise',
    'QP1_NF': 'Total First Quarter Payroll Noise Flag',
    'QP1': 'Total First Quarter Payroll ($1,000) with Noise',
    'AP_NF': 'Total Annual Payroll Noise Flag',
    'AP': 'Total Annual Payroll ($1,000) with Noise',
    'EST': 'Total Number of Establishments',
    'N1_4': 'Number of Establishments: 1-4 Employee Size Class',
    'N5_9': 'Number of Establishments: 5-9 Employee Size Class',
    'N10_19': 'Number of Establishments: 10-19 Employee Size Class',
    'N20_49': 'Number of Establishments: 20-49 Employee Size Class',
    'N50_99': 'Number of Establishments: 50-99 Employee Size Class',
    'N100_249': 'Number of Establishments: 100-249 Employee Size Class',
    'N250_499': 'Number of Establishments: 250-499 Employee Size Class',
    'N500_999': 'Number of Establishments: 500-999 Employee Size Class',
    'N1000': 'Number of Establishments: 1,000 or More Employee Size Class',
    'N1000_1': 'Number of Establishments: Employment Size Class: 1,000-1,499 Employees',
    'N1000_2': 'Number of Establishments: Employment Size Class: 1,500-2,499 Employees',
    'N1000_3': 'Number of Establishments: Employment Size Class: 2,500-4,999 Employees',
    'N1000_4': 'Number of Establishments: Employment Size Class: 5,000 or More Employees',
    'CENSTATE': 'Census State Code',
    'CENCTY': 'Census County Code'
}
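
The dictionary keys are uppercase, while the loaded column names are lowercase, so a lookup has to ignore case. The sketch below shows one way to attach descriptions to a table's columns; it uses only an excerpt of the dictionary, and `CBP_COLUMNS` and `describe_columns` are hypothetical names.

```python
import pandas as pd

# Excerpt of the column dictionary; the full mapping would be used in practice.
CBP_COLUMNS = {
    "FIPSTATE": "FIPS State Code",
    "FIPSCTY": "FIPS County Code",
    "NAICS": "Industry Code",
    "EMP": "Total Mid-March Employees with Noise",
    "AP": "Total Annual Payroll ($1,000) with Noise",
}


def describe_columns(df: pd.DataFrame, descriptions: dict) -> pd.Series:
    """Map each column of df to its description, matching names case-insensitively."""
    lookup = {key.lower(): text for key, text in descriptions.items()}
    return pd.Series(
        {col: lookup.get(col.lower(), "(no description)") for col in df.columns}
    )
```

Calling `describe_columns(cbp12, CBP_COLUMNS)` would list each column next to its meaning and flag any column the dictionary does not cover.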

In [3]:
cbp12.head()

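| Unnamed: 0.1 | Unnamed: 0 | fipstate | fipscty | naics | empflag | emp_nf | emp | qp1_nf | qp1 | ap_nf | ... | n100_249 | n250_499 | n500_999 | n1000 | n1000_1 | n1000_2 | n1000_3 | n1000_4 | censtate | cencty |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 115015 | 6 | 1 | `------` | | G | 587140 | G | 9012283 | G | ... | 667 | 159 | 54 | 35 | 13 | 15 | 6 | 1 | 93 | 1 |
| 1 | 115016 | 6 | 1 | `11----` | | H | 61 | H | 780 | H | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 93 | 1 |
| 2 | 115017 | 6 | 1 | `114///` | A | D | 0 | D | 0 | D | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 93 | 1 |
| 3 | 115018 | 6 | 1 | `1141//` | A | D | 0 | D | 0 | D | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 93 | 1 |
| 4 | 115019 | 6 | 1 | `11411/` | A | D | 0 | D | 0 | D | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 93 | 1 |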
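
The `Unnamed: 0.1` and `Unnamed: 0` columns in this output look like row indexes that were written into the CSV when the files were saved; that reading is an assumption, since the notebook does not say. If so, they carry no information and could be dropped with a small helper like this hypothetical one.

```python
import pandas as pd


def drop_unnamed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Remove leftover index columns that pandas labels 'Unnamed: ...'."""
    # Assumes all column labels are strings, as they are after read_csv.
    keep = ~df.columns.str.startswith("Unnamed")
    return df.loc[:, keep]
```

For example, `cbp12 = drop_unnamed_columns(cbp12)` would leave only the CBP variables.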


## Data Description

## Data Exploration

## Data Analysis

## Ethical Considerations

Because this research relies only on quantitative measures such as income values or disease rates, we have no need for personal information, even if it appears in a data set. To protect the privacy of the individuals represented in the data, all personal information not needed for the analysis (such as the name or address of a household in utility data) will be removed from the final results. We do not believe that our question or data sets are invasive in nature, however, and we expect such information to appear rarely, if at all. For our analyses, it is important to be aware of the racial inequalities present in low-income communities. Before making any generalizations, we will check (if the data is available) that the ethnicities of the households or individuals in the census data are representative of the communities we are examining.

## Conclusions and Discussion