# SOC 88 Final Project - EPA and Pollution

This notebook is the first step for your final project.  In this notebook, you will be introduced to your dataset and guided through some analysis and visualizations that you may use for your final policy brief.  You can use some or all of the figures provided below to formulate your argument. 

Because this notebook shows only a few examples, you are strongly encouraged to perform your own analysis and create your own figures.

# The Data

In [None]:
from datascience import *
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

Our dataset is CalEnviroScreen 3.0, which reports a variety of environmental markers for California census regions, or "tracts," along with an overall pollution score called the "CES 3.0 Score."  In our version of the data table, this score is labelled __ces_30_score__.  The table also contains some demographic information.  More information on the data can be found here: https://oehha.ca.gov/media/downloads/calenviroscreen/report/ces3report.pdf

In [None]:
clean_ces_data = Table.read_table("data/cleaned_epa_data.csv")
clean_ces_data.show(5)

In [None]:
clean_ces_data.labels

As you can see, there are quite a few different categories to look at.  We will provide some figures below, and you can use those or use them as a starting point for your own analysis.

__Note__: We found that many pairs of variables in this dataset show little or no correlation, so don't be concerned if you find the same.
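
One way to put a number on "correlation" is the Pearson correlation coefficient r, which ranges from -1 to 1 and is near 0 when two variables have no linear relationship.  The sketch below is only an illustration: the `correlation` helper is a hypothetical name, and the data is synthetic rather than taken from the CES table.

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation coefficient r between two equal-length arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep only the pairs where both values are present (not NaN)
    keep = ~(np.isnan(x) | np.isnan(y))
    return np.corrcoef(x[keep], y[keep])[0, 1]

# Synthetic stand-in data (NOT from the CES table)
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=500)
y_related = 2 * x + rng.normal(0, 20, size=500)   # depends on x
y_unrelated = rng.normal(50, 20, size=500)        # ignores x

r_related = correlation(x, y_related)
r_unrelated = correlation(x, y_unrelated)
print(round(r_related, 2), round(r_unrelated, 2))
```

With the notebook's table, the same helper could be called as `correlation(clean_ces_data.column("ces_30_score"), clean_ces_data.column("asthma"))`, assuming both columns are numeric.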

# The Analysis

Let's explore how pollution scores relate to asthma rates.

__The plot below has no labels and limited formatting.  It is your job to format the plot and to determine the axis labels and title.__

In [None]:
# Create a square figure with a white background and a single set of axes
fig = plt.figure(facecolor='white', figsize=(8,8))
ax = fig.add_subplot(111)

# Pull out the two columns once so they can be reused below
x = clean_ces_data.column("ces_30_score")
y = clean_ces_data.column("asthma")

plt.scatter(x, y)

# Add a trendline using np.polyfit, which takes x, y, and the polynomial degree (1 for a line)
# Recall y = m*x + b
m, b = np.polyfit(x, y, 1)
plt.plot(x, m*x + b, color='red')

plt.show()
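
As a starting point for formatting, here is a minimal sketch of how axis labels, a title, and marker styling can be set with matplotlib.  The data is synthetic and the label text is deliberately a placeholder; choosing labels that describe your actual variables is still up to you.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data; swap in columns from your own table
rng = np.random.default_rng(1)
x = rng.uniform(0, 100, size=200)
y = 0.5 * x + rng.normal(0, 10, size=200)

fig, ax = plt.subplots(figsize=(8, 8), facecolor="white")

# Smaller, semi-transparent markers make dense scatter plots easier to read
ax.scatter(x, y, s=10, alpha=0.5)

# Placeholder text: replace with labels that describe your variables
ax.set_xlabel("X variable (units)", fontsize=12)
ax.set_ylabel("Y variable (units)", fontsize=12)
ax.set_title("Descriptive title goes here", fontsize=14)

# Light grid lines help readers estimate values
ax.grid(True, alpha=0.3)
```

The same `ax.set_xlabel`, `ax.set_ylabel`, and `ax.set_title` calls work on the `ax` object created in the cells of this notebook.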

Let's explore the relationship between the percent of nonwhite residents in a census tract and its pollution score.

__Again, the plot below has no labels and limited formatting.  It is your job to format the plot and to determine the axis labels and title.__ Perhaps consider adding a trendline?

In [None]:
# Create a square figure with a white background and a single set of axes
fig = plt.figure(facecolor='white', figsize=(8,8))
ax = fig.add_subplot(111)

plt.scatter(clean_ces_data.column("pct_nonwhite"), clean_ces_data.column("ces_30_score"))

plt.show()
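
If you do add a trendline, one practical wrinkle is that `np.polyfit` does not skip missing values.  The sketch below wraps the fit in a small helper that drops incomplete pairs first.  The name `add_trendline` is hypothetical and the data is synthetic; treat it as one possible approach rather than the required one.

```python
import numpy as np
import matplotlib.pyplot as plt

def add_trendline(ax, x, y, **plot_kwargs):
    """Draw a least-squares line on ax and return (slope, intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit does not skip NaN values, so keep only complete pairs
    keep = np.isfinite(x) & np.isfinite(y)
    slope, intercept = np.polyfit(x[keep], y[keep], 1)
    # Two endpoints are enough to draw a straight line
    x_ends = np.array([x[keep].min(), x[keep].max()])
    ax.plot(x_ends, slope * x_ends + intercept, **plot_kwargs)
    return slope, intercept

# Synthetic stand-in data with a few missing values
rng = np.random.default_rng(2)
x = rng.uniform(0, 100, size=300)
y = 10 + 0.4 * x + rng.normal(0, 8, size=300)
y[:5] = np.nan

fig, ax = plt.subplots(figsize=(8, 8), facecolor="white")
ax.scatter(x, y, s=10, alpha=0.5)
slope, intercept = add_trendline(ax, x, y, color="red")
```

In this notebook, the call might look like `add_trendline(ax, clean_ces_data.column("pct_nonwhite"), clean_ces_data.column("ces_30_score"), color="red")`, placed before `plt.show()`.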

Let's now look at how a census tract's poverty level and pollution score relate to one another.

__Again, the plot below has no labels and limited formatting.  It is your job to format the plot and to determine the axis labels and title.__ Perhaps consider adding a trendline?

In [None]:
# Create a square figure with a white background and a single set of axes
fig = plt.figure(facecolor='white', figsize=(8,8))
ax = fig.add_subplot(111)

plt.scatter(clean_ces_data.column("poverty"), clean_ces_data.column("ces_30_score"))

plt.show()
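
A scatter plot with thousands of points can be hard to read, so one optional complement is to compare the average of one variable across quartiles of the other.  The sketch below does this with NumPy on synthetic data; `mean_by_quartile` is a hypothetical helper, not part of the course materials.

```python
import numpy as np

def mean_by_quartile(x, y):
    """Mean of y within each quartile of x (lowest quartile first)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep only the pairs where both values are present
    keep = np.isfinite(x) & np.isfinite(y)
    x, y = x[keep], y[keep]
    # Interior cut points at the 25th, 50th, and 75th percentiles of x
    cuts = np.percentile(x, [25, 50, 75])
    # np.digitize assigns each x a bin index from 0 to 3 using the cut points
    bins = np.digitize(x, cuts)
    return np.array([y[bins == i].mean() for i in range(4)])

# Synthetic stand-in data (NOT from the CES table)
rng = np.random.default_rng(3)
x = rng.uniform(0, 100, size=1000)
y = 20 + 0.3 * x + rng.normal(0, 10, size=1000)

quartile_means = mean_by_quartile(x, y)
print(np.round(quartile_means, 1))
```

With the notebook's table, this could be called as `mean_by_quartile(clean_ces_data.column("poverty"), clean_ces_data.column("ces_30_score"))`, and the four resulting means could be shown as a bar chart.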

Let's now look at mapping visualizations.  Because this data is complicated to work with, the cleaning and reformatting have already been done for you.

In [None]:
#run this cell, do not change the contents
#load dependencies
import folium
import json
import os
import pandas as pd

In [None]:
#run this cell, do not change the contents
#load the raw CalEnviroScreen xlsx file (this will take a few seconds), then convert tract IDs to strings to match the geojson
full_epa_data = pd.read_excel('data/ces3results.xlsx')
full_epa_data['Census Tract'] = full_epa_data['Census Tract'].apply(str)

In [None]:
#run this cell, do not change the contents
#load clean geojson
with open('data/formatted_ca_tracts.geojson') as f:
    mapping = json.load(f)
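
A choropleth joins the data table to the map shapes by key, so it can be worth checking that the keys actually line up before drawing anything.  The sketch below is one way to do that, assuming each GeoJSON feature stores its tract ID as a string under `properties`.  The helper name `unmatched_keys` and the toy data are hypothetical.

```python
def unmatched_keys(table_keys, geojson, property_name="GEOID"):
    """Return the table keys that have no matching feature in the GeoJSON."""
    feature_keys = {
        feature["properties"][property_name]
        for feature in geojson["features"]
    }
    return sorted(set(table_keys) - feature_keys)

# Tiny made-up GeoJSON and key list, for illustration only
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"GEOID": "6001400100"}, "geometry": None},
        {"type": "Feature", "properties": {"GEOID": "6001400200"}, "geometry": None},
    ],
}
toy_keys = [str(6001400100), str(6001400300)]

missing = unmatched_keys(toy_keys, toy_geojson)
print(missing)  # ['6001400300']
```

With the objects loaded in this notebook, the check might be `unmatched_keys(full_epa_data['Census Tract'], mapping)`.  An empty list would suggest that every tract in the table has a shape to draw.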

The map below is a __template__ for you to use to create your own maps.  __The formatting is up to you.__

In [None]:
#this code will take a few seconds to run
center = [37.16611, -119.44944]

m = folium.Map(center, zoom_start=6)

folium.Choropleth(
    geo_data=mapping,
    data=full_epa_data,
    columns=['Census Tract', 'CES 3.0 Score'],
    key_on='feature.properties.GEOID',
    fill_color='RdPu',
    legend_name='CES 3.0 Pollution Score'
).add_to(m)

folium.LayerControl().add_to(m)

m

# Next Steps

After formatting these plots, your task is to use these visualizations or others of your own creation in developing a policy brief, impact plan, and optional explainer video.  Good luck!

_Developed by Katherine Oosterbaan, Keilyn Yuzuki, and Keeley Takimoto_