# Seeing The Forest: Tree Equity in Oakland, California
## Elliott Shaw and Michael Rosen's 206A midterm

### What is tree equity, and why does it matter?

For our 206A midterm, we will be analyzing the distribution of street trees in Oakland, California. But before we get into all of that, we'd like to start by explaining the stakes of our investigation.

First, and most importantly, trees are one of our best defenses against urban heat islands. As the planet warms and days with extreme heat become more frequent, the shade and cooling that trees provide will only become more important.

Trees also provide many other benefits to their surroundings, including:

 - Improved air quality
 - Noise absorption
 - Contributions to a more vibrant ecosystem
 - Looking nice

Given all of these benefits of trees, we are curious if...

### Research Question

#### Are trees equally distributed across the city of Oakland, or are there variations that can be predicted by socioeconomic indicators?

For our investigation, we'll be bringing in a few datasets...

### Data Sources

First, we'll import the necessary libraries and then bring in our street trees dataset, courtesy of the City of Oakland.

In [1]:
import pandas as pd
import geopandas as gpd

# Street tree inventory published by the City of Oakland (GeoJSON)
oaktrees = gpd.read_file('https://opendata.arcgis.com/datasets/385456a220174ef1854738b4029df3fd_0.geojson')



Next, we'll bring in our American Community Survey datasets at the block group level: Table B19013 for median household income, Table B03002 for race, and Table B15003 for educational attainment.

 

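In [2]:
# ACS tables saved locally as GeoJSON, one row per block group
oakincome = gpd.read_file('oakdata.geojson')  # B19013: median household income
oakrace = gpd.read_file('oakrace.geojson')    # B03002: race
oakeduc = gpd.read_file('oakeduc.geojson')    # B15003: educational attainment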

Everything's looking good so far, so we're going to move on to...

### Data Exploration and Data Cleaning

To start, let's make sure all of our dataframes are looking as expected.

In [4]:
oakincome.head()

Unnamed: 0,geoid,name,B19013001,"B19013001, Error",geometry
0,15000US060014001001,"Block Group 1, Alameda, CA",219861.0,17887.0,"MULTIPOLYGON (((-122.24692 37.88544, -122.2466..."
1,15000US060014002001,"Block Group 1, Alameda, CA",237500.0,23842.0,"MULTIPOLYGON (((-122.25508 37.84607, -122.2542..."
2,15000US060014002002,"Block Group 2, Alameda, CA",162583.0,63106.0,"MULTIPOLYGON (((-122.25792 37.84261, -122.2577..."
3,15000US060014003001,"Block Group 1, Alameda, CA",183482.0,79006.0,"MULTIPOLYGON (((-122.25186 37.84475, -122.2517..."
4,15000US060014003002,"Block Group 2, Alameda, CA",101736.0,36064.0,"MULTIPOLYGON (((-122.26230 37.83786, -122.2622..."


In [5]:
oakrace.head()

Unnamed: 0,geoid,name,B03002001,"B03002001, Error",B03002002,"B03002002, Error",B03002003,"B03002003, Error",B03002004,"B03002004, Error",...,"B03002017, Error",B03002018,"B03002018, Error",B03002019,"B03002019, Error",B03002020,"B03002020, Error",B03002021,"B03002021, Error",geometry
0,15000US060014001001,"Block Group 1, Alameda, CA",3120.0,208.0,3002.0,191.0,2317.0,235.0,107.0,68.0,...,12.0,34.0,33.0,0.0,12.0,0.0,12.0,0.0,12.0,"MULTIPOLYGON (((-122.24692 37.88544, -122.2466..."
1,15000US060014002001,"Block Group 1, Alameda, CA",990.0,138.0,894.0,129.0,761.0,129.0,13.0,13.0,...,12.0,5.0,7.0,23.0,25.0,0.0,12.0,23.0,25.0,"MULTIPOLYGON (((-122.25508 37.84607, -122.2542..."
2,15000US060014002002,"Block Group 2, Alameda, CA",1017.0,123.0,939.0,129.0,714.0,116.0,39.0,46.0,...,12.0,12.0,14.0,32.0,34.0,14.0,25.0,18.0,25.0,"MULTIPOLYGON (((-122.25792 37.84261, -122.2577..."
3,15000US060014003001,"Block Group 1, Alameda, CA",1134.0,238.0,1059.0,241.0,735.0,215.0,0.0,12.0,...,12.0,0.0,12.0,0.0,12.0,0.0,12.0,0.0,12.0,"MULTIPOLYGON (((-122.25186 37.84475, -122.2517..."
4,15000US060014003002,"Block Group 2, Alameda, CA",1237.0,263.0,1139.0,255.0,1008.0,248.0,18.0,27.0,...,12.0,37.0,60.0,0.0,12.0,0.0,12.0,0.0,12.0,"MULTIPOLYGON (((-122.26230 37.83786, -122.2622..."


In [6]:
oakeduc.tail()

Unnamed: 0,geoid,name,B15003001,"B15003001, Error",B15003002,"B15003002, Error",B15003003,"B15003003, Error",B15003004,"B15003004, Error",...,"B15003021, Error",B15003022,"B15003022, Error",B15003023,"B15003023, Error",B15003024,"B15003024, Error",B15003025,"B15003025, Error",geometry
333,15000US060019819001,"Block Group 1, Alameda, CA",58.0,43.0,0.0,12.0,0.0,12.0,0.0,12.0,...,12.0,30.0,17.0,4.0,5.0,0.0,12.0,9.0,13.0,"MULTIPOLYGON (((-122.34668 37.81103, -122.3441..."
334,15000US060019820001,"Block Group 1, Alameda, CA",49.0,20.0,4.0,5.0,0.0,12.0,0.0,12.0,...,12.0,14.0,13.0,13.0,9.0,0.0,12.0,0.0,12.0,"MULTIPOLYGON (((-122.31439 37.79484, -122.3135..."
335,15000US060019832001,"Block Group 1, Alameda, CA",542.0,64.0,19.0,25.0,0.0,12.0,0.0,12.0,...,14.0,233.0,43.0,133.0,35.0,43.0,16.0,18.0,14.0,"MULTIPOLYGON (((-122.28417 37.79402, -122.2838..."
336,15000US060019900000,"Block Group 0, Alameda, CA",0.0,12.0,0.0,12.0,0.0,12.0,0.0,12.0,...,12.0,0.0,12.0,0.0,12.0,0.0,12.0,0.0,12.0,"MULTIPOLYGON (((-122.09859 37.49488, -122.0910..."
337,16000US0653000,"Oakland, CA",308577.0,1640.0,11429.0,859.0,78.0,84.0,287.0,111.0,...,934.0,77864.0,1520.0,38590.0,1071.0,11609.0,635.0,7821.0,491.0,"MULTIPOLYGON (((-122.35588 37.83573, -122.3507..."


The good news: this looks like data, and each dataframe has a geometry column. The bad news: we've got some serious data cleaning ahead of us. First, we need to rename the columns, since raw ACS codes like B03002001 don't tell a reader much. Second, we need to remove the citywide summary row ("Oakland, CA", visible at the bottom of oakeduc) from each dataframe, so that it isn't treated as just another block group. Finally, and most onerously, we're going to have to calculate proportions for oakrace and oakeduc, which currently hold raw counts. Let's get started!
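Before touching the real dataframes, here is a minimal sketch of those three cleaning steps on a toy table. It is an illustration under assumptions, not our actual cleaning code: the values are invented, and the names `toy`, `labels`, `total_pop`, `white_nh`, and `pct_white_nh` are hypothetical. It uses plain pandas, and it assumes the citywide row can be spotted by its geoid prefix (`16000US` rather than the `15000US` the block groups carry in the output above).

```python
import pandas as pd

# Toy stand-in for oakrace. Column codes follow the B03002 pattern seen above,
# but every number here is invented for illustration.
toy = pd.DataFrame({
    "geoid": ["15000US060014001001", "15000US060014002001",
              "15000US060019900000", "16000US0653000"],
    "name": ["Block Group 1, Alameda, CA", "Block Group 1, Alameda, CA",
             "Block Group 0, Alameda, CA", "Oakland, CA"],
    "B03002001": [1000.0, 500.0, 0.0, 1500.0],  # total population
    "B03002003": [600.0, 100.0, 0.0, 700.0],    # non-Hispanic white alone
})

# Step 1: rename coded columns to readable labels (hypothetical names).
labels = {"B03002001": "total_pop", "B03002003": "white_nh"}
clean = toy.rename(columns=labels)

# Step 2: keep only block-group rows, which drops the citywide summary row.
clean = clean[clean["geoid"].str.startswith("15000US")].copy()

# Step 3: turn raw counts into proportions of each block group's total.
# A block group with zero population gives 0/0, which pandas stores as NaN.
clean["pct_white_nh"] = clean["white_nh"] / clean["total_pop"]

print(clean[["geoid", "total_pop", "pct_white_nh"]])
```

One thing the sketch surfaces: empty block groups (like the "Block Group 0" row in our real data) end up with a missing proportion rather than a zero, so they'll need to be dropped or handled before any mapping or modeling.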