# Bus Access and Income in Los Angeles: Final Notebook 1
**_Arturo Jacobo & Miranda Mead-Newton_**

For our final research project, we explore bus access and income in Los Angeles County and make recommendations based on our findings. Bus riders in Los Angeles are among the lowest-income residents of the City, and changes are coming to the bus network. We sought to examine whether these planned changes will improve bus network coverage for low-income populations. We found that few coverage changes are planned for the new NextGen bus network, and we identify the areas of the City we believe deserve higher coverage.

## Demographic Background
In this notebook, we review demographic information for the City and County of LA and examine the spatial relationship between race, income, and commute mode in Los Angeles.

In [None]:
# importing libraries
import geopandas as gpd            # spatial data frames
import pandas as pd                # tabular data
import matplotlib.pyplot as plt    # static plots and maps
import contextily as ctx           # basemap tiles
import urllib.request
import json
import plotly.express as px        # interactive plots
import plotly.graph_objects as go

In [None]:
# import data from Census Reporter on income, race, and commute mode in LA
# race and income (ACS 2015-2019, census tracts in LA County)
gdf = gpd.read_file('data/acs2015_2019.geojson')

# commute mode (ACS 2019 5-year table B08134, census tracts in the City of LA)
tr = gpd.read_file('data/acs2019_5yr_B08134_14000US06037185320.geojson')

### Cleaning Data

Because both datasets come from the Census, each includes a summary row for the whole geography that must be dropped before analysis. We used County data for the demographic dataset and City data for the commute dataset. If we were to redo this section, we would use County commute data as well; even so, the two datasets can be roughly compared, as the mapping section shows.

Here we clean the demographic dataset. It had already been cleaned in lab (by Yoh), so we only keep the relevant columns.

In [None]:
gdf = gdf[['FIPS',
           'geometry',
           '% Total Population: Black or African American Alone',
           '% Total Population: White Alone',
           '% Total Population: Hispanic or Latino',
           'Median Household Income (In 2019 Inflation Adjusted Dollars)']]

Here we clean the commute data: we drop the summary row and keep only the relevant columns.

In [None]:
tr = tr.drop([1004])  # drop the summary row (index 1004 in this download)
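Dropping row `1004` by position only works for this particular download. A sketch of a filter that selects by ID instead, assuming Census Reporter's GEOID convention in which census tracts carry the `14000US` prefix (summary level 140) and a summary row for a larger geography carries a different prefix, such as `16000US` for a place. The helper name `keep_tracts_only` is hypothetical.

```python
import pandas as pd

def keep_tracts_only(df, geoid_col="geoid"):
    """Return only rows whose GEOID marks a census tract (summary level 140).

    Assumes Census Reporter-style GEOIDs such as '14000US06037185320';
    a summary row for a larger geography (e.g. a '16000US...' place row)
    is dropped.
    """
    is_tract = df[geoid_col].astype(str).str.startswith("14000US")
    return df[is_tract].copy()

# Hypothetical example rows: two tracts and one city-wide summary row
example = pd.DataFrame({
    "geoid": ["14000US06037185320", "14000US06037185310", "16000US0644000"],
    "name": ["Tract A", "Tract B", "Los Angeles, CA"],
})
print(keep_tracts_only(example))
```

In the notebook this would be `tr = keep_tracts_only(tr)`, which keeps working if the row order of a future download changes.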

In [None]:
columns_to_keep = ['geoid',
                  'name',
                  'B08134001',
                  'B08134021',
                  'B08134071',
                  'geometry']

In [None]:
tr = tr[columns_to_keep]

In [None]:
tr.columns = ['geoid',
 'name',
 'Total',
 'Drove_Solo',
 'Bus',
 'geometry']
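Assigning a list to `tr.columns` renames by position, so a change in column order would silently mislabel the data. A sketch of the same selection and renaming done by key with `DataFrame.rename`; the mapping reuses the B08134 codes and names from the cells above, and the helper `select_and_rename` is hypothetical.

```python
import pandas as pd

# ACS table B08134 column codes mapped to the readable names used above
B08134_NAMES = {
    "B08134001": "Total",
    "B08134021": "Drove_Solo",
    "B08134071": "Bus",
}

def select_and_rename(df, mapping, keep=("geoid", "name", "geometry")):
    """Keep the ID columns that exist plus the mapped columns, renamed by key."""
    cols = [c for c in keep if c in df.columns] + list(mapping)
    return df[cols].rename(columns=mapping)

# Hypothetical one-row example with an extra column that should be dropped
example = pd.DataFrame({
    "geoid": ["14000US06037185320"],
    "name": ["Tract A"],
    "B08134001": [1000],
    "B08134021": [600],
    "B08134071": [150],
    "B08134002": [50],
})
print(select_and_rename(example, B08134_NAMES))
```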

In [None]:
# create new columns that normalize the counts into percent of all workers in each tract
tr['Percent_D_Solo'] = tr['Drove_Solo'] / tr['Total'] * 100
tr['Percent_Bus'] = tr['Bus'] / tr['Total'] * 100
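Tracts with no workers have a `Total` of 0. pandas returns NaN for 0/0, but a nonzero count over a zero total would come back as `inf` and end up in the plots. A defensive sketch that makes the zero-total case explicitly NaN; the helper `percent_of_total` and the example values are hypothetical.

```python
import pandas as pd

def percent_of_total(part, total):
    """Return part / total * 100, with NaN (not inf) where total is 0."""
    return part / total.where(total > 0) * 100

# Hypothetical tracts; the last one has no workers in the sample
example = pd.DataFrame({"Total": [1000, 400, 0], "Bus": [150, 100, 0]})
example["Percent_Bus"] = percent_of_total(example["Bus"], example["Total"])
print(example)
```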

### Graphing Data

We began by plotting the distribution of each variable.

#### Demographic info

In [None]:
indicators = ['% Total Population: Black or African American Alone',
              '% Total Population: White Alone',
              '% Total Population: Hispanic or Latino',
              'Median Household Income (In 2019 Inflation Adjusted Dollars)']

In [None]:
def get_histogram(column='% Total Population: Black or African American Alone'):
    series_to_plot = gdf[column]

    plt.figure(figsize=(10, 5))
    plt.hist(series_to_plot, bins=50, color='skyblue')

    # dashed lines: mean in black, median in red
    plt.axvline(series_to_plot.mean(), color='k', linestyle='dashed', linewidth=1)
    plt.axvline(series_to_plot.median(), color='r', linestyle='dashed', linewidth=1)

    # label each line near the top of the plot
    min_ylim, max_ylim = plt.ylim()
    plt.text(series_to_plot.mean() * 1.1, max_ylim * 0.9, 'Mean: {:.2f}'.format(series_to_plot.mean()))
    plt.text(series_to_plot.median() * 1.1, max_ylim * 0.8, 'Median: {:.2f}'.format(series_to_plot.median()), color='r')

    plt.title(column + ' in Los Angeles County')
    plt.xlabel(column)
    plt.ylabel('Number of census tracts')

In [None]:
for indicator in indicators:
    get_histogram(column=indicator)

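These histograms show the distribution of racial and ethnic shares and of median household income across census tracts in LA County. The black dashed line marks the mean and the red dashed line marks the median.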
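The histograms mark the mean and median with dashed lines; the same numbers can also be printed as a table for easier comparison across indicators. A minimal sketch; the helper `summary_table` and the example values are hypothetical.

```python
import pandas as pd

def summary_table(df, columns):
    """Mean and median of each column, one row per indicator (NaNs are skipped)."""
    return df[columns].agg(["mean", "median"]).T

# Hypothetical tract values for two of the indicators
example = pd.DataFrame({
    "% Total Population: White Alone": [10.0, 20.0, 60.0],
    "Median Household Income (In 2019 Inflation Adjusted Dollars)": [40000, 55000, None],
})
print(summary_table(example, list(example.columns)))
```

In the notebook it would be called as `summary_table(gdf, indicators)`.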

#### Commuter info

In [None]:
# plain DataFrame of the commute columns (geometry is not needed for the histogram)
tr_1 = pd.DataFrame(tr.drop(columns='geometry'))

In [None]:
x0 = tr_1['Percent_D_Solo']
x1 = tr_1['Percent_Bus']

fig = go.Figure()
fig.add_trace(go.Histogram(
    x=x0,
    histnorm='percent',
    name='Percent of CT that drive alone to work', # name used in legend and hover labels
    xbins=dict( # bins used for histogram
        start=0,
        end=100,
        size=1
    ),
    marker_color='#EB89B5',
    opacity=0.75
))
fig.add_trace(go.Histogram(
    x=x1,
    histnorm='percent',
    name='Percent of CT that take the bus',
    xbins=dict(
        start=0,
        end=100,
        size=1
    ),
    marker_color='#330C73',
    opacity=0.75
))

fig.update_layout(
    title_text='Histogram of Census Tract Percentages by Commute Mode for the City of LA', # title of plot
    xaxis_title_text='Percent of workers in tract', # xaxis label
    yaxis_title_text='Percent of census tracts', # yaxis label (histnorm='percent')
    bargap=0.2, # gap between bars of adjacent location coordinates
    bargroupgap=0.1 # gap between bars of the same location coordinates
)

This graph shows how census tracts in the City of LA are distributed by the share of workers who drive alone to work or take the bus. Unsurprisingly, driving alone to work is much more common. In almost all census tracts in the City, less than 20% of workers take the bus to work.
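The 20% threshold can be checked directly instead of read off the histogram. A sketch; the helper `share_of_tracts_below` and the example values are hypothetical, and tracts with no workers (NaN) are excluded from the count.

```python
import pandas as pd

def share_of_tracts_below(series, threshold):
    """Percent of tracts (ignoring NaN) whose value is strictly below threshold."""
    valid = series.dropna()
    return (valid < threshold).mean() * 100

# Hypothetical Percent_Bus values for five tracts, one with no workers (NaN)
example = pd.Series([2.0, 8.5, 15.0, 27.0, float("nan")])
print(share_of_tracts_below(example, 20))
```

In the notebook this would be `share_of_tracts_below(tr['Percent_Bus'], 20)`.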

### Mapping Data

#### Demographic info

In [None]:
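def get_map(column='% Total Population: Black or African American Alone'):
    ax = gdf.plot(figsize=(10, 10),
                  column=column,
                  legend=True)
    # crop the latitude range, which leaves out the County's offshore islands
    # (assumes coordinates are in degrees)
    ax.set_ylim(33.6, 34.9)
    ax.set_title(column, fontsize=14)
    ax.axis('off')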

In [None]:
for indicator in indicators:
    get_map(indicator) 

On these maps, the census tracts with the lowest median household incomes tend to be those with the highest shares of Black/African American and/or Hispanic/Latino residents.
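The maps show this relationship visually. A rank correlation puts a number on it; we use Spearman's rho here as our own choice (not a method from this notebook) because income is skewed and the relationship need not be linear. The helper `income_correlations` and the example values are hypothetical.

```python
import pandas as pd

INCOME = "Median Household Income (In 2019 Inflation Adjusted Dollars)"

def income_correlations(df, share_columns, income_column=INCOME):
    """Spearman rank correlation between each share column and median income."""
    corr = df[share_columns + [income_column]].corr(method="spearman")
    return corr[income_column].drop(income_column)

# Hypothetical tracts where income falls as the Hispanic/Latino share rises
example = pd.DataFrame({
    "% Total Population: Hispanic or Latino": [85.0, 60.0, 30.0, 10.0],
    "% Total Population: White Alone": [5.0, 25.0, 55.0, 70.0],
    INCOME: [38000, 52000, 90000, 120000],
})
print(income_correlations(example, ["% Total Population: Hispanic or Latino",
                                    "% Total Population: White Alone"]))
```

In the notebook it would be called with `gdf` and the three share columns from `indicators`; values near -1 indicate that higher shares go with lower incomes.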

#### Commuter info

In [None]:
# create the 1x2 subplots
fig, axs = plt.subplots(1, 2, figsize=(15, 12))

# name each subplot
ax1, ax2 = axs

# percent of workers who take the bus, on the left
tr.plot(column='Percent_Bus',
        cmap='RdYlGn_r',
        scheme='user_defined',  # classification schemes require the mapclassify package
        classification_kwds={'bins': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]},
        edgecolor='white',
        linewidth=0.,
        alpha=0.75,
        ax=ax1,  # this assigns the map to the subplot
        legend=True)

ax1.axis("off")
ax1.set_title("Percentage of Bus Riders by Census Tract")

# percent of workers who drive alone, on the right
tr.plot(column='Percent_D_Solo',
        cmap='RdYlGn_r',
        scheme='user_defined',
        classification_kwds={'bins': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]},
        edgecolor='white',
        linewidth=0.,
        alpha=0.75,
        ax=ax2,  # this assigns the map to the subplot
        legend=True)

ax2.axis("off")
ax2.set_title("Percentage of Solo Drivers by Census Tract")

Although we used City data for commuting and County data for demographics, the maps of bus riders and drivers per census tract show a visible association: census tracts with the lowest median incomes and higher Black/African American and/or Hispanic/Latino populations also have higher percentages of bus riders. These tracts are concentrated in Central and South Central LA. This fits with our research showing that bus riders in LA tend to be lower-income and non-White.
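Because the commute data cover the City and the demographic data cover the County, they can be compared directly only on the tracts they share. A sketch of that join, assuming `gdf['FIPS']` holds the 11-digit tract code (possibly stored as an integer that lost its leading zero) and `tr['geoid']` carries Census Reporter's `14000US` prefix. The helper `join_commute_to_demographics` and the example values are hypothetical. The Spearman correlation at the end summarizes how bus commuting relates to median income on the shared tracts.

```python
import pandas as pd

INCOME = "Median Household Income (In 2019 Inflation Adjusted Dollars)"

def join_commute_to_demographics(tr, gdf):
    """Attach tract demographics to City commute data by tract FIPS code.

    Assumes tr['geoid'] looks like '14000US06037185320' and gdf['FIPS']
    holds the tract code 06037185320 (string or integer). An inner join
    keeps only the tracts present in both datasets (the City tracts).
    """
    commute = pd.DataFrame(tr).drop(columns="geometry", errors="ignore").copy()
    commute["FIPS"] = commute["geoid"].str.replace("14000US", "", regex=False)
    demo = pd.DataFrame(gdf).drop(columns="geometry", errors="ignore").copy()
    demo["FIPS"] = demo["FIPS"].astype(str).str.zfill(11)  # restore leading zero
    return commute.merge(demo, on="FIPS", how="inner")

# Hypothetical data: three City tracts and one County tract outside the City
tr_example = pd.DataFrame({
    "geoid": ["14000US06037185320", "14000US06037185310", "14000US06037185200"],
    "Percent_Bus": [4.0, 12.0, 22.0],
})
gdf_example = pd.DataFrame({
    "FIPS": [6037185320, 6037185310, 6037185200, 6037999999],
    INCOME: [95000, 60000, 35000, 70000],
})

joined = join_commute_to_demographics(tr_example, gdf_example)
rho = joined[["Percent_Bus", INCOME]].corr(method="spearman").iloc[0, 1]
print(joined[["FIPS", "Percent_Bus", INCOME]])
print(rho)
```

In the notebook this would be `joined = join_commute_to_demographics(tr, gdf)`.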