## **Week 7 Assignment**

In this assignment, I cleaned up the bar graph and maps from the midterm (and made sure they were actually functioning this time). 

I wanted the bar graph and choropleth map to make a little more sense, so I calculated the youth percentage by neighborhood rather than by census tract. I also added markers to the folium choropleth map that display the school locations, to answer the question of whether schools are actually located in the neighborhoods with a higher percentage of children.



In [65]:
#libraries
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as ctx
import numpy as np
import branca.colormap as cm
import plotly.express as px
import folium

### Data and Data Cleaning

In [66]:
#data
# read GEO_ID as a string so pandas does not convert the identifiers to numbers
dict_for_FIPS = {
    'GEO_ID': str,
    }
ag = pd.read_csv(
    'data/sfpop2.csv',
    dtype=dict_for_FIPS
)
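
Reading the identifier column as a string matters when the IDs are bare tract FIPS codes, because California codes begin with a zero that is lost if pandas infers an integer column. A minimal sketch, assuming IDs of that form; the CSV text below is made up for illustration and is not the contents of `sfpop2.csv`:

```python
import io
import pandas as pd

# Illustrative rows only; the tract IDs and populations are made up.
sample_csv = "GEO_ID,pop\n06075010100,2000\n06075010200,3500\n"

# Default parsing infers an integer column and drops the leading zero.
inferred = pd.read_csv(io.StringIO(sample_csv))

# Forcing a string dtype keeps the identifier exactly as written.
as_text = pd.read_csv(io.StringIO(sample_csv), dtype={"GEO_ID": str})

print(inferred["GEO_ID"].iloc[0])
print(as_text["GEO_ID"].iloc[0])
```

A mismatch like this would make a later merge on the ID column return no rows, since the integer and string versions never compare equal.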

In [67]:
# process the population data

ag = pd.read_csv(
    'data/sfpop2.csv',
    dtype={'GEO_ID':str}
)
ag = ag[["GEO_ID", "NAME", "S0601_C01_001E", "S0601_C01_003E"]]

ag = ag.rename(
    columns={
        'GEO_ID': 'FIPS', 
        'NAME' : 'Tract',
        'S0601_C01_001E' : 'Total tract pop',
        'S0601_C01_003E' : 'Total Youth pop'
    }
) 
ag = ag.drop(labels=0, axis=0)
# Drop percent youth with no data, marked as "-"
ag = ag[ag['Total Youth pop'] != '-']
# Change it to numeric
ag['Total Youth pop'] = ag['Total Youth pop'].astype(float)
ag['Total tract pop'] = ag['Total tract pop'].astype(float)

In [68]:
# convert percent youth by tract into a decimal fraction (e.g. 12.5 -> 0.125)
ag['Total Youth pop'] = ag['Total Youth pop']/100

In [69]:
# multiply each tract's population by its youth fraction to get the youth count per tract
ag['Total Youth pop'] = ag['Total tract pop'] * ag['Total Youth pop']
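
Cells 68 and 69 overwrite `Total Youth pop` in place, so running either cell a second time changes the result. One way to avoid that, sketched here on made-up numbers, is to keep the percentage in its own column and write the count to a new one. The column names `youth_pct` and `youth_count` and the helper `add_youth_count` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Made-up tracts; 'youth_pct' stands in for the ACS percentage column.
df = pd.DataFrame({
    "FIPS": ["06075010100", "06075010200"],
    "total_pop": [2000.0, 3500.0],
    "youth_pct": [10.0, 4.0],
})

def add_youth_count(frame):
    """Return a copy with a youth count derived from the percentage column."""
    out = frame.copy()
    # The source column is never modified, so the result does not depend
    # on how many times this function is applied.
    out["youth_count"] = out["total_pop"] * out["youth_pct"] / 100
    return out

df = add_youth_count(df)
df = add_youth_count(df)  # applying it again leaves the values unchanged
print(df)
```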

In [70]:
# process the neighborhoods data

neighborhoods = pd.read_csv(
    'data/SFCensusTractstoNeighborhoods.csv',
    dtype={'geoid':str}
)

neighborhoods = neighborhoods[['neighborhoods_analysis_boundaries', 'geoid']]

neighborhoods = neighborhoods.rename(
    columns={'neighborhoods_analysis_boundaries': 'neighborhoods', 
        'geoid': 'FIPS'}
)

In [71]:
# process tracts data
tracts=gpd.read_file('data/Census 2020_ Tracts for San Francisco.geojson')
tracts = tracts[['geoid','geometry']]
tracts.columns = ['FIPS',
                  'geometry'
                 ]
#getting rid of Farallon Islands
tracts.drop([35,88,205], axis=0, inplace=True)

In [72]:
# merge all data. Output should be a GeoDataFrame
ag_merged1 = ag.merge(neighborhoods,on="FIPS")
ag_merged = ag_merged1.merge(tracts, on="FIPS")
# merge() returns a plain DataFrame here, so convert it back to a GeoDataFrame
ag_merged = gpd.GeoDataFrame(ag_merged, geometry='geometry')
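
Both merges use pandas' default inner join, so any tract missing from one of the tables is dropped without warning. A minimal sketch of one way to check for that, using the `indicator=True` option of `merge` on toy frames (the IDs and neighborhood labels below are illustrative only):

```python
import pandas as pd

pop = pd.DataFrame({"FIPS": ["001", "002", "003"], "pop": [100, 200, 300]})
hoods = pd.DataFrame({"FIPS": ["001", "002", "004"],
                      "neighborhoods": ["A", "A", "B"]})

# An outer merge with indicator=True adds a '_merge' column recording
# whether each row was found in the left frame, the right frame, or both.
check = pop.merge(hoods, on="FIPS", how="outer", indicator=True)

# Rows that an inner join would have discarded.
unmatched = check[check["_merge"] != "both"]
print(unmatched[["FIPS", "_merge"]])
```

If `unmatched` is empty, the inner join loses nothing; otherwise the listed IDs point to the tracts worth inspecting.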

### Bar Graph of Youth in San Francisco by Neighborhood

In [73]:
#test plot
fig2 = px.bar(ag_merged,
              x='neighborhoods',
              y='Total Youth pop',
              title='Number of Youth Ages 5 to 17 by Neighborhood',
              barmode='group',
              color_discrete_sequence=['lightseagreen'],
              height=700
              )
fig2

### Cleaning up more data to reflect youth population by neighborhood across the city

In [74]:
# summing up youth population by neighborhood
ag_merged['neighborhoods'] = ag_merged['neighborhoods'].astype(str)
ag_merged['Total Youth pop'] = pd.to_numeric(ag_merged['Total Youth pop'], errors = 'coerce')
ag_merged['Total tract pop'] = pd.to_numeric(ag_merged['Total tract pop'], errors = 'coerce')

# group by neighborhood and sum the youth and total populations in one pass
neighborhood_sums = ag_merged.groupby('neighborhoods', as_index=False)[['Total Youth pop', 'Total tract pop']].sum()

# rename columns
neighborhood_sums.columns = ['neighborhoods', 'youth population', 'neighborhood population']

neighborhood_sums.head()

|   | neighborhoods | youth population | neighborhood population |
|---|---|---|---|
| 0 | Bayview Hunters Point | 6473.613 | 40495.0 |
| 1 | Bernal Heights | 2897.052 | 24767.0 |
| 2 | Castro/Upper Market | 1604.797 | 22623.0 |
| 3 | Chinatown | 1038.35 | 13693.0 |
| 4 | Excelsior | 5119.668 | 38846.0 |


In [75]:
total_citypop = neighborhood_sums['neighborhood population'].sum()
print('Total SF population:', total_citypop)

Total SF population: 851036.0


In [76]:
# creating another column on the dataframe
neighborhood_sums['youth percentage'] = (
    neighborhood_sums['youth population'] / total_citypop
) * 100
neighborhood_sums.head()

|   | neighborhoods | youth population | neighborhood population | youth percentage |
|---|---|---|---|---|
| 0 | Bayview Hunters Point | 6473.613 | 40495.0 | 0.760674 |
| 1 | Bernal Heights | 2897.052 | 24767.0 | 0.340415 |
| 2 | Castro/Upper Market | 1604.797 | 22623.0 | 0.18857 |
| 3 | Chinatown | 1038.35 | 13693.0 | 0.12201 |
| 4 | Excelsior | 5119.668 | 38846.0 | 0.601581 |
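
The `youth percentage` column divides each neighborhood's youth count by the citywide population, so it measures each neighborhood's contribution to the city rather than how young the neighborhood itself is. A sketch comparing both measures on the five rows shown above; the column names `share_of_city` and `share_of_neighborhood` are hypothetical and not part of the notebook:

```python
import pandas as pd

# The five rows printed by neighborhood_sums.head() above.
sums = pd.DataFrame({
    "neighborhoods": ["Bayview Hunters Point", "Bernal Heights",
                      "Castro/Upper Market", "Chinatown", "Excelsior"],
    "youth population": [6473.613, 2897.052, 1604.797, 1038.35, 5119.668],
    "neighborhood population": [40495.0, 24767.0, 22623.0, 13693.0, 38846.0],
})
total_citypop = 851036.0  # citywide total printed in cell 75

# Measure used in the notebook: youth as a percent of the whole city.
sums["share_of_city"] = sums["youth population"] / total_citypop * 100

# Alternative: youth as a percent of each neighborhood's own population.
sums["share_of_neighborhood"] = (
    sums["youth population"] / sums["neighborhood population"] * 100
)

print(sums.round(3))
```

Large neighborhoods rank high on the first measure partly because they hold more people overall; the second measure controls for neighborhood size.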


In [77]:
neighbor_sums = px.bar(neighborhood_sums,
              x='neighborhoods',
              y='youth percentage',
              title='Percent of youth per neighborhood in San Francisco',
              barmode='group',
              color_discrete_sequence=['lightseagreen'],
              height=700
              )
neighbor_sums

This graph more accurately reflects the concentration of youth in the city by neighborhood. Measured as a share of the city's total population, the Sunset/Parkside, Bayview Hunters Point, and Excelsior neighborhoods have the most children. This helps inform us as to where 

## Interactive Choropleth Map 

In [78]:
# merging neighborhood_sums and ag_merged
ag_merged = ag_merged.merge(
    neighborhood_sums[['neighborhoods', 'youth percentage']],
    on='neighborhoods',
    how='left'
)

### Cleaning school data to add to the Choropleth

In [79]:
# school location data
schools = gpd.read_file('data/Filtered_Schools.csv')
schools = schools[schools['CCSF Entity'].str.contains('SFUSD')]
schools = schools[schools['Grade Range'].str.contains('9-12')]
schools = schools[['Campus Name', 'Campus Address', 'Location 1']]
schools = schools.drop(index=38)
schools.head(10)

|   | Campus Name | Campus Address | Location 1 |
|---|---|---|---|
| 11 | Marshall, Thurgood Marshall High School | 45 CONKLING ST, San Francisco, CA 94124 | CA\n(37.736309, -122.401649) |
| 90 | Burton, Phillip And Sala Burton High School | 400 MANSELL ST, San Francisco, CA 94134 | CA\n(37.721546, -122.406555) |
| 92 | Washington, George Washington High School | 600 32ND AVE, San Francisco, CA 94121 | CA\n(37.777905, -122.491013) |
| 135 | Lincoln, Abraham Lincoln High School | 2162 24TH AVE, San Francisco, CA 94116 | CA\n(37.746594, -122.48024) |
| 174 | Life Learning Academy Charter School | 651 8TH TI ST, SAN FRANCISCO, CA 94130 | CA\n(37.825512, -122.367996) |
| 194 | Gateway High School / Kipp Sf Bay Academy | 1430 SCOTT ST, San Francisco, CA 94115 | CA\n(37.783264, -122.436691) |
| 195 | Galileo High School | 1150 FRANCISCO ST, San Francisco, CA 94109 | CA\n(37.803791, -122.424149) |
| 219 | Balboa High School | 1000 CAYUGA AVE, San Francisco, CA 94112 | CA\n(37.721142, -122.441399) |
| 284 | City Arts And Tech High School | 325 LA GRANDE AVE, San Francisco, CA 94112 | CA\n(37.718784, -122.424667) |
| 286 | Independence High School | 1350 07TH AVE, San Francisco, CA 94122 | CA\n(37.763226, -122.463585) |


In [80]:
schools[['latitude', 'longitude']] = schools['Location 1'].str.extract(r'\(([^,]+), ([^,]+)\)').astype(float)
schools.head()

|   | Campus Name | Campus Address | Location 1 | latitude | longitude |
|---|---|---|---|---|---|
| 11 | Marshall, Thurgood Marshall High School | 45 CONKLING ST, San Francisco, CA 94124 | CA\n(37.736309, -122.401649) | 37.736309 | -122.401649 |
| 90 | Burton, Phillip And Sala Burton High School | 400 MANSELL ST, San Francisco, CA 94134 | CA\n(37.721546, -122.406555) | 37.721546 | -122.406555 |
| 92 | Washington, George Washington High School | 600 32ND AVE, San Francisco, CA 94121 | CA\n(37.777905, -122.491013) | 37.777905 | -122.491013 |
| 135 | Lincoln, Abraham Lincoln High School | 2162 24TH AVE, San Francisco, CA 94116 | CA\n(37.746594, -122.48024) | 37.746594 | -122.48024 |
| 174 | Life Learning Academy Charter School | 651 8TH TI ST, SAN FRANCISCO, CA 94130 | CA\n(37.825512, -122.367996) | 37.825512 | -122.367996 |
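
The extraction in cell 80 relies on the coordinates appearing inside parentheses as `(lat, lon)`. A slightly stricter sketch, assuming that same layout, uses named groups so the output columns are labeled directly and any row without a decimal coordinate pair becomes `NaN`. The third row below is a hypothetical value added to show that case.

```python
import pandas as pd

loc = pd.Series([
    "CA\n(37.736309, -122.401649)",
    "CA\n(37.721546, -122.406555)",
    "CA",  # hypothetical row with no coordinates
])

# Named groups become the column names of the extracted DataFrame.
pattern = r"\((?P<latitude>-?\d+\.\d+),\s*(?P<longitude>-?\d+\.\d+)\)"
coords = loc.str.extract(pattern).astype(float)

print(coords)
```

Rows that come back as `NaN` can then be dropped or inspected before any markers are placed on the map.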


In [81]:
m = folium.Map(location=[37.7619, -122.4194], 
               zoom_start = 12,
               tiles='CartoDB positron', 
               attr='CartoDB')  # folium's keyword for tile attribution is `attr`

folium.Choropleth(
                  geo_data=ag_merged, # geo data
                  data=ag_merged, # data          
                  key_on='feature.properties.FIPS', # key, or merge column
                  columns=['FIPS', 'youth percentage'], # [key, value]
                  fill_color='YlGnBu',
                  line_weight=0.1, 
                  fill_opacity=0.8,
                  line_opacity=0.5, # line opacity (of the border)
                  legend_name='Population School-Age Youth').add_to(m) # name on the legend color bar



<folium.features.Choropleth at 0x7904486bc850>

In [82]:
style_function = lambda x: {'fillColor': '#ffffff', 
                            'color':'#000000', 
                            'fillOpacity': 0.1, 
                            'weight': 0.1}
highlight_function = lambda x: {'fillColor': '#000000', 
                                'color':'#000000', 
                                'fillOpacity': 0.50, 
                                'weight': 0.1}
TRACTS = folium.features.GeoJson(
    ag_merged,
    style_function=style_function, 
    control=False,
    highlight_function=highlight_function, 
    tooltip=folium.features.GeoJsonTooltip(
        fields=['FIPS','neighborhoods','youth percentage'],
        aliases=['FIPS Code: ','Neighborhood: ','School-age youth population in %: '],
        style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;") 
    )
)
m.add_child(TRACTS)
m.keep_in_front(TRACTS)
folium.LayerControl().add_to(m)

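for index, row in schools.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=row['Campus Name'],
        # 'school' is a Font Awesome icon name, so set prefix='fa'
        # (folium's default prefix is 'glyphicon', which has no 'school' icon)
        icon=folium.Icon(color='blue', icon='school', prefix='fa')
    ).add_to(m)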

m

Where are all the children located in San Francisco? Are schools located in neighborhoods with higher concentrations of children? 

The map shows that most children are located in the Sunset/Parkside neighborhood on the west and in the Bayview Hunters Point district on the east. Other neighborhoods with many children include the Mission, Excelsior, Outer Richmond, and West of Twin Peaks.

Despite the large concentration of children in the Sunset/Parkside, there are only two high schools on the west side of the city. In contrast, there is a cluster of schools in the Mission and in the center of San Francisco (Western Addition/USF), even though those areas do not have a large concentration of children. With an open enrollment system in which any student can enroll anywhere in the city, some school commutes are bound to be messy, but the map shows where most student commutes begin.

This map still has limitations. Although it shows every high school available for open enrollment within SFUSD, it does not account for differences in school size. Enrollment capacity varies widely (for example, 1,800 students at one school compared to 100 at another), and that variation is not reflected in this visualization.
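
One possible way to address this limitation would be to size each marker by enrollment. The sketch below only computes a marker radius. The enrollment figures are the illustrative 1,800 and 100 from the paragraph above, and `radius_for` with its parameters is a hypothetical helper, not part of the notebook. Square-root scaling is used so that marker area, rather than radius, is proportional to enrollment.

```python
import math

def radius_for(enrollment, max_enrollment, max_radius=20.0):
    """Radius in pixels such that circle area is proportional to enrollment."""
    return max_radius * math.sqrt(enrollment / max_enrollment)

# Illustrative enrollments only; real figures would come from SFUSD data.
big = radius_for(1800, 1800)
small = radius_for(100, 1800)

print(round(big, 2), round(small, 2))
```

The resulting value could be passed as the `radius` argument of `folium.CircleMarker` in place of the fixed-size `folium.Marker` icons.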