# **Updated population density of the LA city neighborhood councils.**

### This notebook evaluates the population, area (in square miles), and population density of the 99 neighborhood councils (NCs).

**The population of each neighborhood council is calculated using the [2020 census tracts](https://www2.census.gov/geo/tiger/TIGER2020PL/STATE/06_CALIFORNIA/06037/), the [updated NC shapefile](https://geohub.lacity.org/datasets/9c8639737e3a457a8c0f6a93f9c36974_18/about) (99 councils), and the American Community Survey (ACS) [2020 demographics data](https://api.census.gov/data/2020/acs/acs5/profile?get=group(DP05)&for=tract:*&in=state:06&in=county:037&key=cb51343974c4b01dc140a03af63a82c6424272ee).**


**Note: The TIGER zip file that was downloaded is tl_2020_06037_tract20.zip.
Save the ACS 2020 demographics data from this [link](https://api.census.gov/data/2020/acs/acs5/profile?get=group(DP05)&for=tract:*&in=state:06&in=county:037&key=cb51343974c4b01dc140a03af63a82c6424272ee) as a CSV file and clean it (remove the empty spaces, closing brackets, and quotes). Use this [resource](https://api.census.gov/data/2020/acs/acs5/profile/variables) to understand the variables in the ACS file. I found this [video](https://www.census.gov/library/video/2020/using-api-all-results-for-acs-table.html) very useful.**
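
The manual cleanup described in the note can also be scripted. The sketch below assumes the API response is a JSON array of rows with the header in the first row. The `sample_response` string and the `acs_json_to_csv` helper are illustrative stand-ins, the NAME strings are made up, and `DP05_0001E` is assumed here to be the total population estimate (check it against the variables resource).

```python
import csv
import io
import json

# Illustrative sample shaped like a Census API response: a JSON array of rows,
# header first. The NAME strings are made up for this sketch.
sample_response = '''[["GEO_ID","NAME","DP05_0001E"],
["1400000US06037101110","Census Tract 1011.10, Los Angeles County, California","3923"],
["1400000US06037101122","Census Tract 1011.22, Los Angeles County, California","4119"]]'''

def acs_json_to_csv(text):
    """Parse the JSON rows and return them as CSV text (hypothetical helper)."""
    rows = json.loads(text)            # the JSON parser deals with brackets and quotes
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows) # fields containing commas are quoted automatically
    return buffer.getvalue()

csv_text = acs_json_to_csv(sample_response)

# Read the CSV text back to confirm it round-trips cleanly.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed[0])
print(parsed[1][0], parsed[1][2])
```

For a real download, the response text would replace `sample_response` and `csv_text` would be written to a file such as the `ACS_census_tract_LA.csv` used in this notebook.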


In [1]:
# Importing the necessary packages.
import os
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt 
import folium
from shapely.geometry import Point, Polygon 
from pyproj import Geod
import webbrowser

In [2]:
os.getcwd()
os.chdir('C:/Users/AdithiPriya/Desktop/Hack for LA/Geospatial analysis')

# Census 2020 data.
census_2020 = gpd.read_file('C:/Users/AdithiPriya/Desktop/Hack for LA/Geospatial analysis/tl_2020_06037_tract20/tl_2020_06037_tract20.shp')

# LA NC shape file.
la_nc = gpd.read_file('C:/Users/AdithiPriya/Desktop/Hack for LA/Geospatial analysis/Neighborhood_Councils_(Certified)/Neighborhood_Councils_(Certified).shp')

# ACS demographics data.
acs_demo = pd.read_csv('ACS_census_tract_LA.csv')
acs_demo['GEO_ID']= acs_demo['GEO_ID'].str.replace('1400000US','')

# rename GEO_ID as GEOID20.

acs_demo= acs_demo.rename(columns={'GEO_ID':'GEOID20', 'Total population': 'population'})

### Defining a [function](https://github.com/hackforla/access-the-data-workshop-311-analysis/blob/main/notebooks/NC-population-density.ipynb) to compute the area in square miles (area (sq_miles)).

In [3]:
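geod = Geod(ellps='WGS84')

def area_sq_miles(geo):
    # Geodesic area on the WGS84 ellipsoid; expects longitude/latitude coordinates.
    # The sign depends on the ring orientation, hence abs().
    area_sq_meters = abs(geod.geometry_area_perimeter(geo)[0])
    # 1 square meter = 3.86102e-7 square miles.
    return area_sq_meters * 3.86102e-7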
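
The constant `3.86102e-7` is the number of square miles in one square meter. A quick arithmetic check, assuming the international mile of exactly 1609.344 meters (the variable names are only for illustration):

```python
METERS_PER_MILE = 1609.344                      # international mile, by definition
SQ_METERS_PER_SQ_MILE = METERS_PER_MILE ** 2    # about 2.59 million square meters
SQ_MILES_PER_SQ_METER = 1 / SQ_METERS_PER_SQ_MILE

print(f"{SQ_MILES_PER_SQ_METER:.5e}")

# Relative difference between the rounded constant used in area_sq_miles
# and the value derived above.
relative_error = abs(3.86102e-7 - SQ_MILES_PER_SQ_METER) / SQ_MILES_PER_SQ_METER
print(relative_error < 1e-6)
```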

In [4]:
# Taking a subset of the acs_demo dataframe.

acs_demo_subset=acs_demo[['GEOID20','population' ]]

# Making sure that the crs- coordinate reference system for census_2020 is the same as that of the la_nc using the to_crs()
# method. This will allow the spatial merging of both the geopandas dataframes.

census_2020 = census_2020.to_crs(la_nc.crs)

# Spatial overlap of the la_nc and census_2020 data.
census_NC = gpd.overlay(census_2020, la_nc, how='intersection')

census_NC= census_NC[['TRACTCE20','GEOID20', 'NAME', 'NC_ID', 'INTPTLAT20', 'INTPTLON20', 'geometry']]
print('Number of rows in census_NC: ', census_NC.shape[0])  

Number of rows in census_NC:  2108


In [5]:
# Merging the census_NC and acs_demo_subset.
census = pd.merge(census_NC, acs_demo_subset, on='GEOID20')

### Displaying the duplicate entries: census tracts intersecting more than one neighborhood council

In [6]:
df_duplicate= census_NC[census_NC.GEOID20.duplicated(keep=False)]
df_duplicate.sort_values(by=['GEOID20']).head(15)

Unnamed: 0,TRACTCE20,GEOID20,NAME,NC_ID,INTPTLAT20,INTPTLON20,geometry
1,101122,6037101122,SUNLAND-TUJUNGA NC,10,34.2677213,-118.2901465,"POLYGON ((-118.30334 34.27371, -118.30330 34.2..."
17,101122,6037101122,FOOTHILL TRAILS DISTRICT NC,9,34.2677213,-118.2901465,"POLYGON ((-118.29785 34.27778, -118.29783 34.2..."
5,101300,6037101300,SUNLAND-TUJUNGA NC,10,34.2487777,-118.270999,"POLYGON ((-118.27822 34.25068, -118.27822 34.2..."
18,101300,6037101300,FOOTHILL TRAILS DISTRICT NC,9,34.2487777,-118.270999,"POLYGON ((-118.26682 34.23124, -118.26695 34.2..."
6,101400,6037101400,SUNLAND-TUJUNGA NC,10,34.2428521,-118.2941612,"POLYGON ((-118.32227 34.24961, -118.32212 34.2..."
19,101400,6037101400,FOOTHILL TRAILS DISTRICT NC,9,34.2428521,-118.2941612,"POLYGON ((-118.32238 34.24963, -118.32227 34.2..."
42,102103,6037102103,SUN VALLEY AREA NC,8,34.2250792,-118.354188,"POLYGON ((-118.36533 34.22870, -118.36396 34.2..."
20,102103,6037102103,FOOTHILL TRAILS DISTRICT NC,9,34.2250792,-118.354188,"POLYGON ((-118.35739 34.22856, -118.35546 34.2..."
43,102104,6037102104,SUN VALLEY AREA NC,8,34.2161873,-118.3453981,"POLYGON ((-118.35620 34.21971, -118.35594 34.2..."
21,102104,6037102104,FOOTHILL TRAILS DISTRICT NC,9,34.2161873,-118.3453981,"MULTIPOLYGON (((-118.34413 34.21387, -118.3441..."
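
`duplicated(keep=False)` flags every row whose key appears more than once, rather than only the repeats after the first. A small sketch with made-up tract IDs (the `toy` frame is hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "GEOID20": ["A", "A", "B", "C", "C"],
    "NC_ID": [10, 9, 10, 8, 9],
})

# keep=False marks all rows of a repeated key;
# the default keep="first" leaves the first occurrence unmarked.
all_dupes = toy[toy["GEOID20"].duplicated(keep=False)]
repeats_only = toy[toy["GEOID20"].duplicated()]

print(len(all_dupes), len(repeats_only))
```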


### Here is a method to handle cases where a census tract intersects more than one NC.

- Find the area of each entry (tract/NC intersection) in the census dataframe.
- Group the census dataframe by 'TRACTCE20' and sum the areas; this gives the total area of each census tract that falls inside the NCs.
- Next, find the fraction of that area that each tract contributes to each intersecting NC.
- Multiply the tract population by this fraction to estimate the population of each tract/NC piece.
- Then find the total population of each NC by grouping the pieces by NC and summing.
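
The steps above can be sketched on a toy table. The two rows reuse the areas and population that this notebook reports for tract 101122; the `pieces` frame and the `area_share` and `pop_share` column names are illustrative and not the notebook's own.

```python
import pandas as pd

# One census tract split across two NCs (one row per tract/NC intersection).
pieces = pd.DataFrame({
    "TRACTCE20": ["101122", "101122"],
    "NAME": ["SUNLAND-TUJUNGA NC", "FOOTHILL TRAILS DISTRICT NC"],
    "population": [4119, 4119],              # tract population, repeated on each row
    "area(sq_miles)": [1.020872, 0.000038],
})

# Total intersected area per tract, broadcast back onto every row.
tract_total = pieces.groupby("TRACTCE20")["area(sq_miles)"].transform("sum")

# Share of the tract's area in each NC, then the population apportioned by that share.
pieces["area_share"] = pieces["area(sq_miles)"] / tract_total
pieces["pop_share"] = pieces["population"] * pieces["area_share"]

# Population per NC is the sum of the apportioned pieces.
nc_population = pieces.groupby("NAME")["pop_share"].sum()
print(nc_population.round(2))
```

`transform("sum")` keeps the original row count, so it stands in for the separate group-then-merge step used in the cells that follow; both routes should give the same shares.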

In [7]:
# Add the area column.
census = pd.merge(census_NC, acs_demo_subset, on='GEOID20')
census['area(sq_miles)'] = census.geometry.apply(area_sq_miles)
# Only the last expression in a cell is displayed, so show the full frame once.
census.head(20)

Unnamed: 0,TRACTCE20,GEOID20,NAME,NC_ID,INTPTLAT20,INTPTLON20,geometry,population,area(sq_miles)
0,101110,6037101110,SUNLAND-TUJUNGA NC,10,34.2594737,-118.2929869,"POLYGON ((-118.30229 34.25870, -118.30091 34.2...",3923,0.441083
1,101122,6037101122,SUNLAND-TUJUNGA NC,10,34.2677213,-118.2901465,"POLYGON ((-118.30334 34.27371, -118.30330 34.2...",4119,1.020872
2,101122,6037101122,FOOTHILL TRAILS DISTRICT NC,9,34.2677213,-118.2901465,"POLYGON ((-118.29785 34.27778, -118.29783 34.2...",4119,3.8e-05
3,101220,6037101220,SUNLAND-TUJUNGA NC,10,34.2516083,-118.2816328,"POLYGON ((-118.28592 34.25227, -118.28592 34.2...",3775,0.269841
4,101221,6037101221,SUNLAND-TUJUNGA NC,10,34.254329,-118.2925767,"POLYGON ((-118.29945 34.25598, -118.29792 34.2...",3787,0.136748
5,101222,6037101222,SUNLAND-TUJUNGA NC,10,34.2513519,-118.2885261,"POLYGON ((-118.29434 34.25233, -118.29318 34.2...",2717,0.114484
6,101300,6037101300,SUNLAND-TUJUNGA NC,10,34.2487777,-118.270999,"POLYGON ((-118.27822 34.25068, -118.27822 34.2...",3741,0.993003
7,101300,6037101300,FOOTHILL TRAILS DISTRICT NC,9,34.2487777,-118.270999,"POLYGON ((-118.26682 34.23124, -118.26695 34.2...",3741,0.002387
8,101400,6037101400,SUNLAND-TUJUNGA NC,10,34.2428521,-118.2941612,"POLYGON ((-118.32227 34.24961, -118.32212 34.2...",3246,2.414663
9,101400,6037101400,FOOTHILL TRAILS DISTRICT NC,9,34.2428521,-118.2941612,"POLYGON ((-118.32238 34.24963, -118.32227 34.2...",3246,0.021664


In [8]:
# Total intersected area of each census tract, summed over all the NCs it overlaps.
census_test = census.groupby('TRACTCE20', as_index=False)['area(sq_miles)'].sum()
census_test = census_test.rename(columns={'area(sq_miles)': 'total_area(sq_miles)'})

In [9]:
# Let us add the total_area to census dataframe.
census_perc= pd.merge(census, census_test, on='TRACTCE20')
census_perc.head(10)

Unnamed: 0,TRACTCE20,GEOID20,NAME,NC_ID,INTPTLAT20,INTPTLON20,geometry,population,area(sq_miles),total_area(sq_miles)
0,101110,6037101110,SUNLAND-TUJUNGA NC,10,34.2594737,-118.2929869,"POLYGON ((-118.30229 34.25870, -118.30091 34.2...",3923,0.441083,0.441083
1,101122,6037101122,SUNLAND-TUJUNGA NC,10,34.2677213,-118.2901465,"POLYGON ((-118.30334 34.27371, -118.30330 34.2...",4119,1.020872,1.02091
2,101122,6037101122,FOOTHILL TRAILS DISTRICT NC,9,34.2677213,-118.2901465,"POLYGON ((-118.29785 34.27778, -118.29783 34.2...",4119,3.8e-05,1.02091
3,101220,6037101220,SUNLAND-TUJUNGA NC,10,34.2516083,-118.2816328,"POLYGON ((-118.28592 34.25227, -118.28592 34.2...",3775,0.269841,0.269841
4,101221,6037101221,SUNLAND-TUJUNGA NC,10,34.254329,-118.2925767,"POLYGON ((-118.29945 34.25598, -118.29792 34.2...",3787,0.136748,0.136748
5,101222,6037101222,SUNLAND-TUJUNGA NC,10,34.2513519,-118.2885261,"POLYGON ((-118.29434 34.25233, -118.29318 34.2...",2717,0.114484,0.114484
6,101300,6037101300,SUNLAND-TUJUNGA NC,10,34.2487777,-118.270999,"POLYGON ((-118.27822 34.25068, -118.27822 34.2...",3741,0.993003,0.99539
7,101300,6037101300,FOOTHILL TRAILS DISTRICT NC,9,34.2487777,-118.270999,"POLYGON ((-118.26682 34.23124, -118.26695 34.2...",3741,0.002387,0.99539
8,101400,6037101400,SUNLAND-TUJUNGA NC,10,34.2428521,-118.2941612,"POLYGON ((-118.32227 34.24961, -118.32212 34.2...",3246,2.414663,2.436327
9,101400,6037101400,FOOTHILL TRAILS DISTRICT NC,9,34.2428521,-118.2941612,"POLYGON ((-118.32238 34.24963, -118.32227 34.2...",3246,0.021664,2.436327


In [10]:
# Fraction of each tract's intersected area that falls in each NC (area_perc, between 0 and 1):
census_perc['area_perc'] = census_perc['area(sq_miles)']/census_perc['total_area(sq_miles)']

# Apportioning the tract population to each tract/NC piece by its area fraction.
census_perc['total_population'] = census_perc['population']*census_perc['area_perc']

In [11]:
# Summing up the population of each NC.
census_pop =census_perc.groupby('NAME', as_index= False).agg({'NC_ID':'first',  'NAME':'first', 'total_population' : sum})
census_pop.head(15)

Unnamed: 0,NC_ID,NAME,total_population
0,6,ARLETA NC,37550.428577
1,42,ARROYO SECO NC,20635.891874
2,46,ARTS DISTRICT LITTLE TOKYO NC,5696.208864
3,37,ATWATER VILLAGE NC,29735.923325
4,64,BEL AIR-BEVERLY CREST NC,40598.581678
5,50,BOYLE HEIGHTS NC,102449.248792
6,13,CANOGA PARK NC,55027.703115
7,110,CENTRAL ALAMEDA NC,44115.098783
8,32,CENTRAL HOLLYWOOD NC,24871.036211
9,95,CENTRAL SAN PEDRO NC,30356.711594


In [12]:
census_pop.total_population.sum()

4534712.0

### Now that we have the population of each NC, let us move on to getting the area and finally the updated population density of the neighborhood councils. 

In [13]:
# Grouping the original census dataframe by NAME and then summing up the total area- this gives the area of each neighborhood 
# council. Very important note here- groupby function works when trying to aggregate dataframes but for spatial data, we can
# aggregate the geometry features using dissolve function.

census_area = census.dissolve(by= 'NAME', as_index= False, aggfunc=({'NC_ID':'first','area(sq_miles)' : sum }))

# Gathering all the columns of interest. 
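# Merge on NAME rather than joining on the row index, so rows cannot be misaligned.
census_final = census_area.merge(census_pop[['NAME', 'total_population']], on='NAME')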
census_final['pop_density']= census_final['total_population']/census_final['area(sq_miles)']

# Rearranging the columns.
census_final = census_final[['NAME', 'geometry', 'NC_ID', 'total_population', 'area(sq_miles)', 'pop_density']]
census_final.head(15)

Unnamed: 0,NAME,geometry,NC_ID,total_population,area(sq_miles),pop_density
0,ARLETA NC,"POLYGON ((-118.41010 34.23309, -118.41034 34.2...",6,37550.428577,3.284868,11431.335753
1,ARROYO SECO NC,"POLYGON ((-118.18576 34.09293, -118.18576 34.0...",42,20635.891874,3.063327,6736.43201
2,ARTS DISTRICT LITTLE TOKYO NC,"POLYGON ((-118.22877 34.04155, -118.22827 34.0...",46,5696.208864,0.879216,6478.740126
3,ATWATER VILLAGE NC,"POLYGON ((-118.25399 34.10816, -118.25424 34.1...",37,29735.923325,8.74845,3398.993417
4,BEL AIR-BEVERLY CREST NC,"POLYGON ((-118.46573 34.07325, -118.46581 34.0...",64,40598.581678,17.038756,2382.719852
5,BOYLE HEIGHTS NC,"POLYGON ((-118.20504 34.01263, -118.20504 34.0...",50,102449.248792,5.735881,17861.119752
6,CANOGA PARK NC,"POLYGON ((-118.58846 34.19524, -118.58846 34.1...",13,55027.703115,3.689892,14913.093303
7,CENTRAL ALAMEDA NC,"POLYGON ((-118.23777 33.98933, -118.23777 33.9...",110,44115.098783,1.358014,32485.008067
8,CENTRAL HOLLYWOOD NC,"POLYGON ((-118.32445 34.08712, -118.32445 34.0...",32,24871.036211,1.229127,20234.724357
9,CENTRAL SAN PEDRO NC,"POLYGON ((-118.28794 33.73151, -118.28795 33.7...",95,30356.711594,2.438025,12451.356236


### I am going to go ahead and plot census_final using the geopandas explore method (folium).

In [14]:
m= census_final.explore(
    column= 'NAME', # make choropleth based on 'NC name' column
    name='NC Regions', 
    tooltip='NAME', # show 'NC name' value in tooltip (on hover)
    color="red", # use red color on all points
    popup=True, # show all values in popup (on click)
    tiles="openstreetmap", # use "openstreetmap" tiles
    cmap="Set1", # use "Set1" matplotlib colormap
    style_kwds=dict(color="black"), # use black outline
    legend=False
     )
folium.TileLayer('Stamen Terrain').add_to(m)
folium.TileLayer('Stamen Toner').add_to(m)
folium.TileLayer('Stamen Water Color').add_to(m)
folium.TileLayer('cartodbpositron').add_to(m)
folium.TileLayer('cartodbdark_matter').add_to(m)
folium.LayerControl().add_to(m)
m.save('census_final.html')
webbrowser.open('census_final.html')

True

In [15]:
census_final.total_population.sum()

4534712.0

### According to the [Census Bureau](https://www.census.gov/quickfacts/losangelescitycalifornia?), the estimated total population of the city of Los Angeles, based on the census 2020 data, is 3,849,297.

### The sum of the populations of all NCs computed at the tract level (4,534,712) is inflated relative to this figure. I am going to explore this and add some filters, on area and population, to account for the inflated value.
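
One thing worth checking, judging from the steps in this notebook: the area fractions of each tract are normalized by the tract's area inside the NCs, so they always sum to 1 and every intersecting tract hands over its entire population, even when most of the tract lies outside the NC boundaries. The sketch below uses made-up numbers to contrast that with dividing by the full tract area. The `clipped_area` and `full_area` columns and the `apportion` helper are hypothetical; in practice the full area would have to come from the unclipped tract geometry.

```python
import pandas as pd

# Hypothetical tract straddling the boundary: 0.5 of its 2.0 sq miles is inside an NC.
pieces = pd.DataFrame({
    "TRACTCE20": ["999999"],
    "NAME": ["EXAMPLE NC"],
    "population": [4000],
    "clipped_area": [0.5],   # area of the tract/NC intersection
    "full_area": [2.0],      # area of the whole tract before the overlay
})

def apportion(df, denominator):
    """Population assigned to each piece, using the given per-row denominator."""
    return df["population"] * df["clipped_area"] / denominator

# Denominator = intersected area summed per tract (the approach used in this notebook).
inside_total = pieces.groupby("TRACTCE20")["clipped_area"].transform("sum")
pop_clipped = apportion(pieces, inside_total)

# Denominator = full tract area (assumes people are spread evenly over the tract).
pop_full = apportion(pieces, pieces["full_area"])

print(pop_clipped.iloc[0], pop_full.iloc[0])
```

Area weighting still assumes residents are spread evenly across a tract, so this would be a first-order correction rather than an exact one.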