# Group Assignment #3: A Clean Notebook

## Compilation of Work

We decided to shift to block group data for further analysis, as it will help us perform better autocorrelation studies. For this purpose, we downloaded data for two years, 2019 and 2013. Block group level data for our variables is not available for an earlier baseline year, hence we shift to 2013, which is still before the projects were completed and hence within the time frame.  
In this notebook we will:
1. Import 2019 data and narrow it down to relevant variables.
2. Import 2013 data, join it to the block group shapefile to obtain a GeoDataFrame, and narrow it down to relevant variables.
3. Create percentage for different ethnicity groups for each year.
4. Create a percentage change for various ethnicity groups from 2013 to 2019. 

We only need the pandas and geopandas libraries for all these tasks.

In [2]:
# for general data wrangling tasks
import pandas as pd

# to read and visualize spatial data
import geopandas as gpd



## Import 2019 data and narrow it down to relevant variables.

The data is available from census explorer in GeoJSON format, which saves considerable time. First, let us check our data.

In [1]:
b2019 = gpd.read_file("acs2019_5yr_B03002_15000US060014094001.geojson")
b2019.head()


In [None]:
# drop the first row, which holds the total for the whole area rather than a block group
b2019 = b2019.drop([0])
b2019.head()
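
Dropping by position assumes the total row always comes first. A minimal sketch of a more defensive alternative, assuming block group rows carry the `15000US` prefix in `geoid` (as the file name suggests); the small `demo` frame, its values, and the summary row's `geoid` are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for b2019: one summary row followed by block groups.
demo = pd.DataFrame({
    "geoid": ["05000US06001", "15000US060014001001", "15000US060014002001"],
    "B03002001": [100.0, 40.0, 60.0],
})

# Keep only rows whose geoid starts with the block group prefix.
is_block_group = demo["geoid"].str.startswith("15000US")
block_groups = demo[is_block_group].reset_index(drop=True)

print(block_groups)
```

Filtering on the identifier keeps working if the row order of a future download changes.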

In [None]:
list(b2019)

In [None]:
columns_to_keep = ['geoid',
                   'name',
                   'B03002001',
                   'B03002002',
                   'B03002003',
                   'B03002004',
                   'B03002005',
                   'B03002006',
                   'B03002007',
                   'B03002008',
                   'B03002009',
                   'B03002012',
                   'geometry']
# take an explicit copy so later column assignments do not raise SettingWithCopyWarning
block2019 = b2019[columns_to_keep].copy()
block2019.head()

In [None]:
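# strip the summary-level prefix so the id matches the 12-digit GEOID used in the shapefile
block2019['geoid'] = block2019['geoid'].str.replace('15000US', '', regex=False)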
block2019.head()

In [None]:
block2019.columns = ['GEOID',
 'NAME',
 'Total_2019',
 'Non Hispanic_2019',
 'Non Hispanic White_2019',
 'Non Hispanic Black_2019',
 'Non Hispanic American Indian and Alaska Native_2019',
 'Non Hispanic Asian_2019',
 'Non Hispanic Native Hawaiian and Other Pacific Islander_2019',
 'Non Hispanic Some other race_2019',
 'Non Hispanic Two or more races_2019',
 'Hispanic_2019',
 'geometry']
block2019.head()

## Import 2013 block group data for Alameda County

In [None]:
b2013 = gpd.read_file("ACSDT5Y2013.B03002_data_with_overlays_2022-02-21T160847.csv")
b2013.head()

The total row is not present in the head. So let us check the tail. 

In [None]:
b2013.tail()

In [None]:
# drop the total row found at the end of the table (index 1047)
b2013 = b2013.drop([1047])

In [None]:
b2013.tail()

In [None]:
list(b2013)

In [None]:
columns_to_keep2= ['B03002_001E',
                   'B03002_002E',
                   'B03002_003E',
                   'B03002_004E',
                   'B03002_005E',
                   'B03002_006E',
                   'B03002_007E',
                   'B03002_008E',
                   'B03002_009E',
                   'B03002_012E',
                   'GEO_ID',
                   'NAME',
                   'geometry']

In [None]:
block2013=b2013[columns_to_keep2]

In [None]:
block2013.head()

In [None]:
# drop the first row, which holds the column descriptions rather than data
block2013 = block2013.drop([0])

In [None]:
block2013.columns = [
 'Total_2013',
 'Non Hispanic_2013',
 'Non Hispanic White_2013',
 'Non Hispanic Black_2013',
 'Non Hispanic American Indian and Alaska Native_2013',
 'Non Hispanic Asian_2013',
 'Non Hispanic Native Hawaiian and Other Pacific Islander_2013',
 'Non Hispanic Some other race_2013',
 'Non Hispanic Two or more races_2013',
 'Hispanic_2013',
 'GEOID',
 'NAME',
 'geometry']

In [None]:
block2013.head()

In [None]:
block2013['GEOID'] = block2013['GEOID'].str.replace('1500000US','')
block2013.head()
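
The two sources use different prefixes (`15000US` for 2019, `1500000US` for 2013) in front of the same 12-digit block group code. As a sketch, a single helper could handle both by keeping whatever follows `US`. The function name `strip_geoid_prefix` is hypothetical, and the sample ids are built from the block group code in the 2019 file name:

```python
import pandas as pd


def strip_geoid_prefix(geoids: pd.Series) -> pd.Series:
    """Drop everything up to and including 'US', keeping the bare FIPS code."""
    return geoids.str.split("US").str[-1]


# Illustrative ids only: the same block group written with each prefix.
ids_2019 = pd.Series(["15000US060014094001"])
ids_2013 = pd.Series(["1500000US060014094001"])

clean_2019 = strip_geoid_prefix(ids_2019)
clean_2013 = strip_geoid_prefix(ids_2013)

print(clean_2019.iloc[0], clean_2013.iloc[0])
```

Both series end up with the same 12-character string, which is what a join on `GEOID` needs.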

In [None]:
block2013=block2013[[
 'Total_2013',
 'Non Hispanic_2013',
 'Non Hispanic White_2013',
 'Non Hispanic Black_2013',
 'Non Hispanic American Indian and Alaska Native_2013',
 'Non Hispanic Asian_2013',
 'Non Hispanic Native Hawaiian and Other Pacific Islander_2013',
 'Non Hispanic Some other race_2013',
 'Non Hispanic Two or more races_2013',
 'Hispanic_2013',
 'GEOID',
 'NAME']]

In [None]:
block2013.head()

## Convert the dataframe into a GeoDataFrame

In [None]:
blockshape = gpd.read_file("2013_block/tl_2013_06_bg.shp")

In [None]:
blockshape.head()

In [None]:
blockshape.tail()

In [None]:
blockshape=blockshape[['GEOID', 'geometry']]

In [None]:
blockshape.plot(figsize=(10,10))

In [None]:
block2013v = blockshape.merge(block2013, on='GEOID')

In [None]:
block2013v.head()

In [None]:
block2013v.shape

In [None]:
block2013.shape
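
Comparing the two shapes shows whether rows were lost in the merge, but not which ones. One way to see the unmatched rows, sketched here with plain pandas and made-up ids, is an outer merge with `indicator=True`:

```python
import pandas as pd

# Hypothetical stand-ins for blockshape and block2013 (ids and totals are made up).
shapes = pd.DataFrame({"GEOID": ["060014001001", "060014002001", "060375990001"]})
table = pd.DataFrame({
    "GEOID": ["060014001001", "060014002001", "060019999999"],
    "Total_2013": [40.0, 60.0, 10.0],
})

# The _merge column records whether each row was found on the left, right, or both.
check = shapes.merge(table, on="GEOID", how="outer", indicator=True)
unmatched_table_rows = check[check["_merge"] == "right_only"]

print(check["_merge"].value_counts())
print(unmatched_table_rows)
```

Rows marked `left_only` are expected here, since the shapefile covers more than one county; rows marked `right_only` would be table records with no geometry.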

In [None]:
block2013v.plot(figsize=(10,10))

In [None]:

# for basemaps
import contextily as ctx

# For spatial statistics
import esda
from esda.moran import Moran, Moran_Local

import splot
from splot.esda import moran_scatterplot, plot_moran, lisa_cluster,plot_moran_simulation

import libpysal as lps

# Graphics
import matplotlib.pyplot as plt
import plotly.express as px

## Create % of each ethnicity group within each block group

In [None]:
block2013v.sort_values(by='Total_2013').head(20)

In [None]:
block2013v.head()

In [None]:
random_tract= block2013v.sample(1)

In [None]:
random_tract

In [None]:
type(random_tract.iloc[0]['Non Hispanic_2013'])

In [None]:
dtypes = ['Total_2013', 
 'Non Hispanic_2013',
 'Non Hispanic White_2013',
 'Non Hispanic Black_2013',
 'Non Hispanic American Indian and Alaska Native_2013',
 'Non Hispanic Asian_2013',
 'Non Hispanic Native Hawaiian and Other Pacific Islander_2013',
 'Non Hispanic Some other race_2013',
 'Non Hispanic Two or more races_2013',
 'Hispanic_2013']

In [None]:
# the 2013 counts are stored as text, so convert each count column to float
for i in dtypes:
    block2013v[i] = block2013v[i].astype(float)
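
`astype(float)` raises an error if any cell holds non-numeric text. A hedged alternative is `pd.to_numeric` with `errors="coerce"`, which turns such cells into `NaN` instead. The `"(X)"` placeholder below is a made-up example of a non-numeric value, not something observed in this file:

```python
import pandas as pd

# Hypothetical text column as it might arrive from a CSV.
raw = pd.DataFrame({"Total_2013": ["1200", "0", "(X)"]})

# Unparseable entries become NaN rather than stopping the conversion.
raw["Total_2013"] = pd.to_numeric(raw["Total_2013"], errors="coerce")

print(raw["Total_2013"])
```

Counting the resulting `NaN` values afterwards shows how many cells could not be converted.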

In [None]:
random_tract2 = block2013v.sample(1)

In [None]:
type(random_tract2.iloc[0]['Non Hispanic_2013'])

In [None]:
random_block=block2019.sample(1)

In [None]:
type(random_block.iloc[0]['Non Hispanic_2019'])

In [None]:
block2019.info()

In [None]:
block2013v.info()

In [None]:
def createpercentage(block):
    for x in dtypes:
        block[("Percent " + x)] = block[x]/block['Total_2013']*100
        print(x, "Completed")

In [None]:
createpercentage(block2013v)

In [None]:
block2013v.head()

In [None]:
col2019=list(block2019)

In [None]:
col2019

In [None]:
dtypes19 = ['Total_2019', 
 'Non Hispanic_2019',
 'Non Hispanic White_2019',
 'Non Hispanic Black_2019',
 'Non Hispanic American Indian and Alaska Native_2019',
 'Non Hispanic Asian_2019',
 'Non Hispanic Native Hawaiian and Other Pacific Islander_2019',
 'Non Hispanic Some other race_2019',
 'Non Hispanic Two or more races_2019',
 'Hispanic_2019']

In [None]:
def createpercentage19(block):
    for x in dtypes19:
        block[("Percent " + x)] = block[x]/block['Total_2019']*100
        print(x, "Completed")

In [None]:
createpercentage19(block2019)

In [None]:
block2019.head()
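
The two percentage functions above differ only in the column list and the total column. As a sketch, they could be folded into one function that takes both as arguments. The name `add_percentages` and the sample values are hypothetical, and treating a zero total as missing is an added assumption, not something the original functions do:

```python
import numpy as np
import pandas as pd


def add_percentages(frame, columns, total_column):
    """Add a 'Percent <column>' column for each entry in columns."""
    # Treat a zero total as missing so the share is NaN rather than inf.
    total = frame[total_column].replace(0, np.nan)
    for column in columns:
        frame["Percent " + column] = frame[column] / total * 100
    return frame


# Made-up counts: the second block group has no residents.
demo = pd.DataFrame({"Total_2013": [200.0, 0.0], "Hispanic_2013": [50.0, 0.0]})
add_percentages(demo, ["Hispanic_2013"], "Total_2013")

print(demo)
```

With this shape, the 2013 call would pass `dtypes` and `'Total_2013'`, and the 2019 call would pass `dtypes19` and `'Total_2019'`.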

In [None]:
# reproject to WGS 84 (EPSG:4326), the CRS used by GeoJSON, before joining with the 2019 data
block2013v = block2013v.to_crs(epsg=4326)

In [None]:
blockall=gpd.sjoin(block2019, block2013v)

In [None]:
blockall.info()
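
`gpd.sjoin` uses an `intersects` predicate by default, so a block group also matches every neighbour whose boundary it touches, and the result can hold more than one row per block group. If both years share the same block group identifiers (an assumption worth checking against the row count reported above), an attribute join on `GEOID` is a possible alternative. A minimal sketch with plain pandas and made-up values:

```python
import pandas as pd

# Hypothetical stand-ins for the 2019 and 2013 tables.
left = pd.DataFrame({
    "GEOID": ["060014001001", "060014002001"],
    "Total_2019": [110.0, 90.0],
})
right = pd.DataFrame({
    "GEOID": ["060014001001", "060014002001"],
    "Total_2013": [100.0, 100.0],
})

# validate raises an error if a GEOID appears more than once on either side.
joined = left.merge(right, on="GEOID", validate="one_to_one")

print(joined)
```

With the real data, one geometry column would have to be dropped from one side before merging so that the result keeps a single `geometry`.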

## % change from 2013 to 2019

In [None]:
ball=blockall

In [None]:
for col in list(ball.columns):
    if "2019" in col:
        if "Percent" in col:
            print(col)
        else:
            # e.g. 'Total_2019' -> 'Total'
            group = col.split("_")[0]
            ball[group + "_change"] = (ball[col] - ball[group + "_2013"]) / ball[group + "_2013"] * 100

In [None]:
ball.info()

In [None]:
ball.head()
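
Where a group had no residents in 2013, the division above yields `inf` (or `NaN` when both years are zero). A sketch of one way to make this explicit, assuming a missing value is preferred for undefined changes; `percent_change` is a hypothetical helper and the counts are made up:

```python
import numpy as np
import pandas as pd


def percent_change(new, old):
    """Percentage change from old to new; NaN where the baseline is zero."""
    return (new - old) / old.replace(0, np.nan) * 100


# Made-up counts: the second block group had no Hispanic residents in 2013.
demo = pd.DataFrame({
    "Hispanic_2013": [50.0, 0.0],
    "Hispanic_2019": [75.0, 10.0],
})
demo["Hispanic_change"] = percent_change(demo["Hispanic_2019"], demo["Hispanic_2013"])

print(demo)
```

Marking these cases as missing keeps infinite values out of later maps and summary statistics.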

In [None]:
ball.to_file("blockgroupethnicity.geojson", driver='GeoJSON')