# Merge with census

## Setup Python and R environment

In [None]:
%load_ext rpy2.ipython
%load_ext autoreload
%autoreload 2

%matplotlib inline  
from matplotlib import rcParams
rcParams['figure.figsize'] = (16, 100)

import warnings
from rpy2.rinterface import RRuntimeWarning
warnings.filterwarnings("ignore") # Ignore all warnings
# warnings.filterwarnings("ignore", category=RRuntimeWarning) # Alternative: ignore only R runtime warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, HTML

In [None]:
%%javascript
// Disable auto-scrolling
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

In [None]:
%%R

# My commonly used R imports

require('tidyverse')

## Load data

Load the subway data below, along with its census connector (the `GEOID` column used later for the merge).

In [None]:
%%R
df <- read_csv('data/intermediary/2023_subway_censusgeo.csv')

In [None]:
df = pd.read_csv('data/intermediary/2023_subway_censusgeo.csv')
print(df.shape)
pd.set_option('display.max_columns', None)
df.head()

## 👉 Grab Census Data

1. Load the Census API key

In [None]:
import dotenv

# Load the environment variables
# (loads CENSUS_API_KEY from .env)
dotenv.load_dotenv()
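
Before handing the key to R, it can help to confirm that it actually made it into the environment. The following is a minimal sketch using only the standard library. `require_env` and `mask` are hypothetical helper names (not part of `dotenv` or `tidycensus`), and the sketch assumes `dotenv.load_dotenv()` has already run in this session.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is missing or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key is safe to display."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]


# Usage (uncomment after load_dotenv() has run):
# print(mask(require_env("CENSUS_API_KEY")))
```

Masking the value keeps the key out of saved notebook output while still showing that something was loaded.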


In [None]:
%%R 

require('tidycensus')

# Because it is an environment variable, we don't have to
# explicitly pass this string to R; it is readable here
# in this R cell.
census_api_key(Sys.getenv("CENSUS_API_KEY"))

2. Decide which Census variables you want

    Use <https://censusreporter.org/> to figure out which tables you want. (If Census Reporter is down, check out the code in the cell below.)

    -   Scroll to the bottom of the page to see the tables.
    -   If you already know the table ID, stick that in the "Explore" section to learn more about that table.

    By default, this code loads total population (B01003_001), which we found on Census Reporter here: <https://censusreporter.org/tables/B01003/>

    - Find some other variables that you're also interested in.
    - Don't forget to pick a geography like "tract", "county" or "block group". Here is the list of [all geographies](https://walker-data.com/tidycensus/articles/basic-usage.html#geography-in-tidycensus).


In [None]:
%%R 

# Get variables from the ACS 5-year estimates
nyc_census_data <- get_acs(geography = "tract", 
                      state='NY',
                      county = c("New York", "Kings", "Queens", "Bronx", "Richmond"),
                      variables = c(
                        population="B01003_001", # Total population
                        med_earn="B19013_001", # Median household income in the past 12 months
                        sub_pop='B08301_012', # Workers commuting by subway or elevated rail
                        amb_pop='B18105_001' # Universe total of the ambulatory difficulty table (not the count with a difficulty)
                      ), 
                      year = 2021,
                      survey="acs5",
                      geometry=TRUE)
options(width = 1000)

nyc_census_data

## 👉 Merge it with your data

Hint: `tidycensus` provides data in long format, so you may need to pivot the census data from long to wide before merging it with your data.
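
To make the reshaping concrete, here is a dependency-free Python sketch of what a long-to-wide pivot does. The rows, GEOIDs, and numbers are made up for illustration, and this `pivot_wider` is a hypothetical stand-in written for this example, not the `tidyr` function. It names columns `{variable}_{value}`, matching the `names_glue` pattern in the R cell that follows.

```python
from collections import defaultdict

# Toy rows in the long shape: one row per GEOID and variable (values are invented).
long_rows = [
    {"GEOID": "36061000100", "variable": "population", "estimate": 100, "moe": 10},
    {"GEOID": "36061000100", "variable": "med_earn", "estimate": 50000, "moe": 4000},
    {"GEOID": "36061000200", "variable": "population", "estimate": 250, "moe": 20},
    {"GEOID": "36061000200", "variable": "med_earn", "estimate": 62000, "moe": 5000},
]


def pivot_wider(rows, id_col="GEOID", names_from="variable",
                values_from=("estimate", "moe")):
    """Collapse long rows into one dict per id, with '{variable}_{value}' columns."""
    wide = defaultdict(dict)
    for row in rows:
        for value_col in values_from:
            wide[row[id_col]][f"{row[names_from]}_{value_col}"] = row[value_col]
    return [{id_col: key, **cols} for key, cols in wide.items()]


wide_rows = pivot_wider(long_rows)
for row in wide_rows:
    print(row)
```

After the pivot there is exactly one row per `GEOID`, which is the shape a one-to-one merge on `GEOID` expects.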

In [None]:
%%R

# pivot from long to wide
nyc_census_data <- nyc_census_data %>% 
  pivot_wider(
    names_from = variable, 
    values_from = c(estimate, moe),
    names_glue = "{variable}_{.value}"
  )
options(width = 1000)
nyc_census_data

In [None]:
%%R
df

In [None]:
%%R 

# Keep the first 11 digits of df$GEOID (the tract-level GEOID)
# and store it as numeric
df$GEOID <- as.numeric(substr(df$GEOID, 1, 11))

# tidycensus returns GEOID as character; convert it to numeric
# so both join keys have the same type
nyc_census_data$GEOID <- as.numeric(nyc_census_data$GEOID)
    
# merge nyc_census_data with df on GEOID
df_census <- merge(df, nyc_census_data, by = "GEOID")
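
The truncation above relies on how census GEOIDs are built: 2 digits for the state, 3 for the county, and 6 for the tract (11 in total), with 4 more digits for a block. The sketch below shows the same step in plain Python. `tract_geoid` is a hypothetical helper and the example GEOID is made up. It keeps the key as a string, because a numeric key silently drops a leading zero. That is harmless for New York (state code 36), but it would break for a state such as California (06).

```python
def tract_geoid(geoid) -> str:
    """Return the 11-character tract GEOID from a tract-or-finer GEOID."""
    text = str(geoid).strip()
    if not text.isdigit() or len(text) < 11:
        raise ValueError(f"not a tract-or-finer GEOID: {geoid!r}")
    return text[:11]


# Made-up block GEOID: state 36, county 061, tract 000100, block 1000
block_geoid = "360610001001000"
print(tract_geoid(block_geoid))

# Why strings are safer as join keys: converting to a number loses a leading zero.
california_tract = "06037101110"
print(len(california_tract), len(str(int(california_tract))))
```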

In [None]:
%%R 
write_csv(df_census, "data/intermediary/2023_subway_censusvar.csv")