# Notebook Title

## Setup Python and R environment
You can ignore this section.

In [1]:
%load_ext rpy2.ipython
%load_ext autoreload
%autoreload 2

%matplotlib inline  
from matplotlib import rcParams
rcParams['figure.figsize'] = (16, 100)

import warnings
from rpy2.rinterface import RRuntimeWarning
warnings.filterwarnings("ignore") # Ignore all warnings
# warnings.filterwarnings("ignore", category=RRuntimeWarning) # Alternative: ignore only R runtime warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, HTML

1: Setting LC_COLLATE failed, using "C" 
2: Setting LC_TIME failed, using "C" 
3: Setting LC_MESSAGES failed, using "C" 
4: Setting LC_MONETARY failed, using "C" 


In [2]:
%%javascript
// Disable auto-scrolling
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

<IPython.core.display.Javascript object>

In [3]:
%%R

# My commonly used R imports

require('tidyverse')

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


Loading required package: tidyverse


## Load & Clean Data

👉 Load the data along with the census connectors below (the output of the `connect-to-census.ipynb` notebook) and do any cleanup you'd like to do.

In [6]:
df = pd.read_csv('baltimore-lmop-with-geocodes.csv')
df.head()

Unnamed: 0,GHGRP ID,Landfill ID,Landfill Name,State,Physical Address,City,County,Zip Code,lat,long,...,Actual MW Generation,Rated MW Capacity,LFG Flow to Project (mmscfd),Current Year Emission Reductions (MMTCO2e/yr) - Direct,Current Year Emission Reductions (MMTCO2e/yr) - Avoided,GEOID,STATE,COUNTY,TRACT,BLOCK
0,1007291.0,734,Alpha Ridge SLF,MD,2350 Marriottsville Road,Marriottsville,Howard,21104.0,39.305776,-76.898803,...,0.58,1.059,0.28,0.0294,0.0028,240276030013008,24,27,603001,3008
1,,735,Annapolis SLF,MD,Defense Highway,Annapolis,Anne Arundel,21401.0,38.992,-76.573,...,,,,,,240037516002006,24,3,751600,2006
2,,736,Appeal SLF,MD,,Lusby,Calvert,20657.0,38.381112,-76.438334,...,,,,,,240098610033000,24,9,861003,3000
3,1000331.0,10120,Beulah Municipal Landfill,MD,6815 East New Market Ellwood Road,Hurlock,Dorchester,21643.0,38.6735,-75.8994,...,,,,,,240199702002028,24,19,970200,2028
4,,740,Bowley's Lane LF,MD,Bowley's Lane,Baltimore,Baltimore city,21206.0,39.3138,-76.5444,...,,,,,,245102604022006,24,510,260402,2006


## 👉 Grab Census Data

1. Load the Census API key

In [7]:
import dotenv

# Load the environment variables
# (loads CENSUS_API_KEY from .env)
dotenv.load_dotenv()


False
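
The `False` above is what `load_dotenv()` returns when it did not set any variables, which usually means no `.env` file was found. A minimal sketch of a check that fails loudly instead of passing an empty key along; the helper name `get_required_env` is made up for this example and is not part of `dotenv`:

```python
import os

def get_required_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; check that a .env file exists in the working "
            "directory and defines it"
        )
    return value

# Report the result without printing the key itself
try:
    api_key = get_required_env("CENSUS_API_KEY")
    print(f"CENSUS_API_KEY loaded ({len(api_key)} characters)")
except RuntimeError as err:
    print(f"Warning: {err}")
```

Run it right after `dotenv.load_dotenv()`, before any cell that needs the key.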

In [9]:
%%R 

require('tidycensus')

# because it is an environment variable, we don't have to 
# explicitly pass this string to R; it is readable here
# in this R cell.
census_api_key(Sys.getenv("CENSUS_API_KEY"))

To install your API key for use in future sessions, run this function with `install = TRUE`.


2. Decide which Census variables you want

    Use <https://censusreporter.org/> to figure out which tables you want. (If Census Reporter is down, see the code in the cell below.)

    -   Scroll to the bottom of the page to see the tables.
    -   If you already know the table ID, enter it in the "Explore" section to learn more about that table.

    By default this code loads the total population variable (B01003_001), which we found on Census Reporter here: https://censusreporter.org/tables/B01003/

    -   Find some other variables that you're also interested in.
    -   Don't forget to pick a geography like "tract", "county" or "block group". Here is the list of [all geographies](https://walker-data.com/tidycensus/articles/basic-usage.html#geography-in-tidycensus).


In [10]:
%%R 

# Finding the Census variables for the ACS 5-year survey
# Generally you'd do this in CensusReporter, but since it's down sometimes, here it is using tidycensus's load_variables function

# get every single variable in the ACS5
all_census_vars <- load_variables(2021, "acs5", cache = TRUE) 

filtered_census_vars <- all_census_vars %>% 
    filter(grepl("median income", label, ignore.case = TRUE))   # filter to those containing "median income"
    
# write to CSV so we can view it in python
filtered_census_vars %>% 
    write_csv("filtered_census_vars.csv")

# show the first few rows
filtered_census_vars %>%
    select(-geography) %>% # remove the geography column
    print(n = 20) # print the first 20 rows

# A tibble: 46 × 3
   name         label                                                    concept
   <chr>        <chr>                                                    <chr>  
 1 B06011PR_001 Estimate!!Median income in the past 12 months --!!Total: MEDIAN…
 2 B06011PR_002 Estimate!!Median income in the past 12 months --!!Total… MEDIAN…
 3 B06011PR_003 Estimate!!Median income in the past 12 months --!!Total… MEDIAN…
 4 B06011PR_004 Estimate!!Median income in the past 12 months --!!Total… MEDIAN…
 5 B06011PR_005 Estimate!!Median income in the past 12 months --!!Total… MEDIAN…
 6 B06011_001   Estimate!!Median income in the past 12 months --!!Total: MEDIAN…
 7 B06011_002   Estimate!!Median income in the past 12 months --!!Total… MEDIAN…
 8 B06011_003   Estimate!!Median income in the past 12 months --!!Total… MEDIAN…
 9 B06011_004   Estimate!!Median income in the past 12 months --!!Total… MEDIAN…
10 B06011_005   Estimate!!Median income in the past 12 months --!!Total… MEDIAN…
11 B07011
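
The R cell above writes `filtered_census_vars.csv` so it can be viewed in Python. Here is a rough sketch of what that could look like in pandas. To stay self-contained it reads two rows copied from the R output rather than the real file, and `find_variables` is a helper name invented for this example:

```python
import io
import pandas as pd

# Stand-in for filtered_census_vars.csv, using two rows from the R output above
sample_csv = io.StringIO(
    "name,label\n"
    "B06011PR_001,Estimate!!Median income in the past 12 months --!!Total:\n"
    "B06011_001,Estimate!!Median income in the past 12 months --!!Total:\n"
)
census_vars = pd.read_csv(sample_csv)

# Show full labels instead of truncating them
pd.set_option("display.max_colwidth", None)

# The table ID is the part of the variable name before the underscore
census_vars["table"] = census_vars["name"].str.split("_").str[0]

def find_variables(vars_df, keyword):
    """Return rows whose label contains keyword, ignoring case."""
    mask = vars_df["label"].str.contains(keyword, case=False, regex=False)
    return vars_df[mask]

print(find_variables(census_vars, "median income"))
```

In the notebook, swap `sample_csv` for `"filtered_census_vars.csv"` to search the full list.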

In [17]:
%%R 
# the variable B01003_001E was selected from the census table 
# for population, which we found in censusreporter here:
# https://censusreporter.org/tables/B01003/
# B19013_001E (median household income) comes from table B19013

# in the call below, pick the geography, the variables, and the survey you want to pull from
# see the possible values here https://walker-data.com/tidycensus/articles/basic-usage.html

# Get variable from ACS
md_census_data <- get_acs(geography = "tract", 
                      state='MD',
                      #county = c("New York", "Kings", "Queens", "Bronx", "Richmond"),
                      variables = c(
                        population="B01003_001E",
                        med_inc="B19013_001E"
                      ), 
                      year = 2019,
                      survey="acs5",
                      geometry=T)

md_census_data

Simple feature collection with 2812 features and 5 fields (with 20 geometries empty)
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -79.48765 ymin: 37.91172 xmax: -75.04894 ymax: 39.72304
Geodetic CRS:  NAD83
First 10 features:
         GEOID                                                   NAME
1  24031700310      Census Tract 7003.10, Montgomery County, Maryland
2  24031700310      Census Tract 7003.10, Montgomery County, Maryland
3  24047950000          Census Tract 9500, Worcester County, Maryland
4  24047950000          Census Tract 9500, Worcester County, Maryland
5  24005408601       Census Tract 4086.01, Baltimore County, Maryland
6  24005408601       Census Tract 4086.01, Baltimore County, Maryland
7  24005490703       Census Tract 4907.03, Baltimore County, Maryland
8  24005490703       Census Tract 4907.03, Baltimore County, Maryland
9  24033806602 Census Tract 8066.02, Prince George's County, Maryland
10 24033806602 Census Tract 8066.02, Prince George's

Getting data from the 2015-2019 5-year ACS
Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Using FIPS code '24' for state 'MD'


In [26]:
%%R
# convert the sf object to a plain R data frame (this is not a pandas dataframe yet)
md_census_df <- as.data.frame(md_census_data)


In [35]:
%%R
# md census data to csv
write_csv(md_census_df, "md_census_data.csv")

## 👉 Merge it with your data

Hint: `tidycensus` returns data in long format, so you may need to pivot the census data from long to wide format before merging it with your data.

In [43]:
%%R
# pivot md_census_df to wide format: one row per tract,
# with estimate and moe columns for each variable

md_census_df_wide <- md_census_df %>%
    pivot_wider(names_from = "variable", values_from = c("estimate", "moe"))
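
Since the merge further down happens in pandas, here is a sketch of the same long-to-wide reshape on the Python side. The two GEOIDs come from the `get_acs` output above; the estimate and moe numbers are made up purely for illustration:

```python
import pandas as pd

# Long format: one row per tract per variable (values are made up)
long_df = pd.DataFrame({
    "GEOID": ["24031700310", "24031700310", "24047950000", "24047950000"],
    "variable": ["population", "med_inc", "population", "med_inc"],
    "estimate": [4000, 90000, 3000, 60000],
    "moe": [300, 8000, 250, 7000],
})

# Wide format: one row per tract, one column per variable and statistic
wide_df = long_df.pivot(index="GEOID", columns="variable", values=["estimate", "moe"])

# Flatten the two-level column index into names like "population_estimate"
wide_df.columns = [f"{variable}_{stat}" for stat, variable in wide_df.columns]
wide_df = wide_df.reset_index()

print(wide_df)
```

The wide table has one row per GEOID, so a landfill cannot match several census rows when you merge on it.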

In [56]:
%%R
# Load necessary packages
library(dplyr)

# Assuming df and md_census_df are your dataframes
# Replace 'geoid' and 'GEOID' with the column names you want to join on

# Merge df with md_census_df using left join
merge_df <- left_join(df, md_census_df, by = c("geoid" = "GEOID"))

Error in UseMethod("left_join") : 
  no applicable method for 'left_join' applied to an object of class "function"


RInterpreterError: Failed to parse and evaluate line '# Load necessary packages\nlibrary(dplyr)\n\n# Assuming df and md_census_df are your dataframes\n# Replace \'geoid\' and \'GEOID\' with the column names you want to join on\n\n# Merge df with md_census_df using left join\nmerge_df <- left_join(df, md_census_df, by = c("geoid" = "GEOID"))\n'.
R error message: 'Error in UseMethod("left_join") : \n  no applicable method for \'left_join\' applied to an object of class "function"'

In [57]:
import pandas as pd

# Assuming md_census_data and df are your two dataframes
# Merge md_census_data with df using a left join on the 'GEOID' column
merge_df = pd.merge(df, md_census_df, how='left', on='GEOID')

NameError: name 'md_census_df' is not defined

In [62]:
md_census_data = pd.read_csv('md_census_data.csv')
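
One thing to watch when reading the census CSV: by default `pd.read_csv` parses a column of digits like `GEOID` as integers. Reading it as text keeps the codes exactly as written. A small sketch with an in-memory stand-in for the file; the second GEOID is a non-Maryland code beginning with `0`, included only to make the leading-zero problem visible:

```python
import io
import pandas as pd

# In-memory stand-in for md_census_data.csv (only two columns shown)
csv_text = (
    "GEOID,variable\n"
    "24031700310,population\n"
    "01001020100,population\n"
)

as_numbers = pd.read_csv(io.StringIO(csv_text))
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"GEOID": str})

print(as_numbers["GEOID"].tolist())  # integers; the leading zero is gone
print(as_text["GEOID"].tolist())     # strings, exactly as written in the file
```

In the notebook this would be `pd.read_csv('md_census_data.csv', dtype={'GEOID': str})`.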

In [65]:
df.columns

Index(['GHGRP ID', 'Landfill ID', 'Landfill Name', 'State', 'Physical Address',
       'City', 'County', 'Zip Code', 'lat', 'long', 'Ownership Type',
       'Landfill Owner Organization(s)', 'Year Landfill Opened',
       'Landfill Closure Year', 'Current Landfill Status',
       'Waste in Place (tons)', 'Waste in Place Year',
       'LFG Collection System In Place?', 'LFG Collected (mmscfd)',
       'LFG Flared (mmscfd)', 'Project ID', 'Current Project Status',
       'Project Name', 'Project Start Date', 'Project Shutdown Date',
       'Project Type Category', 'LFG Energy Project Type',
       'RNG Delivery Method', 'Actual MW Generation', 'Rated MW Capacity',
       'LFG Flow to Project (mmscfd)',
       'Current Year Emission Reductions (MMTCO2e/yr) - Direct',
       'Current Year Emission Reductions (MMTCO2e/yr) - Avoided', 'GEOID',
       'STATE', 'COUNTY', 'TRACT', 'BLOCK'],
      dtype='object')

In [67]:
df_merge = df.merge(md_census_data, on='GEOID', how='left')

In [69]:
# save df_merge to csv

df_merge.to_csv('baltimore-lmop-with-geocodes-and-census.csv', index=False)

In [68]:
df_merge

Unnamed: 0,GHGRP ID,Landfill ID,Landfill Name,State,Physical Address,City,County,Zip Code,lat,long,...,GEOID,STATE,COUNTY,TRACT,BLOCK,NAME,variable,estimate,moe,geometry
0,1007291.0,734,Alpha Ridge SLF,MD,2350 Marriottsville Road,Marriottsville,Howard,21104.0,39.305776,-76.898803,...,240276030013008,24,27,603001,3008,,,,,
1,,735,Annapolis SLF,MD,Defense Highway,Annapolis,Anne Arundel,21401.0,38.992,-76.573,...,240037516002006,24,3,751600,2006,,,,,
2,,736,Appeal SLF,MD,,Lusby,Calvert,20657.0,38.381112,-76.438334,...,240098610033000,24,9,861003,3000,,,,,
3,1000331.0,10120,Beulah Municipal Landfill,MD,6815 East New Market Ellwood Road,Hurlock,Dorchester,21643.0,38.6735,-75.8994,...,240199702002028,24,19,970200,2028,,,,,
4,,740,Bowley's Lane LF,MD,Bowley's Lane,Baltimore,Baltimore city,21206.0,39.3138,-76.5444,...,245102604022006,24,510,260402,2006,,,,,
5,1002655.0,741,Brown Station Road Sanitary Landfill,MD,3500 Brown Station Road,Upper Marlboro,Prince George's,20774.0,38.851,-76.789,...,240338006071007,24,33,800607,1007,,,,,
6,1002655.0,741,Brown Station Road Sanitary Landfill,MD,3500 Brown Station Road,Upper Marlboro,Prince George's,20774.0,38.851,-76.789,...,240338006071007,24,33,800607,1007,,,,,
7,1002655.0,741,Brown Station Road Sanitary Landfill,MD,3500 Brown Station Road,Upper Marlboro,Prince George's,20774.0,38.851,-76.789,...,240338006071007,24,33,800607,1007,,,,,
8,1004812.0,742,Cecil County Central Landfill,MD,758 East Old Philadelphia Road,Elkton,Cecil,21921.0,39.5969,-75.9136,...,240150309033011,24,15,30903,3011,,,,,
9,1005295.0,738,Central SLF,MD,7091 Central Site Lane,Newark,Worcester,21841.0,38.2159,-75.3186,...,240479512001053,24,47,951200,1053,,,,,
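
Every census column (`NAME`, `variable`, `estimate`, `moe`, `geometry`) is empty in the output above. That is what you'd expect from a key mismatch: the landfill `GEOID` is a 15-digit block code, while the tract data from `get_acs` uses 11-digit tract GEOIDs, so no rows line up. A sketch of one way to fix it, assuming the geocoded block codes and the ACS year use the same tract boundaries. The two landfill rows are taken from the table above; the population values and the names `TRACT_GEOID` and `census_wide` are invented for the example:

```python
import pandas as pd

# Two landfill rows with the 15-digit block GEOIDs shown in the output above
landfills = pd.DataFrame({
    "Landfill Name": ["Alpha Ridge SLF", "Cecil County Central Landfill"],
    "GEOID": [240276030013008, 240150309033011],
})

# Stand-in for the wide census table; the population values are made up
census_wide = pd.DataFrame({
    "GEOID": ["24027603001", "24015030903"],
    "population_estimate": [1000, 2000],
})

# Block GEOID = state (2) + county (3) + tract (6) + block (4) digits,
# so the first 11 characters identify the tract.
# zfill(15) restores a leading zero lost when the code was read as a number.
landfills["TRACT_GEOID"] = (
    landfills["GEOID"].astype(str).str.zfill(15).str[:11]
)

merged = landfills.merge(
    census_wide,
    left_on="TRACT_GEOID",
    right_on="GEOID",
    how="left",
    suffixes=("_block", "_tract"),
    validate="m:1",   # each landfill row should match at most one tract row
    indicator=True,   # adds a _merge column: "both" or "left_only"
)

print(merged["_merge"].value_counts())
```

Any rows marked `left_only` are landfills that still found no tract. Note that `validate="m:1"` only passes on the wide census table (one row per tract); the long table has one row per tract per variable and would raise an error.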
