# For High Society, Less Recycling
By Sawyer Click, Frida Cai and Aseem Shukla

This project looks at recycling over the last decade in New York's 59 community districts. 

### Where did the data come from?

Recycling data comes from NYC Open Data. We might not use all of these datasets, but they're here in case we want to!

    Monthly Tonnage Data
    Public Recycling Baskets
    Recycling Diversion and Capture Rates

We have census data by community district through the city's Community Profiles:

    District Demographics


In [2]:
import pandas as pd
import datetime as dt
import json

import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams['pdf.fonttype'] = 42
pd.set_option('display.max_colwidth', None)  # None = never truncate; -1 is rejected by newer pandas
%matplotlib inline
plt.style.use('fivethirtyeight')

# Read in monthly tonnage data
This is <a href='https://data.cityofnewyork.us/City-Government/DSNY-Monthly-Tonnage-Data/ebb7-mvp5'>data</a> published monthly by the Department of Sanitation, broken down by community district.
We're reading in a truncated version because we don't need every column.

For recycling & refuse:
- refuse
- paper
- metal, plastic and glass
- organics

For identifiers:
- borough
- borough_id
- communitydistrict

In [3]:
df = pd.read_csv('monthly_tonnage_2019.csv', usecols=['BOROUGH', 'MONTH', 'COMMUNITYDISTRICT', 'BOROUGH_ID', 'MGPTONSCOLLECTED', 'REFUSETONSCOLLECTED', 'PAPERTONSCOLLECTED', 'RESORGANICSTONS'])
df.head(1)

Unnamed: 0,MONTH,BOROUGH,COMMUNITYDISTRICT,REFUSETONSCOLLECTED,PAPERTONSCOLLECTED,MGPTONSCOLLECTED,RESORGANICSTONS,BOROUGH_ID
0,1993 / 11,Manhattan,1,625.2,119.4,34.4,,1
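
One thing worth knowing about `usecols`: the columns come back in the order they appear in the file, not the order we list them, which is why the output above starts with MONTH rather than BOROUGH. A minimal sketch on a made-up in-memory CSV (the `EXTRA` column and all values are hypothetical):

```python
import io
import pandas as pd

# Hypothetical stand-in for the tonnage CSV: made-up values plus one
# column (EXTRA) that we don't want to load.
raw = io.StringIO(
    "MONTH,BOROUGH,COMMUNITYDISTRICT,REFUSETONSCOLLECTED,EXTRA\n"
    "2019 / 01,Manhattan,1,1000.0,x\n"
    "2019 / 02,Brooklyn,7,3000.0,y\n"
)

wanted = ['BOROUGH', 'MONTH', 'REFUSETONSCOLLECTED']
sample = pd.read_csv(raw, usecols=wanted)

# Columns follow the file's order, and EXTRA is never loaded.
print(list(sample.columns))
```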


# Cleaning the data
Taking out the nonsense, converting dates, and calculating the district ID.

In [4]:
df = df.fillna(0) ## fill the nan with 0s.

## make datetime, separate the year
df.MONTH = pd.to_datetime(df.MONTH)
df['year'] = df.MONTH.dt.year

## it's read in as an int
df.BOROUGH_ID = df.BOROUGH_ID.astype(str)

## keep 2009 onward (the past decade)
df = df[df.year > 2008]

# build the three-digit district code: borough ID + zero-padded community district (e.g. 101)
def get_boro_code(x):
    return x.BOROUGH_ID + "{:02d}".format(x.COMMUNITYDISTRICT)
df['borough_code'] = df.apply(get_boro_code, axis=1)
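
For reference, here is a rough sketch of the same cleaning steps on a toy frame, done without a row-wise `apply`. It assumes every MONTH value looks like `YYYY / MM` (as in the sample row above); the toy rows themselves are made up.

```python
import pandas as pd

# Made-up rows in the same shape as the tonnage data
toy = pd.DataFrame({
    'MONTH': ['2009 / 11', '2019 / 03'],
    'BOROUGH_ID': [1, 3],
    'COMMUNITYDISTRICT': [1, 12],
})

# An explicit format is an assumption about the file, but it fails loudly
# if a value doesn't match instead of guessing.
toy['MONTH'] = pd.to_datetime(toy['MONTH'], format='%Y / %m')
toy['year'] = toy['MONTH'].dt.year

# Borough ID + zero-padded district number, computed column-wise
toy['borough_code'] = (
    toy['BOROUGH_ID'].astype(str)
    + toy['COMMUNITYDISTRICT'].astype(str).str.zfill(2)
)

print(toy[['year', 'borough_code']])
```

On the full dataset the column-wise version should give the same codes as `get_boro_code`, just faster.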

# Calculating percentages and changes

In [5]:
## some math to get the percents and totals
df['total_recycled'] = df.PAPERTONSCOLLECTED + df.MGPTONSCOLLECTED + df.RESORGANICSTONS ## these are the recyclables
df['total_waste'] = df.total_recycled + df.REFUSETONSCOLLECTED ## all recycled + all non-recycled
df['percent_recycled'] = df['total_recycled'] / df['total_waste'] * 100 ## grabbing the percent recycled out of total waste
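
To sanity-check the math, here's the same calculation by hand for one row: Manhattan district 1 in March 2012, using the tonnage values that show up in the `df` output further down.

```python
# Tonnage for Manhattan CD 1, March 2012 (taken from the df output in this notebook)
paper, mgp, organics, refuse = 341.6, 210.8, 0.0, 1194.4

total_recycled = paper + mgp + organics          # recyclables only
total_waste = total_recycled + refuse            # recyclables + refuse
percent_recycled = total_recycled / total_waste * 100

print(round(total_recycled, 1), round(total_waste, 1), round(percent_recycled, 2))
```

The results line up with that row's `total_recycled`, `total_waste` and `percent_recycled` columns.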

## Making a dataset based on years

In [16]:
by_year = df.groupby(['year', 'borough_code']).percent_recycled.mean().reset_index()

by_year.to_csv('recycled_by_year.csv', index=False)
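
A note on what that `mean()` is doing: it averages the twelve monthly percentages, so a light month counts as much as a heavy one. An alternative (not what this notebook uses) is to pool the tonnage first and divide once. A small sketch with made-up numbers shows how the two can differ:

```python
import pandas as pd

# Made-up tonnage for one district: a light month and a heavy month
toy = pd.DataFrame({
    'total_recycled': [10.0, 90.0],
    'total_waste': [100.0, 300.0],
})
toy['percent_recycled'] = toy['total_recycled'] / toy['total_waste'] * 100

# Unweighted mean of the monthly rates (the approach used above)
mean_of_months = toy['percent_recycled'].mean()

# Tonnage-weighted rate: total recycled over total waste
pooled = toy['total_recycled'].sum() / toy['total_waste'].sum() * 100

print(mean_of_months, pooled)
```

Here the monthly average is about 20 percent while the pooled rate is about 25 percent. With real monthly data the gap is likely much smaller, but it's worth knowing which one we're reporting.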

## Comparing the start and end of the decade
2019 data was just finalized, so we'll compare 2009 and 2019, with 2014 as a midpoint.

In [46]:
bydist_2019 = df[df['year']==2019].groupby('borough_code').percent_recycled.mean().reset_index().rename(columns={'percent_recycled':'percent_recycled_2019'})
bydist_2014 = df[df['year']==2014].groupby('borough_code').percent_recycled.mean().reset_index().rename(columns={'percent_recycled':'percent_recycled_2014'})
bydist_2009 = df[df['year']==2009].groupby('borough_code').percent_recycled.mean().reset_index().rename(columns={'percent_recycled':'percent_recycled_2009'})

In [58]:
## throwing the three years together, matched on district code rather than row position
bydist = pd.merge(bydist_2019, bydist_2014, on='borough_code')
joined = pd.merge(bydist, bydist_2009, on='borough_code')
# print(joined)

## shortening the column names
joined.columns = ['borough_code','pct_recyc_2019','pct_recyc_2014', 'pct_recyc_2009']

## grabbing pct change and pct point change between 2009 and 2019
joined['pct_change'] = (joined['pct_recyc_2019'] / joined['pct_recyc_2009'] - 1) * 100
joined['pct_point_change'] = joined['pct_recyc_2019'] - joined['pct_recyc_2009']

joined

Unnamed: 0,borough_code,pct_recyc_2019,pct_recyc_2014,pct_recyc_2009,pct_change,pct_point_change
0,101,33.321613,29.826129,34.408124,-3.157718,-1.086511
1,102,29.872936,25.847944,26.322415,13.488585,3.550521
2,103,17.891729,16.358382,17.288615,3.488505,0.603114
3,104,28.727106,25.114945,27.317897,5.158557,1.409209
4,105,26.919081,23.794573,25.016822,7.603919,1.902259
5,106,26.690802,23.401618,28.033464,-4.789495,-1.342661
6,107,29.136751,25.092309,25.598104,13.823864,3.538647
7,108,28.333311,25.212358,25.873587,9.506701,2.459725
8,109,20.453201,16.214241,14.481206,41.239624,5.971995
9,110,16.636785,12.831079,10.459362,59.061178,6.177423
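
The two change columns answer different questions, so here's a quick worked example using district 110 from the table above: percentage point change is a plain subtraction, while percent change is relative to the 2009 rate.

```python
# District 110's recycling rates, copied from the table above
pct_2009 = 10.459362
pct_2019 = 16.636785

# Percentage points: how far the rate itself moved
pct_point_change = pct_2019 - pct_2009

# Percent change: how big that move is relative to where it started
pct_change = (pct_2019 / pct_2009 - 1) * 100

print(round(pct_point_change, 2), round(pct_change, 2))
```

So a gain of roughly 6 points is a jump of roughly 59 percent, because the district started from a low base.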


## Reading in demographic data
The Department of City Planning has <a href='https://communityprofiles.planning.nyc.gov/'>demographic data</a> by community district, pulled from the census.

In [59]:
demo = pd.read_csv('Bronx-2-indicators.csv')
demo.borocd = demo.borocd.astype(str)

## Merging and spitting out a master dataset

In [63]:
merged = joined.merge(demo, left_on='borough_code', right_on='borocd')
merged.to_csv('demographics_and_recyc.csv', index=False)
merged.head(1)

## there are a TON of columns, so I'm cutting some in Excel. It's quicker!
## here's what the cut version looks like
final = pd.read_csv('final.csv')
final.head(1)

Unnamed: 0,borough_code,pct_recyc_2019,pct_recyc_2014,pct_recyc_2009,pct_change,pct_point_change,cd_full_title,fp_100_mhhi,mean_commute,neighborhoods,pct_bach_deg,pct_clean_strts,pct_foreign_born,pct_white_nh,poverty_rate,son_issue_1,son_issue_2,son_issue_3
0,101,33.321613,29.826129,34.408124,-3.157718,-1.086511,Manhattan Community District 1,121000.0,25.2,"Battery Park City, Civic Center, Ellis Island, Governors Island, Liberty Island, South Street Seaport, Tribeca, Wall Street, World Trade Center",82.3,95.4,23.9,72.0,8.8,Resiliency,Traffic,Other
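
The merge above only works because both keys are strings, and an inner merge silently drops any district that doesn't match. One way to check for that, sketched here on made-up frames (the values and the missing district are hypothetical), is a left merge with `indicator=True`:

```python
import pandas as pd

# Hypothetical recycling table with string district codes
left = pd.DataFrame({
    'borough_code': ['101', '102', '103'],
    'pct_recyc_2019': [33.3, 29.9, 17.9],
})

# Hypothetical demographics table: codes read in as ints, one district missing
right = pd.DataFrame({'borocd': [101, 102], 'poverty_rate': [8.8, 10.0]})
right['borocd'] = right['borocd'].astype(str)  # match the key type first

# how='left' keeps every recycling row; _merge says whether it found a partner
check = left.merge(right, left_on='borough_code', right_on='borocd',
                   how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'borough_code'].tolist()

print(unmatched)
```

An empty list would mean every district found its demographics.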


# Analysis time

In [68]:
df

Unnamed: 0,MONTH,BOROUGH,COMMUNITYDISTRICT,REFUSETONSCOLLECTED,PAPERTONSCOLLECTED,MGPTONSCOLLECTED,RESORGANICSTONS,BOROUGH_ID,year,borough_code,total_recycled,total_waste,percent_recycled
3,2012-03-01,Manhattan,1,1194.4,341.6,210.8,0.0,1,2012,101,552.4,1746.8,31.623540
9,2012-01-01,Brooklyn,7,3156.7,436.6,343.4,0.0,3,2012,307,780.0,3936.7,19.813549
13,2016-11-01,Brooklyn,7,3356.9,424.4,380.3,120.1,3,2016,307,924.8,4281.7,21.598898
21,2013-11-01,Brooklyn,7,3179.2,398.7,318.3,57.5,3,2013,307,774.5,3953.7,19.589246
23,2010-12-01,Brooklyn,7,2514.9,367.6,275.5,0.0,3,2010,307,643.1,3158.0,20.364155
26,2012-10-01,Brooklyn,7,3287.2,387.9,285.8,0.0,3,2012,307,673.7,3960.9,17.008761
29,2013-10-01,Brooklyn,7,3341.4,414.6,319.1,38.2,3,2013,307,771.9,4113.3,18.765954
35,2012-04-01,Brooklyn,7,3129.2,390.2,280.4,0.0,3,2012,307,670.6,3799.8,17.648297
38,2011-11-01,Brooklyn,7,3465.7,407.7,288.1,0.0,3,2011,307,695.8,4161.5,16.719933
39,2009-11-01,Brooklyn,7,3211.7,399.6,302.8,0.0,3,2009,307,702.4,3914.1,17.945377


In [72]:
decade = df.groupby('year').total_waste.sum().reset_index()
decade = decade.merge(df.groupby('year').total_recycled.sum().reset_index(), on='year')
decade.to_clipboard(header=True, index=False)
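
The two groupbys above could also be done in one pass, and the citywide rate falls straight out of the totals. A minimal sketch on made-up numbers (the `citywide_pct` column is a name invented here, not something in the notebook):

```python
import pandas as pd

# Made-up district rows for two years
toy = pd.DataFrame({
    'year': [2009, 2009, 2019, 2019],
    'total_recycled': [100.0, 200.0, 150.0, 250.0],
    'total_waste': [1000.0, 1000.0, 1000.0, 1000.0],
})

# Sum both columns per year in a single groupby
decade_toy = toy.groupby('year')[['total_waste', 'total_recycled']].sum().reset_index()

# Hypothetical extra column: citywide share of waste that was recycled
decade_toy['citywide_pct'] = decade_toy['total_recycled'] / decade_toy['total_waste'] * 100

print(decade_toy)
```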