# Cleaning the Tree Census Data

**Author: Inga Silkworth <br>
Date: 08/10/2017**

## Introduction

Three dataframes are examined: New York City street tree census data from 1995, 2005, and 2015. The dataframes are cleaned and initial exploratory data analysis is performed. Some information relating to trees in NYC is added and initial observations are recorded.

## Get the Data and the Basic Info

### Resources for the 2015 census

The main page of the 2015 census can be found here: https://www.nycgovparks.org/trees/treescount

>Number of Volunteers. The 2,241 volunteers is double the number that participated in 2006. Volunteers completed 34 percent of the census. Innovative Mapping Technology. The use of innovative geospatial technology and a strong quality review process has yielded an exceptionally accurate inventory of street trees.

The street tree censuses don't include trees planted on private property: https://www.nycgovparks.org/trees/treescount/past-censuses

A nice map with all NYC trees plotted for every street and marked with colors by species and circle size by diameter: 
https://tree-map.nycgovparks.org/ <br>

Benefits of trees: https://tree-map.nycgovparks.org/learn/benefits <br>
>Stormwater intercepted each year: 1,095,211,388 gallons Value: \$10,842,587.27 <br>
Energy conserved each year: 671,779,096 kWh Value: \$84,808,673.34 <br>
Air pollutants removed each year: 641 tons Value: \$6,700,060.27 <br>
Carbon dioxide reduced each year: 623,193 tons Value: \$4,162,900.81 <br>
Total Value of Annual Benefits \$110,677,149.92 <br> 

The CO2 figure looks too high, though: a single tree absorbs only ~40 lbs of CO2 a year. The numbers were calculated with the i-Tree equations (http://www.itreetools.org/), which also count the CO2 that power plants no longer emit because tree shade lowers AC usage. Even so, the total still seems far too high. A tree can sequester a ton of CO2 over 40 years of its life, but that is a lifetime figure, not an annual one. <br>
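
As a rough check of the doubt raised above, the sketch below divides the reported annual CO2 figure by the number of live street trees. It assumes that "tons" means US short tons (2,000 lb) and uses the 2015 live-tree count of 650,987 (computed later in this notebook) as the denominator; the tree map may be based on a slightly different tree count.

```python
# Back-of-the-envelope check of the reported CO2 benefit.
co2_tons_per_year = 623_193   # from the tree map benefits page
live_trees_2015 = 650_987     # live street trees in the 2015 census (assumed denominator)
LB_PER_SHORT_TON = 2_000      # assumption: "tons" means US short tons

lb_per_tree = co2_tons_per_year * LB_PER_SHORT_TON / live_trees_2015
ratio_to_rule_of_thumb = lb_per_tree / 40   # ~40 lb/year is the rough per-tree figure quoted above

print(f"Implied CO2 per tree: {lb_per_tree:,.0f} lb/year")
print(f"Ratio to the ~40 lb/year estimate: {ratio_to_rule_of_thumb:.0f}x")
```

Under these assumptions the figure works out to roughly 1,900 lb per tree per year, close to 50 times the per-tree estimate, so direct absorption alone cannot account for it.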

Percent change map from 1995 to 2015: https://www.nycgovparks.org/pagefiles/109/tree-census-population-change-lg__583319dce4432.gif <br>
There's already a map of street tree density by census tract https://www.nycgovparks.org/pagefiles/109/tree-census-density-lg__58330128a3d6b.jpg

In [1]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [2]:
df95 = pd.read_csv('../RawData/1995_Street_Tree_Census.csv')
df05 = pd.read_csv('../RawData/2005_Street_Tree_Census.csv')
df15 = pd.read_csv('../RawData/2015_Street_Tree_Census_-_Tree_Data.csv')

In [3]:
print(df95.info())
print(df05.info())
print(df15.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 516989 entries, 0 to 516988
Data columns (total 27 columns):
RecordId              516989 non-null int64
Address               516989 non-null object
House_Number          516989 non-null object
Street                516989 non-null object
Zip_Original          516989 non-null int64
CB_Original           516989 non-null int64
Site                  516989 non-null object
Species               516989 non-null object
Diameter              516989 non-null int64
Condition             516989 non-null object
Wires                 516989 non-null object
Sidewalk_Condition    495716 non-null object
Support_Structure     516989 non-null object
Borough               516989 non-null object
X                     516989 non-null float64
Y                     516989 non-null float64
Longitude             516989 non-null float64
Latitude              516989 non-null float64
CB_New                516989 non-null int64
Zip_New               516989 non-nu

## Clean Up the Zip Codes

Don't include trees from areas such as Yonkers, Mt. Vernon, etc., since they are not part of New York City.

In [4]:
zips_to_remove = [10550, 10704, 10803, 11005, 11096, 11251, 11359, 11559, 11580, 83]

df95 = df95[~df95.Zip_New.isin(zips_to_remove)]
df05 = df05[~df05.zipcode.isin(zips_to_remove)]
df15 = df15[~df15.zipcode.isin(zips_to_remove)]

# in df95 change zip codes of some buildings to those of the nearby neighborhoods
# since they are not used in 05 and 15
df95.Zip_New = df95.Zip_New.replace(10103, 10019)
df95.Zip_New = df95.Zip_New.replace(10041, 10004)
df95.Zip_New = df95.Zip_New.replace(10119, 10001)
df95.Zip_New = df95.Zip_New.replace(10153, 10019)
df95.Zip_New = df95.Zip_New.replace(10162, 10075)
df95.Zip_New = df95.Zip_New.replace(10129, 10029)
df95.Zip_New = df95.Zip_New.replace(10112, 10020)
df95.Zip_New = df95.Zip_New.replace(10107, 10019)

# a building near Riverside Church
df15.zipcode = df15.zipcode.replace(10115, 10027)
# Laguardia airport
df15.zipcode = df15.zipcode.replace(11371, 11370)
# york college
df15.zipcode = df15.zipcode.replace(11451, 11433)
# there's no area information on 10281, so I'll change it to 10280
df15.zipcode = df15.zipcode.replace(10281, 10280)

# change the zip code of the world trade center to the currently used one
df95.Zip_New = df95.Zip_New.replace(10048, 10007)
df05.zipcode = df05.zipcode.replace(10048, 10007)
df15.zipcode = df15.zipcode.replace(10048, 10007)
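
The one-by-one replacements above can also be written as a single dictionary passed to `Series.replace`. The sketch below uses a small made-up dataframe (`toy95`, not the census data) and assigns with bracket notation. Assigning to a misspelled attribute such as `df.Zip_new` does not update the `Zip_New` column, and with warnings silenced nothing flags it, so bracket assignment is the safer habit.

```python
import pandas as pd

# Toy stand-in for df95; the values are illustrative only.
toy95 = pd.DataFrame({'Zip_New': [10103, 10041, 10048, 10019, 11201]})

# Old zip code -> zip code of the surrounding neighborhood (subset of the mapping above).
zip_fixes_95 = {
    10103: 10019,
    10041: 10004,
    10048: 10007,  # World Trade Center
}

# Bracket assignment always targets the existing column.
toy95['Zip_New'] = toy95['Zip_New'].replace(zip_fixes_95)

print(toy95['Zip_New'].tolist())
```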

**The 1995 (2005) dataset has 23,299 (8,911) trees in zip code 0. Ignore those trees in zip code plots.** <br>
Either there are no street trees on Roosevelt Island (zip code 10044) in 2015, or the census wasn't conducted there that year. df95 lists it as part of Manhattan, which is technically correct. <br>
In the 2015 dataset, there are 935 trees included from Central Park. Zip code 83 is excluded for that reason. <br>
In the 2005 dataset, zip code 10023 is used for the area of 10069 and zip code 11211 is used for 11249.

In [5]:
# Import areas for zip codes (areas are in sq. miles)
# I had to guesstimate the area for 11249 since I couldn't find it anywhere.
zip_areas = pd.read_csv('zip_code_areas.csv')
print(len(zip_areas))

186


## Check the Conditions of the Trees

In [6]:
print(df95.Condition.value_counts(), '\n', '************************')
print(df05.status.value_counts(), '\n', '************************')
print(df15.status.value_counts())
print(df15.health.value_counts())

Good              332562
Excellent         100286
Poor               38571
Planting Space     15231
Dead               12859
Unknown            10761
Stump               6087
Fair                 327
Shaft                303
Critical               2
Name: Condition, dtype: int64 
 ************************
Good         393318
Excellent    141582
Poor          49111
Dead           8118
Name: status, dtype: int64 
 ************************
Alive    650987
Stump     17640
Dead      13956
Name: status, dtype: int64
Good    527805
Fair     96396
Poor     26785
Name: health, dtype: int64


Could there be a jump in dead tree numbers in 2015 because of Hurricane Sandy? Sandy was in 2012. <br>
http://www.theepochtimes.com/n3/1328435-sandy-is-still-killing-nyc-trees/ <br>

>In the immediate aftermath of Sandy, almost 11,000 street trees and 9,000 park trees were destroyed. That’s $28 million in day-of-storm tree damages.

Although by the time the survey was taken in 2015, most of the dead trees might have been cleared out.
> In Brooklyn alone, 48,000 trees have been inspected, once in the summer of 2013, and again in the summer of 2014, resulting in the removal of more than 2,500 storm-impacted trees.

Maybe that's why there were 17,000 stumps in 2015. It would be interesting to plot the stumps and see if they're close to the coast or scattered everywhere around the city. 
* The updated evacuation map after Hurricane Sandy https://www.huffingtonpost.com/2013/06/18/nyc-hurricane-evacuation-zones-map_n_3460565.html
* Hurricane Sandy inundation zones https://www.cdc.gov/mmwr/preview/mmwrhtml/mm6342a4.htm

**NOTE: The 10,761 trees of unknown status from 1995 are excluded from all of the plots, since 'Unknown' is counted as neither dead nor alive.**

## Prepare the dataframe for dead trees

Three zip codes that are in the 95 and 05 dataframes, but not in the 15 (and thus the final dead tree dataset) are 0, 10044 (Roosevelt Island), and 11430 (JFK Airport).

There are 896 (176) dead trees in zip code 0 for 1995 (2005).

Sites that will help with the vis: 
* https://stackoverflow.com/questions/42408265/plot-new-york-neighborhoods-with-d3-js
* http://www.d3noob.org/2013/03/a-simple-d3js-map-explained.html

In [7]:
# save a dataset to plot dead trees later. Only want numbers of dead trees by zip codes for the three datasets
# and density per zip code
d_15 = df15.zipcode[df15.status.isin(['Dead', 'Stump'])].value_counts()
dead = pd.DataFrame(d_15.reset_index())
dead.columns = ['zip_code', 'count_15']
dead = pd.merge(dead, zip_areas, on='zip_code', how='left')

d_05 = df05.zipcode[df05.status.isin(['Dead'])].value_counts()
dead05 = pd.DataFrame(d_05.reset_index())
dead05.columns = ['zip_code', 'count_05']
dead = pd.merge(dead, dead05, on='zip_code', how='left')

d_95 = df95.Zip_New[df95.Condition.isin(['Dead', 'Stump', 'Shaft'])].value_counts()
dead95 = pd.DataFrame(d_95.reset_index())
dead95.columns = ['zip_code', 'count_95']
dead = pd.merge(dead, dead95, on='zip_code', how='left')

dead['density_15'] = dead['count_15'] / dead['area']
dead['density_05'] = dead['count_05'] / dead['area']
dead['density_95'] = dead['count_95'] / dead['area']
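
The same count, rename, merge pattern repeats for each year, so it could be factored into a small helper. This is only a sketch on made-up data: `counts_by_zip` is a hypothetical helper name, and the toy zip codes and areas below are not from the census files. It keeps the left merge onto the 2015 zip codes, so zip codes missing from 2015 drop out just as in the cell above.

```python
import pandas as pd

def counts_by_zip(zips, name):
    """Return a two-column frame: zip_code and the number of rows per zip code."""
    counts = zips.value_counts()
    return pd.DataFrame({'zip_code': counts.index, name: counts.values})

# Made-up inputs standing in for the filtered zipcode columns and zip_areas.
zips15 = pd.Series([10001, 10001, 10002])
zips05 = pd.Series([10001, 10002, 10002, 10044])
areas = pd.DataFrame({'zip_code': [10001, 10002], 'area': [0.5, 1.0]})

# Start from 2015 and left-merge the areas and the earlier year onto it.
toy_dead = counts_by_zip(zips15, 'count_15').merge(areas, on='zip_code', how='left')
toy_dead = toy_dead.merge(counts_by_zip(zips05, 'count_05'), on='zip_code', how='left')

# Density = count / area, one column per year.
for year in ['15', '05']:
    toy_dead['density_' + year] = toy_dead['count_' + year] / toy_dead['area']

print(toy_dead)
```

Building the frame from `counts.index` and `counts.values` avoids depending on the column names that `reset_index()` produces, which differ between pandas versions.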

In [8]:
dead.to_csv('/Users/ingasilk/Projects/NYCTreeAnalysis/nyc-tree-census/dead_tree_zip_densities.csv', index=False)

## Keep only Alive Trees and Columns of Interest

Get a reduced dataframe for the main plots of live trees.

In [9]:
df95a = df95[df95.Condition.isin(['Good', 'Excellent', 'Poor', 'Fair', 'Critical'])]
df05a = df05[df05.status.isin(['Good', 'Excellent', 'Poor'])]
df15a = df15[df15.status == 'Alive']

# remove non-trees from 1995
df95a = df95a[df95a.Spc_Common != 'Hedge']
df95a = df95a[df95a.Spc_Common != 'Unknown Stump']
df95a = df95a[df95a.Spc_Common != 'Shrub']

In [10]:
df95as = df95a[['Borough', 'Zip_New', 'Spc_Common']]
df95as.columns = ['boroname', 'zipcode', 'spc_common']

df05as = df05a[['boroname', 'zipcode', 'spc_common']] 

df15as = df15a[['boroname', 'zipcode', 'spc_common']] 

## Check the numbers by Boroughs

In [11]:
print(df95as.boroname.value_counts())
print(df05as.boroname.value_counts())
print(df15as.boroname.value_counts())

Queens           205942
Brooklyn         107828
Staten Island     73158
Bronx             43212
Manhattan         41608
Name: boroname, dtype: int64
Queens       236345
Brooklyn     141308
5             83529
Bronx         58710
Manhattan     49223
5             14896
Name: boroname, dtype: int64
Queens           237812
Brooklyn         169652
Staten Island    101443
Bronx             80585
Manhattan         61495
Name: boroname, dtype: int64


In [12]:
# fix Staten Island Entries for 2005
df05as.boroname = df05as.boroname.replace('5', 'Staten Island')
df05as.boroname = df05as.boroname.replace(5, 'Staten Island')

## Make Tree Names the Same Across Datasets

In [13]:
df95as.spc_common = df95as.spc_common.map(lambda x: ' '.join(reversed(x.split(', '))).title())
df05as.spc_common = df05as.spc_common.map(lambda x: ' '.join(reversed(x.split(', '))).title())
df15as.spc_common = df15as.spc_common.map(lambda x: str(x).title())

old95 = ['Unknown Live Trees', 'Willow Species', 'Euro. Mountain-Ash', 'Golden-Chain Tree', 'Trumpet Tree Sp', 
         'S Goldenrain Tree', 'Norway-Cr Kng Maple', 'Callery-Aristo Pear', 'Red-Red Sunst Maple', 'Privet Species', 
         'Crabapple-Ind.Summer', 'Higan-Pendla Cherry', 'Eur. Smoke Tree', 'Crabapple-Harv. Gold', 
         'Norway-Schwed Maple', 'White-Aut Purpl Ash', 'Green-Mars Seed Ash', 'Red-Oct Glory Maple', 
         'Sugar-Grn Mtn Maple', 'Fla. Strangler Fig', 'Amer. Mountain-Ash', 'Holly Species','Honeylocust', 
         'American Arborvitae']
new95 = ['Unknown', 'Other Willow', 'European Mountain Ash', 'Golden Chain Tree', 'Trumpet Tree', 'Goldenrain Tree', 
         'Crimson King Maple', 'Callery Pear', 'Red Sunset Maple', 'Other Privet', 'Indian Summer Crabapple', 
         'Weeping Higan Cherry', 'Smoketree', 'Harvest Gold Crabapple', 'Schwedleri Maple', 'Autumn Purple White Ash', 
         'Green Ash', 'October Glory Red Maple', 'Sugar Maple', 'Florida Strangler Fig', 'American Mountain Ash', 
         'Holly', 'Honey Locust', 'Eastern Arborvitae']

old05 = ['Norway-Cr Kng Maple', 'Holly Species', 'Hickory', 'Willow Species', 'Silverbell', 'Dogwood Spp.', 
         'Maackia,Amur', 'Larch', 'American Mountainash', 'Golden-Chain Tree', 'Pondcypress', 'Juniper Spp.', 
         'Baldcypress Species', 'Willow ?', 'American Smoketree', 'Korean Mountainash', 
         'Japanese Falsecypress', 'Atlantic Whitecedar', 'Honeylocust']
new05 = ['Crimson King Maple', 'Holly', 'Other Hickory', 'Other Willow', 'Other Silverbell', 'Other Dogwood', 
         'Amur Maackia', 'Common Larch', 'American Mountain Ash', 'Golden Chain Tree', 'Pond Cypress', 'Other Juniper', 
         'Baldcypress', 'Other Willow', 'Smoketree', 'Korean Mountain Ash', 
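         'Japanese False Cypress', 'Atlantic White Cedar', 'Honey Locust', 'Eastern Arborvitae']
# note: new05 has one more entry than old05; the trailing 'Eastern Arborvitae' has no
# matching old name, so the loop below (which runs over len(old05)) never uses it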

old15 = ['Ash', 'Crab Apple', 'Tulip-Poplar', 'Douglas-Fir', 'Spruce', 'Littleleaf Linden', 'Schumard\'S Oak', 
         'Purple-Leaf Plum', '\'Schubert\' Chokecherry', 'Magnolia', 'American Larch', 'Common Hackberry', 'Maple', 
         'Serviceberry', 'Pine', 'Nan', 'Honeylocust', 'Cherry', 'Arborvitae', 'Sophora']
new15 = ['Other Ash', 'Crabapple', 'Tulip Tree', 'Douglas Fir', 'Other Spruce', 'Little Leaf Linden', 'Schumard Oak', 
         'Purpleleaf Plum', 'Shubert Chokecherry', 'Other Magnolia', 'Common Larch', 'Hackberry', 'Other Maple', 
         'Other Serviceberry', 'Other Pine', 'Unknown', 'Honey Locust', 'Other Cherry', 'Eastern Arborvitae', 
         'Japanese Pagoda Tree']

for i in range(len(old95)):
    df95as.spc_common.replace(old95[i], new95[i], inplace=True)

for i in range(len(old05)):
    df05as.spc_common.replace(old05[i], new05[i], inplace=True)
    
for i in range(len(old15)):
    df15as.spc_common.replace(old15[i], new15[i], inplace=True)
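
To illustrate the normalization step: the lambda above suggests that the 1995 and 2005 files store names in a "noun, qualifier" order, so splitting on `', '` and reversing gives the natural word order. The sketch below uses made-up input strings (the exact raw spellings in the files are an assumption) and a dictionary instead of two parallel lists, which keeps each old name next to its replacement. `normalize_name` is a hypothetical helper.

```python
import pandas as pd

def normalize_name(raw):
    """Turn 'MAPLE, NORWAY' into 'Norway Maple'."""
    return ' '.join(reversed(raw.split(', '))).title()

# Made-up raw values in the assumed 'NOUN, QUALIFIER' style.
raw = pd.Series(['MAPLE, NORWAY', 'HONEYLOCUST', 'OAK, PIN'])
names = raw.map(normalize_name)

# One dict instead of parallel old/new lists (subset of the mapping above).
renames = {'Honeylocust': 'Honey Locust', 'Holly Species': 'Holly'}
names = names.replace(renames)

print(names.tolist())

# str.title() also capitalizes the letter after an apostrophe,
# which is why the 2015 list has to match "Schumard'S Oak".
titled = "Schumard's oak".title()
print(titled)
```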

In [14]:
print(len(df95as[df95as.spc_common == 'Unknown']))
print(len(df05as[df05as.spc_common == 'Unknown']))
print(len(df15as[df15as.spc_common == 'Unknown']))

9300
17505
5


There are 9,300 unknown trees in 1995, 17,505 unknown trees in 2005, and 5 unknown trees in 2015.

For 2015, I renamed Sophora to Japanese Pagoda Tree, since there are no Sophoras in the earlier years, and Sophora Japonica is Japanese Pagoda Tree.

## Get a Quick Look at Most Popular Species

From Epoch Times http://www.theepochtimes.com/n3/1328435-sandy-is-still-killing-nyc-trees/ 
>As of 2006, New York City had an estimated 2.6 million public trees—600,000 on the streets, 2 million in parks.

The 1995 dataset has the greatest variety of trees and the 2015 dataset the least, but 2015 also has the fewest unknowns. Volunteers could've been using an app to recognize trees. From https://www.nycgovparks.org/trees/treescount/about about the 2015 census:
>The TreeKIT mapping method and the accompanying mobile app are the foundation of TreesCount! 2015. NYC Parks chose TreeKIT for TreesCount! 2015 because it is easy to use and generates a representative map of the urban forest that places the tree exactly where it is located along the curb.

>This year, we trained New Yorkers to be expert tree counters by providing extensive training and tree guides to make sure that our voluntreers were confident and that measurements were as accurate as possible.

Why did 70,000 Norway Maples disappear in 20 years? And why were 30,000 Honeylocusts planted?
* Norway maples were diseased in 1996
* Since it's an invasive species, there's been an effort to eradicate it
* Honey locusts are resilient to flooding

http://www.nytimes.com/1996/06/02/nyregion/diseased-norway-maple-trees-leaving-some-streets-bare.html 1996 <br> 
http://www.nytimes.com/2002/06/30/nyregion/environment-unfortunately-these-maples-are-spreading.html 2002 <br> 
https://patch.com/new-york/tarrytown/the-norway-maple-new-york-s-ultimate-weed <br>
https://www.change.org/p/new-york-state-department-of-environmental-conservation-ban-the-norway-maple-in-new-york-3 <br>
https://www.nycgovparks.org/trees/treescount

Invasive species: <br>
* Norway Maple
* Tree of Heaven
* Russian Olive
* Smooth Buckthorn
* Black Locust

The Epoch Times:
>According to the latest estimate from 2005, over half of the trees on New York City streets belong to five species: London planetree, known for its camouflage-patterned bark; Norway maple with its low and bushy foliage; callery pear, which infamously smells like semen when it blooms; the thorny honey locust, and pin oak, with its incised leaves. Of these, the best performing trees post-Sandy are **honey locust, pin oak, and callery pear**. Expect to see more of them on the streets as replanting continues.

In [15]:
print(df95as.spc_common.value_counts())
print(df05as.spc_common.value_counts())
print(df15as.spc_common.value_counts())

Norway Maple               105393
London Planetree            85634
Pin Oak                     35713
Honey Locust                32231
Callery Pear                30253
Little Leaf Linden          25582
Silver Maple                21830
Red Maple                   17481
Green Ash                   17105
Sugar Maple                 14805
Ginkgo                      13057
Unknown                      9300
Japanese Pagoda Tree         8144
Sycamore Maple               7378
Sweetgum                     6512
Northern Red Oak             6415
American Elm                 5847
Japanese Zelkova             5494
Cornelian Cherry             2452
Tree Of Heaven               2356
Willow Oak                   1919
Other Pine                   1315
Eastern Hop Hornbeam         1062
Hackberry                     974
Horsechestnut                 840
Black Cherry                  796
Blackgum                      759
Other Maple                   757
Other Cherry                  710
Other Hawthorn

In [16]:
df15as.spc_common.unique()

array(['Red Maple', 'Pin Oak', 'Honey Locust', 'American Linden',
       'London Planetree', 'Ginkgo', 'Willow Oak', 'Sycamore Maple',
       'Amur Maple', 'Hedge Maple', 'American Elm', 'Other Ash',
       'Crabapple', 'Silver Maple', 'Turkish Hazelnut', 'Black Cherry',
       'Eastern Redcedar', 'Norway Maple', 'Tulip Tree', 'Sawtooth Oak',
       'Japanese Pagoda Tree', 'Swamp White Oak', 'Chinese Fringetree',
       'Southern Magnolia', 'Sweetgum', 'Callery Pear', 'Scarlet Oak',
       'Atlantic White Cedar', 'Black Oak', 'Japanese Zelkova',
       'White Oak', 'Ohio Buckeye', 'Northern Red Oak', 'Silver Linden',
       'Pignut Hickory', 'Kentucky Yellowwood', 'Mulberry', 'Douglas Fir',
       'Crepe Myrtle', 'Sassafras', 'Other Spruce', 'Chinese Elm',
       'Horse Chestnut', 'Holly', 'Little Leaf Linden', 'White Pine',
       'Blackgum', 'Japanese Tree Lilac', 'Hardy Rubber Tree', 'Green Ash',
       'English Oak', 'White Ash', 'Golden Raintree', 'Schumard Oak',
       'Siberian 

In [17]:
df05as.spc_common.unique()

array(['Callery Pear', 'London Planetree', 'Crimson King Maple',
       'Other Hawthorn', 'Norway Maple', 'Honey Locust',
       'Little Leaf Linden', 'Unknown', 'Ginkgo', 'Goldenrain Tree',
       'Flowering Dogwood', 'Sycamore Maple', 'Red Maple',
       'Purpleleaf Plum', 'Japanese Zelkova', 'Pin Oak', 'Green Ash',
       'Japanese Tree Lilac', 'Other Cherry', 'Amur Maple',
       'Japanese Pagoda Tree', 'Katsura Tree', 'Northern Red Oak',
       'American Elm', 'Chinese Elm', 'Silver Linden', 'White Oak',
       'American Linden', 'Swamp White Oak', 'Tree Of Heaven',
       'Dawn Redwood', 'Sawtooth Oak', 'Black Locust', 'Willow Oak',
       'American Hornbeam', 'Eastern Redbud', 'Kentucky Coffeetree',
       'Shubert Chokecherry', 'Silver Maple', 'Hackberry', 'Baldcypress',
       'European Hornbeam', 'Other Poplar', 'Hedge Maple', 'Mulberry',
       'Other Elm', 'Japanese Maple', 'Other Magnolia', 'Crabapple',
       'Other Oak', 'Siberian Elm', 'Sweetgum', 'Laurel Oak', 'Holly',

In [18]:
df95as.spc_common.unique()

array(['Norway Maple', 'Japanese Pagoda Tree', 'Pin Oak',
       'Little Leaf Linden', 'Green Ash', 'Honey Locust',
       'Eastern Hop Hornbeam', 'Ginkgo', 'American Elm', 'Other Cherry',
       'Callery Pear', 'London Planetree', 'Northern Red Oak',
       'Silver Maple', 'Japanese Zelkova', 'Tree Of Heaven',
       'Other Hawthorn', 'Cornelian Cherry', 'Other Birch', 'Willow Oak',
       'Apple', 'Unknown', 'Blackgum', 'Other Magnolia', 'Sycamore Maple',
       'Other Maple', 'Red Maple', 'Sugar Maple', 'Coast Redwood',
       'Other Fir', 'Other Linden', 'Other Oak', 'Hackberry', 'Sweetgum',
       'Other Willow', 'Other Elm', 'Other Serviceberry', 'Other Pine',
       'Ilex', 'Black Locust', 'Amur Corktree', 'American Beech',
       'Black Cherry', 'White Mulberry', 'Balsam Poplar', 'Mimosa',
       'Other Beech', 'Red Mulberry', 'Eastern Redcedar',
       'Atlantic White Cedar', 'Other Spruce', 'European Mountain Ash',
       'Leyland Cypress', 'Eastern Arborvitae', 'Eastern Redb

In [19]:
len(df95as[df95as.spc_common == 'Other Linden'])

463

## Prepare the Dataframes for Alive Trees

### Get numbers for total tree counts

In [20]:
print('Number of trees in 1995:', len(df95as))
print('Number of trees in 2005:', len(df05as))
print('Number of trees in 2015:', len(df15as))

Number of trees in 1995: 471748
Number of trees in 2005: 584011
Number of trees in 2015: 650987


### Get df for borough counts and densities

Load the file with borough areas (in sq. miles). The adjusted area subtracts the areas of the 10 biggest parks in NYC and the land areas of LaGuardia and JFK airports. Area information found at:
* https://en.wikipedia.org/wiki/Boroughs_of_New_York_City
* https://www.nycgovparks.org/about/faq
* https://en.wikipedia.org/wiki/LaGuardia_Airport
* https://www.panynj.gov/airports/jfk-facts-info.html

In [21]:
borough_areas = pd.read_csv('borough_areas.csv')

In [22]:
borough_areas

| | borough | area | area_adjusted |
|---|---|---|---|
| 0 | Queens | 109.0 | 97.0 |
| 1 | Brooklyn | 71.0 | 69.8 |
| 2 | Staten Island | 58.5 | 54.7 |
| 3 | Bronx | 42.0 | 34.8 |
| 4 | Manhattan | 22.8 | 21.5 |


In [23]:
boroughs = pd.DataFrame(df15as.boroname.value_counts().reset_index())
boroughs.columns = ['borough', 'count_15']
boroughs = pd.merge(boroughs, borough_areas, on='borough', how='left')

boroughs05 = pd.DataFrame(df05as.boroname.value_counts().reset_index())
boroughs05.columns = ['borough', 'count_05']
boroughs = pd.merge(boroughs, boroughs05, on='borough', how='left')

boroughs95 = pd.DataFrame(df95as.boroname.value_counts().reset_index())
boroughs95.columns = ['borough', 'count_95']
boroughs = pd.merge(boroughs, boroughs95, on='borough', how='left')

boroughs['density_15'] = boroughs['count_15'] / boroughs['area_adjusted']
boroughs['density_05'] = boroughs['count_05'] / boroughs['area_adjusted']
boroughs['density_95'] = boroughs['count_95'] / boroughs['area_adjusted']

In [24]:
boroughs

| | borough | count_15 | area | area_adjusted | count_05 | count_95 | density_15 | density_05 | density_95 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Queens | 237812 | 109.0 | 97.0 | 236345 | 205942 | 2451.670103 | 2436.546392 | 2123.113402 |
| 1 | Brooklyn | 169652 | 71.0 | 69.8 | 141308 | 107828 | 2430.544413 | 2024.469914 | 1544.813754 |
| 2 | Staten Island | 101443 | 58.5 | 54.7 | 98425 | 73158 | 1854.533821 | 1799.360146 | 1337.440585 |
| 3 | Bronx | 80585 | 42.0 | 34.8 | 58710 | 43212 | 2315.66092 | 1687.068966 | 1241.724138 |
| 4 | Manhattan | 61495 | 22.8 | 21.5 | 49223 | 41608 | 2860.232558 | 2289.44186 | 1935.255814 |


In [25]:
boroughs.to_csv('/Users/ingasilk/Projects/NYCTreeAnalysis/nyc-tree-census/alive_tree_borough_densities.csv', 
                index=False)

Manhattan is now the greenest borough in NYC by street tree density, but this wasn't the case 10 and 20 years ago: Queens used to be the queen of trees, with the Bronx at the bottom. Now Staten Island is the least lush borough.
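
The ranking can be checked directly from the table, along with the percent change in tree counts that the borough growth chart needs. This is a minimal sketch that re-enters the 1995 and 2015 counts and adjusted areas shown above by hand; the `pct_change_95_15` column name is made up for this example.

```python
import pandas as pd

# Counts and adjusted areas copied from the boroughs table above.
toy_boroughs = pd.DataFrame({
    'borough': ['Queens', 'Brooklyn', 'Staten Island', 'Bronx', 'Manhattan'],
    'count_15': [237812, 169652, 101443, 80585, 61495],
    'count_95': [205942, 107828, 73158, 43212, 41608],
    'area_adjusted': [97.0, 69.8, 54.7, 34.8, 21.5],
}).set_index('borough')

# Trees per sq. mile of adjusted area.
for year in ['15', '95']:
    toy_boroughs['density_' + year] = toy_boroughs['count_' + year] / toy_boroughs['area_adjusted']

# Percent change in the number of live trees from 1995 to 2015.
toy_boroughs['pct_change_95_15'] = 100 * (toy_boroughs['count_15'] / toy_boroughs['count_95'] - 1)

print(toy_boroughs[['density_95', 'density_15', 'pct_change_95_15']].round(1))
print('Densest in 2015:', toy_boroughs['density_15'].idxmax())
print('Least dense in 2015:', toy_boroughs['density_15'].idxmin())
```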

### Get df for popular tree species

Get 15 most popular trees for the three datasets. The union of them is the list "pop_trees".

In [26]:
pop_trees = ['London Planetree', 'Honey Locust', 'Callery Pear', 'Pin Oak', 'Norway Maple', 'Little Leaf Linden',
             'Other Cherry', 'Japanese Zelkova', 'Ginkgo', 'Japanese Pagoda Tree', 'Red Maple', 'Green Ash',
             'American Linden', 'Silver Maple', 'Sweetgum', 'Northern Red Oak', 'Sugar Maple', 'Sycamore Maple']
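
The list above appears to be typed out by hand; it could also be derived from the value counts. Below is a sketch on toy data with a top-2 cutoff instead of 15. `top_species` is a hypothetical helper, and ties at the cutoff would need more care than this shows.

```python
import pandas as pd

def top_species(species, n):
    """Names of the n most frequent values in a Series of species names."""
    return set(species.value_counts().head(n).index)

# Made-up species columns standing in for df95as, df05as and df15as.
s95 = pd.Series(['Norway Maple'] * 3 + ['Pin Oak'] * 2 + ['Ginkgo'])
s05 = pd.Series(['Norway Maple'] * 3 + ['Callery Pear'] * 2 + ['Ginkgo'])
s15 = pd.Series(['Honey Locust'] * 3 + ['Callery Pear'] * 2 + ['Ginkgo'])

# Union of the per-year top lists, sorted for a stable order.
toy_pop_trees = sorted(top_species(s95, 2) | top_species(s05, 2) | top_species(s15, 2))
print(toy_pop_trees)
```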

In [27]:
populars = pd.DataFrame(df15as[df15as.spc_common.isin(pop_trees)].spc_common.value_counts().reset_index())
populars.columns = ['species', 'count_15']

populars05 = pd.DataFrame(df05as[df05as.spc_common.isin(pop_trees)].spc_common.value_counts().reset_index())
populars05.columns = ['species', 'count_05']
populars = pd.merge(populars, populars05, on='species', how='left')

populars95 = pd.DataFrame(df95as[df95as.spc_common.isin(pop_trees)].spc_common.value_counts().reset_index())
populars95.columns = ['species', 'count_95']
populars = pd.merge(populars, populars95, on='species', how='left')

populars

| | species | count_15 | count_05 | count_95 |
|---|---|---|---|---|
| 0 | London Planetree | 86933 | 89327 | 85634 |
| 1 | Honey Locust | 64236 | 52060 | 32231 |
| 2 | Callery Pear | 58916 | 63372 | 30253 |
| 3 | Pin Oak | 52948 | 43772 | 35713 |
| 4 | Norway Maple | 34167 | 73990 | 105393 |
| 5 | Little Leaf Linden | 29692 | 27475 | 25582 |
| 6 | Other Cherry | 29272 | 9563 | 710 |
| 7 | Japanese Zelkova | 29195 | 14556 | 5494 |
| 8 | Ginkgo | 20879 | 16155 | 13057 |
| 9 | Japanese Pagoda Tree | 19333 | 7025 | 8144 |


In [28]:
populars.to_csv('/Users/ingasilk/Projects/NYCTreeAnalysis/nyc-tree-census/popular_alive_trees.csv', index=False)

### Get df for tree densities for each zip code

Two zip codes that are in the 1995 and 2005 datasets but not in the 2015 one (and thus won't appear in the visualization) are 0 and 10044 (Roosevelt Island). There are 20,933 (8,735) trees in zip code 0 in the 1995 (2005) dataset.

In [30]:
alive_zip = pd.DataFrame(df15as.zipcode.value_counts().reset_index())
alive_zip.columns = ['zip_code', 'count_15']
alive_zip = pd.merge(alive_zip, zip_areas, on='zip_code', how='left')

azip05 = pd.DataFrame(df05as.zipcode.value_counts().reset_index())
azip05.columns = ['zip_code', 'count_05']
alive_zip = pd.merge(alive_zip, azip05, on='zip_code', how='left')

azip95 = pd.DataFrame(df95as.zipcode.value_counts().reset_index())
azip95.columns = ['zip_code', 'count_95']
alive_zip = pd.merge(alive_zip, azip95, on='zip_code', how='left')

alive_zip['density_15'] = alive_zip['count_15'] / alive_zip['area']
alive_zip['density_05'] = alive_zip['count_05'] / alive_zip['area']
alive_zip['density_95'] = alive_zip['count_95'] / alive_zip['area']

In [31]:
alive_zip.to_csv('/Users/ingasilk/Projects/NYCTreeAnalysis/nyc-tree-census/alive_tree_zip_densities.csv', index=False)

## Plots I Want to Make

* Density of dead trees per zip code for the 3 years.
* Total number of trees for NYC from 3 dfs. Need 3 numbers of tree counts.
* Percent growth chart from 3 dfs for each borough. Need tree counts per borough for 3 dfs. Also add a column of area of boroughs and tree density for 3 years. This can be one dataframe.
* List of most popular trees for 3 years. Do value count for each year and save top 15: rank, name, and count
* Density of trees for each zip in 2015. Need zips, counts per zip, area, density.