# NYC 2015 Street Tree Census
## Hackathon Project

### Lisa Hwang
#### February 3, 2020

New York City is home to many beautiful, old trees. With the help of city staffers and many volunteers, the 2015-2016 Street Tree Census recorded a total of 666,134 street trees on 131,488 blocks across the city.
- https://www.nycgovparks.org/trees/treescount
- https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh

I selected this dataset as part of a one-day hackathon exercise.

### Problem Statement
Given the data in the street tree census, can I predict tree health?

### Importing Libraries

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

### Data Cleaning and EDA

I'll start by reading in the dataset, which was downloaded from https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh.

In [2]:
# To be able to view all of the dataframe's columns
pd.set_option('display.max_columns', None)
df = pd.read_csv('2015_Street_Tree_Census_Data.csv')
df.head()

Unnamed: 0,tree_id,block_id,created_at,tree_dbh,stump_diam,curb_loc,status,health,spc_latin,spc_common,steward,guards,sidewalk,user_type,problems,root_stone,root_grate,root_other,trunk_wire,trnk_light,trnk_other,brch_light,brch_shoe,brch_other,address,postcode,zip_city,community board,borocode,borough,cncldist,st_assem,st_senate,nta,nta_name,boro_ct,state,latitude,longitude,x_sp,y_sp,council district,census tract,bin,bbl
0,180683,348711,08/27/2015,3,0,OnCurb,Alive,Fair,Acer rubrum,red maple,,,NoDamage,TreesCount Staff,,No,No,No,No,No,No,No,No,No,108-005 70 AVENUE,11375,Forest Hills,406,4,Queens,29,28,16,QN17,Forest Hills,4073900,New York,40.723092,-73.844215,1027431.148,202756.7687,29.0,739.0,4052307.0,4022210000.0
1,200540,315986,09/03/2015,21,0,OnCurb,Alive,Fair,Quercus palustris,pin oak,,,Damage,TreesCount Staff,Stones,Yes,No,No,No,No,No,No,No,No,147-074 7 AVENUE,11357,Whitestone,407,4,Queens,19,27,11,QN49,Whitestone,4097300,New York,40.794111,-73.818679,1034455.701,228644.8374,19.0,973.0,4101931.0,4044750000.0
2,204026,218365,09/05/2015,3,0,OnCurb,Alive,Good,Gleditsia triacanthos var. inermis,honeylocust,1or2,,Damage,Volunteer,,No,No,No,No,No,No,No,No,No,390 MORGAN AVENUE,11211,Brooklyn,301,3,Brooklyn,34,50,18,BK90,East Williamsburg,3044900,New York,40.717581,-73.936608,1001822.831,200716.8913,34.0,449.0,3338310.0,3028870000.0
3,204337,217969,09/05/2015,10,0,OnCurb,Alive,Good,Gleditsia triacanthos var. inermis,honeylocust,,,Damage,Volunteer,Stones,Yes,No,No,No,No,No,No,No,No,1027 GRAND STREET,11211,Brooklyn,301,3,Brooklyn,34,53,18,BK90,East Williamsburg,3044900,New York,40.713537,-73.934456,1002420.358,199244.2531,34.0,449.0,3338342.0,3029250000.0
4,189565,223043,08/30/2015,21,0,OnCurb,Alive,Good,Tilia americana,American linden,,,Damage,Volunteer,Stones,Yes,No,No,No,No,No,No,No,No,603 6 STREET,11215,Brooklyn,306,3,Brooklyn,39,44,21,BK37,Park Slope-Gowanus,3016500,New York,40.666778,-73.975979,990913.775,182202.426,39.0,165.0,3025654.0,3010850000.0


In [3]:
df.describe()

Unnamed: 0,tree_id,block_id,tree_dbh,stump_diam,postcode,community board,borocode,cncldist,st_assem,st_senate,boro_ct,latitude,longitude,x_sp,y_sp,council district,census tract,bin,bbl
count,683788.0,683788.0,683788.0,683788.0,683788.0,683788.0,683788.0,683788.0,683788.0,683788.0,683788.0,683788.0,683788.0,683788.0,683788.0,677269.0,677269.0,674229.0,674229.0
mean,365205.011085,313793.096236,11.279787,0.432463,10916.246044,343.505404,3.3585,29.943181,50.791583,20.615781,3404914.0,40.701261,-73.92406,1005280.0,194798.424625,30.02733,11957.368422,3495439.0,3413414000.0
std,208122.092902,114839.024312,8.723042,3.290241,651.553364,115.740601,1.166746,14.328531,18.96652,7.390844,1175863.0,0.090311,0.123583,34285.05,32902.061114,14.301717,30745.739811,1193275.0,1174892000.0
min,3.0,100002.0,0.0,0.0,83.0,101.0,1.0,1.0,23.0,10.0,1000201.0,40.498466,-74.254965,913349.3,120973.7922,1.0,1.0,1000000.0,0.0
25%,186582.75,221556.0,4.0,0.0,10451.0,302.0,3.0,19.0,33.0,14.0,3011700.0,40.631928,-73.9805,989657.8,169515.1537,19.0,202.0,3031991.0,3011240000.0
50%,366214.5,319967.0,9.0,0.0,11214.0,402.0,4.0,30.0,52.0,21.0,4008100.0,40.700612,-73.912911,1008386.0,194560.2525,30.0,516.0,4020352.0,4008560000.0
75%,546170.25,404624.0,16.0,0.0,11365.0,412.0,4.0,43.0,64.0,25.0,4103202.0,40.762228,-73.83491,1029991.0,217019.57195,43.0,1417.0,4263123.0,4105700000.0
max,722694.0,999999.0,450.0,140.0,11697.0,503.0,5.0,51.0,87.0,36.0,5032300.0,40.912918,-73.700488,1067248.0,271894.0921,51.0,157903.0,5515124.0,5080500000.0


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 683788 entries, 0 to 683787
Data columns (total 45 columns):
tree_id             683788 non-null int64
block_id            683788 non-null int64
created_at          683788 non-null object
tree_dbh            683788 non-null int64
stump_diam          683788 non-null int64
curb_loc            683788 non-null object
status              683788 non-null object
health              652172 non-null object
spc_latin           652169 non-null object
spc_common          652169 non-null object
steward             652173 non-null object
guards              652172 non-null object
sidewalk            652172 non-null object
user_type           683788 non-null object
problems            652124 non-null object
root_stone          683788 non-null object
root_grate          683788 non-null object
root_other          683788 non-null object
trunk_wire          683788 non-null object
trnk_light          683788 non-null object
trnk_other          683788 non-nu

In [5]:
df['status'].value_counts()

Alive    652173
Stump     17654
Dead      13961
Name: status, dtype: int64

In [6]:
df['health'].groupby(df['status']).value_counts()

status  health
Alive   Good      528850
        Fair       96504
        Poor       26818
Name: health, dtype: int64

Only trees with a ```status``` of ```Alive``` have a ```health``` value of ```Good```, ```Fair```, or ```Poor```.
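One way to double-check this is to count missing ```health``` values per ```status```. The snippet below is a minimal sketch on a made-up five-row frame (```toy``` is a hypothetical stand-in, not the census data); on the real dataframe the same expression should show nulls concentrated in the ```Dead``` and ```Stump``` groups.

```python
import pandas as pd

# Hypothetical toy frame standing in for the census data
toy = pd.DataFrame({
    'status': ['Alive', 'Alive', 'Dead', 'Stump', 'Alive'],
    'health': ['Good', 'Poor', None, None, 'Fair'],
})

# Count missing health values within each status group
null_by_status = toy['health'].isnull().groupby(toy['status']).sum()
print(null_by_status)
```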

#### Review of Nulls

In [7]:
df.isnull().sum()

tree_id                 0
block_id                0
created_at              0
tree_dbh                0
stump_diam              0
curb_loc                0
status                  0
health              31616
spc_latin           31619
spc_common          31619
steward             31615
guards              31616
sidewalk            31616
user_type               0
problems            31664
root_stone              0
root_grate              0
root_other              0
trunk_wire              0
trnk_light              0
trnk_other              0
brch_light              0
brch_shoe               0
brch_other              0
address                 0
postcode                0
zip_city                0
community board         0
borocode                0
borough                 0
cncldist                0
st_assem                0
st_senate               0
nta                     0
nta_name                0
boro_ct                 0
state                   0
latitude                0
longitude   

The data dictionaries at https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh and https://data.cityofnewyork.us/api/views/uvpi-gqnh/files/8705bfd6-993c-40c5-8620-0c81191c7e25?download=true&filename=StreetTreeCensus2015TreesDataDictionary20161102.pdf were consulted.

- ```health``` (31616) Indicates the user's perception of tree health. Field left blank if the tree is dead or stump.
- ```spc_latin``` (31619) Scientific name for species, e.g. "Acer rubrum"
- ```spc_common``` (31619) Common name for species, e.g. "red maple"
- ```steward``` (31615) Indicates the number of unique signs of stewardship observed for this tree. Not recorded for stumps or dead trees.
- ```guards``` (31616) Indicates whether a guard is present, and if the user felt it was a helpful or harmful guard. Not recorded for dead trees and stumps.
- ```sidewalk``` (31616) Indicates whether one of the sidewalk flags immediately adjacent to the tree was damaged, cracked, or lifted. Not recorded for dead trees and stumps.
- ```problems``` (31664) Specific problems were recorded as comma-separated strings, as the output below shows. The data dictionary has no information on this column. Some problems are also captured in other columns, such as ```root_stone```, ```trnk_light```, and ```brch_shoe```. There were 232 distinct values (combinations of problems), with 'None' being the most common at 426,280.
- ```council district``` (6519) Captured under ```cncldist```
- ```census tract``` (6519) Captured under ```boro_ct```
- ```bin``` (9559) Not in the data dictionary. Possibly the NYC Building Identification Number?
- ```bbl``` (9559) Not in the data dictionary. Possibly the NYC Borough-Block-Lot number?
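Because ```problems``` packs several issues into one comma-separated string, it could be split into one indicator column per issue. The following is only a sketch on a hypothetical four-row ```toy``` frame, using pandas' ```Series.str.get_dummies```; it is not a step taken in this notebook, and the existing ```root_*```, ```trnk_*```, and ```brch_*``` columns may already carry the same information.

```python
import pandas as pd

# Hypothetical toy frame; the strings mimic values seen in the problems column
toy = pd.DataFrame({'problems': ['None', 'Stones', 'Stones,BranchLights', None]})

# One 0/1 column per individual problem; a missing entry becomes an all-zero row
flags = toy['problems'].str.get_dummies(sep=',')
print(flags)
```

Note that the literal string 'None' gets its own column here, so it may be worth dropping that column if this approach is used.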

In [8]:
# Looking at the different types of problems
df['problems'].value_counts()

None                                                              426280
Stones                                                             95673
BranchLights                                                       29452
Stones,BranchLights                                                17808
RootOther                                                          11418
                                                                   ...  
Stones,MetalGrates,RootOther,WiresRope                                 1
MetalGrates,WiresRope,TrunkLights,BranchLights,BranchOther             1
TrunkLights,TrunkOther,BranchOther                                     1
Stones,RootOther,WiresRope,TrunkLights,TrunkOther,BranchLights         1
Stones,TrunkOther,BranchLights,Sneakers,BranchOther                    1
Name: problems, Length: 232, dtype: int64

Now I am going to remove all the rows for trees that are dead or stumps. This should remove many of the nulls in the dataset.

In [9]:
df = df[df['status'] == 'Alive'].copy().reset_index(drop = True)

In [10]:
# Checking that there are only living trees in the dataset
df['status'].value_counts()

Alive    652173
Name: status, dtype: int64

The values for the ```health``` column are currently strings. I'll convert them to the following:
- 1 = Good
- 2 = Fair
- 3 = Poor

In [11]:
# Creating a dictionary for mapping purposes
health_status = {'Good': 1, 'Fair': 2, 'Poor': 3}
health_status

{'Good': 1, 'Fair': 2, 'Poor': 3}

In [12]:
# Making a new column called 'health_status'
df['health_status'] = df['health'].map(health_status)
df.head()

Unnamed: 0,tree_id,block_id,created_at,tree_dbh,stump_diam,curb_loc,status,health,spc_latin,spc_common,steward,guards,sidewalk,user_type,problems,root_stone,root_grate,root_other,trunk_wire,trnk_light,trnk_other,brch_light,brch_shoe,brch_other,address,postcode,zip_city,community board,borocode,borough,cncldist,st_assem,st_senate,nta,nta_name,boro_ct,state,latitude,longitude,x_sp,y_sp,council district,census tract,bin,bbl,health_status
0,180683,348711,08/27/2015,3,0,OnCurb,Alive,Fair,Acer rubrum,red maple,,,NoDamage,TreesCount Staff,,No,No,No,No,No,No,No,No,No,108-005 70 AVENUE,11375,Forest Hills,406,4,Queens,29,28,16,QN17,Forest Hills,4073900,New York,40.723092,-73.844215,1027431.148,202756.7687,29.0,739.0,4052307.0,4022210000.0,2.0
1,200540,315986,09/03/2015,21,0,OnCurb,Alive,Fair,Quercus palustris,pin oak,,,Damage,TreesCount Staff,Stones,Yes,No,No,No,No,No,No,No,No,147-074 7 AVENUE,11357,Whitestone,407,4,Queens,19,27,11,QN49,Whitestone,4097300,New York,40.794111,-73.818679,1034455.701,228644.8374,19.0,973.0,4101931.0,4044750000.0,2.0
2,204026,218365,09/05/2015,3,0,OnCurb,Alive,Good,Gleditsia triacanthos var. inermis,honeylocust,1or2,,Damage,Volunteer,,No,No,No,No,No,No,No,No,No,390 MORGAN AVENUE,11211,Brooklyn,301,3,Brooklyn,34,50,18,BK90,East Williamsburg,3044900,New York,40.717581,-73.936608,1001822.831,200716.8913,34.0,449.0,3338310.0,3028870000.0,1.0
3,204337,217969,09/05/2015,10,0,OnCurb,Alive,Good,Gleditsia triacanthos var. inermis,honeylocust,,,Damage,Volunteer,Stones,Yes,No,No,No,No,No,No,No,No,1027 GRAND STREET,11211,Brooklyn,301,3,Brooklyn,34,53,18,BK90,East Williamsburg,3044900,New York,40.713537,-73.934456,1002420.358,199244.2531,34.0,449.0,3338342.0,3029250000.0,1.0
4,189565,223043,08/30/2015,21,0,OnCurb,Alive,Good,Tilia americana,American linden,,,Damage,Volunteer,Stones,Yes,No,No,No,No,No,No,No,No,603 6 STREET,11215,Brooklyn,306,3,Brooklyn,39,44,21,BK37,Park Slope-Gowanus,3016500,New York,40.666778,-73.975979,990913.775,182202.426,39.0,165.0,3025654.0,3010850000.0,1.0


In [13]:
df['health_status'].value_counts()

1.0    528850
2.0     96504
3.0     26818
Name: health_status, dtype: int64
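The classes are heavily imbalanced, which matters when judging any model later. As a rough worked example, the sketch below takes the three counts from the output above and computes each class's share; the largest share is the accuracy a model would get by always predicting the most common class. The variable names are mine and only illustrative.

```python
import pandas as pd

# Class counts copied from the value_counts() output above
counts = pd.Series({1: 528850, 2: 96504, 3: 26818})

# Share of each class among living trees with a recorded health value
shares = counts / counts.sum()

# Accuracy of a naive model that always predicts the most common class
baseline_accuracy = shares.max()
print(shares.round(4))
print(round(baseline_accuracy, 4))
```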

Since the type of tree may be important in predicting its health status, I will make dummy columns from ```spc_common```, making sure to drop the first column.
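To illustrate what dropping the first column does, here is a small sketch on a hypothetical ```toy``` frame. The alphabetically first category becomes the implicit baseline and is encoded as all zeros. One caveat, assuming the default ```dummy_na=False```: a row with a missing species is also encoded as all zeros, so it cannot be told apart from the baseline species.

```python
import pandas as pd

# Hypothetical toy frame with three species and one missing value
toy = pd.DataFrame(
    {'spc_common': ['pin oak', 'honeylocust', 'Callery pear', None]}
)

# 'Callery pear' sorts first, so drop_first=True removes its column
dummies = pd.get_dummies(toy, columns=['spc_common'], prefix='name', drop_first=True)
print(dummies)
```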

In [14]:
df['spc_common'].value_counts()

London planetree    87014
honeylocust         64263
Callery pear        58931
pin oak             53185
Norway maple        34189
                    ...  
black pine             37
pitch pine             33
Osage-orange           29
Scots pine             25
Virginia pine          10
Name: spc_common, Length: 132, dtype: int64

In [15]:
df = pd.get_dummies(df, columns=['spc_common'], prefix='name', drop_first = True)
df.head()

Unnamed: 0,tree_id,block_id,created_at,tree_dbh,stump_diam,curb_loc,status,health,spc_latin,steward,guards,sidewalk,user_type,problems,root_stone,root_grate,root_other,trunk_wire,trnk_light,trnk_other,brch_light,brch_shoe,brch_other,address,postcode,zip_city,community board,borocode,borough,cncldist,st_assem,st_senate,nta,nta_name,boro_ct,state,latitude,longitude,x_sp,y_sp,council district,census tract,bin,bbl,health_status,name_American beech,name_American elm,name_American hophornbeam,name_American hornbeam,name_American larch,name_American linden,name_Amur cork tree,name_Amur maackia,name_Amur maple,name_Atlantic white cedar,name_Atlas cedar,name_Callery pear,name_Chinese chestnut,name_Chinese elm,name_Chinese fringetree,name_Chinese tree lilac,name_Cornelian cherry,name_Douglas-fir,name_English oak,name_European alder,name_European beech,name_European hornbeam,name_Himalayan cedar,name_Japanese hornbeam,name_Japanese maple,name_Japanese snowbell,name_Japanese tree lilac,name_Japanese zelkova,name_Kentucky coffeetree,name_Kentucky yellowwood,name_London planetree,name_Norway maple,name_Norway spruce,name_Ohio buckeye,name_Oklahoma redbud,name_Osage-orange,name_Persian ironwood,name_Schumard's oak,name_Scots pine,name_Shantung maple,name_Siberian elm,name_Sophora,name_Turkish hazelnut,name_Virginia pine,name_arborvitae,name_ash,name_bald cypress,name_bigtooth aspen,name_black cherry,name_black locust,name_black maple,name_black oak,name_black pine,name_black walnut,name_blackgum,name_blue spruce,name_boxelder,name_bur oak,name_catalpa,name_cherry,name_cockspur hawthorn,name_common hackberry,name_crab apple,name_crepe myrtle,name_crimson king maple,name_cucumber magnolia,name_dawn redwood,name_eastern cottonwood,name_eastern hemlock,name_eastern redbud,name_eastern redcedar,name_empress tree,name_false cypress,name_flowering dogwood,name_ginkgo,name_golden raintree,name_green ash,name_hardy rubber tree,name_hawthorn,name_hedge 
maple,name_holly,name_honeylocust,name_horse chestnut,name_katsura tree,name_kousa dogwood,name_littleleaf linden,name_magnolia,name_maple,name_mimosa,name_mulberry,name_northern red oak,name_pagoda dogwood,name_paper birch,name_paperbark maple,name_pignut hickory,name_pin oak,name_pine,name_pitch pine,name_pond cypress,name_purple-leaf plum,name_quaking aspen,name_red horse chestnut,name_red maple,name_red pine,name_river birch,name_sassafras,name_sawtooth oak,name_scarlet oak,name_serviceberry,name_shingle oak,name_silver birch,name_silver linden,name_silver maple,name_smoketree,name_southern magnolia,name_southern red oak,name_spruce,name_sugar maple,name_swamp white oak,name_sweetgum,name_sycamore maple,name_tartar maple,name_tree of heaven,name_trident maple,name_tulip-poplar,name_two-winged silverbell,name_weeping willow,name_white ash,name_white oak,name_white pine,name_willow oak
0,180683,348711,08/27/2015,3,0,OnCurb,Alive,Fair,Acer rubrum,,,NoDamage,TreesCount Staff,,No,No,No,No,No,No,No,No,No,108-005 70 AVENUE,11375,Forest Hills,406,4,Queens,29,28,16,QN17,Forest Hills,4073900,New York,40.723092,-73.844215,1027431.148,202756.7687,29.0,739.0,4052307.0,4022210000.0,2.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,200540,315986,09/03/2015,21,0,OnCurb,Alive,Fair,Quercus palustris,,,Damage,TreesCount Staff,Stones,Yes,No,No,No,No,No,No,No,No,147-074 7 AVENUE,11357,Whitestone,407,4,Queens,19,27,11,QN49,Whitestone,4097300,New York,40.794111,-73.818679,1034455.701,228644.8374,19.0,973.0,4101931.0,4044750000.0,2.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,204026,218365,09/05/2015,3,0,OnCurb,Alive,Good,Gleditsia triacanthos var. inermis,1or2,,Damage,Volunteer,,No,No,No,No,No,No,No,No,No,390 MORGAN AVENUE,11211,Brooklyn,301,3,Brooklyn,34,50,18,BK90,East Williamsburg,3044900,New York,40.717581,-73.936608,1001822.831,200716.8913,34.0,449.0,3338310.0,3028870000.0,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,204337,217969,09/05/2015,10,0,OnCurb,Alive,Good,Gleditsia triacanthos var. inermis,,,Damage,Volunteer,Stones,Yes,No,No,No,No,No,No,No,No,1027 GRAND STREET,11211,Brooklyn,301,3,Brooklyn,34,53,18,BK90,East Williamsburg,3044900,New York,40.713537,-73.934456,1002420.358,199244.2531,34.0,449.0,3338342.0,3029250000.0,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,189565,223043,08/30/2015,21,0,OnCurb,Alive,Good,Tilia americana,,,Damage,Volunteer,Stones,Yes,No,No,No,No,No,No,No,No,603 6 STREET,11215,Brooklyn,306,3,Brooklyn,39,44,21,BK37,Park Slope-Gowanus,3016500,New York,40.666778,-73.975979,990913.775,182202.426,39.0,165.0,3025654.0,3010850000.0,1.0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Now I'm going to recode a few more columns that I want to use in my model so that the values are no longer strings.
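A compact way to sketch this kind of recoding, assuming several columns share the same Yes/No values, is to apply one mapping across a list of columns. The example uses a hypothetical three-row ```toy``` frame. It also shows a property of ```Series.map``` worth keeping in mind: any value that is missing from the dictionary (including an existing null) comes back as NaN, so a null count afterwards is a quick sanity check.

```python
import pandas as pd

# Hypothetical toy frame with two Yes/No columns and one Damage/NoDamage column
toy = pd.DataFrame({
    'root_stone': ['Yes', 'No', 'No'],
    'brch_shoe': ['No', 'No', 'Yes'],
    'sidewalk': ['Damage', 'NoDamage', None],
})

# Apply the same Yes/No mapping to every column in the list
yes_no_cols = ['root_stone', 'brch_shoe']
toy[yes_no_cols] = toy[yes_no_cols].apply(lambda col: col.map({'Yes': 1, 'No': 0}))
toy['sidewalk'] = toy['sidewalk'].map({'Damage': 1, 'NoDamage': 0})

# Values absent from a mapping become NaN, so count nulls after recoding
unmapped = toy.isnull().sum()
print(unmapped)
```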

In [16]:
df['curb_loc'] = df['curb_loc'].map({'OnCurb': 1, 'OffsetFromCurb': 0})
df['steward'] = df['steward'].map({'1or2': 1, '3or4': 2, '4orMore': 3, 'None': 0})
df['guards'] = df['guards'].map({'Harmful': 2, 'Helpful': 3, 'Unsure': 1, 'None': 0})
df['sidewalk'] = df['sidewalk'].map({'Damage': 1, 'NoDamage': 0})
df['root_stone'] = df['root_stone'].map({'Yes': 1, 'No': 0})
df['root_grate'] = df['root_grate'].map({'Yes': 1, 'No': 0})
df['root_other'] = df['root_other'].map({'Yes': 1, 'No': 0})
df['trunk_wire'] = df['trunk_wire'].map({'Yes': 1, 'No': 0})
df['trnk_light'] = df['trnk_light'].map({'Yes': 1, 'No': 0})
df['trnk_other'] = df['trnk_other'].map({'Yes': 1, 'No': 0})
df['brch_light'] = df['brch_light'].map({'Yes': 1, 'No': 0})
df['brch_shoe'] = df['brch_shoe'].map({'Yes': 1, 'No': 0})
df['brch_other'] = df['brch_other'].map({'Yes': 1, 'No': 0})
df.head()

Unnamed: 0,tree_id,block_id,created_at,tree_dbh,stump_diam,curb_loc,status,health,spc_latin,steward,guards,sidewalk,user_type,problems,root_stone,root_grate,root_other,trunk_wire,trnk_light,trnk_other,brch_light,brch_shoe,brch_other,address,postcode,zip_city,community board,borocode,borough,cncldist,st_assem,st_senate,nta,nta_name,boro_ct,state,latitude,longitude,x_sp,y_sp,council district,census tract,bin,bbl,health_status,name_American beech,name_American elm,name_American hophornbeam,name_American hornbeam,name_American larch,name_American linden,name_Amur cork tree,name_Amur maackia,name_Amur maple,name_Atlantic white cedar,name_Atlas cedar,name_Callery pear,name_Chinese chestnut,name_Chinese elm,name_Chinese fringetree,name_Chinese tree lilac,name_Cornelian cherry,name_Douglas-fir,name_English oak,name_European alder,name_European beech,name_European hornbeam,name_Himalayan cedar,name_Japanese hornbeam,name_Japanese maple,name_Japanese snowbell,name_Japanese tree lilac,name_Japanese zelkova,name_Kentucky coffeetree,name_Kentucky yellowwood,name_London planetree,name_Norway maple,name_Norway spruce,name_Ohio buckeye,name_Oklahoma redbud,name_Osage-orange,name_Persian ironwood,name_Schumard's oak,name_Scots pine,name_Shantung maple,name_Siberian elm,name_Sophora,name_Turkish hazelnut,name_Virginia pine,name_arborvitae,name_ash,name_bald cypress,name_bigtooth aspen,name_black cherry,name_black locust,name_black maple,name_black oak,name_black pine,name_black walnut,name_blackgum,name_blue spruce,name_boxelder,name_bur oak,name_catalpa,name_cherry,name_cockspur hawthorn,name_common hackberry,name_crab apple,name_crepe myrtle,name_crimson king maple,name_cucumber magnolia,name_dawn redwood,name_eastern cottonwood,name_eastern hemlock,name_eastern redbud,name_eastern redcedar,name_empress tree,name_false cypress,name_flowering dogwood,name_ginkgo,name_golden raintree,name_green ash,name_hardy rubber tree,name_hawthorn,name_hedge 
maple,name_holly,name_honeylocust,name_horse chestnut,name_katsura tree,name_kousa dogwood,name_littleleaf linden,name_magnolia,name_maple,name_mimosa,name_mulberry,name_northern red oak,name_pagoda dogwood,name_paper birch,name_paperbark maple,name_pignut hickory,name_pin oak,name_pine,name_pitch pine,name_pond cypress,name_purple-leaf plum,name_quaking aspen,name_red horse chestnut,name_red maple,name_red pine,name_river birch,name_sassafras,name_sawtooth oak,name_scarlet oak,name_serviceberry,name_shingle oak,name_silver birch,name_silver linden,name_silver maple,name_smoketree,name_southern magnolia,name_southern red oak,name_spruce,name_sugar maple,name_swamp white oak,name_sweetgum,name_sycamore maple,name_tartar maple,name_tree of heaven,name_trident maple,name_tulip-poplar,name_two-winged silverbell,name_weeping willow,name_white ash,name_white oak,name_white pine,name_willow oak
0,180683,348711,08/27/2015,3,0,1,Alive,Fair,Acer rubrum,0,0.0,0.0,TreesCount Staff,,0,0,0,0,0,0,0,0,0,108-005 70 AVENUE,11375,Forest Hills,406,4,Queens,29,28,16,QN17,Forest Hills,4073900,New York,40.723092,-73.844215,1027431.148,202756.7687,29.0,739.0,4052307.0,4022210000.0,2.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,200540,315986,09/03/2015,21,0,1,Alive,Fair,Quercus palustris,0,0.0,1.0,TreesCount Staff,Stones,1,0,0,0,0,0,0,0,0,147-074 7 AVENUE,11357,Whitestone,407,4,Queens,19,27,11,QN49,Whitestone,4097300,New York,40.794111,-73.818679,1034455.701,228644.8374,19.0,973.0,4101931.0,4044750000.0,2.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,204026,218365,09/05/2015,3,0,1,Alive,Good,Gleditsia triacanthos var. inermis,1,0.0,1.0,Volunteer,,0,0,0,0,0,0,0,0,0,390 MORGAN AVENUE,11211,Brooklyn,301,3,Brooklyn,34,50,18,BK90,East Williamsburg,3044900,New York,40.717581,-73.936608,1001822.831,200716.8913,34.0,449.0,3338310.0,3028870000.0,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,204337,217969,09/05/2015,10,0,1,Alive,Good,Gleditsia triacanthos var. inermis,0,0.0,1.0,Volunteer,Stones,1,0,0,0,0,0,0,0,0,1027 GRAND STREET,11211,Brooklyn,301,3,Brooklyn,34,53,18,BK90,East Williamsburg,3044900,New York,40.713537,-73.934456,1002420.358,199244.2531,34.0,449.0,3338342.0,3029250000.0,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,189565,223043,08/30/2015,21,0,1,Alive,Good,Tilia americana,0,0.0,1.0,Volunteer,Stones,1,0,0,0,0,0,0,0,0,603 6 STREET,11215,Brooklyn,306,3,Brooklyn,39,44,21,BK37,Park Slope-Gowanus,3016500,New York,40.666778,-73.975979,990913.775,182202.426,39.0,165.0,3025654.0,3010850000.0,1.0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [17]:
pd.set_option('display.max_rows', None)  # Show every row of the output so no column's null count is hidden
df.isnull().sum()

tree_id                          0
block_id                         0
created_at                       0
tree_dbh                         0
stump_diam                       0
curb_loc                         0
status                           0
health                           1
spc_latin                        5
steward                          0
guards                           1
sidewalk                         1
user_type                        0
problems                        49
root_stone                       0
root_grate                       0
root_other                       0
trunk_wire                       0
trnk_light                       0
trnk_other                       0
brch_light                       0
brch_shoe                        0
brch_other                       0
address                          0
postcode                         0
zip_city                         0
community board                  0
borocode                         0
borough             

I'll review the rows that have nulls in these columns.
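Instead of filtering one column at a time, the affected rows could also be pulled in a single step. This is a minimal sketch on a hypothetical ```toy``` frame; ```check_cols``` is an illustrative name for whichever columns are of interest.

```python
import pandas as pd

# Hypothetical toy frame with a few scattered nulls
toy = pd.DataFrame({
    'tree_id': [1, 2, 3, 4],
    'health': ['Good', None, 'Poor', 'Fair'],
    'guards': [0.0, 0.0, None, 0.0],
    'sidewalk': [1.0, 1.0, 0.0, 0.0],
})

# Keep rows where at least one of the selected columns is null
check_cols = ['health', 'guards', 'sidewalk']
rows_with_nulls = toy[toy[check_cols].isnull().any(axis=1)]
print(rows_with_nulls)
```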

In [18]:
df.loc[df['health_status'].isnull()]

Unnamed: 0,tree_id,block_id,created_at,tree_dbh,stump_diam,curb_loc,status,health,spc_latin,steward,guards,sidewalk,user_type,problems,root_stone,root_grate,root_other,trunk_wire,trnk_light,trnk_other,brch_light,brch_shoe,brch_other,address,postcode,zip_city,community board,borocode,borough,cncldist,st_assem,st_senate,nta,nta_name,boro_ct,state,latitude,longitude,x_sp,y_sp,council district,census tract,bin,bbl,health_status,name_American beech,name_American elm,name_American hophornbeam,name_American hornbeam,name_American larch,name_American linden,name_Amur cork tree,name_Amur maackia,name_Amur maple,name_Atlantic white cedar,name_Atlas cedar,name_Callery pear,name_Chinese chestnut,name_Chinese elm,name_Chinese fringetree,name_Chinese tree lilac,name_Cornelian cherry,name_Douglas-fir,name_English oak,name_European alder,name_European beech,name_European hornbeam,name_Himalayan cedar,name_Japanese hornbeam,name_Japanese maple,name_Japanese snowbell,name_Japanese tree lilac,name_Japanese zelkova,name_Kentucky coffeetree,name_Kentucky yellowwood,name_London planetree,name_Norway maple,name_Norway spruce,name_Ohio buckeye,name_Oklahoma redbud,name_Osage-orange,name_Persian ironwood,name_Schumard's oak,name_Scots pine,name_Shantung maple,name_Siberian elm,name_Sophora,name_Turkish hazelnut,name_Virginia pine,name_arborvitae,name_ash,name_bald cypress,name_bigtooth aspen,name_black cherry,name_black locust,name_black maple,name_black oak,name_black pine,name_black walnut,name_blackgum,name_blue spruce,name_boxelder,name_bur oak,name_catalpa,name_cherry,name_cockspur hawthorn,name_common hackberry,name_crab apple,name_crepe myrtle,name_crimson king maple,name_cucumber magnolia,name_dawn redwood,name_eastern cottonwood,name_eastern hemlock,name_eastern redbud,name_eastern redcedar,name_empress tree,name_false cypress,name_flowering dogwood,name_ginkgo,name_golden raintree,name_green ash,name_hardy rubber tree,name_hawthorn,name_hedge 
maple,name_holly,name_honeylocust,name_horse chestnut,name_katsura tree,name_kousa dogwood,name_littleleaf linden,name_magnolia,name_maple,name_mimosa,name_mulberry,name_northern red oak,name_pagoda dogwood,name_paper birch,name_paperbark maple,name_pignut hickory,name_pin oak,name_pine,name_pitch pine,name_pond cypress,name_purple-leaf plum,name_quaking aspen,name_red horse chestnut,name_red maple,name_red pine,name_river birch,name_sassafras,name_sawtooth oak,name_scarlet oak,name_serviceberry,name_shingle oak,name_silver birch,name_silver linden,name_silver maple,name_smoketree,name_southern magnolia,name_southern red oak,name_spruce,name_sugar maple,name_swamp white oak,name_sweetgum,name_sycamore maple,name_tartar maple,name_tree of heaven,name_trident maple,name_tulip-poplar,name_two-winged silverbell,name_weeping willow,name_white ash,name_white oak,name_white pine,name_willow oak
31282,245041,413012,09/21/2015,16,0,1,Alive,,Fraxinus pennsylvanica,0,0.0,1.0,TreesCount Staff,,0,0,0,0,0,0,0,0,0,84 LUCILLE AVENUE,10309,Staten Island,503,5,Staten Island,51,62,24,SI32,Rossville-Woodrow,5020801,New York,40.548597,-74.216412,924106.8808,139219.632,51.0,20801.0,5086132.0,5070480000.0,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [19]:
df.loc[df['guards'].isnull()]

Unnamed: 0,tree_id,block_id,created_at,tree_dbh,stump_diam,curb_loc,status,health,spc_latin,steward,guards,sidewalk,user_type,problems,root_stone,root_grate,root_other,trunk_wire,trnk_light,trnk_other,brch_light,brch_shoe,brch_other,address,postcode,zip_city,community board,borocode,borough,cncldist,st_assem,st_senate,nta,nta_name,boro_ct,state,latitude,longitude,x_sp,y_sp,council district,census tract,bin,bbl,health_status,name_American beech,name_American elm,name_American hophornbeam,name_American hornbeam,name_American larch,name_American linden,name_Amur cork tree,name_Amur maackia,name_Amur maple,name_Atlantic white cedar,name_Atlas cedar,name_Callery pear,name_Chinese chestnut,name_Chinese elm,name_Chinese fringetree,name_Chinese tree lilac,name_Cornelian cherry,name_Douglas-fir,name_English oak,name_European alder,name_European beech,name_European hornbeam,name_Himalayan cedar,name_Japanese hornbeam,name_Japanese maple,name_Japanese snowbell,name_Japanese tree lilac,name_Japanese zelkova,name_Kentucky coffeetree,name_Kentucky yellowwood,name_London planetree,name_Norway maple,name_Norway spruce,name_Ohio buckeye,name_Oklahoma redbud,name_Osage-orange,name_Persian ironwood,name_Schumard's oak,name_Scots pine,name_Shantung maple,name_Siberian elm,name_Sophora,name_Turkish hazelnut,name_Virginia pine,name_arborvitae,name_ash,name_bald cypress,name_bigtooth aspen,name_black cherry,name_black locust,name_black maple,name_black oak,name_black pine,name_black walnut,name_blackgum,name_blue spruce,name_boxelder,name_bur oak,name_catalpa,name_cherry,name_cockspur hawthorn,name_common hackberry,name_crab apple,name_crepe myrtle,name_crimson king maple,name_cucumber magnolia,name_dawn redwood,name_eastern cottonwood,name_eastern hemlock,name_eastern redbud,name_eastern redcedar,name_empress tree,name_false cypress,name_flowering dogwood,name_ginkgo,name_golden raintree,name_green ash,name_hardy rubber tree,name_hawthorn,name_hedge 
maple,name_holly,name_honeylocust,name_horse chestnut,name_katsura tree,name_kousa dogwood,name_littleleaf linden,name_magnolia,name_maple,name_mimosa,name_mulberry,name_northern red oak,name_pagoda dogwood,name_paper birch,name_paperbark maple,name_pignut hickory,name_pin oak,name_pine,name_pitch pine,name_pond cypress,name_purple-leaf plum,name_quaking aspen,name_red horse chestnut,name_red maple,name_red pine,name_river birch,name_sassafras,name_sawtooth oak,name_scarlet oak,name_serviceberry,name_shingle oak,name_silver birch,name_silver linden,name_silver maple,name_smoketree,name_southern magnolia,name_southern red oak,name_spruce,name_sugar maple,name_swamp white oak,name_sweetgum,name_sycamore maple,name_tartar maple,name_tree of heaven,name_trident maple,name_tulip-poplar,name_two-winged silverbell,name_weeping willow,name_white ash,name_white oak,name_white pine,name_willow oak
407647,630814,323764,07/18/2016,11,0,1,Alive,Poor,,0,,1.0,TreesCount Staff,,0,0,0,0,0,0,0,0,0,38-028 CEDAR LANE,11363,Little Neck,411,4,Queens,19,26,11,QN45,Douglas Manor-Douglaston-Little Neck,4148300,New York,40.771945,-73.750414,1053380.635,220615.7964,19.0,1483.0,4168471.0,4080630000.0,3.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [20]:
df.loc[df['sidewalk'].isnull()]

Unnamed: 0,tree_id,block_id,created_at,tree_dbh,stump_diam,curb_loc,status,health,spc_latin,steward,guards,sidewalk,user_type,problems,root_stone,root_grate,root_other,trunk_wire,trnk_light,trnk_other,brch_light,brch_shoe,brch_other,address,postcode,zip_city,community board,borocode,borough,cncldist,st_assem,st_senate,nta,nta_name,boro_ct,state,latitude,longitude,x_sp,y_sp,council district,census tract,bin,bbl,health_status,name_American beech,name_American elm,name_American hophornbeam,name_American hornbeam,name_American larch,name_American linden,name_Amur cork tree,name_Amur maackia,name_Amur maple,name_Atlantic white cedar,name_Atlas cedar,name_Callery pear,name_Chinese chestnut,name_Chinese elm,name_Chinese fringetree,name_Chinese tree lilac,name_Cornelian cherry,name_Douglas-fir,name_English oak,name_European alder,name_European beech,name_European hornbeam,name_Himalayan cedar,name_Japanese hornbeam,name_Japanese maple,name_Japanese snowbell,name_Japanese tree lilac,name_Japanese zelkova,name_Kentucky coffeetree,name_Kentucky yellowwood,name_London planetree,name_Norway maple,name_Norway spruce,name_Ohio buckeye,name_Oklahoma redbud,name_Osage-orange,name_Persian ironwood,name_Schumard's oak,name_Scots pine,name_Shantung maple,name_Siberian elm,name_Sophora,name_Turkish hazelnut,name_Virginia pine,name_arborvitae,name_ash,name_bald cypress,name_bigtooth aspen,name_black cherry,name_black locust,name_black maple,name_black oak,name_black pine,name_black walnut,name_blackgum,name_blue spruce,name_boxelder,name_bur oak,name_catalpa,name_cherry,name_cockspur hawthorn,name_common hackberry,name_crab apple,name_crepe myrtle,name_crimson king maple,name_cucumber magnolia,name_dawn redwood,name_eastern cottonwood,name_eastern hemlock,name_eastern redbud,name_eastern redcedar,name_empress tree,name_false cypress,name_flowering dogwood,name_ginkgo,name_golden raintree,name_green ash,name_hardy rubber tree,name_hawthorn,name_hedge 
maple,name_holly,name_honeylocust,name_horse chestnut,name_katsura tree,name_kousa dogwood,name_littleleaf linden,name_magnolia,name_maple,name_mimosa,name_mulberry,name_northern red oak,name_pagoda dogwood,name_paper birch,name_paperbark maple,name_pignut hickory,name_pin oak,name_pine,name_pitch pine,name_pond cypress,name_purple-leaf plum,name_quaking aspen,name_red horse chestnut,name_red maple,name_red pine,name_river birch,name_sassafras,name_sawtooth oak,name_scarlet oak,name_serviceberry,name_shingle oak,name_silver birch,name_silver linden,name_silver maple,name_smoketree,name_southern magnolia,name_southern red oak,name_spruce,name_sugar maple,name_swamp white oak,name_sweetgum,name_sycamore maple,name_tartar maple,name_tree of heaven,name_trident maple,name_tulip-poplar,name_two-winged silverbell,name_weeping willow,name_white ash,name_white oak,name_white pine,name_willow oak
329915,540677,202468,12/29/2015,7,0,0,Alive,Good,Gleditsia triacanthos var. inermis,0,0.0,,TreesCount Staff,,0,0,0,0,0,0,0,0,0,1220 LIBERTY AVENUE,11208,Brooklyn,305,3,Brooklyn,37,54,19,QN56,Ozone Park,3118800,New York,40.67909,-73.864029,1021964.091,186716.496,37.0,1188.0,3094522.0,3042060000.0,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [21]:
# Dropping the rows with nulls for 'health_status', 'guards', and 'sidewalk'
df.drop(df.index[[31282, 407647, 329915]], inplace = True)
df.loc[df['sidewalk'].isnull()]

Unnamed: 0,tree_id,block_id,created_at,tree_dbh,stump_diam,curb_loc,status,health,spc_latin,steward,guards,sidewalk,user_type,problems,root_stone,root_grate,root_other,trunk_wire,trnk_light,trnk_other,brch_light,brch_shoe,brch_other,address,postcode,zip_city,community board,borocode,borough,cncldist,st_assem,st_senate,nta,nta_name,boro_ct,state,latitude,longitude,x_sp,y_sp,council district,census tract,bin,bbl,health_status,name_American beech,name_American elm,name_American hophornbeam,name_American hornbeam,name_American larch,name_American linden,name_Amur cork tree,name_Amur maackia,name_Amur maple,name_Atlantic white cedar,name_Atlas cedar,name_Callery pear,name_Chinese chestnut,name_Chinese elm,name_Chinese fringetree,name_Chinese tree lilac,name_Cornelian cherry,name_Douglas-fir,name_English oak,name_European alder,name_European beech,name_European hornbeam,name_Himalayan cedar,name_Japanese hornbeam,name_Japanese maple,name_Japanese snowbell,name_Japanese tree lilac,name_Japanese zelkova,name_Kentucky coffeetree,name_Kentucky yellowwood,name_London planetree,name_Norway maple,name_Norway spruce,name_Ohio buckeye,name_Oklahoma redbud,name_Osage-orange,name_Persian ironwood,name_Schumard's oak,name_Scots pine,name_Shantung maple,name_Siberian elm,name_Sophora,name_Turkish hazelnut,name_Virginia pine,name_arborvitae,name_ash,name_bald cypress,name_bigtooth aspen,name_black cherry,name_black locust,name_black maple,name_black oak,name_black pine,name_black walnut,name_blackgum,name_blue spruce,name_boxelder,name_bur oak,name_catalpa,name_cherry,name_cockspur hawthorn,name_common hackberry,name_crab apple,name_crepe myrtle,name_crimson king maple,name_cucumber magnolia,name_dawn redwood,name_eastern cottonwood,name_eastern hemlock,name_eastern redbud,name_eastern redcedar,name_empress tree,name_false cypress,name_flowering dogwood,name_ginkgo,name_golden raintree,name_green ash,name_hardy rubber tree,name_hawthorn,name_hedge 
maple,name_holly,name_honeylocust,name_horse chestnut,name_katsura tree,name_kousa dogwood,name_littleleaf linden,name_magnolia,name_maple,name_mimosa,name_mulberry,name_northern red oak,name_pagoda dogwood,name_paper birch,name_paperbark maple,name_pignut hickory,name_pin oak,name_pine,name_pitch pine,name_pond cypress,name_purple-leaf plum,name_quaking aspen,name_red horse chestnut,name_red maple,name_red pine,name_river birch,name_sassafras,name_sawtooth oak,name_scarlet oak,name_serviceberry,name_shingle oak,name_silver birch,name_silver linden,name_silver maple,name_smoketree,name_southern magnolia,name_southern red oak,name_spruce,name_sugar maple,name_swamp white oak,name_sweetgum,name_sycamore maple,name_tartar maple,name_tree of heaven,name_trident maple,name_tulip-poplar,name_two-winged silverbell,name_weeping willow,name_white ash,name_white oak,name_white pine,name_willow oak
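
As an alternative to hard-coding row indices, the same rows could be dropped by naming the columns that must not be null. The following is a minimal sketch on a made-up toy frame (the values and the `toy` and `cleaned` names are assumptions for illustration, not the census data):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the census data (values are made up).
toy = pd.DataFrame({
    "health_status": [1.0, np.nan, 2.0, 3.0],
    "guards": [0.0, 1.0, np.nan, 0.0],
    "sidewalk": [0.0, 1.0, 0.0, np.nan],
    "bin": [np.nan, 1.0, 2.0, 3.0],  # nulls here are ignored by `subset`
})

# Drop only the rows that are null in one of the listed columns.
cleaned = toy.dropna(subset=["health_status", "guards", "sidewalk"])
print(cleaned)
```

This keeps the cleaning step valid even if the row index changes after earlier filtering.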


In [22]:
df.loc[df['council district'].isnull()].head()

Unnamed: 0,tree_id,block_id,created_at,tree_dbh,stump_diam,curb_loc,status,health,spc_latin,steward,guards,sidewalk,user_type,problems,root_stone,root_grate,root_other,trunk_wire,trnk_light,trnk_other,brch_light,brch_shoe,brch_other,address,postcode,zip_city,community board,borocode,borough,cncldist,st_assem,st_senate,nta,nta_name,boro_ct,state,latitude,longitude,x_sp,y_sp,council district,census tract,bin,bbl,health_status,name_American beech,name_American elm,name_American hophornbeam,name_American hornbeam,name_American larch,name_American linden,name_Amur cork tree,name_Amur maackia,name_Amur maple,name_Atlantic white cedar,name_Atlas cedar,name_Callery pear,name_Chinese chestnut,name_Chinese elm,name_Chinese fringetree,name_Chinese tree lilac,name_Cornelian cherry,name_Douglas-fir,name_English oak,name_European alder,name_European beech,name_European hornbeam,name_Himalayan cedar,name_Japanese hornbeam,name_Japanese maple,name_Japanese snowbell,name_Japanese tree lilac,name_Japanese zelkova,name_Kentucky coffeetree,name_Kentucky yellowwood,name_London planetree,name_Norway maple,name_Norway spruce,name_Ohio buckeye,name_Oklahoma redbud,name_Osage-orange,name_Persian ironwood,name_Schumard's oak,name_Scots pine,name_Shantung maple,name_Siberian elm,name_Sophora,name_Turkish hazelnut,name_Virginia pine,name_arborvitae,name_ash,name_bald cypress,name_bigtooth aspen,name_black cherry,name_black locust,name_black maple,name_black oak,name_black pine,name_black walnut,name_blackgum,name_blue spruce,name_boxelder,name_bur oak,name_catalpa,name_cherry,name_cockspur hawthorn,name_common hackberry,name_crab apple,name_crepe myrtle,name_crimson king maple,name_cucumber magnolia,name_dawn redwood,name_eastern cottonwood,name_eastern hemlock,name_eastern redbud,name_eastern redcedar,name_empress tree,name_false cypress,name_flowering dogwood,name_ginkgo,name_golden raintree,name_green ash,name_hardy rubber tree,name_hawthorn,name_hedge 
maple,name_holly,name_honeylocust,name_horse chestnut,name_katsura tree,name_kousa dogwood,name_littleleaf linden,name_magnolia,name_maple,name_mimosa,name_mulberry,name_northern red oak,name_pagoda dogwood,name_paper birch,name_paperbark maple,name_pignut hickory,name_pin oak,name_pine,name_pitch pine,name_pond cypress,name_purple-leaf plum,name_quaking aspen,name_red horse chestnut,name_red maple,name_red pine,name_river birch,name_sassafras,name_sawtooth oak,name_scarlet oak,name_serviceberry,name_shingle oak,name_silver birch,name_silver linden,name_silver maple,name_smoketree,name_southern magnolia,name_southern red oak,name_spruce,name_sugar maple,name_swamp white oak,name_sweetgum,name_sycamore maple,name_tartar maple,name_tree of heaven,name_trident maple,name_tulip-poplar,name_two-winged silverbell,name_weeping willow,name_white ash,name_white oak,name_white pine,name_willow oak
8,209610,407443,09/08/2015,6,0,1,Alive,Good,Gleditsia triacanthos var. inermis,0,0.0,0.0,TreesCount Staff,,0,0,0,0,0,0,0,0,0,65 JEROME AVENUE,10305,Staten Island,502,5,Staten Island,50,64,23,SI14,Grasmere-Arrochar-Ft. Wadsworth,5006400,New York,40.596579,-74.076255,963073.2,156635.5542,,,,,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
66,198514,108318,09/02/2015,18,0,1,Alive,Good,Quercus palustris,0,0.0,0.0,Volunteer,Stones,1,0,0,0,0,0,0,0,0,1 MORNINGSIDE DRIVE,10025,New York,109,1,Manhattan,7,69,30,MN09,Morningside Heights,1019701,New York,40.802301,-73.96208,994748.4,231579.2036,,,,,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
209,178606,105607,08/26/2015,7,0,1,Alive,Good,Gleditsia triacanthos var. inermis,1,3.0,0.0,Volunteer,,0,0,0,0,0,0,0,0,0,137 EAST 36 STREET,10016,New York,106,1,Manhattan,4,73,28,MN20,Murray Hill-Kips Bay,1008000,New York,40.747801,-73.97858,990185.2,211721.5855,,,,,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
231,201590,338263,09/04/2015,2,0,0,Alive,Fair,Gleditsia triacanthos var. inermis,1,0.0,0.0,TreesCount Staff,,0,0,0,0,0,0,0,0,0,2-008 BEACH 108 STREET,11694,Rockaway Park,414,4,Queens,32,23,15,QN10,Breezy Point-Belle Harbor-Rockaway Park-Broad ...,4093800,New York,40.582117,-73.828646,1031847.0,151404.0424,,,,,2.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
254,201593,338263,09/04/2015,2,0,0,Alive,Good,Gleditsia triacanthos var. inermis,1,0.0,0.0,TreesCount Staff,,0,0,0,0,0,0,0,0,0,2-008 BEACH 108 STREET,11694,Rockaway Park,414,4,Queens,32,23,15,QN10,Breezy Point-Belle Harbor-Rockaway Park-Broad ...,4093800,New York,40.582215,-73.82831,1031940.0,151439.7438,,,,,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [23]:
df.loc[df['bin'].isnull()].head()

Unnamed: 0,tree_id,block_id,created_at,tree_dbh,stump_diam,curb_loc,status,health,spc_latin,steward,guards,sidewalk,user_type,problems,root_stone,root_grate,root_other,trunk_wire,trnk_light,trnk_other,brch_light,brch_shoe,brch_other,address,postcode,zip_city,community board,borocode,borough,cncldist,st_assem,st_senate,nta,nta_name,boro_ct,state,latitude,longitude,x_sp,y_sp,council district,census tract,bin,bbl,health_status,name_American beech,name_American elm,name_American hophornbeam,name_American hornbeam,name_American larch,name_American linden,name_Amur cork tree,name_Amur maackia,name_Amur maple,name_Atlantic white cedar,name_Atlas cedar,name_Callery pear,name_Chinese chestnut,name_Chinese elm,name_Chinese fringetree,name_Chinese tree lilac,name_Cornelian cherry,name_Douglas-fir,name_English oak,name_European alder,name_European beech,name_European hornbeam,name_Himalayan cedar,name_Japanese hornbeam,name_Japanese maple,name_Japanese snowbell,name_Japanese tree lilac,name_Japanese zelkova,name_Kentucky coffeetree,name_Kentucky yellowwood,name_London planetree,name_Norway maple,name_Norway spruce,name_Ohio buckeye,name_Oklahoma redbud,name_Osage-orange,name_Persian ironwood,name_Schumard's oak,name_Scots pine,name_Shantung maple,name_Siberian elm,name_Sophora,name_Turkish hazelnut,name_Virginia pine,name_arborvitae,name_ash,name_bald cypress,name_bigtooth aspen,name_black cherry,name_black locust,name_black maple,name_black oak,name_black pine,name_black walnut,name_blackgum,name_blue spruce,name_boxelder,name_bur oak,name_catalpa,name_cherry,name_cockspur hawthorn,name_common hackberry,name_crab apple,name_crepe myrtle,name_crimson king maple,name_cucumber magnolia,name_dawn redwood,name_eastern cottonwood,name_eastern hemlock,name_eastern redbud,name_eastern redcedar,name_empress tree,name_false cypress,name_flowering dogwood,name_ginkgo,name_golden raintree,name_green ash,name_hardy rubber tree,name_hawthorn,name_hedge 
maple,name_holly,name_honeylocust,name_horse chestnut,name_katsura tree,name_kousa dogwood,name_littleleaf linden,name_magnolia,name_maple,name_mimosa,name_mulberry,name_northern red oak,name_pagoda dogwood,name_paper birch,name_paperbark maple,name_pignut hickory,name_pin oak,name_pine,name_pitch pine,name_pond cypress,name_purple-leaf plum,name_quaking aspen,name_red horse chestnut,name_red maple,name_red pine,name_river birch,name_sassafras,name_sawtooth oak,name_scarlet oak,name_serviceberry,name_shingle oak,name_silver birch,name_silver linden,name_silver maple,name_smoketree,name_southern magnolia,name_southern red oak,name_spruce,name_sugar maple,name_swamp white oak,name_sweetgum,name_sycamore maple,name_tartar maple,name_tree of heaven,name_trident maple,name_tulip-poplar,name_two-winged silverbell,name_weeping willow,name_white ash,name_white oak,name_white pine,name_willow oak
8,209610,407443,09/08/2015,6,0,1,Alive,Good,Gleditsia triacanthos var. inermis,0,0.0,0.0,TreesCount Staff,,0,0,0,0,0,0,0,0,0,65 JEROME AVENUE,10305,Staten Island,502,5,Staten Island,50,64,23,SI14,Grasmere-Arrochar-Ft. Wadsworth,5006400,New York,40.596579,-74.076255,963073.2,156635.5542,,,,,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
66,198514,108318,09/02/2015,18,0,1,Alive,Good,Quercus palustris,0,0.0,0.0,Volunteer,Stones,1,0,0,0,0,0,0,0,0,1 MORNINGSIDE DRIVE,10025,New York,109,1,Manhattan,7,69,30,MN09,Morningside Heights,1019701,New York,40.802301,-73.96208,994748.4,231579.2036,,,,,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
209,178606,105607,08/26/2015,7,0,1,Alive,Good,Gleditsia triacanthos var. inermis,1,3.0,0.0,Volunteer,,0,0,0,0,0,0,0,0,0,137 EAST 36 STREET,10016,New York,106,1,Manhattan,4,73,28,MN20,Murray Hill-Kips Bay,1008000,New York,40.747801,-73.97858,990185.2,211721.5855,,,,,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
231,201590,338263,09/04/2015,2,0,0,Alive,Fair,Gleditsia triacanthos var. inermis,1,0.0,0.0,TreesCount Staff,,0,0,0,0,0,0,0,0,0,2-008 BEACH 108 STREET,11694,Rockaway Park,414,4,Queens,32,23,15,QN10,Breezy Point-Belle Harbor-Rockaway Park-Broad ...,4093800,New York,40.582117,-73.828646,1031847.0,151404.0424,,,,,2.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
254,201593,338263,09/04/2015,2,0,0,Alive,Good,Gleditsia triacanthos var. inermis,1,0.0,0.0,TreesCount Staff,,0,0,0,0,0,0,0,0,0,2-008 BEACH 108 STREET,11694,Rockaway Park,414,4,Queens,32,23,15,QN10,Breezy Point-Belle Harbor-Rockaway Park-Broad ...,4093800,New York,40.582215,-73.82831,1031940.0,151439.7438,,,,,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


The following columns have many missing values:
- ```council district```: 6206
- ```census tract```: 6206
- ```bin```: 9103
- ```bbl```: 9103

Since it's not evident why these values are missing, I will drop the affected rows from the dataset.
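
Per-column null counts like the ones listed above can be obtained with `isnull().sum()`, and the cost of dropping them can be checked by comparing row counts. This is a small sketch on invented data (the `toy` frame and its values are assumptions):

```python
import numpy as np
import pandas as pd

# Toy frame with missing values in two columns (values are made up).
toy = pd.DataFrame({
    "tree_dbh": [3, 21, 7, 11],
    "council district": [29.0, np.nan, 37.0, np.nan],
    "bin": [np.nan, np.nan, 3094522.0, np.nan],
})

# Count nulls per column and show only the columns that have any.
null_counts = toy.isnull().sum()
print(null_counts[null_counts > 0])

# Compare row counts before and after dropping incomplete rows.
rows_before = len(toy)
rows_after = len(toy.dropna())
print(rows_before, rows_after)
```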

In [24]:
df.dropna(inplace = True)

In [25]:
# Checking that all the null rows were removed:
df.isnull().sum().sum()

0

In [26]:
df.shape

(642961, 176)

Now there are 642,961 rows in the dataset and 176 columns.

### Defining Variables

For the predictor variables, first I'll create a list called ```columns_to_keep```.

In [27]:
columns_to_keep = ['tree_dbh', 
                   'curb_loc', 
                   'steward', 
                   'guards', 
                   'sidewalk', 
                   'root_stone', 
                   'root_grate', 
                   'root_other', 
                   'trunk_wire', 
                   'trnk_light', 
                   'trnk_other', 
                   'brch_light', 
                   'brch_shoe', 
                   'brch_other', 
                   'postcode', 
                   'community board', 
                   'borocode', 
                   'cncldist', 
                   'st_assem', 
                   'st_senate', 
                   'boro_ct', 
                   'latitude', 
                   'longitude', 
                   'x_sp', 
                   'y_sp', 
 'name_American beech',
 'name_American elm',
 'name_American hophornbeam',
 'name_American hornbeam',
 'name_American larch',
 'name_American linden',
 'name_Amur cork tree',
 'name_Amur maackia',
 'name_Amur maple',
 'name_Atlantic white cedar',
 'name_Atlas cedar',
 'name_Callery pear',
 'name_Chinese chestnut',
 'name_Chinese elm',
 'name_Chinese fringetree',
 'name_Chinese tree lilac',
 'name_Cornelian cherry',
 'name_Douglas-fir',
 'name_English oak',
 'name_European alder',
 'name_European beech',
 'name_European hornbeam',
 'name_Himalayan cedar',
 'name_Japanese hornbeam',
 'name_Japanese maple',
 'name_Japanese snowbell',
 'name_Japanese tree lilac',
 'name_Japanese zelkova',
 'name_Kentucky coffeetree',
 'name_Kentucky yellowwood',
 'name_London planetree',
 'name_Norway maple',
 'name_Norway spruce',
 'name_Ohio buckeye',
 'name_Oklahoma redbud',
 'name_Osage-orange',
 'name_Persian ironwood',
 "name_Schumard's oak",
 'name_Scots pine',
 'name_Shantung maple',
 'name_Siberian elm',
 'name_Sophora',
 'name_Turkish hazelnut',
 'name_Virginia pine',
 'name_arborvitae',
 'name_ash',
 'name_bald cypress',
 'name_bigtooth aspen',
 'name_black cherry',
 'name_black locust',
 'name_black maple',
 'name_black oak',
 'name_black pine',
 'name_black walnut',
 'name_blackgum',
 'name_blue spruce',
 'name_boxelder',
 'name_bur oak',
 'name_catalpa',
 'name_cherry',
 'name_cockspur hawthorn',
 'name_common hackberry',
 'name_crab apple',
 'name_crepe myrtle',
 'name_crimson king maple',
 'name_cucumber magnolia',
 'name_dawn redwood',
 'name_eastern cottonwood',
 'name_eastern hemlock',
 'name_eastern redbud',
 'name_eastern redcedar',
 'name_empress tree',
 'name_false cypress',
 'name_flowering dogwood',
 'name_ginkgo',
 'name_golden raintree',
 'name_green ash',
 'name_hardy rubber tree',
 'name_hawthorn',
 'name_hedge maple',
 'name_holly',
 'name_honeylocust',
 'name_horse chestnut',
 'name_katsura tree',
 'name_kousa dogwood',
 'name_littleleaf linden',
 'name_magnolia',
 'name_maple',
 'name_mimosa',
 'name_mulberry',
 'name_northern red oak',
 'name_pagoda dogwood',
 'name_paper birch',
 'name_paperbark maple',
 'name_pignut hickory',
 'name_pin oak',
 'name_pine',
 'name_pitch pine',
 'name_pond cypress',
 'name_purple-leaf plum',
 'name_quaking aspen',
 'name_red horse chestnut',
 'name_red maple',
 'name_red pine',
 'name_river birch',
 'name_sassafras',
 'name_sawtooth oak',
 'name_scarlet oak',
 'name_serviceberry',
 'name_shingle oak',
 'name_silver birch',
 'name_silver linden',
 'name_silver maple',
 'name_smoketree',
 'name_southern magnolia',
 'name_southern red oak',
 'name_spruce',
 'name_sugar maple',
 'name_swamp white oak',
 'name_sweetgum',
 'name_sycamore maple',
 'name_tartar maple',
 'name_tree of heaven',
 'name_trident maple',
 'name_tulip-poplar',
 'name_two-winged silverbell',
 'name_weeping willow',
 'name_white ash',
 'name_white oak',
 'name_white pine',
 'name_willow oak']
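
Since the species dummy columns all share the `name_` prefix, a list like this could also be assembled programmatically rather than typed out. A minimal sketch, assuming a toy frame with only a few columns (`toy`, `base_columns`, and `toy_columns_to_keep` are hypothetical names):

```python
import pandas as pd

# Empty toy frame that only mimics a handful of the notebook's column names.
toy = pd.DataFrame(
    columns=["tree_dbh", "curb_loc", "health", "name_American elm", "name_pin oak"]
)

# Hand-picked predictors, plus every dummy column that starts with 'name_'.
base_columns = ["tree_dbh", "curb_loc"]
species_columns = [col for col in toy.columns if col.startswith("name_")]
toy_columns_to_keep = base_columns + species_columns
print(toy_columns_to_keep)
```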

In [28]:
X = df[columns_to_keep]
y = df['health_status']

### Train/Test Split

In [29]:
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify = y, random_state = 42)
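
To illustrate what `stratify = y` does, here is a small sketch on invented, imbalanced labels (the `X_toy` and `y_toy` data are assumptions). With no `test_size` given, `train_test_split` holds out 25% of the rows, and stratifying keeps the class proportions roughly equal in both parts:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy labels with an 80/16/4 class imbalance (made-up data).
y_toy = pd.Series([1.0] * 80 + [2.0] * 16 + [3.0] * 4)
X_toy = pd.DataFrame({"feature": range(len(y_toy))})

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, stratify=y_toy, random_state=42
)

# Default split is 75% train / 25% test.
print(len(X_tr), len(X_te))

# Class shares should be close to 0.80 / 0.16 / 0.04 in both parts.
print(y_tr.value_counts(normalize=True).sort_index())
print(y_te.value_counts(normalize=True).sort_index())
```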

### Baseline Accuracy

In [30]:
y_train.value_counts(normalize=True)

1.0    0.811362
2.0    0.147667
3.0    0.040971
Name: health_status, dtype: float64

Predicting good health (class ```1.0```) for every tree would give about 81% accuracy, so this is the baseline the models need to beat.
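
The same baseline can be expressed as a model with scikit-learn's `DummyClassifier`, which makes it easy to score alongside the other classifiers. A minimal sketch on made-up labels (the toy arrays are assumptions, chosen to roughly mirror the class shares above):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Made-up labels: 81 good, 15 fair, 4 poor.
y_toy = np.array([1.0] * 81 + [2.0] * 15 + [3.0] * 4)
X_toy = np.zeros((len(y_toy), 1))  # features are ignored by the dummy model

# Always predict the most frequent class seen during fit.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_toy, y_toy)
baseline_accuracy = baseline.score(X_toy, y_toy)
print(baseline_accuracy)
```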

### Modeling
#### Logistic regression model

In [31]:
lr = LogisticRegression()
lr.fit(X_train, y_train)
lr.score(X_train, y_train), lr.score(X_test, y_test)



(0.8113620339264236, 0.8113611337493234)

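This model was equally accurate on train and test data (```0.8114```), but it did not improve over baseline. Because both scores match the majority-class share almost exactly, the model is probably predicting good health for nearly every tree.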
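
One way to check whether a baseline-level score comes from predicting only the majority class is to look at a confusion matrix and at balanced accuracy (the mean of per-class recall). The sketch below uses invented labels and predictions; in the notebook, the predictions would come from something like `lr.predict(X_test)`:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

# Made-up truth: 8 good, 1 fair, 1 poor. Predictions: always "good".
y_true = np.array([1.0] * 8 + [2.0] + [3.0])
y_pred = np.full_like(y_true, 1.0)

# Plain accuracy looks fine, but it hides the missed minority classes.
accuracy = (y_true == y_pred).mean()

# Balanced accuracy averages recall over the three classes.
balanced = balanced_accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_true, y_pred, labels=[1.0, 2.0, 3.0])
print(accuracy, balanced)
print(matrix)
```

If only the first column of the matrix is populated, the model never predicts fair or poor health, regardless of its overall accuracy.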

#### Random Forest

In [32]:
rfc_1 = RandomForestClassifier(random_state = 42)
rfc_1.fit(X_train, y_train)
rfc_1.score(X_train, y_train), rfc_1.score(X_test, y_test) 



(0.9807971465306292, 0.8208857727648826)

This model did very well on train data (```0.9808```), but not as well on test data (```0.8209```), indicating overfitting. I'll try modifying the hyperparameters to see whether I can improve test accuracy and reduce overfitting.

In [33]:
rfc_2 = RandomForestClassifier(n_estimators = 100, max_depth = 6, random_state = 42)
rfc_2.fit(X_train, y_train)
rfc_2.score(X_train, y_train), rfc_2.score(X_test, y_test)

(0.811418024967857, 0.8113984608780584)

Setting ```n_estimators``` and ```max_depth``` explicitly produced nearly identical train and test scores (both about ```0.8114```), which removed the overfitting, but accuracy was still at the 81% baseline.
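
Instead of trying hyperparameter values by hand, a cross-validated grid search could compare several combinations at once. This is only a sketch on synthetic data from `make_classification`; the grid values are arbitrary assumptions, not settings tuned for the tree census:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (not the tree census).
X_toy, y_toy = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)

# Arbitrary example grid over the two hyperparameters discussed above.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, 6, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_toy, y_toy)

# Best combination and its mean cross-validated accuracy.
print(search.best_params_, search.best_score_)
```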

#### Decision Tree

In [34]:
dt_1 = DecisionTreeClassifier(random_state = 42)
dt_1.fit(X_train, y_train)
dt_1.score(X_train, y_train), dt_1.score(X_test, y_test) 

(0.9999854838040728, 0.7587174398566638)

Like the first random forest, this decision tree did very well on train data (```1.0000```) but not as well on test data (```0.7587```), indicating overfitting. I'll try modifying the hyperparameters to see whether I can improve test accuracy and reduce overfitting.

In [35]:
dt_2 = DecisionTreeClassifier(max_depth = 5, 
                            min_samples_split = 10, 
                            min_samples_leaf = 3, 
                            random_state = 42)
dt_2.fit(X_train, y_train)
dt_2.score(X_train, y_train), dt_2.score(X_test, y_test) 

(0.8124652648168885, 0.812151224640882)

While I was able to increase the test accuracy to ```0.8122```, the train accuracy fell to ```0.8125```. With the two scores this close, overfitting was eliminated.
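
The trade-off between tree depth and overfitting can be made visible by scoring the same tree at several depths and comparing train and test accuracy. The sketch below uses synthetic data and an arbitrary list of depths, both of which are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the tree census).
X_toy, y_toy = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, stratify=y_toy, random_state=42
)

# Fit one tree per depth and record (train accuracy, test accuracy).
scores = {}
for depth in [2, 5, 10, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_tr, y_tr)
    scores[depth] = (tree.score(X_tr, y_tr), tree.score(X_te, y_te))

# A large gap between the two scores suggests overfitting.
for depth, (train_acc, test_acc) in scores.items():
    print(depth, round(train_acc, 4), round(test_acc, 4), round(train_acc - test_acc, 4))
```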

### Conclusions
For this hackathon-style project, unfortunately I wasn't able to meaningfully improve upon the baseline accuracy of 81% with my logistic regression, random forest, and decision tree models for predicting tree health in NYC; the best test accuracy was ```0.8209```, from the first random forest. At any rate, it was still a great dataset to explore! And who wouldn't love using random forests and decision trees on a tree dataset?