In [None]:
%%capture
!pip3 install stylecloud

import json

import numpy as np
import pandas as pd
import geopandas as gpd
import plotly.express as px
import stylecloud

In [None]:
trees = pd.read_csv('data/trees.csv')
neighborhoods = gpd.read_file('data/nta.shp')

<div align="center"> 
    <h1 align="center"> 
        Manhattan's Future Greenery:<br> 
        Embracing Optimal Trees  
    </h1> 
</div>

Have you ever been to Manhattan and noticed the abundance of trees lining its streets? Although there seem to be many different types, a few species turn out to be far more common than others. These trees provide much-needed shade on scorching summer days and help clean the air by filtering pollutants from cars and other sources. Many of the common street species are also relatively low-maintenance, so they require little effort from caretakers while still offering numerous benefits to people and wildlife alike.


![manhattan_2](pictures/manhattan_2.png)


Let's take a look at the tree species in Manhattan, their characteristics, benefits, health indicators and location, both with respect to the curb and the neighborhoods. The urban design team believes tree size (using trunk diameter as a proxy for size) and health are the most desirable characteristics of city trees. The city would like to learn more about which tree species are the best choice to plant on the streets of Manhattan.

The aim of this exploration is to:
 - characterise the dataset;
 - identify the most common tree species in Manhattan;
 - select the neighborhoods with the most trees;
 - visualize tree locations across Manhattan's neighborhoods; and
 - pinpoint the 10 tree species most suitable for future planting based on health parameters.

## Data

The team has provided access to the 2015 tree census and geographical information on New York City neighborhoods ([trees](https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh), [neighborhoods](https://data.cityofnewyork.us/City-Government/NTA-map/d3qk-pfyz)):

**Tree Census**
- "tree_id" - Unique id of each tree.
- "tree_dbh" - The diameter of the tree in inches measured at 54 inches above the ground.
- "curb_loc" - Location of the tree bed in relation to the curb. Either along the curb (OnCurb) or offset from the curb (OffsetFromCurb).
- "spc_common" - Common name for the species.
- "status" - Indicates whether the tree is alive or standing dead.
- "health" - Indication of the tree's health (Good, Fair, and Poor).
- "root_stone" - Indicates the presence of a root problem caused by paving stones in the tree bed.
- "root_grate" - Indicates the presence of a root problem caused by metal grates in the tree bed.
- "root_other" - Indicates the presence of other root problems.
- "trunk_wire" - Indicates the presence of a trunk problem caused by wires or rope wrapped around the trunk.
- "trnk_light" - Indicates the presence of a trunk problem caused by lighting installed on the tree.
- "trnk_other" - Indicates the presence of other trunk problems.
- "brch_light" - Indicates the presence of a branch problem caused by lights or wires in the branches.
- "brch_shoe" - Indicates the presence of a branch problem caused by shoes in the branches.
- "brch_other" - Indicates the presence of other branch problems.
- "postcode" - Five-digit zip code where the tree is located.
- "nta" - Neighborhood Tabulation Area (NTA) code from the 2010 US Census for the tree.
- "nta_name" - Neighborhood name.
- "latitude" - Latitude of the tree, in decimal degrees.
- "longitude" - Longitude of the tree, in decimal degrees.

**Neighborhoods' Geographical Information**
- "ntacode" - NTA code (matches Tree Census information).
- "ntaname" - Neighborhood name (matches Tree Census information).
- "geometry" - Polygon that defines the neighborhood.

_Tree census and neighborhood information from the City of New York [NYC Open Data](https://opendata.cityofnewyork.us/data/)._

In [None]:
tree_boro = (
    trees
    .rename(columns={'nta': 'ntacode', 'nta_name': 'ntaname'})
    .merge(neighborhoods, on=['ntacode', 'ntaname'], validate="m:1")
)

## Descriptive Analysis

In [None]:
print(f'There are {len(tree_boro)} items in the dataset. Considering that there are also {len(tree_boro.tree_id.drop_duplicates())} unique tree IDs, the number of items in the dataset corresponds to individual unique trees.')

The dataset contains 64,229 trees in Manhattan, of which 1,802 are dead (Fig. 1). Apart from a single honeylocust, the dead trees have no species assigned, and none of them has a health status. They are therefore excluded from the basic analysis for now, although later on they may offer clues about what causes trees to die.
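As a quick consistency check, the living subset used from here on should contain 64,229 - 1,802 = 62,427 trees. The sketch below uses a small hypothetical DataFrame (not the census) to show one way of splitting rows by `status` and confirming that dead rows carry no health value; the helper name `split_by_status` is illustrative only.

```python
import numpy as np
import pandas as pd

def split_by_status(df: pd.DataFrame) -> tuple:
    """Return (alive, dead) subsets based on the census 'status' column."""
    alive = df[df['status'] == 'Alive']
    dead = df[df['status'] == 'Dead']
    return alive, dead

# Hypothetical toy rows mimicking the census columns used in this notebook
toy = pd.DataFrame({
    'tree_id': [1, 2, 3, 4, 5],
    'status': ['Alive', 'Alive', 'Dead', 'Alive', 'Dead'],
    'spc_common': ['honeylocust', 'ginkgo', np.nan, 'pin oak', 'honeylocust'],
    'health': ['Good', 'Fair', np.nan, 'Poor', np.nan],
})

alive, dead = split_by_status(toy)
print(len(alive), len(dead))             # 3 2
print(dead['spc_common'].notna().sum())  # dead rows that still name a species: 1
print(dead['health'].isna().all())       # True: no dead row has a health status

# The same arithmetic for the real census counts quoted in the text
print(64_229 - 1_802)                    # 62427
```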

In [None]:
alive_dead = (
    tree_boro.groupby(['status'])[['status']].count()
    .rename(columns={'status':'Count of Trees'})
    .reset_index()
    .sort_values('Count of Trees', ascending=False)
)

In [None]:
fig_0 = px.bar(
    alive_dead, 
    x='status', 
    y='Count of Trees', 
    color='status',
    title='Fig. 1: Count of Alive vs Dead Trees in Manhattan',
    labels={'status': 'Status'}
)
fig_0.show()

In [None]:
tree_boro_filtered = tree_boro[tree_boro['status'] == 'Alive']

#### Tree Species

Fig. 2 shows a word cloud of the living tree species in the sample. Honeylocust, pin oak, Callery pear, Sophora, ginkgo, Japanese zelkova, littleleaf linden, and London planetree appear to be among the most common.

London planetree, for example, has a rounded canopy that provides plenty of shade in summer while letting light through in winter, after its leaves drop. Callery pear is another species often seen in Manhattan's streetscape because it thrives in urban settings with minimal maintenance. Ginkgo trees do well in sunny locations with adequate drainage, which makes them good candidates for planting along sidewalks and elsewhere in the city.

In [None]:
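# Replace spaces inside each species name with underscores so that multi-word names
# (e.g. "pin oak") stay a single word in the cloud, then separate the names with spaces
text = ' '.join(tree_boro_filtered['spc_common'].str.replace(' ', '_'))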
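Joining the raw names with plain spaces would split multi-word species such as "pin oak" into two unrelated words in the cloud. The cell above avoids this by turning each name into a single token. A minimal standalone sketch of the same idea on a hypothetical list of names (the helper `species_tokens` is illustrative):

```python
from collections import Counter

def species_tokens(names):
    """Join species names into one string, one underscore-joined token per tree."""
    return ' '.join(name.replace(' ', '_') for name in names)

names = ['honeylocust', 'pin oak', 'pin oak', 'London planetree']

naive = ' '.join(names)
tokens = species_tokens(names)

print(Counter(naive.split())['oak'])       # 2: "pin oak" was broken into "pin" and "oak"
print(tokens)                              # honeylocust pin_oak pin_oak London_planetree
print(Counter(tokens.split())['pin_oak'])  # 2
```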

In [None]:
stylecloud.gen_stylecloud(text=text,
                          icon_name='fas fa-tree',
                          palette='colorbrewer.diverging.Spectral_11',
                          background_color='white',
                          gradient='vertical',
                          collocations=False)

#### Fig. 2: Word Cloud of Individual Tree Species of Manhattan

![wordcloud](pictures/wordcloud.png)

Indeed, the absolute counts of living trees in Fig. 3 show that honeylocust is by far the most common species, with 13,175 trees. Callery pear and ginkgo take second and third place, each with roughly half as many trees as honeylocust. From pin oak to littleleaf linden, the counts hover around 4,000, while American elm and American linden close the top 10 with about 2,000 trees each.
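The groupby-count-sort pipeline in the next cell can also be written with `value_counts`, which counts non-missing values and sorts them in descending order by default. A small sketch on hypothetical rows confirming that both approaches give the same ranking:

```python
import pandas as pd

# Hypothetical toy census rows
toy = pd.DataFrame({
    'tree_id': range(6),
    'spc_common': ['honeylocust', 'ginkgo', 'honeylocust',
                   'pin oak', 'honeylocust', 'ginkgo'],
})

# Pipeline used in the notebook: count tree IDs per species, largest first
via_groupby = (
    toy.groupby('spc_common')['tree_id'].count()
    .sort_values(ascending=False)
)

# Equivalent one-liner
via_value_counts = toy['spc_common'].value_counts()

print(via_value_counts.head(2).index.tolist())  # ['honeylocust', 'ginkgo']
```

On the notebook data, `tree_boro_filtered['spc_common'].value_counts().head(10)` should then yield the same top 10 as Fig. 3.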

In [None]:
most_common_manhattan = (
    tree_boro_filtered
    .groupby('spc_common')[['tree_id']].count()
    .sort_values('tree_id', ascending=False)
    .rename_axis('Common Name of Tree Species')
    .rename(columns={'tree_id': 'Absolute Count of Trees'})
)

In [None]:
fig_1 = px.bar(
    most_common_manhattan[:10], 
    x=most_common_manhattan[:10].index, 
    y='Absolute Count of Trees', 
    title='Fig. 3: 10 Most Common Tree Species in Manhattan',
    labels={'index':'Common Name of Tree Species'}
)
fig_1.show()

#### Tree Diameter

Fig. 4 shows the distribution of diameter at breast height (DBH) for all living trees, irrespective of species, health, and other parameters. The most frequent value is 4 inches, and the distribution is strongly right-skewed: a long tail of trees is considerably wider than that, as can also be seen in Fig. 8.
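A right-skewed distribution typically has its mode below its median and its median below its mean. A quick way to quantify this with pandas, shown here on hypothetical DBH values rather than on the census data:

```python
import pandas as pd

# Hypothetical DBH values in inches: many small trees and a long tail of large ones
dbh = pd.Series([3, 4, 4, 4, 5, 6, 8, 12, 20, 35])

mode = dbh.mode().iloc[0]
median = dbh.median()
mean = dbh.mean()
skewness = dbh.skew()   # sample skewness; a positive value indicates a right tail

print(mode, median, mean)  # 4 5.5 10.1
print(skewness > 0)        # True
```

Applied to the notebook, `tree_boro_filtered['tree_dbh'].skew()` would give a single number backing up the visual impression from Fig. 4.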

In [None]:
fig_2 = px.histogram(
    tree_boro_filtered['tree_dbh'],
    title='Fig. 4: Histogram of Tree Diameters At Breast Height [inch]',
    labels={'value': 'Tree Diameters At Breast Height'}
)
fig_2.show()

#### Tree Health

About three quarters (75.8%) of living trees are in good health, while 18.4% are in fair and 5.78% in poor condition (Fig. 5).
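The percentages in the pie chart can also be obtained directly, without building a count table first, via `value_counts(normalize=True)`. A sketch on hypothetical health labels:

```python
import pandas as pd

# Hypothetical health labels for eight living trees
health = pd.Series(['Good', 'Good', 'Good', 'Fair', 'Good', 'Poor', 'Good', 'Fair'])

# Share of each category in percent, rounded to one decimal
shares = health.value_counts(normalize=True).mul(100).round(1)
print(shares['Good'], shares['Fair'], shares['Poor'])  # 62.5 25.0 12.5
```

The same call on `tree_boro_filtered['health']` or `tree_boro_filtered['curb_loc']` would reproduce the fractions shown in Fig. 5 and Fig. 6.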

In [None]:
health_count = (
    tree_boro_filtered.groupby('health')[['health']].count()
    .rename_axis('Health')
    .rename(columns={'health':'Count'})
    .reset_index()
)

In [None]:
fig_3 = px.pie(
    health_count, 
    values='Count', 
    names='Health', 
    title='Fig. 5: Fraction of Tree Health Statuses'
)
fig_3.show()

#### Tree Location

An overwhelming majority of living trees (93.3%) are located on the curb (Fig. 6).

In [None]:
curb_count = (
    tree_boro_filtered.groupby('curb_loc')[['curb_loc']].count()
    .rename_axis('Curb Location')
    .rename(columns={'curb_loc':'Count'})
    .reset_index()
)

In [None]:
fig_4 = px.pie(
    curb_count, 
    values='Count', 
    names='Curb Location', 
    title='Fig. 6: Fraction of Tree Curb Locations'
)
fig_4.show()

#### Tree Problems

A significant number of problems are labelled 'other', regardless of the tree part (root, trunk, or branch), while the single most prevalent problem is root damage caused by paving stones in the tree bed (Fig. 7).
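Fig. 7 compares 'Yes' and 'No' counts for every problem. If only the prevalence of each problem is needed, comparing the columns to 'Yes' and summing gives it directly. A sketch on a hypothetical subset of the problem columns:

```python
import pandas as pd

# Hypothetical subset of the yes/no problem columns
toy = pd.DataFrame({
    'root_stone': ['Yes', 'Yes', 'No', 'Yes'],
    'root_grate': ['No', 'No', 'No', 'Yes'],
    'brch_other': ['No', 'Yes', 'No', 'No'],
})

# Number of trees flagged 'Yes' for each problem, most frequent first
yes_counts = toy.eq('Yes').sum().sort_values(ascending=False)
print(yes_counts.index[0], yes_counts.iloc[0])  # root_stone 3
```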

In [None]:
# The nine yes/no problem indicators recorded for each tree
problem_cols = ['root_stone', 'root_grate', 'root_other',
                'trunk_wire', 'trnk_light', 'trnk_other',
                'brch_light', 'brch_shoe', 'brch_other']

tree_problem_count = (
    tree_boro_filtered
    .melt(value_vars=problem_cols)     # long format: one row per (tree, problem)
    .groupby(['variable', 'value'])    # group by problem and its Yes/No value
    .size()
    .reset_index(name='Count')
    .rename(columns={'variable': 'Tree Problem', 'value': 'Presence'})
)

In [None]:
fig_5 = px.bar(tree_problem_count, 
               x="Tree Problem", 
               y="Count",
               color='Presence', 
               title='Fig. 7: Count of Tree Problem Presence by Tree Problem Category')
fig_5.show()

#### Administrative Districts of Manhattan

In [None]:
print(
    f'There are {tree_boro.postcode.nunique()} unique postcodes in Manhattan, '
    f'while only {tree_boro.ntacode.nunique()} neighborhood tabulation area (NTA) codes are present. '
    f'Manhattan is composed of {tree_boro.ntaname.nunique()} named neighborhoods (NTA names) '
    f'delineated into {tree_boro.geometry.nunique()} unique shapes, with '
    f'{tree_boro.shape_area.nunique()} unique shape areas. '
    'Thus, each unique NTA code corresponds to a unique NTA name, shape, and shape area.'
)

In [None]:
admin_neigh = (
    tree_boro
    .groupby(['ntacode', 'ntaname', 'shape_area'])[['geometry']]
    .first()    # each NTA has a single polygon, so the first row's geometry represents it
)

<p align="center">
  <b>Tab. 1: Table of Manhattan Neighborhoods with Their Corresponding Name, Code, Area, and Shape</b>
</p>

In [None]:
admin_neigh

## Exploratory Analysis

Trees are an essential part of a healthy urban environment. Not only do they provide shade, shelter and cleaner air, but they can also reduce noise pollution, lower temperatures and increase property values. But how do we know if a tree is healthy in an urban environment like Manhattan? 

To determine the health of a tree, several indicators need to be taken into account, such as leaf color and shape, bark texture and thickness, the root system, and trunk diameter. Examining these aspects of a tree's physical makeup makes it possible to assess how healthy an individual tree is likely to be.

The environment around a tree also offers insight into its overall health. Ambient air quality, for instance, has been identified as a factor affecting the robustness of trees in urban areas such as Manhattan: high levels of pollutants and particulate matter can damage leaves and stunt growth while weakening trees' defenses against pests such as aphids and scale insects. High temperatures can also cause dehydration, which impairs photosynthesis and oxygen production. Unfortunately, none of these environmental indicators is recorded in the data.

#### Is There An Association Between Tree Diameter, Health, And Location With Respect To Curb?

Diameter at breast height (DBH) is associated with both health status and curb location (Fig. 8). Trees in good health that are offset from the curb have the highest median DBH, 10 inches, while trees in poor health on the curb have a median of only 6 inches. For trees in both good and fair health, those located on the curb have a statistically significantly lower median DBH than those offset from it.
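The significance claim rests on the notches of the box plot. According to plotly's documentation, a notch spans median ± 1.57 × IQR / √N, an approximate 95% confidence interval for the median, and non-overlapping notches are commonly read as evidence that two medians differ. The sketch below computes these intervals on simulated data; the helpers `notch_interval` and `notches_overlap` are hypothetical, and the quartiles use NumPy's default interpolation, which may differ slightly from plotly's quartile method.

```python
import numpy as np

def notch_interval(values):
    """Approximate 95% CI for the median: median +/- 1.57 * IQR / sqrt(n)."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    half_width = 1.57 * (q3 - q1) / np.sqrt(v.size)
    return median - half_width, median + half_width

def notches_overlap(a, b):
    """True if the notch intervals of samples a and b overlap."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a

# Simulated DBH samples (not census data): offset trees are wider on average
rng = np.random.default_rng(0)
offset = rng.normal(10, 3, 500)
on_curb = rng.normal(8, 3, 500)

lo, hi = notch_interval([1, 2, 3, 4, 5])
print(round(lo, 2), round(hi, 2))        # 1.6 4.4
print(notches_overlap(offset, on_curb))  # False
```

Notches are a visual heuristic; a formal two-sample test on the DBH values would give a firmer basis for the word "significantly".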

In [None]:
tree_dbh = (
    tree_boro_filtered[['health', 'curb_loc', 'tree_dbh']]
)

In [None]:
fig_6 = px.box(
    tree_dbh, 
    x='health', 
    y='tree_dbh', 
    color='curb_loc',
    title='Fig. 8: Median Living Tree Diameter At Breast Height per Health Status and Curb Location',
    category_orders={'health': ['Good', 'Fair', 'Poor']},
    labels={'health': 'Health', 'tree_dbh': 'Tree Diameter At Breast Height [inch]', 'curb_loc': 'Curb Location'},
    notched=True,
    range_y=[0, 35]
)
fig_6.show()

#### Do Tree Problems Depend On Health And Location?

The mean number of tree problems, i.e. the sum of the root, trunk, and branch problem indicators per tree, is associated with both health and curb location. In all health categories, and in agreement with Fig. 7, trees on the curb show a higher mean number of problems than those offset from it. Off the curb, trees in good health average 0.33 problems per tree, compared with 0.71 and 0.82 for trees in fair and poor health, respectively. The largest increase on the curb occurs for trees in good health, whose mean rises to 0.47 problems per tree (Fig. 9).
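Note that the value plotted in Fig. 9 is a mean count per tree, not the share of trees with exactly one problem: a tree with two problems contributes 2. A minimal sketch of the computation on hypothetical rows, using `eq('Yes')` instead of replacing strings with 0/1 (only three of the nine problem columns are included for brevity):

```python
import pandas as pd

# Hypothetical trees with a subset of the yes/no problem columns
toy = pd.DataFrame({
    'health': ['Good', 'Good', 'Good', 'Good', 'Poor', 'Poor'],
    'curb_loc': ['OnCurb', 'OnCurb', 'OffsetFromCurb', 'OffsetFromCurb', 'OnCurb', 'OnCurb'],
    'root_stone': ['Yes', 'No', 'No', 'No', 'Yes', 'Yes'],
    'trnk_other': ['No', 'No', 'No', 'No', 'Yes', 'No'],
    'brch_other': ['No', 'Yes', 'No', 'Yes', 'No', 'No'],
})
problem_cols = ['root_stone', 'trnk_other', 'brch_other']

# Per-tree number of problems, then the mean per (health, curb location) group
toy['n_problems'] = toy[problem_cols].eq('Yes').sum(axis=1)
mean_problems = toy.groupby(['health', 'curb_loc'])['n_problems'].mean()

print(mean_problems.loc[('Good', 'OffsetFromCurb')])  # 0.5
print(mean_problems.loc[('Poor', 'OnCurb')])          # 1.5
```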

In [None]:
tree_problems = (
    tree_boro_filtered
    .replace({'Yes': 1, 'No': 0})
    # Per-tree problem count: sum of the nine 0/1 problem indicators
    .assign(tree_problems=lambda df: df[['root_stone', 'root_grate', 'root_other',
                                         'trunk_wire', 'trnk_light', 'trnk_other',
                                         'brch_light', 'brch_shoe', 'brch_other']].sum(axis=1))
    .groupby(['health', 'curb_loc'])[['tree_problems']].mean()
    .reset_index()
)

In [None]:
fig_7 = px.bar(
    tree_problems, 
    x='health', 
    y='tree_problems', 
    color='curb_loc',
    title='Fig. 9: Mean Number of Tree Problems per Health Status and Curb Location',
    category_orders={'health': ['Good', 'Fair', 'Poor']},
    labels={'health': 'Health', 'tree_problems': 'Tree Problems', 'curb_loc':'Curb Location'},
    barmode='group'
)
fig_7.show()

In [None]:
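# Map health to a 3-2-1 score, then weight each neighborhood's mean score and mean DBH
# by its share of all living trees in Manhattan (len(tree_boro_filtered) == 62,427)
weighted_health_dbh = (
    tree_boro_filtered
    .replace({'Good': 3, 'Fair': 2, 'Poor': 1})
    .groupby(['ntaname'])[['health', 'tree_dbh']].agg(['mean', 'count'])
    .assign(Health=lambda df: df['health']['mean'] * (df['health']['count'] / len(tree_boro_filtered)))
    .assign(Tree_DBH=lambda df: df['tree_dbh']['mean'] * (df['tree_dbh']['count'] / len(tree_boro_filtered)))
    .drop(['health', 'tree_dbh'], axis=1)
    .reset_index()
)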

#### Are Trees In Certain Manhattan Neighborhoods Healthier Than In Others?

First, let's look at which neighborhoods have the most trees. Most of Harlem seems to occupy the top places.

In [None]:
boro_most_trees = (
    tree_boro
    .groupby(['ntaname', 'shape_area'])[['tree_id']].count()
    .sort_values('tree_id', ascending=False)
    .head(10)    # sort first, then keep the 10 neighborhoods with the most trees
    .rename_axis(['New York Neighborhood Names', 'Neighborhood Area [feet\u00b2]'])
    .rename(columns={'tree_id': 'Count of Trees'})
    .reset_index()
    .assign(CountPerSquareFeet=lambda df: df['Count of Trees'] / df['Neighborhood Area [feet\u00b2]'])
    .drop('Neighborhood Area [feet\u00b2]', axis=1)
)

In [None]:
fig_8a = px.bar(
    boro_most_trees, 
    x='New York Neighborhood Names', 
    y='Count of Trees', 
    title='Fig. 10a: 10 Manhattan Neighborhoods with the Highest Absolute Number of Trees'
)
fig_8a.show()

However, when the counts are normalized by neighborhood area (trees per square foot), the ranking changes almost completely: only Central Harlem South keeps one of the top places, while Gramercy and East Village jump ahead.

In [None]:
fig_8b = px.bar(
    boro_most_trees.sort_values(by='CountPerSquareFeet', ascending=False), 
    x='New York Neighborhood Names', 
    y='CountPerSquareFeet', 
    title='Fig. 10b: 10 Manhattan Neighborhoods with the Highest Count of Trees per Feet\u00b2',
    labels={'CountPerSquareFeet': 'Count Per Feet\u00b2'}
)
fig_8b.show()

To find out which neighborhoods have the healthiest trees, 'Good' health was assigned a score of 3, 'Fair' 2, and 'Poor' 1. Each neighborhood's mean score was then multiplied by a weight, its share of all living trees in Manhattan, to produce a weighted mean. Because these weighted values sum to the borough-wide mean, a neighborhood ranks high only if it has both many trees and healthy ones. The top 10 results are shown in Fig. 11.
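The weighting can be illustrated on a hypothetical example with three neighborhoods. The sketch mirrors the notebook cell but computes the total from the data instead of hard-coding it, and shows how the weighted ranking can differ from a ranking by plain mean health:

```python
import pandas as pd

# Hypothetical trees in three neighborhoods
toy = pd.DataFrame({
    'ntaname': ['A', 'A', 'A', 'A', 'B', 'C', 'C'],
    'health': ['Good', 'Fair', 'Fair', 'Good', 'Poor', 'Good', 'Good'],
})

score = toy['health'].map({'Good': 3, 'Fair': 2, 'Poor': 1})
stats = score.groupby(toy['ntaname']).agg(['mean', 'count'])
stats['share'] = stats['count'] / stats['count'].sum()   # share of all trees
stats['weighted'] = stats['mean'] * stats['share']

# C has the healthiest trees on average, but A ranks first once size is weighted in
print(stats['mean'].idxmax(), stats['weighted'].idxmax())      # C A
# The weighted values add up to the overall mean score
print(abs(stats['weighted'].sum() - score.mean()) < 1e-12)     # True
```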

In [None]:
weighted_health = weighted_health_dbh[['ntaname', 'Health']].sort_values(by='Health', ascending=False)[:10]

In [None]:
fig_9 = px.bar(
    weighted_health, 
    x='ntaname', 
    y='Health', 
    title='Fig. 11: 10 Manhattan Neighborhoods with Highest Weighted Mean Health of Trees',
    labels={'ntaname': 'Neighborhood Name', 'Health': 'Weighted Mean Health'}
)
fig_9.show()

Similar results were obtained for the weighted mean of tree diameter at breast height, which Fig. 8 showed to be associated with good health. All but one of the top 10 neighborhoods are the same as in Fig. 11, although their order differs.

In [None]:
weighted_dbh = weighted_health_dbh[['ntaname', 'Tree_DBH']].sort_values(by='Tree_DBH', ascending=False)[:10]

In [None]:
fig_10 = px.bar(
    weighted_dbh, 
    x='ntaname', 
    y='Tree_DBH', 
    title='Fig. 12: 10 Manhattan Neighborhoods with Highest Weighted Mean DBH of Trees',
    labels={'ntaname': 'Neighborhood Name', 'Tree_DBH': 'Weighted Mean Tree DBH [inch]'}
)
fig_10.show()

Returning to how curb location is associated with indicators of health (Fig. 8 and Fig. 9), the plot below shows the 10 neighborhoods with the highest count of trees offset from the curb per square foot. The list differs from the top 10 neighborhoods by the other health measures (Fig. 11 and Fig. 12), but it shows that trees offset from the curb make up around 10% of all trees in some neighborhoods (Upper West Side) and more than 50% in others (Cooper Village).
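The percentages quoted above are within-neighborhood proportions, which differ from the per-square-foot counts plotted in Fig. 13. One way to compute such proportions directly is a row-normalized crosstab; the sketch below uses hypothetical data:

```python
import pandas as pd

# Hypothetical curb locations for trees in two neighborhoods
toy = pd.DataFrame({
    'ntaname': ['X', 'X', 'X', 'X', 'Y', 'Y'],
    'curb_loc': ['OnCurb', 'OnCurb', 'OnCurb', 'OffsetFromCurb',
                 'OffsetFromCurb', 'OnCurb'],
})

# Row-normalized crosstab: fraction of each neighborhood's trees per curb location
shares = pd.crosstab(toy['ntaname'], toy['curb_loc'], normalize='index')
print(shares['OffsetFromCurb'].tolist())  # [0.25, 0.5]
```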

In [None]:
boro_curb_loc = (
    tree_boro_filtered.groupby(['ntaname', 'shape_area', 'curb_loc'])[['curb_loc']].count()
    .rename(columns={'curb_loc': 'Count'})
    .rename_axis(index={'ntaname':'Neighborhood Name', 'curb_loc':'Curb Location', 
                        'shape_area': 'Neighborhood Area [square feet]'})
    .reset_index()
    .assign(CountPerSquareFeet=lambda df: df['Count'] / df['Neighborhood Area [square feet]'])
    .sort_values(
        by=['Curb Location', 'CountPerSquareFeet'], 
        ascending=[True, False], 
        ignore_index=True)
    .loc[lambda row: row['Neighborhood Name'].isin(list(row.iloc[:10, 0].values))]
)

In [None]:
fig_11 = px.bar(boro_curb_loc, 
               x="Neighborhood Name", 
               y="CountPerSquareFeet", 
               color='Curb Location',
               title='Fig. 13: Top 10 Count of Trees Offset From Curb Per Feet\u00b2 in Manhattan Neighborhoods', 
               category_orders={'Neighborhood Name': 
                                list(boro_curb_loc['Neighborhood Name'].values),
                                'Curb Location': ['OnCurb', 'OffsetFromCurb']},               
               color_discrete_sequence=['grey', 'green'],
               labels={'CountPerSquareFeet':'Count Per Feet\u00b2'})
fig_11.show()

#### What Can Dead Trees Tell Us About Their Fate?

Compared with living trees (Fig. 6), dead trees are slightly more often located on the curb (Fig. 14).

In [None]:
dead_curb_loc = (
    tree_boro[tree_boro['status'] == 'Dead'].groupby('curb_loc')[['curb_loc']].count()
    .rename_axis('Curb Location')
    .rename(columns={'curb_loc':'Count'})
    .reset_index()
)

In [None]:
fig_12 = px.pie(
    dead_curb_loc, 
    values='Count', 
    names='Curb Location', 
    title='Fig. 14: Fraction of Dead Tree Curb Locations'
)
fig_12.show()

Compared with Fig. 13, which depicts the top 10 neighborhoods with the most trees offset from the curb per square foot, the top 10 neighborhoods with the fewest dead trees per square foot (Fig. 15) share only five neighborhoods, with Cooper Village again in first place. This suggests that factors other than curb location contribute to tree death; unfortunately, such factors are not recorded for dead trees.

Beyond curb location, urban trees such as Manhattan's are known to suffer from several stressors: the urban heat island effect (higher temperatures caused by urbanization and human activity, which intensify heat extremes), soil compaction (too little pore space for oxygen and water to move freely through the soil, depriving roots of essential nutrients and moisture), invasive species, and poor maintenance (trees require regular pruning to stay healthy).
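The ranking of neighborhoods with the fewest dead trees per square foot (Fig. 15) is built from a groupby over `status`. A neighborhood with no dead trees at all produces no 'Dead' row in such a groupby and therefore cannot appear in that ranking. Whether any Manhattan neighborhood actually has zero dead trees is not shown here; the sketch below, on hypothetical data, uses a crosstab that keeps such neighborhoods with a count of 0:

```python
import pandas as pd

# Hypothetical trees with neighborhood areas in square feet
toy = pd.DataFrame({
    'ntaname': ['X', 'X', 'Y', 'Y', 'Y', 'Z'],
    'status': ['Alive', 'Dead', 'Alive', 'Alive', 'Alive', 'Alive'],
    'shape_area': [100.0, 100.0, 200.0, 200.0, 200.0, 50.0],
})

# A groupby on dead rows only silently drops neighborhoods without dead trees
dead_only = toy[toy['status'] == 'Dead'].groupby('ntaname').size()
print(dead_only.index.tolist())  # ['X']

# crosstab keeps every neighborhood, filling missing (neighborhood, status) pairs with 0
counts = pd.crosstab(toy['ntaname'], toy['status'])
area = toy.groupby('ntaname')['shape_area'].first()
dead_per_sqft = (counts['Dead'] / area).sort_values()
print(dead_per_sqft['X'], dead_per_sqft['Y'], dead_per_sqft['Z'])  # 0.01 0.0 0.0
```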

In [None]:
boro_status = (
    tree_boro.groupby(['ntaname', 'shape_area','status'])[['status']].count()
    .rename(columns={'status':'Count'})
    .rename_axis(index={'ntaname':'Neighborhood Name', 
                        'status':'Status',
                        'shape_area':'Neighborhood Area [square feet]'})
    .reset_index()
    .assign(CountPerSquareFeet=lambda df: df['Count'] / df['Neighborhood Area [square feet]'])
    .sort_values(
        by=['Status', 'CountPerSquareFeet'], 
        ascending=[False, True], 
        ignore_index=True)
    .loc[lambda row: row['Neighborhood Name'].isin(list(row.iloc[:10, 0].values))]
)

In [None]:
fig_13 = px.bar(boro_status, 
               x="Neighborhood Name", 
               y="CountPerSquareFeet", 
               color='Status',
               title='Fig. 15: Top 10 Count of Dead vs Alive Per Feet\u00b2 in Manhattan Neighborhoods', 
               category_orders={'Neighborhood Name': 
                                list(boro_status['Neighborhood Name'].values),
                                'Status': ['Alive', 'Dead']},               
               color_discrete_sequence=['white', 'black'],
               labels={'CountPerSquareFeet':'Count Per Feet\u00b2'})
fig_13.show()

## Map Of Manhattan's Neighborhoods And Tree Locations

Let's take a closer look at how Manhattan's trees are scattered throughout the borough in Fig. 16. 

Interestingly, tree-lined streets are not as common in Manhattan as you might think. Tree distribution appears to follow geography: the further east or south you go, the fewer trees you find and the poorer their health. This pattern may be related to differences in soil quality and density; for instance, lower-lying parts of the borough contain sandy soils that can prevent roots from taking hold and growing deep enough to support large trees. In addition, many areas near the waterfront have been paved over, possibly in connection with rising sea levels and coastal flooding concerns, which makes new tree growth difficult.

South of Central Park, particularly where Broadway runs through Midtown and Union Square towards Lower Manhattan, streets to both the west and the east are almost devoid of trees. Notably, in especially pollutant-exposed areas such as around the Lincoln Tunnel, the Port Authority Bus Terminal, Madison Square Park, and the Trans-Manhattan Expressway, the trees found are either dead or in fair or poor health. On the other hand, referring back to Fig. 11, 12, and 13, the Upper East Side, Upper West Side, and West Village appear to be locations where trees thrive. So does Cooper Village: although small in size, it has the largest proportion of trees offset from the curb per square foot and the fewest dead trees (Fig. 15), perhaps due to its unique character.

In [None]:
# Dead trees have no health status or species; label them explicitly so they appear on the map
trees.loc[trees['health'].isna(), 'health'] = 'Dead'
trees.loc[trees['spc_common'].isna(), 'spc_common'] = 'Not Known'

In [None]:
fig_14 = (
    px.scatter_mapbox(
        trees,
        title='Fig. 16: Location Of Trees By Health Within Neighborhood Borders of Manhattan',
        lat="latitude",
        lon="longitude",
        color="health",
        category_orders={'health':['Good', 'Fair', 'Poor', 'Dead']},
        color_discrete_sequence=['green', 'yellow', 'red', 'black'],
        hover_data=["nta_name", "spc_common", "curb_loc"],
        labels={'health': 'Health', 
                'spc_common':'Tree Species', 
                'curb_loc':'Curb Location', 
                'nta_name':'Neighborhood Name',
                'latitude':'Latitude',
                'longitude':'Longitude'},
    )
    .update_traces(marker={'size':4})
    .update_layout(
        mapbox={
            "style": "open-street-map",
            "zoom": 10,
            "layers": [
                {
                    "source": json.loads(neighborhoods.query('boroname == "Manhattan"').geometry.to_json()),
                    "type": "line",
                    "color": "grey",
                    "line": {"width": 2},
                }
            ],
        },
        margin={"l": 0, "r": 0, "t": 40, "b": 0},
    )
)
fig_14.show()

## What Ten Tree Species Would Be Best For The City To Plant In The Future?

New York City is home to about 8.6 million people, and Manhattan, its most densely populated borough, is one of the most densely populated places in the world. This means that nature can often take a back seat to the hustle and bustle of city life. However, maintaining a healthy balance between urban life and nature is essential for keeping any city livable, especially in times of climate change.

As previously mentioned, the urban design team believes trunk diameter and health are the most desirable characteristics of city trees. Indeed, Fig. 8 shows an association between trunk diameter and health. Furthermore, in the 'Good' and 'Fair' health categories, trees on the curb had a lower median diameter than trees offset from it. Fig. 9 additionally demonstrates that location on the curb is associated with a higher mean number of tree problems in all health categories, and Fig. 11 and Fig. 12 show that certain Manhattan neighborhoods have healthier trees with greater trunk diameters.

So, which tree species should Manhattan plant? Let’s explore this question further. 

In [None]:
fig_15 = px.box(
    tree_boro_filtered, 
    x='spc_common', 
    y='tree_dbh', 
    color='curb_loc',
    title='Fig. 17: Median Living Tree Diameter At Breast Height per Tree Species and Curb Location',
    category_orders={'spc_common': list(most_common_manhattan.index)},
    labels={'spc_common': 'Tree Species', 'tree_dbh': 'Tree Diameter At Breast Height [inch]', 'curb_loc': 'Curb Location'},
    notched=True,
    range_y=[0, 50],
    range_x=[-0.5, 19.5]
)
fig_15.show()

In [None]:
species_health = (
    tree_boro_filtered.replace({'Good': 3, 'Fair': 2, 'Poor': 1}).groupby(['spc_common','curb_loc'])[['health']].mean()
    .reset_index()
)

In [None]:
fig_16 = px.bar(
    species_health, 
    x='spc_common', 
    y='health', 
    color='curb_loc',
    title='Fig. 18: Mean Living Tree Health per Tree Species and Curb Location',
    category_orders={'spc_common': most_common_manhattan.index, 'curb_loc': ['OnCurb','OffsetFromCurb']},
    labels={'spc_common': 'Tree Species', 'health': 'Mean Health', 'curb_loc': 'Curb Location'},
    barmode='group',
    range_x=[-0.5, 19.5]
)
fig_16.show()

In [None]:
species_problems = (
    tree_boro_filtered
    .replace({'Yes': 1, 'No': 0})
    # Per-tree problem count: sum of the nine 0/1 problem indicators
    .assign(tree_problems=lambda df: df[['root_stone', 'root_grate', 'root_other',
                                         'trunk_wire', 'trnk_light', 'trnk_other',
                                         'brch_light', 'brch_shoe', 'brch_other']].sum(axis=1))
    .groupby(['spc_common', 'curb_loc'])[['tree_problems']].mean()
    .reset_index()
)

In [None]:
fig_17 = px.bar(
    species_problems, 
    x='spc_common', 
    y='tree_problems', 
    color='curb_loc',
    title='Fig. 19: Mean Number of Tree Problems per Tree Species and Curb Location',
    category_orders={'spc_common': most_common_manhattan.index, 'curb_loc': ['OnCurb','OffsetFromCurb']},
    labels={'spc_common': 'Tree Species', 'tree_problems': 'Tree Problems', 'curb_loc': 'Curb Location'},
    barmode='group',
    range_x=[-0.5, 19.5]
)
fig_17.show()

## Conclusion

Based on Fig. 17, 18, and 19, and choosing only among the 20 most common species so that each estimate rests on a large sample, the top 10 recommended tree species by median tree DBH, mean health, and mean number of tree problems include:
 - American elm, which shows the highest median DBH, the highest mean health, and the fewest tree problems when offset from the curb;
 - London planetree, which closely follows American elm in all parameters;
 - pin oak, ginkgo, and Sophora, which reach a median DBH of around 10 inches and also show more favorable outcomes when offset from the curb;
 - honeylocust and green ash, which have similar DBH, health, and number of tree problems;
 - Callery pear; and
 - willow oak, which surprisingly fares better on the curb.
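The text does not spell out how the three criteria were combined. One simple, hypothetical scheme is to rank species on each criterion and average the ranks; the sketch below uses made-up summary values and placeholder species names, not the results in Fig. 17 to 19.

```python
import pandas as pd

# Hypothetical per-species summaries (placeholders, not census results)
summary = pd.DataFrame(
    {
        'median_dbh': [14.0, 12.0, 6.0],      # inches, higher is better
        'mean_health': [2.8, 2.9, 2.6],       # 3 = Good, 1 = Poor, higher is better
        'mean_problems': [0.3, 0.4, 0.6],     # problems per tree, lower is better
    },
    index=['species_a', 'species_b', 'species_c'],
)

# Rank each criterion so that rank 1 is always the most desirable value
ranks = pd.DataFrame({
    'median_dbh': summary['median_dbh'].rank(ascending=False),
    'mean_health': summary['mean_health'].rank(ascending=False),
    'mean_problems': summary['mean_problems'].rank(ascending=True),
})
summary['mean_rank'] = ranks.mean(axis=1)   # lower mean rank = better overall

print(summary.sort_values('mean_rank').index.tolist())
# ['species_a', 'species_b', 'species_c']
```

With real per-species summaries, `summary.nsmallest(10, 'mean_rank')` would return a top 10 under this scheme; equal weighting of the three criteria is itself an assumption that the urban design team might want to revisit.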

Generally speaking, the most common tree problem according to Fig. 7 is root damage caused by paving stones, which might explain why most species show better indicators when offset from the curb. A possible solution is therefore to give trees on the curb (the overwhelming majority, as Fig. 6 shows) more space. Additionally, since different tree species have different requirements for soil, moisture, crown size, etc., it is also worth investigating which neighborhoods might suit the selected species.