## Uncovering the Blossoming Beauty of Vancouver: A Data Visualization of Flowering Cherry Tree Abundance
##### By: Nava
<hr>

In [1]:
from IPython.display import Image

# Display a photo of a Kwanzan cherry tree
Image(url="https://www.treehugger.com/thmb/5bEXseQgaDRbM8jD1b8Sw9jvKS8=/1500x1000/filters:fill(auto,1)/GettyImages-1074095002-5c77fa0046e0fb00011bf26e.jpg", width=500, height=333)

*A name to remember: This is a Kwanzan Cherry Tree*

### Introduction

Despite living outside of Vancouver my entire life, I have always enjoyed driving through different neighbourhoods in the city to admire its lush trees. In particular, I have cherished memories of watching the cherry trees bloom each spring.

For our analysis, we will be focusing on the trees of Vancouver, including the most abundant species of flowering cherry tree. 

The data set we will be using is called `small_unique_trees_vancouver.csv`. The data was obtained from the City of Vancouver's Open Data Portal and was wrangled by [UBC Data Visualization](https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv).

The `small_unique_trees_vancouver.csv` dataset is composed of $20$ data columns (when loaded, it also carries an unnamed index column left over from wrangling). Let's get a better understanding of what each column holds.

### [`small_unique_trees_vancouver.csv`](https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv): 

| Column                                  | Description                        |
|-----------------------------------------|:-----------------------------------|
| <font color='blue'>**std_street**</font>        | The standardized name of the street the tree is on |
| <font color='blue'>**on_street**</font> | This is the street the tree is on in Vancouver|
| <font color='blue'>**species_name**</font> | The scientific name of the species |
| <font color='orange'>**neighbourhood_name**</font>        | The general area the tree is in |
| <font color='blue'>**date_planted**</font>        | The date the tree was planted |
| <font color='blue'>**diameter**</font>        | The diameter of the base of the tree |
| <font color='blue'>**street_side_name**</font>        | Whether the tree is on the even or odd side of the street |
| <font color='blue'>**genus_name**</font>        | The genus name |
| <font color='blue'>**assigned**</font>        | If the tree is assigned or not |
| <font color='blue'>**civic_number**</font>        | The tree's civic number for records |
| <font color='blue'>**plant_area**</font>        | The plant area |
| <font color='blue'>**curb**</font>        | If it's planted on a curb |
| <font color='blue'>**tree_id**</font>        | The tree's ID |
| <font color='orange'>**common_name**</font>        | The tree's common name |
| <font color='blue'>**height_range_id**</font>        | A scale of height range |
| <font color='blue'>**on_street_block**</font>        | Which block the tree is on |
| <font color='blue'>**cultivar_name**</font>        | The name of the cultivar (cultivated variety), if any |
| <font color='blue'>**root_barrier**</font>        | Whether or not the tree has a root barrier|
| <font color='orange'>**latitude**</font>        |  Latitude |
| <font color='orange'>**longitude**</font>        | Longitude |

The variables highlighted in orange are the ones we will use for our analysis.

### Exploring the data


In [2]:
# Import libraries
import pandas as pd
import altair as alt

# Serve chart data from a local data server instead of embedding it in the notebook.
# This also lifts Altair's default row limit, so no separate max_rows setting is needed.
alt.data_transformers.enable("data_server")

DataTransformerRegistry.enable('data_server')

In [3]:
#Read in the data

van_trees_df = pd.read_csv("https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv")
van_trees_df.head()

Unnamed: 0.1,Unnamed: 0,std_street,on_street,species_name,neighbourhood_name,date_planted,diameter,street_side_name,genus_name,assigned,...,plant_area,curb,tree_id,common_name,height_range_id,on_street_block,cultivar_name,root_barrier,latitude,longitude
0,10747,W 20TH AV,W 20TH AV,PLATANOIDES,Riley Park,2000-02-23,28.5,EVEN,ACER,N,...,15,Y,21421,NORWAY MAPLE,4,0,,N,49.252711,-123.106323
1,12573,W 18TH AV,W 18TH AV,CALLERYANA,Arbutus-Ridge,1992-02-04,6.0,ODD,PYRUS,N,...,7,Y,129645,CHANTICLEER PEAR,2,2300,CHANTICLEER,N,49.25635,-123.158709
2,29676,ROSS ST,ROSS ST,NIGRA,Sunset,,12.0,ODD,PINUS,N,...,7,Y,154675,AUSTRIAN PINE,4,7800,,N,49.213486,-123.083254
3,8856,DOMAN ST,DOMAN ST,AMERICANA,Killarney,1999-11-12,11.0,EVEN,FRAXINUS,N,...,7,Y,180803,AUTUMN APPLAUSE ASH,4,6900,AUTUMN APPLAUSE,N,49.220839,-123.036721
4,21098,EAST BOULEVARD,EAST BOULEVARD,HIPPOCASTANUM,Shaughnessy,,15.5,ODD,AESCULUS,Y,...,N,Y,74364,COMMON HORSECHESTNUT,4,5200,,N,49.238514,-123.154958


Below we'll be exploring the data using **summary tables** to get a better understanding of the type of data we have. This helps us explore and feel comfortable with our data. 

In [4]:
van_trees_df.info()
print("\n")
van_trees_df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 21 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Unnamed: 0          5000 non-null   int64  
 1   std_street          5000 non-null   object 
 2   on_street           5000 non-null   object 
 3   species_name        5000 non-null   object 
 4   neighbourhood_name  5000 non-null   object 
 5   date_planted        2363 non-null   object 
 6   diameter            5000 non-null   float64
 7   street_side_name    5000 non-null   object 
 8   genus_name          5000 non-null   object 
 9   assigned            5000 non-null   object 
 10  civic_number        5000 non-null   int64  
 11  plant_area          4950 non-null   object 
 12  curb                5000 non-null   object 
 13  tree_id             5000 non-null   int64  
 14  common_name         5000 non-null   object 
 15  height_range_id     5000 non-null   int64  
 16  on_str

Unnamed: 0.1,Unnamed: 0,diameter,civic_number,tree_id,height_range_id,on_street_block,latitude,longitude
count,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0,5000.0
mean,14861.9204,12.340888,2975.7076,128682.5846,2.7344,2960.227,49.247349,-123.107128
std,8680.023278,9.2666,2078.580429,75412.260406,1.56957,2086.861052,0.021251,0.049137
min,2.0,0.0,2.0,36.0,0.0,0.0,49.202783,-123.22056
25%,7192.75,4.0,1300.5,61321.5,2.0,1300.0,49.230152,-123.144178
50%,14870.0,10.0,2639.0,130130.5,2.0,2600.0,49.247981,-123.105861
75%,22366.75,18.0,4123.0,191332.0,4.0,4100.0,49.263275,-123.063484
max,29992.0,71.0,9113.0,270750.0,9.0,9100.0,49.29393,-123.023311


#### *Filtering Columns*



We can see here that many columns are not relevant to our questions, so let's keep only the columns we need for our analysis. We only want `common_name`, `neighbourhood_name`, `latitude`, and `longitude` to explore the abundance of trees.

In [5]:
#Filtered data set

vt_cleaned = van_trees_df.filter(['common_name','neighbourhood_name', 'latitude', 'longitude'] )
vt_cleaned.head()

Unnamed: 0,common_name,neighbourhood_name,latitude,longitude
0,NORWAY MAPLE,Riley Park,49.252711,-123.106323
1,CHANTICLEER PEAR,Arbutus-Ridge,49.25635,-123.158709
2,AUSTRIAN PINE,Sunset,49.213486,-123.083254
3,AUTUMN APPLAUSE ASH,Killarney,49.220839,-123.036721
4,COMMON HORSECHESTNUT,Shaughnessy,49.238514,-123.154958


Here is our filtered data set.

In [6]:
print(vt_cleaned.describe())
print()
vt_cleaned.info()

          latitude    longitude
count  5000.000000  5000.000000
mean     49.247349  -123.107128
std       0.021251     0.049137
min      49.202783  -123.220560
25%      49.230152  -123.144178
50%      49.247981  -123.105861
75%      49.263275  -123.063484
max      49.293930  -123.023311

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 4 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   common_name         5000 non-null   object 
 1   neighbourhood_name  5000 non-null   object 
 2   latitude            5000 non-null   float64
 3   longitude           5000 non-null   float64
dtypes: float64(2), object(2)
memory usage: 156.4+ KB


#### *Little summary:* 

- The dataset has 5000 entries and 4 columns: `common_name`, `neighbourhood_name`, `latitude`, and `longitude`.
- `common_name` and `neighbourhood_name` are object (string) columns, while `latitude` and `longitude` are float columns.
- There are no null values in these four columns.
- The mean latitude and longitude values are 49.247349 and -123.107128, respectively.
- The standard deviations of the latitude and longitude values are 0.021251 and 0.049137, respectively.
- The minimum latitude and longitude values are 49.202783 and -123.220560, respectively.
- The maximum latitude and longitude values are 49.293930 and -123.023311, respectively.
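
The "no null values" point can also be checked directly rather than read off the `info()` output. Below is a minimal sketch of that check. It runs on a tiny stand-in frame (`toy_trees`, a hypothetical name with a few rows copied from the preview above), so the same two lines would need to be pointed at `vt_cleaned` to check the real data.

```python
import pandas as pd

# Tiny stand-in for vt_cleaned; the name toy_trees is hypothetical.
toy_trees = pd.DataFrame(
    {
        "common_name": ["NORWAY MAPLE", "CHANTICLEER PEAR", "AUSTRIAN PINE"],
        "neighbourhood_name": ["Riley Park", "Arbutus-Ridge", "Sunset"],
        "latitude": [49.252711, 49.25635, 49.213486],
        "longitude": [-123.106323, -123.158709, -123.083254],
    }
)

# Number of missing values in each column; all zeros means no nulls.
null_counts = toy_trees.isna().sum()
print(null_counts)

# One yes/no answer for the whole frame.
has_nulls = bool(toy_trees.isna().any().any())
print("Any nulls:", has_nulls)
```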

## Let's Look at the Top Planted Trees


In [8]:
#Bar Chart

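# One bar per neighbourhood; including common_name in the tooltip splits each bar
# into one segment per tree type, so hovering shows the tree and its count.
species_abundance = (
    alt.Chart(vt_cleaned).mark_bar().encode(
        alt.X("count()", title="Number of Trees"),
        alt.Y("neighbourhood_name:O", title="Neighbourhood", sort='x'),
        tooltip=["count()", "common_name"]
    )
    .properties(width=800, title="Fig. 1 Abundance of Trees Planted by Neighbourhood")
)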

species_abundance

#### Hover over the bars: 

You'll be able to see which tree each segment represents and how many times it has been planted, but hovering over every segment is a bit of a hassle if you just want to identify the trees. It's more useful to see directly how many times the most popular trees have been planted.


#### Better Bar Chart: 

The following chart presents a clearer visualization of the top trees in each neighbourhood, displaying the top 3 trees in each case.

It improves on the previous chart by giving each neighbourhood its own panel, which makes it clear how much the top tree dominates the others.

In [9]:
grouped_data = vt_cleaned.groupby(['neighbourhood_name', 'common_name']).size().reset_index(name='count')

# sort the data by neighbourhood and count
sorted_data = grouped_data.sort_values(['neighbourhood_name', 'count'], ascending=[True, False])

# create a list of the top 3 common names for each neighbourhood
top_3 = sorted_data.groupby('neighbourhood_name').head(3)

# create a bar chart for each neighbourhood showing the top 3 common names
charts = []
for neighbourhood in grouped_data['neighbourhood_name'].unique():
    subset = top_3[top_3['neighbourhood_name'] == neighbourhood]
    chart = alt.Chart(subset).mark_bar().encode(
        x='common_name',
        y='count',
        color='common_name',
        tooltip=['common_name', 'count']
    ).properties(title=neighbourhood)
    charts.append(chart)

# combine the charts into a single grid (5 per row) using Altair's concat function
final_chart = alt.concat(*charts, columns=5).properties(title="Fig. 2 Top 3 Trees Planted in Each Neighbourhood")

# show the chart
final_chart

By performing a count, we can observe that the *Pissard Plum, Norway Maple, and Kwanzan Cherry* are the top trees planted.

*Note that one of the top trees is a flowering cherry tree.*
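
To make the "top 3 per neighbourhood" step easier to follow, here is a small sketch of the same count, sort, and `head(3)` pattern on made-up data. The frame `toy_trees` and its counts are hypothetical and chosen only so the result is easy to check by eye.

```python
import pandas as pd

# Hypothetical toy data: Sunset has four tree types, Marpole has two.
toy_trees = pd.DataFrame(
    {
        "neighbourhood_name": ["Sunset"] * 10 + ["Marpole"] * 3,
        "common_name": (
            ["KWANZAN FLOWERING CHERRY"] * 4
            + ["PISSARD PLUM"] * 3
            + ["NORWAY MAPLE"] * 2
            + ["AUSTRIAN PINE"]
            + ["PISSARD PLUM"] * 2
            + ["NORWAY MAPLE"]
        ),
    }
)

# Count how many times each tree appears in each neighbourhood.
counts = (
    toy_trees.groupby(["neighbourhood_name", "common_name"])
    .size()
    .reset_index(name="count")
)

# Sort so the largest counts come first within each neighbourhood,
# then keep at most three rows per neighbourhood.
top_3 = (
    counts.sort_values(["neighbourhood_name", "count"], ascending=[True, False])
    .groupby("neighbourhood_name")
    .head(3)
)
print(top_3)
```

In this toy example Sunset keeps three of its four tree types, while Marpole keeps both of its two, since `head(3)` returns fewer rows when a group is smaller than three.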

#### Another Way of Viewing Top Planted Trees

We will be examining the most popular trees across the neighbourhoods, focusing on those that have been planted at least 25 times in the data set.

The bubble chart presents a clear visualization of how often each tree appears in each neighbourhood, with Pissard Plum, Norway Maple, and Kwanzan Cherry being the most prevalent. This chart is a useful alternative for visualizing the data.

In [10]:
# Keep only tree types that appear at least 25 times in the data set
filtered_df = vt_cleaned.groupby('common_name').filter(lambda x: len(x) >= 25)

# Bubble chart: bubble size and colour both encode the number of trees
neighbourhood_plot = alt.Chart(filtered_df).mark_circle().encode(
    alt.X('neighbourhood_name', title='Neighbourhood'),
    alt.Y('common_name', title='Tree Type'),
    alt.Color('count()', scale=alt.Scale(scheme='viridis')),
    alt.Size('count()', scale=alt.Scale(range=[100, 2000])),
    tooltip=["count()", "common_name"]
).properties(title='Fig. 3 Density of Trees Planted 25+ Times in Vancouver')

neighbourhood_plot
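
The `groupby(...).filter(...)` call above keeps every row whose tree type meets the threshold. The sketch below shows the same idea on hypothetical toy data, alongside an alternative that uses `value_counts` and `isin`. The names `toy_trees` and `min_count` are illustrative, and the threshold is lowered to 2 so the toy data stays small.

```python
import pandas as pd

# Hypothetical toy data: three plums, two maples, one pine.
toy_trees = pd.DataFrame(
    {"common_name": ["PISSARD PLUM"] * 3 + ["NORWAY MAPLE"] * 2 + ["AUSTRIAN PINE"]}
)
min_count = 2  # the notebook uses 25

# Approach used in the notebook: keep whole groups that are large enough.
via_filter = toy_trees.groupby("common_name").filter(lambda g: len(g) >= min_count)

# Alternative: count each name once, then keep rows whose name is frequent.
species_counts = toy_trees["common_name"].value_counts()
frequent_names = species_counts[species_counts >= min_count].index
via_isin = toy_trees[toy_trees["common_name"].isin(frequent_names)]

print(sorted(via_filter["common_name"].unique()))
print(len(via_filter), len(via_isin))
```

Both approaches select the same rows here. The second avoids calling a Python function once per group, which may matter on larger data sets.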


#### *Little summary:* 

I have demonstrated the most planted trees in Vancouver using various visualizations, including a bar chart of the top planted tree, a bar chart of the top 3 planted trees, and a bubble plot showing the density of the top 3 and other commonly planted trees.

This information is important because it can help us understand the patterns and preferences of tree planting in the city, and also inform decisions regarding urban forestry and green space planning.

Now, let's shift our focus to the flowering cherry trees.

## What Does the Distribution of Flowering Trees in Vancouver Look Like?


Let's take a look at the top trees in tabular form:

In [11]:
#Viewing all tree names and count

all_trees = vt_cleaned['common_name'].value_counts().reset_index()
all_trees.head()

Unnamed: 0,index,common_name
0,KWANZAN FLOWERING CHERRY,383
1,PISSARD PLUM,295
2,NORWAY MAPLE,215
3,CRIMEAN LINDEN,152
4,PYRAMIDAL EUROPEAN HORNBEAM,100


Going through the full list of counts (only the first five rows are shown above), the top three flowering trees are Kwanzan Flowering Cherry, Akebono Flowering Cherry, and Japanese Flowering Crabapple.
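
Instead of reading the names off the table, the flowering trees could also be picked out in code. The sketch below assumes that the trees of interest all have the word "FLOWERING" in their `common_name`. That is an assumption on our part: trees that flower but are named differently (for example Pissard Plum) would not be matched. The toy data and its counts are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the counts are made up for illustration.
toy_trees = pd.DataFrame(
    {
        "common_name": (
            ["NORWAY MAPLE"] * 5
            + ["KWANZAN FLOWERING CHERRY"] * 4
            + ["AKEBONO FLOWERING CHERRY"] * 3
            + ["JAPANESE FLOWERING CRABAPPLE"] * 2
            + ["PINK FLOWERING DOGWOOD"]
        )
    }
)

# Keep rows whose common name contains the literal text "FLOWERING".
is_flowering = toy_trees["common_name"].str.contains("FLOWERING", regex=False)
flowering = toy_trees[is_flowering]

# value_counts sorts from most to least frequent, so head(3) is the top three.
top_flowering = flowering["common_name"].value_counts().head(3)
print(top_flowering)
```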

In [12]:
# Filtered dataframe for the top 3 flowering trees only

filtered = vt_cleaned['common_name'].isin(['KWANZAN FLOWERING CHERRY', 'AKEBONO FLOWERING CHERRY', 'JAPANESE FLOWERING CRABAPPLE'])

filtered_top3 = vt_cleaned[filtered]
filtered_top3

Unnamed: 0,common_name,neighbourhood_name,latitude,longitude
21,KWANZAN FLOWERING CHERRY,Sunset,49.225494,-123.087200
42,KWANZAN FLOWERING CHERRY,Shaughnessy,49.239992,-123.152677
60,KWANZAN FLOWERING CHERRY,Dunbar-Southlands,49.246430,-123.196900
62,KWANZAN FLOWERING CHERRY,Mount Pleasant,49.261203,-123.091148
63,AKEBONO FLOWERING CHERRY,Marpole,49.217274,-123.133047
...,...,...,...,...
4978,KWANZAN FLOWERING CHERRY,Victoria-Fraserview,49.220258,-123.074637
4988,KWANZAN FLOWERING CHERRY,Shaughnessy,49.257003,-123.130106
4992,KWANZAN FLOWERING CHERRY,Victoria-Fraserview,49.221161,-123.060833
4993,JAPANESE FLOWERING CRABAPPLE,Victoria-Fraserview,49.210781,-123.063509


This data frame contains only the most abundant flowering cherry trees. Next, we will look at which neighbourhoods they are abundant in.

By hovering over the blocks we can see the number and type of trees in each area.

In [13]:
# Heat map of neighbourhoods and the top 3 flowering cherry trees
neighbourhood_heatmap = alt.Chart(filtered_top3).mark_rect().encode(
    alt.Y('neighbourhood_name', title='Neighbourhood'),
    alt.X('common_name', title='Tree Type'),
    alt.Color('count()'),
    tooltip=["count()", "common_name"]
).properties(title='Fig. 4 Abundance of Top 3 Flowering Cherry Trees in Various Neighbourhoods')

neighbourhood_heatmap

#### Understanding the Heat Map

The heat map shows tree abundance in Vancouver. 

Dark blue represents high abundance while light green indicates low abundance and white represents zero presence. 

Flowering cherry trees are abundant in most parts of Vancouver, except for areas like Strathcona, Fairview, and Downtown. 

Renfrew-Collingwood and Victoria-Fraserview have the highest number of Kwanzan Flowering Cherry trees, with the former having the greatest combined abundance of all three types of flowering cherry trees.

Kwanzan is the dominant species among the three in almost all areas, except for Hastings-Sunrise, where the Japanese Flowering Crabapple is more dominant. However, a map would provide a clearer picture of the distribution of flowering cherry trees.
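
The numbers behind a heat map like this can also be laid out as a plain table, which makes statements such as "which tree dominates in which neighbourhood" easy to verify. The sketch below uses `pd.crosstab` on hypothetical toy data. The frame `toy_top3` and its counts are made up and are not the real neighbourhood totals.

```python
import pandas as pd

# Hypothetical toy data standing in for filtered_top3.
toy_top3 = pd.DataFrame(
    {
        "neighbourhood_name": ["Sunset"] * 3 + ["Hastings-Sunrise"] * 3,
        "common_name": [
            "KWANZAN FLOWERING CHERRY",
            "KWANZAN FLOWERING CHERRY",
            "AKEBONO FLOWERING CHERRY",
            "JAPANESE FLOWERING CRABAPPLE",
            "JAPANESE FLOWERING CRABAPPLE",
            "KWANZAN FLOWERING CHERRY",
        ],
    }
)

# Rows are neighbourhoods, columns are tree types, cells are tree counts.
heatmap_counts = pd.crosstab(toy_top3["neighbourhood_name"], toy_top3["common_name"])
print(heatmap_counts)

# For each neighbourhood, the tree type with the largest count.
dominant = heatmap_counts.idxmax(axis=1)
print(dominant)
```

A zero in this table corresponds to a white (empty) cell in the heat map.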

In [14]:
# Create points for the map
points = alt.Chart(filtered_top3).mark_point().encode(
    longitude="longitude",
    latitude="latitude",
    color="common_name",
    tooltip=['neighbourhood_name'],
).properties(width=700)

# Load the neighbourhood boundaries
url_geojson = 'https://raw.githubusercontent.com/UBC-MDS/exploratory-data-viz/main/data/local-area-boundary.geojson'

data_geojson_remote = alt.Data(url=url_geojson, format=alt.DataFormat(property='features', type='json'))

# Draw the base map of Vancouver
vancouver_map = alt.Chart(data_geojson_remote).mark_geoshape(
    color='grey', opacity=0.7, stroke='white'
).project(type='identity', reflectY=True).properties(
    title='Fig. 5 Map of Top 3 Flowering Cherry Trees in Vancouver'
)

#Overlay map and points
flowering_tree_map = (vancouver_map + points).configure_view(stroke=None)
flowering_tree_map

#### Understanding the Map

By hovering over parts of Vancouver on the map, we can easily locate the top $3$ flowering cherry trees. 

The dominance of Kwanzan Flowering Cherry throughout Vancouver is clearly visible. A large cluster of Kwanzan Flowering Cherry can be seen in the Mount Pleasant area, which is likely to be Queen Elizabeth Park.

The top $3$ flowering cherry trees are widespread in Vancouver, with fewer sightings in Downtown and Strathcona. Moreover, Akebono Flowering Cherry seems to be more common in the western part of Vancouver, as indicated by the large red cluster.
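
A rough numeric check of an east/west impression like this is to compare the average longitude of each tree type, since a more negative longitude lies further west. The sketch below does this on hypothetical toy coordinates. The frame `toy_top3` and its values are made up for illustration and do not come from the data set.

```python
import pandas as pd

# Hypothetical toy coordinates, not real tree locations.
toy_top3 = pd.DataFrame(
    {
        "common_name": [
            "AKEBONO FLOWERING CHERRY",
            "AKEBONO FLOWERING CHERRY",
            "KWANZAN FLOWERING CHERRY",
            "KWANZAN FLOWERING CHERRY",
        ],
        "latitude": [49.25, 49.23, 49.22, 49.26],
        "longitude": [-123.19, -123.17, -123.06, -123.10],
    }
)

# Average position of each tree type.
centroids = toy_top3.groupby("common_name")[["latitude", "longitude"]].mean()
print(centroids)

# The most negative mean longitude is the westernmost tree type on average.
westmost = centroids["longitude"].idxmin()
print("Furthest west on average:", westmost)
```

An average hides clusters, so this complements the map rather than replacing it.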

### *Summary*

This analysis explores the abundance and distribution of cherry trees in Vancouver, with Kwanzan Flowering Cherry being the most dominant species.

Fig. 1 and Fig. 2 provide visualizations of the top planted trees in each neighborhood, with Fig. 3 illustrating the most planted trees in Vancouver. 
Fig. 4 shows a heatmap of tree abundance, with Renfrew-Collingwood and Victoria-Fraserview having the most Kwanzan Flowering Cherry trees. 
Fig. 5 allows us to locate the top three flowering cherry trees in Vancouver. 

Strathcona has the least diverse tree species and few sightings of the top three flowering cherry trees. Future questions include exploring why Kwanzan is the most popular and identifying the oldest recorded tree. Despite improvements that could be made to the visualizations, the analysis successfully explores the cherry trees in Vancouver.
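
As a starting point for the "oldest recorded tree" question, the sketch below finds the earliest `date_planted` among four rows copied from the data preview near the top of the notebook. It is only an illustration: `date_planted` is filled in for 2363 of the 5000 rows, so on the full data this would give the earliest *recorded planting date*, not necessarily the oldest tree. The frame name `sample` is hypothetical.

```python
import pandas as pd

# Four rows taken from the data preview; one has no planting date.
sample = pd.DataFrame(
    {
        "common_name": [
            "NORWAY MAPLE",
            "CHANTICLEER PEAR",
            "AUSTRIAN PINE",
            "AUTUMN APPLAUSE ASH",
        ],
        "date_planted": ["2000-02-23", "1992-02-04", None, "1999-11-12"],
    }
)

# Parse the text dates; missing or unparseable values become NaT.
planted = pd.to_datetime(sample["date_planted"], errors="coerce")

# idxmin skips NaT, so this is the row with the earliest known date.
oldest = sample.loc[planted.idxmin()]
print(oldest["common_name"], oldest["date_planted"])
```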

I hope you found this analysis informative and insightful in learning more about the tree species and distribution in Vancouver, particularly regarding the prevalence of the Kwanzan Flowering Cherry and its dominance throughout the city. I also hope this analysis can be helpful in locating your desired cherry trees come springtime.

## References

Not all the work in this notebook is original. Parts that were borrowed from other resources are as follows:

### Resources used
- Fig 1. Referenced from Assignment 2 (Q. 3.4)
- Fig 2. Referenced from Assignment 4 (Q. 4.3)
- Fig 3. Referenced from Assignment 4 (Q. 4.2)
- Fig 4. Referenced from final_project_extras.html