# **Tracking Tree Canopy Equality with Green Coverage Model**

![](https://www.nationalgeographic.com/content/dam/science/2020/07/06/crown-shyness/crownshyness_mm9404_200619_00101.ngsversion.1594031243744.adapt.1900.1.jpg)
*image source*: https://www.nationalgeographic.com/science/2020/07/tree-crown-shyness-forest-canopy/

There is a multitude of benefits to tree coverage. Trees regulate extreme temperatures and manage flash flooding. They also absorb carbon dioxide from the atmosphere, helping to offset greenhouse gas emissions.

Given these benefits, it is clear that trees have an implicit value. In this notebook, we explore tree coverage inequality within cities and how that perpetuates existing imbalances between the rich and the poor. The imbalances we discuss are public health outcomes related to extreme heat and higher vulnerability to flooding. No other category of hazardous weather event in the United States has caused more fatalities over the last few decades than extreme heat.

The **KPI** we suggest to CDP is the **Gini Coefficient** of green coverage for a 20 km x 20 km region around a central coordinate. We calculate it by querying satellite images of the area of interest and extracting a subset of the green spectrum to estimate the density of green coverage in each tile. We then analyse how evenly that coverage is spread across the region.


# Executive Summary


* We have created an automated tool that takes a city name as input and outputs a green coverage map and Lorenz curve

* Provides actionable insights for any city by showing which areas are underserved by tree coverage and whether the tree distribution is equitable

* Creates demand for low-skilled work, supporting a group severely impacted by the Covid-19 pandemic

* Is an effective infrastructure project that can be started with relative ease to kick-start an economy

* Helps remove systematic bias against minorities and lower-income groups

* Improves public health outcomes for low socio-economic groups as global temperatures rise and extreme weather becomes more frequent

The kernel is ordered as follows. We start with an exploratory analysis of the CDP disclosure data. Next, we give a detailed explanation of our method, applying the tool to Los Angeles and Sydney to illustrate the robustness of the product. We then elaborate on the advantages and shortcomings of the approach and discuss further iterations of the tool.

# Exploratory Data Analysis on CDP Disclosure Data

To understand what KPIs would add value to cities, we explored the disclosure data provided by CDP.

In [None]:
# Library Imports

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.core.display import HTML
from IPython.display import  Markdown
import seaborn as sns
import random
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import geopandas as gpd
from shapely.geometry import Point
from  sklearn.metrics import davies_bouldin_score
from sklearn.decomposition import PCA
from matplotlib.path import Path
from matplotlib.spines import Spine
from matplotlib.projections.polar import PolarAxes
from matplotlib.projections import register_projection
from matplotlib.patches import Circle, RegularPolygon
from matplotlib.transforms import Affine2D
from sklearn import tree
import statsmodels.api as sm
from scipy import stats
import json
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
import plotly.graph_objects as go
import torch
import torchvision.transforms as transforms
import os
import requests
import shutil
from PIL import Image
from tqdm import tqdm
import cv2
import folium
from folium.plugins import HeatMap
from IPython.display import display
from kaggle_secrets import UserSecretsClient

google_api_key = UserSecretsClient().get_secret("google_api_key")

#centering figures
HTML("""<style> .output_png { display: table-cell; text-align: center; vertical-align: middle;}</style>""")

#my colors
colors= ['#458B00','#629632','#397D02','#567E3A','#A6D785','#687E5A','#8AA37B','#476A34','#7BBF6A','#3D8B37','#426F42','#215E21']

#remove pandas display limit
pd.options.display.max_colwidth = None

In [None]:
# Data Imports

# Metadata columns that are not needed for the analysis
DROP_COLS = ['Questionnaire', 'Year Reported to CDP', 'File Name', 'Last update', 'Comments']
CDP_ROOT = "/kaggle/input/cdp-unlocking-climate-solutions"

def read_cdp_csv(path):
    # A callable `usecols` keeps every column except the metadata ones,
    # so the header does not need to be read in a separate pass.
    return pd.read_csv(path, usecols=lambda c: c not in DROP_COLS)

cities_2018 = read_cdp_csv(f"{CDP_ROOT}/Cities/Cities Responses/2018_Full_Cities_Dataset.csv")
cities_2019 = read_cdp_csv(f"{CDP_ROOT}/Cities/Cities Responses/2019_Full_Cities_Dataset.csv")
cities_2020 = read_cdp_csv(f"{CDP_ROOT}/Cities/Cities Responses/2020_Full_Cities_Dataset.csv")

companies_climate_change_2018 = read_cdp_csv(f"{CDP_ROOT}/Corporations/Corporations Responses/Climate Change/2018_Full_Climate_Change_Dataset.csv")
companies_climate_change_2019 = read_cdp_csv(f"{CDP_ROOT}/Corporations/Corporations Responses/Climate Change/2019_Full_Climate_Change_Dataset.csv")
companies_climate_change_2020 = read_cdp_csv(f"{CDP_ROOT}/Corporations/Corporations Responses/Climate Change/2020_Full_Climate_Change_Dataset.csv")

geo_cities_2020 = pd.read_csv("/kaggle/input/cdp-unlocking-climate-solutions/Cities/Cities Disclosing/2020_Cities_Disclosing_to_CDP.csv", usecols=["Account Number","City Location", "Organization", "City"]) 

First, we investigated the most significant climate hazards facing cities today. As shown below, flooding, extreme heat and extreme precipitation account for over 50% of the hazards reported.
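As a minimal sketch of how such a share can be checked, the snippet below groups responses by their top-level category and sums the share of the three largest categories. The sample strings, the "category > sub-type" shape and the helper name `hazard_shares` are illustrative assumptions, not values taken from the CDP dataset.

```python
from collections import Counter

# Hypothetical responses in a "category > sub-type" shape (illustrative, not CDP data).
sample_responses = [
    "Flood and sea level rise > Flash / surface flood",
    "Extreme hot temperature > Heat wave",
    "Extreme Precipitation > Rain storm",
    "Flood and sea level rise > River flood",
    "Water Scarcity > Drought",
    "Extreme hot temperature > Extreme hot days",
]

def hazard_shares(responses):
    """Return (category, share) pairs sorted by descending share."""
    # Keep only the text before the first ">" as the hazard category.
    categories = [r.split(">", 1)[0].strip() for r in responses]
    counts = Counter(categories)
    total = sum(counts.values())
    return [(name, count / total) for name, count in counts.most_common()]

shares = hazard_shares(sample_responses)
# Combined share of the three most frequently reported categories.
top_three_share = sum(share for _, share in shares[:3])
print(shares)
print(f"Top three categories: {top_three_share:.0%}")
```

The same grouping is applied to the real responses in the cell that follows, using pandas string methods instead of plain Python.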

In [None]:
# Isolating Cities Question

question_3_responses = cities_2020[
    (cities_2020['Question Number'] == '2.1') &
    (cities_2020['Column Name'] == 'Climate Hazards')
]

question_3_responses = question_3_responses['Response Answer'].str.split(">", n = 1, expand = True).loc[:,0]
question_3_response_count = question_3_responses.value_counts()

# Isolating Corporations Question

question_2_responses = companies_climate_change_2020[
    (companies_climate_change_2020['question_number'] == 'C2.3a') &
    (companies_climate_change_2020['column_number'] == 3)
]

risk_types = [
    'Acute physical',
    'Emerging regulation',
    'Current regulation',
    'Chronic physical',
    'Market',
    'Reputation',
    'Technology'
]

question_2_responses = question_2_responses[~question_2_responses['response_value'].isin(risk_types)]
question_2_response_count = question_2_responses['response_value'].value_counts().head(10)

# Graphing

fig = plt.figure(figsize=(15,9))
ax1 = fig.add_subplot(121)

question_3_response_count.plot.pie(
    textprops={'color':"w"},
    pctdistance=0.7,
    autopct='%.2f%%',
    colors=colors, 
    labels=None,
    ax=ax1,
    ylabel=""
)


ax1.title.set_text("Climate Hazards Identified By Cities")
ax1.legend(
    question_3_response_count.index,
    loc="lower center", 
    bbox_to_anchor=(0.5, -0.4)
)

plt.show()


Our next step was to understand what actions and goals cities are implementing to increase resilience to these hazards. Vectorising responses to Question 3.3 (2020)

“Please describe the main goals of your city’s adaptation efforts and the metrics / KPIs for each goal.”

gave us the results you see below. Interestingly, the two most common actions were ‘tree planting creation green’ and ‘planting creation green space’. To understand why these actions are so common, we then evaluated which climate issues the discussed actions addressed:

In [None]:

pd.set_option('mode.chained_assignment', None)

adaptation_actions_ques_3 = cities_2020[
    (cities_2020['Question Number'] == '3.0') &
    (cities_2020['Column Number'] == 2)
]

adaptation_actions_ques_3.loc[:, ['Response Answer']] = adaptation_actions_ques_3.loc[:, ['Response Answer']].fillna('No Response')

def get_top_n_bigram(corpus, n=None):
    # Despite the name, this ranks 4-word phrases (ngram_range=(4,4)) by their summed TF-IDF weight
    vec = TfidfVectorizer(ngram_range=(4,4), stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0) 
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]
common_words = get_top_n_bigram(adaptation_actions_ques_3['Response Answer'], 20)

adaptation_actions_ques_3 = pd.DataFrame(common_words, columns = ['word' , 'count'])

adaptation_actions_ques_3  = adaptation_actions_ques_3.set_index('word')[:7]  ## Keep the top 7 phrases (one per entry in the pie chart's `explode` tuple)



adaptation_goals_ques_3 = cities_2020[
    (cities_2020['Question Number'] == '3.3') &
    (cities_2020['Column Number'] == 1) 
]
adaptation_goals_ques_3.loc[:, ['Response Answer']] = adaptation_goals_ques_3.loc[:, ['Response Answer']].fillna('No Response')

# get_top_n_bigram is already defined earlier in this cell, so it is reused here
common_words = get_top_n_bigram(adaptation_goals_ques_3['Response Answer'], 20)
adaptation_goals_ques_3 = pd.DataFrame(common_words, columns = ['word' , 'count'])

# Translate the non-English phrases that appear among the top results
adaptation_goals_ques_3.loc[adaptation_goals_ques_3.word=='reforzar el sistema salud','word']= 'strengthen the health system'
adaptation_goals_ques_3.loc[adaptation_goals_ques_3.word=='monitoramento risco em tempo','word']= 'monitoring risk in real time'

adaptation_goals_ques_3  = adaptation_goals_ques_3.set_index('word')[:6]  ## Keep the top 6 phrases

# Graphing

fig = plt.figure(figsize=(15,9))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

adaptation_actions_ques_3['count'].plot.pie(
    textprops={'color':"w"},
    pctdistance=0.7,
    explode=(0.1,0.1,0,0,0,0,0),
    autopct='%.2f%%',
    colors=colors, 
    labels=None,
    ax=ax1,
    ylabel=""
)

adaptation_goals_ques_3['count'].plot.pie(
    textprops={'color':"w"},
    explode=(0.1,0,0,0,0,0),
    pctdistance=0.7,
    autopct='%.2f%%',
    colors=colors, 
    labels=None,
    ax=ax2,
    ylabel=""
)

ax1.title.set_text("Actions - Present") ## present 
ax2.title.set_text("Goals/KPI's - Future") ## future
ax1.legend(
    adaptation_actions_ques_3.index,
    loc="lower center", 
    bbox_to_anchor=(0.5, -0.4)
)
ax2.legend(
    adaptation_goals_ques_3.index,
    loc="lower center", 
    bbox_to_anchor=(0.5, -0.4)
)

plt.show()



As the above visualisation shows, improved green coverage increases resilience towards each of the top three risks we discovered earlier, clearly justifying its popularity.

By selecting these actions, we were then able to observe the metrics used to measure their implementation and success. As shown below, cities do not share a common KPI for measuring the same objective.

Ultimately, our processing and visualisation of the disclosure data revealed demand for a unified green coverage KPI that any city, anywhere in the world, could easily use. Such a tool would allow cities to understand their urban greenery and compare the equality of their tree distribution with that of other cities. For a global organisation that has set itself apart in disclosure, this KPI is a natural fit for CDP.

In [None]:
question_3_responses = cities_2020[cities_2020['Question Number'] == '3.3'].copy()

question_3_3_responses = cities_2020[(cities_2020['Question Number'] == '3.3') &
                                     (cities_2020['Column Number'] == 1)].copy()

question_3_3_responses['Response Answer'] = question_3_3_responses['Response Answer'].fillna('No Response')
 
# Filter question 3.3 to cities that mention green spaces in adaptation goals

# list_of_strings = ['tree', 'plant', 'park', 'planting', 'canopy', 'canopies']
list_of_strings = ['tree']


pattern = '|'.join(list_of_strings)

question_3_3_filtered = question_3_3_responses[question_3_3_responses['Response Answer'].str.contains(pattern)]

# Retrieve all columns for the rows that have the list of strings in their adaptation goals.
# DataFrame.append was removed in pandas 2.0, so the matching pieces are collected and concatenated.

matching_rows = [
    question_3_responses[(question_3_responses['Account Number'] == row['Account Number']) &
                         (question_3_responses['Row Number'] == row['Row Number'])]
    for _, row in question_3_3_filtered.iterrows()
]

if matching_rows:
    all_green_responses = pd.concat(matching_rows, ignore_index=True)
else:
    all_green_responses = pd.DataFrame(columns=question_3_responses.columns)
    
all_green_responses_2 = all_green_responses['Response Answer'][all_green_responses['Column Number'] == 2]

all_green_responses_2 = all_green_responses_2.str.split(">", n=1, expand=True).loc[:, 0].value_counts()

fig = go.Figure(data=[go.Sankey(
    node = dict(
      pad = 15,
      thickness = 20,
      line = dict(color = "#215E21", width = 0.5),
      label = ["Tree", "Extreme hot temperature", "Flood and sea level rise", "Extreme Precipitation", "Storm and wind", "Water Scarcity", "Chemical change", "Extreme cold temperature", "Biological hazards", "Mass movement"],
      color = "#4BB74C"
    ),
    link = dict(
      source = [0, 0, 0, 0, 0, 0, 0, 0, 0], # every link starts at node 0 ("Tree")
      target = [1, 2, 3, 4, 5, 6, 7, 8, 9], # indices into `label`, one per hazard
      value = [20, 10, 8, 6, 5, 3, 2, 2, 1],
      color = "#C1FFC1"
    ))])

fig.update_layout(title_text="Hazard related to Tree", font_size=15)
fig.show()


# The intersection between the tree coverage and social inequality

## Health outcomes 

Medical and health researchers have shown that fatalities during heatwaves are most commonly due to respiratory and cardiovascular diseases, primarily because of heat's negative effect on the cardiovascular system. To control its internal temperature, the body instinctively circulates large quantities of blood to the skin. However, this protective measure against overheating puts extra strain on the heart, which can trigger a cardiac event in those with chronic health problems, such as the elderly (Cui et al.). Frumkin showed that the relationship between mortality and temperature follows a J-shaped function, with a steeper slope at higher temperatures. Records show that heat waves have caused more casualties than hurricanes, floods, and tornadoes combined. This statistic is significant because extreme heat events are not only deadly but also becoming more prevalent (Stone et al.).

A study covering 1989 to 2000 also recorded a 5.7% rise in mortality during heatwaves. It revealed that Rome's elderly population endures a higher mortality rate during heat waves, at 8% excess for the 65–74 age group and 15% for those above 74 (Schifano et al.). Another study found that during the 2003 heatwave in France, small towns saw an average excess mortality rate of 40%, while Paris saw an increase of 141%. During this period, a 0.5 °C increase above the average minimum nighttime temperature doubled the risk of death in the elderly (Dousset et al.).

The air temperature of urban areas with more trees can be around 4 °C cooler than that of areas without. On a more local level, the air temperature on a treeless residential street can be 10 °C higher than on a nearby shady street (Sinfield et al.).

A city’s poorest areas tend to have less tree canopy than wealthier areas, a pattern that is especially pronounced in the concrete-dense neighbourhoods where temperature regulation is most important (Ready et al.).

The areas that have faced systematic inequality in tree coverage are now seeing poorer public health outcomes because of it. Through no fault of their own, at-risk people are now more vulnerable to extreme heat because of where they live.

The best way to counter this issue is to plant trees in the areas that need them most. By planting trees, city councils can effectively regulate temperatures and therefore improve public health outcomes for vulnerable members of their society. Since global temperatures are rising, this issue will only become more pressing as time goes on. It is critical for councils to act now, which is what has inspired our KPI.


## Danger from flooding

There are several types of flooding that affect communities:
- River flooding: when the amount of water entering a river exceeds its holding capacity and the river overflows its banks
- Surface water flooding: when heavy rainfall runs over hard or saturated surfaces without being absorbed into the ground and collects in low areas, damaging properties and displacing communities
- Drain and sewer flooding: when heavy rain causes drains and sewers to become blocked
- Coastal flooding: when weather and tidal conditions, compounded by climate change, raise sea levels and affect coastal neighbourhoods

Planting trees and extending green spaces helps by intercepting rainfall, which significantly slows the rain's speed. Tree canopies can capture about 30% of the rainfall, which then evaporates back into the atmosphere without reaching the ground. This can occur even in winter, when trees still intercept and re-evaporate rainfall. The root systems of trees also allow water to penetrate deeper into the soil, decreasing surface run-off while increasing the water storage capacity of the soil. In urban spaces, an increase in impermeable surfaces such as roads, pavements and driveways has led to increased surface water run-off. Trees reduce surface run-off by 80% compared to asphalt. Incorporating green spaces into urban areas would drastically reduce run-off, leading to a decreased risk of flooding (Woodland).


## How can this pull cities out of a recession without perpetuating social inequities?

Tree planting projects can help pull cities out of a recession by mobilising low-skilled workers, the group hit hardest by the Covid-19 pandemic. Employing this group puts money into the hands of the people who need it most, which stimulates local economies by increasing consumer spending. Because tree planting is labour-intensive and requires practically no qualifications, it is a practical way to create employment at scale in any city in the world. On top of this, trees often need at least two years of constant care (Ready et al.), which means these employees can stay hired until the economy has rebounded and the labour market is thriving again. Contrast this with investment in renewable energy. Renewables are valuable, but the workers involved are typically highly skilled and are therefore more likely to still be employed; those who are not have plenty of other opportunities available to them. Investing in renewables would therefore only worsen income inequality, as low-skilled workers are left behind.


## How can this be done in a socially equitable way against the backdrop of a global pandemic?

As mentioned above, by mobilising low-skilled workers for these projects, cities can put money in the hands of the people who need it most. This contributes to making society more equitable since it balances income distribution. With industries such as hospitality practically at a standstill during the pandemic, these projects can help newly unemployed workers find meaningful work again.


## How can corporations help solve this problem?

Cities could outsource the labour requirement to corporations, as directly hiring and managing labour-oriented workforces is not a council's area of expertise.


## How does this measure the intersection in the context of resiliency?

The intersection allows cities to be resilient in a myriad of ways. By planting more trees, cities defend themselves against rising temperatures, protect the health of their most vulnerable citizens and mitigate flash flooding events.



# Tree Canopy Equality KPI

Our KPI allows cities around the world to evaluate their green coverage consistently and reliably. We have created an interface for collecting and assessing satellite images for a given city to understand the equality of their greenery distribution.

## Method:

The tool follows these steps:
1. Take a city name as input (it must match the name submitted to CDP)
2. Retrieve coordinates of the city centre from CDP data
3. Fetch zoomed-out satellite images
4. Determine city boundaries using a custom ML model
5. Collect thousands of zoomed-in images within the city bounds and assess green coverage
6. Combine results to form a Lorenz curve and calculate the Gini coefficient
7. Visualise areas with the least tree coverage on a heat map
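
As a rough illustration of steps 2 and 3, the sketch below parses a `POINT (longitude latitude)` string of the kind stored in the CDP `City Location` column and builds a square search area around it. The helper names `parse_wkt_point` and `square_around` and the sample coordinates are hypothetical; the 0.7 degree width mirrors the value used in the complete pipeline.

```python
import re

# Matches "POINT (lon lat)" with plain decimal numbers.
_POINT_PATTERN = re.compile(
    r"\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*"
)

def parse_wkt_point(wkt):
    """Parse 'POINT (lon lat)' into a (longitude, latitude) tuple of floats."""
    match = _POINT_PATTERN.fullmatch(wkt)
    if match is None:
        raise ValueError(f"Not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))

def square_around(longitude, latitude, width_deg):
    """Return [min corner, max corner] of a square with sides of width_deg degrees."""
    half = width_deg / 2
    return [
        [longitude - half, latitude - half],
        [longitude + half, latitude + half],
    ]

# Illustrative coordinates, not read from the CDP file.
lon, lat = parse_wkt_point("POINT (151.2093 -33.8688)")
search_area = square_around(lon, lat, width_deg=0.7)
print(search_area)
```

Raising an error on malformed input, rather than silently returning a default, makes a city with missing or unusual location data easy to spot before any images are downloaded.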


Code:

We have encapsulated this logic in the following blocks and included them below:
1. The geographical grid search system
2. Machine learning-based city detection
3. Satellite imagery pipeline
4. Complete pipeline
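
Before the full blocks, here is a minimal sketch of the basic colour detection idea used in block 2: each pixel is kept if all of its channels fall inside a per-channel range, and the green score is the fraction of pixels kept. The helper name `channel_range_mask` and the tiny synthetic image are assumptions for illustration; real tiles are 600 x 600 satellite images.

```python
import numpy as np

def channel_range_mask(rgb, ranges):
    """Boolean mask of pixels whose R, G and B values each lie strictly inside their (low, high) range."""
    per_channel = [
        (low < rgb[:, :, c]) & (rgb[:, :, c] < high)
        for c, (low, high) in enumerate(ranges)
    ]
    # Combine the three per-channel masks with a logical AND.
    return np.logical_and.reduce(per_channel)

# Synthetic 2 x 2 RGB image: two pixels fall inside the ranges, two do not.
image = np.array(
    [[[50, 120, 40], [200, 200, 200]],
     [[60, 90, 30], [10, 10, 10]]],
    dtype=np.uint8,
)
ranges = [(36, 86), (0, 255), (0, 255)]  # same bounds as the notebook's colour filter
mask = channel_range_mask(image, ranges)
green_score = mask.sum() / mask.size  # fraction of pixels inside the ranges
print(mask)
print(green_score)
```

Using `np.logical_and.reduce` avoids a subtle pitfall: `np.logical_and` itself accepts only two inputs, and a third positional argument is treated as the output array.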


## Green Coverage Tool

In [None]:
# 1. Geographical grid search system 

def getBoundingBoxes(bounds, zoom=14):
    
    resolution_map = {
        14: 0.054885,
        18: 0.003251
    }
    resolution = resolution_map[zoom]

    xSorted = sorted(bounds, key=lambda p: p[0])
    minX = xSorted[0][0]
    maxX = xSorted[-1][0]

    ySorted = sorted(bounds, key=lambda p: p[1])
    minY = ySorted[0][1]
    maxY = ySorted[-1][1]


    gridWidth = int((maxX - minX) / resolution)
    gridHeight = int((maxY - minY) / resolution)
    return [
        [
            [minX + x * resolution, minY + y * resolution],
            [minX + (x + 1) * resolution, minY + y * resolution],
            [minX + (x + 1) * resolution, minY + (y + 1) * resolution],
            [minX + x * resolution, minY + (y + 1) * resolution],
            [minX + x * resolution, minY + y * resolution],
        ]
        for x in range(gridWidth)
        for y in range(gridHeight)
    ]

In [None]:
# 2. Machine learning based city detection
# Only the functions needed for inference are included within this document
# All training was completed in a separate directory to avoid complexity ballooning

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
MODEL_PATH = "/kaggle/input/urban-classification-model/urban_classifier.pth"
# Load the model onto the same device that input batches are moved to in urban_classification
model = torch.load(MODEL_PATH, map_location=device)
model.to(device)
model.eval()


class BasicClassificationDataset(torch.utils.data.Dataset):
    def __init__(self, dir=None, image_paths=None):
        self.dir = dir
        self.image_paths = image_paths
        self.transforms = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                # transforms.RandomHorizontalFlip(0.5),
                # transforms.RandomVerticalFlip(0.5),
            ]
        )
        if dir:
            self.labels = os.listdir(dir)
            self.images = []

            for i in range(len(self.labels)):
                label = self.labels[i]
                images = os.listdir(os.path.join(dir, label))
                self.images += [
                    [os.path.join(dir, label, image), i] for image in images
                ]
        if image_paths:
            self.labels = ["None"]
            self.images = list(map(lambda p: [p, 0], self.image_paths))

    def __getitem__(self, idx):
        image_path, image_class = self.images[idx]
        img = Image.open(image_path).convert("RGB")
        img = self.transforms(img)
        return img, image_class

    def __len__(self):
        return len(self.images)
        
def urban_classification(image_paths):
    dataset = BasicClassificationDataset(image_paths=image_paths)
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=1, shuffle=False, num_workers=0
    )

    classifications = []

    with torch.no_grad():
        for i, data in enumerate(loader, 0):
            images, labels = data
            images = images.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            classifications += predicted.tolist()

    return classifications

## Basic color detection for trees
## If given more time we would train a segmentation model for the task

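def get_green_mask(image_path):
    img = Image.open(image_path).convert('RGB')
    arr = np.asarray(img)

    # (low, high) bounds per channel in R, G, B order; only the first channel is tightly restricted
    R = [(36,86),(0,255),(0,255)]
    # Check each bound against its own channel
    red_range = np.logical_and(R[0][0] < arr[:,:,0], arr[:,:,0] < R[0][1])
    green_range = np.logical_and(R[1][0] < arr[:,:,1], arr[:,:,1] < R[1][1])
    blue_range = np.logical_and(R[2][0] < arr[:,:,2], arr[:,:,2] < R[2][1])
    # np.logical_and takes two inputs (a third positional argument is `out`), so reduce over all three masks
    valid_range = np.logical_and.reduce([red_range, green_range, blue_range])
    mask = valid_range
    # Fraction of pixels that fall inside the colour range
    green_score = mask.sum() / mask.size
    return (img, mask, green_score)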

def display_mask(image_path):
    img, mask, green_score = get_green_mask(image_path)

    print(green_score)
    fig = plt.figure(figsize=(15, 9))
    ax1 = fig.add_subplot(121)
    ax2 = fig.add_subplot(122)

    ax1.imshow(mask)
    ax2.imshow(img)
    plt.show()
    

In [None]:
# 3. Satellite imagery pipeline

def satelite_download(tile, output_dir, zoom):
    size = [600, 600]

    centerX = (tile[0][0] + tile[2][0]) / 2
    centerY = (tile[0][1] + tile[2][1]) / 2

    image_url = f"http://maps.googleapis.com/maps/api/staticmap?center={centerY},{centerX}&zoom={zoom}&size={size[0]}x{size[1]}&maptype=satellite&key={google_api_key}"
    filename = f"{centerX},{centerY}.png"
    filename = os.path.join(output_dir, filename)
    
    #     print(image_url)

    if os.path.exists(filename):
        return filename
    else:
        # Open the url image, set stream to True, this will return the stream content.
        r = requests.get(image_url, stream=True)

        # Check if the image was retrieved successfully
        if r.status_code == 200:
            # Set decode_content value to True, otherwise the downloaded image file's size will be zero.
            r.raw.decode_content = True

            # Open a local file with wb ( write binary ) permission.
            with open(filename, "wb") as f:
                shutil.copyfileobj(r.raw, f)

#             print("Image sucessfully Downloaded: ", filename)
            return filename
        else:
#             print("Image Couldn't be retreived")
            return None

In [None]:
# 4. Complete pipeline

def get_green_scores(city_name, data_dir, output_path):

    if not google_api_key:
        print('Google Maps API Key Needed')
        return {}
    
    if (city_name not in list(geo_cities_2020["City"])):
        print('City name not found in CDP dataset')
        print('May have different name like City of {}'.format(city_name))
        return {}
    
    print(list(geo_cities_2020[geo_cities_2020["City"] == city_name]["City Location"]))
#     if (list(geo_cities_2020[geo_cities_2020["City"] == city_name]["City Location"])):
#         print('City name not found in CDP dataset')
#         print('May have different name like City of {}'.format(city_name))
#         return {}
    
    longitude, latitude  = geo_cities_2020[geo_cities_2020["City"] == city_name]\
        ["City Location"].iloc[0].replace('POINT (', '').replace(')', '').split(" ")
    longitude, latitude = float(longitude), float(latitude)

    width = 0.7
    city_surrounding_area = [
        [longitude - width/2,latitude - width/2],
        [longitude + width/2,latitude + width/2]
    ]
    
    #     print(city_surrounding_area)
    #     print("city_surrounding_area")
    large_grids = getBoundingBoxes(city_surrounding_area, zoom=14)
    large_image_paths = [satelite_download(grid, data_dir, 14) for grid in tqdm(large_grids, desc="Downloading large images")]
    
    #     print("large_image_paths")
    #     print(large_image_paths)
    urban_grid_mask = urban_classification(large_image_paths)
    urban_grids = [large_grids[i] for i in range(len(urban_grid_mask)) if (urban_grid_mask[i]==0)]

    small_grids = []
    for grid in urban_grids:
        small_grids += getBoundingBoxes(grid, zoom=18)
    small_image_paths = [satelite_download(grid, data_dir, 18) for grid in tqdm(small_grids, desc="Downloading small images")]
    
    mapped_data = {}
        
    # Score each downloaded tile, checkpointing the results to disk every 10 tiles
    for count, path in enumerate(tqdm(small_image_paths, desc="Segmenting images")):
        if path is None:
            # The download failed for this tile, so there is nothing to score
            continue

        file_name = os.path.basename(path)

        if (count % 10 == 0):
            with open(output_path, "w") as outfile:
                json.dump(mapped_data, outfile)

        if file_name not in mapped_data:
            _, mask, green_score = get_green_mask(path)
            mapped_data[file_name] = green_score

    with open(output_path, "w") as outfile:
        json.dump(mapped_data, outfile)

    return mapped_data

In [None]:
def make_map(green_scores, city_name):

    sorted_green_scores = sorted(list(green_scores.values()))
    Q1 = sorted_green_scores[int(len(sorted_green_scores)/6)]

    lowest_green_scores = []
    for file_name in green_scores:
        centerX = float(file_name.split(",")[1].replace(".png", ""))
        centerY = float(file_name.split(",")[0])
        if green_scores[file_name]<Q1:
            lowest_green_scores.append([centerX, centerY, green_scores[file_name]])

    longitude, latitude  = geo_cities_2020[geo_cities_2020["City"] == city_name]\
        ["City Location"].iloc[0].replace('POINT (', '').replace(')', '').split(" ")
    longitude, latitude = float(longitude), float(latitude)

    base_map = folium.Map(location=[latitude, longitude], control_scale=True, zoom_start=12, max_zoom=13, min_zoom=10)
    HeatMap(data=lowest_green_scores, radius=10, max_zoom=20, gradient={0.0: 'red', 0.05: 'red'}).add_to(base_map)
    return base_map

In [None]:
def graph_lorenz(green_scores, city_name):
    fig = plt.figure(figsize=(15,9))
    ax1 = fig.add_subplot(111)
    green_scores = list(green_scores.values())
    y = np.array(green_scores)
    lorenz_curve(y, ax1)
    ax1.title.set_text(f"{city_name} - Gini Coefficient = {str(gini(y))[:5]}")
    plt.show()

## Example: Los Angeles

In [None]:
city_name = 'Los Angeles'

saved_la_data_path = "/kaggle/input/green-score-tree-canopy/green_scores_la.json"
with open(saved_la_data_path) as json_file:
    la_green_scores = json.load(json_file)

display(make_map(la_green_scores, city_name))

## Example: Sydney

In [None]:
city_name = 'Sydney'

with open("/kaggle/input/green-score-tree-canopy/green_scores_sydney.json") as json_file:
    syd_mapped_data = json.load(json_file)
    
lowest_green_scores = []
for file_name in syd_mapped_data:
    centerX = float(file_name.split(",")[1].replace(".png", ""))
    centerY = float(file_name.split(",")[0])
    if syd_mapped_data[file_name]<0.1:
        lowest_green_scores.append([centerX, centerY, syd_mapped_data[file_name]])

base_map = folium.Map(location=[-33.9358211, 151.03409150000005], control_scale=True, zoom_start=12, max_zoom=13, min_zoom=10)
HeatMap(data=lowest_green_scores, radius=10, max_zoom=20, gradient={0.0: 'red', 0.05: 'red'}).add_to(base_map)
base_map

## Lorenz Curve and Gini Coefficient

To allow cities to measure their overall Tree Canopy Equality, we calculate a Lorenz curve in real time so councils can see a graphical representation of their own distribution. The curve shows the proportion of the overall tree canopy held by the bottom x% of the land area. *(An individual land area is a 50 m x 50 m block of land.)*
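To make the reading of the curve concrete, the sketch below computes the share of the total green score held by the least-green fraction of blocks. The helper name `bottom_share` and the sample scores are illustrative assumptions and are not part of the notebook.

```python
import numpy as np

def bottom_share(scores, fraction):
    """Share of the total green score held by the least-green `fraction` of blocks.

    Hypothetical helper for illustration only.
    """
    sorted_scores = np.sort(np.asarray(scores, dtype=float))
    # Number of blocks that make up the bottom `fraction` of the land area
    k = int(round(fraction * sorted_scores.size))
    return sorted_scores[:k].sum() / sorted_scores.sum()

# Made-up scores for four blocks: three sparse blocks and one park
example_scores = [0.1, 0.1, 0.2, 0.6]
half_share = bottom_share(example_scores, 0.5)
```

With these made-up scores, the least-green half of the blocks holds about 20% of the total green score, which is one point on the Lorenz curve.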

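We compute the Gini Coefficient from the area between the line of equality and the Lorenz curve: it is that area divided by the total area under the line of equality, so it ranges from 0 (perfectly even coverage) towards 1 (all coverage concentrated in a single block). This final value is the **KPI** that cities want to minimise, as a lower value means tree coverage is spread more equitably throughout the city.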
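As a rough illustration of how the coefficient behaves, the sketch below applies a rank-weighted form of the Gini formula to two made-up cities of four blocks each. The function name `gini_sketch` and the sample values are assumptions for this example only.

```python
import numpy as np

def gini_sketch(scores):
    """Gini coefficient via the rank-weighted formula (illustrative sketch)."""
    y = np.sort(np.asarray(scores, dtype=float))
    n = y.size
    # Ranks 1..n weight each sorted score by its position
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * y) / (n * y.sum()) - (n + 1.0) / n

# Every block has the same coverage
equal_city = gini_sketch([0.5, 0.5, 0.5, 0.5])
# All coverage sits in one block out of four
unequal_city = gini_sketch([0.0, 0.0, 0.0, 1.0])
```

Here `equal_city` evaluates to 0 and `unequal_city` to 0.75, the maximum this formula can reach with four blocks, so a city aiming for equitable coverage wants its value as close to 0 as possible.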


In [None]:
with open("/kaggle/input/green-score-tree-canopy/green_scores_sydney.json") as json_file:
    syd_mapped_data = json.load(json_file)
with open("/kaggle/input/green-score-tree-canopy/green_scores_la.json") as json_file:
    la_mapped_data = json.load(json_file)

syd_ma = list(syd_mapped_data.values())
la_ma = list(la_mapped_data.values())

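def gini(arr):
    """Return the Gini coefficient of a 1-D NumPy array of green scores."""
    # Sort ascending so each score can be weighted by its rank
    sorted_arr = arr.copy()
    sorted_arr.sort()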
    n = arr.size
    coef_ = 2. / n
    const_ = (n + 1.) / n
    weighted_sum = sum([(i+1)*yi for i, yi in enumerate(sorted_arr)])
    return coef_*weighted_sum/(sorted_arr.sum()) - const_


def lorenz_curve(arr, ax):
    sorted_arr = arr.copy()
    sorted_arr.sort()
    X_lorenz = sorted_arr.cumsum() / sorted_arr.sum()
    # Prepend 0 so the curve starts at the origin
    X_lorenz = np.insert(X_lorenz, 0, 0)
    ## scatter plot of Lorenz curve
    ax.scatter(np.arange(X_lorenz.size) / (X_lorenz.size - 1), X_lorenz,
               marker='x', color='darkgreen', s=2)
    ## line plot of equality
    ax.plot([0, 1], [0, 1], color='k')
    
fig = plt.figure(figsize=(15,9))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)


x = np.array(syd_ma)

lorenz_curve(x, ax1)

y = np.array(la_ma)

lorenz_curve(y, ax2)

ax1.title.set_text(f"Sydney - Gini Coefficient = {str(gini(x))[:5]}") ## Sydney 
ax2.title.set_text(f"Los Angeles - Gini Coefficient = {str(gini(y))[:5]}") ## LA

## Try it yourself

Enter a city name to see a map of where trees are needed most and a Lorenz Curve showing the overall equality of coverage. To use it yourself, uncomment the code below and add your Google API key to your Kaggle secrets.
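One possible way to read the key is sketched below. It assumes the key was stored under the hypothetical secret name `GOOGLE_API_KEY` and falls back to an environment variable of the same name when run outside Kaggle. How the key is then passed to the download helper depends on code defined earlier in the notebook and is not shown here.

```python
import os

def load_api_key(secret_name="GOOGLE_API_KEY"):
    """Return the API key from Kaggle secrets, or from the environment as a fallback.

    `secret_name` is a hypothetical label; use the name the key was saved under.
    """
    try:
        # kaggle_secrets is only available inside Kaggle notebooks
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(secret_name)
    except Exception:
        # Outside Kaggle, or if the secret is missing, read the environment instead
        return os.environ.get(secret_name)
```

The function returns `None` if the key is found in neither place, so it is worth checking the result before starting a long download.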

In [None]:
# city_name = 'London'
# output_path = '../../green_scores2.json'

# green_scores = get_green_scores(city_name, './', output_path)
# if green_scores != {}:
#     display(make_map(green_scores, city_name))
#     graph_lorenz(green_scores, city_name)

## Advantages

The best benchmark for our Green Coverage Tool and Gini Coefficient is Treepedia by the MIT Senseable City Lab. Treepedia captures the green canopy using Google Street View: it scans each panorama and uses a predictive model to estimate the amount of tree canopy visible from eye level. Its output is a map of the city in which each Street View location appears as a dot whose green intensity reflects how much tree cover was in the panorama (a visualisation similar to our heat map). It also provides a Green View Index, which shows the percentage of canopy coverage for the entire city (similar to our Gini coefficient).

Our solution recognises considerably more greenery, including parklands and backyards. As discussed, these features increase resilience to extreme weather events and therefore need to be taken into consideration. Our product also has the advantage of integrating directly with CDP's data and customers.

## Disadvantages

Although our work represents a great start, there are still many improvements to be made. These include:

1. Training an ML segmentation model to identify greenery, rather than relying on a colour-based approach
2. Training another classification model to identify area type at a smaller scale
3. Creating an online interface that allows organisations reporting to CDP to easily view and interact with the data

These steps are feasible and would significantly improve the value of our product; however, time constraints prevented us from pursuing them.

# Final Thoughts

Overall this was a great exercise in predicting tree canopy inequality in cities. Although we see much more room for growth and improvement with our models and visualisations, we have created a great tool that is ready to be used by cities to see where to plant trees and how equal their current tree distribution is.

We believe the generic nature of the tool strongly complements CDP's mission for global disclosure as it can be applied to any city with relative ease.


## Our team

We had the pleasure of working with an experienced and diverse team. Scattered across the world, we connected over Zoom, Slack and Git:

<table>
  <tr>
    <td><img src="https://media-exp1.licdn.com/dms/image/C5603AQHbEQCGU89V6A/profile-displayphoto-shrink_400_400/0/1596957301931?e=1611792000&v=beta&t=fZn_O9amBy7bHGYLu9sd1XEIwjURey3TW-AWQ5Mtn1w" width="150"></td>
      <td style='text-align: left;'><strong>Adrian Sarstedt</strong><br/><i>Data Engineer at the Florey Institute</i><br/><i>Australia</i></td>
      <td><img src="https://media-exp1.licdn.com/dms/image/C5603AQHVTBlErFKspw/profile-displayphoto-shrink_400_400/0/1551742466431?e=1611792000&v=beta&t=wBO3P2BMWHC39dkXI3Kkjq5gKV4mZLoFeCZs20YaomQ" width="150"></td>
    <td style='text-align: left;'><strong>Hamish Gunasekara</strong><br/><i>Data/Risk Analyst at Afterpay</i><br/><i>Australia</i></td>
  </tr>
    <tr>
    <td><img src="https://media-exp1.licdn.com/dms/image/C4D03AQHeh45CPg4YsA/profile-displayphoto-shrink_400_400/0?e=1611792000&v=beta&t=IDlea0_mutyIpY6FKT4Btbtc3DtmAB8TFSoAkdfejWE" width="150"></td>
    <td style='text-align: left;'><strong>Adham Al Hossary</strong><br/><i>Data Scientist at C-Capture</i><br/><i>England</i></td>
      <td><img src="https://media-exp1.licdn.com/dms/image/C4D03AQFUX0Sfw87-JQ/profile-displayphoto-shrink_400_400/0?e=1611792000&v=beta&t=klWv7UqcQhpXRzP2tjgsLOWIdMYlFMd_ZHrCYyuutes" width="150"></td>
    <td style='text-align: left;'><strong>Alexandra Golab</strong><br/><i>Business Analyst at CitiBank</i><br/><i>Spain</i></td>
  </tr>
</table>

# References

The study on heat and redlining:
The Effects of Historical Housing Policies on Resident Exposure to Intra-Urban Heat: A Study of 108 US Urban Areas - Jeremy S. Hoffman, Vivek Shandas and Nicholas Pendleton
https://www.mdpi.com/2225-1154/8/1/12...

Interactive maps of neighborhood heat and redlining:
https://www.arcgis.com/apps/dashboard...

Robert K. Nelson, LaDale Winling, Richard Marciano, Nathan Connolly, et al., “Mapping Inequality,” American Panorama, ed. Robert K. Nelson and Edward L. Ayers, accessed August 4, 2020.
https://dsl.richmond.edu/panorama/red...

All about Urban Heat Islands from Climate Central [PDF]:
http://assets.climatecentral.org/pdfs...

Can trees and woods reduce flooding:
https://www.woodlandtrust.org.uk/trees-woods-and-wildlife/british-trees/flooding/

Satellite monitoring of summer heat waves in the Paris metropolitan area:
https://rmets.onlinelibrary.wiley.com/doi/10.1002/joc.2222

Urban tree canopy governance and redlined neighborhoods: an analysis of five cities
https://dspace.mit.edu/handle/1721.1/127588

Residential housing segregation and urban tree canopy in 37 US Cities
https://osf.io/preprints/socarxiv/97zcs/
