#### In this notebook, we take the results of tree detection on San Jose aerial imagery, produced by a version of DeepForest fine-tuned on 2022 imagery, and examine its predictions for 2018 and 2022. From these predictions, we estimate the canopy area of each tree in square meters and the location of each tree, and we assess how the model performs on each year's dataset.

In [None]:
import pandas as pd
import math
from pyproj import Transformer
from geopy.distance import geodesic

Set the folder path if you are not using Colab

In [None]:
folder_path = '/'

Folder path for Colab

In [None]:
from google.colab import drive

# Mount Google Drive so the CSV files can be read from it
drive.mount('/content/drive')

folder_path = '/content/drive/My Drive/'


Mounted at /content/drive


File names for our CSV files from DeepForest

In [None]:
predicted_18 = '18predicted.csv'
predicted_22 = '22predicted.csv'

Load the data frames

In [None]:
# Load the CSV file
df_18 = pd.read_csv(folder_path + predicted_18)

# Display the loaded data
df_18.head()

Unnamed: 0.1,Unnamed: 0,xmin,ymin,xmax,ymax,label,score,image_path
0,0,365.0,1076.0,448.0,1148.0,Tree,0.376751,-13564033.725074895_4470636.967057719_-1356325...
1,1,0.0,1222.0,51.0,1280.0,Tree,0.356745,-13564033.725074895_4470636.967057719_-1356325...
2,2,979.0,167.0,1062.0,251.0,Tree,0.353718,-13564033.725074895_4470636.967057719_-1356325...
3,3,367.0,795.0,421.0,849.0,Tree,0.296108,-13564033.725074895_4470636.967057719_-1356325...
4,4,171.0,883.0,222.0,932.0,Tree,0.290287,-13564033.725074895_4470636.967057719_-1356325...


In [None]:
# Load the CSV file
df_22 = pd.read_csv(folder_path + predicted_22)

# Display the loaded data
df_22.head()

Unnamed: 0.1,Unnamed: 0,xmin,ymin,xmax,ymax,label,score,image_path
0,0,1027.0,249.0,1111.0,327.0,Tree,0.528083,-13572747.546299405_4480803.091819649_-1357196...
1,1,843.0,422.0,905.0,485.0,Tree,0.412388,-13572747.546299405_4480803.091819649_-1357196...
2,2,1233.0,638.0,1311.0,730.0,Tree,0.409269,-13572747.546299405_4480803.091819649_-1357196...
3,3,1106.0,687.0,1218.0,794.0,Tree,0.374981,-13572747.546299405_4480803.091819649_-1357196...
4,4,1042.0,581.0,1145.0,678.0,Tree,0.353348,-13572747.546299405_4480803.091819649_-1357196...


We load the detected trees from a CSV file in which each row describes one tree: the bounding box of its canopy in pixels and the image it was detected in, whose file name encodes the image's location.

The EPSG:3857 coordinates of two opposite corners of the image are saved in the image file name as four values separated by underscores. We extract these values so that the later steps can use them.

In [None]:
def name_to_cords(image_name):
    # Split the file name to get EPSG:3857 meter coordinate values
    name_parts = image_name.split('_')
    name_parts[-1] = name_parts[-1].split('.tiff')[0]
    return name_parts
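
As an illustration of this parsing step, the sketch below applies the same splitting logic to a made-up file name and converts the parts to floats. The file name and the `parse_corner_coords` helper are hypothetical and are not part of the dataset or of this notebook's pipeline.

```python
def parse_corner_coords(image_name):
    """Split 'x0_y0_x1_y1.tiff' into four floats (EPSG:3857 meters)."""
    stem = image_name.split('.tiff')[0]
    return [float(part) for part in stem.split('_')]

# Hypothetical file name, for illustration only
example_name = '-13564000.5_4470600.25_-13563200.5_4471400.25.tiff'
corner_coords = parse_corner_coords(example_name)
print(corner_coords)  # [-13564000.5, 4470600.25, -13563200.5, 4471400.25]
```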


Function for converting pixels to meters. EPSG:3857 (Web Mercator) coordinates are expressed in meters, but the projection stretches distances more and more the farther a location is from the equator, so one projected meter only matches one meter on the ground at the equator. To correct for this, we convert the corner coordinates to EPSG:4326 (WGS 84 latitude and longitude) and use geopy to calculate the geodesic distance between the corners of the image, which gives the real length of each side of the image in meters. Because we also know how many pixels the image has in height and length, we can later convert any pixel length to meters.

In [None]:
# Meter calculation from EPSG:3857 coordinate system
# https://gis.stackexchange.com/questions/242545/how-can-epsg3857-be-in-meters
# https://gis.stackexchange.com/questions/78838/converting-projected-coordinates-to-lat-lon-using-python
# https://www.geeksforgeeks.org/python-calculate-distance-between-two-places-using-geopy/

# Set the transformer to convert from the EPSG:3857 meter projection to EPSG:4326.
# pyproj returns EPSG:4326 points as (latitude, longitude), the order geopy expects.
transformer = Transformer.from_crs("EPSG:3857", "EPSG:4326")

def meter_lengths(name_parts):
    # Convert the four corner values from strings to floats
    x_one, y_one, x_two, y_two = (float(part) for part in name_parts[:4])

    # Two points along the east-west edge of the image (same y, different x)
    longitude_length_point_one = transformer.transform(x_one, y_one)
    longitude_length_point_two = transformer.transform(x_two, y_one)

    # Two points along the north-south edge of the image (same x, different y)
    latitude_length_point_one = transformer.transform(x_one, y_one)
    latitude_length_point_two = transformer.transform(x_one, y_two)

    # Use geopy to calculate the geodesic distance in meters along each edge
    meter_longitude_length = geodesic(longitude_length_point_one, longitude_length_point_two).meters
    meter_latitude_length = geodesic(latitude_length_point_one, latitude_length_point_two).meters

    # Return the east-west and north-south lengths of the image in meters
    return (meter_longitude_length, meter_latitude_length)

# Original attempt: subtracting the raw EPSG:3857 coordinates. The result was significantly off
# because Web Mercator stretches distances away from the equator.
# def meter_lengths(image_name):
#     name_parts = image_name.split('_')
#     name_parts[-1] = name_parts[-1].split('.tiff')[0]
#     for coord in name_parts:
#         print(coord)
#     meter_length = float(name_parts[2]) - float(name_parts[0])
#     meter_height = float(name_parts[3]) - float(name_parts[1])

#     return (meter_length, meter_height)
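
To give a sense of how large the distortion is, here is a minimal sketch that uses only the standard library. It assumes the spherical Mercator formulas, under which the local scale is approximately 1 / cos(latitude), so one projected meter covers roughly cos(latitude) meters on the ground. The helper names are hypothetical, and this is an approximation for illustration, not a replacement for the geodesic calculation in `meter_lengths`.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)

def mercator_y_to_latitude(y):
    """Invert the spherical Mercator formula: northing in meters -> latitude in degrees."""
    return math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)

def ground_meters_per_projected_meter(y):
    """Approximate ground distance covered by one projected meter at northing y."""
    return math.cos(math.radians(mercator_y_to_latitude(y)))

# Northing taken from the first image file name shown earlier in this notebook
y_example = 4470636.967057719
latitude = mercator_y_to_latitude(y_example)
scale = ground_meters_per_projected_meter(y_example)
print(round(latitude, 1), round(scale, 2))
```

At San Jose's latitude the factor is roughly 0.8, which is why subtracting raw EPSG:3857 coordinates overstates the image size by about a quarter in each dimension.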

Loop through every tree and calculate its size in meters in each dimension and its total area. We multiply the canopy length in pixels by the ratio of the image's length in meters to its length in pixels (1312 pixels per side) to find the canopy length in meters. Once we have a canopy's length and width, we multiply the two values to find the area of the canopy's bounding box, which we use as the canopy area.

In [None]:
def size_calculation(df):
  for index, row in df.iterrows():

      # Get the length of the tree canopy in pixels
      xmax = row['xmax']
      xmin = row['xmin']
      ymax = row['ymax']
      ymin = row['ymin']

      canopy_pixel_longitude_length = xmax - xmin
      canopy_pixel_latitude_length = ymax - ymin

      # Get the coordinates for the corners of the image in EPSG:3857
      image_cornor_cords = name_to_cords(row['image_path'])

      # Get the length of the image in meters
      image_meter_length, image_meter_height = meter_lengths(image_cornor_cords)

      # Scale the pixel length of the tree canopy to meters (ratio of meters per pixel)
      canopy_meter_length = canopy_pixel_longitude_length * (image_meter_length / 1312.0)
      canopy_meter_height = canopy_pixel_latitude_length * (image_meter_height / 1312.0)

      # Find the total area of the tree's canopy
      meter_area = (canopy_meter_length) * (canopy_meter_height)

      # Add new data to data frame
      df.at[index, 'meter_x'] = canopy_meter_length
      df.at[index, 'meter_y'] = canopy_meter_height
      df.at[index, 'meter_area'] = meter_area
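
The scaling can be checked by hand on a single bounding box. The sketch below uses the box from the fourth row of the 2018 table together with a made-up image size of 656 m per side, chosen so that one pixel is exactly 0.5 m; the real image size comes from `meter_lengths`. The `canopy_size_in_meters` helper is hypothetical.

```python
IMAGE_SIZE_PX = 1312  # pixels per image side, as used in size_calculation

def canopy_size_in_meters(xmin, ymin, xmax, ymax, image_meter_length, image_meter_height):
    """Scale a pixel bounding box to meters and return (length, height, area)."""
    length_m = (xmax - xmin) * (image_meter_length / IMAGE_SIZE_PX)
    height_m = (ymax - ymin) * (image_meter_height / IMAGE_SIZE_PX)
    return length_m, height_m, length_m * height_m

# 54 x 54 pixel box; the 656 m image size is an assumed round number
length_m, height_m, area_m2 = canopy_size_in_meters(367.0, 795.0, 421.0, 849.0, 656.0, 656.0)
print(length_m, height_m, area_m2)  # 27.0 27.0 729.0
```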

In [None]:
size_calculation(df_18)

In [None]:
size_calculation(df_22)

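To find the location of each tree, we take the center of its bounding box in pixels, scale it by the image's EPSG:3857 extent per pixel, and add the result to the coordinate of the image's corner.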

In [None]:
def loc_calculation(df):
  for index, row in df.iterrows():
      image_cornor_cords = name_to_cords(row['image_path'])

      # Get the length of the image in EPSG:3857 coordinate system
      image_long_length = float(image_cornor_cords[2]) - float(image_cornor_cords[0])
      image_lat_length = float(image_cornor_cords[3]) - float(image_cornor_cords[1])

      # Get the center of the detected tree to approximate pixel location in the image
      average_x_loc = (row['xmax'] + row['xmin']) / 2
      average_y_loc = (row['ymax'] + row['ymin']) / 2

      # Ratio from image length in EPSG:3857 meters to pixels
      image_long_per_pixel = image_long_length / 1312
      image_lat_per_pixel = image_lat_length / 1312

      # Add the coordinate of the corner of the image to the distance from the corner
      # to the center of the canopy after it has been multiplied by the ratio from pixels to meters
      df.at[index, 'long_3857'] = float(image_cornor_cords[0]) + average_x_loc * image_long_per_pixel
      df.at[index, 'lat_3857'] = float(image_cornor_cords[1]) + average_y_loc * image_lat_per_pixel
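
One thing worth checking in `loc_calculation` is the direction of the y axis. Detection boxes are typically given in image coordinates, where y grows downward from the top edge, while EPSG:3857 northing grows upward. The sketch below is an assumption about the tile layout, not the method used in this notebook: it treats pixel (0, 0) as the north-west corner and measures the y offset down from the northern edge. The `bbox_center_to_3857` helper and the tile extent are hypothetical.

```python
IMAGE_SIZE_PX = 1312  # pixels per image side, as used in loc_calculation

def bbox_center_to_3857(xmin, ymin, xmax, ymax, west, south, east, north):
    """Map a bounding-box center from pixel space to EPSG:3857.

    Assumes pixel (0, 0) is the top-left (north-west) corner of the tile.
    """
    center_x = (xmin + xmax) / 2
    center_y = (ymin + ymax) / 2
    x_3857 = west + center_x * (east - west) / IMAGE_SIZE_PX
    # Pixel y grows downward, so subtract the offset from the northern edge
    y_3857 = north - center_y * (north - south) / IMAGE_SIZE_PX
    return x_3857, y_3857

# Hypothetical tile covering 0..1312 m on both axes, so one pixel is one projected meter
x, y = bbox_center_to_3857(0, 0, 100, 100, west=0.0, south=0.0, east=1312.0, north=1312.0)
print(x, y)  # 50.0 1262.0
```

A box near the top of the image lands near the northern edge of the tile under this assumption. If the file name's second value is the southern edge, adding the pixel offset to it instead would mirror each tree north-south within its tile.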

In [None]:
loc_calculation(df_18)

In [None]:
loc_calculation(df_22)

The data frames now have the computed values. We drop a few irrelevant columns for visual simplicity.

In [None]:
df_18.drop(['label', 'score', 'image_path'], axis=1).head()

Unnamed: 0.1,Unnamed: 0,xmin,ymin,xmax,ymax,meter_x,meter_y,meter_area,long_3857,lat_3857
0,0,365.0,1076.0,448.0,1148.0,39.345044,34.130641,1342.871583,-13563790.0,4471301.0
1,1,0.0,1222.0,51.0,1280.0,24.175871,27.494127,664.694468,-13564020.0,4471384.0
2,2,979.0,167.0,1062.0,251.0,39.345044,39.819081,1566.683513,-13563420.0,4470762.0
3,3,367.0,795.0,421.0,849.0,25.597981,25.597981,655.256616,-13563800.0,4471128.0
4,4,171.0,883.0,222.0,932.0,24.175871,23.227797,561.552223,-13563920.0,4471179.0


In [None]:
df_22.drop(['label', 'score', 'image_path'], axis=1).head()

Unnamed: 0.1,Unnamed: 0,xmin,ymin,xmax,ymax,meter_x,meter_y,meter_area,long_3857,lat_3857
0,0,1027.0,249.0,1111.0,327.0,39.781159,36.939648,1469.50202,-13572110.0,4480975.0
1,1,843.0,422.0,905.0,485.0,29.362284,29.835869,876.049281,-13572230.0,4481074.0
2,2,1233.0,638.0,1311.0,730.0,36.939648,43.569841,1609.454594,-13571990.0,4481212.0
3,3,1106.0,687.0,1218.0,794.0,53.041546,50.67362,2687.807114,-13572050.0,4481245.0
4,4,1042.0,581.0,1145.0,678.0,48.779279,45.937767,2240.811154,-13572090.0,4481179.0


Function to find some basic info about the dataset

In [None]:
def dataframe_stats(df):
  # Sum of the bounding-box areas of all detected trees
  print('Total tree canopy size: {} meters squared'.format(round(df['meter_area'].sum())))
  print('Total number of trees: {}'.format(df.shape[0]))
  # Mean canopy area per detected tree
  average_canopy = df['meter_area'].sum() / df.shape[0]
  print('Average tree canopy: {} meters squared'.format(round(average_canopy)))
  # Side length of a square whose area equals the average canopy area
  average_length = math.sqrt(df['meter_area'].sum() / df.shape[0])
  print('Average tree canopy length: {} meters'.format(round(average_length, 1)))
  # Spread of the east-west canopy lengths
  print('Tree length standard deviation: {} meters'.format(df['meter_x'].std()))
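
Note that the "average tree canopy length" above is the square root of the mean area, which is not the same as the mean of the individual canopy lengths. The sketch below shows the difference on two made-up canopies; the values are hypothetical and chosen only to keep the arithmetic simple.

```python
import math
import statistics

# Hypothetical canopy areas in square meters: a 10 m x 10 m and a 20 m x 20 m canopy
areas = [100.0, 400.0]

# What dataframe_stats reports: the side of a square with the mean area
side_of_mean_area = math.sqrt(statistics.mean(areas))

# Alternative: the mean of each canopy's own side length
mean_of_sides = statistics.mean(math.sqrt(area) for area in areas)

print(round(side_of_mean_area, 1), mean_of_sides)
```

The square root of the mean area is pulled upward by large canopies, so it is never smaller than the mean of the side lengths.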

In [None]:
dataframe_stats(df_18)

Total tree canopy size: 32343955 meters squared
Total number of trees: 34443
Average tree canopy: 939 meters squared
Average tree canopy length: 30.6 meters
Tree length standard deviation: 12.989129103027404 meters


In [None]:
dataframe_stats(df_22)

Total tree canopy size: 61318479 meters squared
Total number of trees: 47643
Average tree canopy: 1287 meters squared
Average tree canopy length: 35.9 meters
Tree length standard deviation: 17.535218246095464 meters


We trained the model on a subset of the 2022 dataset and tested it on both the 2018 and 2022 datasets. The 2022 results closely align with the expected values published by the City of San Jose. The 2018 predictions made with the 2022 model do not: they show roughly half as much total tree canopy as 2022, which implies an unrealistic amount of change in four years.

From this analysis, we found promise in predicting on imagery from the same year the model was trained on, but little evidence that the model transfers to imagery from other years.

In [None]:
# Percentage change in total canopy area from 2018 to 2022
total_area_18 = df_18['meter_area'].sum()
total_area_22 = df_22['meter_area'].sum()
percentage_changed = round(((total_area_22 - total_area_18) / total_area_18) * 100, 2)
if percentage_changed > 0:
  print("Growth in tree canopy: {}% from 2018 to 2022".format(percentage_changed))
else:
  # Report shrinkage as a positive magnitude
  print("Shrinkage in tree canopy: {}% from 2018 to 2022".format(abs(percentage_changed)))

# Change in the number of detected trees from 2018 to 2022
tree_count_changed = df_22.shape[0] - df_18.shape[0]
if tree_count_changed > 0:
  print("Increase in trees since 2018: {} trees".format(tree_count_changed))
else:
  print("Decrease in trees since 2018: {} trees".format(abs(tree_count_changed)))

Growth in tree canopy: 89.58% from 2018 to 2022
Increase in trees since 2018: 13200 trees
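
These figures can be reproduced from the rounded totals printed by `dataframe_stats`, without reloading the data frames. The `percent_change` helper below is a hypothetical name used only for this check.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100

# Totals printed by dataframe_stats above (square meters, rounded)
total_area_2018 = 32343955
total_area_2022 = 61318479
canopy_change = round(percent_change(total_area_2018, total_area_2022), 2)

# Tree counts printed by dataframe_stats above
tree_change = 47643 - 34443

print(canopy_change, tree_change)  # 89.58 13200
```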


Export the edited data frames to new CSV files

In [None]:
df_18.to_csv("analysis_output_18.csv", encoding='utf-8', index=False)

In [None]:
df_22.to_csv("analysis_output_22.csv", encoding='utf-8', index=False)