# Homework 2: Learning about Datasets
## LA County Urban Tree Canopy Cover 
#### By Paloma Cartwright, Halina Do-Linh, and Mia Forsline
#### Due 2021-11-05
#### EDS 220: Remote Sensing 

# 1. Write code to load in the dataset 

In [1]:
import geopandas as gpd 
import rasterio as rio
from rasterio.plot import show
#import numpy as np
import pandas as pd
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
#import os
import contextily as ctx
%matplotlib inline
import folium
#hvplot is not used below and is not installed in this environment,
#so the import is commented out to avoid a ModuleNotFoundError
#import hvplot.pandas

## Exploring the LA County Data 
- read in LA County shapefile 

In [None]:
aoiFile = "data/Tree_Canopy_LA/geo_export_36bb7874-a8f8-4d1c-b3f3-23eca2fca1af.shp"

AOI = gpd.read_file(aoiFile)
#label the CRS as WGS 84 (EPSG:4326); this relabels the data without reprojecting it
AOI = AOI.set_crs("EPSG:4326", allow_override=True)

proj = ccrs.PlateCarree()
fig = plt.figure(figsize = (10, 6), dpi = 300)
ax = plt.axes(projection = proj)

#plot the `imperv_p` column (% impervious surface)
AOI.plot('imperv_p', ax = ax, legend = True)

#add basemap 
ctx.add_basemap(ax, crs = "EPSG:4326", source = ctx.providers.CartoDB.Positron); 

#add title and axes titles 
plt.title("% Impervious Surfaces in Los Angeles County for 2014")
plt.xlabel("Longitude")
plt.ylabel("Latitude")

In [None]:
#examine other column names that we can plot 
AOI.head()

# LA Tree Canopy CSV

In [None]:
#read in the data
#set the CRS (WGS 84 - units: degrees)
treeCanopyLA = gpd.read_file("data/Tree_Canopy_in_Urban_and_Non-Urban_LA_County__2014_.csv")
tcLA = treeCanopyLA.set_crs("EPSG:4326")
#no to_crs() call is needed here: reprojecting EPSG:4326 to EPSG:4326 changes nothing
tcLA.crs

In [None]:
#check CSV column names - we are interested in Can_P
tcLA.head()

In [None]:
#move the geometry info from the `the_geom` column to the `geometry` column
#by parsing the WKT strings into a GeoSeries (WGS 84)
tcLA['geometry'] = gpd.GeoSeries.from_wkt(tcLA['the_geom'], crs="EPSG:4326")
#create a GeoDataFrame that uses the parsed column as its geometry
tcLA_gdf = gpd.GeoDataFrame(tcLA, geometry='geometry')
#check data types to ensure the `geometry` column is a geometry
print(tcLA_gdf.dtypes)

In [None]:
#check CRS (WGS 84)
tcLA_gdf.crs

In [None]:
#ensure % canopy cover (Can_P) is a numeric (float64) column
#so that we can calculate a mean/max/etc.
tcLA['Can_P'] = pd.to_numeric(tcLA['Can_P'])
tcLA_gdf['Can_P'] = pd.to_numeric(tcLA_gdf['Can_P'])
#print the max explicitly: only the last expression in a cell is displayed automatically
print(tcLA_gdf['Can_P'].max())
print(tcLA_gdf.dtypes)
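
The numeric conversion above can be sketched on its own with a few made-up values. This is a minimal illustration, not part of our analysis: the values in `raw` are hypothetical, and we assume (without having checked the real CSV) that some entries might not parse as numbers. Passing `errors="coerce"` turns such entries into `NaN` instead of raising an error.

```python
import pandas as pd

# Hypothetical raw values as they might arrive from a CSV (all strings)
raw = pd.Series(["12.5", "40.0", "n/a", "7.5"], name="Can_P")

# errors="coerce" converts unparseable entries to NaN instead of raising
can_p = pd.to_numeric(raw, errors="coerce")

n_missing = int(can_p.isna().sum())  # number of entries that failed to parse
max_cover = can_p.max()              # NaN values are skipped by default
mean_cover = can_p.mean()

print(can_p.dtype, n_missing, max_cover, mean_cover)
```

Counting the `NaN` values after the conversion is a quick way to see how many rows would silently drop out of a mean or max.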

In [None]:
#plot % canopy cover
proj = ccrs.PlateCarree()
fig = plt.figure(figsize = (10, 6), dpi = 300)
ax = plt.axes(projection = proj)

tcLA.plot('Can_P', ax=ax, legend = True)
#also build an interactive folium map; evaluate `tcLAt` in its own cell to display it
tcLAt = tcLA.explore()

#add basemap 
ctx.add_basemap(ax, crs="EPSG:4326", source=ctx.providers.CartoDB.Positron); 

#add title and axes titles 
plt.title("% Canopy Cover in Los Angeles County for 2014")
plt.xlabel("Longitude")
plt.ylabel("Latitude")


# Redlining in LA County 

In [None]:
red = gpd.read_file("data/redlining/CALosAngeles1939/cartodb-query.shp")
red.head()

In [None]:
#plot % canopy cover with the 1939 HOLC redlining grades overlaid
proj = ccrs.PlateCarree()
fig = plt.figure(figsize = (10, 6), dpi = 300)
ax = plt.axes(projection = proj)

tcLA.plot(column = "Can_P", ax=ax)
red.plot(column = 'holc_grade', ax=ax, legend = True, cmap = 'YlOrRd', 
         legend_kwds={'title': "Grade"}, alpha = 0.7)

#add basemap 
ctx.add_basemap(ax, crs="EPSG:4326", source=ctx.providers.Stamen.TonerLite); 

# 2. Investigate data quality

The **Tree Canopy in Urban and Non-Urban LA County (2014)** dataset we downloaded from LA County's Open Data site is a combination of two datasets:

1. LA County tree canopy data from TreePeople and the University of Vermont.
2. 2014 imagery of LA County land cover data from the Los Angeles Regional Imagery Acquisition Consortium (LARIAC).

More data collection information and metadata are available for the land cover data than for the tree canopy data. We have reached out to the data owners to request more metadata, but have not yet received a response.

Additionally, the dataset was combined with census tract data.

The tree canopy data used high-resolution remote sensing data to create a fine-scale, 8-class land cover classification. From the 2015 *Tree Canopy Assessment: Los Angeles Coastal Zone* report, it appears that high-resolution land cover data from 2014 was combined with LiDAR data from 2009 to obtain information about the structure of urban forests. With this data, individual trees as small as 8 ft could be detected, and it is currently the most accurate accounting of tree canopy ever completed for LA.

Because there is limited metadata on the tree canopy data that was acquired using LiDAR, we cannot determine whether there are any systematic biases or limitations arising from the instruments used to collect the LiDAR data.

The land cover data comes from LARIAC, a program for obtaining digital aerial imagery data in collaboration with 30+ County departments, 40+ municipalities, and other public agencies. Because numerous agencies use the data for many different purposes, the data is of high quality and is easily accessible.

LARIAC captures ortho and oblique imagery every 3 years. Ortho imagery removes the effects of tilt and relief to show the true positions of features in the image. Oblique imagery is aerial photography collected at an angle (usually 40 or 50 degrees), which allows viewers to see the sides of objects in addition to their tops and gives a more natural view of the landscape.

Collection and post-processing of the LARIAC land cover data were done by the company EagleView. The processing produced 4-band (RGBNIR), 32-bit (8 bits per channel) digital orthophotos at 4-inch and 9-inch pixel resolution, with no overlap between project areas. The 4-inch and 9-inch imagery was color balanced independently.
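
To make the imagery specifications above more concrete, here is a short arithmetic sketch. It only restates the numbers quoted in this section (4 bands, 8 bits per band, 4-inch and 9-inch pixels); the variable names are our own and nothing here comes from the LARIAC files themselves.

```python
INCH_TO_CM = 2.54  # exact conversion factor

bands = 4          # red, green, blue, near-infrared
bits_per_band = 8

bits_per_pixel = bands * bits_per_band   # total bit depth per pixel
levels_per_band = 2 ** bits_per_band     # distinct brightness values per band

# Side length of one pixel on the ground, in centimetres
pixel_side_cm = {inches: inches * INCH_TO_CM for inches in (4, 9)}

# How many 4-inch pixels cover the same ground area as one 9-inch pixel
area_ratio = (9 / 4) ** 2

print(bits_per_pixel, levels_per_band, pixel_side_cm, area_ratio)
```

In other words, each band stores 256 brightness levels, and the 4-inch imagery resolves the ground roughly five times more finely by area than the 9-inch imagery.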

After post-processing, the imagery undergoes rigorous quality assurance and quality control by Dewberry Engineers Inc. In the QA/QC report, we did not see any systematic biases or limitations listed for the instruments.


The most significant limitation of the **Tree Canopy in Urban and Non-Urban LA County (2014)** dataset is that it is outdated. Both datasets are more than 5 years old. However, the LARIAC land cover data has been updated with 2020 data, so it is possible for us to combine the tree canopy data with the 2020 land cover data.

# 3. Consider appropriate use cases 

This 2014 Los Angeles County urban tree cover data is appropriate for identifying areas in LA County with relatively higher or lower tree canopy cover based on the calculated % canopy coverage. This gives a snapshot of what tree canopy cover looked like in that year. Combined with other datasets, this could help answer a variety of interesting questions.

For example, we could compare 2014 LA County tree cover with socioeconomic data such as 2014 annual average/median income or ethnicity/race to determine if there are correlations between greenery in LA and socioeconomic status. 
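
A comparison like this could be sketched as a table join followed by a correlation. The example below is only an illustration under stated assumptions: the `tract_id` key, the `median_income` column, and every value are hypothetical, and only the `Can_P` column name comes from our dataset.

```python
import pandas as pd

# Hypothetical % canopy cover per census tract
canopy = pd.DataFrame({
    "tract_id": [1, 2, 3, 4, 5, 6],
    "Can_P": [5.0, 10.0, 15.0, 20.0, 25.0, 12.0],
})

# Hypothetical income table; tract 6 is missing on purpose
income = pd.DataFrame({
    "tract_id": [1, 2, 3, 4, 5],
    "median_income": [30000, 42000, 50000, 61000, 75000],
})

# An inner join keeps only tracts present in both tables
merged = canopy.merge(income, on="tract_id", how="inner")

# Pearson correlation is the default for Series.corr
r = merged["Can_P"].corr(merged["median_income"])
print(len(merged), round(r, 3))
```

A correlation computed this way would only describe an association between canopy cover and income; it would not tell us why the two vary together.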

In addition to comparing various data within the same year, we could overlay a map of historical redlining in LA to investigate how historical inequalities might have influenced today's built environment. Similarly, we could compare tree canopy cover data in LA County from a different year to explore changes over time. For instance, if a tree policy was enacted after 2014, we could compare 2014 and 2020 tree canopy cover data to see if there was an observable change in % canopy cover in response to the policy enactment. 
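
One possible way to carry out the redlining overlay is a spatial join followed by a group-wise mean. The sketch below uses small made-up rectangles rather than our real layers, so all geometries and values are hypothetical; only the `Can_P` and `holc_grade` column names mirror the data above. It assumes a geopandas version in which `sjoin` accepts the `predicate` argument.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical tracts with a % canopy value
tracts = gpd.GeoDataFrame(
    {"Can_P": [5.0, 10.0, 30.0, 20.0]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(3, 0, 4, 1), box(4, 0, 5, 1)],
)

# Hypothetical redlining zones with a HOLC grade
zones = gpd.GeoDataFrame(
    {"holc_grade": ["D", "A"]},
    geometry=[box(0, 0, 2, 1), box(3, 0, 5, 1)],
)

# Reduce each tract to one interior point so it matches at most one zone
points = tracts.set_geometry(tracts.representative_point())

# Attach the grade of the zone that contains each tract's point
joined = gpd.sjoin(points, zones, how="inner", predicate="within")

# Average % canopy cover per HOLC grade
mean_canopy_by_grade = joined.groupby("holc_grade")["Can_P"].mean()
print(mean_canopy_by_grade)
```

With the real layers, both GeoDataFrames would need to share the same CRS before the join, and tracts that fall outside every redlined zone would be dropped by the inner join.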