# Upscale GEDI biomass with PALSAR-2 backscatter

This practical takes the GEDI data used in the last notebook, overlays it with PALSAR-2 data and looks for a relationship between the two. First, load the GEDI data into RAM using the same method as in the last notebook.

In [None]:
# import libraries
import numpy as np
from sys import path
# make the course's GEDI L4A reader importable
path.append("/geos/netdata/active_sensing/code_active/10_upscaling")
from gediL4Areader import gediL4A,dataTable

# define a filename
gediName='/Users/dougal/data/teaching/active_sensing/10_fusion/gedi/L4A/subset.GEDI04_A_2020178113837_O08714_02_T00308_02_002_02_V002.h5'

# read the data
gedi=gediL4A(gediName)

# filter out poor quality data
gedi.filterQuality()
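The `filterQuality()` method belongs to the course's own reader, so its exact rules are not shown here. As a rough sketch of what such a filter might do, the function below keeps only footprints whose quality flag equals 1 and whose AGBD is not a fill value. The flag convention (1 = good, as for the L4A `l4_quality_flag` dataset) and the fill value of -9999 are assumptions about the GEDI L4A product, and `filter_l4a` is a hypothetical helper, not part of `gediL4A`.

```python
import numpy as np


def filter_l4a(agbd, quality_flag, fill_value=-9999.0):
    """Return a boolean mask of usable GEDI L4A footprints.

    Assumes quality_flag == 1 marks good footprints and that missing
    AGBD estimates carry fill_value (both are assumptions).
    """
    agbd = np.asarray(agbd, dtype=float)
    quality_flag = np.asarray(quality_flag)
    good_quality = quality_flag == 1
    has_estimate = (agbd != fill_value) & np.isfinite(agbd)
    return good_quality & has_estimate


# toy arrays standing in for one beam of a real file
agbd = np.array([120.5, -9999.0, 85.2, 40.0])
flag = np.array([1, 1, 0, 1])
mask = filter_l4a(agbd, flag)
print(agbd[mask])
```

Only the first and last footprints survive: the second has a fill value and the third fails the quality flag.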

Now load the PALSAR-2 HH and HV mosaics and check that they look sensible. The files are large, so this may take a minute. *Do these images match what you see in QGIS?*

In [None]:
# import libraries
import rasterio
import matplotlib.pyplot as plt

# open the two files
palsarHHname='/Users/dougal/data/teaching/active_sensing/10_fusion/palsar/merged_HH.tif'
palsarHVname='/Users/dougal/data/teaching/active_sensing/10_fusion/palsar/merged_HV.tif'
palsarHH=rasterio.open(palsarHHname)
palsarHV=rasterio.open(palsarHVname)

# read the first band of each raster into a numpy array
hh=palsarHH.read(1)
hv=palsarHV.read(1)

# plot to the screen for a sanity check
plt.imshow(hh)
plt.colorbar()
plt.title('HH')
plt.show()

plt.imshow(hv)
plt.colorbar()
plt.title('HV')
plt.show()
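It is worth knowing what units the raster values are in before comparing them with biomass. If the merged GeoTIFFs are JAXA PALSAR-2 mosaic tiles stored as digital numbers (DN), which is an assumption about how these files were prepared, they can be converted to gamma-naught backscatter in decibels with JAXA's published calibration, gamma0 = 10 * log10(DN^2) - 83.0 dB, where a DN of 0 marks no data. The helper `dn_to_gamma0_db` below is hypothetical and is not used by the course code.

```python
import numpy as np

# JAXA calibration factor for PALSAR/PALSAR-2 mosaic DN (assumed to apply to these files)
CALIBRATION_FACTOR_DB = -83.0


def dn_to_gamma0_db(dn, nodata=0):
    """Convert mosaic digital numbers to gamma-naught in dB.

    Pixels equal to `nodata` become NaN so they drop out of later statistics.
    """
    dn = np.asarray(dn, dtype=np.float64)
    gamma0 = np.full(dn.shape, np.nan)
    valid = dn != nodata
    gamma0[valid] = 10.0 * np.log10(dn[valid] ** 2) + CALIBRATION_FACTOR_DB
    return gamma0


# usage on a tiny made-up array; on the real data you would pass hh or hv
example = np.array([[0, 1000], [5000, 10000]], dtype=np.uint16)
print(np.round(dn_to_gamma0_db(example), 2))
```

For example, a DN of 1000 gives 20 * 3 - 83 = -23 dB, while the 0 pixel becomes NaN.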


## Determining the relationship

We have points of AGBD estimates from GEDI and a raster of backscatter from PALSAR-2. Is there a useful relationship between one and the other? To test this we need to make a table of GEDI AGBD values and backscatter from the corresponding PALSAR-2 pixels.
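Building that table means converting each GEDI footprint's longitude and latitude into a row and column of the PALSAR-2 raster. The course's `dataTable` class handles this for you; the sketch below shows one way it could be done for a north-up raster, assuming the raster and the GEDI points share the same coordinate system (geographic lon/lat here). In rasterio, the origin is available from the file's affine transform as `palsarHH.transform.c` and `.f`, the pixel sizes as `.a` and `-.e`, and `palsarHH.index(x, y)` performs the same lookup directly. The function `sample_north_up` is hypothetical.

```python
import numpy as np


def sample_north_up(raster, x0, y0, dx, dy, xs, ys):
    """Return raster values at points (xs, ys), NaN where a point falls outside.

    x0, y0: coordinates of the raster's top-left corner
    dx, dy: pixel width and height, both positive
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    cols = np.floor((xs - x0) / dx).astype(int)
    rows = np.floor((y0 - ys) / dy).astype(int)  # rows count downwards from the top
    inside = (rows >= 0) & (rows < raster.shape[0]) & (cols >= 0) & (cols < raster.shape[1])
    values = np.full(xs.shape, np.nan)
    values[inside] = raster[rows[inside], cols[inside]]
    return values


# a 3x3 toy raster with 0.1-degree pixels whose top-left corner is at (10.0, 5.0)
toy = np.arange(9, dtype=float).reshape(3, 3)
lons = [10.05, 10.25, 11.0]
lats = [4.95, 4.75, 4.95]
print(sample_north_up(toy, 10.0, 5.0, 0.1, 0.1, lons, lats))
```

The first point lands in the top-left pixel, the second in the bottom-right pixel, and the third lies east of the raster, so it returns NaN rather than wrapping to another pixel.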

Run the code below to plot AGBD against each polarisation and calculate the linear correlation coefficient.

*Which polarisation (HH or HV) has the higher correlation? Why do you think that is, based on your understanding of radar?*

In [None]:
# make a table of GEDI AGBD and the PALSAR-2 HH and HV values at each footprint
mergedData=dataTable(gedi,palsarHH,hh,palsarHV,hv)

# plot AGBD against HH and against HV backscatter
mergedData.plotHH()
mergedData.plotHV()

# determine the linear correlation for each polarisation
mergedData.correlHH()
mergedData.correlHV()
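The exact statistic reported by `correlHH()` and `correlHV()` is not shown here. Assuming it is Pearson's linear correlation coefficient r, the calculation amounts to the sketch below; squaring r gives the fraction of AGBD variance that a straight-line fit would explain. Footprints with missing backscatter (NaN) are dropped first. The function `pearson_r` and the input values are made up for illustration.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation of x and y, ignoring pairs where either is NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.isfinite(x) & np.isfinite(y)
    x, y = x[keep], y[keep]
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


# made-up values: backscatter in dB and AGBD in Mg/ha
backscatter = np.array([-20.0, -18.0, np.nan, -15.0, -14.0])
agbd = np.array([30.0, 60.0, 80.0, 100.0, 150.0])
r = pearson_r(backscatter, agbd)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")
```

The result matches `np.corrcoef` applied to the four complete pairs.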

## Machine learning

The relationship is not entirely clear. Machine learning is useful for finding the best relationship between a variable of interest and multiple predictor variables. Here we will use it to predict AGBD from a combination of PALSAR-2 HH and HV backscatter.
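Combining the two polarisations means giving the model one row per GEDI footprint and one column per predictor. A minimal sketch, assuming the extracted values are held in plain NumPy arrays (the function `build_features` and its argument names are hypothetical), stacks HH and HV side by side and drops any footprint with a missing value so that every training row is complete.

```python
import numpy as np


def build_features(hh_at_gedi, hv_at_gedi, agbd_at_gedi):
    """Stack HH and HV into an (n, 2) feature matrix and drop incomplete rows."""
    X = np.column_stack([hh_at_gedi, hv_at_gedi]).astype(float)
    y = np.asarray(agbd_at_gedi, dtype=float)
    complete = np.all(np.isfinite(X), axis=1) & np.isfinite(y)
    return X[complete], y[complete]


# three hypothetical footprints; the second has no HV value
X, y = build_features([-8.1, -7.5, -9.0], [-15.2, np.nan, -17.8], [140.0, 95.0, 60.0])
print(X.shape, y)
```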

We will use the random forest algorithm as implemented in the scikit-learn (sklearn) Python package. The code below splits the data into a training set and a validation set, trains a model to predict GEDI AGBD from the extracted PALSAR-2 backscatter, and then predicts biomass across the full PALSAR-2 rasters.

In [None]:
# parameters for the random forest
n_estimators=200   # number of trees
max_depth=None     # no limit on tree depth

# split into 70% training, 30% validation
mergedData.splitData(trainFrac=0.7)

# train (calibrate) the model on the training set
mergedData.buildRF(n_estimators,max_depth)

# apply the model to predict AGBD across the PALSAR-2 rasters
mergedData.predict()
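The methods `splitData`, `buildRF` and `predict` wrap the library for you, and their internals are not shown here. As a rough guide to what such wrappers typically do, here is a standalone sketch that calls scikit-learn's `train_test_split` and `RandomForestRegressor` directly with the same settings (70/30 split, 200 trees, no depth limit). The data are synthetic stand-ins generated in the code, not GEDI or PALSAR-2 values, and the variable names end in `_syn` so they do not overwrite the notebook's `hh` and `hv`. The last step shows how a trained model can be applied to every pixel of a 2-D raster by reshaping it to one row per pixel.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# synthetic stand-ins: HH/HV backscatter in dB and an AGBD that rises with HV
n = 500
hh_syn = rng.uniform(-12.0, -4.0, n)
hv_syn = rng.uniform(-22.0, -10.0, n)
agbd_syn = 20.0 * (hv_syn + 22.0) + rng.normal(0.0, 10.0, n)
X_syn = np.column_stack([hh_syn, hv_syn])

# 70% training, 30% validation
X_train, X_val, y_train, y_val = train_test_split(
    X_syn, agbd_syn, train_size=0.7, random_state=0
)

# same settings as the notebook cell: 200 trees, no depth limit
rf = RandomForestRegressor(n_estimators=200, max_depth=None, random_state=0)
rf.fit(X_train, y_train)

# validation error on the held-out 30%
rmse = np.sqrt(np.mean((rf.predict(X_val) - y_val) ** 2))
print(f"validation RMSE: {rmse:.1f}")

# wall-to-wall prediction: flatten two co-registered rasters to (pixels, 2)
hh_img = rng.uniform(-12.0, -4.0, (4, 5))
hv_img = rng.uniform(-22.0, -10.0, (4, 5))
pixels = np.column_stack([hh_img.ravel(), hv_img.ravel()])
agbd_map = rf.predict(pixels).reshape(hh_img.shape)
print(agbd_map.shape)
```

On the real rasters, any no-data pixels would need masking before `predict`, and a full mosaic may have to be processed in chunks to fit in memory.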

## Validation

## Spatial validation