<a href="https://colab.research.google.com/github/joelm67/Remote-Sensing-Python/blob/main/Copy_of_biophysical_variables.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction


The goal of this notebook is to demonstrate how to estimate biophysical variables from remote sensing data. The biophysical variables of interest include Leaf Area Index (LAI), fraction of absorbed Photosynthetically Active Radiation (fPAR), biomass, and yield, to name a few. As you may remember from previous classes, there are a number of different ways to get at these *continuous* variables. In this notebook we will use regression and machine learning algorithms in regression mode to achieve our goal. The general steps are:
- acquire location-specific biophysical variable measurements (e.g. LAI)
- acquire satellite reflectance data over the same location/time period
- build a regression model that explains the relationship between spectral reflectance and the biophysical variable
- apply the model to the reflectance image to produce a biophysical variable *map*
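
The four steps above can be sketched end to end with synthetic data. This is only an illustration: the sample values, the linear NDVI-to-LAI relation, and the variable names (`ndvi_samples`, `lai_samples`, `ndvi_image`) are all made up, and ordinary least squares stands in for whatever regression model is eventually chosen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-2 (synthetic stand-ins): paired samples of a vegetation index
# and a biophysical variable. The linear relation below is invented.
ndvi_samples = rng.uniform(0.1, 0.9, size=200)
lai_samples = 6.0 * ndvi_samples - 0.5 + rng.normal(0.0, 0.1, size=200)

# Step 3: fit LAI = a * NDVI + b by ordinary least squares
design = np.column_stack([ndvi_samples, np.ones_like(ndvi_samples)])
(a, b), *_ = np.linalg.lstsq(design, lai_samples, rcond=None)

# Step 4: apply the fitted model to every pixel of a (rows, cols) image
ndvi_image = rng.uniform(0.1, 0.9, size=(4, 5))
lai_map = a * ndvi_image + b

print(f"a = {a:.2f}, b = {b:.2f}, map shape = {lai_map.shape}")
```

The key point is that the model is trained on point samples but applied to the whole image array, which is what turns a regression into a map.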

For this assignment, we are going to use the MODIS LAI/fPAR product to extract LAI data at 500-meter spatial resolution. We will then extract Sentinel-2 MSI reflectance data that have been aggregated to the same 500-meter resolution. Next, we will build a relationship between the MSI spectral bands and LAI, and apply the regression model to a 10-meter MSI image to make an LAI map. For your convenience, I used the following Google Earth Engine code to extract the LAI and MSI reflectance samples into a CSV file that we will work with:
[GEE code](https://code.earthengine.google.com/c68270f84af46205122aab17ff8f1880?noload=1)
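
To give a feel for what bringing 10-meter reflectance to a 500-meter grid involves, here is a minimal NumPy sketch that averages non-overlapping 50 x 50 pixel blocks. It is an assumption-laden illustration, not a description of what the Earth Engine script does: the helper `block_mean` and the synthetic `fine_band` array are hypothetical.

```python
import numpy as np

def block_mean(fine, factor):
    """Average non-overlapping factor x factor blocks of a 2-D array."""
    rows, cols = fine.shape
    # Trim edges so both dimensions divide evenly by the factor
    rows -= rows % factor
    cols -= cols % factor
    trimmed = fine[:rows, :cols]
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Hypothetical 10 m band covering 1 km x 1 km (100 x 100 pixels)
fine_band = np.arange(100 * 100, dtype=float).reshape(100, 100)
coarse_band = block_mean(fine_band, factor=50)  # 500 m / 10 m = 50
print(coarse_band.shape)
```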


# Import libraries

Let's start by importing the appropriate libraries.

In [None]:
#!add-apt-repository ppa:ubuntugis/ppa
#!apt update
#!apt install gdal-bin libgdal-dev
!pip3 install rasterio

import rasterio
from rasterio.plot import reshape_as_raster, reshape_as_image
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import read_csv
from pandas import Series, DataFrame  # Panel was removed in pandas 1.0 and can no longer be imported



# Grab data and explore

Let's grab the LAI and reflectance samples first. They are contained in a comma-separated file to be imported from a cloud location.

In [None]:
# read the csv file into a pandas DataFrame
# header=0 tells pandas that the first row of the csv file contains the column headers
# the file has several data columns, so read_csv returns a DataFrame (not a Series)
laiData = read_csv('https://storage.googleapis.com/alexi_daily/EnvSt956/s2_lai_rand.csv', header=0)

# make an explicit DataFrame to work with
df = pd.DataFrame(laiData)

# let's explore the dataset
# peek at the data
print(df.head(10))

# Descriptive statistics
print(df.describe())

# let's add a few additional features:
# SR is the simple ratio (nir / red) and NDVI the normalized difference vegetation index
df['SR'] = df['nir']/df['red']
df['NDVI'] = (df['nir'] - df['red'])/(df['nir'] + df['red'])

# Descriptive statistics, now including SR and NDVI
print(df.describe())

   rand  blue  green   red   nir  swir1  swir2  LAI
0     1   664    814   878  2041   2393   1740   12
1     3   138    188   102  1066    504    277   23
2    12   251    462   217  5130   1883    822   29
3    14   265    527   274  4670   1816    801   43
4    21   380    586   416  3313   1843    930   17
5    24  1116   1450  1799  3301   4154   2918    4
6    31   326    590   393  4570   2266   1070   34
7    36   328    524   361  3535   2186   1171   22
8    40   432    649   478  4281   2008   1039   26
9    43   309    708   299  3769   1908    934   51
               rand          blue  ...         swir2           LAI
count  16932.000000  16932.000000  ...  16932.000000  16932.000000
mean   50030.296008    582.200035  ...   1637.589712     16.517777
std    28776.371810    554.804054  ...    797.850504     15.813860
min        1.000000      2.000000  ...     23.000000      1.000000
25%    25093.750000    327.000000  ...   1004.000000      4.000000
50%    49912.500000    460
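
As a quick worked example of the two added features, the sketch below computes SR and NDVI for the `red` and `nir` values of the first three samples printed above. It also checks the identity NDVI = (SR - 1)/(SR + 1), which shows that the two indices carry the same information on different scales.

```python
import numpy as np

# red and nir values of the first three samples in the table above
red = np.array([878.0, 102.0, 217.0])
nir = np.array([2041.0, 1066.0, 5130.0])

sr = nir / red                      # simple ratio
ndvi = (nir - red) / (nir + red)    # normalized difference vegetation index

# NDVI can be written as a monotonic function of SR
ndvi_from_sr = (sr - 1.0) / (sr + 1.0)

print(np.round(sr, 3))
print(np.round(ndvi, 3))
print(np.allclose(ndvi, ndvi_from_sr))
```

Because NDVI saturates as it approaches 1 while SR keeps growing, the two features can behave quite differently in a regression against LAI even though one is a transformation of the other.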

Now let's build a relationship between MSI reflectance and LAI.

In [None]:
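# Import function to create training and test set splits
from sklearn.model_selection import train_test_split
# Import functions to automatically create polynomial features and to standardize them
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
# Import Linear Regression and a regularized regression function
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LassoCV
# Finally, import function to make a machine learning pipeline
from sklearn.pipeline import make_pipeline

# Use the simple ratio (SR) as the single predictor and LAI as the target
df['X'] = df['SR']
df['y'] = df['LAI']

# LassoCV settings: eps is the ratio alpha_min/alpha_max of the regularization path,
# n_alphas the number of alphas tried along it, max_iter the solver iteration limit
lasso_eps = 0.0001
lasso_nalpha = 20
lasso_iter = 5000
# Min and max degree of polynomial features to consider
degree_min = 2
degree_max = 8
# Test/train split (30% of the samples are held out for testing)
X_train, X_test, y_train, y_test = train_test_split(df['X'], df['y'], test_size=0.3)
print(X_train)
print(y_train)

# scikit-learn expects a 2-D feature matrix, so turn the single-column Series into DataFrames
X_train_2d = X_train.to_frame()
X_test_2d = X_test.to_frame()

# Make a pipeline model with polynomial transformation and LASSO regression with cross-validation,
# and run it for increasing degree of polynomial (complexity of the model).
# StandardScaler takes the place of LassoCV's normalize=True option, which recent
# scikit-learn versions no longer accept; the two scalings are similar but not identical.
for degree in range(degree_min, degree_max + 1):
    model = make_pipeline(PolynomialFeatures(degree, interaction_only=False),
                          StandardScaler(),
                          LassoCV(eps=lasso_eps, n_alphas=lasso_nalpha, max_iter=lasso_iter, cv=5))
    model.fit(X_train_2d, y_train)
    test_pred = np.array(model.predict(X_test_2d))
    # Root mean squared error on the test set
    RMSE = np.sqrt(np.mean(np.square(test_pred - y_test)))
    # Coefficient of determination (R^2) on the test set
    test_score = model.score(X_test_2d, y_test)
    print(f'degree {degree}: RMSE = {RMSE:.2f}, R^2 = {test_score:.3f}')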







8757     19.782427
6172      2.089080
9588      1.861230
1645      9.394089
9114      9.633721
           ...    
15892     1.902639
6850      1.442645
5214      2.394811
7952      3.687500
11370     2.020436
Name: X, Length: 11852, dtype: float64
8757     42
6172      3
9588      5
1645     36
9114     36
         ..
15892     2
6850      2
5214     18
7952      6
11370     4
Name: y, Length: 11852, dtype: int64
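
The loop above judges each model on the held-out test set with an RMSE and with `model.score`, which for scikit-learn regressors is the coefficient of determination R². A minimal NumPy sketch of both metrics follows; the helper names `rmse` and `r_squared` and the example predictions are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative observed LAI values and invented predictions
y_true = np.array([42, 3, 5, 36, 36])
y_pred = np.array([40.0, 4.0, 6.0, 33.0, 38.0])

print(f"RMSE = {rmse(y_true, y_pred):.3f}")
print(f"R^2  = {r_squared(y_true, y_pred):.3f}")
```

RMSE is in the units of the target, so it is easy to interpret; R² is unitless and makes models on different targets comparable.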



Now let's work with the data in image form.

In [None]:
# download the NDVI time-series image (-nc skips the download if the file already exists)
!wget -nc https://storage.googleapis.com/alexi_daily/EnvSt956/grassland_ndvi_1990-2020.tif
tsimage = rasterio.open('grassland_ndvi_1990-2020.tif')
tsarr = tsimage.read()  # array of shape (bands, rows, cols)
bands, rows, cols = tsarr.shape
print(bands)
print(rows)
print(cols)
#tsarr = reshape_as_image(tsarr) # reshape my numpy array from <bands><rows><cols> to <rows><cols><bands>

# read the band names (one per image band)
# .squeeze('columns') returns a Series when only one data column is left
dates = read_csv('https://storage.googleapis.com/alexi_daily/EnvSt956/grassland_ndvi_bandnames.csv', header=0, parse_dates=[0], index_col=0).squeeze('columns')

#print(dates)

# collapse the 3D array into a 2D array: one row per band, one column per pixel
tsarr = tsarr.reshape(bands,rows*cols)
df = pd.DataFrame(tsarr, index=dates)

print(df)


2755
207
218
                             0      1      2      ...  45123  45124  45125
(1990, 3, 28, LT05, 3203
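
The reshape used above is easier to follow on a tiny array. The sketch below uses a made-up 3-band, 2 x 4 pixel stack to show how `(bands, rows, cols)` is flattened to one column per pixel, processed per pixel, and then reshaped back into an image; the per-pixel mean is only a placeholder for whatever model is applied.

```python
import numpy as np

bands, rows, cols = 3, 2, 4
stack = np.arange(bands * rows * cols, dtype=float).reshape(bands, rows, cols)

# Flatten the spatial dimensions: one row per band, one column per pixel
flat = stack.reshape(bands, rows * cols)

# Any per-pixel operation now works along axis 0 (here: mean over bands)
pixel_mean = flat.mean(axis=0)

# Put the per-pixel result back into image form
mean_image = pixel_mean.reshape(rows, cols)

print(flat.shape, mean_image.shape)
```

Pixel `(r, c)` of the image ends up in column `r * cols + c` of the flattened array, so reshaping back with the same `rows` and `cols` restores the original layout.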

# Fit Linear Trend and get coefficients

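Fit a first-degree polynomial (a straight line) to the annual series, then report the slope and the normalized RMSE of the fit.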

In [None]:
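# `annual` is not defined anywhere in this notebook; it is assumed to be a 1-D pandas Series
# (for example, one value per year) that must be created before running this cell
# Fit a straight line; full=True also returns the sum of squared residuals
coefficients, residuals, _, _, _ = np.polyfit(range(len(annual.index)),annual,1,full=True)
# Mean squared error of the fit, then RMSE normalized by the range of the data
mse = residuals[0]/(len(annual.index))
nrmse = np.sqrt(mse)/(annual.max() - annual.min())
print('Slope ' + str(coefficients[0]))
print('NRMSE: ' + str(nrmse))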





NameError: ignored
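
The cell above needs an `annual` series to exist before it can run. As a hedged, self-contained sketch of the same trend fit, the example below builds a hypothetical ten-value `annual` array (a small upward trend plus alternating noise, all invented) and applies `np.polyfit` with `full=True`.

```python
import numpy as np

years = np.arange(10)
# Hypothetical annual series: small upward trend plus alternating noise
annual = 0.4 + 0.01 * years + 0.02 * (-1.0) ** years

# full=True returns the coefficients plus diagnostic values,
# the first of which is the sum of squared residuals
coefficients, residuals, rank, singular_values, rcond = np.polyfit(
    years, annual, 1, full=True
)
slope, intercept = coefficients

# Mean squared error, then RMSE normalized by the range of the data
mse = residuals[0] / len(annual)
nrmse = np.sqrt(mse) / (annual.max() - annual.min())

print(f"Slope: {slope:.4f}")
print(f"NRMSE: {nrmse:.3f}")
```

Normalizing the RMSE by the data range makes the fit quality comparable between pixels or sites whose values span different ranges.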