<a href="https://colab.research.google.com/github/jshogland/SpatialModelingTutorials/blob/main/Notebooks/kernel_pca.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Estimating kernel weights using PCA and creating optimal predictor surfaces
## In this notebook we will explore the use of Raster Tools and scikit-learn functions to project predictor surfaces into orthogonal space for modeling purposes. A few key objectives of using this approach are to:
- Determine and use the optimal cell weights of a convolution kernel (user defined size) that can be used to transform a given image and its bands into a subset of surfaces that explain a user specified amount of the variation in the image data.
- Efficiently create orthogonal predictor surfaces that account for band and spatial covariation.
- Create predictor surfaces that highlight various attributes within the data.
### The approach
- Use sampling to create training sets
- Scale input rasters to unit variance
- Perform PCA on scaled training sets that include all input cell values for the cells within a user specified convolution kernel
- Center scaled kernel cell values, multiply PCA score weights by centered values, and sum values within the kernel to perform the convolution
- Optionally, rescale PCA transformed values to a specified bit depth for storage and downstream analyses

John Hogland 12/6/2024

#### Study area for this example includes portions of the Custer Gallatin National Forest

Install packages

In [None]:
!pip install mapclassify
!pip install osmnx
!pip install raster_tools
!pip install planetary-computer
!pip install pystac-client
!pip install stackstac

Import libraries

In [None]:
import numpy as np, os, geopandas as gpd, pandas as pd, osmnx as ox, pystac_client, planetary_computer, stackstac
from raster_tools import Raster, general
from raster_tools import raster
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA


## Introduction
### Multispectral remotely sensed datasets are commonly used as predictor variables in many classification, regression, and clustering models. Often these data are highly correlated and covary across bands and x and y space. For compression, storage, and predictive modeling purposes it is often advantageous to project those data along shared axes of covariance to create independent transformed variables. One common technique used to project data along shared covariance axes is a principal component analysis (PCA). 
### For many remote sensing projects, PCAs have successfully been used to project data onto orthogonal axes called components and to select subsets of components that account for a known amount of variation, reducing the dimensionality of the data. However, contemporary remote sensing projects go beyond using just the band values acquired at a given location within an image (the cell). For example, computer vision and convolutional neural networks include neighboring cell values that define textural aspects of an image for a specific region around a location (kernel). Defining and using kernels to enhance, smooth, and quantify edges has provided key insights into shapes, boundaries, and various textural attributes contained within an image. However, like multispectral band cell values, texture can covary across both cells and bands. Additionally, the total number of potential kernels that can be defined and used to quantify texture is infinite.
### This has led some to define, a priori, kernels known to capture specific textural relationships and to apply those kernels across remotely sensed data to develop predictor variables. Others use iteration and deep learning to optimally determine kernel weights. However, few remote sensing projects have addressed the issue of covariance in neighboring cell values across image bands. Moreover, to our knowledge, no studies have leveraged PCA component scores to define kernel weights.
### In this example we explore the use of a PCA to project multispectral imagery along orthogonal axes derived from both band and neighboring cell values. Our procedure automates the selection of optimal kernel weights for multidimensional convolution kernels based on principal component scores and the proportion of the variation (information) explained by each component. The spatial extent of our study includes portions of the Custer Gallatin National Forest located in southeastern Montana, USA (Figure 1). 

#### Get the boundary data for portions of the Custer Gallatin National Forest and create an interactive location map of the study area (Figure 1).

In [None]:
#use OpenStreetMaps to get the boundary of the NF
nf=ox.geocode_to_gdf('Custer Gallatin National Forest, MT, USA')

#split the multipart NF boundary into single polygons and select one of them (position 10)
nfe=nf.explode()
nf1=gpd.GeoSeries(nfe.geometry.iloc[10],crs=nf.crs)

#project to Albers equal area
nf1p=nf1.to_crs(5070)

#Visualize the projected study area polygon (nf1p)
m=nf1p.explore()
m

__Figure 1.__ Interactive location map of the study area.

## Methods
### To project Landsat 8 imagery into independent component surfaces (ICSs) we implemented a multistep approach. In Step One we downloaded part of a Landsat scene for the area around the Custer Gallatin National Forest (Figure 1) from Planetary Computer. In Step Two we created a series of functions to extract the values from a systematic random sample of cell locations within the Landsat image, along with the surrounding kernel cell values. In Step Three we flattened kernel cell values by band for each sampled location and performed a PCA. In Step Four we extracted component scores for the components that together accounted for 95% of the variation in the data. Finally, in Step Five, we applied component scores to kernel cell values within each Landsat band. To perform these steps, we created a series of Python functions that use the scikit-learn and Raster Tools application programming interfaces (APIs). Relevant functions can be found within the following section/function bulleted list:
- Get Landsat 8 Imagery
    - Create download definition
        - mosaic_stac
        - get_stac_data
    - Download the data
- Visualize the boundary and imagery
- Convolution PCA
    - Create Definitions to perform convolution PCA
        - _conv_pca
        - _expand_for_kernel
        - _sys_sample_image
        - conv_pca
    - Perform pca convolution analysis
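
The five steps above can be sketched on a small synthetic array. This is a minimal illustration rather than the notebook's implementation: the random `img` array, its size, the every-seventh-window subsample, and the use of NumPy's `sliding_window_view` to gather kernel values are all assumptions made for the example. The notebook itself samples a Raster object and applies the weights chunk by chunk with dask.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
bands, rows, cols, ksize = 3, 40, 50, 3          # hypothetical sizes
img = rng.normal(size=(bands, rows, cols))       # stand-in for a multiband image

# Scale each band to unit variance (no centering at this stage)
flat = img.reshape(bands, -1).T                  # (cells, bands)
scaled = StandardScaler(with_mean=False).fit_transform(flat).T.reshape(img.shape)

# Gather every ksize x ksize window: (bands, rows-k+1, cols-k+1, k, k)
win = sliding_window_view(scaled, (ksize, ksize), axis=(1, 2))
# One row per window, columns ordered (band, kernel row, kernel column)
patches = np.moveaxis(win, 0, 2).reshape(-1, bands * ksize * ksize)

# Fit the PCA on a subset of windows and keep components explaining 95% of the variance
sample = patches[::7]
pca = PCA().fit(sample)
kc = int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), 0.95) + 1)
weights = pca.components_[:kc]                   # (kc, bands*k*k) kernel weights

# Center the window values, multiply by the weights, and sum within the kernel
surfaces = ((patches - pca.mean_) @ weights.T).T.reshape(
    kc, rows - ksize + 1, cols - ksize + 1)
print(surfaces.shape)
```

Because the synthetic bands here are pure noise, most components are needed to reach 95%; real imagery with correlated bands and neighbors typically needs far fewer.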


### Get Landsat 8 Imagery
Create download definitions

In [None]:
#create definition to mosaic stac data
def mosaic_stac(xr):
    return stackstac.mosaic(xr)

#create definition to extract stac data
def get_stac_data(geo,url="https://planetarycomputer.microsoft.com/api/stac/v1",name="sentinel-2-l2a",res=30,crs=5070,**kwarg):
    '''
    gets tiled data from planetary computer as a dask backed xarray that intersects the geometry of the point, line, or polygon

    geo = (polygon) geometry bounding box (WGS84)
    url = (string) base url to planetary computer https://planetarycomputer.microsoft.com/api/stac/v1
    name = (string) catalog resource (collection name)
    res = (number or tuple of numbers) output resolution; a tuple is (x,y)
    crs = (int) output crs (EPSG code)
    **kwarg = additional keyword arguments passed to the catalog search, for example:
        query = (dictionary) of property values {'eo:cloud_cover':{'lt':1}}
        datetime = (string) date time interval e.g., one month: 2023-06, range: 2023-06-02/2023-06-17
        limit = (int) suggested number of items returned per page of results

    returns (xarray data array and stac item collection); the data array is None if no items are found
    '''
    catalog = pystac_client.Client.open(url, modifier=planetary_computer.sign_inplace)
    srch = catalog.search(collections=name, intersects=geo, **kwarg)
    ic = srch.item_collection()
    if(len(ic.items)>0):
        xra = stackstac.stack(ic,resolution=res,epsg=crs)
        xra = mosaic_stac(xra)
    else:
        xra=None

    return xra,ic


Download the data and create a raster object

In [None]:
#get Landsat data from the STAC catalog (skipped if the image was already saved)
if(not os.path.exists('ls82016.tif')):
    xmin,ymin,xmax,ymax=nf1p.buffer(200).total_bounds
    ls30, ic =get_stac_data(nf1.geometry[0],"https://planetarycomputer.microsoft.com/api/stac/v1",name="landsat-c2-l2",res=30,crs=5070,datetime='2016-06-15/2016-06-30',query={'eo:cloud_cover':{'lt':10},'platform':{'eq':'landsat-8'}},limit=1000)
    ls30s=Raster(ls30.sel(band=['red', 'green', 'blue','nir08', 'lwir11','swir16', 'swir22'],x=slice(xmin,xmax),y=slice(ymax,ymin)))
    ls30s=ls30s.save('ls82016.tif')

ls30s=Raster('ls82016.tif')

### Visualize the boundary and imagery

In [None]:
p=nf1p.plot(edgecolor='red',facecolor='none',figsize=(15,10))
p=ls30s.get_bands([1,2,3]).xdata.plot.imshow(ax=p,robust=True)

__Figure 2.__ Overlay of Landsat 8 image subset (RGB bands) and the study area boundary outline in red.

### Perform convolution PCA
Create definitions to sample the data, generate weights, and perform convolution analysis.

In [None]:
from sklearn.preprocessing import MinMaxScaler
import numba as nb

@nb.jit(nopython=True, nogil=True)
def _conv_pca(x,cmp_, m_, size):
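    '''
    Performs the convolution given an array, component scores, means, and kernel size; called on each chunk by dask's map_overlap function
    x=(numpy array) of data with shape (band, row, column)
    cmp_= (numpy array) component scores (kernel weights) from sklearn PCA procedure
    m_= (numpy array) mean values from sklearn PCA procedure
    size= (int) width of the kernel

    returns a numpy array with shape (component, row, column); cells within half a kernel width of
    the chunk edge are not computed and fall in the overlap region trimmed by map_overlap
    '''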
    bnd,rws,clms=x.shape
    hs=int(size/2)
    outarr=np.empty((cmp_.shape[0],rws,clms))
    for ri in range(hs,rws-hs):
        sr=ri-hs
        for ci in range(hs,clms-hs):
            sc=ci-hs
            vls=x[:,sr:sr+size,sc:sc+size].flatten()
            for b in range(cmp_.shape[0]):
                vls2=((vls-m_)*cmp_[b,:]).sum() #center with the PCA means here because the standard scaler was fit without centering
                outarr[b,ri,ci]=vls2

    return outarr

@nb.jit(nopython=True, nogil=True)
def _expand_for_kernel(isys,isxs,wsize):#,mr,mc):
    '''
    Expands sampled row and column indices to the indices of every cell within the kernel centered on each sample.
    isys=array of row index locations
    isxs=array of column index locations
    wsize=width of the kernel

    returns two new arrays of index values that can be used to extract coordinates from an xarray data array
    '''
    hw=int(wsize/2)
    isys2=np.zeros(isys.shape[0]*wsize,dtype='int32')
    isxs2=np.zeros(isxs.shape[0]*wsize,dtype='int32')
    for r in range(isys.shape[0]):
        cvl=isys[r]
        cvlm=cvl-hw
        for r2 in range(wsize):
            nr=r2+r*wsize
            nvl=cvlm+r2
            isys2[nr]=nvl

    for c in range(isxs.shape[0]):
        cvl=isxs[c]
        cvlm=cvl-hw
        for c2 in range(wsize):
            nc=c2+c*wsize
            nvl=cvlm+c2
            isxs2[nc]=nvl

    return isys2,isxs2

def _sys_sample_image(rs,p,wsize=0):
    '''
    Creates a systematic sample (random start) of an image given the proportion of cells to sample.
    rs = Raster object to be sampled
    p = proportion of cells to sample (0-1)
    wsize=(int) width of a square kernel in cells if using convolution type analyses

    returns a 2d array of cell values: rows = sampled locations, columns = band values
    if using kernels, columns correspond to kernel cell values for each sampled location
    '''
    bnds,rws,clms=rs.shape
    ys=rs.y
    xs=rs.x
    psq=np.sqrt(p)
    sr=int(rws/(rws*psq))
    sc=int(clms/(clms*psq))
    rstr=int(np.random.rand()*sr)
    rstc=int(np.random.rand()*sc)
    isys=np.arange(rstr+wsize,rws-wsize,sr)
    isxs=np.arange(rstc+wsize,clms-wsize,sc)

    if(wsize>0):
        isys,isxs=_expand_for_kernel(isys,isxs,wsize)#,rws,clms)

    sys=ys[isys]
    sxs=xs[isxs]
    sel=rs.xdata.sel(x=sxs,y=sys)
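    nb,rws,clms=sel.shape
    arr=sel.values

    if(wsize>0):
        #group the kernel cells of each sampled location into one row ordered (band, kernel row, kernel column),
        #which matches the order _conv_pca uses when it flattens a window
        rws=int(rws/wsize)
        clms=int(clms/wsize)
        bnds=nb*wsize*wsize
        arr=arr.reshape((nb,rws,wsize,clms,wsize)).transpose((1,3,0,2,4))
    else:
        bnds=nb
        arr=np.moveaxis(arr,0,-1)

    #sel2=sel[sel!=rs.null_value]

    # lx,ly=np.meshgrid(sel.x,sel.y) #if you want to look at the location of each sampled cell
    # pnts=gpd.GeoSeries(gpd.points_from_xy(x=lx.flatten(),y=ly.flatten(),crs=rs.crs))
    vls=arr.reshape((rws*clms,bnds))
    df=pd.DataFrame(vls)
    vls=df[df!=rs.null_value].dropna().values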

    return vls#,  sel#, pnts

def conv_pca(rs,prc=0.9,smp=0.01,ksize=0,output_bit_depth=None):
    '''
    determines convolution kernel weights for an optimal raster projection and returns a transformed raster
    using those weights. Weights are derived from a PCA analysis of each kernel cell value. Kernel cell
    values are extracted for each band in the rs stack.

    rs=(Raster) input raster object
    prc=(float) the proportion of variation in the data kept in the final raster dataset (0-1)
    smp=(float) the proportion of data used to build the transformation (training data - systematic random sample of locations)
    ksize=(int) kernel width (window diameter of a square kernel) measured in cells
    output_bit_depth=(int) optional parameter used to scale the analysis outputs to a specified bit depth (e.g., 8,16,32).
    By default (None) output values will not be scaled

    returns a projected raster object and the pca object
    '''
    #scale values using a sample
    vls=_sys_sample_image(rs,p=smp)
    ss=StandardScaler(with_mean=False)
    ss.fit(vls)

    #apply the scaling to the input raster
    ss_mdl = general.ModelPredictAdaptor(ss,'transform')
    nch=(rs.nbands,*rs.xdata.chunks[1:])
    sc_pred_rs=rs.model_predict(ss_mdl,rs.nbands).chunk(nch)

    #perform pca using a sample of the scaled raster values
    vls2=_sys_sample_image(sc_pred_rs,p=smp,wsize=ksize)
    
    pca=PCA()
    pca.fit(vls2)
    #determine number of components to use
    ev=pca.explained_variance_ratio_
    sev=0
    for i in range(ev.shape[0]):
        e=ev[i]
        sev+=e
        if sev>prc:
            break
    kc=i+1 #add one to address 0 start

    #extract component scores for _conv_pca method
    kdf=pca.components_[0:kc,:] #scores,component (rows have weightings)
    hw=int(ksize/2)

    nch=sc_pred_rs.data.chunks
    och=list(nch)
    och[0]=(kc,)
    och[1]=tuple(np.array(och[1])+hw*2)
    och[2]=tuple(np.array(och[2])+hw*2)

    #apply _conv_pca to map_overlap function
    darr=sc_pred_rs.data.map_overlap(
        _conv_pca,
        depth={0: 0, 1: hw, 2: hw},
        chunks=och,
        boundary=np.nan,
        dtype='f8',
        meta=np.array((),dtype='f8'),
        cmp_=kdf,
        m_=pca.mean_,
        size=ksize
    )
    #convert dask array back to a raster
    cmp_rs=raster.data_to_raster(darr,x=sc_pred_rs.x,y=sc_pred_rs.y,affine=sc_pred_rs.affine,crs=sc_pred_rs.crs,nv=sc_pred_rs.null_value).chunk((1,*darr.chunks[1:]))

    #scale projected raster to bit depth if specified
    if(output_bit_depth is not None):
        pcasvls=pca.transform(vls2)[:,0:kc]
        mmsc=MinMaxScaler()
        mmsc.fit(pcasvls)
        mmsc_mdl=general.ModelPredictAdaptor(mmsc,'transform')
        cmp_rs=(cmp_rs.model_predict(mmsc_mdl,cmp_rs.nbands)*(2**output_bit_depth-1)).astype('uint'+str(output_bit_depth))

    return cmp_rs,pca



Perform the PCA convolution process

In [None]:
#perform the pca convolution process on the landsat image; kernel size 5 by 5, output bit depth 16.
# This can be any size kernel. I have tried up to a 15 by 15.
ksize=5
conv_rs,pca=conv_pca(ls30s,prc=0.95,smp=0.1,ksize=ksize,output_bit_depth=16)

## Results
### Using our described approach, we accounted for 95% of the variation within the image using the first eleven principal components (Figure 3). Component convolution kernel weights (Display 1) highlight the linear relationships among both Landsat image bands and neighboring kernel cell values. For a systematic random sample of approximately 10% of the first 3 component values, five unique clusters of information with varying trends appear in component space (Figure 4).
### Transforming our Landsat 8 image into 11 ICSs (Figure 5) required processing 77 five-by-five convolution kernels (11 components × 7 bands). To explain the remaining 5% of the variation in the data, an additional 1148 convolution kernels would need to be processed. Using a threshold of 95% of the variation within the data substantially reduced the total amount of processing time needed to create ICSs while retaining the vast majority of the information within the data. Moreover, implementing this processing within the Raster Tools processing architecture was straightforward and efficient.
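
The kernel counts quoted above follow from simple arithmetic, sketched here. The variable names are ours, and the total assumes that the PCA can return one component per flattened kernel value (that is, the training sample has at least 175 rows).

```python
n_bands = 7            # Landsat bands used in the notebook
ksize = 5              # kernel width in cells
n_kept = 11            # components retained at the 95% threshold (reported above)

n_components = n_bands * ksize * ksize   # one component per band and kernel cell
kernels_kept = n_kept * n_bands          # each component sums one kernel per band
kernels_all = n_components * n_bands     # kernels needed to compute every component
print(n_components, kernels_kept, kernels_all - kernels_kept)
```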
### Visually, each ICS highlights a different aspect of the data within the image (Figure 5). Within each ICS, linear combinations of both hue and texture are highlighted to varying degrees and are projected such that each surface is independent, which can be ideal for downstream predictive modeling.
### The following processing steps were used to create Figures 3-5 and Display 1.
- Visualize % variation
- Look at kernel weights
- Look at 3d plot
- Visualize the PCA convolution raster surfaces

### - Visualize % variation explained in each band (component) of the transformed image (Figure 3).
#### The number of bands corresponds to the number of components that account for 95% of the variation in the 5 by 5 convolved image. Note that most of the variation/information in the data is explained by the first 3 components.

In [None]:
var_exp=pd.DataFrame(pca.explained_variance_ratio_[:conv_rs.nbands])
print('Total % variance explained in components =',var_exp.sum().values[0])
#keep the axes object; invert_yaxis() returns None, so call it separately
p=var_exp.plot(kind='barh',title='Components by % Variation Explained', xlabel='% Variation', ylabel='Component', figsize=(15,8),legend=False)
p.invert_yaxis()
p

__Figure 3.__ Proportion of variation explained in each principal component. 
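
Choosing the number of components from a cumulative variance threshold can be sketched as follows. The synthetic matrix `X` is a hypothetical stand-in for the sampled kernel values. The second fit uses scikit-learn's option of passing a float between 0 and 1 as `n_components` (with `svd_solver="full"`), which applies the same kind of threshold internally and is shown only as a cross-check, not as what the notebook does.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Hypothetical correlated data: 500 samples, 10 features driven by 3 latent factors
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(500, 10))

pca_full = PCA().fit(X)
cum = np.cumsum(pca_full.explained_variance_ratio_)
kc = int(np.argmax(cum > 0.95) + 1)   # first component count whose cumulative ratio exceeds 0.95

# Let scikit-learn apply the threshold directly
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X)
print(kc, pca_95.n_components_)
```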

### - Look at the kernel weights for each selected component (Display 1).
#### Each component is made of the summation of 7 convolution kernels with the following weights. Convolution weights are applied to the centered and scaled input surface values of each band within the specified kernel of the input raster surface (ls30s in our example).

In [None]:
kw=pca.components_[0:conv_rs.nbands,:]
for k in range(kw.shape[0]):
    krs=kw[k].reshape((ls30s.nbands,ksize,ksize))
    print('\nWeights for component',k)
    print(krs)

__Display 1.__ Kernel weights for each component of the PCA convolution.
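
To make the link between the printed weights and the transformed values concrete, the sketch below checks that centering a flattened window, multiplying by a component's weights, and summing gives the same value as scikit-learn's `transform`. It also shows the same sum split into one kernel per band. The random data and the small sizes are assumptions for illustration, and the (band, row, column) ordering follows the reshape used in the cell above.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
bands, ksize = 2, 3                                  # hypothetical sizes
X = rng.normal(size=(200, bands * ksize * ksize))    # rows = flattened kernel windows
pca = PCA().fit(X)

window = X[0]                                        # one flattened (band, row, col) window
# Center by the PCA mean, multiply by the first component's weights, and sum
manual = ((window - pca.mean_) * pca.components_[0]).sum()

# The same weights viewed as one ksize x ksize kernel per band
kernels = pca.components_[0].reshape(bands, ksize, ksize)
centered = (window - pca.mean_).reshape(bands, ksize, ksize)
per_band = (kernels * centered).sum(axis=(1, 2))     # one kernel result per band

print(np.isclose(manual, per_band.sum()),
      np.isclose(manual, pca.transform(X[:1])[0, 0]))
```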

### - Look at a 3d plot of the first 3 components for systematic random locations using 10% of the data. (Figure 4)

In [None]:
import plotly.express as px
#get a sample of the PCA convolution raster values
vls=_sys_sample_image(conv_rs,0.1,1) #wsize=1 keeps the sample one cell in from the image edge to limit convolved edge effects
df=pd.DataFrame(vls)
print('n =', df.shape[0])
fig = px.scatter_3d(df, x=0, y=1, z=2
              ,width=1500,height=800)

fig.show()


__Figure 4.__ A three-dimensional representation of the first 3 principal component score values for ~10% of the cell locations within the Landsat 8 image.
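
The "~10%" in the caption reflects how a systematic grid is spaced. The sketch below uses a hypothetical helper, `sample_indices`, that mirrors the spacing rule in `_sys_sample_image` (a step of `int(1/sqrt(p))` cells in each direction with a random start) but omits the kernel offset. The 3000 by 3000 image size is an assumption. Because the step is truncated to an integer, the realized fraction is close to, but not exactly, the requested proportion.

```python
import numpy as np

def sample_indices(n_rows, n_cols, p, rng):
    """Hypothetical helper: systematic grid with a random start, spacing int(1/sqrt(p))."""
    step = int(1 / np.sqrt(p))
    r0 = int(rng.random() * step)      # random starting row within the first step
    c0 = int(rng.random() * step)      # random starting column within the first step
    return np.arange(r0, n_rows, step), np.arange(c0, n_cols, step), step

rng = np.random.default_rng(3)
rows_idx, cols_idx, step = sample_indices(3000, 3000, 0.1, rng)
fraction = rows_idx.size * cols_idx.size / (3000 * 3000)
print(step, round(fraction, 3))
```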

### - Visualize the PCA convolution raster surfaces (Figure 5)

In [None]:
conv_rs.plot(x='x',y='y',col='band',col_wrap=3,figsize=(15,conv_rs.nbands),robust=True,cmap='PRGn')

__Figure 5.__ Independent component surfaces (ICSs) derived from a principal component analysis of values taken from a systematic sample of cell locations within a subset of a Landsat scene for a kernel size of five by five cells. The first eleven ICSs account for more than 95% of the variance within the data.
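
One way to check the independence described here is to compare correlations before and after the projection. The sketch below uses hypothetical correlated predictors, not the Landsat data. Note that PCA guarantees the scores are linearly uncorrelated on the data used to fit it; for surfaces built from a sample, the correlations across the full image should be near zero rather than exactly zero.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Hypothetical correlated predictors: 6 features built from 2 latent signals plus noise
X = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 6)) + 0.2 * rng.normal(size=(1000, 6))

scores = PCA().fit_transform(X)
corr_in = np.corrcoef(X, rowvar=False)         # correlations among the inputs
corr_out = np.corrcoef(scores, rowvar=False)   # correlations among the component scores

off_diag = ~np.eye(6, dtype=bool)
# Largest absolute off-diagonal correlation before and after the projection
print(np.abs(corr_in[off_diag]).max(), np.abs(corr_out[off_diag]).max())
```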

## Discussion
### In this notebook we have demonstrated how to use sampling, convolution kernels, and PCA to project multi-band raster data into independent component space (Figure 4) that highlights various orthogonal aspects of the input data while simultaneously reducing the dimensionality of the data. The results of our analysis emphasize both hue and textural aspects of the data within each principal component that can also be viewed as a series of convolution kernel weights. While kernel cell weights provide unique insights into the linear relationships among image bands and neighbor cell values, when combined with Raster Tools processing architecture they can be used to produce ICSs (Figure 5) that spatially emphasize independent hue and textural aspects of the input data. 
### For example, the first ICS in Figure 5 highlights a hue change across spectral bands that emphasizes bright to dark cell values within the image. The kernel weights for the first component (Display 1) verify this assertion with all positive values that generally share importance equally across all band kernels and cell values. When compared against the second component ICS (Figure 5), it is clear that the second component emphasizes the differences between color and infrared band values while also highlighting an edge effect of neighboring pixel values (Display 1). Investigating the weights of each kernel for each component can provide insights into the relative amounts of information (variation) within the data across both raster bands and neighboring cell space.

### Additionally, projecting band and neighboring cell values into orthogonal space provides a straightforward way to maintain as much information as possible within the data while simultaneously reducing the duplicated aspects of the data (Figure 3). In our example we ordered each component based on the amount of variation explained in the data for that component and used a threshold of 95% of the total variation in the data to determine the number of ICSs to create. Depending on the question at hand, analysts may want to pick and choose which components have relevance to the phenomenon of interest. Once decided and projected, ICSs can be used as potential predictor variables in downstream analyses. Additionally, kernel weights could be used as potential kernel cell starting values in deep learning convolutional neural networks to aid in solving for optimal kernel cell weights given a set of labels.

### Although the objective in this example was to linearly project the raster data across both image bands and neighboring cell values in a way that reduces the dimensionality of the data and to determine a set of kernel weights used to produce ICSs, one could apply this same type of approach to optimize kernel cell weights for regression and classification. The primary difference in this case is that a regression or classification methodology, along with training response values, would be used to optimally determine kernel cell weights as opposed to using a PCA to determine kernel cell values. In the case of optimizing kernel cell weights for regression or classification, ICSs could also be inputs to the process. Regardless of intended use, our described approach provides an optimal solution that linearly describes unique hue and textural aspects of the data across both raster band channels and neighboring cell values while simultaneously reducing the dimensionality of the data, making it well suited for downstream regression and classification analysis.

## Conclusion
### This approach to estimating weights for each convolution kernel and then using those weights to convolve an image (raster stack) highlights various features in the image, is quick, and mathematically determines kernel weights such that each band in the output convolved image is independent of the other bands' values. Moreover, the process removes linearly redundant (correlated) information across bands and within kernel cell values, which is also desirable. These surfaces should make for effective predictor variables and should make it easy to spread and balance a sample across all the information in a convolved image.
 