# Creating a classification map
### Let's build a random forest model using only the subset imagery and the interpreted responses that fall within the boundary of the imagery. If you have not worked through the [Getting the imagery](./GettingTheImagery.ipynb) or [Summarizing plot data](./Summarizing_plot_data.ipynb) notebooks, please do so before working through this one.

## Import packages

In [None]:
from raster_tools import Raster, focal, zonal, general
import geopandas as gpd, shapely, pandas as pd, numpy as np

## Get the data

In [None]:
df=pd.read_csv('./plot_subplot_data.csv')
gdf=gpd.GeoDataFrame(df,geometry=gpd.GeoSeries.from_wkt(df['geometry']),crs=4326)
rs=Raster('./medoid_subset.tif')

### Subset the points to the boundary of the image and just the image values

In [None]:
#get the boundary of the image
img_bnd=gpd.GeoSeries(shapely.box(*rs.bounds),crs=rs.crs)

#columns needed: ids, the Use label, the medoid band values, and geometry
n_clms=['plotid', 'sampleid', 'Use', 'BLUE','GREEN', 'NIR', 'RED', 'SWIR1', 'SWIR2','geometry']

#get all points inside the boundary
pint=gdf.loc[gdf.intersects(img_bnd.to_crs(gdf.crs).unary_union)].to_crs(rs.crs)[n_clms].copy() #copy so columns can be added later without a pandas chained-assignment warning
 
#explore the plots/subplots on an interactive map
pint.explore()

### Get the response and predictor columns and create training and validation datasets (75% training, 25% validation)

In [None]:
from sklearn.model_selection import train_test_split

#convert labels in Use to integers for mapping
uvls=pint['Use'].unique()
cdic=dict(zip(uvls,np.arange(uvls.shape[0])))
pint['Use2']=pint['Use'].map(cdic)
print(cdic)

#specify response and predictor features
resp='Use2'
prednm=['BLUE','GREEN', 'NIR', 'RED', 'SWIR1', 'SWIR2']

p=0.75
train,val=train_test_split(pint,train_size=p,random_state=0)
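
Because the model is trained on integer codes, it helps to keep an inverse lookup so predictions can be read as labels again. The sketch below is a minimal illustration with made-up labels (`use`, `to_code`, and `to_label` are hypothetical names, not part of the notebook). Note that it uses `np.unique`, which sorts the labels, whereas `cdic` above follows the order in which labels first appear.

```python
import numpy as np

# Hypothetical Use labels standing in for pint['Use']
use = np.array(['Forest', 'Water', 'Forest', 'Crop', 'Water'])

# Label -> integer code, same idea as cdic (here in sorted label order)
labels = np.unique(use).tolist()
to_code = {lab: i for i, lab in enumerate(labels)}

# Inverse lookup: integer code -> label
to_label = {i: lab for lab, i in to_code.items()}

# Encode the labels, then decode them again
codes = np.array([to_code[u] for u in use.tolist()])
decoded = np.array([to_label[c] for c in codes.tolist()])

print(to_code)
print(codes)
print(decoded)
```

The same inverse dictionary can be applied to model predictions or to a map legend.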

### Create a random forest model for the Use feature, estimate overall accuracy from the out-of-bag (OOB) samples and from the validation dataset, and create a confusion matrix with class cell frequencies.

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ConfusionMatrixDisplay

#make the random forest classifier model
rf=RandomForestClassifier(max_samples=0.80,oob_score=True,random_state=0) #each tree is fit to a bootstrap sample sized at 80% of the training rows (max_samples); rows a tree did not draw are its out-of-bag (OOB) rows

#get the data
X=train[prednm]
X2=val[prednm]
y=train[resp]
y2=val[resp]

#fit the model
rf.fit(X.values,y.values)

#look at overall accuracy (OOB and validation)
print('oob = ',rf.oob_score_)
pred=rf.predict(X2.values)
oa=(pred==y2.values).sum()/y2.shape[0]
print('val = ',oa)
print('Sample size =',y2.shape[0])

#create a confusion matrix
ConfusionMatrixDisplay.from_predictions(y2.values,pred,cmap='plasma')
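
The validation accuracy computed above can also be read directly off a confusion matrix. The following is a small sketch with made-up reference and predicted labels (all arrays here are hypothetical). It assumes the same layout scikit-learn uses, with reference labels in rows and predicted labels in columns.

```python
import numpy as np

# Hypothetical reference labels and predictions for three classes
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 2])

# Count each (reference, predicted) pair; rows: reference, columns: predicted
n = 3
cm = np.zeros((n, n), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

# Overall accuracy: correctly labeled observations (diagonal) over all observations
overall = np.trace(cm) / cm.sum()

# Per-class accuracies
producers = np.diag(cm) / cm.sum(axis=1)  # correct / reference total (recall)
users = np.diag(cm) / cm.sum(axis=0)      # correct / predicted total (precision)

print(cm)
print('overall =', overall)
print('producers =', producers)
print('users =', users)
```

Per-class values often show problems that a single overall accuracy hides, such as one class that is frequently mistaken for another.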

### Create a map of predicted labels and plot the map and the medoid rgb image.
Here we will use Raster Tools' model_predict function to create a map of predicted Use labels from the model and the medoid image.

In [None]:
cl_rs=rs.model_predict(rf,1)

colors={0:'green',1:'yellow',2:'tan',3:'white',4:'grey'} #used to give a specific color to each integer
cl_rs.plot(levels=list(colors.keys()),colors=list(colors.values()),figsize=(15,8))
rs.get_bands([3,2,1]).xdata.plot.imshow(figsize=(12,8),robust=True)
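
Conceptually, applying a model to an image means treating every cell as one observation whose predictors are its band values. The sketch below illustrates that idea with NumPy only. It is not Raster Tools' implementation, and `img` and `toy_predict` are hypothetical stand-ins for the medoid image and for `rf.predict`.

```python
import numpy as np

# Hypothetical image with shape (bands, rows, cols)
rng = np.random.default_rng(0)
img = rng.random((6, 4, 5))

def toy_predict(X):
    """Stand-in for a fitted model's predict: one label per row of X."""
    # label 1 where the third band exceeds the fourth band, else 0
    return (X[:, 2] > X[:, 3]).astype(int)

bands, rows, cols = img.shape

# Flatten the image so each row is one cell and each column is one band
X = img.reshape(bands, -1).T  # shape (rows*cols, bands)

# Predict every cell, then fold the labels back into the image grid
labels = toy_predict(X).reshape(rows, cols)

print(X.shape, labels.shape)
```

The column order of `X` has to match the order of the predictors used to fit the model, which is why the band order of the image matters.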


## Exercise 1: Interpreting the results
- Why did we have to convert our Use labels to integers?
- What were the OOB and validation overall accuracies?
- What does overall map accuracy mean?
- How is the oob value calculated?
- What does the confusion matrix tell us?
- Why does our map predict forest in the ocean?

## A random forest model can also estimate, for each observation, the proportion of trees (models) that assigned each class. Let's turn those proportions into a map
Here we will use ModelPredictAdaptor to select the random forest model's predict_proba function and then use the Raster Tools model_predict function.

In [None]:
mdl=general.ModelPredictAdaptor(rf,'predict_proba')
pr_rs=rs.model_predict(mdl,rf.n_classes_)
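
The adaptor idea is simple: wrap a model so that a method other than `predict` is what gets called when something asks for `predict`. The sketch below is a hypothetical stand-in written for illustration, not the Raster Tools class itself; `PredictAdaptor` and `ToyModel` are made-up names.

```python
import numpy as np

class PredictAdaptor:
    """Hypothetical stand-in: expose a chosen method of `model` as `predict`."""
    def __init__(self, model, method):
        # look up the requested method once, e.g. 'predict_proba'
        self._fn = getattr(model, method)

    def predict(self, X):
        return self._fn(X)

class ToyModel:
    """Two-class toy model with the usual predict / predict_proba pair."""
    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # one column per class; each row sums to 1
        return np.column_stack([1.0 - X[:, 0], X[:, 0]])

    def predict(self, X):
        return np.argmax(self.predict_proba(X), axis=1)

X = np.array([[0.2], [0.9]])
proba = PredictAdaptor(ToyModel(), 'predict_proba').predict(X)
print(proba)
```

Because the wrapped call returns one column per class, the number of outputs passed to model_predict above is `rf.n_classes_` rather than 1.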

### Let's plot out each proportion surface.

In [None]:
print('label:band =',dict(zip(list(cdic.keys()),np.array(list(cdic.values()))+1)))
pr_rs.plot(x='x',y='y',col='band',col_wrap=2,figsize=(15,12),robust=True,cmap='PRGn')

## Exercise 2: Interpreting the results 2
- What do these surfaces mean?
- Could you use these surfaces to create the labeled map?
- What rule is being used to create the labeled map?
- What value would you get if you added each surface together?
- How many plots/subplots did we have in the ocean?
- If each cell has a proportion estimate then how variable are those estimates?
- Task: Download the entire medoid image, use all of the plot/subplot data, build a model, and create a labeled map along with its overall accuracy and a confusion matrix.