# PyEumap - Land-Cover Mapping

In this tutorial, we will use the overlaid points (see the [Overlay tutorial](02_overlay.ipynb)) to train an ML model and predict the land cover (LC) over the last two decades, using the **LandMapper** class.

The training step uses *elevation*, *slope*, *Landsat* (7 spectral bands, 4 seasons and 3 percentiles per year) and *night light* (VIIRS Night Band) data to predict the following LC classes:
* 211: Non-irrigated arable land
* 311: Broad-leaved forest
* 312: Coniferous forest
* 324: Transitional woodland-shrub
* 411: Inland wetlands
* 512: Water bodies
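
As a quick sanity check on the size of the Landsat part of the feature space, the combinations described above can be enumerated. This is only a sketch: it assumes the column naming pattern landsat_ard_{season}_{band}_{percentile} seen in the overlaid points, and the band names swir1 and swir2 are placeholders, since only blue, green, red, nir and thermal appear in the column listings of this tutorial.

```python
from itertools import product

SEASONS = ["winter", "spring", "summer", "fall"]
# swir1 and swir2 are assumed names; the other five occur in the sample columns.
BANDS = ["blue", "green", "red", "nir", "swir1", "swir2", "thermal"]
PERCENTILES = ["p25", "p50", "p75"]

# One feature per (season, band, percentile) combination.
landsat_features = [
    f"landsat_ard_{season}_{band}_{pct}"
    for season, band, pct in product(SEASONS, BANDS, PERCENTILES)
]

print(len(landsat_features))  # 4 seasons * 7 bands * 3 percentiles = 84
```

Together with elevation, slope and night lights, this gives a rough idea of how many covariates the model sees per year.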

First, let's import the necessary modules:

In [2]:
import sys
sys.path.append('../../')

import os
from osgeo import gdal  # recent GDAL releases expose the bindings only under osgeo
from pathlib import Path
import pandas as pd
import geopandas as gpd
from pyeumap.mapper import LandMapper

## Dataset

Our dataset consists of 1 tile, located in Switzerland (tile 10636_switzerland, as shown in the prediction logs), extracted from a tiling system created for the European Union (7,042 tiles) by the GeoHarmonizer project.

In [3]:
from pyeumap import datasets

tile = datasets.TILES[0]

data_root = datasets.DATA_ROOT_NAME
data_dir = Path(os.getcwd()).joinpath(data_root, tile)

Let's load the overlaid points:

In [4]:
# data_dir is already an absolute path, so it can be extended directly
fn_points = data_dir.joinpath(f'{tile}_landcover_samples_overlayed.gpkg')
points = gpd.read_file(fn_points)
points

| | lucas | survey_date | confidence | tile_id | lc_class | overlay_id | dtm_elevation | dtm_slope | landsat_ard_spring_nir_p25 | landsat_ard_spring_nir_p50 | ... | landsat_ard_summer_blue_p50 | landsat_ard_spring_thermal_p75 | landsat_ard_fall_blue_p50 | landsat_ard_winter_thermal_p75 | landsat_ard_winter_blue_p25 | landsat_ard_summer_blue_p25 | landsat_ard_fall_thermal_p75 | landsat_ard_winter_blue_p50 | landsat_ard_spring_blue_p50 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | 2006-06-30T00:00:00 | 85 | 10636 | 321 | 1 | 1948.0 | 36.313705 | 83.0 | 83.0 | ... | 8.0 | 185.0 | 6.0 | 176.0 | 128.0 | 8.0 | 186.0 | 130.0 | 11.0 | POINT (4145221.759 2594636.440) |
| 1 | False | 2006-06-30T00:00:00 | 85 | 10636 | 321 | 2 | 2209.0 | 7.917305 | 120.0 | 137.0 | ... | 5.0 | 177.0 | 6.0 | 176.0 | 118.0 | 5.0 | 183.0 | 118.0 | 120.0 | POINT (4142366.664 2598169.380) |
| 2 | False | 2006-06-30T00:00:00 | 85 | 10636 | 321 | 3 | 1990.0 | 32.722038 | 62.0 | 63.0 | ... | 9.0 | 185.0 | 5.0 | 179.0 | 100.0 | 9.0 | 188.0 | 116.0 | 10.0 | POINT (4140249.007 2596954.755) |
| 3 | False | 2006-06-30T00:00:00 | 85 | 10636 | 322 | 4 | 2142.0 | 49.800537 | 55.0 | 55.0 | ... | 2.0 | 172.0 | 6.0 | 176.0 | 48.0 | 2.0 | 179.0 | 48.0 | 37.0 | POINT (4148638.412 2595538.585) |
| 4 | False | 2006-06-30T00:00:00 | 85 | 10636 | 332 | 5 | 2420.0 | 27.018671 | 172.0 | 188.0 | ... | 16.0 | 176.0 | 17.0 | 178.0 | 228.0 | 16.0 | 188.0 | 228.0 | 201.0 | POINT (4156286.754 2595790.720) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1119 | False | 2000-06-30T00:00:00 | 85 | 10636 | 312 | 277 | 1729.0 | 16.108473 | 44.0 | 45.0 | ... | 3.0 | 183.0 | 7.0 | 177.0 | 41.0 | 3.0 | 186.0 | 41.0 | 4.0 | POINT (4140414.076 2582953.315) |
| 1120 | False | 2000-06-30T00:00:00 | 85 | 10636 | 332 | 278 | 2562.0 | 31.661921 | 108.0 | 108.0 | ... | 9.0 | 169.0 | 12.0 | 170.0 | 44.0 | 9.0 | 179.0 | 44.0 | 93.0 | POINT (4157045.539 2609917.600) |
| 1121 | False | 2000-06-30T00:00:00 | 85 | 10636 | 321 | 279 | 2174.0 | 15.649096 | 58.0 | 61.0 | ... | 5.0 | 188.0 | 6.0 | 177.0 | 145.0 | 5.0 | 187.0 | 150.0 | 8.0 | POINT (4141237.722 2583848.400) |
| 1122 | False | 2000-06-30T00:00:00 | 85 | 10636 | 333 | 280 | 2368.0 | 21.605083 | 72.0 | 72.0 | ... | 22.0 | 181.0 | 29.0 | 177.0 | 82.0 | 22.0 | 189.0 | 82.0 | 50.0 | POINT (4141257.016 2584469.100) |


Which columns are available to the ML model?

In [5]:
print("Columns:")
# List every column together with its dtype
for col_name, col_type in zip(points.columns, points.dtypes):
    print(f' - {col_name} ({col_type})')

Columns:
 - lucas (bool)
 - survey_date (object)
 - confidence (int64)
 - tile_id (int64)
 - lc_class (int64)
 - overlay_id (int64)
 - dtm_elevation (float64)
 - dtm_slope (float64)
 - landsat_ard_spring_nir_p25 (float64)
 - landsat_ard_spring_nir_p50 (float64)
 - landsat_ard_winter_green_p75 (float64)
 - landsat_ard_summer_green_p25 (float64)
 - landsat_ard_summer_green_p50 (float64)
 - landsat_ard_fall_nir_p50 (float64)
 - landsat_ard_spring_green_p50 (float64)
 - landsat_ard_summer_nir_p50 (float64)
 - landsat_ard_winter_nir_p50 (float64)
 - landsat_ard_fall_nir_p25 (float64)
 - landsat_ard_spring_blue_p75 (float64)
 - landsat_ard_spring_green_p75 (float64)
 - landsat_ard_fall_nir_p75 (float64)
 - landsat_ard_summer_nir_p25 (float64)
 - landsat_ard_summer_blue_p75 (float64)
 - landsat_ard_winter_blue_p75 (float64)
 - landsat_ard_spring_red_p25 (float64)
 - landsat_ard_fall_blue_p75 (float64)
 - landsat_ard_fall_green_p75 (float64)
 - landsat_ard_winter_green_p50 (float64)
 - landsat

## Training 

To map the land-cover classes we will use LandMapper, which trains an ML model and runs the space-time prediction. LandMapper receives the following parameters:
* *fn_points*: the GeoPackage file path or a [GeoPandas GeoDataFrame](https://geopandas.org/reference/geopandas.GeoDataFrame.html) instance
* *feat_col_prfxs*: the prefixes of all columns that should be included as covariates in the feature space
* *target_col*: the name of the column that the model should treat as the target variable
* *estimator*: the model implementation, which can be any estimator available in [scikit-learn](https://scikit-learn.org/stable/modules/classes.html)
* *val_samples_pct*: the proportion of samples that should be used for validation
* *min_samples_per_class*: the minimum proportion of samples per class. For example, with a value of 0.05, all classes with less than 5% of the samples are removed from training.
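
To make the role of *feat_col_prfxs* concrete, here is a minimal sketch of prefix-based column selection. It is not LandMapper's actual implementation; the helper name select_feature_columns is hypothetical and only illustrates the idea of keeping every column whose name starts with one of the given prefixes.

```python
def select_feature_columns(columns, prefixes):
    """Keep the columns whose name starts with any of the given prefixes."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of options
    return [col for col in columns if col.startswith(prefixes)]

# A few column names taken from the overlaid points shown earlier.
columns = [
    "lucas", "survey_date", "confidence", "tile_id", "lc_class", "overlay_id",
    "dtm_elevation", "dtm_slope", "landsat_ard_spring_nir_p25",
]

features = select_feature_columns(columns, ["landsat", "dtm", "night_lights"])
print(features)
```

Metadata columns such as *survey_date* or *tile_id* match none of the prefixes, so they stay out of the feature space.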

In [6]:
from sklearn.ensemble import RandomForestClassifier

feat_col_prfxs = ['landsat', 'dtm', 'night_lights']
target_col = 'lc_class'
estimator = RandomForestClassifier(n_estimators=100)

landmapper = LandMapper(fn_points, feat_col_prfxs, target_col,
                        estimator=estimator,
                        val_samples_pct=0.5,
                        min_samples_per_class=0.05,
                        verbose=True)

[16:07:11] Removing 74 sampes due min_samples_per_class condition (< 0.05)
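
The log line above reports samples dropped by the *min_samples_per_class* condition. A minimal sketch of such a filter follows, assuming the threshold is compared against each class's share of all samples; the function filter_rare_classes is hypothetical and LandMapper's internals may differ.

```python
from collections import Counter

def filter_rare_classes(labels, min_share):
    """Return the indices of samples whose class holds at least min_share of all samples."""
    counts = Counter(labels)
    total = len(labels)
    kept_classes = {cls for cls, n in counts.items() if n / total >= min_share}
    return [i for i, cls in enumerate(labels) if cls in kept_classes]

# Toy labels: class 411 holds only 3% of the samples and is dropped at 5%.
labels = [312] * 60 + [321] * 37 + [411] * 3
kept = filter_rare_classes(labels, min_share=0.05)
print(len(labels) - len(kept))  # number of removed samples
```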


Let's train the model:

In [7]:
landmapper.train()

[16:07:11] Training and evaluating the model
[16:07:11] Training the final model using all data
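
The two log lines suggest a hold-out evaluation followed by a refit on all samples. The sketch below shows only the splitting step, under stated assumptions: the split is random, the validation size is rounded up, and the sample count is 1,049 (the 1,123 rows implied by the table index, minus the 74 removed samples). The helper holdout_split is hypothetical and not part of pyeumap.

```python
import math
import random

def holdout_split(n_samples, val_pct, seed=0):
    """Return (train_idx, val_idx) for a random hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = math.ceil(n_samples * val_pct)  # rounding up is an assumption
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = holdout_split(n_samples=1049, val_pct=0.5)
print(len(train_idx), len(val_idx))  # 524 525
```

Under these assumptions the validation set holds 525 samples, which agrees with the total support in the classification report.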


Then check the summary of the model performance:

In [8]:
print(f'Overall accuracy: {landmapper.overall_acc * 100:.2f}%\n\n')
print(landmapper.classification_report)

Overall accuracy: 63.62%


              precision    recall  f1-score   support

       231.0       0.71      0.44      0.54        39
       312.0       0.58      0.86      0.69        92
       321.0       0.60      0.62      0.61        88
       322.0       0.58      0.40      0.47        65
       324.0       0.43      0.13      0.20        46
       332.0       0.70      0.61      0.65        49
       333.0       0.65      0.81      0.72       104
       335.0       0.88      0.88      0.88        42

    accuracy                           0.64       525
   macro avg       0.64      0.59      0.60       525
weighted avg       0.63      0.64      0.61       525



The confusion matrix is also accessible:

In [9]:
landmapper.cm

array([[17,  8, 11,  1,  1,  0,  1,  0],
       [ 0, 79,  6,  5,  2,  0,  0,  0],
       [ 5,  7, 55,  4,  2,  1, 14,  0],
       [ 0, 12, 14, 26,  2,  0, 10,  1],
       [ 2, 26,  3,  4,  6,  0,  5,  0],
       [ 0,  0,  0,  0,  0, 30, 15,  4],
       [ 0,  4,  3,  5,  1,  7, 84,  0],
       [ 0,  0,  0,  0,  0,  5,  0, 37]])
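
The figures in the classification report can be recomputed from this matrix. The sketch below assumes the scikit-learn convention (rows are true classes, columns are predicted classes), which is consistent with the row sums matching the support column of the report.

```python
# Confusion matrix copied from the output above (rows: true, columns: predicted).
cm = [
    [17,  8, 11,  1,  1,  0,  1,  0],
    [ 0, 79,  6,  5,  2,  0,  0,  0],
    [ 5,  7, 55,  4,  2,  1, 14,  0],
    [ 0, 12, 14, 26,  2,  0, 10,  1],
    [ 2, 26,  3,  4,  6,  0,  5,  0],
    [ 0,  0,  0,  0,  0, 30, 15,  4],
    [ 0,  4,  3,  5,  1,  7, 84,  0],
    [ 0,  0,  0,  0,  0,  5,  0, 37],
]

n = len(cm)
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n))  # diagonal = correctly classified samples

overall_acc = correct / total
recall = [cm[i][i] / sum(cm[i]) for i in range(n)]                     # per true class
precision = [cm[i][i] / sum(row[i] for row in cm) for i in range(n)]   # per predicted class

print(f"{overall_acc * 100:.2f}%")                  # 63.62%
print(round(recall[0], 2), round(precision[0], 2))  # 0.44 0.71
```

The off-diagonal entries show where the errors concentrate: in the fifth row (class 324), 26 of 46 samples are predicted as class 312, which explains its low recall.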

## Predictions

Now we are ready to run the predictions. To do so, LandMapper should receive the following parameters:
* *dirs_layers*: a list of directory paths containing all the raster layers used in the training phase
* *fn_result*: the file path where the model output is written
* *data_type*: the GDAL data type of the output file
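
Conceptually, predicting a raster means treating every pixel as one sample whose features are the values of all layers at that location. The sketch below illustrates that idea with NumPy arrays standing in for the raster files; predict_raster and toy_predict are hypothetical names, and LandMapper's real reading, prediction and writing logic is not shown in this tutorial.

```python
import numpy as np

def predict_raster(layers, predict_fn, dtype=np.int16):
    """Apply a per-pixel classifier to a list of equally shaped 2-D layers."""
    rows, cols = layers[0].shape
    # One row per pixel, one column per layer: shape (rows * cols, n_layers).
    features = np.stack([layer.ravel() for layer in layers], axis=1)
    labels = predict_fn(features)
    # Back to the raster grid, cast to the output type (int16 mirrors GDT_Int16).
    return np.asarray(labels).reshape(rows, cols).astype(dtype)

# Toy data: two 2x3 layers standing in for rasters read from dirs_layers.
elevation = np.array([[100, 900, 2500], [300, 1800, 2700]])
nir = np.array([[80, 60, 20], [90, 40, 10]])

# Hypothetical stand-in for estimator.predict: class 335 above 2000 m, else 312.
def toy_predict(features):
    return np.where(features[:, 0] > 2000, 335, 312)

result = predict_raster([elevation, nir], toy_predict)
print(result)
```

The layer order in the stacked array must match the feature order used during training, which is presumably why all training layers have to be reachable through *dirs_layers*.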

First, let's predict only the year 2000:

In [12]:
dir_timeless_layers = os.path.join(data_dir, 'timeless') 
dir_2000_layers = os.path.join(data_dir, '2000')

dirs_layers = [dir_2000_layers, dir_timeless_layers]
fn_result = 'land_cover_2000.tif'
data_type = gdal.GDT_Int16

landmapper.predict(dirs_layers=dirs_layers, fn_result=fn_result, data_type=data_type)

[16:16:04] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2000/landsat_ard_spring_nir_p25.tif
[16:16:04] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2000/landsat_ard_spring_nir_p50.tif
[16:16:04] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2000/landsat_ard_winter_green_p75.tif
[16:16:04] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2000/landsat_ard_summer_green_p25.tif
[16:16:04] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2000/landsat_ard_summer_green_p50.tif
[16:16:04] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2000/landsat_ard_fall_nir_p50.tif
[16:16:04] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2000/landsat_ard_spring_green_p50.tif
[16:16:04] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerlan

To predict the other years, we call the same method with different *dirs_layers* and *fn_result* parameters. The loop below covers only 2001; widen the range to predict more years:

In [11]:
dir_timeless_layers = os.path.join(data_dir, 'timeless') 

for year in range(2001, 2002):
    dir_time_layers = os.path.join(data_dir, str(year))
    dirs_layers = [dir_time_layers, dir_timeless_layers]
    fn_result = f'land_cover_{year}.tif'
    
    print(f"Predicting the land-cover for {year} and saving the result in {fn_result}")
    landmapper.predict(dirs_layers=dirs_layers, fn_result=fn_result, data_type=data_type)

Predicting the land-cover for 2001 and saving the result in land_cover_2001.tif
[16:07:32] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2001/landsat_ard_spring_nir_p25.tif
[16:07:32] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2001/landsat_ard_spring_nir_p50.tif
[16:07:32] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2001/landsat_ard_winter_green_p75.tif
[16:07:32] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2001/landsat_ard_summer_green_p25.tif
[16:07:32] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2001/landsat_ard_summer_green_p50.tif
[16:07:32] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2001/landsat_ard_fall_nir_p50.tif
[16:07:32] Reading /home/jupyter/leandro/Code/eumap/demo/python/eumap_data/10636_switzerland/2001/landsat_ard_spring_green_p50.tif
[16:07:32] 
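
To extend the loop to a longer period, the per-year inputs can be prepared up front and inspected before any prediction runs. This is a sketch under assumptions: plan_predictions is a hypothetical helper, the path eumap_data/tile is a placeholder, and range(2000, 2021) is one reading of "the last two decades". Whether the dataset ships a folder for each of those years is not shown here.

```python
from pathlib import Path

def plan_predictions(data_dir, years, timeless_name="timeless"):
    """Yield (year, dirs_layers, fn_result) for each year to predict."""
    data_dir = Path(data_dir)
    dir_timeless = data_dir / timeless_name
    for year in years:
        # Yearly layers first, then the layers shared by all years.
        yield year, [data_dir / str(year), dir_timeless], f"land_cover_{year}.tif"

# Placeholder directory and an assumed year span.
jobs = list(plan_predictions("eumap_data/tile", range(2000, 2021)))
print(len(jobs), jobs[0][2], jobs[-1][2])  # 21 land_cover_2000.tif land_cover_2020.tif
```

Each tuple can then be passed to landmapper.predict as dirs_layers and fn_result, exactly as in the loop above.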