## RQ1 Data Preparation

[Add description]

Steps:
1. **Manually** download all datasets (5 forest definitions and Natura 2000) 
2. Filter Natura 2000 for areas in Germany
3. Mosaic data which comes in tiles (Hansen & JAXA)
4. Threshold & update (Hansen)
5. Extract layer from netcdf (ESA)
6. Reproject to the most common projection - EPSG 3035
7. Rasterise or Upsample (WITHOUT INTERPOLATION) to 5m 
8. Convert all datasets to FNF
9. ~~Clip data to Germany~~ (SKIPPED FOR NOW)

### Initial Setup

Used this for help with directory setup: 
https://www.freecodecamp.org/news/creating-a-directory-in-python-how-to-create-a-folder/
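
As an aside, the same directory setup can be sketched more compactly with `pathlib`, where `exist_ok=True` makes the call safe to re-run. The helper name `ensure_dirs` and its `base` parameter are hypothetical and not part of this notebook.

```python
from pathlib import Path

def ensure_dirs(paths, base="."):
    """Create each directory under `base` if it is missing; return the newly created ones."""
    created = []
    for rel in paths:
        folder = Path(base) / rel
        if not folder.exists():
            created.append(folder)
        # parents=True also creates missing parent folders;
        # exist_ok=True avoids an error when the folder already exists
        folder.mkdir(parents=True, exist_ok=True)
    return created
```

Calling `ensure_dirs(["rawdata", "processing", "outputs"])` would then replace the explicit `if`/`else` loop, at the cost of losing the per-folder messages.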

In [None]:
# 0: SETUP

# Import packages
import os
import warnings
import glob
import math
import subprocess

import geopandas as gpd
import rasterio
from rasterio.merge import merge
from rasterio.crs import CRS 
import xarray as xr 
import rioxarray as rio

#from osgeo import gdal 

# Create required directories if they don't already exist
# Note: these directories are ignored in git
path_list = ("./rawdata", "./processing", "./outputs")

for path in path_list:
  if not os.path.exists(path):
    os.mkdir(path)
    print("Folder %s created!" % path)
  else:
    print("Folder %s already exists" % path)

Folder ./rawdata already exists
Folder ./processing already exists
Folder ./outputs already exists


### Step 1: Manually download datasets

As several of the datasets require login credentials and are not available through an API, I decided to manually download all the required data. I have stored everything in the "rawdata" folder. This folder is set to be ignored by git because the files are too big to push onto the GitHub repo.

**So for the first step: manually download all datasets using the notes below and save to the "rawdata" folder.** 

Note: For the forest definition layers I have downloaded the 2018 datasets, as 2018 is the most recent year available across all datasets.

**1. UMD (Hansen) / Global Forest Watch**
- Download from: https://storage.googleapis.com/earthenginepartners-hansen/GFC-2023-v1.11/download.html
    - Using the map interface, download the treecover2000, gain & lossyear layers for the 4 granules with top-left corner at: (60N, 0E), (60N, 10E), (50N, 0E) and (50N, 10E). 
    - The files will be used in combination with each other to generate a dataset that corresponds (roughly) to forest cover in 2018.
- My info:
    - Download date: 15 Jan 2025
    - File: rawdata/Hansen_GFC-2023-v1.11 (folder contains 12 tifs - 4 each for cover, gain and loss)

**2. ESA Land Cover**
- Download from: https://cds.climate.copernicus.eu/datasets/satellite-land-cover?tab=download 
    - Login credentials required (there is a prompt in the page to sign up/login).
    - Select 2018 map and v2.1.1.
    - Only download the sub-region for ~Germany bounding box (N:56, W:1, E:16, S:46).
- My Info:
    - Download date: 14 Jan 2025
    - File: rawdata/C3S-LC-L4-LCCS-Map-300m-P1Y-2018-v2.1.1.area-subset.56.1.46.16.nc

**3. JAXA FNF** 
- Download from: https://www.eorc.jaxa.jp/ALOS/en/palsar_fnf/data/index.htm
    - Login credentials required (To register, go here: https://www.eorc.jaxa.jp/ALOS/en/palsar_fnf/registration.htm).
    - Under the heading "PALSAR/PALSAR-2 mosaic and forest/non-forest (FNF) map", select the 2018 data.
    - Use the map interface to click through until you can download tiles. I opted to download four 5 x 5 tiles using the link above the map (for N55E005, N55E010, N50E005, N50E010), but also had to supplement with some individual tiles. I used QGIS to sort through the tiles and figure out which ones were needed. In total, 73 tiles are needed for the Germany Natura areas - a list of the required tile names is available in: other/jaxa_tile_list.txt
- My Info:
    - Download date: 15 Jan 2025
    - File: rawdata/jaxa_2018_fnf_ger (folder contains 73 tifs)  

**4. CORINE Land Use** 
- Download from: https://land.copernicus.eu/en/products/corine-land-cover/clc2018#download
    - Login credentials required (there is a prompt in the page to sign up/login).
    - Click on “Go to download by area”. Then select the CORINE land cover 2018 layer, use the area selection tool to click on Germany and then click on the download icon beside the layer name. 
    - From the cart, select the dataset and choose "vector" and "shapefile". I opted for vector so that I can rasterise at a common resolution that makes sense with the other data. Click the "Process Download Request" button.
    - NOTE: At this point the download request enters a queue which can take a long time. When it is ready to download, an email will be sent so you don't have to keep checking it.
- My Info:
    - Download date: 16 Jan 2025 (request date - ready for download on 18 Jan 2025)
    - File: U2018_CLC2018_V2020_20u1.zip  (contains: 1 shp & its components)

**5. German Land Use**
- Download from: https://gdz.bkg.bund.de/index.php/default/corine-land-cover-5-ha-stand-2018-clc5-2018.html  
    - Click on the “Direktdownload” tab, and then click on "Georeferenzierung: UTM32s, Format: Shape (ZIP, 1,24 GB)". This will download 5 shapefiles which represent the 5 main land cover classes (also used in CORINE) - individual features within these files have their more precise class as an attribute. Class 3 contains the classes related to forests and natural features, but Class 4 (marshlands, etc) is also required for producing the FAO map. So these 2 are retained for now and Class 1 (urban), Class 2 (agriculture) and Class 5 (water) are discarded.
- My Info:
    - Download date: 14 Jan 2025
    - Files: rawdata/clc5_class3xx.zip, rawdata/clc5_class4xx.zip (each contains: 1 shp & its components)

**6. Natura 2000 protected areas**
- Download from: https://www.eea.europa.eu/en/datahub/datahubitem-view/6fc8ad2d-195d-40f4-bdec-576e7d1268e4
    - Download the most recent date available (in my case: 2022 - direct link: https://sdi.eea.europa.eu/data/95e717d4-81dc-415d-a8f0-fecdf7e686b0).
- My Info:
    - Download date: 15 Jan 2025
    - File: rawdata/Natura2000_end2022_epsg3035.zip (contains: 1 shp & its components)

### Step 2: Filter Natura 2000 

Use the attributes of the Natura shapefile to filter the "MS" field (i.e. "Member States") to only include "DE" (i.e. Germany). 

I also save the results as a shapefile in the outputs folder, as this may be useful for visualisations at the end.

In [2]:
# 2: FILTER NATURA2000

# Load the Natura 2000 shapefile as a geodataframe
# You can do this directly from the zipped file
natura_gdf = gpd.read_file("./rawdata/Natura2000_end2022_epsg3035.zip")

#print(natura_gdf[1:20])

# Extract only the German areas
natura_de_gdf = natura_gdf.loc[natura_gdf["MS"] == "DE"]

# Check - there should be 5200 areas
print(len(natura_de_gdf))

# Save the file to outputs folder (warnings are turned off because of a warning about a datetime column)
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    natura_de_gdf.to_file('./outputs/natura2000_3035_DE.shp')



### Step 3: Mosaic tiled data

This step applies to the datasets that come in tiles: JAXA and Hansen.

Help with mosaicing using rasterio: https://automating-gis-processes.github.io/CSC18/lessons/L6/raster-mosaic.html

In [3]:
# 3: MOSAIC TILES

# Store paths for tiles in list
jaxa_paths = glob.glob('./rawdata/jaxa_2018_fnf_ger/*.tif')
hansen_cover_paths = glob.glob('./rawdata/Hansen_GFC-2023-v1.11/Hansen_GFC-2023-v1.11_tree*.tif')
hansen_gain_paths = glob.glob('./rawdata/Hansen_GFC-2023-v1.11/Hansen_GFC-2023-v1.11_gain*.tif')
hansen_loss_paths = glob.glob('./rawdata/Hansen_GFC-2023-v1.11/Hansen_GFC-2023-v1.11_loss*.tif')

# Store the path for the output mosaics
jaxa_mosaic = "./processing/jaxa_FNF_4326_DE.tif"
hansen_cover_mosaic = "./processing/hansen_treecover2000_4326_DE.tif"
hansen_gain_mosaic = "./processing/hansen_gain_4326_DE.tif"
hansen_loss_mosaic = "./processing/hansen_lossyear_4326_DE.tif"

# Create a function which mosaics the tiles
def mosaic_rasters(input_paths, output_path):
    # Open every tile and store the opened datasets in a list
    tiles_to_mosaic = [rasterio.open(path) for path in input_paths]
    # Create the mosaic and store the transform information
    mosaic, transform = merge(tiles_to_mosaic)
    # Copy the metadata for the mosaic from the first tile
    mosaic_meta = tiles_to_mosaic[0].meta.copy()
    # Update the metadata with the new information for the mosaic
    # (crs is not included, as it is copied from the tiles)
    mosaic_meta.update({"driver": "GTiff",
                        "height": mosaic.shape[1],
                        "width": mosaic.shape[2],
                        "transform": transform})
    # Close the tiles now that the mosaic is held in memory
    for tile in tiles_to_mosaic:
        tile.close()
    # Write the mosaic with its metadata to a tif file
    with rasterio.open(output_path, "w", **mosaic_meta, compress="LZW") as dest:
        dest.write(mosaic)

# Run the function to mosaic the tiles
# If mosaic already exists, make sure it's not open in QGIS :) you'll get permission error if so!
mosaic_rasters(jaxa_paths, jaxa_mosaic)
mosaic_rasters(hansen_cover_paths, hansen_cover_mosaic)
mosaic_rasters(hansen_gain_paths, hansen_gain_mosaic)
mosaic_rasters(hansen_loss_paths, hansen_loss_mosaic)

### Step 4: Threshold & update for gain/loss 

This step applies to the Hansen data only. Planned approach:

- Use a canopy cover threshold of more than 60%. This provides a good range with the other datasets (which use lower thresholds), and it is also the threshold used by the International Geosphere-Biosphere Programme (IGBP) definition.
- Reclassify the treecover2000 values so that 61-100 are forest and everything else is non-forest.
- Reclassify lossyear so that values 1-18 (i.e. loss in 2001-2018) become 1 and everything else becomes 0.
- Subtract the loss from, and add the gain to, the thresholded tree cover layer.
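
The Hansen logic described above could be sketched as follows on small in-memory arrays. This is an illustration under assumptions, not the final implementation: the function name `hansen_fnf_2018` is hypothetical, the toy arrays stand in for the mosaicked treecover2000, gain and lossyear rasters, and the handling of pixels that have both gain and loss (here: add/subtract, then clip to 0-1) is a choice that still needs to be made.

```python
import numpy as np

def hansen_fnf_2018(treecover, gain, lossyear, threshold=60, last_loss_year=18):
    """Approximate forest (1) / non-forest (0) for 2018 from the three Hansen layers."""
    # Forest in 2000: canopy cover strictly above the threshold (61-100 for threshold=60)
    forest_2000 = (treecover > threshold).astype(np.int8)
    # Loss up to 2018: lossyear values 1-18 become 1, everything else 0
    loss = ((lossyear >= 1) & (lossyear <= last_loss_year)).astype(np.int8)
    # Gain is treated as a binary layer (1 = gain)
    gain = (gain == 1).astype(np.int8)
    # Subtract loss, add gain, and clip back to the 0-1 range (assumed rule)
    return np.clip(forest_2000 - loss + gain, 0, 1)

# Toy 1 x 5 raster: dense forest, forest lost in 2005, gain on open land,
# open land, and forest lost after 2018 (lossyear 20)
treecover = np.array([[80, 80, 30, 30, 61]], dtype=np.uint8)
lossyear = np.array([[0, 5, 0, 0, 20]], dtype=np.uint8)
gain = np.array([[0, 0, 1, 0, 0]], dtype=np.uint8)

fnf = hansen_fnf_2018(treecover, gain, lossyear)
print(fnf)
```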

### Step 5: Extract layer from netcdf

This is required for the ESA dataset only. The netcdf file includes several different layers of information - the one I want to use is called "lccs_class". The aim here is to simply extract the single layer and save it as a geotiff.

Help with saving netcdf layer as geotiff: https://help.marine.copernicus.eu/en/articles/5029956-how-to-convert-netcdf-to-geotiff 

In [8]:
# 5: EXTRACT & CONVERT NETCDF DATA

# Open the ESA netcdf for conversion
esa_netcdf = xr.open_dataset("./rawdata/C3S-LC-L4-LCCS-Map-300m-P1Y-2018-v2.1.1.area-subset.56.1.46.16.nc", engine="netcdf4")
#esa_netcdf

# Extract the lccs_class variable
esa_lccs_class = esa_netcdf['lccs_class']

# Provide spatial axis & define the CRS
esa_lccs_class = esa_lccs_class.rio.set_spatial_dims(x_dim='lon', y_dim='lat')
esa_lccs_class.rio.write_crs("EPSG:4326", inplace=True)

# Save the geotiff
esa_lccs_class.rio.to_raster("./processing/esa_lccs_class_4326_DE.tif")

### Step 6: Reproject

The data need to be in a projected CRS so that areas can be calculated in units of metres. Three datasets are not in a projected CRS (they are WGS 1984 / EPSG:4326). The most common projection is ETRS89-extended / LAEA Europe (EPSG:3035), which is used by the Natura 2000 areas and the CORINE dataset.

So for this step, all datasets that are not already in this projection will be (re)projected to EPSG:3035.

Help with reprojecting using rioxarray: https://www.earthdatascience.org/courses/use-data-open-source-python/intro-raster-data-python/raster-data-processing/reproject-raster/

In [9]:
# 6: REPROJECT RASTERS

# A quick function for reprojecting individual rasters to EPSG:3035
def reproject_raster_3035(input_path, output_path):
    # Open the raster (NOTE: need to use rio.open_rasterio() here!)
    src = rio.open_rasterio(input_path)
    # Run the reprojection (rioxarray resamples with nearest neighbour by default,
    # so the class values are kept as they are)
    reprojected = src.rio.reproject("EPSG:3035")
    # Write the reprojected output raster
    reprojected.rio.to_raster(output_path, compress="LZW")

# Run this per file 
reproject_raster_3035("./processing/jaxa_FNF_4326_DE.tif", "./processing/jaxa_FNF_3035_DE.tif")
reproject_raster_3035("./processing/esa_lccs_class_4326_DE.tif", "./processing/esa_lccs_class_3035_DE.tif")

# TO DO: HANSEN?

In [10]:
# 6: REPROJECT SHPS

# Store paths for shp zips in list
ger_lulc_paths = glob.glob('./rawdata/clc5_class*.zip')

# Create a function which reprojects the shp to 3035 (and saves to processing folder)
def reproj_shp_3035(input_paths):
    # Iterate through the shp paths 
    for path in input_paths:
        # Open the shp for each path (excludes extra cols as they cause problems & are not needed)
        shp = gpd.read_file(path, columns = ["CLC18"])
        
        # Reprojects to 3035
        shp_3035  = shp.to_crs("EPSG:3035")

        # For output file naming: extract the input file name (with extension)
        name_w_ext = os.path.split(path)[1] 
        # For output file naming: remove extension from input file name 
        name_wo_ext = os.path.splitext(name_w_ext)[0]
        # For output file naming: create the new name for reprojected shp
        new_name = name_wo_ext + "_3035_DE.shp"

        # Write the reprojected shp to the processing folder
        shp_3035.to_file('./processing/' + new_name)

# Run the function for the German LULC zipped shps
reproj_shp_3035(ger_lulc_paths)


In [5]:
# 6: CLEAN UP

# Create a list of the data paths for deletion (files with "4326" in their name)
old_data =  glob.glob('./processing/*4326*')

# Create a function which deletes the input paths
def clean_up(input_paths):
    # Report when there is nothing to delete
    if not input_paths:
        print("Nothing to clean!")
    for path in input_paths:
        # Check that the path still exists before deleting it
        if os.path.exists(path):
            os.remove(path)

# Run the function to remove any data with "4326" in the file name
clean_up(old_data)

### Step 7: Rasterise or Upsample

In this step, I rasterise the vector files (German LULC Class 3 and CORINE) and upsample the existing rasters to 5m.

5m was selected because it is an approximate common divisor of the pixel sizes of all datasets: each original pixel can be split into (approximately) a whole number of 5m pixels, so there is as little transformation as possible. It also means that a lot of the detail of the shapefiles can be retained during rasterisation.

Importantly, the upsampling needs to happen WITHOUT INTERPOLATION so that no "new" information is created.
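
To illustrate what upsampling without interpolation means, the sketch below splits every pixel of a toy array into a block of identical smaller pixels with NumPy. It only demonstrates the principle (the notebook itself uses GDAL via a batch script); the function name `upsample_nearest`, the toy array and the factor of 5 are made up for the example.

```python
import numpy as np

def upsample_nearest(array, factor):
    """Split each pixel into factor x factor identical pixels (no interpolation)."""
    # Repeat every row, then every column; values are copied, never averaged
    return np.repeat(np.repeat(array, factor, axis=0), factor, axis=1)

# Toy 2 x 2 forest/non-forest raster
coarse = np.array([[1, 0],
                   [0, 1]], dtype=np.uint8)

fine = upsample_nearest(coarse, 5)
print(fine.shape)
print(np.unique(fine))
```

The upsampled array contains exactly the same set of values as the input. With bilinear or cubic resampling, intermediate values would appear along class boundaries, which is meaningless for categorical data.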

In [26]:
# 7: RASTERISE VECTORS (WITH ATTRIBUTE VALUE)
# This is only required for GER LULC Class3 and CORINE

# First some prep is needed for GER LULC Class 3
# Read in shp
ger_lulc_class3xx = gpd.read_file("./processing/clc5_class3xx_3035_DE.shp")

# Convert the CLC18 column to integer (stored as string in original file)
ger_lulc_class3xx['CLC18'] = ger_lulc_class3xx['CLC18'].astype('int')

# Save the output
ger_lulc_class3xx.to_file('./processing/clc5_class3xx_3035_DE_int.shp')

# And we need to unzip the CORINE data
# Read in shp, Code_18 column only 
corine_shp = gpd.read_file("./rawdata/U2018_CLC2018_V2020_20u1.zip", columns = ["Code_18"])

# Save the output
corine_shp.to_file('./processing/U2018_CLC2018_V2020_DE.shp')



In [3]:
# 7: RASTERISE VECTORS (WITH ATTRIBUTE VALUE) - CONTINUED

# TAKES ABOUT 35 MIN TO RUN
# Runs batch script which runs gdal_rasterize for GER LULC Class 3 & CORINE - outputs 5m tifs
rasterise_5m = subprocess.run(["rq1_step1_sub1_rasterise.bat"], 
                                    capture_output=True, 
                                    text=True)

print(rasterise_5m.stdout)
print(rasterise_5m.stderr)

0...10...20...30...40...50...60...70...80...90...100 - done.
Input file size is 127980, 173478




In [2]:
# 7: UPSAMPLE RASTERS 

# TAKES ABOUT 30 MIN TO RUN
# Runs batch script which runs gdal_translate to resample rasters (ESA & JAXA) to 5m - outputs tifs
upsample_5m = subprocess.run(["rq1_step1_sub2_upsample.bat"], 
                             capture_output=True, 
                             text=True)

print(upsample_5m.stdout)
print(upsample_5m.stderr)

Input file size is 4942, 4884
0...10...20...30...40...50...60...70...80...90...100 - done.
Input file size is 42602, 51782
0...10...20...30...40...50...60...70...80...90...100 - done.




### Step 8: Convert all datasets to FNF

To prepare the datasets for turning into a presence/absence consensus map, and also to prepare the JAXA and GER LULC maps for use in the workflow to create the FAO-aligned map, they need to be converted to Forest-Nonforest maps - i.e. maps where there are only two classes, 0 = Nonforest and 1 = Forest.

As each dataset has its own set of classes, this conversion needs to be customised for each map.

Help with reclassifying (use gdal_calc): https://gis.stackexchange.com/questions/245170/reclassifying-raster-using-gdal 
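
Because each reclassification below calls `gdal_calc.py` with the same set of options, the command could be assembled by a small helper. This is only a sketch: the function name `build_gdal_calc_cmd` is hypothetical, the script path in the example is a placeholder, and only options that already appear in the cells below are used.

```python
def build_gdal_calc_cmd(gdal_calc, input_path, output_path, calc, nodata=-9999):
    """Return the argument list for subprocess.run(); nothing is executed here."""
    return [
        "python", gdal_calc,
        "-A", input_path,
        f"--outfile={output_path}",
        f"--calc={calc}",
        "--co=COMPRESS=LZW",
        "--co=BIGTIFF=YES",
        f"--NoDataValue={nodata}",
    ]

cmd = build_gdal_calc_cmd(
    "gdal_calc.py",  # placeholder: use the real path to gdal_calc.py
    "./processing/jaxa_FNF_3035_DE_5m.tif",
    "./processing/jaxa_FNF_3035_DE_5m_reclass.tif",
    "-9999*(A==0)+1*(A==1)+1*(A==2)+0*(A>=3)",
)
print(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` in the same way as the hand-written lists.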

#### Step 8.1: JAXA Reclassify

For JAXA, the reclassification to true FNF is as follows:

| Original Value | Original Label               | New Value | New Label  |
| -------------- | ---------------------------- | --------- | ---------- |
| 0              | NoData                       | -9999     | NoData     |
| 1              | Forest (>90% canopy cover)   | 1         | Forest     |
| 2              | Forest (10-90% canopy cover) | 1         | Forest     |
| 3              | Non-Forest                   | 0         | Non-Forest |
| 4              | Water                        | 0         | Non-Forest |

Both forest categories are converted to forest here as this fits with the FAO canopy cover thresholds. 
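
The `gdal_calc.py` expression for this table relies on boolean arithmetic: each comparison yields 0 or 1, which is multiplied by the new value, and the terms are summed. The same idea can be checked on a toy NumPy array (a sketch for illustration only; the array values are invented, and `A` mirrors the variable name used by `gdal_calc.py`).

```python
import numpy as np

# One pixel for each original JAXA class: NoData, two forest classes, non-forest, water
A = np.array([0, 1, 2, 3, 4], dtype=np.int16)

# Each comparison is a 0/1 mask; exactly one mask is "on" for every pixel
reclass = -9999 * (A == 0) + 1 * (A == 1) + 1 * (A == 2) + 0 * (A >= 3)
print(reclass)
```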

In [None]:
# 8.1: RECLASSIFY JAXA 

# Store path to 5m Jaxa dataset as the input file
jaxa_input = "./processing/jaxa_FNF_3035_DE_5m.tif"

# Store the path to where gdal_calc.py is (for some reason this is different than where the .exe scripts are)
gdal_calc = "./thesis_env_conda/Lib/site-packages/GDAL-3.10.1-py3.12-win-amd64.egg-info/scripts/gdal_calc.py"

# TAKES ABOUT 30 MIN TO RUN
# Runs gdal_calc.py in order to reclassify JAXA 5m raster 
reclass_jaxa = subprocess.run(['python', gdal_calc, '-A', jaxa_input, '--outfile=./processing/jaxa_FNF_3035_DE_5m_reclass.tif', '--calc=-9999*(A==0)+1*(A==1)+1*(A==2)+0*(A>=3)', '--co=COMPRESS=LZW', '--co=BIGTIFF=YES', '--NoDataValue=-9999'],
                              capture_output=True, 
                              text=True)

print(reclass_jaxa.stdout)
print(reclass_jaxa.stderr)

0...10...20...30...40...50...60...70...80...90...100 - done.




#### Step 8.2: CORINE Reclassify

For CORINE, the reclassification to FNF is as follows:

| Original Value | Original Label                 | New Value | New Label  |
| -------------- | ------------------------------ | --------- | ---------- |
| 0              | NoData                         | -9999     | NoData     |
| 1xx            | Urban classes                  | 0         | Non-Forest |
| 2xx            | Agricultural classes           | 0         | Non-Forest |
| 311            | Broad-leaved forest            | 1         | Forest     |
| 312            | Coniferous forest              | 1         | Forest     |
| 313            | Mixed forest                   | 1         | Forest     |
| 321            | Natural grasslands             | 0         | Non-Forest |
| 322            | Moors and heathland            | 0         | Non-Forest |
| 323            | Sclerophyllous vegetation      | 1         | Forest     |
| 324            | Transitional woodland-shrub    | 1         | Forest     |
| 331            | Beaches - dunes - sands        | 0         | Non-Forest |
| 332            | Bare rocks                     | 0         | Non-Forest |
| 333            | Sparsely vegetated areas       | 0         | Non-Forest |
| 334            | Burnt areas                    | 0         | Non-Forest |
| 335            | Glaciers and perpetual snow    | 0         | Non-Forest |
| 4xx            | Marsh, bog, intertidal classes | 0         | Non-Forest |
| 5xx            | Water body classes             | 0         | Non-Forest |
| >=600          | NoData                         | -9999     | NoData     |

More simply: classes below 311 = Non-Forest; classes 311-313 and 323-324 = Forest; classes 321-322 and 325-599 = Non-Forest; 0 and classes >=600 = NoData.

This is based on *Natura 2000 and forests. Part I-II* (European Commission, 2015) which describes how forest area calculations were performed with data from CORINE with "CLC classes grouped as forests: 311 Broad-leaf forests; 312 Coniferous forests; 313 Mixed forests; 323 Sclerophyllous vegetation; 324 Transitional woodland-shrub." 

Note: the 323 class (Sclerophyllous vegetation) is not present in the Germany dataset.

Whether class 324 (Transitional woodland-shrub) should be included is somewhat uncertain. It is included in the definition above, and in the *State of nature in the EU...* report (EEA, 2020) forest area tends to be reported as "Forests and transitional woodland shrubbodies", so the two are grouped together there as well.

For now, I will include class 324 in the definition of forest for CORINE. 
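
As a cross-check of the table above, the class grouping can be written as a small pure-Python function. This is a sketch, not the raster workflow itself: the names `FOREST_CODES` and `corine_to_fnf` are hypothetical, and the forest codes are the five classes quoted from the European Commission (2015) report.

```python
NODATA = -9999
# CLC classes grouped as forest (European Commission, 2015)
FOREST_CODES = {311, 312, 313, 323, 324}

def corine_to_fnf(code):
    """Map a CORINE class code to 1 (Forest), 0 (Non-Forest) or NODATA."""
    # 0 and anything from 600 upwards are treated as NoData
    if code == 0 or code >= 600:
        return NODATA
    return 1 if code in FOREST_CODES else 0

for code in (112, 311, 321, 322, 324, 331, 512, 999):
    print(code, corine_to_fnf(code))
```

Note that 321 and 322 fall numerically between the forest classes, so a single range such as 311-324 would wrongly include them.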


In [None]:
# 8.2: RECLASSIFY CORINE

# Store path to 5m Corine dataset as the input file
corine_input = "./processing/U2018_CLC2018_V2020_3035_DE_5m.tif"

# Store the path to where gdal_calc.py is
gdal_calc = "./thesis_env_conda/Lib/site-packages/GDAL-3.10.1-py3.12-win-amd64.egg-info/scripts/gdal_calc.py"

# TAKES ABOUT 10 MIN TO RUN
# Runs gdal_calc.py in order to reclassify CORINE 5m raster 
reclass_corine = subprocess.run(['python', gdal_calc, '-A', corine_input, '--outfile=./processing/U2018_CLC2018_V2020_3035_DE_5m_reclass.tif', '--calc=-9999*(A==0)+0*(A<=310)+1*((A>=311)*(A<=313))+0*((A>=321)*(A<=322))+1*((A>=323)*(A<=324))+0*((A>=325)*(A<=599))+-9999*(A>=600)', '--co=COMPRESS=LZW', '--co=BIGTIFF=YES', '--NoDataValue=-9999'],
                              capture_output=True, 
                              text=True)

print(reclass_corine.stdout)
print(reclass_corine.stderr)


0...10...20...30...40...50...60...70...80...90...100 - done.




#### Step 8.3: GER LULC Class 3 Reclassify

The GER LULC class conversion is essentially the same as the CORINE one. I am applying the same definition of forest to both, but this dataset is a different operationalisation of that definition (and the way the classes are applied also appears to differ). Because the main classes (1xx, 2xx, 3xx, 4xx and 5xx) come in separate shapefiles, the reclassification is simpler, since I am only dealing with Class 3 for forests.

For GER LULC Class 3, the reclassification to FNF is as follows:

| Original Value | Original Label                 | New Value | New Label  |
| -------------- | ------------------------------ | --------- | ---------- |
| 0              | NoData                         | 0         | Non-Forest |
| 311            | Broad-leaved forest            | 1         | Forest     |
| 312            | Coniferous forest              | 1         | Forest     |
| 313            | Mixed forest                   | 1         | Forest     |
| 321            | Natural grasslands             | 0         | Non-Forest |
| 322            | Moors and heathland            | 0         | Non-Forest |
| 323            | Sclerophyllous vegetation      | 1         | Forest     |
| 324            | Transitional woodland-shrub    | 1         | Forest     |
| 331            | Beaches - dunes - sands        | 0         | Non-Forest |
| 332            | Bare rocks                     | 0         | Non-Forest |
| 333            | Sparsely vegetated areas       | 0         | Non-Forest |
| 334            | Burnt areas                    | 0         | Non-Forest |
| 335            | Glaciers and perpetual snow    | 0         | Non-Forest |
| >=336          | NoData                         | -9999     | NoData     |

More simply: classes <=310 = Non-Forest; classes 311-313 and 323-324 = Forest; classes 321-322 and 325-335 = Non-Forest; classes >=336 = NoData.

See the CORINE reclassification for more explanation of the class conversion.

In [3]:
# 8.3: RECLASSIFY GER LULC Class 3

# Store path to 5m GER LULC Class 3 as the input file
ger_lulc3_input = "./processing/clc5_class3xx_3035_DE_5m.tif"

# Store the path to where gdal_calc.py is
gdal_calc = "./thesis_env_conda/Lib/site-packages/GDAL-3.10.1-py3.12-win-amd64.egg-info/scripts/gdal_calc.py"

# TAKES ABOUT 15 MIN TO RUN
# Runs gdal_calc.py in order to reclassify GER LULC Class 3 5m raster 
reclass_ger_lulc3 = subprocess.run(['python', gdal_calc, '-A', ger_lulc3_input, '--outfile=./processing/clc5_class3xx_3035_DE_5m_reclass.tif', '--calc=0*(A<=310)+1*((A>=311)*(A<=313))+0*((A>=321)*(A<=322))+1*((A>=323)*(A<=324))+0*((A>=325)*(A<=335))+-9999*(A>=336)', '--co=COMPRESS=LZW', '--co=BIGTIFF=YES', '--NoDataValue=-9999'],
                              capture_output=True, 
                              text=True)

print(reclass_ger_lulc3.stdout)
print(reclass_ger_lulc3.stderr)


0...10...20...30...40...50...60...70...80...90...100 - done.




#### Step 8.4: ESA Reclassify

For ESA, the reclassification to FNF is as follows:

| Original Value          | Original Label                 | New Value | New Label  |
| ----------------------- | ------------------------------ | --------- | ---------- |
| 0                       | NoData                         | -9999     | NoData     |
| 10, 11, 12, 20, 30, 40  | Agriculture classes            | 0         | Non-Forest |
| 50                      | Broadleaf evergreen            | 1         | Forest     |
| 60, 61, 62              | Broadleaf deciduous            | 1         | Forest     |
| 70, 71, 72              | Needleleaf evergreen           | 1         | Forest     |
| 80, 81, 82              | Needleleaf deciduous           | 1         | Forest     |
| 90                      | Mixed (broad & needle leaf)    | 1         | Forest     |
| 100                     | Mosaic tree & shrub / herb.    | 1         | Forest     |
| 110                     | Mosaic herb. / tree & shrub    | 0         | Non-Forest |
| 120, 121, 122           | Shrubland                      | 0         | Non-Forest |
| 130                     | Grassland                      | 0         | Non-Forest |
| 140, 150, 151, 152, 153 | Sparse vegetation              | 0         | Non-Forest |
| 160, 170                | Tree cover, flooded            | 1         | Forest     |
| 180                     | Wetland                        | 0         | Non-Forest |
| 190                     | Urban                          | 0         | Non-Forest |
| 200, 201, 202           | Bare Areas                     | 0         | Non-Forest |
| 210, 220                | Water / Permanent Snow & Ice   | 0         | Non-Forest |


This is based on how the producers of the data align their ESA classes with the IPCC land categories - see page 30 of the *Land Cover CCI Product User Guide*.

Note that classes 160 and 170 (flooded tree cover, which includes mangroves, counted as forest) are not present in the Germany dataset.
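
One way to make sure a range-based expression matches the table above is to evaluate it for every class code and compare the result with the explicit list of forest classes. The sketch below does this in plain Python; the names `ESA_FOREST`, `ESA_CODES` and `esa_range_rule` are hypothetical, and the rule mirrors the ranges used in the `gdal_calc.py` call for this step.

```python
# Forest classes according to the table above
ESA_FOREST = {50, 60, 61, 62, 70, 71, 72, 80, 81, 82, 90, 100, 160, 170}

# Every class code listed in the table above
ESA_CODES = [0, 10, 11, 12, 20, 30, 40, 50, 60, 61, 62, 70, 71, 72,
             80, 81, 82, 90, 100, 110, 120, 121, 122, 130, 140, 150,
             151, 152, 153, 160, 170, 180, 190, 200, 201, 202, 210, 220]

def esa_range_rule(code):
    """Range-based rule: 50-100 and 160-170 are forest, 0 is NoData, the rest non-forest."""
    if code == 0:
        return -9999
    if 50 <= code <= 100 or 160 <= code <= 170:
        return 1
    return 0

# List any class code where the range rule disagrees with the table
mismatches = [c for c in ESA_CODES
              if c != 0 and esa_range_rule(c) != int(c in ESA_FOREST)]
print(mismatches)
```

An empty list means that the ranges and the table agree for every listed class.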

In [4]:
# 8.4: RECLASSIFY ESA

# Store path to 5m ESA as the input file
esa_input = "./processing/esa_lccs_class_3035_DE_5m.tif"

# Store the path to where gdal_calc.py is 
gdal_calc = "./thesis_env_conda/Lib/site-packages/GDAL-3.10.1-py3.12-win-amd64.egg-info/scripts/gdal_calc.py"

# TAKES ABOUT 60 MIN TO RUN 
# Runs gdal_calc.py in order to reclassify ESA 5m raster 
reclass_esa = subprocess.run(['python', gdal_calc, '-A', esa_input, '--outfile=./processing/esa_lccs_class_3035_DE_5m_reclass.tif', '--calc=-9999*(A==0)+0*((A>=10)*(A<=40))+1*((A>=50)*(A<=100))+0*((A>=110)*(A<=153))+1*((A>=160)*(A<=170))+0*(A>=180)', '--co=COMPRESS=LZW', '--co=BIGTIFF=YES', '--NoDataValue=-9999'],
                              capture_output=True, 
                              text=True)

print(reclass_esa.stdout)
print(reclass_esa.stderr)

0...10...20...30...40...50...60...70...80...90...100 - done.




#### Step 8.5: Hansen Reclassify

This might not be needed (depending on if I use Hansen and also whether I already end up with a FNF for Hansen from Step 4)

### Step 9: Clip to Germany

All the rasters created so far can now be clipped to the area of interest. I originally intended to clip the rasters to the German Natura2000 areas; however, this proved to be too slow and created extremely large outputs. I have therefore adjusted my plans to clip to Germany instead - just to eliminate unneeded data from the larger mosaicked rasters and to ensure that there is a common extent.

Note that I decided to clip to the footprint of the CORINE data. This does mean that some data on the edges are lost from the GER LULC output. However, it was relatively little data and it makes sense to clip to a common region so that I am comparing the same areas across maps. 


IMPORTANT: This now outputs VRTs for each input dataset, but they are very large and I haven't been able to view them properly in QGIS. For now I will NOT work with these clipped versions and plan to try the following at later stages:
- use zonal statistics to extract pixel counts (which can then be summed and multiplied to convert to m2) for the Natura 2000 areas
- if I want to have some outputs for Germany as a whole, I could try clipping with the clipper.shp (generated in the batch script called below) at a later stage - i.e. when the maps have been converted to FNF, and maybe therefore be a bit smaller. 
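
The conversion from pixel counts to area mentioned in the first option is simple arithmetic: at 5m resolution each pixel covers 25 m². A minimal sketch (the function name `pixel_count_to_area` and the example count are made up for illustration):

```python
PIXEL_SIZE_M = 5  # raster resolution in metres

def pixel_count_to_area(n_pixels, pixel_size_m=PIXEL_SIZE_M):
    """Convert a pixel count to area in square metres and in hectares."""
    area_m2 = n_pixels * pixel_size_m ** 2
    # 1 hectare = 10,000 m2
    return area_m2, area_m2 / 10_000

# Example: 4,000 forest pixels counted inside one protected area
area_m2, area_ha = pixel_count_to_area(4_000)
print(area_m2, area_ha)
```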

In [None]:
# 9: CLIP RASTERS

# Runs batch script which runs gdalwarp for all vrt files ending in "_5m" in the processing folder - outputs a vrt for each
clip_to_DE = subprocess.run(["rq1_step1_sub3_clip.bat"], 
                             capture_output=True, 
                             text=True)

print(clip_to_DE.stdout)
print(clip_to_DE.stderr)