## RQ1 Forest Definition Comparisons 

[Add description]

Steps
1. Analyse & Compare FNF maps for Natura 2000 areas 
    1. Generate zonal statistics
    2. Format into geodataframes
    3. Calculations
    4. Visualisations
2. Create *Forest Presence/Absence Consensus Map* (Sexton et al 2016 approach)
3. Analyse Consensus Map for Natura 2000 areas
    1. Generate zonal statistics
    2. Format into geodataframe
    3. Calculations
    4. Visualisations

4. Germany analysis?
**NOTE: for now I'm only doing statistics and comparisons for the Natura 2000 areas, but there is data for all of Germany - so I could do some similar calculations and comparisons for all of Germany. Rather than using zonal statistics, here I would just need to calculate the count and sum for each whole FNF raster. Alternatively, I could do zonal statistics using the CORINE footprint area.**


In [91]:
# SETUP

# Note: this .ipynb file depends on files & folder structures created in rq1_step1_data_prep.ipynb 
# and rq1_step2_fao_forest.ipynb

# Import packages
import subprocess
import pickle
import glob
import json
import os
import pandas as pd

from rasterstats import zonal_stats
import geopandas as gpd
from shapely.geometry import shape

# Store gdal.py paths
gdal_calc = "./thesis_env_conda/Lib/site-packages/GDAL-3.10.1-py3.12-win-amd64.egg-info/scripts/gdal_calc.py"

### Step 1: Analyse & Compare FNF Maps for Natura 2000 areas

Generally, I will calculate the count (total pixels) and sum (total forest pixels) for the Natura 2000 areas for each FNF map. I will then perform additional calculations to generate results across all areas, and compare results across the different FNF maps.

#### Step 1.1: Generate zonal statistics

Calculate zonal statistics (sum and count) per Natura 2000 geometry for each FNF map.

Then wrap each result in a GeoJSON feature collection.

In [None]:
# 1.1: FNF NATURA ZONAL STATS 
# EACH OUTPUT TAKES ABOUT 30 MIN (3 HOURS TOTAL)

# Store paths to the FNF maps
esa_FNF = "./outputs/esa_lccs_class_3035_DE_5m_2018_FNF.tif"
jaxa_FNF = "./outputs/jaxa_FNF_3035_DE_5m_2018_FNF.tif"
corine_FNF = "./outputs/U2018_CLC2018_V2020_3035_DE_5m_2018_FNF.tif"
ger_lulc_FNF = "./outputs/clc5_class3xx_3035_DE_5m_2018_FNF.tif"
hansen_FNF = "./outputs/hansen_60cover_2018_3035_DE_5m_2018_FNF.tif"
fao_FNF = "./outputs/fao_approx_3035_DE_5m_2018_FNF.tif"

# Load the German Natura 2000 areas (w/o "RELEASE_DA" column as this causes problems later)
natura = gpd.read_file("./outputs/natura2000_3035_DE.shp", 
                       columns=["SITECODE", "SITENAME", "MS", "SITETYPE"])

# Calculate the sum & count for each Natura 2000 geometry, for each FNF map 
# Keep the geometries by outputting in geojson format
esaFNF_natura_stats = zonal_stats(natura, esa_FNF,
                                  stats=['sum', 'count'],
                                  geojson_out=True)

jaxaFNF_natura_stats = zonal_stats(natura, jaxa_FNF,
                                  stats=['sum', 'count'],
                                  geojson_out=True)

corineFNF_natura_stats = zonal_stats(natura, corine_FNF,
                                  stats=['sum', 'count'],
                                  geojson_out=True)  

gerlulcFNF_natura_stats = zonal_stats(natura, ger_lulc_FNF,
                                  stats=['sum', 'count'],
                                  geojson_out=True)

hansenFNF_natura_stats = zonal_stats(natura, hansen_FNF,
                                  stats=['sum', 'count'],
                                  geojson_out=True)

faoFNF_natura_stats = zonal_stats(natura, fao_FNF,
                                  stats=['sum', 'count'],
                                  geojson_out=True)

# Convert results to feature collections
esaFNF_natura_stats_fc = {"type": "FeatureCollection","features": esaFNF_natura_stats}
jaxaFNF_natura_stats_fc = {"type": "FeatureCollection","features": jaxaFNF_natura_stats}
corineFNF_natura_stats_fc = {"type": "FeatureCollection","features": corineFNF_natura_stats}
gerlulcFNF_natura_stats_fc = {"type": "FeatureCollection","features": gerlulcFNF_natura_stats}
hansenFNF_natura_stats_fc = {"type": "FeatureCollection","features": hansenFNF_natura_stats}
faoFNF_natura_stats_fc = {"type": "FeatureCollection","features": faoFNF_natura_stats}


#### Step 1.2: Format into geodataframes

Use some functions to extract the zonal statistics data (which is in geojson format) into lists and then convert these lists to geodataframes (geopandas).

At this point, I also save the outputs using pickle.dump() so that step 1.1 doesn't need to be run every time. 
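
As a compact alternative to keeping five parallel lists per map, the extraction could be sketched as a single pass that turns each GeoJSON feature into one record. This is only an illustration: `features_to_records` and the toy feature are hypothetical, the geometry is a placeholder, and geometries are left as raw GeoJSON dicts (the notebook converts them with `shape()`).

```python
# Hypothetical helper (not part of the notebook): flatten zonal-stats
# GeoJSON features into one dict per Natura 2000 site.
def features_to_records(features):
    records = []
    for feature in features:
        props = feature["properties"]
        records.append({
            "site_code": props["SITECODE"],
            "site_name": props["SITENAME"],
            "total_pixel_count": props["count"],
            "forest_pixel_sum": props["sum"],
            "geometry": feature["geometry"],  # still a GeoJSON dict here
        })
    return records


# Toy feature shaped like one element of a zonal_stats(..., geojson_out=True) result.
# The geometry is a placeholder, not the real site polygon.
toy_features = [{
    "type": "Feature",
    "properties": {"SITECODE": "DE1631304", "SITENAME": "Seegalendorfer Gehölz",
                   "count": 5089, "sum": 4830.0},
    "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
}]

records = features_to_records(toy_features)
print(records[0]["site_code"], records[0]["forest_pixel_sum"])
```

A list of records like this could be handed to a dataframe constructor after converting each `geometry` entry with `shape()`, which would avoid one set of lists per FNF map.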

In [92]:
# 1.2: EXTRACT INTO LISTS

# Extract features from the feature collections
esa_features = esaFNF_natura_stats_fc["features"]
jaxa_features = jaxaFNF_natura_stats_fc["features"]
corine_features = corineFNF_natura_stats_fc["features"]
gerlulc_features = gerlulcFNF_natura_stats_fc["features"]
hansen_features = hansenFNF_natura_stats_fc["features"]
fao_features = faoFNF_natura_stats_fc["features"]

# Create empty lists for the properties I'm interested in
esa_codes, esa_names, esa_counts, esa_sums, esa_geoms = ([] for i in range(5))
jaxa_codes, jaxa_names, jaxa_counts, jaxa_sums, jaxa_geoms = ([] for i in range(5))
corine_codes, corine_names, corine_counts, corine_sums, corine_geoms = ([] for i in range(5))
ger_codes, ger_names, ger_counts, ger_sums, ger_geoms = ([] for i in range(5))
hansen_codes, hansen_names, hansen_counts, hansen_sums, hansen_geoms = ([] for i in range(5))
fao_codes, fao_names, fao_counts, fao_sums, fao_geoms = ([] for i in range(5))

# Create a function for filling lists (extracting according to json format)
def fill_lists(input_features, code_list, name_list, count_list, sum_list, geom_list):
    for row in input_features:
        code_list.append(row['properties']['SITECODE'])
        name_list.append(row['properties']['SITENAME'])
        count_list.append(row['properties']['count'])
        sum_list.append(row['properties']['sum'])
        geom_list.append(row['geometry'])

# Use the function to fill the lists
fill_lists(esa_features, esa_codes, esa_names, esa_counts, esa_sums, esa_geoms)
fill_lists(jaxa_features, jaxa_codes, jaxa_names, jaxa_counts, jaxa_sums, jaxa_geoms)
fill_lists(corine_features, corine_codes, corine_names, corine_counts, corine_sums, corine_geoms)
fill_lists(gerlulc_features, ger_codes, ger_names, ger_counts, ger_sums, ger_geoms)
fill_lists(hansen_features, hansen_codes, hansen_names, hansen_counts, hansen_sums, hansen_geoms)
fill_lists(fao_features, fao_codes, fao_names, fao_counts, fao_sums, fao_geoms)


In [None]:
# 1.2: CREATE GDFS 

# Create a function for creating the geodataframes from the lists
def create_stats_gdf(code_list, name_list, count_list, sum_list, geom_list):
    gdf = gpd.GeoDataFrame({'site_code' : code_list,
                            'site_name' : name_list,
                            'total_pixel_count' : count_list,
                            'forest_pixel_sum' : sum_list,
                            'geometry' : [shape(i) for i in geom_list] # Convert geometries to correct format
                            })
    return gdf

# Use the function to create the gdfs
esa_FNF_natura_stats_gdf = create_stats_gdf(esa_codes, esa_names, esa_counts, 
                                            esa_sums, esa_geoms)
jaxa_FNF_natura_stats_gdf = create_stats_gdf(jaxa_codes, jaxa_names, jaxa_counts, 
                                             jaxa_sums, jaxa_geoms)
corine_FNF_natura_stats_gdf = create_stats_gdf(corine_codes, corine_names, corine_counts, 
                                               corine_sums, corine_geoms)
gerlulc_FNF_natura_stats_gdf = create_stats_gdf(ger_codes, ger_names, ger_counts, 
                                                ger_sums, ger_geoms)
hansen_FNF_natura_stats_gdf = create_stats_gdf(hansen_codes, hansen_names, hansen_counts, 
                                               hansen_sums, hansen_geoms)
fao_FNF_natura_stats_gdf = create_stats_gdf(fao_codes, fao_names, fao_counts, 
                                            fao_sums, fao_geoms)



| | site_code | site_name | total_pixel_count | forest_pixel_sum | geometry |
|---|---|---|---|---|---|
| 0 | DE1522301 | Kalkquellmoor bei Klein Rheide | 7672 | 0.0 | POLYGON ((4286080.879 3484370.839, 4286192.964... |
| 1 | DE1631304 | Seegalendorfer Gehölz | 5089 | 4830.0 | POLYGON ((4383163.901 3467566.091, 4383189.582... |
| 2 | DE1728351 | Kalkflachmoor bei Mucheln | 4487 | 2210.0 | POLYGON ((4348889.555 3461513.635, 4348877.247... |
| 3 | DE2142301 | Wald- und Kleingewässerlandschaft südöstlich v... | 281021 | 135195.0 | POLYGON ((4503062.387 3425166.483, 4502986.885... |
| 4 | DE2314431 | Voslapper Groden-Nord | 103028 | 70839.0 | POLYGON ((4195543.404 3392071.356, 4195227.043... |
| ... | ... | ... | ... | ... | ... |
| 5195 | DE2527421 | NSG Besenhorster Sandberge und Elbsandwiesen | 60160 | 51066.0 | POLYGON ((4343427.976 3370091.622, 4343427.294... |
| 5196 | DE2548301 | Daberkower Heide | 135306 | 134096.0 | POLYGON ((4564850.018 3384133.344, 4564901.113... |
| 5197 | DE2549303 | Schanzberge bei Brietzig | 5100 | 7.0 | POLYGON ((4579195.958 3383253.136, 4579227.098... |
| 5198 | DE2549304 | Mühlbach Beeke | 71086 | 29203.0 | POLYGON ((4576264.971 3382903.171, 4576276.153... |


In [94]:
# 1.2: SAVE GDFS 

# Save the geodataframes (so preceding steps only need to be run once)
pickle.dump(esa_FNF_natura_stats_gdf, open("./outputs/esaFNF_natura_stats.p", "wb"))
pickle.dump(jaxa_FNF_natura_stats_gdf, open("./outputs/jaxaFNF_natura_stats.p", "wb"))
pickle.dump(corine_FNF_natura_stats_gdf, open("./outputs/corineFNF_natura_stats.p", "wb"))
pickle.dump(gerlulc_FNF_natura_stats_gdf, open("./outputs/gerlulcFNF_natura_stats.p", "wb"))
pickle.dump(hansen_FNF_natura_stats_gdf, open("./outputs/hansenFNF_natura_stats.p", "wb"))
pickle.dump(fao_FNF_natura_stats_gdf, open("./outputs/faoFNF_natura_stats.p", "wb"))

In [96]:
# 1.2: LOAD GDFS 
# START FROM HERE WHEN RETURNING TO SCRIPT

# Load the zonal stats outputs 
esa_FNF_natura_stats_gdf = pickle.load(open("./outputs/esaFNF_natura_stats.p", "rb"))
jaxa_FNF_natura_stats_gdf = pickle.load(open("./outputs/jaxaFNF_natura_stats.p", "rb"))
corine_FNF_natura_stats_gdf = pickle.load(open("./outputs/corineFNF_natura_stats.p", "rb"))
gerlulc_FNF_natura_stats_gdf = pickle.load(open("./outputs/gerlulcFNF_natura_stats.p", "rb"))
hansen_FNF_natura_stats_gdf = pickle.load(open("./outputs/hansenFNF_natura_stats.p", "rb"))
fao_FNF_natura_stats_gdf = pickle.load(open("./outputs/faoFNF_natura_stats.p", "rb"))


#### Step 1.3: Calculations

**Per Geometry**
1. Convert count and sum to area values 
    - Each pixel is 5 m x 5 m = 25 m², or 0.0025 ha 
2. Use the total forest area (sum) and total area (count) values to calculate percentage forest for each geometry

**Per FNF map (for all Natura 2000 areas)**
3. Sum the per geometry sum and count area values to get the total forest and total area for all Natura 2000 areas, for each FNF map
4. Use the total forest and total area to get the percentage forest for each FNF map

**Comparisons**
5. Calculate the average forest area (all Natura 2000 areas) across all FNF maps (one result)
6. Calculate the difference from the average for each FNF map (step 3 total forest outputs minus step 5 output) 
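
The arithmetic in the steps above can be sketched with plain Python before applying it to the geodataframes. The per-geometry numbers are the JAXA pixel counts for site DE1631304; the three map totals used for the comparison are invented, and the helper names are hypothetical.

```python
PIXEL_AREA_HA = 0.0025  # 5 m x 5 m = 25 m^2 = 0.0025 ha


def pixels_to_ha(pixel_count):
    """Convert a pixel count to hectares."""
    return pixel_count * PIXEL_AREA_HA


def percent_forest(forest_pixels, total_pixels):
    """Forest share of a geometry in percent (NaN if the geometry has no valid pixels)."""
    if total_pixels == 0:
        return float("nan")
    return forest_pixels / total_pixels * 100


# Steps 1-2 for one geometry (JAXA map, site DE1631304)
total_ha = pixels_to_ha(5089)
forest_ha = pixels_to_ha(4362)
pct = percent_forest(4362, 5089)
print(total_ha, forest_ha, round(pct, 6))

# Steps 5-6 with invented total forest areas (ha) for three hypothetical maps
toy_totals = {"map_a": 100.0, "map_b": 120.0, "map_c": 110.0}
mean_total = sum(toy_totals.values()) / len(toy_totals)
diff = {name: total - mean_total for name, total in toy_totals.items()}
print(mean_total, diff)
```

A positive difference means a map reports more forest than the cross-map mean, a negative one less.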

In [101]:
# 1.3: PER GEOMETRY CALCULATIONS

# Create a list of all the gdfs with the Natura 2000 zonal stats results
gdf_list = [esa_FNF_natura_stats_gdf, jaxa_FNF_natura_stats_gdf, corine_FNF_natura_stats_gdf,
            gerlulc_FNF_natura_stats_gdf, hansen_FNF_natura_stats_gdf, fao_FNF_natura_stats_gdf]

# Create new columns for count and sum in area units (hectares)
for gdf in gdf_list:
    gdf["total_area_ha"] = gdf["total_pixel_count"] * 0.0025
    gdf["total_forest_ha"] = gdf["forest_pixel_sum"] * 0.0025

# Calculate the percentage of forest in each geometry
for gdf in gdf_list:
    gdf["percent_forest"] = (gdf["total_forest_ha"] / gdf["total_area_ha"]) * 100

# Check
jaxa_FNF_natura_stats_gdf[0:3]  

| | site_code | site_name | total_pixel_count | forest_pixel_sum | geometry | total_area_ha | total_forest_ha | percent_forest |
|---|---|---|---|---|---|---|---|---|
| 0 | DE1522301 | Kalkquellmoor bei Klein Rheide | 7672 | 0.0 | POLYGON ((4286080.879 3484370.839, 4286192.964... | 19.18 | 0.0 | 0.0 |
| 1 | DE1631304 | Seegalendorfer Gehölz | 5089 | 4362.0 | POLYGON ((4383163.901 3467566.091, 4383189.582... | 12.7225 | 10.905 | 85.714286 |
| 2 | DE1728351 | Kalkflachmoor bei Mucheln | 4487 | 2726.0 | POLYGON ((4348889.555 3461513.635, 4348877.247... | 11.2175 | 6.815 | 60.753287 |


In [None]:
# 1.3: PER FNF MAP CALCULATIONS

# Create list of names for FNF maps
fnf_map_names = ["ESA Land Cover", "JAXA Forest/Non-Forest", "CORINE Land Use",
                 "Germany Land Use", "Hansen Tree Cover", "FAO-Forest Approximation"]

# Create a new partially filled dataframe for storing the results
all_fnf_natura_stats = pd.DataFrame(fnf_map_names, columns = ["FNF Map"])

# Create empty lists for total forest values & total area values
forest_per_FNF = []
area_per_FNF = []

# Calculate the total forest in Natura 2000 areas for each FNF map & append to list
for gdf in gdf_list:
    total_forest = gdf["total_forest_ha"].sum()
    forest_per_FNF.append(total_forest)

    total_area = gdf["total_area_ha"].sum()
    area_per_FNF.append(total_area)    # Expected to match across FNF maps; 'count' excludes nodata pixels, so small gaps can appear

# Add the total forest values and total area values to the df
all_fnf_natura_stats["Total Forest (ha)"] = forest_per_FNF
all_fnf_natura_stats["Total Area (ha)"] = area_per_FNF

# Calculate the percentage of forest cover for each FNF map
all_fnf_natura_stats["Percent Forest (%)"] = (all_fnf_natura_stats["Total Forest (ha)"] / 
                                              all_fnf_natura_stats["Total Area (ha)"]) * 100

all_fnf_natura_stats

| | FNF Map | Total Forest (ha) | Total Area (ha) | Percent Forest (%) |
|---|---|---|---|---|
| 0 | ESA Land Cover | 3870882.0 | 7247848.0 | 53.407329 |
| 1 | JAXA Forest/Non-Forest | 3858510.0 | 7247848.0 | 53.236626 |
| 2 | CORINE Land Use | 4114077.0 | 7247848.0 | 56.762742 |
| 3 | Germany Land Use | 4142021.0 | 7247731.0 | 57.149211 |
| 4 | Hansen Tree Cover | 3527466.0 | 7247848.0 | 48.669147 |
| 5 | FAO-Forest Approximation | 4185290.0 | 7247731.0 | 57.746205 |


In [128]:
# 1.3: COMPARISON CALCULATIONS

# Calculate the average forest area across maps
avg_forest_area = all_fnf_natura_stats["Total Forest (ha)"].mean()

# Calculate the difference of each map to the average
avg_diff = all_fnf_natura_stats["Total Forest (ha)"] - avg_forest_area

# Add the difference results to the df
all_fnf_natura_stats["Difference from Mean (ha)"] = avg_diff

all_fnf_natura_stats

| | FNF Map | Total Forest (ha) | Total Area (ha) | Percent Forest (%) | Difference from Mean (ha) |
|---|---|---|---|---|---|
| 0 | ESA Land Cover | 3870882.0 | 7247848.0 | 53.407329 | -78825.60875 |
| 1 | JAXA Forest/Non-Forest | 3858510.0 | 7247848.0 | 53.236626 | -91197.90625 |
| 2 | CORINE Land Use | 4114077.0 | 7247848.0 | 56.762742 | 164369.59625 |
| 3 | Germany Land Use | 4142021.0 | 7247731.0 | 57.149211 | 192313.60875 |
| 4 | Hansen Tree Cover | 3527466.0 | 7247848.0 | 48.669147 | -422241.80875 |
| 5 | FAO-Forest Approximation | 4185290.0 | 7247731.0 | 57.746205 | 235582.11875 |


#### Step 1.4: Visualisations

- Make a prettier table output out of the df
- Create a bar chart of total forest per FNF map with dashed line for mean

- Create a map series of total forest (both absolute and relative for now) per geometry, per FNF map


### Step 2: Create Forest Presence/Absence Consensus map 

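Following the Sexton et al. (2016) approach, the consensus map is the per-pixel sum of the six binary FNF maps, so each pixel value is the number of maps that classify that pixel as forest.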
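
To make the summing concrete, here is a minimal sketch on three invented 2 x 3 binary grids (1 = forest, 0 = non-forest). The `consensus` helper is hypothetical; the notebook performs the same cell-by-cell addition on the full rasters with `gdal_calc.py`.

```python
# Toy forest/non-forest grids; the values are invented for illustration.
map_a = [[1, 0, 1],
         [0, 0, 1]]
map_b = [[1, 1, 1],
         [0, 0, 0]]
map_c = [[1, 0, 0],
         [0, 1, 1]]


def consensus(*maps):
    """Add equally sized binary grids cell by cell."""
    n_rows = len(maps[0])
    n_cols = len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(n_cols)]
            for r in range(n_rows)]


consensus_grid = consensus(map_a, map_b, map_c)
for row in consensus_grid:
    print(row)
```

With three inputs the values range from 0 (no map reports forest) to 3 (all maps agree on forest); with the six FNF maps used here the range would be 0 to 6.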

In [None]:
# 2: FOREST CONSENSUS MAP
# TAKES ABOUT 20 MIN

# Store paths to the FNF maps
esa_FNF = "./outputs/esa_lccs_class_3035_DE_5m_2018_FNF.tif"
jaxa_FNF = "./outputs/jaxa_FNF_3035_DE_5m_2018_FNF.tif"
corine_FNF = "./outputs/U2018_CLC2018_V2020_3035_DE_5m_2018_FNF.tif"
ger_lulc_FNF = "./outputs/clc5_class3xx_3035_DE_5m_2018_FNF.tif"
hansen_FNF = "./outputs/hansen_60cover_2018_3035_DE_5m_2018_FNF.tif"
fao_FNF = "./outputs/fao_approx_3035_DE_5m_2018_FNF.tif" 

# Runs gdal_calc.py to add the input rasters together (ABOUT 15 MIN w 5 rasters)
consensus_map = subprocess.run(['python', 
                                gdal_calc, 
                                '-A', esa_FNF, 
                                '-B', jaxa_FNF, 
                                '-C', corine_FNF, 
                                '-D', ger_lulc_FNF, 
                                '-E', hansen_FNF,
                                '-F', fao_FNF,
                                '--outfile=./outputs/forest_consensus_3035_DE_5m_2018.tif', 
                                '--calc=A+B+C+D+E+F', 
                                '--co=COMPRESS=LZW', 
                                '--co=BIGTIFF=YES', 
                                '--NoDataValue=-9999'
                                ],
                                capture_output=True, 
                                text=True)

print(consensus_map.stdout)
print(consensus_map.stderr)


0...10...20...30...40...50...60...70...80...90...100 - done.




### Step 3: Analyse Consensus Map

I follow roughly the same process here as in step 1: first generating the zonal stats (this time only the pixel count per class, as the consensus map is treated as categorical), then calculating some basic sums per Natura 2000 area and for all the Natura 2000 areas together. In the last section I generate some visualisations of the results. 

#### Step 3.1: Generate zonal statistics

Zonal stats (pixel count per class) per geometry for the consensus map
- Because the data is qualitative/categorical, make sure to use the categorical=True option and (optionally) the category_map parameter (see rasterstats documentation)

Convert to a feature collection
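
For intuition: with `categorical=True`, rasterstats reports for each zone how many pixels fall into each unique raster value, rather than a single sum. The sketch below mimics that result for one zone using only the standard library; the pixel values are invented and no raster is read.

```python
from collections import Counter

# Invented consensus values for the pixels inside one hypothetical zone
zone_pixels = [0, 0, 3, 6, 6, 6, 2, 3, 6, 0]

# Count pixels per consensus class, the same shape of result that a
# categorical zonal statistic gives for a single geometry
class_counts = dict(Counter(zone_pixels))
print(class_counts)

# The per-class counts add up to the zone's total pixel count
total_pixels = sum(class_counts.values())
print(total_pixels)
```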



In [None]:
# 3.1: CONSENSUS MAP ZONAL STATS 

#### Step 3.2: Format into geodataframe

Use the functions from step 1.2 to extract the zonal statistics data (which is in geojson format) into lists and then convert these lists to a geodataframe (geopandas).

At this point, save the outputs using pickle.dump() so that step 3.1 doesn't need to be run every time. 

In [None]:
# 3.2: EXTRACT INTO LISTS

In [None]:
# 3.2: CREATE GDF

In [None]:
# 3.2: SAVE GDF

In [None]:
# 3.2: LOAD GDF
# START FROM HERE WHEN RETURNING TO SCRIPT

#### Step 3.3: Calculations

**Per Geometry**
1. Convert count per class to area values 
    - Each pixel is 5 m x 5 m = 25 m², or 0.0025 ha 
2. Calculate percentages per class by dividing each class area by the total area

**For whole consensus map (for all Natura 2000 areas)**
3. Create a dataframe with the sum of the area for each class
4. Calculate the percentage of each class across all Natura areas
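
A minimal sketch of steps 1 to 4, assuming the per-geometry result is a mapping from consensus class to pixel count. The counts for the two sites are invented and the helper names are hypothetical.

```python
from collections import Counter

PIXEL_AREA_HA = 0.0025  # 5 m x 5 m = 25 m^2 = 0.0025 ha


def class_areas_ha(class_counts):
    """Step 1: convert per-class pixel counts to hectares."""
    return {cls: n * PIXEL_AREA_HA for cls, n in class_counts.items()}


def class_percentages(class_counts):
    """Steps 2 and 4: share of each class in the total pixel count, in percent."""
    total = sum(class_counts.values())
    return {cls: n / total * 100 for cls, n in class_counts.items()}


# Invented per-class pixel counts for two hypothetical sites
site_1 = {0: 400, 3: 200, 6: 1400}
site_2 = {0: 100, 6: 900}

site_1_area = class_areas_ha(site_1)
site_1_pct = class_percentages(site_1)
print(site_1_area, site_1_pct)

# Step 3: sum the classes over all sites, then repeat the calculations
all_sites = dict(Counter(site_1) + Counter(site_2))
all_area = class_areas_ha(all_sites)
all_pct = class_percentages(all_sites)
print(all_sites, all_area, all_pct)
```

Classes absent from a site (class 3 in the second toy site) simply contribute nothing to the overall sum.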

#### Step 3.4: Visualisations

- Make a prettier table output out of the df
- Create a bar chart of total area in each class (see pic in "writing folder")
- Create pie chart of percentage of each class (see pic in "writing folder")

- Create a figure showing the whole consensus map
- Create some figures for specific areas to highlight differences
    - find area with low consensus (e.g. TüpOhrdruf-Jonastal, where almost all pixels have a value of 3)
    - find area with high consensus
    - find area with mixed results
