<a href="https://colab.research.google.com/github/andyarnell/sepal_mgci/blob/master/SDG_15_4_2_Sub_A_Default_values.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **SDG 15.4.2 Subcomponent A: Calculate Global Default Values**

* This script allows batch processing for this indicator for all countries.

* Output is a combined excel file on your Google Drive.

* Runs on the cloud using [Google Colab](https://research.google.com/colaboratory/faq.html)

* Requires: [Google Earth Engine](https://earthengine.google.com/) (GEE) account and project and access to Google Drive


### Install required packages
NB may get error requiring a restart on first run only (works after this - just rerun from start)

TO DO Check with Daniel if can shift sepal_ui import requirement

In [14]:
# to automatically reload modules.
%load_ext autoreload

# Set to reload all modules before executing code.
%autoreload 2

# Function to install a package if it's not already installed
def install_if_not_exists(package_name):
    try:
        __import__(package_name)
        print(f"{package_name} is already installed.")
    except ImportError:
        !pip install -q {package_name}
        print(f"{package_name} has been installed.")

# List of packages to install if not already installed
packages_to_install = ['ee', 'geemap', 'unidecode', 'google-api-python-client',
                      'google-auth-httplib2', 'google-auth-oauthlib','sepal_ui']

# Install necessary packages
for package in packages_to_install:
    install_if_not_exists(package)

ee is already installed.
geemap is already installed.
unidecode is already installed.
google-api-python-client has been installed.
google-auth-httplib2 has been installed.
google-auth-oauthlib has been installed.
sepal_ui is already installed.


### Import packages

In [11]:
import os

import ee # google earth engine

ee.Authenticate()

ee.Initialize(project="ee-andyarnellgee") # NB gee project name is defined in parameters section

from datetime import datetime # for time stamping error log
import pandas as pd # pandas library for tabular data manipulation
import re # for manipulating strings
from unidecode import unidecode # converting symbols in country names to ascii compliant (required for naming GEE tasks)

# formatting excel report file
from openpyxl.utils import get_column_letter
from openpyxl.styles import Alignment

# Change current directory to sepal_mgci (i.e. the local copy of the github repository)
%cd "/content/sepal_mgci"

# Import parameters for the default DEM asset and a lookup table for land cover reclassification
from component.parameter.module_parameter import DEM_DEFAULT, LC_MAP_MATRIX

# # # # Import scripts and modules from cloned GitHub repository (i.e., functions for indicator calculation and formatting)
from component.scripts.gee import reduce_regions # for running summary statistics in GEE
from component.scripts.scripts import get_a_years, map_matrix_to_dict, parse_result, read_from_csv# parameter prep and reformatting
from component.scripts import sub_a, sub_b, mountain_area as mntn ###TO DO: ADD DESCRIPTIONS

# print("Imports complete")


/content/sepal_mgci


##Set parameters and install packages


Inputs

In [1]:
# Google Earth Engine project
gee_project_name = "insert cloud project here"  # a registered cloud project (if unsure of name see pic here: https://developers.google.com/earth-engine/cloud/assets)


# Admin boundaries asset
admin_asset_id = "FAO/GAUL/2015/level0" # administrative units feature collection

admin_asset_property_name = "ADM0_NAME" # property/column name for selecting admin boundaries (e.g. ISO3 code or country name)


# Land cover assets

# For SUB_A indicator, we need to set the following structure.
a_years = {
    1: {"asset": "users/amitghosh/sdg_module/esa/cci_landcover/2000", "year": 2000}, # baseline
    2: {"year": 2003, "asset": "users/amitghosh/sdg_module/esa/cci_landcover/2003"}, # subsequent reporting years...
    3: {"year": 2007, "asset": "users/amitghosh/sdg_module/esa/cci_landcover/2007"},
    4: {"year": 2010, "asset": "users/amitghosh/sdg_module/esa/cci_landcover/2010"},
}


Outputs

---



In [2]:
final_report_folder = "sdg_15_4_2_A_combined_report" # folder name in Google Drive for final output (if doesnt exist creates one)

final_report_name = "sdg_15_4_2_A_default_global.xlsx" # file name for final excel output

# export GEE tasks or not
export = False # default: True. Set to False if debugging or limiting accidental re-exporting of tasks

# prints more messages
debug = False # default: False. Set to True if debugging code

Temporary outputs


In [3]:
stats_csv_folder = "sdg_15_4_2_A_csvs" # for storing stats tables exported from GEE for each admin boundary/AOI

excel_reports_folder = "sdg_15_4_2_A_reports" # for storing formatted excel tables for each admin boundary/AOI

drive_home ="/content/drive/MyDrive/" # Google Drive location. Don't change unless you know this is incorrect

error_log_file_path = drive_home + excel_reports_folder + "/"+"1_error_log" +".csv" # for storing errors


### Access GitHub repository
Clones repository for SDG 15.4.2 into colab.
Provides functions and lookup tables etc.

In [4]:
# Change the current working directory to "/content".
%cd "/content"

# Clone the GitHub repository "sepal_mgci" into the current directory.
# NB 'fatal' error on reruns are typically just saying it already exists
!git clone https://github.com/sepal-contrib/sepal_mgci

/content
Cloning into 'sepal_mgci'...
remote: Enumerating objects: 2897, done.[K
remote: Counting objects: 100% (835/835), done.[K
remote: Compressing objects: 100% (296/296), done.[K
remote: Total 2897 (delta 552), reused 729 (delta 538), pack-reused 2062[K
Receiving objects: 100% (2897/2897), 4.89 MiB | 10.13 MiB/s, done.
Resolving deltas: 100% (1828/1828), done.


#### Setup Google Earth Engine
Launches access request pop up window

In [6]:
# import ee # google earth engine

# ee.Authenticate()

# ee.Initialize(project="ee-andyarnellgee") # NB gee project name is defined in parameters section

#### Setup Google Drive
Launches access request pop up window

In [7]:
# for accessing google drive
from google.colab import auth, drive
from googleapiclient.discovery import build

drive.mount('/content/drive')

Mounted at /content/drive


### Functions
TO DO: these will be stored somewhere else in GitHub hopefully - if Daniel is happy

In [12]:
def folder_exists(folder_name, parent_folder_id=None):
    """
    Check if a folder exists in Google Drive.

    Args:
    - folder_name (str): Name of the folder to check.
    - parent_folder_id (str): ID of the parent folder where to search for the folder.
                              Default is None, meaning the search will be performed in the root.

    Returns:
    - bool: True if the folder exists, False otherwise.
    """
    # Authenticate user
    auth.authenticate_user()

    # Build the Drive v3 service
    drive_service = build('drive', 'v3')

    # Prepare query to check if folder exists
    query = f"name='{folder_name}' and mimeType='application/vnd.google-apps.folder' and trashed=false"
    if parent_folder_id:
        query += f" and '{parent_folder_id}' in parents"

    try:
        # Execute the search query
        folders = drive_service.files().list(q=query, fields='files(id)', includeItemsFromAllDrives=True, supportsAllDrives=True).execute().get('files', [])
        return bool(folders)
    except Exception as e:
        print(f"An error occurred: {e}")
        return False


def create_folder(folder_name, parent_folder_id=None):
    """
    Create a folder in Google Drive.

    Args:
    - folder_name (str): Name of the folder to be created.
    - parent_folder_id (str): ID of the parent folder where the new folder will be created.
                              Default is None, meaning the folder will be created in the root.

    Returns:
    - str: ID of the newly created folder.
    """
    # Authenticate user
    auth.authenticate_user()

    # Build the Drive v3 service
    drive_service = build('drive', 'v3')

    # Prepare folder metadata
    folder_metadata = {
        'name': folder_name,
        'mimeType': 'application/vnd.google-apps.folder'
    }
    if parent_folder_id:
        folder_metadata['parents'] = [parent_folder_id]

    # Create the folder
    folder = drive_service.files().create(body=folder_metadata, fields='id').execute()

    # Return the ID of the newly created folder
    return folder.get('id')


def create_folder_if_not_exists(folder_name, parent_folder_id=None):
    """
    Create a folder in Google Drive if it doesn't already exist.

    Args:
    - folder_name (str): Name of the folder to be created.
    - parent_folder_id (str): ID of the parent folder where the new folder will be created.
                              Default is None, meaning the folder will be created in the root.

    Returns:
    - str: ID of the newly created folder or the existing folder if it already exists.
    """
    if folder_exists(folder_name, parent_folder_id):
        print(f"Folder '{folder_name}' already exists.")
        return None
    else:
        return create_folder(folder_name, parent_folder_id)


def sanitize_description(description):
    allowed_characters_pattern = r"[^a-zA-Z0-9.,:;_ \-]"  # Define a regex pattern for characters not in the allowed set
    sanitized_description = re.sub(allowed_characters_pattern, "", description)  # Remove characters not in the allowed set
    return sanitized_description


def append_excel_files(file_paths, num_sheets, output_file_path):
    # Initialize a dictionary to store combined DataFrames from different files
    combined_dfs = {}

    # Initialize a counter to track the progress of file processing
    counter = 0

    # Iterate over each file path in the list
    for file_path in file_paths:
        # Load the Excel file
        # xls = pd.ExcelFile(file_path)  # Reads file and stores as an ExcelFile object (using the Pandas library)
        xls = pd.ExcelFile(file_path, engine='openpyxl')  # Reads file and stores as an ExcelFile object (using the Pandas library)

        # xls = pd.ExcelFile(file_path, engine='xlrd')  # Reads file and stores as an ExcelFile object (using the Pandas library)

        # Increment the counter for each iteration
        counter += 1

        # Read each sheet from the Excel file into a DataFrame
        # Only read up to num_sheets specified
        dfs = {sheet_name: xls.parse(sheet_name) for sheet_name in xls.sheet_names[:num_sheets]}

        # Append the DataFrames to the combined_dfs dictionary
        for sheet_name, df in dfs.items():
            if sheet_name in combined_dfs:
                # If the sheet already exists in combined_dfs, concatenate the current DataFrame with the existing one
                combined_dfs[sheet_name] = pd.concat([combined_dfs[sheet_name], df], ignore_index=True)
            else:
                # If the sheet does not exist in combined_dfs, add the DataFrame directly
                combined_dfs[sheet_name] = df

        # Print the progress of processing, overwriting the previous progress
        print(f"\rProcessing {counter}/{len(file_paths)}: {file_path}", end="")

    # Write the combined DataFrames to the specified output file path
    with pd.ExcelWriter(output_file_path) as writer:
        for sheet_name, df in combined_dfs.items():
            # Write each DataFrame to a separate sheet in the output Excel file
            df.to_excel(writer, sheet_name=sheet_name, index=False)



## SUB INDICATOR A

Create list of boundaries to process

In [13]:
# admin boundary
admin_boundaries = ee.FeatureCollection(admin_asset_id)

# list to process
list_of_countries = admin_boundaries.aggregate_array(admin_asset_property_name).getInfo()

print ("Length of admin boundaries to process", len(list_of_countries))

list_of_countries = list(set(list_of_countries)) # remove dupicates

print ("Length of distinct admin boundaries to process", (len(set(list_of_countries))))


NameError: name 'admin_asset_id' is not defined

Read the default land cover remapping table and convert it to a dictionary

In [None]:
default_map_matrix = map_matrix_to_dict(LC_MAP_MATRIX)

Select years of land cover to process

In [None]:
# extracts the years from the a_years dictionary (as defined in parameters)
single_years = [y["year"] for  y in a_years.values()]

#### Run area statistics within admin boundaries
* Runs for each country and each mountain biobelt
* Gets area of land cover reclassified into the 10 SEAM classes
* Repeat for each year specified


In [None]:
# you can monitor your GEE tasks here : https://code.earthengine.google.com/tasks

create_folder_if_not_exists(stats_csv_folder) # to store outputs in google drive

counter=0 # starting place of counter used to keep track of number of tasks that are being run

for aoi_name in list_of_countries:

    aoi = admin_boundaries.filter(ee.Filter.eq(admin_asset_property_name,aoi_name))#.first()

    # gets areas of landcover in each mountain belt in each country
    # uses reduce_regions function imported from the cloned sepal_mgci git hub repository (see Imports section)
    # pixels counted at native resolution (scale) of input land cover (or DEM if RSA implementation)
    process = ee.FeatureCollection([
        ee.Feature(
            None,
            reduce_regions(
                aoi,
                remap_matrix=default_map_matrix,
                rsa=False,
                # dem=param.DEM_DEFAULT,
                dem=DEM_DEFAULT, #default digital elevation model (DEM). Relevant for the real surface area (RSA) implementation.
                lc_years= year,
                transition_matrix=False
            )
        ).set("process_id", year[0]["year"])
        for year in get_a_years(a_years) # creates GEE images and runs stats on each. Images to run are in the 'a_years" dictionary (above)
    ])

    #make name acceptable for running tasks (i.e., removes special characters)
    task_name = str(sanitize_description(unidecode(aoi_name)))


    task = ee.batch.Export.table.toDrive(
        **{  #asterisks unpack dictionary into keyword arguments format
            "collection": process,
            "description": task_name,
            "fileFormat": "CSV",
            "folder":stats_csv_folder,
            "selectors": [
                "process_id",
                "sub_a",
            ],
        }
    )

    counter+=1

    print (f"\r process {counter}/{len(list_of_countries)} {aoi_name} ", end="") #print in place (remove \r and end="" for verbose version)

    if export:
      task.start()



# Read, process, and create report tables

#####Manually check your earth engine task status, once the tasks are complete, run the next cell. https://code.earthengine.google.com/tasks

This formats individual excel reports for each country.
See Error_log.csv for missing files/errors

In [None]:
# Initialize the counter
counter = 0

# to store outputs in google drive
create_folder_if_not_exists(excel_reports_folder)

# Loop over each AOI name in the list of countries
for aoi_name in list_of_countries:
    counter += 1

    # Clean the AOI name
    aoi_name_clean = str(sanitize_description(unidecode(aoi_name)))

    # Construct the file path for the stats CSV file
    stats_csv_file = aoi_name_clean + ".csv"
    stats_csv_file_path = os.path.join(drive_home, stats_csv_folder, stats_csv_file)

    message = f"Process {counter}, {stats_csv_file}"

    try:
        # Read the results from the CSV file and parse it to a dictionary
        dict_results = read_from_csv(stats_csv_file_path)

        details = {
            "geo_area_name": aoi_name,
            "ref_area": " ",
            "source_detail": " ",
        }

        # Generate reports for the sub_a and mtn indicators
        sub_a_reports = [sub_a.get_reports(parse_result(dict_results[year]["sub_a"], single=True), year, **details) for year in single_years]
        mtn_reports = [mntn.get_report(parse_result(dict_results[year]["sub_a"], single=True), year, **details) for year in single_years]

        # Concatenate the mtn reports
        mtn_reports_df = pd.concat(mtn_reports)

        # Concatenate the sub a reports
        er_mtn_grnvi_df = pd.concat([report[0] for report in sub_a_reports])
        er_mtn_grncov_df = pd.concat([report[1] for report in sub_a_reports])

        # Define the output report file path
        report_file_path = os.path.join(drive_home, excel_reports_folder, aoi_name_clean + ".xlsx")

        # Create the Excel file with the reports
        with pd.ExcelWriter(report_file_path) as writer:
            mtn_reports_df.to_excel(writer, sheet_name="Table1_ER_MTN_TOTL", index=False)
            er_mtn_grncov_df.to_excel(writer, sheet_name="Table2_ER_MTN_GRNCOV", index=False)
            er_mtn_grnvi_df.to_excel(writer, sheet_name="Table3_ER_MTN_GRNCVI", index=False)

            # Adjust column widths and alignment for each sheet
            for sheetname in writer.sheets:
                worksheet = writer.sheets[sheetname]
                for col in worksheet.columns:
                    max_length = max(len(str(cell.value)) for cell in col)
                    column = col[0]
                    adjusted_width = max(max_length, len(str(column.value))) + 4
                    worksheet.column_dimensions[get_column_letter(column.column)].width = adjusted_width

                    # Align "obs_value" column to the right
                    if "OBS" in column.value:
                        for cell in col:
                            cell.alignment = Alignment(horizontal="right")

    except Exception as e:
        # If an error occurs, catch the exception and handle it
        message = f"process {counter}, {stats_csv_file}, Error: {e}"

        # Get the current time
        current_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

        # Write the error message and file name to the error log file
        error_info = pd.DataFrame([[stats_csv_file, str(e), current_time]], columns=['File Name', 'Error Message', 'Time'])

        mode = 'w' if not os.path.exists(error_log_file_path) else 'a'
        header = False if os.path.exists(error_log_file_path) else True

        # Append or write to the error log file
        error_info.to_csv(error_log_file_path, mode=mode, header=header, index=False)

    print(message)


#### Combine excel files

Make a list of files to combine

In [None]:
# Directory path where Excel reports are stored
directory_path = os.path.join(drive_home, excel_reports_folder)

# List files in the directory with '.xlsx' extension
files = [file for file in os.listdir(directory_path) if file.endswith('.xlsx')]

# Create a list of full file paths
full_file_paths = [os.path.join(directory_path, file) for file in files]

# Print the number of Excel files found in the folder
print(f"Number of Excel files in folder: {len(full_file_paths)}")

# folder to store outputs in google drive
create_folder_if_not_exists(final_report_folder)

# File path for the combined final report
reports_combined_file_path = os.path.join(drive_home, final_report_folder, final_report_name)


##### Run function to combine into a single report

In [None]:
append_excel_files(file_paths=full_file_paths,num_sheets=3,output_file_path=reports_combined_file_path)

print (f"\n Complete! Output file for SDG 15.4.2 Component A here: {reports_combined_file_path}")