# Post Processing

This notebook post-processes submissions. It is divided into 5 major steps:
1. Automatically extract all archives from the input directory.
2. Convert all dense point cloud files (`.las` or `.laz`) into DEMs with the `point2dem` command from ASP.
3. Coregister the DEMs with the reference DEMs, then compute the difference between the coregistered DEMs and the reference DEMs.
4. Compute global statistics over the whole post-processing run.
5. Generate plots.


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import history
import history.postprocessing.visualization as viz
from pathlib import Path
import pandas as pd
import numpy as np

## Settings and Path Management

The post-processing workflow involves many directories and paths. To simplify this, let's use the class `history.postprocessing.PathsManager`.

In [3]:
BASE_DIR = Path("/mnt/summer/USERS/DEHECQA/history/output") 

CUSTOM_PATHS = {
    "processing_dir": BASE_DIR / "test_data",
    "raw_dems_dir": BASE_DIR / "test_data" / "dems",
    "coreg_dems_dir": BASE_DIR / "test_data" / "coregistered_dems",
    "casagrande_ref_dem_zoom": BASE_DIR / "test_data" / "ref_lowres_dems" / "casa_grande_reference_dem_zoom_30m.tif",
    "casagrande_ref_dem_large": Path("/mnt/summer/USERS/DEHECQA/history/data_final") / "casa_grande" / "aux_data" / "reference_dem_large.tif",
    "iceland_ref_dem_zoom": BASE_DIR / "test_data" / "ref_lowres_dems" / "iceland_reference_dem_zoom_30m.tif",
    "iceland_ref_dem_large": Path("/mnt/summer/USERS/DEHECQA/history/data_final") / "iceland" / "aux_data" / "reference_dem_large.tif",
}
paths_manager = history.postprocessing.PathsManager(BASE_DIR, CUSTOM_PATHS)

# other settings
OVERWRITE = False
DRY_RUN = False  # set this to True to only print the planned operations without running them
MAX_WORKERS = 4
VERBOSE = True

postproc = history.postprocessing.PostProcessing(paths_manager)

## Step 1: Extract submissions

In [4]:
postproc.uncompress_all_submissions(OVERWRITE, DRY_RUN, VERBOSE)

## Visualize Post Processing files

In [5]:
postproc.plot_files_recap()

## Step 2: Convert Dense Point Cloud Files into DEMs

This step processes **dense point cloud files** to generate DEMs that are spatially aligned with their corresponding reference DEMs.

- **Input search**: Point cloud files are searched **recursively** in the directory `extracted_submissions_dir` using the pattern `*_dense_pointcloud.{las,laz}`.  
- **Output location**: The resulting DEMs are saved in the directory `raw_dems_dir`.  
- **Reference selection**: For each point cloud file, the appropriate reference DEM is selected based on its `site` and `dataset` metadata.  
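The input search described above can be sketched with `pathlib`. Note that `Path.rglob` does not expand braces such as `{las,laz}`, so this sketch globs on the common stem and filters on the suffix. The helper name `find_dense_pointclouds` is hypothetical and is not part of the `history` package, whose actual implementation is not shown in this notebook.

```python
from pathlib import Path

# Suffixes accepted for dense point cloud files.
POINTCLOUD_SUFFIXES = {".las", ".laz"}


def find_dense_pointclouds(root):
    """Recursively list files named `*_dense_pointcloud.las` or `*_dense_pointcloud.laz`.

    Hypothetical helper, written only to illustrate the search pattern.
    """
    root = Path(root)
    return sorted(
        p
        for p in root.rglob("*_dense_pointcloud.*")
        if p.is_file()
        and p.suffix in POINTCLOUD_SUFFIXES
        and p.stem.endswith("_dense_pointcloud")
    )
```

Assuming the directory is registered under the key `extracted_submissions_dir`, it could be called as `find_dense_pointclouds(paths_manager.get_path("extracted_submissions_dir"))`.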

**Note:**  
The method `iter_point2dem` launches the ASP `point2dem` command with the parameters:  
- `max_concurrent_commands`: Maximum number of parallel `point2dem` processes.  
- `max_threads_per_command`: Maximum number of threads allocated to each individual process.  
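To illustrate how the two parameters interact, here is a minimal sketch of a bounded command runner. It is not the implementation of `iter_point2dem`: the helper names `build_point2dem_command` and `run_commands` are hypothetical, and the exact flags that `iter_point2dem` passes to `point2dem` are not shown in this notebook. The sketch assumes only the `--threads` and `-o` (output prefix) options of `point2dem`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_point2dem_command(pointcloud, output_prefix, max_threads_per_command):
    """Build an illustrative point2dem command line (assumed flags: --threads, -o)."""
    return [
        "point2dem",
        "--threads", str(max_threads_per_command),
        "-o", str(output_prefix),
        str(pointcloud),
    ]


def run_commands(commands, max_concurrent_commands):
    """Run commands with at most `max_concurrent_commands` running at once.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(command):
        # check=False: a failing command is reported through its exit code
        return subprocess.run(command, check=False).returncode

    # Each worker thread blocks on one subprocess, so the pool size caps
    # the number of simultaneously running commands.
    with ThreadPoolExecutor(max_workers=max_concurrent_commands) as pool:
        return list(pool.map(run_one, commands))
```

With this scheme, the peak number of busy threads is roughly `max_concurrent_commands * max_threads_per_command`, so the product should stay within the number of available cores.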


In [6]:
asp_path = None
max_concurrent_commands = 1

postproc.iter_point2dem(OVERWRITE, DRY_RUN, asp_path, max_concurrent_commands, max_threads_per_command=MAX_WORKERS)

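### Clean up the raw DEMs directory
Move the log files into a `log` subfolder, then remove the temporary files.

In [7]:
# directory holding the DEMs produced by point2dem (assumed to be the "raw_dems_dir" entry of the paths manager)
RAW_DEM_DIRECTORY = paths_manager.get_path("raw_dems_dir")

!mkdir -p {RAW_DEM_DIRECTORY}/log
!mv {RAW_DEM_DIRECTORY}/*-log-point2dem-*.txt {RAW_DEM_DIRECTORY}/log 2>/dev/null || true
!rm -f {RAW_DEM_DIRECTORY}/*-tmp-*.tif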

## Step 3: Coregister DEMs

This step coregisters every DEM in a directory to its appropriate reference DEM and returns summary statistics.

The method `iter_coregister_dems` iterates over all DEM files in `input_directory` whose names end with `-DEM.tif`,
selects the appropriate reference DEM and mask based on the site and dataset information,
and applies coregistration using the `coregister_dem` function.

Coregistered DEMs are saved to `output_directory`. If `overwrite` is False, existing output files are skipped.
If `dry_run` is True, no coregistration is performed, only file names and planned operations are printed.
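The `overwrite` and `dry_run` semantics described above can be sketched as follows. This is an illustration under stated assumptions, not the code of `iter_coregister_dems`: the helper names are hypothetical, the output file is assumed to keep the input file name, and `coregister` stands in for any callable that takes an input and an output path.

```python
from pathlib import Path


def plan_coregistration(input_directory, output_directory, overwrite=False):
    """Split the `*-DEM.tif` files into (todo, skipped) lists of (input, output) pairs."""
    input_directory = Path(input_directory)
    output_directory = Path(output_directory)
    todo, skipped = [], []
    for dem in sorted(input_directory.glob("*-DEM.tif")):
        # Assumption: the coregistered DEM keeps the name of the input DEM.
        output = output_directory / dem.name
        if output.exists() and not overwrite:
            skipped.append((dem, output))
        else:
            todo.append((dem, output))
    return todo, skipped


def run_coregistration(input_directory, output_directory, coregister,
                       overwrite=False, dry_run=False):
    """Apply `coregister(input, output)` to every planned DEM, or only print the plan."""
    todo, skipped = plan_coregistration(input_directory, output_directory, overwrite)
    for dem, output in todo:
        if dry_run:
            print(f"[dry run] would coregister {dem.name} -> {output}")
        else:
            coregister(dem, output)
    return todo, skipped
```

Separating the planning from the execution makes the dry run trivially consistent with the real run, since both consume the same `todo` list.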

In [8]:
postproc.iter_coregister_dems(OVERWRITE, DRY_RUN, VERBOSE)

## Step 4: Compute statistics

In [24]:
# compute all statistics
global_statistics = history.postprocessing.statistics.compute_global_statistics(paths_manager)

# save statistics in a csv file
global_statistics.to_csv(paths_manager.get_path("processing_dir") / "postprocessing_statistics.csv")

In [95]:
# compute the statistics per landcover class
landcover_statistics = history.postprocessing.statistics.compute_landcover_statistics(paths_manager)

# save the landcover statistics in a csv file
landcover_statistics.to_csv(paths_manager.get_path("processing_dir") / "landcover_statistics.csv")

## Step 5: Generate plots



In [26]:
global_statistics_path = paths_manager.get_path("processing_dir") / "postprocessing_statistics.csv"
landcover_statistics_path = paths_manager.get_path("processing_dir") / "landcover_statistics.csv"

plot_dirs = {"all": paths_manager.get_path("processing_dir") / "plots", "inliers": paths_manager.get_path("processing_dir") / "plots_inliers"}

stat = pd.read_csv(global_statistics_path, index_col="code")
stat_inliers = stat.loc[stat["inliers"]]

lc_stat = pd.read_csv(landcover_statistics_path)
lc_stat_inliers = lc_stat.loc[lc_stat["code"].isin(stat_inliers.index)]

stat_dict = {"all": stat, "inliers": stat_inliers}
lc_stat_dict = {"all": lc_stat, "inliers": lc_stat_inliers}


# print some information about each statistics DataFrame
for key, df in stat_dict.items():
    print(f"\nSummary with {key} submissions:\n")
    participant_number = len(df["author"].unique())
    submission_number = len(df)

    print(f"Number of participants: {participant_number}")
    print(f"Number of submissions: {submission_number}")

    print("Submissions by site/dataset:\n")
    print(pd.crosstab(df["dataset"], df["site"]))

Generate all statistics plots

In [42]:
for key, df in stat_dict.items():

    stat_dir = plot_dirs[key] / "statistics"
    viz.generate_nmad_groupby(df, stat_dir / "nmad")
    viz.barplot_var(df, stat_dir, "dense_pointcloud_point_count", "Point count")
    viz.barplot_var(df, stat_dir, "ddem_before_nmad", "NMAD before coregistration")
    viz.plot_coregistration_shifts(df, stat_dir / "coreg_shifts")

    # landcover plots
    viz.generate_landcover_boxplot_by_dataset_site(lc_stat_dict[key], stat_dir / "landcover_stats")
    viz.generate_landcover_nmad_by_dataset_site(lc_stat_dict[key], stat_dir / "landcover_stats")
    viz.generate_landcover_grouped_boxplot_by_dataset_site(lc_stat_dict[key], stat_dir / "landcover_stats")
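Several of the plots above report the NMAD (normalized median absolute deviation) of the elevation differences. As a reminder, NMAD is commonly defined as 1.4826 × median(|x − median(x)|), which makes it a robust estimate of the standard deviation for normally distributed data. The sketch below uses only the standard library; whether the `history` package computes it exactly this way is an assumption, and the function name `nmad` is used here for illustration only.

```python
import math
from statistics import median


def nmad(values):
    """Normalized median absolute deviation of `values`, ignoring NaN entries."""
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return math.nan
    center = median(finite)
    # 1.4826 scales the MAD so that it matches the standard deviation
    # of a normal distribution.
    return 1.4826 * median(abs(v - center) for v in finite)
```

For example, `nmad([1, 2, 3, 4, 100])` gives about 1.4826: the gross outlier `100` barely affects the result, whereas it would dominate a standard deviation.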

Generate individual coregistration plots

In [None]:
for key, df in stat_dict.items():
    viz.generate_coregistration_individual_plots(df, plot_dirs[key] / "coregistration_individual_plots")

Generate std DEMs

In [41]:
for key, df in stat_dict.items():
    std_dems_dir = plot_dirs[key] / "std-DEMs"

    # compute the std DEMs and the landcover statistics on them
    history.postprocessing.statistics.generate_std_dems_by_site_dataset(df, std_dems_dir)
    lc_df = history.postprocessing.statistics.compute_landcover_statistics_from_std_dems(paths_manager, std_dems_dir)
    lc_df.to_csv(std_dems_dir / "landcover_statistics.csv", index=False)

    # generate plots
    viz.generate_std_dem_plots(std_dems_dir)
    viz.generate_landcover_grouped_boxplot_from_std_dems(lc_df, std_dems_dir / "grouped-boxplot-landcover-std.png")

Generate all mosaic plots (can take a while)

In [None]:
# maximum number of columns of each mosaic, keyed by (dataset, site)
max_cols = {
    "all": {
        ("aerial", "casa_grande"): 5,
        ("aerial", "iceland"): 4,
        ("kh9mc", "casa_grande"): 4,
        ("kh9mc", "iceland"): 4,
        ("kh9pc", "casa_grande"): 4,
        ("kh9pc", "iceland"): 4,
    },
    "inliers": {
        ("aerial", "casa_grande"): 5,
        ("aerial", "iceland"): 4,
        ("kh9mc", "casa_grande"): 4,
        ("kh9mc", "iceland"): 4,
        ("kh9pc", "casa_grande"): 4,
        ("kh9pc", "iceland"): 4,
    },
}
for key, df in stat_dict.items():
    viz.generate_dems_mosaic(df, plot_dirs[key] / "mosaic-DEMs", max_cols[key])
    viz.generate_ddems_mosaic(df, plot_dirs[key] / "mosaic-DDEMs", max_cols[key])
    viz.generate_hillshades_mosaic(df, plot_dirs[key] / "mosaic-hillshades", max_cols[key])
    viz.generate_slopes_mosaic(df, plot_dirs[key] / "mosaic-slopes", max_cols[key])