# CHM Batch Mapper - Multiple Canopy Height Model Processing

This notebook processes **multiple CHM files** in batch mode and generates high-quality PDF maps for each.

## Batch Processing Features
- Process all CHM files in a folder automatically
- A single vector file, or a folder of vector files, shared across all CHMs
  - **Supported formats:** `.gpkg` (GeoPackage), `.shp` (Shapefile), `.kml`, `.geojson`
- Automatic detection of overlapping vectors for each CHM
- Individual PDF map for each CHM file
- Progress tracking and error handling
- Consistent map styling across all outputs

## 1. Setup and Imports

First, let's import all necessary libraries and our custom CHM mapper module.

In [None]:
# Standard libraries
import os
import glob
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

# Geospatial libraries
import rasterio
from rasterio.mask import mask
from rasterio.transform import array_bounds
import geopandas as gpd
from shapely.geometry import box

# Our custom CHM mapper from modules folder
import importlib
import modules.chm_mapper as chm_mapper
importlib.reload(chm_mapper)
from modules.chm_mapper import CHMMapper

# Display settings
%matplotlib inline
plt.rcParams['figure.dpi'] = 100

print("✓ All imports successful!")
print("✓ Module reloaded - any code changes are now active")

✓ All imports successful!
✓ Module reloaded - any code changes are now active


## 2. Define Batch Processing Paths

**Input:**
- **CHM Folder:** Directory containing multiple CHM GeoTIFF files
- **Vector Path:** Can be either:
  - A single vector file (`.gpkg`, `.shp`, `.kml`, `.geojson`)
  - A folder containing multiple vector files (will auto-discover all supported formats)
- **GeoPackage (`.gpkg`) is recommended for best performance**

**Output:**
- Individual PDF map for each CHM file

In [None]:
# Folder containing multiple CHM GeoTIFF files
chm_folder = r"data/chm_files"  # Update this path to your CHM folder

# Path to your vector data - can be either:
# 1. A single vector file: vector.gpkg, vector.shp, vector.kml, vector.geojson
# 2. A folder containing multiple vector files (will auto-discover all supported formats)
vector_path = r"data/vector_files"  # Update this path to your vector folder or file

# Path to CSV files (optional) - set to None if no CSV files available
# CSV files should have the same name as shapefile (e.g., Arrach.shp -> Arrach.csv)
# Will be used to merge additional data if fields are missing from shapefile
csv_folder = None  # Set to your CSV folder path if needed, e.g., r"data/csv_files"

# Merge key column (must exist in both shapefile and CSV)
MERGE_KEY_COLUMN = "afl"  # Column used to merge shapefile and CSV data

# Output directory for batch maps
output_dir = r"output/batch_maps"
os.makedirs(output_dir, exist_ok=True)

# Custom Field Configuration
# Location Name: extracted from shapefile field (shown once if unique)
LOCATION_FIELD = "name"  # Field for location name (e.g., "Forest_Area_1")

# Subtitle Configuration
# Format: "FIELD1" + "_" + "FIELD2_value1" + "_" + "FIELD2_value2" + ...
# Example with default settings: "Area_1a_1b_2a"
SUBTITLE_USE_CUSTOM_FORMAT = True  # Set to False to use CHM filename
SUBTITLE_FIELD1 = ""  # First field (usually location name, shows once if unique)
SUBTITLE_FIELD2 = "id"  # Second field (shows all unique values, e.g., compartments)

# Clipping Configuration
CLIP_TO_SHAPES = True         # If True, clip CHM to shapes extent
CLIP_BUFFER_METERS = 30       # Buffer (meters) around shapes when clipping

print(f"CHM folder: {chm_folder}")
print(f"Vector path: {vector_path}")
print(f"Output directory: {output_dir}")

CHM folder: D:\Drohnendaten\15_FESMART\01_Daten\06_AELF-Cham\02_Data_output\CHMs\Zandt
Vector path: D:\Drohnendaten\15_FESMART\01_Daten\06_AELF-Cham\00_Planung\Shapes FE\Zandt
CSV folder: D:\Drohnendaten\15_FESMART\01_Daten\06_AELF-Cham\00_Planung\Info
Merge key column: 'afl'
Output directory: D:\Drohnendaten\15_FESMART\01_Daten\06_AELF-Cham\03_Mapping_output\Zandt
Location field: 'name_eigen'
Clip to shapes: True (buffer 30 m)
Subtitle format: '_lage1_lage2_...'

✓ Vector folder detected - searching for vector files...
Found 1 vector file(s):
  1. Zandt.shp

Searching for CHM files...
Found 5 CHM file(s):
  1. 20250909_066_Zandt-13305024_M3E_CHM.tif
  2. 20250909_067_Zandt-17655713_M3E_CHM.tif
  3. 20250909_068_Zandt-5187721_M3E_CHM.tif
  4. 20250909_069_Zandt-14027838_M3E_CHM.tif
  5. 20250909_070_Zandt-1158151_M3E_CHM.tif
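
The listing above comes from a discovery step that is not shown in the configuration cell, yet later cells rely on `vector_files` and `chm_files`. A minimal sketch of how such a step might look, assuming a non-recursive search; `discover_files`, `VECTOR_EXTENSIONS` and `CHM_EXTENSIONS` are hypothetical names and not part of `chm_mapper`:

```python
from pathlib import Path

# Extensions taken from the supported formats listed in this notebook.
VECTOR_EXTENSIONS = (".gpkg", ".shp", ".kml", ".geojson")
CHM_EXTENSIONS = (".tif", ".tiff")  # assumption: CHMs are stored as GeoTIFF


def discover_files(path, extensions):
    """Return a sorted list of files whose suffix is in `extensions`.

    `path` may be a single file or a folder (searched non-recursively).
    Hypothetical helper for illustration only.
    """
    p = Path(path)
    if p.is_file():
        return [str(p)] if p.suffix.lower() in extensions else []
    if p.is_dir():
        return sorted(
            str(f) for f in p.iterdir()
            if f.is_file() and f.suffix.lower() in extensions
        )
    return []  # path does not exist
```

With the variables from the configuration cell, `vector_files = discover_files(vector_path, VECTOR_EXTENSIONS)` and `chm_files = discover_files(chm_folder, CHM_EXTENSIONS)` would produce the two lists used in the following sections.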


## 3. Inspect Vector Data

Let's check the vector data that will be used for the CHMs.

## 3a. Test CSV Merge (Diagnostic)

Let's manually test the CSV merge to verify it's working correctly.

In [108]:
# TEST: Manually check CSV merge for the first vector file
if vector_files and csv_folder:
    test_vector = vector_files[0]
    print(f"Testing with vector file: {os.path.basename(test_vector)}")
    print(f"CSV folder: {csv_folder}\n")
    
    # Load shapefile
    gdf_test = gpd.read_file(test_vector)
    print(f"1. Shapefile columns: {list(gdf_test.columns)}")
    print(f"   Number of features: {len(gdf_test)}")
    if MERGE_KEY_COLUMN in gdf_test.columns:
        print(f"   '{MERGE_KEY_COLUMN}' values: {gdf_test[MERGE_KEY_COLUMN].tolist()}")
    print()
    
    # Look for CSV
    vector_basename = Path(test_vector).stem
    csv_test_path = os.path.join(csv_folder, f"{vector_basename}.csv")
    print(f"2. Looking for CSV: {csv_test_path}")
    print(f"   CSV exists: {os.path.exists(csv_test_path)}\n")
    
    if os.path.exists(csv_test_path):
        # Load CSV (try different encodings and separators for German CSV files)
        csv_test = None
        for encoding in ['utf-8', 'cp1252', 'iso-8859-1', 'latin1']:
            for sep in [';', ',']:
                try:
                    csv_test = pd.read_csv(csv_test_path, encoding=encoding, sep=sep)
                    # Check if it was parsed correctly (more than 1 column)
                    if len(csv_test.columns) > 1:
                        print(f"3. CSV loaded successfully with encoding: {encoding}, separator: '{sep}'")
                        break
                except Exception:  # covers UnicodeDecodeError and parser errors
                    continue
            if csv_test is not None and len(csv_test.columns) > 1:
                break
        
        if csv_test is None or len(csv_test.columns) <= 1:
            print("   ✗ Could not read CSV with any standard encoding/separator")
        else:
            print(f"   CSV columns: {list(csv_test.columns)}")
            print(f"   Number of rows: {len(csv_test)}")
            if MERGE_KEY_COLUMN in csv_test.columns:
                print(f"   '{MERGE_KEY_COLUMN}' values: {csv_test[MERGE_KEY_COLUMN].tolist()}")
            print(f"\n   CSV Preview:")
            display(csv_test.head())
            
            # Try merge
            print(f"\n4. Attempting merge on '{MERGE_KEY_COLUMN}'...")
            if MERGE_KEY_COLUMN in gdf_test.columns and MERGE_KEY_COLUMN in csv_test.columns:
                gdf_merged_test = gdf_test.merge(csv_test, on=MERGE_KEY_COLUMN, how='left', suffixes=('', '_csv'))
                print(f"   ✓ Merge successful!")
                print(f"   Merged columns: {list(gdf_merged_test.columns)}")
                
                # Check for 'lage'
                if 'lage' in gdf_merged_test.columns:
                    print(f"\n   ✓✓✓ 'lage' column is present!")
                    print(f"   'lage' values: {gdf_merged_test['lage'].dropna().tolist()}")
                else:
                    print(f"\n   ✗✗✗ 'lage' column NOT found after merge")
                
                print(f"\n   Merged data preview:")
                display(gdf_merged_test[[col for col in gdf_merged_test.columns if col != 'geometry']].head())
            else:
                print(f"   ✗ Merge key '{MERGE_KEY_COLUMN}' not found in both files")
    else:
        print(f"   ⚠ CSV file not found!")
        print(f"\n   Files in CSV folder:")
        if os.path.isdir(csv_folder):
            csv_files = [f for f in os.listdir(csv_folder) if f.endswith('.csv')]
            for f in csv_files:
                print(f"      - {f}")
else:
    print("⚠ No vector files or CSV folder not configured")

Testing with vector file: Zandt.shp
CSV folder: D:\Drohnendaten\15_FESMART\01_Daten\06_AELF-Cham\00_Planung\Info

1. Shapefile columns: ['objectid', 'gemeinde', 'gmkgcode', 'zaehler', 'nenner', 'zaeh_nenn', 'afl', 'egtid', 'name_eigen', 'vorname', 'anrede', 'namensbest', 'akademisch', 'geburtsnam', 'geburtsdat', 'strassehau', 'plz', 'ort', 'herkunft', 'amtsgerich', 'grundbuchb', 'miteigentu', 'artrechtsg', 'anteileige', 'buchungsar', 'lfdnrbesta', 'flstkennz', 'gbbz', 'blatt', 'aelf_kurz', 'globalid', 'geometry']
   Number of features: 9
   'afl' values: [24170, 7465, 480, 5170, 4305, 8417, 2476, 71096, 1500]

2. Looking for CSV: D:\Drohnendaten\15_FESMART\01_Daten\06_AELF-Cham\00_Planung\Info\Zandt.csv
   CSV exists: True

3. CSV loaded successfully with encoding: cp1252, separator: ';'
   CSV columns: ['Gemeinde', 'gmkgcode', 'zaehler', 'nenner', 'afl', 'lage', 'gemeinde', 'zaeh_nenn', 'Fläche [qm]:']
   Number of rows: 9
   'afl' values: [5170, 8417, 71096, 7465, 480, 4305, 2476, 1

Unnamed: 0,Gemeinde,gmkgcode,zaehler,nenner,afl,lage,gemeinde,zaeh_nenn,Fläche [qm]:
0,Zandt,5154,371,6,5170,Auf der Riesel,372177,371/6,"5.167,59"
1,,5154,371,39,8417,Auholz,372177,371/39,"8.425,11"
2,,5161,314,34,71096,Pfahlholz,372143,314/34,"71.693,80"
3,,5154,247,1,7465,Aufelder,372177,247/1,"7.471,83"
4,,5154,789,2,480,Ochsenweide,372177,789/2,48288



4. Attempting merge on 'afl'...
   ✓ Merge successful!
   Merged columns: ['objectid', 'gemeinde', 'gmkgcode', 'zaehler', 'nenner', 'zaeh_nenn', 'afl', 'egtid', 'name_eigen', 'vorname', 'anrede', 'namensbest', 'akademisch', 'geburtsnam', 'geburtsdat', 'strassehau', 'plz', 'ort', 'herkunft', 'amtsgerich', 'grundbuchb', 'miteigentu', 'artrechtsg', 'anteileige', 'buchungsar', 'lfdnrbesta', 'flstkennz', 'gbbz', 'blatt', 'aelf_kurz', 'globalid', 'geometry', 'Gemeinde', 'gmkgcode_csv', 'zaehler_csv', 'nenner_csv', 'lage', 'gemeinde_csv', 'zaeh_nenn_csv', 'Fläche [qm]:']

   ✓✓✓ 'lage' column is present!
   'lage' values: ['Hohe Rieder', 'Aufelder', 'Ochsenweide', 'Auf der Riesel', 'Berghäusl', 'Auholz', 'Hohe Rieder', 'Pfahlholz', 'Berghäusl']

   Merged data preview:


Unnamed: 0,objectid,gemeinde,gmkgcode,zaehler,nenner,zaeh_nenn,afl,egtid,name_eigen,vorname,...,aelf_kurz,globalid,Gemeinde,gmkgcode_csv,zaehler_csv,nenner_csv,lage,gemeinde_csv,zaeh_nenn_csv,Fläche [qm]:
0,1158151,372177,5154,794,0,794/0,24170,095154_1210_1,Gemeinde Zandt,,...,ch,{AF1ACD63-6327-49E9-A492-416D3BD5043E},,5154,794,0,Hohe Rieder,372177,794/0,"24.345,41"
1,5187721,372177,5154,247,1,247/1,7465,095154_1210_1,Gemeinde Zandt,,...,ch,{CE64DB4E-728A-4FBF-8821-59D7821819DA},,5154,247,1,Aufelder,372177,247/1,"7.471,83"
2,7896069,372177,5154,789,2,789/2,480,095154_1210_1,Gemeinde Zandt,,...,ch,{1D9636EF-89D5-4C13-9576-38F86FF02784},,5154,789,2,Ochsenweide,372177,789/2,48288
3,13305024,372177,5154,371,6,371/6,5170,095154_1210_1,Gemeinde Zandt,,...,ch,{195B01BB-49D5-4D20-B1CD-9DA3ECBD5B59},Zandt,5154,371,6,Auf der Riesel,372177,371/6,"5.167,59"
4,13437600,372177,5154,786,0,786/0,4305,095154_1210_1,Gemeinde Zandt,,...,ch,{43A29E44-F662-4FFB-A690-62E4D050948D},,5154,786,0,Berghäusl,372177,786/0,"4.309,08"
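
The diagnostic above reports a successful merge whenever both tables contain the key column, which does not by itself show that every feature found a CSV row. A small sketch of a complementary check follows. `unmatched_keys` is a hypothetical helper, and comparing keys as stripped strings is an assumption made here to sidestep integer/string differences between a shapefile and a CSV:

```python
def unmatched_keys(left_keys, right_keys):
    """Return keys from `left_keys` that have no counterpart in `right_keys`.

    Keys are compared as stripped strings so that 480 and "480 " match.
    None values are ignored; order of first appearance is kept.
    """
    def normalise(value):
        return str(value).strip()

    right = {normalise(k) for k in right_keys if k is not None}
    seen = set()
    missing = []
    for key in left_keys:
        if key is None:
            continue
        norm = normalise(key)
        if norm not in right and norm not in seen:
            seen.add(norm)
            missing.append(norm)
    return missing


# Illustrative values only, not taken from the real files
shape_keys = [24170, 7465, 480, 1500]
csv_keys = ["7465", "480 ", "24170"]
print(unmatched_keys(shape_keys, csv_keys))  # ['1500']
```

In the notebook this could be called with `gdf_test[MERGE_KEY_COLUMN].tolist()` and `csv_test[MERGE_KEY_COLUMN].tolist()`, assuming neither column contains missing values; an empty result would mean every feature has a matching CSV row.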


In [109]:
# Load and inspect vector data
if vector_files:
    print(f"Loading {len(vector_files)} vector file(s)...\n")
    
    for i, vf in enumerate(vector_files, 1):
        print(f"Vector File {i}: {os.path.basename(vf)}")
        gdf = gpd.read_file(vf)
        print(f"  CRS: {gdf.crs}")
        print(f"  Features: {len(gdf)}")
        print(f"  Bounds: {gdf.total_bounds}")
        print(f"  Columns: {list(gdf.columns)}\n")
    
    # Show preview of first vector file
    print("Preview of first vector file:")
    gdf_first = gpd.read_file(vector_files[0])
    display(gdf_first.head())
else:
    print("⚠ No vector files found!")

Loading 1 vector file(s)...

Vector File 1: Zandt.shp
  CRS: EPSG:25832
  Features: 9
  Bounds: [ 770837.0501 5449162.8901  773254.2001 5453261.4201]
  Columns: ['objectid', 'gemeinde', 'gmkgcode', 'zaehler', 'nenner', 'zaeh_nenn', 'afl', 'egtid', 'name_eigen', 'vorname', 'anrede', 'namensbest', 'akademisch', 'geburtsnam', 'geburtsdat', 'strassehau', 'plz', 'ort', 'herkunft', 'amtsgerich', 'grundbuchb', 'miteigentu', 'artrechtsg', 'anteileige', 'buchungsar', 'lfdnrbesta', 'flstkennz', 'gbbz', 'blatt', 'aelf_kurz', 'globalid', 'geometry']

Preview of first vector file:


Unnamed: 0,objectid,gemeinde,gmkgcode,zaehler,nenner,zaeh_nenn,afl,egtid,name_eigen,vorname,...,artrechtsg,anteileige,buchungsar,lfdnrbesta,flstkennz,gbbz,blatt,aelf_kurz,globalid,geometry
0,1158151,372177,5154,794,0,794/0,24170,095154_1210_1,Gemeinde Zandt,,...,,,1100,208,095154___00794______,95154,1210,ch,{AF1ACD63-6327-49E9-A492-416D3BD5043E},"MULTIPOLYGON (((772924.97 5453020.18, 772929.9..."
1,5187721,372177,5154,247,1,247/1,7465,095154_1210_1,Gemeinde Zandt,,...,,,1100,73,095154___002470001__,95154,1210,ch,{CE64DB4E-728A-4FBF-8821-59D7821819DA},"POLYGON ((770941.58 5451489.42, 770951.67 5451..."
2,7896069,372177,5154,789,2,789/2,480,095154_1210_1,Gemeinde Zandt,,...,,,1100,399,095154___007890002__,95154,1210,ch,{1D9636EF-89D5-4C13-9576-38F86FF02784},"POLYGON ((773240.02 5452981.84, 773254.2 54529..."
3,13305024,372177,5154,371,6,371/6,5170,095154_1210_1,Gemeinde Zandt,,...,,,1100,101,095154___003710006__,95154,1210,ch,{195B01BB-49D5-4D20-B1CD-9DA3ECBD5B59},"POLYGON ((771080.44 5450302.71, 771077.2 54503..."
4,13437600,372177,5154,786,0,786/0,4305,095154_1210_1,Gemeinde Zandt,,...,,,1100,397,095154___00786______,95154,1210,ch,{43A29E44-F662-4FFB-A690-62E4D050948D},"POLYGON ((773147.05 5452892.57, 773169.77 5452..."


## 4. Quick Preview of One CHM

Let's inspect the first CHM file to verify everything is working.

In [110]:
if chm_files:
    preview_chm = chm_files[0]
    print(f"Preview of: {os.path.basename(preview_chm)}")
    
    with rasterio.open(preview_chm) as src:
        print(f"\nCHM Raster Information:")
        print(f"  CRS: {src.crs}")
        print(f"  Dimensions: {src.width} x {src.height}")
        print(f"  Bounds: {src.bounds}")
        print(f"  Resolution: {src.res}")
        print(f"  NoData value: {src.nodata}")
        
        # Read and calculate statistics
        chm_data = src.read(1, masked=True)
        if hasattr(chm_data, 'filled'):
            chm_data = np.where(chm_data.mask, np.nan, chm_data.data)
        
        valid_data = chm_data[~np.isnan(chm_data)]
        print(f"\n  Height range: {np.min(valid_data):.2f} - {np.max(valid_data):.2f} m")
        print(f"  Mean height: {np.mean(valid_data):.2f} m")
        print(f"  Valid pixels: {len(valid_data):,} / {chm_data.size:,} ({len(valid_data)/chm_data.size*100:.1f}%)")
else:
    print("No CHM files found!")

Preview of: 20250909_066_Zandt-13305024_M3E_CHM.tif

CHM Raster Information:
  CRS: EPSG:25832
  Dimensions: 7830 x 7651
  Bounds: BoundingBox(left=770901.6136531901, bottom=5450141.332445897, right=771284.596175752, top=5450515.559684738)
  Resolution: (0.048912199560908476, 0.048912199560897075)
  NoData value: -9999.0

  Height range: -5.53 - 35.89 m
  Mean height: 13.28 m
  Valid pixels: 39,206,635 / 59,907,330 (65.4%)


## 5. Map Configuration (Same for All CHMs)

Define the map styling that will be applied to all batch-generated maps.

In [111]:
# Map Element Configuration
# This configuration will be used for ALL CHM files in the batch
MAP_CONFIG = {
    # ========== TEXT CONTENT (Change all text labels here!) ==========
    'title': 'Baumhöhenkarte',           # Main title text (can be overridden per CHM)
    'subtitle': '',                      # Will be set automatically from CHM filename
    'legend_title': 'Legende',           # Legend title text
    'legend_subtitle1': '',  # First line below legend title
    'legend_subtitle2': 'Drohnenbefliegung 2025',  # Second line below legend title
    'overview_label': 'Übersicht',       # Text below overview map
    'location_name': 'Gemeinde Parsberg',  # Text above overview map
    
    # ========== TITLE AND SUBTITLE ==========
    'title_fontsize': 16,
    'subtitle_fontsize': 11,
    'title_position_x': 0.82,
    'title_position_y': 0.95,
    'title_align': 'left',
    
    # ========== LEGEND ==========
    'legend_fontsize': 8,
    'legend_title_fontsize': 11,
    'legend_subtitle_fontsize': 8,
    'legend_ncol': 1,
    'legend_position_x': 0.88,
    'legend_position_y': 0.63,
    'legend_loc': 'center',
    
    # Legend size controls
    'legend_labelspacing': 0.8,
    'legend_handlelength': 1.5,
    'legend_handletextpad': 0.81,
    'legend_columnspacing': 1.0,
    'legend_border_linewidth': 0,
    
    # Custom background box for legend
    'legend_background_box_x': 0.815,
    'legend_background_box_y': 0.18,
    'legend_background_box_width': 0.17,
    'legend_background_box_height': 0.60,
    'legend_background_box_linewidth': 1,
    
    # ========== NORTH ARROW ==========
    'north_arrow_fontsize': 14,
    'north_arrow_position_x': 0.97,
    'north_arrow_position_y': 0.22,
    'north_arrow_length': 0.015,
    'north_arrow_width': 2,
    'north_arrow_pad': 0.3,
    
    # ========== OVERVIEW/LOCATOR MAP ==========
    'overview_position_x': 0.90,
    'overview_position_y': 0.27,
    'overview_width': 0.15,
    'overview_height': 0.15,
    'overview_fontsize': 7,
    'overview_border_width': 0.8,
    'overview_chm_box_width': 1.5,
    
    # Location name text
    'location_fontsize': 10,
    'location_y_offset': 0.005,
    
    # ========== SCALE BAR ==========
    'scalebar_fontsize': 9,
    'scalebar_height': 0.015,
    'scalebar_length': 0.22,
    'scalebar_location': 'lower right',
    'scalebar_pad_x': 0.02,  # keep snug to lower-right
    'scalebar_pad_y': 4,     # point gap between number and bar
    
    # ========== SCALE TEXT ==========
    'scale_text_fontsize': 10,
    'scale_text_position_x': 0.91,
    'scale_text_position_y': 0.02,
    'scale_text_align': 'left',
    
    # ========== VISUAL STYLING ==========
    'vector_linewidth': 1.5,
    'border_linewidth': 0,
    
    # ========== MAP POSITION ON PAGE ==========
    'map_left': 0.02,
    'map_right': 0.8,
    'map_top': 0.98,
    'map_bottom': 0.02,
    
    # ========== DEBUGGING ==========
    'show_box_border': True
}

print("✓ Configuration loaded")
print("This configuration will be applied to all CHM files in the batch")

✓ Configuration loaded
This configuration will be applied to all CHM files in the batch
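
Because every map shares `MAP_CONFIG`, per-file values such as the subtitle are best written to a copy rather than to the shared dictionary. A minimal sketch of that pattern, using a hypothetical helper `config_for` and a shortened stand-in dictionary `BASE_CONFIG`:

```python
# Shortened stand-in for MAP_CONFIG (illustration only)
BASE_CONFIG = {"title": "Baumhöhenkarte", "subtitle": "", "title_fontsize": 16}


def config_for(base, **overrides):
    """Return a new dict: a copy of `base` with `overrides` applied.

    Unknown keys raise KeyError so that typos do not silently create
    settings that are never read.
    """
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    return {**base, **overrides}


cfg = config_for(BASE_CONFIG, subtitle="Auholz")
print(cfg["subtitle"])               # Auholz
print(repr(BASE_CONFIG["subtitle"]))  # '' (the shared base is unchanged)
```

A shallow copy is sufficient here because all values in the configuration are plain strings, numbers or booleans.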


## 6. Batch Processing Function

This function processes each CHM file and generates a map.
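
Inside `process_single_chm`, a geometry counts as overlapping when its bounding box is not entirely to one side of the CHM bounding box. A standalone sketch of that test, using plain `(minx, miny, maxx, maxy)` tuples and a hypothetical function name `bboxes_overlap`:

```python
def bboxes_overlap(a, b):
    """Return True if two (minx, miny, maxx, maxy) boxes touch or intersect.

    The boxes are disjoint only if one lies strictly left of, right of,
    below, or above the other.
    """
    a_minx, a_miny, a_maxx, a_maxy = a
    b_minx, b_miny, b_maxx, b_maxy = b
    disjoint = (
        a_maxx < b_minx or a_minx > b_maxx or
        a_maxy < b_miny or a_miny > b_maxy
    )
    return not disjoint


chm_box = (0.0, 0.0, 10.0, 10.0)
print(bboxes_overlap((5.0, 5.0, 15.0, 15.0), chm_box))   # True, partial overlap
print(bboxes_overlap((10.0, 0.0, 20.0, 10.0), chm_box))  # True, shared edge
print(bboxes_overlap((11.0, 0.0, 20.0, 10.0), chm_box))  # False, gap of 1 unit
```

Because only bounding boxes are compared, a geometry whose box overlaps the CHM extent while its actual outline does not will still be selected; an exact test would need a geometric intersection.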

In [112]:
def load_and_merge_vector_csv(vector_path, csv_folder, merge_key):
    """
    Load vector file and merge with CSV data if available.
    """
    gdf = gpd.read_file(vector_path)
    print(f"       Vector columns before merge: {list(gdf.columns)}")
    
    if csv_folder is None or not os.path.isdir(csv_folder):
        return gdf
    
    vector_basename = Path(vector_path).stem
    csv_path = os.path.join(csv_folder, f"{vector_basename}.csv")
    
    if not os.path.exists(csv_path):
        print(f"       No CSV file found at: {csv_path}")
        return gdf
    
    try:
        csv_df = None
        for encoding in ['utf-8', 'cp1252', 'iso-8859-1', 'latin1']:
            for sep in [';', ',']:
                try:
                    csv_df = pd.read_csv(csv_path, encoding=encoding, sep=sep)
                    if len(csv_df.columns) > 1:
                        print(f"       ✓ Found matching CSV: {os.path.basename(csv_path)} (encoding: {encoding}, separator: '{sep}')")
                        break
                except Exception:  # covers UnicodeDecodeError and parser errors
                    continue
            if csv_df is not None and len(csv_df.columns) > 1:
                break
        
        if csv_df is None or len(csv_df.columns) <= 1:
            print(f"       ⚠ Warning: Could not read CSV. Skipping merge.")
            return gdf
        
        print(f"       CSV columns: {list(csv_df.columns)}")
        
        if merge_key not in gdf.columns:
            print(f"       ⚠ Warning: Merge key '{merge_key}' not found in shapefile.")
            return gdf
        
        if merge_key not in csv_df.columns:
            print(f"       ⚠ Warning: Merge key '{merge_key}' not found in CSV.")
            return gdf
        
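        # indicator=True adds a '_merge' column; in a left join the key itself is never null,
        # so counting non-null keys would report every feature as matched
        gdf_merged = gdf.merge(csv_df, on=merge_key, how='left', suffixes=('', '_csv'), indicator=True)
        merged_count = int((gdf_merged['_merge'] == 'both').sum())
        gdf_merged = gdf_merged.drop(columns='_merge')
        print(f"       ✓ Merged CSV data: {merged_count} features matched on '{merge_key}'")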
        
        if 'lage' in gdf_merged.columns:
            lage_values = gdf_merged['lage'].dropna().unique()
            print(f"       'lage' unique values (raw): {lage_values}")
        
        return gdf_merged
    
    except Exception as e:
        print(f"       ⚠ Error merging CSV: {e}")
        return gdf


def get_unique_filename(base_filename, used_filenames):
    """
    Generate a unique filename by adding _2, _3, etc. suffix if needed.
    """
    if base_filename not in used_filenames:
        used_filenames[base_filename] = 1
        return base_filename
    else:
        used_filenames[base_filename] += 1
        count = used_filenames[base_filename]
        return f"{base_filename}_{count}"


def get_valid_field_values(gdf, field_name):
    """
    Get valid (non-empty, non-whitespace) values from a GeoDataFrame column.
    Returns list of valid string values.
    """
    if field_name not in gdf.columns:
        return []
    
    values = gdf[field_name].dropna().unique()
    valid_values = []
    for v in values:
        str_val = str(v).strip()
        # Check if it's valid: non-empty, not 'nan', not 'none', not just whitespace
        if str_val and str_val.lower() not in ('nan', 'none', 'null', ''):
            valid_values.append(str_val)
    
    return valid_values


def process_single_chm(chm_path, vector_files, output_dir, config, csv_folder=None, 
                       merge_key="afl", use_custom_subtitle=True, field1="ort", 
                       field2="lage", location_field="name_eigen", clip_to_shapes=False, 
                       clip_buffer=10, index=None, total=None, used_filenames=None):
    """
    Process a single CHM file and generate a PDF map.
    """
    if used_filenames is None:
        used_filenames = {}
    
    try:
        chm_filename = Path(chm_path).stem
        
        if index and total:
            print(f"\n{'='*80}")
            print(f"Processing [{index}/{total}]: {chm_filename}")
            print(f"{'='*80}")
        else:
            print(f"\nProcessing: {chm_filename}")
        
        if isinstance(vector_files, str):
            vector_files_list = [vector_files]
        else:
            vector_files_list = vector_files
        
        print(f"  1. Checking {len(vector_files_list)} vector file(s) for overlap...")
        with rasterio.open(chm_path) as src:
            chm_bounds = src.bounds
            chm_crs = src.crs
        
        all_overlapping_gdfs = []
        matching_vector_files = []
        
        for vf in vector_files_list:
            gdf = load_and_merge_vector_csv(vf, csv_folder, merge_key)
            
            if gdf.crs != chm_crs:
                gdf_reprojected = gdf.to_crs(chm_crs)
            else:
                gdf_reprojected = gdf
            
            overlapping_indices = []
            for idx, geom in enumerate(gdf_reprojected.geometry):
                geom_bounds = geom.bounds
                overlaps = not (geom_bounds[2] < chm_bounds.left or 
                               geom_bounds[0] > chm_bounds.right or
                               geom_bounds[3] < chm_bounds.bottom or 
                               geom_bounds[1] > chm_bounds.top)
                if overlaps:
                    overlapping_indices.append(idx)
            
            if overlapping_indices:
                overlapping_gdf = gdf_reprojected.iloc[overlapping_indices].copy()
                all_overlapping_gdfs.append(overlapping_gdf)
                matching_vector_files.append(vf)
                print(f"     ✓ {os.path.basename(vf)}: {len(overlapping_indices)} geometry(ies)")
        
        if not all_overlapping_gdfs:
            print("  ⚠ Warning: No overlapping vectors found for this CHM. Skipping.")
            return None
        
        combined_gdf = gpd.GeoDataFrame(pd.concat(all_overlapping_gdfs, ignore_index=True))
        print(f"  Total overlapping geometries: {len(combined_gdf)}")
        
        print("  2. Loading CHM data...")
        vector_path_for_init = matching_vector_files[0]
        mapper = CHMMapper(chm_path, vector_path_for_init)
        mapper.load_data()
        
        if clip_to_shapes:
            try:
                buffered_geoms = combined_gdf.geometry.buffer(clip_buffer)
                shapes = [geom.__geo_interface__ for geom in buffered_geoms if not geom.is_empty]
                if shapes:
                    print(f"  2b. Clipping CHM to shapes (+{clip_buffer} m buffer)...")
                    with rasterio.open(chm_path) as src:
                        clipped_data, clipped_transform = mask(src, shapes, crop=True, nodata=src.nodata)
                    mapper.chm_data = clipped_data[0]
                    mapper.chm_transform = clipped_transform
                    bounds = array_bounds(mapper.chm_data.shape[0], mapper.chm_data.shape[1], clipped_transform)
                    mapper.chm_bounds = rasterio.coords.BoundingBox(*bounds)
            except Exception as e:
                print(f"  ⚠ Clip failed ({e}); using full CHM.")
        
        print(f"  3. Using combined geometries from {len(matching_vector_files)} vector file(s)...")
        mapper.vector_gdf = combined_gdf
        
        chm_config = config.copy()
        
        # Extract location_name
        if location_field in combined_gdf.columns:
            location_values = get_valid_field_values(combined_gdf, location_field)
            if location_values:
                chm_config['location_name'] = location_values[0]
                print(f"  Location: {chm_config['location_name']}")
        
        # Create subtitle and filename
        subtitle_for_filename = None
        if use_custom_subtitle:
            try:
                subtitle_parts = []
                has_field1 = field1 in combined_gdf.columns
                has_field2 = field2 in combined_gdf.columns
                
                print(f"  Checking subtitle fields: field1='{field1}' (exists: {has_field1}), field2='{field2}' (exists: {has_field2})")
                
                if has_field1 or has_field2:
                    if has_field1:
                        field1_values = get_valid_field_values(combined_gdf, field1)
                        if len(field1_values) == 1:
                            subtitle_parts.append(field1_values[0])
                        elif len(field1_values) > 1:
                            subtitle_parts.extend(sorted(field1_values))
                    
                    if has_field2:
                        field2_values = get_valid_field_values(combined_gdf, field2)
                        print(f"  field2 ('{field2}') valid values: {field2_values}")
                        if field2_values:
                            sorted_field2 = sorted(field2_values)
                            if len(sorted_field2) > 4:
                                subtitle_parts.extend(sorted_field2[:4])
                                subtitle_parts.append('...')
                            else:
                                subtitle_parts.extend(sorted_field2)
                        else:
                            # FALLBACK: use merge_key (afl) instead of field2 (lage)
                            print(f"  ⚠ No valid '{field2}' values. Falling back to '{merge_key}'...")
                            fallback_values = get_valid_field_values(combined_gdf, merge_key)
                            if fallback_values:
                                sorted_fallback = sorted(fallback_values)
                                if len(sorted_fallback) > 4:
                                    subtitle_parts.extend(sorted_fallback[:4])
                                    subtitle_parts.append('...')
                                else:
                                    subtitle_parts.extend(sorted_fallback)
                                print(f"  Using '{merge_key}' values: {fallback_values}")
                    
                    if subtitle_parts:
                        subtitle = "_".join(subtitle_parts)
                        chm_config['subtitle'] = subtitle
                        subtitle_for_filename = subtitle
                        print(f"  Subtitle: {subtitle}")
                    else:
                        chm_config['subtitle'] = chm_filename
                        subtitle_for_filename = chm_filename
                        print(f"  ⚠ No valid values in '{field1}'/'{field2}'. Using CHM filename.")
                else:
                    chm_config['subtitle'] = chm_filename
                    subtitle_for_filename = chm_filename
                    print(f"  ⚠ Fields not found. Using CHM filename.")
            except Exception as e:
                chm_config['subtitle'] = chm_filename
                subtitle_for_filename = chm_filename
                print(f"  ⚠ Error creating subtitle: {e}. Using CHM filename.")
        else:
            chm_config['subtitle'] = chm_filename
            subtitle_for_filename = chm_filename
        
        # Create unique filename (avoid overwriting)
        print("  4. Creating map...")
        base_filename = f"Baumhoehenkarte_{subtitle_for_filename}"
        unique_filename = get_unique_filename(base_filename, used_filenames)
        
        if unique_filename != base_filename:
            print(f"  ⚠ Filename '{base_filename}.pdf' already used, using '{unique_filename}.pdf' instead.")
        
        output_filename = f"{unique_filename}.pdf"
        output_path = os.path.join(output_dir, output_filename)
        
        mapper.create_map(
            data=mapper.chm_data,
            transform=mapper.chm_transform,
            bounds=mapper.chm_bounds,
            output_path=output_path,
            figsize=(16.53, 11.69),
            dpi=300,
            add_overview=True,
            config=chm_config
        )
        
        print(f"  ✓ Success! Saved to: {output_path}")
        return True
        
    except Exception as e:
        print(f"  ✗ Error processing {chm_filename}: {str(e)}")
        import traceback
        traceback.print_exc()
        return False

print("✓ Batch processing function defined (with duplicate filename handling)")

✓ Batch processing function defined (with duplicate filename handling)
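
The subtitle, and with it the output filename, is assembled from sorted field values, with the second field truncated after four entries and everything joined by underscores. The sketch below isolates that logic; `build_subtitle` and `max_values` are hypothetical names, and the sample values are illustrative:

```python
def build_subtitle(field1_values, field2_values, max_values=4):
    """Join field values into a subtitle such as 'Area_1a_1b_2a'.

    field1 values are all kept (sorted); field2 values are sorted and cut
    off after `max_values`, with '...' marking the truncation.
    Returns None when there is nothing to join.
    """
    parts = sorted(field1_values)
    second = sorted(field2_values)
    if len(second) > max_values:
        parts.extend(second[:max_values])
        parts.append("...")
    else:
        parts.extend(second)
    return "_".join(parts) if parts else None


print(build_subtitle(["Area"], ["2a", "1b", "1a"]))   # Area_1a_1b_2a
print(build_subtitle([], ["e", "d", "c", "b", "a"]))  # a_b_c_d_...
print(build_subtitle([], []))                         # None
```

Since the subtitle becomes part of the PDF filename, field values containing path separators such as `/` would need sanitising before the output path is built.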


## 7. Run Batch Processing

Process all CHM files and generate PDF maps.

In [113]:
# Run batch processing
print(f"\n" + "="*80)
print(f"BATCH PROCESSING STARTED")
print(f"Total CHM files to process: {len(chm_files)}")
print(f"Total vector files available: {len(vector_files)}")
print(f"CSV folder: {csv_folder if csv_folder else 'None (not used)'}")
print(f"Merge key column: '{MERGE_KEY_COLUMN}'")
print(f"Location field: '{LOCATION_FIELD}'")
print(f"Clip to shapes: {CLIP_TO_SHAPES} (buffer {CLIP_BUFFER_METERS} m)")
if SUBTITLE_USE_CUSTOM_FORMAT:
    print(f"Subtitle format: '{SUBTITLE_FIELD1}_{SUBTITLE_FIELD2}1_{SUBTITLE_FIELD2}2_...'")
else:
    print(f"Subtitle format: CHM filename")
print(f"Output directory: {output_dir}")
print(f"="*80)

start_time = datetime.now()
success_count = 0
failed_count = 0
skipped_count = 0

# Track used filenames to avoid duplicates (shared across all CHM files)
used_filenames = {}

# Process each CHM file
for i, chm_file in enumerate(chm_files, 1):
    result = process_single_chm(
        chm_path=chm_file,
        vector_files=vector_files,
        output_dir=output_dir,
        config=MAP_CONFIG,
        csv_folder=csv_folder,
        merge_key=MERGE_KEY_COLUMN,
        use_custom_subtitle=SUBTITLE_USE_CUSTOM_FORMAT,
        field1=SUBTITLE_FIELD1,
        field2=SUBTITLE_FIELD2,
        location_field=LOCATION_FIELD,
        clip_to_shapes=CLIP_TO_SHAPES,
        clip_buffer=CLIP_BUFFER_METERS,
        index=i,
        total=len(chm_files),
        used_filenames=used_filenames  # Pass shared dict to track duplicates
    )
    
    if result is True:
        success_count += 1
    elif result is None:
        skipped_count += 1
    else:
        failed_count += 1

# Summary
end_time = datetime.now()
duration = end_time - start_time

print("\n" + "="*80)
print("BATCH PROCESSING COMPLETED")
print("="*80)
print(f"Total files processed: {len(chm_files)}")
print(f"  ✓ Successful: {success_count}")
print(f"  ⚠ Skipped (no overlap): {skipped_count}")
print(f"  ✗ Failed: {failed_count}")
print(f"\nDuration: {duration}")
print(f"Average time per file: {duration / len(chm_files) if chm_files else 0}")
print(f"\nOutput directory: {output_dir}")


BATCH PROCESSING STARTED
Total CHM files to process: 5
Total vector files available: 1
CSV folder: D:\Drohnendaten\15_FESMART\01_Daten\06_AELF-Cham\00_Planung\Info
Merge key column: 'afl'
Location field: 'name_eigen'
Clip to shapes: True (buffer 30 m)
Subtitle format: '_lage1_lage2_...'
Output directory: D:\Drohnendaten\15_FESMART\01_Daten\06_AELF-Cham\03_Mapping_output\Zandt

Processing [1/5]: 20250909_066_Zandt-13305024_M3E_CHM
  1. Checking 1 vector file(s) for overlap...
       Vector columns before merge: ['objectid', 'gemeinde', 'gmkgcode', 'zaehler', 'nenner', 'zaeh_nenn', 'afl', 'egtid', 'name_eigen', 'vorname', 'anrede', 'namensbest', 'akademisch', 'geburtsnam', 'geburtsdat', 'strassehau', 'plz', 'ort', 'herkunft', 'amtsgerich', 'grundbuchb', 'miteigentu', 'artrechtsg', 'anteileige', 'buchungsar', 'lfdnrbesta', 'flstkennz', 'gbbz', 'blatt', 'aelf_kurz', 'globalid', 'geometry']
       ✓ Found matching CSV: Zandt.csv (encoding: cp1252, separator: ';')
       CSV columns: ['Ge

## 8. List Generated Maps

Show all the PDF files that were created.

In [114]:
# List all PDF files in output directory
pdf_files = glob.glob(os.path.join(output_dir, "*.pdf"))
pdf_files = sorted(pdf_files)

print(f"Generated PDF maps ({len(pdf_files)} files):\n")
for i, pdf_file in enumerate(pdf_files, 1):
    file_size = os.path.getsize(pdf_file) / (1024 * 1024)  # MB
    print(f"  {i}. {os.path.basename(pdf_file)} ({file_size:.2f} MB)")

if pdf_files:
    print(f"\n✓ All maps saved in: {output_dir}")
else:
    print("\n⚠ No PDF files were generated.")

Generated PDF maps (5 files):

  1. Baumhoehenkarte_Auf der Riesel_geo.pdf (0.60 MB)
  2. Baumhoehenkarte_Aufelder_geo.pdf (0.38 MB)
  3. Baumhoehenkarte_Auholz_geo.pdf (0.57 MB)
  4. Baumhoehenkarte_Berghäusl_Hohe Rieder_Ochsenweide_geo.pdf (0.53 MB)
  5. Baumhoehenkarte_Pfahlholz_geo.pdf (0.52 MB)

✓ All maps saved in: D:\Drohnendaten\15_FESMART\01_Daten\06_AELF-Cham\03_Mapping_output\Zandt


---

## 📝 Usage Examples

### Option 1: Single Vector File
```python
vector_path = r"D:\path\to\boundaries.gpkg"  # or .shp, .kml, .geojson
```

### Option 2: Vector Folder (Auto-Discovery)
```python
vector_path = r"D:\path\to\vector_folder"  # Will find all .gpkg, .shp, .kml, .geojson files
```
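
The auto-discovery in Option 2 can be pictured with the minimal sketch below. It is not the notebook's own discovery code: `discover_vector_files` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the sketch assumes a non-recursive search with case-insensitive extension matching.

```python
from pathlib import Path
import tempfile

# Extensions listed as supported in this notebook
SUPPORTED_EXTENSIONS = {".gpkg", ".shp", ".kml", ".geojson"}


def discover_vector_files(vector_path):
    """Return supported vector files for a single file or a folder path."""
    path = Path(vector_path)
    if path.is_file():
        # Single-file case (Option 1): accept it only if the format is supported
        return [path] if path.suffix.lower() in SUPPORTED_EXTENSIONS else []
    if path.is_dir():
        # Folder case (Option 2): top-level files only, assumed non-recursive
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
        )
    return []


# Usage with a throwaway folder containing mixed file types
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.gpkg", "b.SHP", "notes.txt", "c.geojson"]:
        (Path(tmp) / name).touch()
    found = [p.name for p in discover_vector_files(tmp)]
    print(found)
```

Returning an empty list for a missing or unsupported path lets the caller decide whether to stop the batch or just report that no vectors were found.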

**How it works:**
- For each CHM file, the code checks ALL vector files for overlapping geometries
- Only overlapping geometries from all vector files are included in each map
- CHMs without any overlapping vectors are automatically skipped
- Formats can be mixed: `.gpkg`, `.shp`, `.kml`, and `.geojson` files can be used together in one folder
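
The overlap logic described above can be illustrated with a simplified, dependency-free sketch. The notebook itself works with real geometries through geopandas and shapely, whereas this example swaps in plain bounding boxes given as `(minx, miny, maxx, maxy)` tuples. The function names and sample features are hypothetical, and all coordinates are assumed to share one coordinate reference system.

```python
def bounds_overlap(a, b):
    """True if two (minx, miny, maxx, maxy) boxes intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_overlapping(chm_bounds, features):
    """Keep only the features whose bounding box overlaps the CHM extent."""
    return [f for f in features if bounds_overlap(chm_bounds, f["bounds"])]


chm_bounds = (0, 0, 100, 100)
features = [
    {"name": "stand_a", "bounds": (10, 10, 20, 20)},      # fully inside
    {"name": "stand_b", "bounds": (90, 90, 150, 150)},    # partial overlap
    {"name": "stand_c", "bounds": (200, 200, 250, 250)},  # disjoint
]

overlapping = select_overlapping(chm_bounds, features)
if overlapping:
    print([f["name"] for f in overlapping])
else:
    # Mirrors the "skipped" case: a CHM with no overlapping vectors gets no map
    print("No overlapping vectors, CHM would be skipped")
```

A bounding-box test like this is only a cheap first filter. Two shapes can have overlapping boxes without actually intersecting, which is why a geometry-level check is the more reliable choice in practice.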