# Exploratory Data Analysis (EDA) of LiDAR Data

This notebook performs an initial EDA on processed LiDAR data, namely digital terrain models (DTMs) and hillshades, for a selected test area of interest (AOI). The goal is to inspect the data visually, generate several topographic visualizations (hillshades, slope, aspect, contours), and flag anomalies or features that may indicate past human activity.

In [None]:
import configparser
from pathlib import Path
import rasterio
from rasterio.plot import show, show_hist
import matplotlib.pyplot as plt
import numpy as np
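import subprocess  # Used to call the gdaldem and gdal_contour command-line tools
import geopandas
from shapely.geometry import box  # Builds the AOI polygon when only a bbox is configured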

# Assuming WhiteboxTools is installed and accessible if used
# from whitebox_tools import WhiteboxTools

## 1. Configuration and Setup

In [None]:
CONFIG_FILE_PATH = "../scripts/satellite_pipeline/config.ini" # Adjust if your config is elsewhere
SCRIPT_DIR = Path(".").resolve().parent # Assuming notebook is in 'notebooks' dir, so parent is project root
EDA_OUTPUT_DIR = SCRIPT_DIR / "eda_outputs" / "lidar"
EDA_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

def load_config(config_path):
    config = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    if not Path(config_path).exists():
        raise FileNotFoundError(f"Configuration file '{config_path}' not found.")
    config.read(config_path)
    return config

config = load_config(CONFIG_FILE_PATH)

# Get relevant paths from config
base_processed_dir_raw = config['DEFAULT'].get('base_processed_data_dir', '../../data')
lidar_processed_suffix = config['LIDAR'].get('lidar_processed_suffix', 'lidar/processed')

# Construct absolute path for processed_lidar_dir from SCRIPT_DIR (project root)
PROCESSED_LIDAR_DIR = (SCRIPT_DIR / base_processed_dir_raw.replace('../../', '') / lidar_processed_suffix).resolve()

print(f"Processed LiDAR Directory: {PROCESSED_LIDAR_DIR}")
print(f"EDA Output Directory: {EDA_OUTPUT_DIR}")

# AOI definition (example: using the bbox from config for context)
aoi_bbox_str = config['DEFAULT'].get('aoi_bbox')
aoi_geojson_path = config['DEFAULT'].get('aoi_geojson_path')
aoi_geom = None

# Resolve the configured GeoJSON path once, relative to the project root
aoi_geojson_file = (SCRIPT_DIR / aoi_geojson_path.replace('../../', '').replace('../', '')) if aoi_geojson_path else None

if aoi_geojson_file and aoi_geojson_file.exists():
    aoi_gdf = geopandas.read_file(aoi_geojson_file)
    aoi_geom = aoi_gdf.geometry.iloc[[0]]  # Keep a GeoSeries so the CRS stays attached for the overlay in section 4
    print(f"Using AOI from GeoJSON: {aoi_geojson_file}")
elif aoi_bbox_str:
    coords = [float(c.strip()) for c in aoi_bbox_str.split(',')]
    minx, miny, maxx, maxy = coords
    aoi_geom = geopandas.GeoSeries([box(minx, miny, maxx, maxy)], crs="EPSG:4326") # Assuming WGS84 for bbox
    print(f"Using AOI from BBOX (EPSG:4326): {coords}")
else:
    print("No AOI geometry found in config (aoi_geojson_path or aoi_bbox).")
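
The bbox branch above trusts that `aoi_bbox` holds exactly four well-ordered numbers. A minimal sketch of a stricter parser follows; `parse_bbox` is a hypothetical helper, not part of the pipeline, and the longitude/latitude range check assumes the bbox is given in EPSG:4326, as the cell above does.

```python
def parse_bbox(bbox_str):
    """Parse 'minx,miny,maxx,maxy' into four floats with basic sanity checks."""
    parts = [p.strip() for p in bbox_str.split(",")]
    if len(parts) != 4:
        raise ValueError(f"Expected 4 comma-separated values, got {len(parts)}: {bbox_str!r}")
    minx, miny, maxx, maxy = (float(p) for p in parts)
    if minx >= maxx or miny >= maxy:
        raise ValueError(f"Bounding box is empty or inverted: {bbox_str!r}")
    # Assumption: coordinates are longitude/latitude in degrees (EPSG:4326)
    if not (-180 <= minx <= 180 and -180 <= maxx <= 180):
        raise ValueError("Longitude values must lie within [-180, 180]")
    if not (-90 <= miny <= 90 and -90 <= maxy <= 90):
        raise ValueError("Latitude values must lie within [-90, 90]")
    return minx, miny, maxx, maxy
```

The returned tuple can be unpacked straight into `box(minx, miny, maxx, maxy)`.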

## 2. Load Processed LiDAR Data

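We need a specific DTM and its corresponding hillshade from `PROCESSED_LIDAR_DIR`. For this EDA, we take the first clipped DTM (`*_dtm_clipped_aoi.tif`) and the first clipped hillshade (`*_hillshade_clipped_aoi.tif`) found there, falling back to the unclipped versions (`*_dtm_unclipped.tif`, `*_hillshade_unclipped.tif`) when no clipped files exist. The two files are selected independently, so check that they refer to the same tile.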

In [None]:
# Manually specify or find the first DTM and Hillshade file for now
# In a real scenario, you might iterate or select based on criteria
dtm_files = list(PROCESSED_LIDAR_DIR.glob("*_dtm_clipped_aoi.tif"))
hillshade_files = list(PROCESSED_LIDAR_DIR.glob("*_hillshade_clipped_aoi.tif"))

if not dtm_files:
    print(f"No DTM files found in {PROCESSED_LIDAR_DIR} matching '*_dtm_clipped_aoi.tif'")
    # Attempt to find unclipped DTMs if clipped are not available
    dtm_files = list(PROCESSED_LIDAR_DIR.glob("*_dtm_unclipped.tif"))
    if not dtm_files:
        raise FileNotFoundError(f"No DTM files (clipped or unclipped) found in {PROCESSED_LIDAR_DIR}")
    else:
        print(f"Found unclipped DTMs. Using the first one: {dtm_files[0]}")

if not hillshade_files:
    print(f"No Hillshade files found in {PROCESSED_LIDAR_DIR} matching '*_hillshade_clipped_aoi.tif'")
    # Attempt to find unclipped hillshades
    hillshade_files = list(PROCESSED_LIDAR_DIR.glob("*_hillshade_unclipped.tif"))
    if not hillshade_files:
        print(f"No hillshade files (clipped or unclipped) found in {PROCESSED_LIDAR_DIR}. Will generate if needed.")
    else:
        print(f"Found unclipped hillshades. Using the first one: {hillshade_files[0]}")

selected_dtm_path = dtm_files[0]
selected_hillshade_path = hillshade_files[0] if hillshade_files else None 

print(f"Selected DTM: {selected_dtm_path}")
if selected_hillshade_path:
    print(f"Selected Hillshade: {selected_hillshade_path}")
else:
    print("No pre-generated hillshade found for the selected DTM. Will generate one.")

# Load DTM data
with rasterio.open(selected_dtm_path) as dtm_src:
    dtm_data = dtm_src.read(1).astype('float32')  # Read first band; cast so integer DTMs can hold NaN
    dtm_profile = dtm_src.profile
    dtm_bounds = dtm_src.bounds
    dtm_crs = dtm_src.crs
    # Replace nodata cells (if a nodata value is declared) with NaN for plotting
    if dtm_profile.get('nodata') is not None:
        dtm_data[dtm_data == dtm_profile['nodata']] = np.nan
        
print(f"DTM CRS: {dtm_crs}")

## 3. DTM Visualizations

### 3.1. Basic DTM Display

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
show(dtm_data, ax=ax, transform=dtm_profile['transform'], cmap='terrain', title='Digital Terrain Model (DTM)')
plt.xlabel("Easting (m)")
plt.ylabel("Northing (m)")
plt.colorbar(ax.images[0], label='Elevation (m)')
plt.savefig(EDA_OUTPUT_DIR / f"{selected_dtm_path.stem}_basic_display.png")
plt.show()

**Observations (Basic DTM):**
- *[TODO: Add observations here based on the output. E.g., Describe the overall terrain, any immediately obvious large features, elevation range, etc.]*
- *Does the AOI clipping seem correct? Is the resolution adequate for visual inspection?*
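
To back the observations above with numbers, the elevation range and data coverage can be summarized directly from the array. This is a minimal sketch: `summarize_dtm` is a hypothetical helper, and it assumes nodata cells have already been converted to NaN.

```python
import numpy as np

def summarize_dtm(dtm, pixel_width, pixel_height):
    """Return basic elevation statistics for a DTM array whose nodata cells are NaN."""
    dtm = np.asarray(dtm, dtype="float64")
    valid = dtm[~np.isnan(dtm)]
    if valid.size == 0:
        raise ValueError("DTM contains no valid cells")
    return {
        "min": float(valid.min()),
        "max": float(valid.max()),
        "range": float(valid.max() - valid.min()),
        "mean": float(valid.mean()),
        # Share of cells that carry data; a low value hints at clipping or coverage problems
        "valid_fraction": valid.size / dtm.size,
        # Area covered by valid cells, in squared CRS units
        "valid_area": valid.size * pixel_width * pixel_height,
    }
```

In this notebook it could be called as `summarize_dtm(dtm_data, dtm_profile['transform'].a, abs(dtm_profile['transform'].e))`, where `.a` and `.e` are the pixel width and (negative) pixel height of the affine transform.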

### 3.2. Hillshades

Generate hillshades from multiple azimuths to reveal features oriented differently. We'll use `gdaldem` via `subprocess` for this as it's robust and simple for this task.

In [None]:
def generate_gdal_hillshade(input_dtm_path, output_hillshade_path, azimuth=315, altitude=45, z_factor=1):
    """Generates hillshade using gdaldem command-line tool."""
    try:
        cmd = [
            "gdaldem", "hillshade",
            "-az", str(azimuth),
            "-alt", str(altitude),
            "-z", str(z_factor),
            "-of", "GTiff",
            str(input_dtm_path),
            str(output_hillshade_path)
        ]
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        if result.stderr:
            print(f"GDAL Hillshade STDERR: {result.stderr}")
        print(f"Generated hillshade: {output_hillshade_path}")
        return True
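    except FileNotFoundError:
        print("gdaldem executable not found; check that GDAL is installed and on PATH.")
        return False
    except subprocess.CalledProcessError as e:
        print(f"Error generating hillshade for {Path(input_dtm_path).name} with azimuth {azimuth}: {e}")
        print(f"GDAL STDERR: {e.stderr}")
        print(f"GDAL STDOUT: {e.stdout}")
        return False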

azimuths = {
    "NW": 315,
    "N": 0, # Added North for more options
    "NE": 45,
    "E": 90,
    "SE": 135,
    "S": 180, # Added South
    "SW": 225,
    "W": 270
}

fig, axes = plt.subplots(2, 4, figsize=(20, 10))
axes = axes.flatten()
fig.suptitle('Hillshades from Multiple Azimuths', fontsize=16)

generated_hillshade_paths = {}

for i, (direction, az) in enumerate(azimuths.items()):
    ax = axes[i]
    hillshade_output_path = EDA_OUTPUT_DIR / f"{selected_dtm_path.stem}_hillshade_{direction}.tif"
    generated_hillshade_paths[direction] = hillshade_output_path
    
    if not hillshade_output_path.exists(): # Generate if it doesn't exist
        generate_gdal_hillshade(selected_dtm_path, hillshade_output_path, azimuth=az)
    
    if hillshade_output_path.exists():
        with rasterio.open(hillshade_output_path) as src:
            show(src, ax=ax, cmap='gray', title=f'Azimuth: {az}° ({direction})')
    else:
        ax.set_title(f'Azimuth: {az}° ({direction}) - Failed')
        ax.text(0.5, 0.5, 'Failed to generate/load', horizontalalignment='center', verticalalignment='center', transform=ax.transAxes)
    ax.set_xticks([])
    ax.set_yticks([])

plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.savefig(EDA_OUTPUT_DIR / f"{selected_dtm_path.stem}_hillshades_multi_azimuth.png")
plt.show()

**Observations (Hillshades):**
- *[TODO: Add observations. Which azimuth best highlights certain types of features? Are there linear features, subtle mounds, or depressions visible? Note any that appear consistently across multiple hillshades.]*
- *Compare these with the features noted in the `EDA_FEATURE_ENGINEERING_STRATEGY.md` under LiDAR -> Topographic Features.*
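
One way to compare the azimuths beyond eyeballing the panels is to compute per-pixel statistics across the stack. The sketch below is an illustration, not part of the pipeline: `combine_hillshades` is a hypothetical helper, and it assumes all hillshades share the same shape and use 0 as nodata (the value `gdaldem hillshade` writes by default).

```python
import warnings
import numpy as np

def combine_hillshades(hillshades, nodata=0):
    """Return per-pixel mean and standard deviation across a list of hillshade arrays."""
    stack = np.stack([np.asarray(h, dtype="float32") for h in hillshades])
    stack[stack == nodata] = np.nan  # Exclude nodata cells from the statistics
    with warnings.catch_warnings():
        # Pixels that are nodata in every layer raise an "empty slice" warning; they stay NaN
        warnings.simplefilter("ignore", category=RuntimeWarning)
        mean = np.nanmean(stack, axis=0)
        std = np.nanstd(stack, axis=0)
    return mean, std
```

The mean acts as a rough multi-directional hillshade. The standard deviation is low on flat ground, whose shading barely changes with illumination direction, and higher where relief responds strongly to the light source.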

### 3.3. Slope Map

In [None]:
slope_output_path = EDA_OUTPUT_DIR / f"{selected_dtm_path.stem}_slope.tif"
if not slope_output_path.exists():
    try:
        cmd_slope = ["gdaldem", "slope", str(selected_dtm_path), str(slope_output_path), "-of", "GTiff", "-p"] # -p for percent slope
        result = subprocess.run(cmd_slope, check=True, capture_output=True, text=True)
        if result.stderr: print(f"GDAL Slope STDERR: {result.stderr}")
        print(f"Generated slope map: {slope_output_path}")
    except subprocess.CalledProcessError as e:
        print(f"Error generating slope map: {e.stderr}")
        slope_output_path = None # Ensure it's None if failed
        
if slope_output_path and slope_output_path.exists():
    with rasterio.open(slope_output_path) as slope_src:
        fig, ax = plt.subplots(1, 1, figsize=(10, 10))
        show(slope_src, ax=ax, cmap='viridis', title='Slope Map (Percent)')
        plt.xlabel("Easting (m)")
        plt.ylabel("Northing (m)")
        plt.colorbar(ax.images[0], label='Slope (%)')
        plt.savefig(EDA_OUTPUT_DIR / f"{selected_dtm_path.stem}_slope_display.png")
        plt.show()
else:
    print("Slope map generation failed or file not found.")

**Observations (Slope Map):**
- *[TODO: Add observations. Do any areas show unusually steep or flat slopes compared to their surroundings? Can edges of potential platforms or banks of canals be seen as sharp changes in slope?]*
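
As a rough cross-check of the `gdaldem` output, percent slope can also be approximated in NumPy. Note that this sketch uses plain central differences via `np.gradient`, which differs from the neighbourhood-based method `gdaldem` applies, so values will not match exactly. `slope_percent` is a hypothetical helper and assumes square pixels in the same unit as elevation.

```python
import numpy as np

def slope_percent(dtm, pixel_size):
    """Approximate slope in percent (rise over run * 100) using central differences."""
    dtm = np.asarray(dtm, dtype="float64")
    # np.gradient returns derivatives along rows (y) first, then columns (x)
    dz_dy, dz_dx = np.gradient(dtm, pixel_size)
    return 100.0 * np.hypot(dz_dx, dz_dy)
```

NaN cells propagate into their neighbours, so nodata edges appear slightly wider in the result than in the DTM.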

### 3.4. Aspect Map

In [None]:
aspect_output_path = EDA_OUTPUT_DIR / f"{selected_dtm_path.stem}_aspect.tif"
if not aspect_output_path.exists():
    try:
        cmd_aspect = ["gdaldem", "aspect", str(selected_dtm_path), str(aspect_output_path), "-of", "GTiff"]
        result = subprocess.run(cmd_aspect, check=True, capture_output=True, text=True)
        if result.stderr: print(f"GDAL Aspect STDERR: {result.stderr}")
        print(f"Generated aspect map: {aspect_output_path}")
    except subprocess.CalledProcessError as e:
        print(f"Error generating aspect map: {e.stderr}")
        aspect_output_path = None

if aspect_output_path and aspect_output_path.exists():
    with rasterio.open(aspect_output_path) as aspect_src:
        fig, ax = plt.subplots(1, 1, figsize=(10, 10))
        # Use a circular colormap like 'hsv' for aspect
        show(aspect_src, ax=ax, cmap='hsv', title='Aspect Map (Degrees from North)')
        plt.xlabel("Easting (m)")
        plt.ylabel("Northing (m)")
        plt.colorbar(ax.images[0], label='Aspect (Degrees)')
        plt.savefig(EDA_OUTPUT_DIR / f"{selected_dtm_path.stem}_aspect_display.png")
        plt.show()
else:
    print("Aspect map generation failed or file not found.")

**Observations (Aspect Map):**
- *[TODO: Add observations. Are there areas with consistent aspect that might indicate terracing or constructed slopes? Do features align with particular aspects?]*
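
Because aspect is an angle, an ordinary mean is misleading (the average of 350° and 10° should be 0°, not 180°). The sketch below computes a circular mean and a consistency score for a window of aspect values. `circular_aspect_stats` is a hypothetical helper; it assumes flat cells are flagged with -9999, which is the nodata value `gdaldem aspect` uses by default.

```python
import numpy as np

def circular_aspect_stats(aspect_deg, nodata=-9999.0):
    """Return (mean direction in degrees, resultant length in [0, 1]) for aspect values."""
    a = np.asarray(aspect_deg, dtype="float64").ravel()
    valid = a[(a != nodata) & ~np.isnan(a)]
    if valid.size == 0:
        raise ValueError("No valid aspect values")
    rad = np.deg2rad(valid)
    s, c = np.sin(rad).mean(), np.cos(rad).mean()
    mean_dir = np.rad2deg(np.arctan2(s, c)) % 360.0
    # Resultant length: close to 1 when slopes face the same way, close to 0 when scattered
    resultant = float(np.hypot(s, c))
    return float(mean_dir), resultant
```

A high resultant length over a compact area is one possible indicator of the consistently oriented slopes mentioned above.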

### 3.5. Contour Maps

In [None]:
contour_interval = 1 # meters, adjust based on DTM elevation range and detail required
contour_output_shapefile = EDA_OUTPUT_DIR / f"{selected_dtm_path.stem}_contours.shp"

if not contour_output_shapefile.exists():
    try:
        cmd_contour = [
            "gdal_contour",
            "-a", "elevation", # Attribute name for elevation
            "-i", str(contour_interval),
            str(selected_dtm_path),
            str(contour_output_shapefile)
        ]
        result = subprocess.run(cmd_contour, check=True, capture_output=True, text=True)
        if result.stderr: print(f"GDAL Contour STDERR: {result.stderr}")
        print(f"Generated contour shapefile: {contour_output_shapefile}")
    except subprocess.CalledProcessError as e:
        print(f"Error generating contours: {e.stderr}")
        contour_output_shapefile = None

if contour_output_shapefile and contour_output_shapefile.exists():
    contours = geopandas.read_file(contour_output_shapefile)
    fig, ax = plt.subplots(1, 1, figsize=(12, 12))
    
    # Plot DTM as background
    show(dtm_data, ax=ax, transform=dtm_profile['transform'], cmap='terrain', alpha=0.6)
    
    # Plot contours
    contours.plot(ax=ax, column='elevation', legend=True, legend_kwds={'label': "Elevation (m)"}, cmap='viridis', linewidth=0.7)
    
    ax.set_title(f'Contour Map (Interval: {contour_interval}m)')
    plt.xlabel("Easting (m)")
    plt.ylabel("Northing (m)")
    plt.savefig(EDA_OUTPUT_DIR / f"{selected_dtm_path.stem}_contours_display.png")
    plt.show()
else:
    print("Contour generation failed or file not found.")

**Observations (Contour Maps):**
- *[TODO: Add observations. Do contours show any unusual geometric patterns (e.g., rectangular, circular)? Are there tightly packed contours indicating mounds or depressions not easily seen in other visualizations?]*
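
The contour interval is hard-coded to 1 m above and should be adjusted to the elevation range. One simple heuristic, offered here only as an assumption and not as an established rule, is to pick the "round" interval that yields roughly a target number of contour levels. `suggest_contour_interval` and its default candidates are hypothetical.

```python
def suggest_contour_interval(elev_min, elev_max, target_levels=20,
                             candidates=(0.1, 0.25, 0.5, 1, 2, 5, 10, 20, 50, 100)):
    """Pick the candidate interval whose contour count is closest to target_levels."""
    elev_range = elev_max - elev_min
    if elev_range <= 0:
        raise ValueError("elev_max must be greater than elev_min")
    # Compare the number of levels each candidate would produce against the target
    return min(candidates, key=lambda c: abs(elev_range / c - target_levels))
```

For subtle earthworks in flat terrain a smaller `target_levels` range or finer candidates may be needed, so treat the result as a starting point.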

### 3.6. Advanced Visualizations (Placeholder - SVF/LRM)

Generating Sky-View Factor (SVF) or Local Relief Models (LRM) often requires more specialized tools like WhiteboxTools, RVT (Relief Visualization Toolbox), or SAGA GIS. Integrating these directly into a notebook can be complex due to installation and execution paths.

**Conceptual Steps (if using WhiteboxTools via Python wrapper):**
1. Ensure WhiteboxTools is installed and `whitebox_tools.py` is accessible.
2. Initialize `WhiteboxTools()`: `wbt = WhiteboxTools()`
3. Set working directory: `wbt.set_working_dir('path/to/your/data')`
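4. Run the tool, e.g., for Sky-View Factor (confirm the tool's parameter names against the documentation of your installed WhiteboxTools version):
   `wbt.sky_view_factor(dem=str(selected_dtm_path), output=str(svf_output_path), sky_model='anisotropic')`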
5. Load and display `svf_output_path` using Rasterio and Matplotlib.

For this EDA, we will skip direct implementation of these advanced visualizations within the notebook unless a simple GDAL or Rasterio equivalent is readily available. The focus remains on broadly applicable techniques. If specific features of interest are noted, these advanced tools can be applied manually outside this notebook for deeper investigation on those specific areas.
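
As a lightweight stand-in, a simplified local relief surface can be computed in NumPy by subtracting a moving-average trend from the DTM. This is a "difference from mean elevation" sketch, not the full LRM procedure, and `simple_local_relief` is a hypothetical helper. It builds a windowed copy of the array, so it suits small tiles rather than full-resolution mosaics.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def simple_local_relief(dtm, window=5):
    """Subtract a moving-average trend surface from the DTM (NaN cells are ignored)."""
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    dtm = np.asarray(dtm, dtype="float64")
    pad = window // 2
    # Repeat edge values so the output keeps the input shape
    padded = np.pad(dtm, pad, mode="edge")
    windows = sliding_window_view(padded, (window, window))
    trend = np.nanmean(windows, axis=(-2, -1))
    # Positive values sit above the local trend (mounds), negative below (depressions)
    return dtm - trend
```

Plotting the result with a diverging colormap such as `RdBu_r`, centred on zero, makes small positive and negative features easier to spot than in the raw DTM.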

## 4. Overlaying AOI Boundary (Contextual)

If an AOI geometry is available, overlay it on one of the visualizations for context, ensuring CRS alignment.

In [None]:
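nw_hillshade_path = generated_hillshade_paths.get("NW")  # None if the NW hillshade was never registered
if aoi_geom is not None and nw_hillshade_path is not None and nw_hillshade_path.exists():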
    # Ensure AOI is in the same CRS as the DTM/Hillshade
    if aoi_geom.crs and dtm_crs:
        if aoi_geom.crs.to_string().lower() != dtm_crs.to_string().lower():
            print(f"Reprojecting AOI from {aoi_geom.crs} to {dtm_crs} for overlay.")
            aoi_geom_reprojected = aoi_geom.to_crs(dtm_crs)
        else:
            aoi_geom_reprojected = aoi_geom
    else:
        print("AOI or DTM CRS is undefined, cannot ensure CRS match for overlay. Assuming compatible.")
        aoi_geom_reprojected = aoi_geom

    fig, ax = plt.subplots(1, 1, figsize=(12, 12))
    with rasterio.open(generated_hillshade_paths.get("NW")) as src:
        show(src, ax=ax, cmap='gray')
    
    # Plot the boundary as lines (works for GeoSeries and GeoDataFrame) so the legend entry renders
    aoi_geom_reprojected.boundary.plot(ax=ax, color='red', linewidth=2, label='AOI')

    ax.set_title('Hillshade (NW) with AOI Overlay')
    plt.xlabel("Easting (m)")
    plt.ylabel("Northing (m)")
    plt.legend()
    plt.savefig(EDA_OUTPUT_DIR / f"{selected_dtm_path.stem}_hillshade_with_aoi.png")
    plt.show()
else:
    print("AOI geometry not available or NW hillshade not generated, skipping AOI overlay.")

## 5. Summary of Observations & Potential Anomalies

Based on the visualizations above:

1.  **Overall Terrain:**
    *   *[TODO: Briefly describe the general landscape characteristics of the selected AOI based on DTM and hillshades.]*

2.  **Potential Anomalies Noted:**
    *   **Feature 1 (Location/Coordinates if possible, Description, Visualizations that best show it):**
        *   *e.g., Possible linear embankment seen in NW and W hillshades, and as a slight break in slope map near coordinates X,Y.*
    *   **Feature 2 (Location, Description, Visualizations):**
        *   *e.g., A series of small, regularly spaced mounds visible in the contour map and SE hillshade in the southern part of the AOI.*
    *   **Feature 3 (Location, Description, Visualizations):**
        *   *e.g., A subtle rectangular depression best seen with low-altitude E hillshade and potentially hinted at by SVF if generated.*

3.  **Interpretation Difficulty:**
    *   *[TODO: Note any challenges. E.g., Are some features ambiguous? Could they be natural landforms or modern disturbances? Is the resolution sufficient?]*

4.  **Next Steps for these Anomalies:**
    *   Consider these areas for more detailed analysis using advanced visualization techniques (SVF, LRM if not done here).
    *   These could be candidate areas for targeted feature engineering (e.g., extracting specific shapes or textural properties).
    *   Cross-reference with satellite imagery and textual data if available for these specific locations.
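
To record the location of a noted feature, its pixel position can be converted to map coordinates with the raster's affine transform. The sketch below does the arithmetic by hand; `pixel_to_map` is a hypothetical helper, and it assumes the six coefficients are ordered `(a, b, c, d, e, f)` as in the `affine` transform objects rasterio returns.

```python
def pixel_to_map(row, col, coeffs):
    """Convert a pixel (row, col) to the map coordinates of the pixel centre."""
    a, b, c, d, e, f = coeffs
    # Shift by half a pixel so the result refers to the cell centre, not its upper-left corner
    col_c, row_c = col + 0.5, row + 0.5
    x = a * col_c + b * row_c + c
    y = d * col_c + e * row_c + f
    return x, y
```

In this notebook the coefficients could be taken from `tuple(dtm_profile['transform'])[:6]`; the resulting coordinates are in the DTM's CRS.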