# DEM - Topographic Data Extraction

## What is a DEM?

A **DEM (Digital Elevation Model)** is a gridded numerical model of terrain elevation:
- It gives the altitude of each point (grid cell) of the Earth's surface
- It makes it possible to derive slope and aspect (the direction a slope faces)

## Why is it useful for water quality?

Topography strongly influences hydrology:

| Variable | Impact on water quality |
|----------|-------------------------|
| **Elevation** | Temperature, precipitation, vegetation type |
| **Slope** | Flow velocity, erosion, residence time |
| **Aspect** | Sun exposure, evaporation |

## Extracted variables

| Variable | Unit | Description |
|----------|------|-------------|
| `elevation` | meters | Mean elevation in the window around the point |
| `slope` | degrees | Mean local slope (0-90°) |
| `aspect` | degrees | Mean aspect (0-360°, North = 0, clockwise) |
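Because `aspect` is an angle on a 0-360° circle with North at 0°, it cannot be averaged arithmetically: slopes facing 350° and 10° both face roughly north, yet their arithmetic mean is 180° (south). The sketch below uses only the standard library to illustrate the circular mean that handles this wrap-around. The name `circular_mean_deg` is illustrative and not part of the notebook.

```python
import math

def circular_mean_deg(angles_deg):
    """Mean direction of compass angles, in degrees wrapped to the 0-360 range."""
    n = len(angles_deg)
    sin_mean = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    cos_mean = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    return math.degrees(math.atan2(sin_mean, cos_mean)) % 360

angles = [350.0, 10.0]
arithmetic = sum(angles) / len(angles)   # 180.0, which points south
circular = circular_mean_deg(angles)     # close to 0 (equivalently 360), which points north
print(arithmetic, circular)
```

Floating-point rounding can place the result just below 360 rather than exactly at 0, so comparisons should allow for the wrap-around.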

## Data source

**Copernicus DEM GLO-30** on Planetary Computer
- Resolution: 30 meters
- Coverage: global
- Documentation: [Copernicus DEM](https://planetarycomputer.microsoft.com/dataset/cop-dem-glo-30)

---

## Step 1: Installing dependencies

In [1]:
!pip install uv
!uv pip install --system -r ../requirements.txt

Defaulting to user installation because normal site-packages is not writeable

[notice] A new release of pip is available: 24.3.1 -> 26.0
[notice] To update, run: python.exe -m pip install --upgrade pip
Using Python 3.13.1 environment at: c:\Program Files\Python313
Resolved 194 packages in 632ms
error: Failed to install: jupyter_events-0.12.0-py3-none-any.whl (jupyter-events==0.12.0)
  Caused by: failed to create directory `c:\Program Files\Python313\Lib\site-packages\jupyter_events`: Access is denied. (os error 5)


In [2]:
# =============================================================================
# IMPORTS
# =============================================================================

import os
import warnings
warnings.filterwarnings("ignore")

# Data handling
import numpy as np
import pandas as pd

# Raster data handling
import rasterio
from rasterio.windows import from_bounds
from rasterio.crs import CRS
from rasterio.warp import transform_bounds

# Access to the Microsoft Planetary Computer API
import pystac_client
import planetary_computer as pc

from tqdm import tqdm

print("Imports OK!")

Imports OK!


---

## Step 2: Defining constants and functions

In [3]:
# =============================================================================
# CONSTANTS
# =============================================================================

# Copernicus DEM collection on Planetary Computer
DEM_COLLECTION = "cop-dem-glo-30"

# Buffer size in degrees (~555 m north-south, using 111 km per degree)
BUFFER_DEG = 0.005

# Output folder
OUTPUT_DIR = "../data/processed"

print(f"DEM collection: {DEM_COLLECTION}")
print(f"Buffer: ~{BUFFER_DEG * 111000:.0f}m")

DEM collection: cop-dem-glo-30
Buffer: ~555m
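The conversion printed above uses about 111 km per degree, which holds for latitude. A degree of longitude is shorter by a factor of cos(latitude), so the buffer is not square on the ground. The sketch below estimates both dimensions under a rough spherical-Earth assumption; `buffer_size_m` and `METERS_PER_DEGREE` are illustrative names, not part of the notebook.

```python
import math

METERS_PER_DEGREE = 111_000  # rough spherical approximation, same factor as the print above

def buffer_size_m(buffer_deg, lat_deg):
    """Return the (north-south, east-west) buffer size in meters at a given latitude."""
    north_south = buffer_deg * METERS_PER_DEGREE
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west

# Latitude of the test point used later in this notebook
ns, ew = buffer_size_m(0.005, -26.45)
print(f"north-south: ~{ns:.0f} m, east-west: ~{ew:.0f} m")
```

At the latitude of the South African test point the east-west extent comes out at roughly 500 m, about 10% less than the north-south extent.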


In [4]:
def get_catalog():
    """
    Connect to the Microsoft Planetary Computer STAC catalog.
    """
    catalog = pystac_client.Client.open(
        "https://planetarycomputer.microsoft.com/api/stac/v1",
        modifier=pc.sign_inplace,
    )
    print("Connected to the Planetary Computer catalog!")
    return catalog

### Slope and aspect functions

In [5]:
def calculate_slope(elevation_array, cell_size=30):
    """
    Compute the slope from an elevation array.

    Parameters:
        elevation_array : 2D elevation array
        cell_size : pixel size in meters

    Returns:
        mean slope in degrees
    """
    # At least 3 rows and 3 columns are needed for a meaningful gradient
    if min(elevation_array.shape) < 3:
        return np.nan

    # Compute the gradients (axis 0 = rows, axis 1 = columns)
    dy, dx = np.gradient(elevation_array, cell_size)

    # Slope = arctan(sqrt(dx² + dy²))
    slope_rad = np.arctan(np.sqrt(dx**2 + dy**2))
    slope_deg = np.degrees(slope_rad)

    return float(np.nanmean(slope_deg))


def calculate_aspect(elevation_array, cell_size=30):
    """
    Compute the aspect (orientation) from an elevation array.

    Parameters:
        elevation_array : 2D elevation array
        cell_size : pixel size in meters

    Returns:
        mean aspect in degrees (0-360, North=0)
    """
    if min(elevation_array.shape) < 3:
        return np.nan

    # Compute the gradients (axis 0 = rows, axis 1 = columns)
    dy, dx = np.gradient(elevation_array, cell_size)

    # Aspect = arctan2(-dx, dy), converted to degrees
    aspect_rad = np.arctan2(-dx, dy)
    aspect_deg = np.degrees(aspect_rad)

    # Convert from [-180, 180] to [0, 360]
    aspect_deg = (aspect_deg + 360) % 360

    # Circular mean, because aspect is an angle
    sin_mean = np.nanmean(np.sin(np.radians(aspect_deg)))
    cos_mean = np.nanmean(np.cos(np.radians(aspect_deg)))
    mean_aspect = np.degrees(np.arctan2(sin_mean, cos_mean))
    mean_aspect = (mean_aspect + 360) % 360

    return float(mean_aspect)
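A quick way to check the sign conventions of these formulas is to apply them to a synthetic surface whose slope and aspect are known in advance. The sketch below repeats the same `np.gradient` and `arctan2(-dx, dy)` expressions on a made-up plane that drops toward the east, assuming a north-up raster (rows increase southward, columns increase eastward).

```python
import numpy as np

# Synthetic 5x5 surface that drops 30 m per 30 m pixel toward the east
cols = np.arange(5)
plane = np.tile(1000.0 - 30.0 * cols, (5, 1))

# axis 0 = rows (southward), axis 1 = columns (eastward)
dy, dx = np.gradient(plane, 30)
slope_deg = np.degrees(np.arctan(np.sqrt(dx**2 + dy**2)))
aspect_deg = (np.degrees(np.arctan2(-dx, dy)) + 360) % 360

print(slope_deg.mean())    # 45 degrees expected: 30 m of drop over 30 m of run
print(aspect_deg.mean())   # 90 degrees expected: the surface faces east
```

The plain mean of `aspect_deg` is only safe here because every pixel has the same aspect; on real terrain the circular mean is needed.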

### Main extraction function

In [6]:
def extract_dem_features(catalog, lat, lon, buffer_deg=0.005, debug=False):
    """
    Extract the topographic features for one point.

    Parameters:
        catalog : Planetary Computer catalog
        lat, lon : coordinates of the point (decimal degrees)
        buffer_deg : buffer around the point, in degrees
        debug : print debugging messages

    Returns:
        dict with elevation, slope, aspect (NaN when extraction fails)
    """
    results = {
        'elevation': np.nan,
        'slope': np.nan,
        'aspect': np.nan
    }

    try:
        # Bounding box: [min_lon, min_lat, max_lon, max_lat]
        bbox = [
            lon - buffer_deg,
            lat - buffer_deg,
            lon + buffer_deg,
            lat + buffer_deg
        ]

        if debug:
            print(f"  Bbox: {bbox}")

        # Search for DEM tiles
        search = catalog.search(
            collections=[DEM_COLLECTION],
            bbox=bbox,
        )
        items = list(search.items())

        if debug:
            print(f"  Items found: {len(items)}")

        if len(items) == 0:
            if debug:
                print("  ⚠️ No data")
            return results

        # Only the first tile is used, even if the bbox overlaps several tiles
        item = items[0]

        if debug:
            print(f"  Item: {item.id}")
            print(f"  Assets: {list(item.assets.keys())}")

        # Sign the asset
        signed_asset = pc.sign(item.assets["data"])

        # Open and read the data
        with rasterio.open(signed_asset.href) as src:
            # Transform the bbox into the raster's CRS
            dst_crs = src.crs
            transformed_bbox = transform_bounds(
                CRS.from_epsg(4326),
                dst_crs,
                *bbox
            )

            window = from_bounds(*transformed_bbox, src.transform)
            elevation_data = src.read(1, window=window)

            # Stop on an empty window before computing min/max
            if elevation_data.size == 0:
                return results

            if debug:
                print(f"  Shape: {elevation_data.shape}")
                print(f"  Elevation: min={elevation_data.min():.0f}, max={elevation_data.max():.0f}")

            # Filter out NoData values
            valid_mask = elevation_data > -1000
            valid_data = elevation_data[valid_mask]

            if valid_data.size == 0:
                return results

            # Replace NoData with NaN so it does not distort the gradients
            elevation_clean = np.where(valid_mask, elevation_data, np.nan)

            # Compute the features
            results['elevation'] = float(np.mean(valid_data))
            results['slope'] = calculate_slope(elevation_clean, cell_size=30)
            results['aspect'] = calculate_aspect(elevation_clean, cell_size=30)

            if debug:
                print(f"  Elevation: {results['elevation']:.1f}m")
                print(f"  Slope: {results['slope']:.1f}°")
                print(f"  Aspect: {results['aspect']:.1f}°")

    except Exception as e:
        # Errors are only reported in debug mode; otherwise the NaN defaults are returned
        if debug:
            print(f"  ❌ Error: {type(e).__name__}: {e}")

    return results
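The function above passes `cell_size=30` for both raster axes. Since the bounding box is expressed in EPSG:4326 and the window covers about 36 pixels per 0.01°, the grid appears to be spaced in arc-seconds, which means the east-west ground width of a pixel shrinks with cos(latitude). The sketch below shows one possible refinement: giving `np.gradient` a separate spacing per axis. The helper names and the 111 320 m per degree factor are assumptions for illustration; in the notebook the resolution in degrees could be read from `src.res`.

```python
import math
import numpy as np

def pixel_size_m(res_deg, lat_deg, meters_per_degree=111_320.0):
    """Approximate (north-south, east-west) pixel size in meters on a lat/lon grid."""
    size_ns = res_deg * meters_per_degree
    size_ew = size_ns * math.cos(math.radians(lat_deg))
    return size_ns, size_ew

def mean_slope_deg(elevation, size_ns, size_ew):
    """Mean slope in degrees, using a separate spacing for each raster axis."""
    dz_dy, dz_dx = np.gradient(elevation, size_ns, size_ew)
    return float(np.nanmean(np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))))

# Hypothetical 1 arc-second grid at the latitude of the diagnostic test point
size_ns, size_ew = pixel_size_m(1 / 3600, -26.45)

# Toy surface rising 1 m per pixel toward the east
elevation = np.tile(np.arange(10, dtype=float), (10, 1))
print(mean_slope_deg(elevation, 30, 30), mean_slope_deg(elevation, size_ns, size_ew))
```

With a fixed 30 m spacing, east-west gradients are slightly underestimated away from the equator. Whether the difference matters for the downstream model is a judgment call.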

### Diagnostic test

In [7]:
# =============================================================================
# DIAGNOSTIC TEST
# =============================================================================

print("DEM diagnostic test (Copernicus)")
print("=" * 50)

catalog = get_catalog()

# Test point in South Africa
test_lat, test_lon = -26.45, 28.085833

print(f"\nTesting point: lat={test_lat}, lon={test_lon}")
print("-" * 50)

result = extract_dem_features(catalog, test_lat, test_lon, debug=True)

print("\n" + "=" * 50)
print("Results:")
for k, v in result.items():
    if pd.notna(v):
        print(f"  {k}: {v:.1f}")

DEM diagnostic test (Copernicus)
==================================================
Connected to the Planetary Computer catalog!

Testing point: lat=-26.45, lon=28.085833
--------------------------------------------------
  Bbox: [28.080833000000002, -26.455, 28.090833, -26.445]
  Items found: 1
  Item: Copernicus_DSM_COG_10_S27_00_E028_00_DEM
  Assets: ['data', 'tilejson', 'rendered_preview']
  Shape: (36, 36)
  Elevation: min=1467, max=1485
  Elevation: 1473.7m
  Slope: 1.4°
  Aspect: 134.6°

==================================================
Results:
  elevation: 1473.7
  slope: 1.4
  aspect: 134.6
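The item ID returned by the test appears to encode the south-west corner of a 1° × 1° tile (`S27`, `E028` for a point at latitude -26.45, longitude 28.09). The sketch below rebuilds such an ID from a coordinate, assuming the pattern seen in this single output generalizes; `tile_id` is a hypothetical helper, not a Planetary Computer API.

```python
import math

def tile_id(lat, lon):
    """Build a GLO-30 item ID from the tile's south-west corner (assumed pattern)."""
    lat0, lon0 = math.floor(lat), math.floor(lon)
    ns = "N" if lat0 >= 0 else "S"
    ew = "E" if lon0 >= 0 else "W"
    return f"Copernicus_DSM_COG_10_{ns}{abs(lat0):02d}_00_{ew}{abs(lon0):03d}_00_DEM"

print(tile_id(-26.45, 28.085833))
```

Comparing the IDs of the four bounding-box corners is a cheap way to detect a window that straddles a tile boundary, in which case `items[0]` covers only part of it.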


---

## Step 3: Extraction for the training data

In [8]:
# Load the data
Water_Quality_df = pd.read_csv("../data/raw/water_quality_training_dataset.csv")

print(f"Number of observations: {len(Water_Quality_df)}")

# Unique sites
training_sites = Water_Quality_df[['Latitude', 'Longitude']].drop_duplicates().reset_index(drop=True)
print(f"Unique sites to process: {len(training_sites)}")

Number of observations: 9319
Unique sites to process: 162


In [9]:
# =============================================================================
# DEM EXTRACTION - TRAINING
# =============================================================================

print("Connecting to Microsoft Planetary Computer...")
catalog = get_catalog()

print(f"\nExtracting {len(training_sites)} unique sites...")

# Incremental backup file
BACKUP_PATH = "../data/processed/dem_training_backup.csv"
SAVE_EVERY = 100

print(f"Automatic backup every {SAVE_EVERY} sites\n")

training_results = []
completed_count = 0

for idx, row in tqdm(training_sites.iterrows(), total=len(training_sites), desc="Extraction"):
    lat, lon = row['Latitude'], row['Longitude']

    # Extract the DEM features
    dem_features = extract_dem_features(catalog, lat, lon, BUFFER_DEG, debug=False)

    # Add the coordinates
    result = {'Latitude': lat, 'Longitude': lon}
    result.update(dem_features)

    training_results.append(result)
    completed_count += 1

    # Incremental backup
    if completed_count % SAVE_EVERY == 0:
        backup_df = pd.DataFrame(training_results)
        backup_df.to_csv(BACKUP_PATH, index=False)
        print(f"\n💾 Backup: {completed_count}/{len(training_sites)} sites")

# Final save
training_dem_unique = pd.DataFrame(training_results)
training_dem_unique.to_csv(BACKUP_PATH, index=False)

print(f"\n✅ Extraction complete: {len(training_dem_unique)} sites")

Connecting to Microsoft Planetary Computer...
Connected to the Planetary Computer catalog!

Extracting 162 unique sites...
Automatic backup every 100 sites

Extraction:  62%|██████▏   | 100/162 [00:59<00:29,  2.09it/s]

💾 Backup: 100/162 sites

Extraction: 100%|██████████| 162/162 [01:29<00:00,  1.81it/s]

✅ Extraction complete: 162 sites
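The loop writes a backup file but never reads it back, so an interrupted run starts again from the first site. The sketch below shows one way the backup could be used to skip sites that are already done. `pending_sites` is a hypothetical helper and the coordinates are made up; matching relies on the float coordinates surviving the CSV round trip unchanged.

```python
import os
import tempfile
import pandas as pd

def pending_sites(sites, backup_path, keys=("Latitude", "Longitude")):
    """Return the rows of `sites` that are not yet present in the backup CSV."""
    if not os.path.exists(backup_path):
        return sites
    done = pd.read_csv(backup_path)[list(keys)].drop_duplicates()
    merged = sites.merge(done, on=list(keys), how="left", indicator=True)
    remaining = merged.loc[merged["_merge"] == "left_only", sites.columns]
    return remaining.reset_index(drop=True)

# Toy demonstration: two of three sites are already in the backup
sites = pd.DataFrame({"Latitude": [-26.0, -27.0, -28.0], "Longitude": [28.0, 27.0, 17.0]})
with tempfile.TemporaryDirectory() as tmp:
    backup = os.path.join(tmp, "dem_backup.csv")
    sites.iloc[:2].assign(elevation=[1500.0, 1300.0]).to_csv(backup, index=False)
    todo = pending_sites(sites, backup)

print(todo)
```

A resumed run would then load the backup rows into the results list and iterate over `todo` only.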





In [10]:
# Merge with the original DataFrame
training_dem_df = Water_Quality_df[['Latitude', 'Longitude', 'Sample Date']].merge(
    training_dem_unique,
    on=['Latitude', 'Longitude'],
    how='left'
)

print(f"Final DataFrame: {len(training_dem_df)} rows")

Final DataFrame: 9319 rows
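The merge broadcasts one row of site-level features to every observation at that site, so the row count must stay at the number of observations. The sketch below, on made-up toy data, shows how pandas can enforce this with the `validate` argument of `merge` and how to count observations that found no matching site.

```python
import pandas as pd

# Toy data with made-up values: three observations at two sites
observations = pd.DataFrame({
    "Latitude": [-26.45, -26.45, -27.67],
    "Longitude": [28.09, 28.09, 27.24],
    "Sample Date": ["03-01-2011", "04-02-2011", "03-01-2011"],
})
site_features = pd.DataFrame({
    "Latitude": [-26.45, -27.67],
    "Longitude": [28.09, 27.24],
    "elevation": [1473.7, 1347.1],
})

merged = observations.merge(
    site_features,
    on=["Latitude", "Longitude"],
    how="left",
    validate="many_to_one",   # raises MergeError if a site appears twice on the right
)

n_unmatched = int(merged["elevation"].isna().sum())
print(len(merged), n_unmatched)
```

Here a missing `elevation` can mean either an unmatched site or a failed extraction, since the extraction function returns NaN on failure.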


In [11]:
# Save the CSV file
output_path = os.path.join(OUTPUT_DIR, 'dem_features_training.csv')
training_dem_df.to_csv(output_path, index=False)

print(f"File created: {output_path}")

File created: ../data/processed\dem_features_training.csv


In [12]:
# Preview of the extracted data
print("Preview of the extracted data:")
print(f"- Rows: {len(training_dem_df)}")
print(f"- Columns: {list(training_dem_df.columns)}")

# Statistics
dem_cols = ['elevation', 'slope', 'aspect']
print("\nStatistics of the topographic features:")
print(training_dem_df[dem_cols].describe())

display(training_dem_df.head())

Preview of the extracted data:
- Rows: 9319
- Columns: ['Latitude', 'Longitude', 'Sample Date', 'elevation', 'slope', 'aspect']

Statistics of the topographic features:
         elevation        slope       aspect
count  9319.000000  9319.000000  9319.000000
mean    924.211947     5.251166   178.098969
std     509.823945     4.366169   103.745307
min       5.359703     0.972492     0.081085
25%     429.259399     2.323860    90.644226
50%    1084.152832     3.458956   183.557205
75%    1325.551270     6.884036   264.508942
max    1620.325195    26.051317   359.604919


|   | Latitude | Longitude | Sample Date | elevation | slope | aspect |
|---|----------|-----------|-------------|-----------|-------|--------|
| 0 | -28.760833 | 17.730278 | 02-01-2011 | 192.663025 | 11.798665 | 299.49765 |
| 1 | -26.861111 | 28.884722 | 03-01-2011 | 1527.916626 | 2.923243 | 109.644104 |
| 2 | -26.45 | 28.085833 | 03-01-2011 | 1473.671143 | 1.366939 | 134.574402 |
| 3 | -27.671111 | 27.236944 | 03-01-2011 | 1347.080688 | 3.807301 | 310.537842 |
| 4 | -27.356667 | 27.286389 | 03-01-2011 | 1357.651001 | 1.690194 | 224.774612 |
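The statistics show `aspect` ranging from about 0.08° to 359.6°, two values that describe almost the same north-facing direction but sit at opposite ends of the numeric scale. One common way to expose this to a model is to encode the angle as a sine/cosine pair. The sketch below illustrates the idea on a few values taken from the table above; the column names `aspect_sin` and `aspect_cos` are assumptions and not part of the notebook's output files.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"aspect": [0.081085, 90.0, 183.557205, 359.604919]})
rad = np.radians(df["aspect"])
df["aspect_sin"] = np.sin(rad)   # east-west component (+1 = east-facing)
df["aspect_cos"] = np.cos(rad)   # north-south component (+1 = north-facing)
print(df.round(3))
```

In this encoding the first and last rows end up next to each other, which the raw degree values do not convey.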


---

## Step 4: Extraction for the validation data

In [13]:
# Load the submission template
Validation_df = pd.read_csv('../data/raw/submission_template.csv')

print(f"Number of validation rows: {len(Validation_df)}")

# Unique sites
validation_sites = Validation_df[['Latitude', 'Longitude']].drop_duplicates().reset_index(drop=True)
print(f"Unique sites to process: {len(validation_sites)}")

Number of validation rows: 200
Unique sites to process: 24


In [14]:
# =============================================================================
# DEM EXTRACTION - VALIDATION
# =============================================================================

print(f"Extracting {len(validation_sites)} unique sites...")

BACKUP_PATH_VAL = "../data/processed/dem_validation_backup.csv"
SAVE_EVERY_VAL = 50

validation_results = []
completed_count = 0

for idx, row in tqdm(validation_sites.iterrows(), total=len(validation_sites), desc="Extraction"):
    lat, lon = row['Latitude'], row['Longitude']

    dem_features = extract_dem_features(catalog, lat, lon, BUFFER_DEG, debug=False)

    result = {'Latitude': lat, 'Longitude': lon}
    result.update(dem_features)

    validation_results.append(result)
    completed_count += 1

    if completed_count % SAVE_EVERY_VAL == 0:
        backup_df = pd.DataFrame(validation_results)
        backup_df.to_csv(BACKUP_PATH_VAL, index=False)
        print(f"\n💾 Backup: {completed_count}/{len(validation_sites)} sites")

# Final save
validation_dem_unique = pd.DataFrame(validation_results)
validation_dem_unique.to_csv(BACKUP_PATH_VAL, index=False)

print(f"\n✅ Extraction complete: {len(validation_dem_unique)} sites")

Extracting 24 unique sites...

Extraction: 100%|██████████| 24/24 [00:12<00:00,  1.95it/s]

✅ Extraction complete: 24 sites
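The validation loop repeats the training loop almost line for line. The sketch below shows how both could share one helper that takes the extraction function as a parameter. `extract_for_sites` and `fake_extract` are hypothetical names, and the stand-in extractor returns made-up values so that the example runs offline.

```python
import os
import tempfile
import pandas as pd

def extract_for_sites(sites, extract_fn, backup_path, save_every=50):
    """Apply extract_fn(lat, lon) -> dict to every site, saving a CSV periodically."""
    rows = []
    pairs = zip(sites["Latitude"], sites["Longitude"])
    for i, (lat, lon) in enumerate(pairs, start=1):
        rows.append({"Latitude": lat, "Longitude": lon, **extract_fn(lat, lon)})
        if i % save_every == 0:
            pd.DataFrame(rows).to_csv(backup_path, index=False)
    result = pd.DataFrame(rows)
    result.to_csv(backup_path, index=False)   # final save
    return result

# Stand-in extractor with made-up values
def fake_extract(lat, lon):
    return {"elevation": 100.0, "slope": 1.0, "aspect": 90.0}

sites = pd.DataFrame({"Latitude": [-32.0, -33.3], "Longitude": [27.8, 26.1]})
with tempfile.TemporaryDirectory() as tmp:
    out = extract_for_sites(sites, fake_extract, os.path.join(tmp, "backup.csv"), save_every=1)

print(out)
```

In the notebook, `extract_fn` could be a small wrapper such as `lambda lat, lon: extract_dem_features(catalog, lat, lon, BUFFER_DEG)`.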





In [15]:
# Merge with the original DataFrame
validation_dem_df = Validation_df[['Latitude', 'Longitude', 'Sample Date']].merge(
    validation_dem_unique,
    on=['Latitude', 'Longitude'],
    how='left'
)

print(f"Final DataFrame: {len(validation_dem_df)} rows")

Final DataFrame: 200 rows


In [16]:
# Save the CSV file
output_path = os.path.join(OUTPUT_DIR, 'dem_features_validation.csv')
validation_dem_df.to_csv(output_path, index=False)

print(f"File created: {output_path}")

File created: ../data/processed\dem_features_validation.csv


In [17]:
# Preview of the validation data
print(f"Validation data: {len(validation_dem_df)} rows")
print("\nStatistics:")
print(validation_dem_df[dem_cols].describe())

display(validation_dem_df.head())

Validation data: 200 rows

Statistics:
        elevation       slope      aspect
count  200.000000  200.000000  200.000000
mean   419.181021   10.924888  185.274252
std    319.368110    7.894237  110.095558
min     42.320709    1.390205   21.941162
25%    193.071960    3.837967   80.797882
50%    248.035019   10.501728  195.292770
75%    800.228394   12.302191  244.790375
max    986.560364   30.558901  356.730499


|   | Latitude | Longitude | Sample Date | elevation | slope | aspect |
|---|----------|-----------|-------------|-----------|-------|--------|
| 0 | -32.043333 | 27.822778 | 01-09-2014 | 800.228394 | 7.94189 | 244.790375 |
| 1 | -33.329167 | 26.0775 | 16-09-2015 | 355.946747 | 21.501944 | 339.971893 |
| 2 | -32.991639 | 27.640028 | 07-05-2015 | 193.07196 | 10.501728 | 356.730499 |
| 3 | -34.096389 | 24.439167 | 07-02-2012 | 76.233414 | 12.302191 | 216.089355 |
| 4 | -32.000556 | 28.581667 | 01-10-2014 | 437.48172 | 24.601475 | 21.941162 |
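Because the extraction function returns NaN silently when a request fails, a short sanity check on the final table is the main way to notice failed sites or implausible values. The sketch below counts missing rows and out-of-range angles; `check_dem_features` is a hypothetical helper, and the second toy row is invalid on purpose.

```python
import numpy as np
import pandas as pd

def check_dem_features(df):
    """Return simple quality indicators for the DEM feature columns."""
    cols = ["elevation", "slope", "aspect"]
    return {
        "missing": int(df[cols].isna().any(axis=1).sum()),
        "slope_out_of_range": int((~df["slope"].dropna().between(0, 90)).sum()),
        "aspect_out_of_range": int((~df["aspect"].dropna().between(0, 360)).sum()),
    }

# Toy frame: the first row reuses values from the preview above
sample = pd.DataFrame({
    "elevation": [800.228394, np.nan],
    "slope": [7.94189, 95.0],
    "aspect": [244.790375, 400.0],
})
report = check_dem_features(sample)
print(report)
```

On the notebook's own tables, the `count` row of `describe()` already suggests that no values are missing (9319 and 200 non-null values respectively).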


---

## Summary

**What we did:**
1. Connected to Microsoft Planetary Computer (Copernicus DEM GLO-30)
2. For each measurement site, extracted:
   - Mean elevation
   - Mean slope
   - Mean aspect
3. Created 2 CSV files with the topographic features

**Extracted features:**

| Feature | Description | Expected impact |
|---------|-------------|-----------------|
| elevation | Elevation (m) | Temperature, ecosystem type |
| slope | Slope (°) | Flow velocity, erosion |
| aspect | Aspect (°) | Sun exposure, evaporation |

**Files created:**

| File | Description |
|------|-------------|
| dem_features_training.csv | Features for training |
| dem_features_validation.csv | Features for validation |

**Next step:**
- Merge all the CSV files (Landsat, TerraClimate, WorldCover, SoilGrids, DEM, Water Type)
- Retrain the model with all the new features
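The merge of all feature files could be sketched as a chain of left joins on the shared key columns. The example below assumes that every feature CSV carries the same `Latitude`, `Longitude` and `Sample Date` columns as the DEM files; the `precip` column and the helper `merge_feature_tables` are made up for illustration.

```python
from functools import reduce
import pandas as pd

KEYS = ["Latitude", "Longitude", "Sample Date"]

def merge_feature_tables(tables, keys=KEYS):
    """Left-join a list of feature DataFrames on the shared key columns."""
    return reduce(lambda left, right: left.merge(right, on=keys, how="left"), tables)

# Toy tables with made-up values; real inputs would come from pd.read_csv
dem = pd.DataFrame({"Latitude": [-26.45], "Longitude": [28.09],
                    "Sample Date": ["03-01-2011"], "elevation": [1473.7]})
climate = pd.DataFrame({"Latitude": [-26.45], "Longitude": [28.09],
                        "Sample Date": ["03-01-2011"], "precip": [12.0]})

combined = merge_feature_tables([dem, climate])
print(combined.columns.tolist())
```

If a key combination appears more than once in a right-hand table, the join multiplies rows, so checking the row count after each merge is a sensible safeguard.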