# Fill sinks
Filling sinks in a Digital Elevation Model (DEM) is an important preprocessing step in various geospatial analyses and modeling tasks.

In **hydrological analysis**, surface water flow must be represented accurately. A sink is a cell, or group of cells, that is lower than everything around it, so water routed into it cannot flow out. Sinks therefore break flow paths and distort watershed delineation and drainage network extraction. Filling them yields a continuous flow surface, which hydrological modeling and water resources management rely on.

For **visualization**, especially 3D rendering of terrain models, a filled DEM gives a cleaner and more realistic representation of the landscape than a DEM with sinks. This makes terrain features easier to interpret in fields such as urban planning, environmental management, and disaster response.

Overall, filling sinks in a DEM is essential for improving the accuracy, reliability, and usability of DEM data in a wide range of geospatial applications and analyses.

## 1. Import libraries

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import rasterio
from ipywidgets import widgets
import os

## 2. Define the function to fill sinks in a DEM
First we define two **helper functions** for processing digital elevation models (DEM).

**`elevation_highest_neighbour`** function:

1. This function takes three arguments: x and y representing the coordinates of a cell in the DEM, and dem, the DEM itself represented as a 2D NumPy array.
2. It first retrieves the dimensions (rows and columns) of the DEM using the shape attribute of NumPy arrays.
3. A list neighbours is defined to represent the 8-connected neighbors of the given cell.
4. It iterates over each neighbor, calculates their coordinates, and checks if they are within the bounds of the DEM.
5. If a neighbor is within bounds, its elevation is appended to the list neighbour_elevations.
6. Finally, it returns the maximum elevation among the neighboring cells.

In [2]:
# Helper function to get the highest elevation among the neighbours
def elevation_highest_neighbour(x, y, dem):
    """
    Helper function to calculate the maximum elevation among neighbouring cells.

    Args:
    - x, y: Row and column indices of the cell
    - dem: 2D numpy array representing the DEM

    Returns:
    - Maximum elevation among neighbours
    """
    
    rows, cols = dem.shape  # Get the dimensions of the DEM (rows, cols)
    
    # Define the neighbourhood offsets (8-connected)
    neighbours = [(-1, -1), (-1, 0), (-1, 1),
                  (0, -1),           (0, 1),
                  (1, -1),  (1, 0),  (1, 1)]
    
    neighbour_elevations = []
    for dx, dy in neighbours:
        nx, ny = x + dx, y + dy
        # Check if neighbour is within bounds
        if 0 <= nx < rows and 0 <= ny < cols:
            neighbour_elevations.append(dem[nx, ny])
    # np.nanmax ignores NaN (no-data) neighbours; the built-in max is order-dependent with NaN
    return np.nanmax(neighbour_elevations)
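
When no-data cells are stored as NaN, a neighbourhood can contain NaN values, and the way the maximum is taken then matters. The short sketch below (toy lists, not data from the DEM) contrasts Python's built-in `max` with `np.nanmax`:

```python
import numpy as np

with_nan_first = [np.nan, 1.0, 2.0]
with_nan_later = [1.0, np.nan, 2.0]

# Built-in max compares with ">", and every comparison against NaN is False,
# so the result depends on where the NaN sits in the list
builtin_first = max(with_nan_first)
builtin_later = max(with_nan_later)

# np.nanmax skips NaN entries regardless of their position
nanmax_first = np.nanmax(with_nan_first)
nanmax_later = np.nanmax(with_nan_later)

print(builtin_first, builtin_later)  # nan 2.0
print(nanmax_first, nanmax_later)    # 2.0 2.0
```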

**`is_sink`** function:

1. Similar to elevation_highest_neighbour, this function also takes x, y, and dem as arguments.
2. It defines the same set of 8-connected neighbors in the neighbours list.
3. It iterates over each neighbor, checks if it's within bounds and if its elevation is lower than or equal to the elevation of the current cell (dem[x, y]).
4. If any neighbor meets these criteria, the function immediately returns False, indicating that the current cell is not a sink.
5. If none of the neighbors satisfy the conditions, the function returns True, indicating that the current cell is a sink.

In [3]:
# Helper function to check if a cell is a sink
def is_sink(x, y, dem):
    """
    Helper function to check if a cell is a sink.

    Args:
    - x, y: Row and column indices of the cell
    - dem: 2D numpy array representing the DEM

    Returns:
    - True if the cell is a sink, False otherwise
    """
    rows, cols = dem.shape  # Get the dimensions of the DEM (rows, cols)
    
    # Define the neighbourhood offsets (8-connected)
    neighbours = [(-1, -1), (-1, 0), (-1, 1),
                  (0, -1),           (0, 1),
                  (1, -1),  (1, 0),  (1, 1)]
    
    for dx, dy in neighbours:
        nx, ny = x + dx, y + dy
        # Check if neighbour is within bounds and is lower than or equal to the cell
        if 0 <= nx < rows and 0 <= ny < cols and dem[nx, ny] <= dem[x, y]:
            return False
    return True
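
Because the comparison uses `<=`, a cell counts as a sink only when every in-bounds neighbour is strictly higher. The sketch below illustrates the consequence on two made-up arrays. `is_sink_sketch` is a hypothetical compact restatement of the helper, included only so the snippet runs on its own.

```python
import numpy as np

# Hypothetical compact restatement of is_sink, for illustration only
def is_sink_sketch(x, y, dem):
    rows, cols = dem.shape
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) == (0, 0):
                continue  # skip the cell itself
            nx, ny = x + dx, y + dy
            if 0 <= nx < rows and 0 <= ny < cols and dem[nx, ny] <= dem[x, y]:
                return False
    return True

# A single-cell pit: all eight neighbours are strictly higher
single_pit = np.array([[9., 9., 9.],
                       [9., 1., 9.],
                       [9., 9., 9.]])

# A two-cell flat-bottomed pit: each low cell has an equal neighbour
flat_pit = np.array([[9., 9., 9., 9.],
                     [9., 1., 1., 9.],
                     [9., 9., 9., 9.]])

single_result = is_sink_sketch(1, 1, single_pit)
flat_result = is_sink_sketch(1, 1, flat_pit)
print(single_result, flat_result)  # True False
```

Under this definition, a depression whose floor spans several cells of equal elevation is not reported as a sink.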

The **`fill_sinks`** function takes a 2D NumPy array representing a Digital Elevation Model (DEM) as input and aims to remove sinks from it.

1. It starts by obtaining the dimensions of the DEM (number of rows and columns).

2. It iterates over all cells in the DEM until no more sinks are found (while True loop).

3. Within each iteration, it initializes a flag found_sinks to False to track if any sinks are found in the current iteration.

4. It loops through each cell in the DEM using nested for loops.

5. For each cell, it checks whether the elevation is greater than the minimum elevation in the DEM. Cells at the minimum are skipped: they are the lowest points of the DEM (typically the outlet) and are never raised.

6. If the cell passes the elevation check, it checks if the cell is a sink using the is_sink function.

7. If the cell is identified as a sink, it replaces its elevation with the maximum elevation among its neighbors, obtained using the elevation_highest_neighbour function.

8. If the elevation of the sink is replaced, the found_sinks flag is set to True.

9. After iterating through all cells, if no more sinks are found (found_sinks is still False), the loop breaks, and the modified DEM is returned.

In [4]:
def fill_sinks(dem):
    """
    Function to fill sinks in a Digital Elevation Model (DEM).
    
    Args:
    - dem: 2D numpy array representing the DEM
    
    Returns:
    - Filled DEM with sinks removed
    """

    rows, cols = dem.shape  # Get the dimensions of the DEM (rows, cols)

    # The minimum never changes (minimum cells are skipped, other cells are only raised),
    # so compute it once instead of once per cell
    min_elevation = np.nanmin(dem)

    # Repeat full sweeps over the DEM until a sweep finds no more sinks
    while True:
        found_sinks = False
        # Loop through each cell in the DEM
        for i in range(rows):
            for j in range(cols):
                # Check if the cell has elevation greater than the minimum in the DEM
                if dem[i, j] > min_elevation:
                    # Check if the cell is a sink
                    if is_sink(i, j, dem):
                        # Get the maximum elevation among neighbours
                        high_neighbour_elevation = elevation_highest_neighbour(i, j, dem)
                        # Replace the sink elevation with the maximum neighbour elevation
                        if dem[i, j] != high_neighbour_elevation:
                            dem[i, j] = high_neighbour_elevation
                            found_sinks = True # Set the flag to True indicating that a sink has been filled
        # Break the loop if no more sinks are found
        if not found_sinks:
            break
    return dem
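
For large DEMs it can help to see where the sinks are before running the cell-by-cell loop. The following is a minimal vectorised sketch of the same sink test (all in-bounds neighbours strictly higher), not a replacement for `fill_sinks`. The name `sink_mask_sketch` and the toy array are made up for illustration, and the sketch assumes the array contains no NaN cells.

```python
import numpy as np

def sink_mask_sketch(dem):
    """Boolean mask of cells whose in-bounds neighbours are all strictly higher."""
    rows, cols = dem.shape
    # Pad with +inf so out-of-bounds neighbours never count as lower or equal
    padded = np.pad(dem.astype(float), 1, mode="constant", constant_values=np.inf)
    lowest_neighbour = np.full(dem.shape, np.inf)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) == (0, 0):
                continue  # skip the cell itself
            # View of the padded array shifted by (dx, dy)
            shifted = padded[1 + dx:1 + dx + rows, 1 + dy:1 + dy + cols]
            lowest_neighbour = np.minimum(lowest_neighbour, shifted)
    return lowest_neighbour > dem

toy = np.array([[5., 5., 5., 5.],
                [5., 2., 4., 5.],
                [5., 4., 4., 3.],
                [5., 5., 5., 5.]])

mask = sink_mask_sketch(toy)
sink_cells = np.argwhere(mask).tolist()
print(sink_cells)  # [[1, 1], [2, 3]]
```

Note that `fill_sinks` would leave cell (1, 1) untouched even though the mask flags it, because it holds the minimum elevation of the array.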

## 3. Example
### Load the DEM of the catchment of study
We use `rasterio` to manipulate raster datasets, extract information from them, and perform various raster operations.

In [5]:
files = [f for f in os.listdir('.') if os.path.isfile(f)]
dem_files = []
for s in files:
    if 'asc' in s and 'dem' in s:
        dem_files.append(s)
input_file = widgets.Select(
    options=dem_files,
    description='select the file:',
    disabled=False
)
display(input_file)


In [24]:
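# Open the raster map file
with rasterio.open(input_file.value) as src:
    # Read the first band as a numpy array (rasterio band indices start at 1)
    dem = src.read(1)

    # Get metadata of the raster map
    dem_metadata = src.meta

# NaN only exists for floating-point arrays, so convert integer rasters first
if not np.issubdtype(dem.dtype, np.floating):
    dem = dem.astype('float32')

nodata_value = -9999
dem[dem == nodata_value] = np.nan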

The `src.read(1)` call reads the first band of the raster as a numpy array; rasterio numbers bands from 1. The `dem_metadata` variable holds the raster's metadata, such as its coordinate reference system, data type, and geotransform. This metadata is reused later to write the filled DEM with the same georeferencing.
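
Replacing the no-data value with NaN only works on floating-point arrays, and whether the array read from the file is floating-point depends on the file. The sketch below uses a small made-up integer array (not the catchment DEM) to show the failure and one possible workaround, a cast to `float32` before masking.

```python
import numpy as np

nodata_value = -9999
# Made-up integer raster standing in for a band read from file
raw = np.array([[12, 15],
                [nodata_value, 11]], dtype="int32")

try:
    attempt = raw.copy()
    attempt[attempt == nodata_value] = np.nan  # integers cannot represent NaN
    assigned = True
except (ValueError, TypeError):
    assigned = False

# Cast to float first, then mask the no-data cells
dem_float = raw.astype("float32")
dem_float[dem_float == nodata_value] = np.nan

print(assigned)
print(dem_float)
```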

### Plot the DEM

In [25]:
fig = px.imshow(dem,color_continuous_scale='rdbu')
fig.show()

### Run the function to fill the sinks of the loaded DEM

In [26]:
# Fill the sinks
dem_fill = fill_sinks(dem.copy())

fig = px.imshow(dem_fill,color_continuous_scale='rdbu')
fig.show()

## 4. Add noise to the flat areas of the DEM
By adding noise to flat areas of a DEM before computing flow accumulation, we introduce subtle variations in elevation that help resolve flat areas where multiple cells have the same elevation. This process is essential for accurately simulating surface water flow in hydrological models. 

**Why are flat areas a problem?** In a DEM, flat areas or depressions where multiple cells have the same elevation make the direction of surface water flow ambiguous. Conventional methods for computing flow direction rely on the elevation difference between neighboring cells. In a flat area that difference is zero in several directions, so no direction is clearly steepest and the assigned flow direction may be wrong.

In hydrological modeling, it's important to maintain hydrological connectivity, ensuring that water can flow continuously across the landscape. Adding noise to flat areas helps to break ties and establish a consistent flow path, improving the connectivity of the hydrological network.
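
The ambiguity can be made concrete with a small sketch. The hypothetical helper `steepest_descent_ties` below computes the elevation drop from a cell to each of its eight neighbours and counts how many neighbours share the largest drop. It is a simplification that uses the raw elevation difference only, with no distance weighting for diagonal neighbours, and the arrays are made up.

```python
import numpy as np

def steepest_descent_ties(dem, x, y):
    """Return the largest drop from cell (x, y) and how many neighbours share it."""
    rows, cols = dem.shape
    drops = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if (dx, dy) != (0, 0) and 0 <= nx < rows and 0 <= ny < cols:
                drops.append(float(dem[x, y] - dem[nx, ny]))
    best = max(drops)
    return best, drops.count(best)

# Flat area: every neighbour has the same elevation as the centre cell
flat = np.full((3, 3), 10.0)

# Sloping area: one neighbour is clearly the lowest
slope = np.array([[9., 8., 7.],
                  [8., 5., 4.],
                  [7., 4., 1.]])

flat_best, flat_ties = steepest_descent_ties(flat, 1, 1)
slope_best, slope_ties = steepest_descent_ties(slope, 1, 1)
print(flat_best, flat_ties)    # 0.0 8
print(slope_best, slope_ties)  # 4.0 1
```

On the flat array all eight directions tie with a drop of zero, whereas the sloping array has a single steepest direction.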

**`add_noise_to_flat_areas`** function: adds noise to flat areas in a digital elevation model (DEM). It first finds every elevation value that occurs more than once anywhere in the DEM (the cells do not have to be neighbours), then adds random noise between `-epsilon` and 0 to each cell holding such a value. The magnitude of the noise is controlled by the `epsilon` parameter. The function modifies the DEM in place and returns it.

Note: because the noise is random, the output differs slightly each time the function is run.

In [27]:
def add_noise_to_flat_areas(dem, epsilon):
    """
    Adds noise to flat areas in a digital elevation model (DEM).

    Args:
        dem (numpy.ndarray): A 2D array representing the elevation data.
        epsilon (float): The magnitude of the noise to be added.

    Returns:
        dem (numpy.ndarray): The DEM with noise added to duplicate values.
    """

    # Find unique values and their counts in the DEM
    unique_values, counts = np.unique(dem, return_counts=True)
    # Identify duplicate values
    duplicates = unique_values[counts > 1]

    # Iterate over duplicate values
    for value in duplicates:
        # Find indices where the duplicate value occurs
        indices = np.where(dem == value)
        # Add noise to duplicate values
        # Note that the noise is negative to avoid creating new sinks
        noise = np.random.uniform(-epsilon, 0, size=len(indices[0]))
        dem[indices] += noise

    return dem
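
If repeatable output is needed, one option is to draw the noise from a seeded generator. The sketch below is a variant under that assumption, not the function used in this notebook. The name `add_noise_seeded_sketch` and the `seed` parameter are hypothetical, and unlike the function above it works on a copy instead of modifying the input.

```python
import numpy as np

def add_noise_seeded_sketch(dem, epsilon, seed=0):
    """Add noise in [-epsilon, 0) to repeated values, using a seeded generator."""
    rng = np.random.default_rng(seed)
    out = dem.copy()  # leave the input array untouched
    values, counts = np.unique(out, return_counts=True)
    for value in values[counts > 1]:
        idx = np.where(out == value)
        out[idx] += rng.uniform(-epsilon, 0, size=len(idx[0]))
    return out

# Made-up array: the value 1.0 occurs three times, the others once
toy = np.array([[1., 1., 2.],
                [3., 1., 4.]])
eps = 0.0001

first_run = add_noise_seeded_sketch(toy, eps, seed=42)
second_run = add_noise_seeded_sketch(toy, eps, seed=42)

print(np.array_equal(first_run, second_run))  # True: same seed, same noise
print(toy - first_run)                        # non-zero only where the value was 1.0
```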

### Run the function to add the noise to the flat areas of the DEM
`epsilon` is a small positive value (0.0001 here, but it can be changed) that sets the maximum magnitude of the noise added to repeated elevation values in the filled DEM. It should be small enough to leave the terrain essentially unchanged while still breaking ties between equal elevations.

In [28]:
# Set the value of epsilon, which represents the small amount of noise to be added
epsilon = 0.0001

# Add noise to duplicates in the filled DEM using the add_noise_to_flat_areas function
dem_fill = add_noise_to_flat_areas(dem_fill, epsilon)

# Compute the difference between the filled DEM and the original DEM
dem_diff = dem_fill - dem

# Visualize the difference between the original DEM and the filled DEM with noise added
fig = px.imshow(dem_diff)
fig.show()
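
Alongside the plot, a few summary numbers can help judge how much the DEM changed. The sketch below shows one way to compute them on two small made-up arrays standing in for `dem` and `dem_fill`. The names `before` and `after` are placeholders, and the noise step is left out for clarity.

```python
import numpy as np

# Made-up stand-ins for the original and the filled DEM (NaN marks no-data)
before = np.array([[5., 5., 5., np.nan],
                   [5., 2., 5., 1.]])
after = np.array([[5., 5., 5., np.nan],
                  [5., 5., 5., 1.]])

diff = after - before

# Comparisons against NaN are False, so no-data cells are not counted
n_raised = int(np.sum(diff > 0))
# np.nanmax ignores the NaN cells when looking for the largest change
max_raise = float(np.nanmax(diff))

print(n_raised, max_raise)  # 1 3.0
```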

### Save the filled DEM as a raster file

In [29]:
if 'syn' in input_file.value:
    # Specify the output file path
    output_file = 'dem_syn_fill.asc'    
else:
    # Specify the output file path
    output_file = 'dem_fill.asc'

# Keep the metadata consistent with the array being written
dem_metadata['dtype'] = dem_fill.dtype.name
dem_metadata['nodata'] = np.nan

# Write the modified raster data to a new file with the same metadata
with rasterio.open(output_file, 'w', **dem_metadata) as dst:
    # Write the filled DEM to band 1
    dst.write(dem_fill, 1)