# 3.3 Exercise: Working with openEO on Copernicus Data

Now that we have familiarized ourselves with the concepts of the Copernicus Data Space ecosystem and openEO, it is time to get our hands dirty. In this exercise, we will use Sentinel-2 satellite data to get a first impression of the impact of the flood disaster in Ahrweiler, Germany, which occurred on July 14-15, 2021.

In the first part of this exercise, we will focus on creating the pre-flood NDVI image using Sentinel-2 data. In a later section, you'll have the opportunity to create the post-flood NDVI image yourself and observe the changes in vegetation.

## 3.3.1 Installing the needed software
To configure our software environment, we need to install several Python libraries which enable us to use openEO and some functions for analyzing and visualizing the satellite data:
- `openeo`: To access Earth Observation data via the Copernicus Data Space Ecosystem.
- `numpy`: For numerical data processing.
- `matplotlib`: For visualization.
- `rasterio`: To read and process geospatial TIFF files.
- `os`: For interacting with the file system (part of the Python standard library, so it does not need to be installed).

If you're interested in learning more about the openEO Python library, you can find the detailed documentation here: https://openeo.org/documentation/1.0/python/.

You can install these libraries using the following command if any of the imports fail:
```bash
!pip install openeo numpy matplotlib rasterio
```

In [None]:
import openeo
import numpy as np
import matplotlib.pyplot as plt
import rasterio
import os

## 3.3.2 Sign in with your Account

To begin with, we need to connect to the Copernicus Data Space Ecosystem (CDSE) and authenticate ourselves as a registered user. This gives us access to the data and services of the CDSE platform.

**Before you continue with the notebook, make sure that you have successfully registered as a user of the Copernicus Data Space Ecosystem at https://dataspace.copernicus.eu/**


In [None]:
# to authenticate, we use the following function provided by the openeo library:

connection = openeo.connect(url="openeo.dataspace.copernicus.eu")
connection.authenticate_oidc()

# after running this cell, a link to CDSE's authentication widget will be displayed
# follow the link and confirm that this notebook may access the CDSE with your credentials
# after successfully completing this step, a green checkmark followed by "Authorized successfully" should appear

## 3.3.3 Creating filters for retrieving the Sentinel-2 data

Next, we want to retrieve usable Sentinel-2 data for the area around Ahrweiler in Germany, where the heavy rainfall event occurred on July 14 and 15, 2021.

To focus our search on the region of Ahrweiler, we use a bounding box as a **spatial filter**. We choose the coordinates of the bounding box (longitude and latitude in degrees) so that it encloses the region around Ahrweiler.


In [None]:
# area of interest covers the region around Ahrweiler in Germany

areaOfInterest_bbox = {
    "west": 7.055496548917159,
    "south": 50.52018836230043,
    "east": 7.117575039388905,
    "north": 50.546364682516895
} 
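
To get a feeling for how large the area of interest is, the bounding box can be converted into approximate distances. The following is a minimal sketch using only the standard library; the helper `bbox_size_km` and the spherical-Earth radius of 6371 km are assumptions made for illustration and are not part of the openEO library.

```python
import math

# Bounding box from the exercise (longitude/latitude in degrees)
bbox = {
    "west": 7.055496548917159,
    "south": 50.52018836230043,
    "east": 7.117575039388905,
    "north": 50.546364682516895,
}

# Assumption: spherical Earth with a mean radius of 6371 km
KM_PER_DEGREE = 2 * math.pi * 6371.0 / 360.0


def bbox_size_km(box):
    """Return the approximate (width, height) of a lon/lat bounding box in km."""
    if not (box["west"] < box["east"] and box["south"] < box["north"]):
        raise ValueError("west/south must be smaller than east/north")
    mid_lat = math.radians((box["south"] + box["north"]) / 2)
    # One degree of longitude shrinks with the cosine of the latitude
    width = (box["east"] - box["west"]) * KM_PER_DEGREE * math.cos(mid_lat)
    height = (box["north"] - box["south"]) * KM_PER_DEGREE
    return width, height


width_km, height_km = bbox_size_km(bbox)
print(f"Area of interest: about {width_km:.1f} km x {height_km:.1f} km")
```

The area is only a few kilometres across, which keeps the amount of data to be processed small.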


As for the **temporal filter**, we focus on a period before the flood, i.e. before July 14, 2021.
Since we do not know exactly when a usable image of our target area was taken, we specify a time period rather than a single date. The period used below lies in mid-June 2021, about one month before the weather event.


In [None]:
# time period before the rainfall event, for which we hope that some usable Sentinel-2 images have been captured

period_before = ["2021-06-12", "2021-06-14"] 
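
In openEO, a temporal extent is a left-closed interval: the start date is included and the end date is excluded. The sketch below lists the calendar days covered by such an interval. The helper `dates_in_extent` is a hypothetical name introduced here for illustration; it only mimics the interval logic and does not query any data.

```python
from datetime import date, timedelta


def dates_in_extent(temporal_extent):
    """List the calendar days covered by a [start, end) interval of ISO dates."""
    start = date.fromisoformat(temporal_extent[0])
    end = date.fromisoformat(temporal_extent[1])
    if start >= end:
        raise ValueError("start must be earlier than end")
    n_days = (end - start).days
    # The end date itself is excluded (left-closed, right-open interval)
    return [start + timedelta(days=i) for i in range(n_days)]


period_before = ["2021-06-12", "2021-06-14"]
covered = dates_in_extent(period_before)
print([d.isoformat() for d in covered])

# Distance between the end of the time window and the first day of the flood
flood_start = date(2021, 7, 14)
days_to_flood = (flood_start - date.fromisoformat(period_before[1])).days
print("Days between the end of the window and the flood:", days_to_flood)
```

If you want to include images taken on the end date itself, move the end of the interval one day further.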


## 3.3.4 Loading Sentinel-2 data as a datacube

To calculate the NDVI (Normalized Difference Vegetation Index), we load the bands B08 (NIR) and B04 (Red) of the Sentinel-2 data as a datacube. Datacubes are a key concept in openEO. If you want to learn more about datacubes, follow this link: https://openeo.org/documentation/1.0/datacubes.html.


In [None]:
s2_data_before = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent=areaOfInterest_bbox,
    temporal_extent=period_before,
    bands=["B04", "B08"]
)

## 3.3.5 Calculating the NDVI
For this analysis, we use the B04 (red) and B08 (near-infrared) bands of Sentinel-2 to calculate the NDVI, a common index for measuring vegetation health.
Luckily, the openEO API already provides a function for this. openEO has many predefined functions like this, called processes, to perform calculations on your data or to manipulate it. These processes are documented here: https://openeo.org/documentation/1.0/processes.html.

The NDVI formula is: NDVI = (NIR - Red) / (NIR + Red)

This index helps us to get a first impression of the impact of the flood on the vegetation. We will first create the NDVI image for the time before the flood and then compare it with the situation after the flood.
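
To make the formula concrete, here is a small NumPy sketch that applies it pixel by pixel. The reflectance values are made up for illustration, and `ndvi_from_bands` is a hypothetical helper; in the exercise itself the calculation is done by the openEO backend.

```python
import numpy as np


def ndvi_from_bands(nir, red):
    """Compute NDVI = (NIR - Red) / (NIR + Red) element-wise."""
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    total = nir + red
    # Where NIR + Red is zero the index is undefined, so NaN is kept there
    result = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=result, where=total != 0)
    return result


# Made-up reflectance values for a 2 x 2 pixel image (not real Sentinel-2 data)
nir = np.array([[0.50, 0.40], [0.30, 0.00]])
red = np.array([[0.10, 0.10], [0.20, 0.00]])

ndvi = ndvi_from_bands(nir, red)
print(ndvi)
```

Pixels that reflect much more near-infrared than red light, as healthy vegetation does, get values close to 1, while pixels with similar reflectance in both bands end up near 0.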

In [None]:
ndvi_before = s2_data_before.ndvi(nir="B08", red="B04")

## 3.3.6 Saving the NDVI Result as a GeoTIFF
In this step, we calculate the mean NDVI over time for the pre-flood data. This reduces the temporal dimension and allows us to generate a single NDVI image representing the average vegetation health before the flood.

We save the resulting NDVI image in GeoTIFF format. The GeoTIFF format allows us to store geospatial raster data with the correct geographic coordinates, ensuring that it can be easily loaded into GIS software for further analysis.

In [None]:
ndvi_mean_before = ndvi_before.reduce_dimension(dimension="t", reducer="mean")

result_before = ndvi_mean_before.save_result(format="GTiff")
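
What reducing the temporal dimension means can be illustrated locally with NumPy. The sketch below averages a made-up stack of three NDVI images over time. It assumes that missing values are skipped, which `np.nanmean` does; the exact handling of missing values on the backend may differ.

```python
import numpy as np

# Made-up NDVI stack with dimensions (t, y, x): 3 acquisitions of 2 x 2 pixels.
# NaN marks pixels without a valid value on that date.
ndvi_stack = np.array([
    [[0.2, 0.4], [0.6, np.nan]],
    [[0.4, 0.4], [0.8, 0.5]],
    [[0.6, 0.4], [np.nan, 0.7]],
])

# Averaging along axis 0 removes the time dimension.
# Assumption: missing values are ignored, as np.nanmean does.
ndvi_mean = np.nanmean(ndvi_stack, axis=0)

print(ndvi_stack.shape, "->", ndvi_mean.shape)
print(ndvi_mean)
```

The result has only the two spatial dimensions left, which is what allows it to be stored as a single-band GeoTIFF.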

## 3.3.7 Creating and Starting a Batch Job
Since the processing of satellite data can take some time, we use openEO's batch job feature. A batch job processes the data asynchronously on the openEO backend, and once it has completed, we can download the results.

Here, we create and start a batch job to process the pre-flood NDVI data. Once the job is finished, the resulting NDVI image is downloaded and saved as a GeoTIFF file.

While the batch job is running, it is important to monitor its progress so that we know when the processing is complete and the results can be downloaded. The `start_and_wait()` method used below does this for us: it starts the job and waits until the job has ended.

Once the job is finished (status = "finished"), the results can be downloaded directly, as shown in the code below. If the job is still running or encounters an error, appropriate action can be taken.

In [None]:
output_dir = "results" 
sampleOutput_dir = "sampleOutput"

In [None]:
# create the batch job on the backend, start it and wait until it has ended
job_before = result_before.create_job()
job_before.start_and_wait()
if job_before.status() == "finished":
    job_before.get_results().download_files(output_dir)
    print("NDVI before flood downloaded successfully.")
else:
    print(f"Batch job ended with status '{job_before.status()}'; NDVI before flood was not downloaded!")

# the result is downloaded as "openEO.tif"; rename it so that a later download cannot overwrite it
downloaded_path = os.path.join(output_dir, "openEO.tif")
if os.path.exists(downloaded_path):
    print("Renaming openEO.tif to NDVI_Before_Flood.tif")
    os.replace(downloaded_path, os.path.join(output_dir, "NDVI_Before_Flood.tif"))
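
Monitoring a batch job boils down to asking for its status repeatedly until it is no longer "created", "queued" or "running". The following sketch shows this polling idea with a scripted stand-in object. `FakeJob` and `wait_for_job` are hypothetical names introduced for illustration; they are not part of the openeo library and are not the actual implementation of `start_and_wait()`.

```python
import time

# Batch job states in which the job has not ended yet
RUNNING_STATES = {"created", "queued", "running"}


class FakeJob:
    """Hypothetical stand-in for a batch job; NOT part of the openeo library."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def status(self):
        # Return the next scripted status; keep returning the last one afterwards
        if len(self._statuses) > 1:
            return self._statuses.pop(0)
        return self._statuses[0]


def wait_for_job(job, poll_interval=0.0, max_polls=100):
    """Poll job.status() until the job leaves the created/queued/running states."""
    seen = []
    for _ in range(max_polls):
        status = job.status()
        # Record each status only when it changes
        if not seen or seen[-1] != status:
            seen.append(status)
        if status not in RUNNING_STATES:
            return status, seen
        time.sleep(poll_interval)
    raise TimeoutError("job did not end within the allowed number of polls")


job = FakeJob(["created", "queued", "running", "running", "finished"])
final_status, history = wait_for_job(job)
print(final_status, history)
```

With a real backend you would choose a poll interval of several seconds so that the service is not flooded with requests.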

## 3.3.8 Show resulting .tif file
After downloading and renaming the GeoTIFF file, we can visualize the pre-flood NDVI image. By using rasterio and matplotlib, we can load and display the NDVI data with a suitable color map to interpret vegetation health. 
This visualization helps us to understand the state of the vegetation before the flood and serves as a baseline for later comparison with post-flood data.

In [None]:
def plot_ndvi(filename, output_dir):
    file_path = os.path.join(output_dir, filename)
    
    if os.path.exists(file_path):
        with rasterio.open(file_path) as src:
            ndvi_data = src.read(1)
            ndvi_data = np.clip(ndvi_data, -1, 1)
            
            plt.figure(figsize=(10, 10))
            plt.title(f'NDVI Image - {filename}')
            plt.imshow(ndvi_data, cmap='RdYlGn', vmin=-1, vmax=1)
            plt.colorbar()
            plt.show()
    else:
        print(f"File {filename} not found!")

plot_ndvi("NDVI_Before_Flood.tif", output_dir)

## 3.3.9 Your Turn
Now that you have seen the workflow, try recreating it. To do so, fill the gaps in the following code snippet and then run it.

We want to get a second .tif file for our area of interest that shows the situation after the flood. Therefore, we look for Sentinel-2 data within a time period after the flood, for example between 2021-07-20 and 2021-07-22.

Compare the results with the pre-flood NDVI to observe the impact of the flood on the vegetation. 

Once you've generated the post-flood NDVI image, we compute the NDVI difference to assess the changes in vegetation and identify areas that were most affected by the flood.

In [None]:
# creating the time filter for the period shortly after the rainfall event 
period_after =                  # Please insert a time period here.

# load Sentinel-2 data as a datacube
s2_data_after = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent=areaOfInterest_bbox,
    temporal_extent=period_after,
    bands=                      # Please insert the needed bands.
)

# calculation of NDVI
ndvi_after = s2_data_after.ndvi(nir="B08", red="B04")

# use a reducer if necessary
ndvi_mean_after = ndvi_after.reduce_dimension(dimension="t", reducer="mean")

# Saving the NDVI result as a .tif file
result_after = ndvi_mean_after.save_result(format="GTiff")

# creating and starting the job
job_after = result_after.create_job()
job_after.start_and_wait()
if job_after.status() == "finished":
    job_after.get_results().download_files(output_dir)
    print("NDVI after flood downloaded successfully.")
else:
    print(f"Batch job ended with status '{job_after.status()}'; NDVI after flood was not downloaded!")

# Renaming the downloaded file "openEO.tif"
downloaded_path = os.path.join(output_dir, "openEO.tif")
if os.path.exists(downloaded_path):
    print("Renaming openEO.tif to NDVI_After_Flood.tif")
    os.replace(downloaded_path, os.path.join(output_dir, "NDVI_After_Flood.tif"))

## 3.3.10 Comparing pre- and post-flood situation
Now you can display both NDVI images side by side and compare the situation in Ahrweiler before and after the heavy rainfall event. 

If something went wrong with the image you created for the time after the rainfall event (step 3.3.9), you can also use our sample solution (`NDVI_After_Flood_Control.tif` in the `sampleOutput` directory).

In [None]:
# definition of the NDVI-files that shall be plotted
files = [
    ("NDVI_Before_Flood.tif", output_dir, "Before Flood"),
    ("NDVI_After_Flood.tif", output_dir, "After Flood (your result)")]

# if something went wrong with computing "NDVI_After_Flood.tif", you can use our sample solution "sampleOutput/NDVI_After_Flood_Control.tif"
# simply by uncommenting the following lines

#files = [
#    ("NDVI_Before_Flood.tif", output_dir, "Before Flood"),
#    ("NDVI_After_Flood_Control.tif", sampleOutput_dir, "After Flood (sample solution)")]




In [None]:

def plot_ndvi_images(files_and_dirs):
    fig, axes = plt.subplots(1, 2, figsize=(20, 10))

    for ax, (filename, directory, title) in zip(axes, files_and_dirs):
        file_path = os.path.join(directory, filename)
        
        if os.path.exists(file_path):
            with rasterio.open(file_path) as src:
                data = src.read(1)
                data = np.clip(data, -1, 1)

            ax.imshow(data, cmap='RdYlGn', vmin=-1, vmax=1)
            ax.set_title(f'NDVI {title}')
            ax.axis('off')
        else:
            print(f"File {filename} not found in directory {directory}.")

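    # use the same value range (-1 to 1) for the colorbar as for the images
    cbar = fig.colorbar(plt.cm.ScalarMappable(norm=plt.Normalize(vmin=-1, vmax=1), cmap='RdYlGn'), ax=axes, orientation='vertical', fraction=0.02, pad=0.04)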
    cbar.set_label('NDVI Value')

    plt.show()

plot_ndvi_images(files)

## 3.3.11 Comparing pre- and post-flood situation using the NDVI difference

After calculating the NDVI values for both pre-flood and post-flood data, we will visualize the NDVI difference to understand how the flood impacted vegetation in the Ahrweiler region.

The color scale:
- **Green**: Indicates areas where the NDVI has increased, i.e. where vegetation health has improved.
- **Red**: Indicates areas where vegetation health has decreased, possibly due to flooding.

Below is an example of how the NDVI difference can be plotted, assuming that `ndvi_difference` holds the difference array (after minus before):
```python
plt.imshow(ndvi_difference, cmap="RdYlGn")
plt.colorbar(label="NDVI Difference")
plt.title("NDVI Difference (After - Before)")
plt.show()
```

In [None]:
def calculate_ndvi_difference(before_path, after_path):
    with rasterio.open(before_path) as before_src:
        ndvi_before = before_src.read(1)
    
    with rasterio.open(after_path) as after_src:
        ndvi_after = after_src.read(1)

    ndvi_before = np.clip(ndvi_before, -1, 1)
    ndvi_after = np.clip(ndvi_after, -1, 1)
    
    ndvi_diff = ndvi_after - ndvi_before
    
    plot_ndvi_difference(ndvi_diff)

def plot_ndvi_difference(ndvi_diff):
    plt.figure(figsize=(10, 10))
    plt.title('NDVI Difference (After - Before)')
    plt.imshow(ndvi_diff, cmap='RdYlGn', vmin=-1, vmax=1)
    plt.colorbar(label='NDVI Difference')
    plt.show()

before_path = os.path.join(files[0][1], files[0][0])
after_path = os.path.join(files[1][1], files[1][0])

calculate_ndvi_difference(before_path, after_path)
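
Besides looking at the difference image, the change can also be summarized in numbers. The sketch below computes the mean NDVI difference and the share of pixels whose NDVI dropped by more than a threshold. The helper `summarize_ndvi_change`, the threshold of -0.2 and the sample values are assumptions chosen for illustration, not values recommended by the exercise.

```python
import numpy as np


def summarize_ndvi_change(ndvi_before, ndvi_after, loss_threshold=-0.2):
    """Summarize the change between two NDVI arrays of identical shape."""
    before = np.asarray(ndvi_before, dtype="float64")
    after = np.asarray(ndvi_after, dtype="float64")
    if before.shape != after.shape:
        raise ValueError("both NDVI arrays must have the same shape")
    diff = after - before
    valid = ~np.isnan(diff)
    if not valid.any():
        raise ValueError("no valid pixels to compare")
    # Pixels whose NDVI dropped by more than the (hypothetical) threshold
    loss_mask = valid & (diff < loss_threshold)
    return {
        "mean_difference": float(diff[valid].mean()),
        "loss_fraction": float(loss_mask.sum() / valid.sum()),
        "loss_mask": loss_mask,
    }


# Made-up NDVI values for a 2 x 2 pixel image (NaN = no valid value)
before = np.array([[0.8, 0.7], [0.6, 0.5]])
after = np.array([[0.2, 0.7], [0.5, np.nan]])

summary = summarize_ndvi_change(before, after)
print(f"Mean NDVI difference: {summary['mean_difference']:.2f}")
print(f"Share of pixels with strong NDVI loss: {summary['loss_fraction']:.0%}")
```

To apply this to the exercise data, you could pass in the two arrays read with rasterio instead of the made-up values.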


## Congratulations!!!

You have mastered this exercise. You have seen how easily you can access EO data and process it through the openEO API and the Copernicus Data Space Ecosystem.

In this module, you learned how to retrieve and analyze Sentinel-2 data using NDVI to assess the impact of the Ahrweiler flood. By comparing the pre- and post-flood vegetation health, you can now identify areas most affected by the disaster.

The use of openEO's batch processing and geospatial data handling in Python allows for efficient, large-scale environmental monitoring, which can be applied to many other natural disasters or environmental changes.

You are invited to play around with the code to better understand its details.

As soon as you've finished, please go back to the tutorial document and follow the last remaining steps of this module.