# Example downloader of the Sentinel 1 and Sentinel 2 data from Copernicus
---
**Author**: [Mattia C. Mancini](https://github.com/mcmancini) -- (m.c.mancini@exeter.ac.uk)  
**Date**: March 20th, 2024  

---

This notebook walks through the process of identifying and downloading Sentinel 1 and Sentinel 2 data from the [Copernicus Data Space Ecosystem](https://dataspace.copernicus.eu/).

## Step 1: User registration with Copernicus and set up of API credentials
To access Copernicus Sentinel data, users need to [self-register](https://documentation.dataspace.copernicus.eu/Registration.html) on the [Copernicus Data Space Ecosystem](https://dataspace.copernicus.eu/). More information can be found [here](https://scihub.copernicus.eu/).  
We will download data using [APIs](https://scihub.copernicus.eu/twiki/do/view/SciHubWebPortal/APIHubDescription) and the [CDSETool](https://github.com/SDFIdk/CDSETool) Python package. Owing to the ongoing (as of March 2024) migration from the Copernicus Science Hub to the new Data Space Ecosystem, previous packages such as [Sentinelsat](https://sentinelsat.readthedocs.io/) no longer work. CDSETool is a community-developed package that uses [OpenSearch](https://documentation.dataspace.copernicus.eu/APIs/OpenSearch.html), which I find better documented than [OData](https://documentation.dataspace.copernicus.eu/APIs/OData.html) when it comes to the features that can be queried. For those interested, an official example using OData from the [EU-CDSE GitHub repository](https://github.com/eu-cdse) can be found [here](https://github.com/eu-cdse/notebook-samples/blob/c0e0ade601973c5d4e4bf66a13c0b76ebb099805/sentinelhub/migration_from_scihub_guide.ipynb).  
  

For the APIs to work, the user must supply authentication credentials: the ones created when registering with the Copernicus Data Space Ecosystem. To store and access them, use a text editor to create a *.netrc* file with the following content:
```
machine https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token
login <your username>
password <your password>
```
Save this file in the user's home directory. On Windows machines, this is usually `C:\Users\<username>\`
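
Before connecting, it can help to confirm that the file parses and contains an entry for the token endpoint. The sketch below uses Python's standard `netrc` module; the helper name `read_netrc_login` and the placeholder credentials are made up for illustration, and the demonstration runs on a throwaway file rather than on your real *.netrc*.

```python
import netrc
import os
import tempfile

# Token endpoint used as the `machine` name in the .netrc entry above.
CDSE_MACHINE = (
    "https://identity.dataspace.copernicus.eu"
    "/auth/realms/CDSE/protocol/openid-connect/token"
)


def read_netrc_login(netrc_path, machine=CDSE_MACHINE):
    """Return the login stored for `machine`, or None if there is no entry."""
    entry = netrc.netrc(netrc_path).authenticators(machine)
    # `entry` is a (login, account, password) tuple when the machine is found.
    return entry[0] if entry else None


# Demonstration on a throwaway file with placeholder credentials.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, ".netrc")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write(
            f"machine {CDSE_MACHINE}\n"
            "login demo_user\n"
            "password demo_password\n"
        )
    demo_login = read_netrc_login(demo_path)
    missing_login = read_netrc_login(demo_path, machine="example.invalid")

print(demo_login)
```

To check the real file, the same helper could be called as `read_netrc_login(os.path.expanduser("~/.netrc"))`.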

## Step 2: Connect to the API and query data
The first step is to create a connection to the API by instantiating the `Credentials` class from the **CDSETool** Python package.  
As we stored user credentials in the *.netrc* file, we do not need to pass them explicitly when generating an instance of the `Credentials` class.

In [None]:
# Update sys path so notebook can access the agromanagement package
import sys
sys.path.append('../')

In [None]:
from cdsetool.credentials import Credentials

# connect to the API
credentials = Credentials()

## Step 3: select an area of interest to download the Sentinel data
As in the downloader of the Sentinel 2 L2A data, we identify the tiles to download from user-supplied geometries. For us, these will likely be field parcels in the UK, which are shapefiles defined in the [CEH Vector Land Cover maps](https://www.ceh.ac.uk/data/ukceh-land-cover-maps). From the geometry of a selected parcel, we will query the matching Sentinel 1 data to download.

Let's load a parcel:

In [None]:
import geopandas as gpd

# Load a parcel from data
parcels = gpd.read_file("../resources/" + "lcm2021_tile_11_1014_647.geojson")
parcel = parcels.iloc[1025:1026, :]
print(parcel)

We can now plot it:

In [None]:
import matplotlib.pyplot as plt

# plot the parcel
parcel.plot()
plt.show()

The geometry of the parcel is what we will use to query the Sentinel 1 data. The API requires the footprint of the location of interest to be passed in GeoJSON format.

In [None]:
from shapely.geometry import mapping

# extract geometry and convert to geojson
geometry = parcel.iloc[0, :]["geometry"]
geom_json = mapping(geometry)
print(geom_json)
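
Two text representations of a polygon are relevant here: GeoJSON (what `mapping` returns, as a Python dictionary) and WKT (what `str(geometry)` or `geometry.wkt` returns for a shapely geometry). Note that the query cells later in this notebook pass the shapely `geometry` object itself rather than `geom_json`. As a minimal sketch of how the two forms relate, the hypothetical helper `polygon_mapping_to_wkt` below (not part of shapely or CDSETool) converts a GeoJSON-like polygon mapping into WKT using only the standard library. It handles plain `Polygon` geometries only, and the square is made-up example data.

```python
def polygon_mapping_to_wkt(geom_mapping):
    """Convert a GeoJSON-like Polygon mapping into a WKT string."""
    if geom_mapping["type"] != "Polygon":
        raise ValueError("this sketch only handles Polygon geometries")
    rings = []
    for ring in geom_mapping["coordinates"]:
        # Each ring is a sequence of coordinate tuples; only x and y are kept.
        points = ", ".join(f"{point[0]} {point[1]}" for point in ring)
        rings.append(f"({points})")
    return f"POLYGON ({', '.join(rings)})"


# Hypothetical square, written the way shapely's mapping() lays out a polygon.
example_mapping = {
    "type": "Polygon",
    "coordinates": (
        ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)),
    ),
}
example_wkt = polygon_mapping_to_wkt(example_mapping)
print(example_wkt)
```

With shapely available, `geometry.wkt` does the same job and also covers the other geometry types.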

## Step 4: select a temporal timeframe of interest
Here we specify the timeframe for which we want data to be downloaded. The Sentinel-1 satellites have a 12-day repeat cycle. We can define the dates as strings.

In [None]:
start_date = "2019-01-01"
end_date = "2019-01-31"
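
Because the dates are plain strings, a typo only shows up when the query runs. The sketch below is one possible sanity check, using the standard library only: it parses both dates, checks their order, and reports how many full 12-day repeat cycles fit in the window. The helper name `describe_window` is made up for illustration.

```python
from datetime import date

REPEAT_CYCLE_DAYS = 12  # Sentinel-1 repeat cycle quoted above


def describe_window(start, end):
    """Parse two YYYY-MM-DD strings and return (days, full_repeat_cycles)."""
    start_day = date.fromisoformat(start)  # raises ValueError if malformed
    end_day = date.fromisoformat(end)
    if end_day < start_day:
        raise ValueError("end date precedes start date")
    days = (end_day - start_day).days
    return days, days // REPEAT_CYCLE_DAYS


window_days, full_cycles = describe_window("2019-01-01", "2019-01-31")
print(window_days, full_cycles)
```

For the dates used in this notebook, the window spans 30 days, which is two full repeat cycles.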

## Step 5: define the product to download
Now that we have an API connection, a location and a timeframe of interest, we need to specify the data that we want to download.  
As we are using this data to compute Leaf Area Index following the algorithm from [Myrgiotis and Vasilis (2021)](https://datashare.ed.ac.uk/handle/10283/4086), we are interested here in the Sentinel 2 L2A and Sentinel-1 GRD S1-VV/VH data, but other data can be specified as well. The CDSETool package works by defining a collection of search terms, or *properties*. I found a list of the available [OpenSearch API search keywords](https://scihub.copernicus.eu/userguide/FullTextSearch#Search_Keywords) in the OpenSearch documentation; however, these do NOT work (I am not sure whether I misunderstood how the API works). The list of available properties can be obtained by calling the `describe_collection()` function from the `cdsetool.query` module.  



The values that can be assigned to each of these properties do not always match the [Sentinel 1 naming conventions](https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-1-sar/naming-conventions) or [Sentinel 2 naming conventions](https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/naming-convention). For example, for Sentinel 1, `PRODUCT_TYPE` is defined in the properties for the API call as `"IW_GRDH_1S"` rather than `GRD`, as one would infer from the naming conventions; for Sentinel 2, it is `"S2MSI2A"` rather than `"MSIL2A"`. There are other oddities too: I could not find a way to request data with dual polarisation VV+VH (*DV* in the naming conventions). Although it appears in the returned feature properties as `polarisation: "VV&VH"`, passing that value into the search does not work and returns an empty list of features. None of the following works: `"VVVH", "VV&VH", "VV+VH", "VV/VH", "DV"`.

This sample code should allow us to download the data we need for the algorithm from [Myrgiotis and Vasilis (2021)](https://datashare.ed.ac.uk/handle/10283/4086) for Sentinel 1:

```
COLLECTION = "Sentinel1"
PRODUCT_TYPE = "IW_GRDH_1S"


search_terms = {
    "maxRecords": "2000",
    "startDate": start_date,
    "completionDate": end_date,
    "geometry": geometry,
    "productType": PRODUCT_TYPE,
}

```

And this for Sentinel 2:

```
COLLECTION = "Sentinel2"
PRODUCT_TYPE = "S2MSI2A"


search_terms = {
    "maxRecords": "2000",
    "startDate": start_date,
    "completionDate": end_date,
    "geometry": geometry,
    "productType": PRODUCT_TYPE,
}
```

**NB**  
In your actual script, make sure to change `maxRecords` so that it does not limit the number of features retrieved!
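
The two dictionaries above differ only in the product type, so they could be built by a single helper. The sketch below is one way to do it; `build_search_terms` and `PRODUCT_TYPES` are names made up for illustration and are not part of CDSETool, and the geometry in the example call is a placeholder rather than the parcel geometry.

```python
# Product type values used in this notebook, keyed by collection name.
PRODUCT_TYPES = {
    "Sentinel1": "IW_GRDH_1S",
    "Sentinel2": "S2MSI2A",
}


def build_search_terms(collection, start_date, end_date, geometry, max_records=2000):
    """Return the search terms dictionary for one of the two collections."""
    return {
        "maxRecords": str(max_records),
        "startDate": start_date,
        "completionDate": end_date,
        "geometry": geometry,
        "productType": PRODUCT_TYPES[collection],
    }


# Placeholder WKT point; in the notebook this would be the parcel geometry.
example_terms = build_search_terms(
    "Sentinel2", "2019-01-01", "2019-01-31", "POINT (0 0)"
)
print(example_terms)
```

Keeping `max_records` as an argument makes the limit easy to spot and change in one place.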

In [None]:
from cdsetool.query import describe_collection

# List all search terms available for the Sentinel1 collection
search_properties = describe_collection("Sentinel1").keys()
for key in search_properties:
    print(key)

#### Retrieve Sentinel 1 data:

In [None]:
from cdsetool.query import query_features

COLLECTION = "Sentinel1"
PRODUCT_TYPE = "IW_GRDH_1S"

search_terms = {
    "maxRecords": "2000",
    "startDate": start_date,
    "completionDate": end_date,
    "geometry": geometry,
    "productType": PRODUCT_TYPE,
}

sentinel_1_feature_list = query_features(
    collection=COLLECTION, search_terms=search_terms)
print(len(sentinel_1_feature_list))
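
Since dual polarisation could not be requested through the search terms, one possible workaround is to filter on the client side, using the `polarisation` value that the returned features carry. The sketch below assumes each feature is a dictionary with that value nested under a `properties` key, as described in Step 5; the helper name and the example features are made up for illustration.

```python
def filter_by_polarisation(features, wanted="VV&VH"):
    """Keep only the features whose `polarisation` property equals `wanted`."""
    return [
        feature
        for feature in features
        if feature.get("properties", {}).get("polarisation") == wanted
    ]


# Made-up features, for illustration only; real ones come from query_features.
example_features = [
    {"id": "a", "properties": {"polarisation": "VV&VH"}},
    {"id": "b", "properties": {"polarisation": "HH"}},
    {"id": "c", "properties": {}},
]
dual_pol = filter_by_polarisation(example_features)
print([feature["id"] for feature in dual_pol])
```

In the notebook, this would be applied as `filter_by_polarisation(sentinel_1_feature_list)` before downloading.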

#### Retrieve Sentinel 2 data:

In [None]:
from cdsetool.query import query_features

COLLECTION = "Sentinel2"
PRODUCT_TYPE = "S2MSI2A"

search_terms = {
    "maxRecords": "2000",
    "startDate": start_date,
    "completionDate": end_date,
    "geometry": geometry,
    "productType": PRODUCT_TYPE,
}

sentinel_2_feature_list = query_features(
    collection=COLLECTION, search_terms=search_terms)
print(len(sentinel_2_feature_list))

## Step 6: download the data

In [None]:
import os
from cdsetool.download import download_feature

# N.B.: These are large files, about 1.7 GB each. It will take more than 10 minutes to run this section!

SENTINEL_1_OUTPUT_FOLDER = os.path.join("..", "resources", "sentinel_1")
SENTINEL_2_OUTPUT_FOLDER = os.path.join("..", "resources", "sentinel_l2a")

# create the folders if they do not exist
os.makedirs(SENTINEL_1_OUTPUT_FOLDER, exist_ok=True)
os.makedirs(SENTINEL_2_OUTPUT_FOLDER, exist_ok=True)
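
Given the file sizes involved, it may be worth checking the available disk space before starting a bulk download. The sketch below multiplies the number of features by the rough 1.7 GB per-file figure mentioned above and compares the result with the free space reported by `shutil.disk_usage`. The helper names are made up for illustration, and the per-file size is only an approximation.

```python
import shutil

BYTES_PER_GB = 10 ** 9
APPROX_FILE_SIZE_GB = 1.7  # rough per-file size quoted in the comment above


def estimated_download_gb(n_features, gb_per_file=APPROX_FILE_SIZE_GB):
    """Rough total download size in GB for `n_features` products."""
    return n_features * gb_per_file


def has_enough_space(folder, n_features):
    """Compare the rough estimate with the free space on the folder's disk."""
    free_gb = shutil.disk_usage(folder).free / BYTES_PER_GB
    return free_gb >= estimated_download_gb(n_features)


example_estimate = estimated_download_gb(10)
print(f"10 products need roughly {example_estimate:.1f} GB")
print(has_enough_space(".", 0))
```

In the notebook, the check could be run as `has_enough_space(SENTINEL_1_OUTPUT_FOLDER, len(sentinel_1_feature_list))`.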

### Bulk download

In [None]:
# N.B.: Each file is about 1.7 GB, so this can take more than 6 hours to run depending on Internet speed.

for feature in sentinel_1_feature_list:
    download_feature(feature, SENTINEL_1_OUTPUT_FOLDER)
    print(f"feature {feature} downloaded")

for feature in sentinel_2_feature_list:
    download_feature(feature, SENTINEL_2_OUTPUT_FOLDER)
    print(f"feature {feature} downloaded")
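
For a run that may last hours, a single failed download should ideally not stop the whole loop. The sketch below wraps the loop in a hypothetical helper, `download_all`, that records the features whose download raised an exception so they can be retried later. The downloader is passed in as an argument (in the notebook it would be `download_feature`); here a stand-in function is used so the example runs without network access. Whether `download_feature` raises on failure or signals it in another way may depend on the CDSETool version, and this sketch only catches exceptions.

```python
def download_all(features, output_folder, download_fn):
    """Call download_fn on every feature; return (n_ok, failed_features)."""
    n_ok = 0
    failed = []
    for feature in features:
        try:
            download_fn(feature, output_folder)
            n_ok += 1
        except Exception as error:  # keep going, report at the end
            print(f"download failed: {error}")
            failed.append(feature)
    return n_ok, failed


# Stand-in downloader for illustration; it fails on one made-up feature.
def fake_download(feature, output_folder):
    if feature["id"] == "bad":
        raise RuntimeError("simulated network error")


example_ok, example_failed = download_all(
    [{"id": "good"}, {"id": "bad"}, {"id": "also_good"}], "unused", fake_download
)
print(example_ok, [feature["id"] for feature in example_failed])
```

The returned list of failed features can be passed back into `download_all` for a second attempt.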

### Download individual files

In [None]:
# Download the first in the list of features for Sentinel 1 data.
s1_feature = sentinel_1_feature_list[0]
download_feature(s1_feature, SENTINEL_1_OUTPUT_FOLDER)

# Download the first in the list of features for Sentinel 2 L2A data.
s2_feature = sentinel_2_feature_list[0]
download_feature(s2_feature, SENTINEL_2_OUTPUT_FOLDER)