<img src='./img/opengeohub_logo.png' alt='OpenGeoHub Logo' align='right' width='15%'></img>

<br>

<a href="./00_index.ipynb"><< Index</a><br>
<a href="./03_WEKEO_dias_service.ipynb"><< 03 - WEkEO - Copernicus Data and Information Access Service</a><span style="float:right;"><a href="./05_google_earth_engine.ipynb">05 - Google Earth Engine>></a></span>

# Amazon Web Services - Open Data Registry

### About

[The Registry of Open Data on AWS](https://registry.opendata.aws/?search=tags:gis,earth%20observation,events,mapping,meteorological,environmental,transportation) lists open datasets hosted on Amazon Web Services, many of them geospatial.
Large volumes of geospatial data are stored in S3 buckets, from which you can download them or load them directly into AWS processing services.

Check out the program [Earth on AWS](https://aws.amazon.com/earth/) as well.



### Data

A variety of `geospatial` data is available. So far, the registry offers the following Copernicus data:
* [Sentinel-1](https://registry.opendata.aws/sentinel-1/)
* [Sentinel-2](https://registry.opendata.aws/sentinel-2/)
* [Sentinel-3](https://registry.opendata.aws/sentinel-3/)
* [Sentinel-5P Level 2](https://registry.opendata.aws/sentinel5p/)


* [ECMWF ERA-5 climate reanalysis](https://registry.opendata.aws/ecmwf-era5/)

### How to retrieve data?

[boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) is the Amazon Web Services (AWS) SDK for Python. Among other things, it lets you access data in AWS S3 storage buckets.

Follow the steps below for an example of how to use boto3 to retrieve data.

<hr>

#### Load required libraries

In [36]:
import boto3
import botocore


<hr>

### Example to retrieve data with `boto3`

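#### 1. Initiate a `boto3 client` with `boto3.client()` and define the bucket of your interest.

Initiate an `s3` client and define `meeo-s5p` as your bucket of interest. Passing `signature_version=botocore.UNSIGNED` tells boto3 not to sign the requests, so no AWS credentials are needed to read this bucket.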

In [3]:
client = boto3.client('s3', config=botocore.client.Config(signature_version=botocore.UNSIGNED))
s5p_bucket = 'meeo-s5p'

#### 2. Create a `paginator` and iterate over the results from the API request

`Paginators` are a feature of boto3 that act as an abstraction over the process of iterating over an entire result set of a truncated API operation. 

You must call the `paginate()` method of a Paginator in order to iterate over the pages of API operation results. 

In [14]:
paginator = client.get_paginator('list_objects_v2')
results = paginator.paginate(Bucket=s5p_bucket, Delimiter='/')
results

<botocore.paginate.PageIterator at 0x7feda851e0b8>

The `results` object, however, is just a `PageIterator`. To see how the data are organised in the S3 bucket, you have to iterate over it:

In [15]:
for prefix in results.search('CommonPrefixes'):
    print(prefix.get('Prefix'))

COGT/
NRTI/
OFFL/
RPRO/
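
To make clearer what a paginator abstracts away, here is a minimal sketch of the manual loop it replaces. It assumes the `list_objects_v2` response fields `Contents`, `IsTruncated` and `NextContinuationToken`. `FakeS3Client` is a hypothetical stand-in that serves two keys per page, so the sketch runs without network access; with a real `boto3` client the helper function would be called in the same way.

```python
def list_keys_manually(client, bucket, prefix=""):
    """Collect all object keys under `prefix`, following continuation tokens by hand."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # 'Contents' is absent when a page holds no objects
        for obj in response.get("Contents", []):
            keys.append(obj["Key"])
        # 'IsTruncated' signals that more results are waiting on the server
        if not response.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
    return keys


class FakeS3Client:
    """Hypothetical stand-in for a boto3 S3 client that serves two keys per page."""

    def __init__(self, keys, page_size=2):
        self._keys = keys
        self._page_size = page_size

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        matching = [k for k in self._keys if k.startswith(Prefix)]
        start = int(ContinuationToken) if ContinuationToken else 0
        end = start + self._page_size
        page = {
            "Contents": [{"Key": k} for k in matching[start:end]],
            "IsTruncated": end < len(matching),
        }
        if page["IsTruncated"]:
            page["NextContinuationToken"] = str(end)
        return page


fake = FakeS3Client(["COGT/a.tif", "COGT/b.tif", "COGT/c.tif", "NRTI/d.nc"])
print(list_keys_manually(fake, "demo-bucket", prefix="COGT/"))
# ['COGT/a.tif', 'COGT/b.tif', 'COGT/c.tif']
```

The paginator performs the same token handling internally, which is why the rest of this notebook never touches `ContinuationToken` directly.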


#### 3. Paginate over the PageIterator with a `Prefix` parameter

Instead of the `Delimiter` keyword, you can pass a `Prefix` parameter to the `paginate` function. It filters the paginated results by prefix on the server side before they are sent to the client:

In [16]:
prefix = 'COGT/OFFL/L3__NO2__'
pages = paginator.paginate(Bucket=s5p_bucket, Prefix=prefix)
pages

<botocore.paginate.PageIterator at 0x7feda851e160>

The next step is to iterate over the `Contents` of each page of the PageIterator `pages`. Since each entry is a dictionary, we get a much cleaner output by printing only the value stored under its `Key` entry.

You see that the data are daily aggregates and that, for each day, there are three different COG files with the following endings:
- `_4326`
- `_mask50_4326`
- `_mask75_4326`

In [25]:
for page in pages:
    for obj in page['Contents']:
        # print(obj)  # uncomment to print the full dictionary of each object
        print(obj.get('Key'))

COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200225_PRODUCT_daily_nitrogendioxide_tropospheric_column_4326.tif
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200225_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask50_4326.tif
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200225_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200226_PRODUCT_daily_nitrogendioxide_tropospheric_column_4326.json
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200226_PRODUCT_daily_nitrogendioxide_tropospheric_column_4326.tif
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200226_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask50_4326.tif
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200226_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200227_PRODUCT_daily_nitrogendioxide_tropospheric_column_4326.json
COGT/OFFL/L3__NO2___/2020/02/S5P_O
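
The key names listed above appear to embed the day of each aggregate as an eight-digit string right before `_PRODUCT`. As a minimal sketch, assuming that this naming pattern holds for all keys (it is inferred from the listing only, not from a product specification), the date can be extracted with a regular expression. The helper name `product_date` is made up for this illustration.

```python
import re
from datetime import datetime

# Pattern inferred from the listing: an 8-digit date followed by '_PRODUCT'
DATE_PATTERN = re.compile(r"_(\d{8})_PRODUCT")


def product_date(key):
    """Return the date encoded in an object key, or None if the key does not match."""
    match = DATE_PATTERN.search(key)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d").date()


key = (
    "COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200225_PRODUCT_"
    "daily_nitrogendioxide_tropospheric_column_mask75_4326.tif"
)
print(product_date(key))  # 2020-02-25
```

Having real `date` objects makes it easy to sort keys chronologically or to restrict a listing to a time range.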

#### 4. Store the object keys in a Python list

Let us store the value of each `Key` entry in a list named `key_list`.

In [27]:
i=0
key_list=[]
for page in pages:
    for obj in page['Contents']:
        print(i)
        print(obj.get('Key'))
        key_list.append(obj.get('Key'))
        i=i+1

0
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200225_PRODUCT_daily_nitrogendioxide_tropospheric_column_4326.tif
1
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200225_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask50_4326.tif
2
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200225_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif
3
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200226_PRODUCT_daily_nitrogendioxide_tropospheric_column_4326.json
4
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200226_PRODUCT_daily_nitrogendioxide_tropospheric_column_4326.tif
5
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200226_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask50_4326.tif
6
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200226_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif
7
COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200227_PRODUCT_daily_nitrogendioxide_tropospheric_column_4326.json
8
COGT/OFFL/L3__NO
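
The same list can be built more compactly with a list comprehension. The sketch below also uses `page.get('Contents', [])`, on the assumption that a page without matching objects carries no `Contents` entry, in which case `page['Contents']` would raise a `KeyError`. The `sample_pages` are hypothetical dictionaries that only mimic the response structure, so that the snippet runs offline.

```python
def collect_keys(pages):
    """Flatten an iterable of list_objects_v2 pages into a list of object keys."""
    return [obj["Key"] for page in pages for obj in page.get("Contents", [])]


# Hypothetical pages mimicking the response structure; the second one is empty
sample_pages = [
    {"Contents": [{"Key": "COGT/OFFL/a_4326.tif"},
                  {"Key": "COGT/OFFL/a_mask75_4326.tif"}]},
    {"KeyCount": 0},
]

key_list = collect_keys(sample_pages)
# enumerate() replaces the manual counter
for i, key in enumerate(key_list):
    print(i, key)
```

With the real iterator from step 3, the call would simply be `collect_keys(pages)`.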

#### 5. Filter the relevant files

With Sentinel-5P data, you are interested in a conservative cloud factor of at least 75%. Let us select only the files with the ending `_mask75_4326.tif`. We can do this by filtering the list `key_list`.

In [29]:
selected_files = list(filter(lambda x: x.endswith('mask75_4326.tif'), key_list))
selected_files

['COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200225_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif',
 'COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200226_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif',
 'COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200227_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif',
 'COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200228_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif',
 'COGT/OFFL/L3__NO2___/2020/02/S5P_OFFL_L3__NO2____20200229_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif',
 'COGT/OFFL/L3__NO2___/2020/03/S5P_OFFL_L3__NO2____20200301_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif',
 'COGT/OFFL/L3__NO2___/2020/03/S5P_OFFL_L3__NO2____20200302_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif',
 'COGT/OFFL/L3__NO2___/2020/03/S5P_OFFL_L3__NO2____20200303_PRODUCT_daily_nitrogendioxide_tropospheric_c
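
If you need to combine several criteria, for example the file ending and a specific month, a shell-style wildcard pattern can be handy. The following sketch uses `fnmatch.fnmatchcase` from the standard library. The helper `select_keys` and the three `sample_keys` are illustrative and only modelled on the listing above.

```python
from fnmatch import fnmatchcase


def select_keys(keys, pattern):
    """Return the keys that match a shell-style wildcard pattern (case-sensitive)."""
    return [key for key in keys if fnmatchcase(key, pattern)]


# Sample keys modelled on the bucket listing
base = "COGT/OFFL/L3__NO2___/2020/{month}/S5P_OFFL_L3__NO2____{day}_PRODUCT_daily_nitrogendioxide_tropospheric_column_{suffix}.tif"
sample_keys = [
    base.format(month="02", day="20200229", suffix="mask75_4326"),
    base.format(month="03", day="20200301", suffix="mask50_4326"),
    base.format(month="03", day="20200301", suffix="mask75_4326"),
]

# '*' also matches '/', so one pattern can span the folder structure
march_mask75 = select_keys(sample_keys, "*/2020/03/*_mask75_4326.tif")
print(march_mask75)
```

Only the third sample key passes both conditions, the `2020/03` folder and the `_mask75_4326.tif` ending.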

#### 6. Download the files of interest with `client.download_file()`

Now you are ready to download the files. The client function `client.download_file()` takes the bucket name, the object key and the local path of the output file.

For this example, we select only one image to download.

In [34]:
selected_files[161]

'COGT/OFFL/L3__NO2___/2020/08/S5P_OFFL_L3__NO2____20200806_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif'

In [35]:
for key in selected_files[161:162]:
    print(key)
    # The file name is the last component of the object key
    file_name = key.split('/')[-1]
    print(file_name)
    client.download_file(s5p_bucket, key, './' + file_name)

COGT/OFFL/L3__NO2___/2020/08/S5P_OFFL_L3__NO2____20200806_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif
S5P_OFFL_L3__NO2____20200806_PRODUCT_daily_nitrogendioxide_tropospheric_column_mask75_4326.tif
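
For more than one file, the loop above could be wrapped in a small helper. This is a sketch under a few assumptions: the helper `download_keys` and its `out_dir` argument are made up for this example, and files that already exist locally are skipped, which you may or may not want. `FakeClient` is a hypothetical stand-in that writes an empty file instead of downloading, so that the snippet runs offline. In practice you would pass the `boto3` client created in step 1.

```python
import os
import posixpath
import tempfile


def download_keys(client, bucket, keys, out_dir="."):
    """Download each key into out_dir, skipping files that already exist locally."""
    os.makedirs(out_dir, exist_ok=True)
    local_paths = []
    for key in keys:
        # S3 keys use '/' as separator, so the last component is the file name
        local_path = os.path.join(out_dir, posixpath.basename(key))
        if not os.path.exists(local_path):
            client.download_file(bucket, key, local_path)
        local_paths.append(local_path)
    return local_paths


class FakeClient:
    """Hypothetical stand-in for the boto3 client; creates an empty file per call."""

    def __init__(self):
        self.calls = []

    def download_file(self, Bucket, Key, Filename):
        self.calls.append(Key)
        open(Filename, "wb").close()


with tempfile.TemporaryDirectory() as tmp_dir:
    fake = FakeClient()
    keys = ["COGT/OFFL/L3__NO2___/2020/08/example_mask75_4326.tif"]
    download_keys(fake, "demo-bucket", keys, tmp_dir)
    # The second call finds the file on disk and does not download it again
    download_keys(fake, "demo-bucket", keys, tmp_dir)
    print(len(fake.calls))  # 1
```

Taking the last path component rather than a fixed index such as `tmp[5]` keeps the helper working for keys with a different folder depth.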



<hr>
&copy; 2020 | Julia Wagemann
<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img style="float: right" alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a>