
In [1]:
from eoxhub import check_compatibility
check_compatibility("user-2022.07-00", dependencies=["SH"])



---------

The following environment variables are available:

* `SH_CLIENT_ID`, `SH_INSTANCE_ID`, `SH_CLIENT_NAME`, `SH_CLIENT_SECRET`


# How to bring your own data to EDC: Using Sentinel Hub Python package
---
Source: [Bring Your Own COG documentation](https://sentinelhub-py.readthedocs.io/en/latest/examples/byoc_request.html)

## Getting started with Bring Your Own COG (BYOC)
Sentinel Hub allows you to access your own data stored in your S3 bucket with the powerful Sentinel Hub API. Since the data remains in your bucket, you keep full control over it. This functionality requires no replication of data and lets you use the full power of the Sentinel Hub service, including custom algorithms. [More information here](https://www.sentinel-hub.com/bring-your-own-data/).

The [Sentinel Hub Dashboard](https://apps.sentinel-hub.com/dashboard/) has a very user-friendly “Bring your own COG” tab. If you are not going to be creating collections, adding/updating collection tiles, etc. daily, the Dashboard tool is your friend. For the rest, this tutorial is a simple walk-through on creating, updating, listing, and deleting your BYOC collections through Python using `sentinelhub-py`.

## Outline
In this demonstration Jupyter Notebook, based on the [Sentinel Hub Python package BYOC examples](https://sentinelhub-py.readthedocs.io/en/latest/examples/byoc_request.html#), we will learn how to:
- [Set up for prerequisites](#Set-up-for-prerequisites)
  - [Imports](#Imports)
  - [Credentials](#Credentials)
- [Manage BYOC collections](#Manage-BYOC-collections)
  - [Create new collection](#Create-new-collection)
  - [Get a list of your collections](#Get-a-list-of-your-collections)
  - [Update existing collection](#Update-existing-collection)
  - [Delete collection](#Delete-collection)
- [Manage BYOC tiles (cogs in the collection)](#Manage-BYOC-tiles-(cogs-in-the-collection))
  - [Creating a new tile (and ingesting it to collection)](#Creating-a-new-tile-(and-ingesting-it-to-collection))
  - [Add multiple tiles to a single collection](#Add-multiple-tiles-to-a-single-collection)
  - [Get tiles from your collection](#Get-tiles-from-your-collection)
  - [Visualize the tiles in your collection](#Visualize-the-tiles-in-your-collection)
  - [Update and delete a tile](#Updating-and-deleting-a-tile)
- [Retrieve data from collection](#Retrieve-data-from-collection)

import sys
sys.path.append("/home/jovyan")
from credentials import *

## Set up for prerequisites
Before accessing the data, we will start by importing the necessary Python libraries (already configured in your EDC workspace) and generating credentials automatically to access the services.

### Imports

In [2]:
# EDC libraries
from edc import setup_environment_variables

# Utilities
import os
import boto3
import numpy as np
import datetime as dt
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Sentinel Hub
from sentinelhub import (SHConfig, DataCollection, Geometry, BBox, CRS, 
                         SentinelHubRequest, filter_times, bbox_to_dimensions, MimeType, 
                         SentinelHubBYOC, ByocCollection, ByocTile, ByocCollectionAdditionalData,
                         DownloadFailedException)

In [3]:
# Configure plots for inline use in Jupyter Notebook
%matplotlib inline

### Credentials
Credentials for Sentinel Hub services are automatically injected as environment variables, so it is easy to populate Sentinel Hub's credential manager with their values.

In [4]:
# Pass Sentinel Hub credentials to SHConfig
config = SHConfig()
config.sh_client_id = os.environ["SH_CLIENT_ID"]
config.sh_client_secret = os.environ["SH_CLIENT_SECRET"]

This example notebook demonstrates how to ingest tiles from an AWS S3 bucket, taking [Sentinel-2 L2A 120m Mosaic data](https://registry.opendata.aws/sentinel-s2-l2a-mosaic-120/) as an example. You will need AWS credentials to access data in the S3 bucket. Please create a text file named `custom.env` in your home directory with the following content:

```
AWS_ACCESS_KEY_ID = "<aws_access_key_id>"
AWS_SECRET_ACCESS_KEY = "<aws_secret_access_key>"
```
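If these variables are not present in the running kernel, reading them with `os.environ[...]` raises a `KeyError`. The following is a minimal sketch of one way to load the file by hand. The helper name `load_env_file` and the parsing rules (one `KEY = "value"` pair per line, as in the template above) are assumptions made for illustration; they are not part of the EDC or Sentinel Hub tooling.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse KEY = "value" lines and export them to os.environ (hypothetical helper)."""
    loaded = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded


# Usage (assumes the file sits in your home directory, as described above):
# load_env_file(Path.home() / "custom.env")
# missing = [k for k in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY") if k not in os.environ]
```

Checking for missing keys before assigning them to `config` gives a clearer message than a bare `KeyError`.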

<span id="papermill-error-cell" style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">Execution using papermill encountered an exception here and stopped:</span>

In [5]:
config.aws_access_key_id = os.environ["AWS_ACCESS_KEY_ID"]
config.aws_secret_access_key = os.environ["AWS_SECRET_ACCESS_KEY"]

KeyError: 'AWS_ACCESS_KEY_ID'

## Manage BYOC collections
The `SentinelHubBYOC` class holds the methods for interacting with the Sentinel Hub BYOC service. Let’s initialize it with our `config`:

In [None]:
# Initialize SentinelHubBYOC class
byoc = SentinelHubBYOC(config=config)

### Create new collection
The easiest way to create a collection is to use its dataclass:

In [None]:
new_collection = ByocCollection(name='new collection', s3_bucket='byoc-tutorial-bucket')

The `new collection` is accessible on the `byoc-tutorial-bucket` S3 bucket (see how to configure your bucket for the Sentinel Hub service [here](https://docs.sentinel-hub.com/api/latest/data/byoc/#aws-bucket-settings)).


In [None]:
created_collection = byoc.create_collection(new_collection)

In [None]:
created_collection_name = created_collection['name']
created_collection_id = created_collection['id']
print('name:', created_collection_name)
print('id:', created_collection_id)

### Get a list of your collections
Now that we have created a data collection named `new collection`, we can retrieve it with the following code.

In [None]:
my_collection = byoc.get_collection(created_collection_id)

Let's have a look at the collection we just created.

In [None]:
print(f"name: {my_collection['name']}")
print(f"collection id: {my_collection['id']}")

If you have a large number of collections and only want to load the info for a few of them, the following code is a good option:

In [None]:
collections_iterator = byoc.iter_collections()

In [None]:
my_collection_using_next = next(collections_iterator)
my_collection_using_next
print('name:', my_collection_using_next['name'])
print('id:', my_collection_using_next['id'])

**Note:** `next()` returns the first collection on its first execution, the second collection on its second execution, and so on. If you already have other collections, the code above won't necessarily show the one we just created.

If you prefer to work with dataclasses, you can also run the following code:

```python
my_collection = ByocCollection.from_dict(next(collections_iterator))
```
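Because `next()` advances the iterator, every item it returns is consumed. The sketch below uses the standard-library `itertools.islice` to take only the first few items from an iterator. The generator `fake_collections` is a stand-in invented here so that the example runs without a Sentinel Hub account; in the notebook you would pass `byoc.iter_collections()` instead.

```python
from itertools import islice


def fake_collections():
    """Stand-in for byoc.iter_collections(); yields dicts with 'id' and 'name'."""
    for i in range(5):
        yield {"id": f"id-{i}", "name": f"collection {i}"}


iterator = fake_collections()
first_two = list(islice(iterator, 2))  # loads only two items
remaining = list(iterator)             # continues where islice stopped

print([c["name"] for c in first_two])
print(len(remaining))
```

Calling `list()` on an iterator that has already been advanced returns only the items that are left, not the full set.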

You can of course retrieve all the remaining collections in one go (the iterator does not yield again the ones already returned by `next()`):

In [None]:
my_collections = list(collections_iterator)

for collection in my_collections:
    print((collection['name'], collection['id']))

A useful way to manage your collections is a `pandas.DataFrame`, which you can create like so:

In [None]:
my_collections_df = pd.DataFrame(data=list(byoc.iter_collections()))
my_collections_df[['id','name','created']].head()

### Update existing collection

Anything you can do in the “Bring your own COG” tab of the Dashboard, you can do programmatically as well. Below we're going to rename the `new collection` collection to `renamed new collection`:

In [None]:
my_collection['name'] = 'renamed new collection'

When using `next()`, run the following code:

```python
my_collection['name'] = 'renamed new collection'
```

When using dataclass, run the following code:
```python
my_collection.name = 'renamed new collection'
```

When using `list`, run the following code:
```python
collection_to_be_updated = [x for x in my_collections if x['name']=='new collection'][0]
collection_to_be_updated['name'] = 'renamed new collection'
```
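A list comprehension followed by `[0]` raises an `IndexError` when no collection has the requested name. A hedged alternative is `next()` with a default value. `find_by_name` is a hypothetical helper, and the sample dictionaries only mimic the `name` and `id` keys used in this notebook.

```python
def find_by_name(collections, name):
    """Return the first collection dict with the given name, or None."""
    return next((c for c in collections if c["name"] == name), None)


# Made-up sample data with the same keys as the collection dictionaries
sample = [
    {"id": "a1", "name": "new collection"},
    {"id": "b2", "name": "another collection"},
]

match = find_by_name(sample, "new collection")
if match is not None:
    match["name"] = "renamed new collection"

print(sample[0]["name"])
print(find_by_name(sample, "does not exist"))
```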

**Note:** While you can change other fields as well, `s3_bucket` cannot be changed, and the `bitDepth` of bands in the collection is something that is pertinent to the COGs themselves and populated during the ingestion.

To update the collection, call:

In [None]:
byoc.update_collection(my_collection)

Now we can see that `new collection` has been renamed to `renamed new collection`.

In [None]:
get_renamed_collection = byoc.get_collection(created_collection_id)
print("name:", get_renamed_collection['name'])
print("id:", get_renamed_collection['id'])

### Delete collection

If you are the owner of the collection, you can also delete it. Let's delete the `renamed new collection` collection we just created.

<div class="alert alert-warning">

**Warning:**
    
Beware! Deleting the collection will also delete all its tiles!
</div>

In [None]:
byoc.delete_collection(my_collection)

The collection can also be deleted via passing its id to `byoc.delete_collection()` as shown below:
```python
byoc.delete_collection(my_collection['id'])
```

Trying to access this collection now will fail.

In [None]:
try:
    # Request the deleted collection by its id; this is expected to raise an exception
    deleted_collection = byoc.get_collection(created_collection_id)
except DownloadFailedException as e:
    print(e)

## Manage BYOC tiles (cogs in the collection)
Your data needs to be organized into collections of tiles. Each tile needs to contain a set of bands and (optionally) an acquisition date and time. Tiles with the same bands can be grouped into collections. Think of the Sentinel-2 data source as a collection of Sentinel-2 tiles.

Tiles have to be stored in an S3 bucket and need to be in COG format. We will not go into details about the COGification process; you can have a look at the documentation or use the BYOC tool, which takes care of creating a collection and ingesting the tiles for you.

### Creating a new tile (and ingesting it to collection)
When we create a new tile and add it to the collection, the ingestion process on the Sentinel-Hub side will happen, checking if the tile corresponds to the COG specifications as well as if it conforms to the collection. For more information refer to the [Bring Your Own COG API documentation](https://docs.sentinel-hub.com/api/latest/api/byoc/).

The simplest way to create a new tile is by using the `ByocTile` dataclass, which will complain if the required fields are missing. In the following cell we show how to ingest [Sentinel-2 L2A 120m Mosaic](https://registry.opendata.aws/sentinel-s2-l2a-mosaic-120/) data, which is listed in the Registry of Open Data on AWS.

In [None]:
new_tile = ByocTile(path='2019/11/27/28V/(BAND).tif',
                    sensing_time=dt.datetime(2019, 11, 27, 0, 0, 0)
                   )

**Note:** 
- The most important field of the tile is its `path` on an s3 bucket. For example, if your band files are stored in `s3://bucket-name/folder/`, then set `folder` as the tile path. In this case, the band names will equal the file names. For example, the band B1 corresponds to the file `s3://bucket-name/folder/B1.tiff`. If your file names have something other than just the band name, such as a prefix, this is fine as long as the prefix is the same for all files. In this case, the path needs to include this prefix and also the band placeholder: `(BAND)`. Adding the extension is optional. For example, this is what would happen if you would use the following path `folder/tile_1_(BAND)_2019.tiff` for the following files:
  - `s3://bucket-name/folder/tile_1_B1_2019.tiff` - the file would be used, the band name would be B1
  - `s3://bucket-name/folder/tile_1_B2_2019.tiff` - the file would be used, the band name would be B2
  - `s3://bucket-name/folder/tile_2_B1_2019.tiff` - the file would not be used
  - `s3://bucket-name/folder/tile_2_B2_2019.tiff` - the file would not be used
- `ByocTile` takes `sensing_time` as an optional parameter, but setting it is highly recommended since it makes the collection “temporal” and helps you search for the data with Sentinel Hub services. 
- `tile_geometry` is optional: it is the bounding box of the tile and will be read from the COG file. 
- `cover_geometry` is the geometry of where the data (within the bounding box) is and can be useful for optimized search as an optional parameter. For a good explanation of the `coverGeometry` please see [docs](https://docs.sentinel-hub.com/api/latest/data/byoc/#a-note-about-cover-geometries). 
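
The `(BAND)` placeholder rule described in the note can be illustrated with a small sketch. This only mirrors the rule as stated in the text; it is not Sentinel Hub's matching code, the helper `band_from_key` is hypothetical, and the optional file extension is not modelled.

```python
import re


def band_from_key(path_template, key):
    """Return the band name if `key` matches `path_template`, else None."""
    prefix, placeholder, suffix = path_template.partition("(BAND)")
    if not placeholder:
        raise ValueError("template must contain the (BAND) placeholder")
    # Everything except the placeholder has to match literally
    pattern = re.escape(prefix) + r"(?P<band>[^/]+)" + re.escape(suffix)
    match = re.fullmatch(pattern, key)
    return match.group("band") if match else None


template = "folder/tile_1_(BAND)_2019.tiff"
keys = [
    "folder/tile_1_B1_2019.tiff",
    "folder/tile_1_B2_2019.tiff",
    "folder/tile_2_B1_2019.tiff",
    "folder/tile_2_B2_2019.tiff",
]
for key in keys:
    print(key, "->", band_from_key(template, key))
```

The first two keys resolve to bands `B1` and `B2`; the last two do not match the `tile_1_` prefix and are ignored.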

Let's [create a new collection](#Create-new-collection) for these tiles.

In [None]:
new_collection = ByocCollection(name='byoc-s2l2a-120m-mosaic', s3_bucket='sentinel-s2-l2a-mosaic-120')
created_collection = byoc.create_collection(new_collection)

In [None]:
created_tile = byoc.create_tile(created_collection, new_tile)

The response from `byoc.create_tile` has a valid `id`, and its `status` is set to `WAITING`. Checking the tile `status` after a while (by [requesting this tile](#Get-tiles-from-your-collection)) will tell you if it has been `INGESTED` or if the ingestion procedure `FAILED`. In case of failure, additional information (with the cause of failure) will be available in the tile `additional_data`.

In [None]:
created_tile
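
Instead of re-running a cell by hand to check the status, you could poll for it. The sketch below is one possible approach, not part of `sentinelhub-py`: `wait_for_ingestion` is a hypothetical helper that takes any callable returning the tile dictionary, and the polling interval and timeout values are arbitrary assumptions.

```python
import time


def wait_for_ingestion(fetch_tile, poll_seconds=30, timeout_seconds=600):
    """Poll fetch_tile() until the tile is INGESTED or FAILED, or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        tile = fetch_tile()
        if tile["status"] in ("INGESTED", "FAILED"):
            return tile
        if time.monotonic() >= deadline:
            raise TimeoutError(f"tile still has status {tile['status']!r}")
        time.sleep(poll_seconds)


# Usage in this notebook (not executed here):
# tile = wait_for_ingestion(
#     lambda: byoc.get_tile(collection=created_collection["id"], tile=created_tile["id"])
# )
# print(tile["status"])
```

Passing a callable keeps the helper independent of the client object, which also makes it easy to try out with fake data.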

### Add multiple tiles to a single collection
A data collection can of course contain multiple tiles. Note that adding multiple tiles works only if these tiles have the same bands. Let's add more tiles from the [Sentinel-2 L2A 120m Mosaic](https://registry.opendata.aws/sentinel-s2-l2a-mosaic-120/) listed in the Registry of Open Data on AWS to the collection.

We first define a function to get a list of paths for each tile:

In [None]:
def list_objects_path(bucket, aws_access_key_id, aws_secret_access_key, y=None, m=None, d=None):
    tiles_path = []
    client = boto3.client('s3', aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
    result = client.list_objects(Bucket=bucket, Delimiter='/')
    for year in result.get('CommonPrefixes')[:y]:
        year_result = client.list_objects(Bucket=bucket, Delimiter='/', Prefix=year.get('Prefix'))
        for month in year_result.get('CommonPrefixes')[:m]:
            month_result = client.list_objects(Bucket=bucket, Delimiter='/', Prefix=month.get('Prefix'))
            for day in month_result.get('CommonPrefixes')[:d]:
                day_result = client.list_objects(Bucket=bucket, Delimiter='/', Prefix=day.get('Prefix'))
                for tile in day_result.get('CommonPrefixes'):
                    tiles_path.append(tile.get('Prefix'))
    return tiles_path

Next we obtain a list of paths for tiles available on `s3://sentinel-s2-l2a-mosaic-120/2019/1/1/`.

In [None]:
tiles_path = list_objects_path('sentinel-s2-l2a-mosaic-120', config.aws_access_key_id, config.aws_secret_access_key, y=1, m=1, d=1)

Then we can add tiles to the collection with a `for` loop.

In [None]:
for tile in tiles_path:
    byoc_tile = ByocTile(path=f'{tile}(BAND).tif', 
                         sensing_time=dt.datetime(int(tile.split('/')[0]), 
                                                  int(tile.split('/')[1]), 
                                                  int(tile.split('/')[2]), 
                                                  0, 
                                                  0, 
                                                  0)
                        )
    byoc.create_tile(created_collection, byoc_tile)

**Note:** The tile ingestion process can take some time. Please wait a few minutes after the cell has finished running before heading to the next step.
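
The loop above derives the sensing time from the first three components of each tile prefix. As a sketch, that step could be factored into a small function; `sensing_time_from_prefix` is a hypothetical name, and it assumes the `year/month/day/tile/` layout of the prefixes used in this notebook.

```python
import datetime as dt


def sensing_time_from_prefix(prefix):
    """Turn a 'YYYY/M/D/...' prefix into a datetime at midnight."""
    year, month, day = (int(part) for part in prefix.split("/")[:3])
    return dt.datetime(year, month, day)


print(sensing_time_from_prefix("2019/11/27/28V/"))
```

With this helper, the tile could be built as `ByocTile(path=f'{tile}(BAND).tif', sensing_time=sensing_time_from_prefix(tile))`.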

### Get tiles from your collection
After `byoc.create_tile` has been executed, we can request the tile from the collection into which it was ingested. To request one specific tile, pass the collection id and the tile id to the `get_tile` method.

In [None]:
tile = byoc.get_tile(collection=created_collection['id'], tile=created_tile['id'])

In [None]:
tile

You can of course retrieve all tiles into a list.

In [None]:
tiles = list(byoc.iter_tiles(created_collection))

In cases where you have a large collection with a lot of tiles and you would only like to load tile info for a few tiles, the following code using `next()` would be a good option for you:
```python
tile = next(byoc.iter_tiles(created_collection))
```

To convert it to a `ByocTile` dataclass, use the code below:
```python
tile = ByocTile.from_dict(next(byoc.iter_tiles(created_collection)))
```

Let's take a look at the keys of the first dictionary, which contains the info of the first tile, in the returned list.

In [None]:
list(tiles[0].keys())

To check whether any tiles failed to be ingested, run the code below:

In [None]:
tiles_failed_to_be_ingested = [x['path'] for x in tiles if x['status']=='FAILED']
tiles_failed_to_be_ingested
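
For a quick overview of all statuses at once, the tile dictionaries can be counted with the standard-library `collections.Counter`. This is a minimal sketch: `summarize_statuses` is a hypothetical helper, and the sample tiles below are made up, using only the `path` and `status` keys seen in this notebook.

```python
from collections import Counter


def summarize_statuses(tiles):
    """Count tiles per status and collect the paths of failed ones."""
    counts = Counter(t["status"] for t in tiles)
    failed_paths = [t["path"] for t in tiles if t["status"] == "FAILED"]
    return counts, failed_paths


# Made-up sample data standing in for list(byoc.iter_tiles(created_collection))
sample_tiles = [
    {"path": "2019/1/1/01C/(BAND).tif", "status": "INGESTED"},
    {"path": "2019/1/1/01D/(BAND).tif", "status": "WAITING"},
    {"path": "2019/1/1/01E/(BAND).tif", "status": "FAILED"},
]
counts, failed_paths = summarize_statuses(sample_tiles)
print(dict(counts))
print(failed_paths)
```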

### Visualize the tiles in your collection
Using `ByocTile` dataclass, which will properly parse tile geometries, date-time strings, etc., one can create a `geopandas.GeoDataFrame`.

**Note:** the geometries can be in different coordinate reference systems, so a transform to a common CRS might be needed.

In [None]:
tile_iterator = byoc.iter_tiles(created_collection)

In [None]:
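# Collect up to 100 tiles; stop early if the collection holds fewer
tiles_for_visualized = []
for tile_info in tile_iterator:
    if len(tiles_for_visualized) >= 100:
        break
    tiles_for_visualized.append(ByocTile.from_dict(tile_info))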

tiles_gdf = gpd.GeoDataFrame(tiles_for_visualized, geometry=[t.cover_geometry.transform(CRS.WGS84).geometry for t in tiles_for_visualized], crs='epsg:4326')
tiles_gdf.head()

In [None]:
fig, ax = plt.subplots(figsize=(17,8))
tiles_gdf.plot(ax=ax);

In the above example, the plotted tiles are 100 tiles taken from the [Sentinel-2 L2A 120m Mosaic](https://registry.opendata.aws/sentinel-s2-l2a-mosaic-120/), which contains 19869 tiles around the globe; this is why the tiles look so sparse in the image above.

### Updating and deleting a tile

Updating and deleting a tile follow the same logic as updating/deleting a collection:
- To update a tile:

In [None]:
tile_to_be_updated = byoc.get_tile(collection=created_collection['id'], tile=created_tile['id'])
tile_to_be_updated['sensingTime'] = '2021-06-29T18:02:34'
byoc.update_tile(created_collection, tile_to_be_updated)

After updating we can see that the `sensingTime` has been changed.

In [None]:
byoc.get_tile(collection=created_collection['id'], tile=created_tile['id'])['sensingTime']

- To delete a tile:

In [None]:
tile_to_be_deleted = byoc.get_tile(collection=created_collection['id'], tile=created_tile['id'])
byoc.delete_tile(created_collection, tile_to_be_deleted)

Now the tile is gone forever.

In [None]:
tiles = list(byoc.iter_tiles(created_collection))
[x for x in tiles if x['id']==created_tile['id']]

## Retrieve data from collection
Once we have a collection created and its tiles ingested, we can retrieve the data from said collection.
We will be using ProcessAPI for this.

In [None]:
data_collection = DataCollection.define_byoc(created_collection['id'])

Alternatively, using a dataclass:
```python
data_collection = my_collection_dataclass.to_data_collection()
```

In [None]:
tile_time = dt.datetime.fromisoformat(tiles[0]['sensingTime'].split("T")[0])

If using a dataclass, run:
```python
tile_time = tile_dataclass.sensing_time
```

In [None]:
tiles[0]['sensingTime']
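
The cell that builds `tile_time` keeps only the date part of the timestamp. If you also need the time of day, the full string can be parsed as sketched below. `parse_sensing_time` is a hypothetical helper; it assumes the value is an ISO 8601 string that may end in `Z` for UTC, which older Python versions of `datetime.fromisoformat` do not accept directly.

```python
import datetime as dt


def parse_sensing_time(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"  # spell out the UTC offset
    return dt.datetime.fromisoformat(value)


print(parse_sensing_time("2019-11-27T00:00:00Z"))
print(parse_sensing_time("2021-06-29T18:02:34"))
```

A string with a `Z` suffix yields a timezone-aware `datetime`, while one without an offset yields a naive one, so avoid mixing the two in comparisons.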

Below we're going to request a false color image of the Caspian Sea.

In [None]:
caspian_sea_bbox = BBox([49.9604, 44.7176, 51.0481, 45.2324], crs=CRS.WGS84)

In [None]:
false_color_evalscript = """
//VERSION=3
function setup() {
  return {
    input: ["B08","B04","B03", "dataMask"],
    output: { bands: 4 },
  };
}

var f = 2.5/10000;
function evaluatePixel(sample) {
  return [f*sample.B08, f*sample.B04, f*sample.B03, sample.dataMask];
}
"""

request = SentinelHubRequest(
        evalscript=false_color_evalscript,
        input_data=[
            SentinelHubRequest.input_data(
                data_collection=data_collection,
                time_interval=tile_time
            )
        ],
        responses=[
            SentinelHubRequest.output_response('default', MimeType.PNG)
        ],
        bbox=caspian_sea_bbox,
        size=bbox_to_dimensions(caspian_sea_bbox, 100),
        config=config
    )

In [None]:
data = request.get_data()[0]

In [None]:
fig, ax = plt.subplots(figsize=(15, 10))

ax.imshow(data)
ax.set_title(tile_time.date().isoformat(), fontsize=10)

plt.tight_layout()