# Introduction to `eo-learn`

Welcome to a workshop on `eo-learn`, an Earth observation (EO) processing framework in Python. It was designed to fill the gap between EO data and machine learning. Its mission is to connect existing software and build on top of it. It scales easily over large geographical areas and huge amounts of data.

About a year after its first public release, it is still being actively developed.

Here are some useful links:
- [GitHub repository](https://github.com/sentinel-hub/eo-learn)
- [Documentation page](https://eo-learn.readthedocs.io/en/latest/)
- [Blog posts and papers](https://github.com/sentinel-hub/eo-learn#blog-posts-and-papers)

Good luck!


## Installation

Please first read the official [installation instructions](https://eo-learn.readthedocs.io/en/latest/install.html). Additional recommendations:

- At the moment, the recommended way is to use a virtual environment (`venv`, e.g. `python3.6 -m venv workshop_venv`) or `pipenv`. Installing with `conda` might prove problematic.

- On Linux, it is recommended to first install the system packages listed in the [CI build instructions](https://github.com/sentinel-hub/eo-learn/blob/master/.travis.yml#L12).

Let us know if you need any help.


## `eo-learn` building blocks

### `EOPatch`

The most basic object in the package is a data container, called `EOPatch`. 

- It is designed to store all types of EO data for a single geographical location.
- Typically it is defined by a bounding box and coordinate reference system.
- There is no limit on how much data a single `EOPatch` can store, but in practice it should not exceed the size of your RAM.
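
To get a feeling for the RAM constraint, here is a rough back-of-the-envelope sketch. It assumes a dense array with a fixed number of bytes per value. The helper `raster_size_bytes` and the example dimensions are hypothetical and not part of `eo-learn`.

```python
from math import prod


def raster_size_bytes(shape, bytes_per_value):
    """Return the in-memory size of a dense raster with the given shape."""
    return prod(shape) * bytes_per_value


# Hypothetical time series: 100 acquisitions, 1000 x 1000 pixels, 13 bands,
# stored with 4 bytes per value (e.g. 32-bit floats).
shape = (100, 1000, 1000, 13)
size = raster_size_bytes(shape, bytes_per_value=4)

print(f"{size / 1024 ** 3:.2f} GiB")
```

A single feature of this size already takes several gigabytes, so in practice the size of the area covered by one `EOPatch` is chosen with the available memory in mind.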

EO data can be divided into categories, called "feature types", according to the following properties:

| `FeatureType` | Type of data | Time component |  Spatial component | Type of values | Python object | Shape |
| --- | --- | --- | --- | --- | --- | --- |
| DATA | raster | <span style="color:green">yes</span> | <span style="color:green">yes</span> | float | `numpy.ndarray` | `t x n x m x d` |
| MASK | raster | <span style="color:green">yes</span> | <span style="color:green">yes</span> | integer | `numpy.ndarray` | `t x n x m x d` |
| SCALAR | raster | <span style="color:green">yes</span> | <span style="color:red">no</span> | float | `numpy.ndarray` | `t x d` |
| LABEL | raster | <span style="color:green">yes</span> | <span style="color:red">no</span> | integer | `numpy.ndarray` | `t x d` |
| DATA_TIMELESS | raster | <span style="color:red">no</span> | <span style="color:green">yes</span> | float | `numpy.ndarray` | `n x m x d` |
| MASK_TIMELESS | raster | <span style="color:red">no</span> | <span style="color:green">yes</span> | integer | `numpy.ndarray` | `n x m x d` |
| SCALAR_TIMELESS | raster | <span style="color:red">no</span> | <span style="color:red">no</span> | float | `numpy.ndarray` | `d` |
| LABEL_TIMELESS | raster | <span style="color:red">no</span> | <span style="color:red">no</span> | integer | `numpy.ndarray` | `d` |
| VECTOR | vector | <span style="color:green">yes</span> | <span style="color:green">yes</span> | / | `geopandas.GeoDataFrame` | Required columns `geometry` and `TIMESTAMP` |
| VECTOR_TIMELESS | vector | <span style="color:red">no</span> | <span style="color:green">yes</span> | / | `geopandas.GeoDataFrame` | Required column `geometry` |
| META_INFO | anything | <span style="color:red">no</span> | <span style="color:red">no</span> | anything | anything | anything |
| TIMESTAMP | timestamps | <span style="color:green">yes</span> | <span style="color:red">no</span> | datetime | `list(datetime.datetime)` | `t` |
| BBOX | bounding box and CRS | <span style="color:red">no</span> | <span style="color:green">yes</span> | coordinates | `sentinelhub.BBox` | / |

Note: `t` specifies time component, `n` and `m` are spatial components (height and width), and `d` is an additional component for data with multiple channels.
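
The shape column of the table can be read as a simple rule on the number of array dimensions. The following sketch encodes that rule for the raster feature types. The names `EXPECTED_NDIM` and `shape_matches` are hypothetical helpers for illustration, not `eo-learn` API, and the sketch only checks dimensionality, not the type of values.

```python
# Expected number of array dimensions per raster feature type, taken from the table above.
EXPECTED_NDIM = {
    "DATA": 4, "MASK": 4,                       # t x n x m x d
    "SCALAR": 2, "LABEL": 2,                    # t x d
    "DATA_TIMELESS": 3, "MASK_TIMELESS": 3,     # n x m x d
    "SCALAR_TIMELESS": 1, "LABEL_TIMELESS": 1,  # d
}


def shape_matches(feature_type, shape):
    """Return True if `shape` has the dimensionality expected for `feature_type`."""
    return len(shape) == EXPECTED_NDIM[feature_type]


print(shape_matches("DATA", (5, 100, 100, 13)))      # True
print(shape_matches("MASK_TIMELESS", (10, 10, 13)))  # True
print(shape_matches("MASK", (10, 10, 13)))           # False, the time dimension is missing
```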

Let's start by loading an existing `EOPatch` and displaying its content (i.e. features):

In [None]:
import os
from eolearn.core import EOPatch

INPUT_FOLDER = os.path.join('.', '..', 'example_data')
INPUT_EOPATCH = os.path.join(INPUT_FOLDER, 'TutorialEOPatch')

eopatch = EOPatch.load(INPUT_EOPATCH)

# There is no need to load all data into memory right away
# eopatch = EOPatch.load(INPUT_EOPATCH, lazy_loading=True)

eopatch

Note: `LULC` stands for [Land use / land cover](https://en.wikipedia.org/wiki/Land_cover).

There are multiple ways to access a feature in an `EOPatch`:

In [None]:
from eolearn.core import FeatureType

# bands = eopatch.get_feature(FeatureType.DATA, 'BANDS-S2-L1C')
# bands = eopatch.data['BANDS-S2-L1C']

# The most common ones are
bands = eopatch[FeatureType.DATA]['BANDS-S2-L1C']
# or
bands = eopatch[(FeatureType.DATA, 'BANDS-S2-L1C')]

type(bands), bands.shape

Vector features are handled by `geopandas`:

In [None]:
eopatch[FeatureType.VECTOR]['CLOUD_MASK_VECTOR']

Special features are the bounding box and timestamps:

In [None]:
print(eopatch.timestamp)
print(repr(eopatch.bbox))

eopatch.bbox

Let's create a new `EOPatch` and store some features inside:

In [None]:
import numpy as np

new_eopatch = EOPatch()

new_eopatch[FeatureType.DATA]['BANDS'] = eopatch[FeatureType.DATA]['BANDS-S2-L1C']

new_eopatch[FeatureType.MASK_TIMELESS]['NEW_MASK'] = np.zeros((10, 10, 13), dtype=np.uint8)

# The following wouldn't work, as there are restrictions on what kind of data can be stored in each feature type
# new_eopatch[FeatureType.MASK]['NEW_MASK'] = np.zeros((10, 10, 13), dtype=np.uint8)
# new_eopatch[FeatureType.DATA_TIMELESS]['NEW_MASK'] = np.zeros((10, 10, 13), dtype=np.uint8)
# new_eopatch[FeatureType.VECTOR]['NEW_MASK'] = np.zeros((10, 10, 13), dtype=np.uint8)

new_eopatch

It is also possible to delete a feature:

In [None]:
del new_eopatch[FeatureType.DATA]['BANDS']

new_eopatch

We can save an `EOPatch` into a local folder. In case an `EOPatch` already exists at the specified location, we also give permission to overwrite its features.

In [None]:
from eolearn.core import OverwritePermission

OUTPUT_FOLDER = os.path.join('.', '..', 'outputs')
if not os.path.isdir(OUTPUT_FOLDER):
    os.mkdir(OUTPUT_FOLDER)
    
new_eopatch.save(os.path.join(OUTPUT_FOLDER, 'TestEOPatch'),
                 overwrite_permission=OverwritePermission.OVERWRITE_FEATURES)

#### Plotting

`EOPatch` plotting functionality only works if `eo-learn-visualization[FULL]` has been installed successfully:

In [None]:
#!pip install eo-learn-visualization[FULL]

In [None]:
import geoviews as gv

eopatch.plot((FeatureType.DATA, 'BANDS-S2-L1C')) * gv.tile_sources.EsriImagery

### `EOTask`

The next core object is `EOTask`, which is a single well-defined operation on one or more `EOPatch` objects.

We can create a new EOTask by creating a class that inherits from the abstract `EOTask` class:

```Python
from eolearn.core import EOTask


class FooTask(EOTask):

    def __init__(self, foo_param):
        """Task-specific parameters"""
        self.foo_param = foo_param

    def execute(self, eopatch, *, patch_specific_param):
        # Do what foo does on EOPatch and return it
        return eopatch
```

- In the initialization method, we define task-specific parameters.
- Each task has to implement the `execute` method.
- The `execute` method has to be defined in such a way that:
    * Positional arguments are instances of `EOPatch`.
    * Keyword arguments are anything else (i.e. `EOPatch`-specific parameters).
- Otherwise, the task itself can do anything.
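
The split between positional and keyword arguments can be illustrated without `eo-learn` at all. The sketch below uses a simplified stand-in base class `Task` and a plain dictionary in place of an `EOPatch`. Both, as well as the `ScaleValues` task, are assumptions made for illustration and do not reproduce the library's implementation.

```python
class Task:
    """Simplified stand-in for a task base class (not eo-learn's implementation)."""

    def __call__(self, *patches, **params):
        # Calling a task forwards to execute: patches positionally, parameters by keyword.
        return self.execute(*patches, **params)

    def execute(self, *patches, **params):
        raise NotImplementedError


class ScaleValues(Task):
    """Hypothetical task that multiplies the values stored under one key."""

    def __init__(self, key):
        self.key = key  # task-specific parameter, fixed when the task is created

    def execute(self, patch, *, factor):
        # `factor` is keyword-only, so it can differ for every patch.
        patch[self.key] = [value * factor for value in patch[self.key]]
        return patch


patch = {"bands": [1, 2, 3]}  # a plain dict standing in for an EOPatch
task = ScaleValues("bands")

print(task(patch, factor=10))  # {'bands': [10, 20, 30]}
```

Because `factor` is declared after `*`, passing it positionally (`task(patch, 10)`) raises a `TypeError`. This keeps patch-specific parameters clearly separated from the patches themselves.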

Example of a task that adds a new feature to an existing `EOPatch`:

In [None]:
from eolearn.core import EOTask


class AddFeature(EOTask):
    """Adds a feature to the given EOPatch.

    :param feature: Feature to be added
    :type feature: (FeatureType, feature_name) or FeatureType
    """
    def __init__(self, feature):
        self.feature = feature

    def execute(self, eopatch, *, data):
        """Returns the EOPatch with added features.

        :param eopatch: input EOPatch
        :type eopatch: EOPatch
        :param data: data to be added to the feature
        :type data: object
        :return: input EOPatch with the specified feature
        :rtype: EOPatch
        """
        eopatch[self.feature] = data

        return eopatch

Let's see how such a task could be used:

In [None]:
eopatch = EOPatch()

add_feature_task = AddFeature((FeatureType.DATA, 'NEW_BANDS'))

data = np.zeros((5, 100, 100, 13))

eopatch = add_feature_task.execute(eopatch, data=data)
# or shorter (note that `data` is keyword-only):
# eopatch = add_feature_task(eopatch, data=data)

eopatch

*Problem 1*:
Create a new task called `RenameFeature` which changes the name of a feature in an `EOPatch`.

In [None]:
# Solution is at the end of the notebook

The majority of `eo-learn` consists of different EOTasks implementing different operations on EO data.

The list of all EOTasks is available [here](https://eo-learn.readthedocs.io/en/latest/eotasks.html).

## EOWorkflow

EOTasks can be connected together into acyclic processing graphs. Class `EOWorkflow` implements such functionality.
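
To see why the graph has to be acyclic, consider how an execution order can be derived from task dependencies. The sketch below uses the standard library's `graphlib` module (Python 3.9+) and illustrative task names. It is not how `EOWorkflow` is implemented, only an illustration of the underlying idea.

```python
from graphlib import CycleError, TopologicalSorter

# Each task name maps to the tasks whose output it needs (names are illustrative).
dependencies = {
    "load": [],
    "add_feature": ["load"],
    "save": ["add_feature"],
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['load', 'add_feature', 'save']

# A cycle has no valid execution order, which is why the graph must be acyclic.
cyclic = {"a": ["b"], "b": ["a"]}
try:
    list(TopologicalSorter(cyclic).static_order())
except CycleError:
    print("cycle detected")
```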

Here is a simple example of an `EOWorkflow`.

Create a workflow:

In [None]:
from eolearn.core import EOWorkflow
from eolearn.core import LoadFromDisk, SaveToDisk

new_feature = FeatureType.LABEL, 'NEW_LABEL'

load_task = LoadFromDisk(folder=INPUT_FOLDER)
add_feature_task = AddFeature(new_feature)
save_task = SaveToDisk(folder=OUTPUT_FOLDER, overwrite_permission=OverwritePermission.OVERWRITE_FEATURES)

# EOWorkflow is initialized by defining a graph of tasks
workflow = EOWorkflow([
    (load_task, [], 'Load EOPatch'),
    (add_feature_task, [load_task], 'Add a new feature'),
    (save_task, [add_feature_task], 'Save EOPatch')
])

# EOWorkflow is executed by specifying EOPatch related parameters
result = workflow.execute({
    load_task: {'eopatch_folder': 'TutorialEOPatch'},
    add_feature_task: {'data': np.zeros((10, 3), dtype=np.uint8)},
    save_task: {'eopatch_folder': 'WorkflowEOPatch'}
})

result

Display the dependency graph:

In [None]:
%matplotlib inline

workflow.dependency_graph()

# If you specify a filename, the image will be saved
# workflow.dependency_graph(os.path.join(OUTPUT_FOLDER, 'graph.png'))

For a linear workflow such as the one above, we can also use the `LinearWorkflow` class:

In [None]:
from eolearn.core import LinearWorkflow

workflow = LinearWorkflow(load_task, add_feature_task, save_task)

result = workflow.execute({
    load_task: {'eopatch_folder': 'TutorialEOPatch'},
    add_feature_task: {'data': np.zeros((10, 3), dtype=np.uint8)},
    save_task: {'eopatch_folder': 'OutputEOPatch2'}
})
                                    
workflow.dependency_graph()
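
Conceptually, a linear workflow is a dependency list in which every task depends only on the task before it. The hypothetical helper below, which is not part of `eo-learn`, builds such a list from a sequence of tasks. Strings stand in for task objects.

```python
def linear_dependencies(*tasks):
    """Turn a sequence of tasks into (task, [dependencies]) pairs forming a chain."""
    pairs = []
    previous = None
    for task in tasks:
        # The first task has no dependencies; every other one depends on its predecessor.
        pairs.append((task, [] if previous is None else [previous]))
        previous = task
    return pairs


print(linear_dependencies("load", "add_feature", "save"))
# [('load', []), ('add_feature', ['load']), ('save', ['add_feature'])]
```

The result has the same structure as the dependency list passed to `EOWorkflow` earlier, minus the optional task descriptions.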

## EOExecutor

`EOExecutor` handles execution and monitoring of EOWorkflows. It enables executing a workflow multiple times and in parallel. It monitors execution times and handles any errors that might occur in the process. At the end, it generates a report which contains a summary of the workflow and of the execution process.
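
The general pattern (run the same callable for many argument sets, in parallel, while recording durations and errors) can be sketched with the standard library alone. The example below is an illustration under simplifying assumptions, not `EOExecutor`'s implementation. It uses threads for brevity, and `run_once`, `run_all`, and `toy_workflow` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_once(workflow, args):
    """Run `workflow` with one set of arguments; record the duration and any error."""
    start = time.perf_counter()
    try:
        result, error = workflow(**args), None
    except Exception as exc:  # keep going so that one failure does not stop the batch
        result, error = None, exc
    return {"args": args, "result": result, "error": error,
            "seconds": time.perf_counter() - start}


def run_all(workflow, execution_args, workers=3):
    """Run the workflow for every argument set, using at most `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda args: run_once(workflow, args), execution_args))


def toy_workflow(idx):
    """Hypothetical workflow that fails for one input."""
    if idx == 3:
        raise ValueError("bad input")
    return idx * 2


report = run_all(toy_workflow, [{"idx": idx} for idx in range(5)])
for entry in report:
    status = "ok" if entry["error"] is None else "failed"
    print(entry["args"], status)
```

Catching the exception inside `run_once` is the design choice that matters here: a failing execution becomes an entry in the report instead of aborting the remaining ones.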

Execute the previously defined workflow with different arguments:

In [None]:
from eolearn.core import EOExecutor

execution_args = [  # EOWorkflow will be executed for each of these 5 dictionaries:
    {
        load_task: {'eopatch_folder': 'TutorialEOPatch'},
        add_feature_task: {'data': idx * np.ones((10, 3), dtype=np.uint8)},
        save_task: {'eopatch_folder': 'ResultEOPatch{}'.format(idx)}
    } for idx in range(5)
]

executor = EOExecutor(workflow, execution_args, save_logs=True, logs_folder=OUTPUT_FOLDER)

executor.run(workers=3)  # The execution will use at most 3 parallel processes

Make the report:

In [None]:
%matplotlib

executor.make_report()

print('Report was saved to location: {}'.format(executor.get_report_filename()))

## Solutions

*Solution 1*

In [None]:
class RenameFeature(EOTask):
    """Renames a feature in a given EOPatch
    """
    def __init__(self, feature, new_feature):
        self.feature = feature
        self.new_feature = new_feature

    def execute(self, eopatch):
        
        eopatch[self.new_feature] = eopatch[self.feature]
        del eopatch[self.feature]

        return eopatch