# SCANEO


As the final step towards creating our training dataset, we need to make the data AI-ready. Many tasks can be performed at this stage, such as:

- **Data cleaning**: remove corrupted images, remove images with too much cloud cover, etc.
- **Feature engineering**: calculate vegetation indices, calculate statistics, etc.
- **Data analysis**: plot time series, plot histograms, etc.
- **Labelling**: create labels for the images, etc.
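
As an illustration of the feature engineering item above, here is a minimal sketch of a vegetation index (NDVI) computed with NumPy. The `ndvi` helper and the tiny `red` and `nir` arrays are hypothetical and only serve the example; for Sentinel-2 they would typically come from bands B04 (red) and B08 (near infrared), read with whatever raster library you prefer.

```python
import numpy as np


def ndvi(red, nir):
    """Return NDVI = (NIR - Red) / (NIR + Red), with NaN where the sum is zero."""
    red = np.asarray(red, dtype="float32")
    nir = np.asarray(nir, dtype="float32")
    denom = nir + red
    # Pre-fill with NaN so pixels with a zero denominator stay undefined.
    out = np.full(denom.shape, np.nan, dtype="float32")
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Hypothetical 2x2 reflectance values, not real Sentinel-2 data.
red = np.array([[0.1, 0.2], [0.0, 0.3]])
nir = np.array([[0.5, 0.2], [0.0, 0.6]])
index = ndvi(red, nir)
print(index)
```

Leaving no-data pixels as NaN, rather than zero, keeps them from being mistaken for bare soil in later statistics.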

For each one, feel free to use your favourite tools. Here we are going to demonstrate labelling using [SCANEO](https://github.com/earthpulse/scaneo).

SCANEO is a labelling web application for tagging satellite images (e.g., to identify the objects present or the terrain types) quickly and easily. This step is essential: satellite data must be labelled before neural networks can be trained on it, and SCANEO also enables active learning.

Before running the web interface, we need to make sure the `scaneo` package is installed on our machine and, if it is not, install it.


In [None]:
# !pip install scaneo

You can list the options that `scaneo` accepts with:


In [None]:
!scaneo --help

You can run `scaneo` by opening a terminal and running:

```
scaneo
```

You can then access the web interface at `http://localhost:8000`.

> You can change the host and port with `scaneo --host 0.0.0.0 --port 8000`.


![scaneo](./images/scaneo2.png)


Your annotations will be stored alongside the images as GeoJSON files containing segmentation masks as multipolygons, bounding boxes for detection tasks, or classification labels.


In [None]:
!ls data/sentinel_2/*.geojson
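
To get a quick feel for what the annotation files contain, the sketch below counts features per geometry type in a GeoJSON `FeatureCollection` using only the standard library. The `summarize_annotations` helper and the example collection (including the `label` property) are hypothetical; the exact properties SCANEO writes are not shown here, so inspect one of your own files before relying on specific field names.

```python
import json
from collections import Counter


def summarize_annotations(geojson_text):
    """Count features per geometry type in a GeoJSON FeatureCollection."""
    collection = json.loads(geojson_text)
    counts = Counter()
    for feature in collection.get("features", []):
        # A feature may carry a null geometry (e.g. an image-level label).
        geometry = feature.get("geometry") or {}
        counts[geometry.get("type", "None")] += 1
    return dict(counts)


# Hypothetical annotations, written inline so the example is self-contained.
example = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"label": "water"},  # assumed property name
            "geometry": {
                "type": "MultiPolygon",
                "coordinates": [[[[0, 0], [1, 0], [1, 1], [0, 0]]]],
            },
        },
        {
            "type": "Feature",
            "properties": {"label": "building"},  # assumed property name
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 2]]],
            },
        },
    ],
})

summary = summarize_annotations(example)
print(summary)
```

For a real file, pass the result of `open(path).read()` for one of the `.geojson` paths listed above.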

Once your data is ready, you can ingest it into EOTDL, as we saw in the previous notebook, and start working with it like any other dataset in the repository.


In [None]:
text = """---
name: Boadella-LPS25
authors: 
  - Juan B. Pedro
license: free
source: https://github.com/earthpulse/eotdl/blob/develop/tutorials/workshops/lps25/04_creating.ipynb
---

# Boadella-LPS25

This is a toy dataset created during the LPS25 workshop.
"""

with open("data/sentinel_2/README.md", "w") as outfile:
    outfile.write(text)
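
Before ingesting, it can help to confirm that the metadata block at the top of the README is well-formed. The following is a minimal sketch under stated assumptions: `read_front_matter` is a hypothetical helper, it only handles top-level `key: value` lines between the `---` delimiters (it is not a YAML parser), and it does not reproduce whatever validation EOTDL itself performs.

```python
def read_front_matter(markdown_text):
    """Return the top-level 'key: value' pairs found between the '---' lines."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README does not start with a '---' metadata block")

    # Find the line that closes the metadata block.
    end = None
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            end = i
            break
    if end is None:
        raise ValueError("metadata block is not closed with '---'")

    fields = {}
    for line in lines[1:end]:
        # Skip nested list items and anything that is not a key/value pair.
        if line.startswith((" ", "\t", "-")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


sample = """---
name: Boadella-LPS25
authors:
  - Juan B. Pedro
license: free
---

# Boadella-LPS25
"""

fields = read_front_matter(sample)
print(fields)
```

List values such as `authors` come back as empty strings here; use a proper YAML library if you need them parsed.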

In [None]:
from eotdl.datasets import ingest_dataset

ingest_dataset("data/sentinel_2")

If you add more images or labels to the dataset, you can re-upload it and a new version will be generated automatically.


## Learn more with our use cases


There is much more on EOTDL and SCANEO for creating and labelling datasets as well as training models in the [EOTDL use cases](https://github.com/earthpulse/eotdl/tree/main/tutorials/usecases) section.


## Discussion and Contribution opportunities


Feel free to ask questions now (live or through Discord) and make suggestions for future improvements.

- What features concerning data exploration would you like to see?
- What other features concerning data download would you like to see?
- What features and tools concerning data preparation would you like to see?
- What does your typical workflow look like?
- Do you already use any labelling tool?
- What does your ideal labelling tool look like?
