# EuroSAT training set @JEODPP

- This notebook demonstrates how to access the **EuroSAT** collection stored on the JEODPP EOS storage system.
- More specifically, it shows different ways of retrieving the **EuroSAT** images (input images and masks containing the class labels), which can be used as training data.

**For more information:** 

- @GitLab: https://jeodpp.jrc.ec.europa.eu/apps/gitlab/jeodpp-services/training-sets-for-earth-observation-applications/-/wikis/home
- @Connected: https://connected.cnect.cec.eu.int/groups/bigdataeoss 
- @Internet: https://jeodpp.jrc.ec.europa.eu/home/

**Contacts:**  jrc-jeodpp@ec.europa.eu

**Source data:** https://github.com/phelber/eurosat

For details about EuroSAT, please refer to the papers:

- _EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification. Patrick Helber, Benjamin Bischke, Andreas Dengel, Damian Borth. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019._
- _Introducing EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification. Patrick Helber, Benjamin Bischke, Andreas Dengel. 2018 IEEE International Geoscience and Remote Sensing Symposium, 2018._

<img src="https://cidportal.jrc.ec.europa.eu/services/shared/html/JRClogo2.png" width="200" height="200" /> <img src="https://cidportal.jrc.ec.europa.eu/services/shared/html/JRCBigDataPlatform_512.png" width="200" height="200" /> 

In [None]:
import numpy as np
import os, fnmatch, urllib.request
import pandas as pd
import json 
import matplotlib.pyplot as plt

In [None]:
# Download the scripts Query.py and gdalRead.py into the working directory before running this cell
from Query import Query
from gdalRead import gdalRead

In [None]:
%%HTML
<style type="text/css">
table.dataframe td, table.dataframe th {
    border: 1px  black solid !important;
  color: black !important;
}
</style>

In [None]:
mainfolder = 'https://jeodpp.jrc.ec.europa.eu/ftp/public/MachineLearning/SatImNet'
collection = 'EuroSAT'

# Get info for the dataset

In [None]:
# Read general info for all the datasets
# Build the URL with '/' explicitly: os.path.join would insert backslashes on Windows
df = pd.read_json(f'{mainfolder}/Table.json')
cols = list(df.columns)
cols.remove('Feature')
df = df[['Feature']+cols]
df

In [None]:
# Read specific info for EuroSAT
pd.set_option('display.max_colwidth', 200)
# Index the table by feature name (skipped if this cell is re-run and the index is already set)
if 'Feature' in df.columns:
    df = df.set_index('Feature')
df[[collection]]

# Read the structure of the EuroSAT dataset

In [None]:
with urllib.request.urlopen(f'{mainfolder}/{collection}/content_public.json') as f:
    content = json.loads(f.read().decode())

# Get class notation

In [None]:
classes = content['classes']
classes

# Search for images according to some criteria
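To make the idea of criteria-based search concrete, here is a minimal sketch of filtering a nested tree by attribute values. It is not the implementation in `Query.py`, and the node layout (`children`, `path`, and the sample attributes) is an assumption made for illustration; the real structure of `content['tree']` may differ.

```python
def toy_query(node, criteria, output=None):
    """Collect nodes of a nested dict tree whose attributes satisfy all criteria.

    Hypothetical stand-in for illustration only; not the notebook's Query.
    """
    def matches(n):
        for key, wanted in criteria.items():
            # A criterion may be a single value or a list of accepted values
            allowed = wanted if isinstance(wanted, list) else [wanted]
            if n.get(key) not in allowed:
                return False
        return True

    hits = []
    stack = [node]
    while stack:
        current = stack.pop()
        if matches(current):
            # With output='path', keep only the path instead of the full node
            hits.append(current.get('path') if output == 'path' else current)
        # Reverse so that children are visited in their original order
        stack.extend(reversed(current.get('children', [])))
    return hits


# Assumed toy tree: one folder holding three files
toy_tree = {
    'type': 'folder', 'path': 'root', 'children': [
        {'type': 'file', 'genre': 'jpg', 'class': 'PermanentCrop', 'path': 'root/a.jpg'},
        {'type': 'file', 'genre': 'tif', 'class': 'PermanentCrop', 'path': 'root/a.tif'},
        {'type': 'file', 'genre': 'jpg', 'class': 'Forest', 'path': 'root/b.jpg'},
    ],
}
toy_paths = toy_query(toy_tree, {'genre': 'jpg', 'class': ['PermanentCrop']}, 'path')
print(toy_paths)
```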

In [None]:
# Use the string 'path' as 3rd argument in case you would like to retrieve the file paths only.
# Search for jpg files and class: 'PermanentCrop'
query = Query(content['tree'], 
               {'genre': 'jpg', 'class': ['PermanentCrop']}, 'path')
query

In [None]:
# Use the string 'path' as 3rd argument in case you would like to retrieve the file paths only.
# Search for files having specific number of bands
query = Query(content['tree'], 
               {'type': 'file', 'metainfo_numofbands': 13})
query

# Read the content of an image file
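The file path read in this section chains two GDAL virtual file systems: `/vsicurl/` reads the remote file over HTTP, and `/vsizip/` opens a member inside that ZIP archive. As a minimal sketch, a small helper could assemble such paths; the name `vsi_zip_url` is hypothetical and is not part of `gdalRead.py`.

```python
def vsi_zip_url(archive_url: str, member: str) -> str:
    """Build a GDAL path to `member` inside the remote ZIP archive at `archive_url`."""
    # /vsicurl/ lets GDAL read the remote archive over HTTP(S);
    # /vsizip/ then exposes the members of that archive as ordinary files.
    return f"/vsizip//vsicurl/{archive_url}/{member.lstrip('/')}"


archive = ("https://jeodpp.jrc.ec.europa.eu/ftp/public/MachineLearning/"
           "SatImNet/EuroSAT/rgb/images/PermanentCrop.zip")
infile_example = vsi_zip_url(archive, "PermanentCrop_1005.jpg")
print(infile_example)
```

The resulting string can be passed to `gdalRead` in the same way as the hard-coded path in the cell that follows.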

In [None]:
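# Read an RGB image of class 'PermanentCrop' directly from the remote zip archive
# (/vsicurl/ fetches the archive over HTTP, /vsizip/ opens the file inside it)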
infile = '/vsizip//vsicurl/https://jeodpp.jrc.ec.europa.eu/ftp/public/MachineLearning/SatImNet/EuroSAT/rgb/images/PermanentCrop.zip/PermanentCrop_1005.jpg'
InfoMask, Mask = gdalRead(infile)
InfoMask

In [None]:
# Display the image
fig, axarr = plt.subplots(1, 1, figsize=(6, 6))
axarr.axis('off')
axarr.imshow(Mask)
plt.tight_layout(h_pad=0.1, w_pad=0.1)
plt.show()

# Image show
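The image show in this section titles each panel with the class name, which it derives from the name of the enclosing ZIP archive. The same rule can turn retrieved paths into training labels. The sketch below assumes that paths follow the `.../<Class>.zip/<file>` pattern; the helper names, the sample file names, and the integer encoding are illustrative assumptions, and the notebook's `classes` object may define its own notation.

```python
import os


def label_from_path(path: str) -> str:
    """Return the class name encoded in the archive name of a path."""
    # '.../Industrial.zip/Industrial_1.jpg' -> 'Industrial.zip' -> 'Industrial'
    return os.path.basename(os.path.dirname(path)).replace('.zip', '')


def encode_labels(paths, class_names):
    """Map each path to the integer position of its class in `class_names`."""
    index = {name: i for i, name in enumerate(class_names)}
    return [index[label_from_path(p)] for p in paths]


# Illustrative paths; the file names are assumptions
base = ("/vsizip//vsicurl/https://jeodpp.jrc.ec.europa.eu/ftp/public/"
        "MachineLearning/SatImNet/EuroSAT/rgb/images")
sample_paths = [f"{base}/Industrial.zip/Industrial_1.jpg",
                f"{base}/Residential.zip/Residential_1.jpg"]
sample_labels = encode_labels(sample_paths, ['Industrial', 'Residential'])
print(sample_labels)
```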

In [None]:
# Use the string 'path' as 3rd argument in case you would like to retrieve the file paths only.
query1 = Query(content['tree'], 
               {'genre': 'jpg', 'class': ['Industrial']}, 'path')
query2 = Query(content['tree'], 
               {'genre': 'jpg', 'class': ['Residential']}, 'path')

In [None]:
from IPython.display import display, clear_output

f, axarr = plt.subplots(2, 5)
f.set_size_inches(16, 8)
queries = (query1, query2)
n = min(len(query1), len(query2))
val = ''
# Show 5 images per class at a time; an incomplete last batch is skipped
for idx in range(0, n - n % 5, 5):
    for q, paths in enumerate(queries):
        for p in range(5):
            path = paths[idx + p]
            _, I = gdalRead(path)
            # The class name is the name of the enclosing zip archive
            axarr[q, p].set_title(os.path.basename(os.path.dirname(path)).replace('.zip', ''))
            axarr[q, p].axis('off')
            axarr[q, p].imshow(I)
    display(f)
    # Enter: next batch, 'p': play the remaining batches without pausing, 'x': stop
    if val != 'p':
        val = input("Press Enter to continue ('p' to play through, 'x' to exit)...")
        if val == 'x':
            clear_output(wait=True)
            break
    clear_output(wait=True)

> **When reading many images through the _/vsicurl/_ virtual file system, call _gdal.VSICurlClearCache()_ after every _gdalRead_ call.**
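
The advice above can be sketched as a small loop that clears the cache after each read. The helper name `read_many` is hypothetical. The reader and the cache-clearing function are passed in as arguments so that the sketch runs without GDAL or network access; the stand-ins at the bottom only demonstrate the call order.

```python
def read_many(paths, read_fn, clear_cache_fn):
    """Read every path with `read_fn`, calling `clear_cache_fn` after each read."""
    images = []
    for path in paths:
        # In this notebook, gdalRead returns a pair (info, image array)
        _, image = read_fn(path)
        clear_cache_fn()
        images.append(image)
    return images


# Stand-ins used only for demonstration; they do not touch GDAL or the network
cache_clears = []

def fake_read(path):
    return {'path': path}, path.upper()

demo_images = read_many(['a.jpg', 'b.jpg'], fake_read,
                        lambda: cache_clears.append('cleared'))
print(demo_images, len(cache_clears))
```

In the notebook, after `from osgeo import gdal`, the call would presumably look like `read_many(query1, gdalRead, gdal.VSICurlClearCache)`.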