# <img align="right" src="../additional_data/banner.png" style="width:1100px;">

# Introduction to loading data

* [**Sign up to the JupyterHub**](https://www.phenocube.org/) to run this notebook interactively from your browser
* **Compatibility:** Notebook currently compatible with the Open Data Cube environments of the University of Wuerzburg
* **Products used**: s2_l2a_bavaria
* **Prerequisites**:  Users of this notebook should have a basic understanding of:
    * How to run a [Jupyter notebook](01_jupyter_introduction.ipynb)
    * The basic structure of the eo2cube [satellite datasets](02_eo2cube.ipynb)
    * How to browse through the available [products and measurements](03_products_and_measurements.ipynb) of the eo2cube datacube 

## Background

Loading data from the eo2cube datacube requires a query that specifies which data to load (what), for which area of interest (AOI) (where), and for which time period (when). Each query returns a multi-dimensional `xarray` object containing the matching data.
To work with data in the eo2cube environment, it is essential to understand the `xarray` data structures, because both the stored and the loaded data are organized around them. All further geospatial analysis builds on this `xarray` structure.

## Description

This notebook demonstrates how to load data from the eo2cube datacube with the `dc.load()` function. It also shows how to construct a query that selects data according to the desired parameters. Topics covered include:

* Loading data with the `dc.load()` function
* Interpretation of the resulting `xarray.Dataset` object
* Customising the `dc.load()` query
* Tips and tricks to simplify the data loading process

***

## Load packages

The `datacube` package is required to query the eo2cube datacube database and load the requested data. The `with_ui_cbk` function from `odc.ui` displays a progress bar when loading large amounts of data.

In [1]:
import datacube
from odc.ui import with_ui_cbk

## Connection to the datacube

The first step is usually to connect the script to the datacube. As described in the notebook on [products and measurements](03_products_and_measurements.ipynb), an app name needs to be defined; here it is *'04_loading_data_and_xarray'*. The path to the standard configuration file is passed as well.

In [2]:
dc = datacube.Datacube(app='04_loading_data_and_xarray', config='/home/datacube/.datacube.conf')

## Loading data with the `dc.load()` function

Loading data from the datacube uses the `dc.load()` function.

The function requires the following minimum arguments:

* product: The data product to load (to see all available eo2cube products, see the [Products and measurements notebook](03_products_and_measurements.ipynb)).
* x: The spatial region in the x dimension. By default, the x and y arguments accept coordinates in the geographic coordinate system WGS84 (EPSG:4326).
* y: The spatial region in the y dimension. The argument names longitude/latitude and x/y can be used interchangeably. It is also possible to use the extent of an imported shapefile as x/y (see the [notebook for working with vector data](XX_vectordata.ipynb)).
* time: The temporal extent. The time dimension can be specified using a tuple of datetime objects or strings in the “YYYY”, “YYYY-MM” or “YYYY-MM-DD” format.
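
These four arguments can also be collected in a dictionary first, which makes a query easy to reuse and to check before anything is loaded. The following is a minimal sketch rather than part of the original workflow: the `parse_time` helper is hypothetical and only checks the three string formats listed above; it does not reproduce how `datacube` itself interprets time values. The coordinates and dates are illustrative values for an area around Würzburg.

```python
from datetime import datetime

# The four minimum arguments, collected in one dictionary.
query = {
    "product": "s2_l2a_bavaria",
    "x": (9.8506165, 10.0538635),    # longitude range (min, max), EPSG:4326
    "y": (49.7352601, 49.8335172),   # latitude range (min, max), EPSG:4326
    "time": ("2020-04-01", "2020-04-30"),
}


def parse_time(value):
    """Hypothetical helper: parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' strings."""
    for fmt in ("%Y-%m-%d", "%Y-%m", "%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported time format: {value!r}")


# Simple sanity checks before sending the query to the datacube.
start, end = (parse_time(t) for t in query["time"])
if start > end:
    raise ValueError("time range starts after it ends")
for axis in ("x", "y"):
    low, high = query[axis]
    if low >= high:
        raise ValueError(f"{axis} range must be given as (min, max)")

print(f"query covers {start.date()} to {end.date()}")

# With an open datacube connection `dc`, the dictionary is unpacked into
# keyword arguments:
# data = dc.load(**query)
```

Keeping the query in a dictionary means the same area and time range can be reused for several `dc.load()` calls by changing only the entries that differ.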

This example loads all Sentinel-2 data from April 2020 for an area around Würzburg. The product is specified by its name, the spatial extent by lat/lon coordinates, and the time interval by two dates in the "YYYY-MM-DD" format.

In [9]:
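# Load all Sentinel-2 scenes from April 2020 for an area around Würzburg
data = dc.load(product="s2_l2a_bavaria",
               x=(9.8506165, 10.0538635),    # longitude range (EPSG:4326)
               y=(49.7352601, 49.8335172),   # latitude range (EPSG:4326)
               time=("2020-04-01", "2020-04-30"))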

print(data)

<xarray.Dataset>
Dimensions:          (time: 12, x: 1477, y: 1112)
Coordinates:
  * time             (time) datetime64[ns] 2020-04-01T10:26:57 ... 2020-04-29...
  * y                (y) float64 5.521e+06 5.521e+06 ... 5.51e+06 5.51e+06
  * x                (x) float64 5.612e+05 5.612e+05 ... 5.759e+05 5.759e+05
    spatial_ref      int32 25832
Data variables:
    coastal_aerosol  (time, y, x) int16 408 475 475 475 ... 12189 12189 12189
    blue             (time, y, x) int16 412 389 451 527 ... 10208 10352 10624
    green            (time, y, x) int16 760 695 703 829 ... 10120 10272 10160
    red              (time, y, x) int16 620 599 708 909 ... 10216 10256 10304
    red_edge1        (time, y, x) int16 1366 1356 1356 ... 10592 10796 10796
    red_edge2        (time, y, x) int16 3188 2814 2814 ... 10197 10532 10532
    red_edge3        (time, y, x) int16 3656 3138 3138 2742 ... 9920 10420 10420
    nir              (time, y, x) int16 3894 3363 3162 3064 ... 9952 9944 9984
    narrow_n

## Interpretation of the `xarray.Dataset`

The `dc.load()` function returns an `xarray.Dataset` that contains all data matching our basic query. This format stores the satellite data together with its coordinates in a single, labelled structure. To analyze eo2cube data, it is essential to understand the basic structure of an `xarray.Dataset`. Because `xarray` is so central, a separate section is devoted to it later in the notebook.
First, we will take a closer look at customizing the `dc.load()` query.

To go straight to the `xarray` section, [click here](#xarray_intro).
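
Some properties of the loaded data can already be read from the printed summary of the query result. The sketch below copies the dimension sizes and the first and last coordinate values from that printout and derives an approximate pixel size and the number of values per data variable. Note that the query was given in EPSG:4326, while `spatial_ref` is 25832 (ETRS89 / UTM zone 32N), so the `x` and `y` coordinates are in metres. Because the printout rounds the coordinates to four significant digits, the pixel sizes are rough estimates only.

```python
# Values copied from the printed Dataset summary. The coordinates are
# rounded in the printout, so the results below are approximate.
n_time, n_y, n_x = 12, 1112, 1477
x_first, x_last = 5.612e05, 5.759e05   # metres, EPSG:25832
y_first, y_last = 5.521e06, 5.51e06    # metres, EPSG:25832 (decreasing)

# Coordinates refer to pixel centres, so n pixels span n - 1 steps.
pixel_size_x = abs(x_last - x_first) / (n_x - 1)
pixel_size_y = abs(y_last - y_first) / (n_y - 1)

# Each data variable stores one int16 value (2 bytes) per time step and pixel.
values_per_variable = n_time * n_y * n_x
megabytes_per_variable = values_per_variable * 2 / 1e6

print(f"approximate pixel size: {pixel_size_x:.1f} m x {pixel_size_y:.1f} m")
print(f"values per variable: {values_per_variable:,}")
print(f"memory per variable: about {megabytes_per_variable:.1f} MB")
```

Both pixel sizes come out close to 10 m, and each variable holds about 19.7 million values (roughly 39 MB as `int16`), which gives a feeling for how quickly larger areas or longer time ranges add up.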

<a id='xarray_intro'></a>
XARRAY INTRO