<img align="right" src="../../additional_data/banner_siegel.png" style="width:1100px;">

# Advanced xarray

* [**Sign up to the JupyterHub**](https://www.phenocube.org/) to run this notebook interactively from your browser
* **Compatibility:** Notebook currently compatible with the Open Data Cube environments of the University of Wuerzburg
* **Products used**: `s2_l2a_bavaria`
* **Prerequisites**:  Users of this notebook should have a basic understanding of:
    * How to run a [Jupyter notebook](01_jupyter_introduction.ipynb)
    * The basic structure of the eo2cube [satellite datasets](02_eo2cube.ipynb)
    * How to browse through the available [products and measurements](03_products_and_measurements.ipynb) of the eo2cube datacube 
    * How to [load data from the eo2cube datacube](04_loading_data_and_basic_xarray.ipynb) 

## Background

The Python library `xarray` simplifies working with labelled multi-dimensional arrays. It adds labels in the form of dimensions, coordinates and attributes on top of `numpy` arrays, which makes handling remote sensing raster data in a Python environment easier and more effective. Understanding the structure of an `xarray` object is therefore essential. A first introduction to the usage of `xarray` within the eo2cube environment was given in ["04_loading_data_and_basic_xarray"](04_loading_data_and_basic_xarray.ipynb). This notebook builds on that knowledge and aims to give a deeper understanding of the `xarray` data structure for raster data. Since the `xarray.Dataset` within the datacube environment is specialised for remote sensing raster data, it differs slightly from the original `xarray` library. If you are interested in learning more about the basic structures of the original `xarray`, have a look at this [**"introduction to xarray" notebook**](intro_to_xarray.ipynb) within the "intro_to_python" directory.
To get more information about the `xarray` package, visit the [official documentation website](http://xarray.pydata.org/en/stable/).

## Description

This notebook introduces users to the `xarray` library within the datacube environment. It aims to deepen the understanding of the `xarray` structure as a container for remote sensing raster data. It also introduces useful `xarray` functions for working effectively with raster data in the eo2cube environment. The following topics are covered:

* Definition of the `xarray.Dataset` structure for eo2cube remote sensing data
* Access of `xarray.Dataset` dimensions, measurements and metadata
* Indexing and slicing of `xarray.Dataset`
* Application of built-in `xarray` functions for analyzing raster data

***

## Load packages

The `datacube` package is required to query the eo2cube datacube database and load the requested data. The `with_ui_cbk` function from `odc.ui` enables a progress bar when loading large amounts of data. The `xarray` and `numpy` packages are needed for the different methods and analysis steps within this notebook.

In [2]:
import datacube
from odc.ui import with_ui_cbk
import xarray as xr
import numpy as np

## Datacube connection and load data

First we connect to the datacube and load a dummy dataset from the eo2cube, using the `s2_l2a_bavaria` product. An area around Würzburg is loaded for the first week of April 2020. For more information about how to use the `dc.load()` function, check out [notebook 04](04_loading_data_and_basic_xarray.ipynb).

In [3]:
dc = datacube.Datacube(app='05_advanced_xarray', config='/home/datacube/.datacube.conf')

In [4]:
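# Load the blue, green and red bands for the first week of April 2020
data = dc.load(product="s2_l2a_bavaria",
               measurements=["blue", "green", "red"],
               x=(9.8506165, 11.273325),           # longitude range
               y=(49.7352601, 50.191334),          # latitude range
               time=("2020-04-01", "2020-04-07"),
               group_by="solar_day",               # merge scenes acquired on the same day
               progress_cbk=with_ui_cbk())         # show a progress bar while loading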

data


## `xarray.Dataset` structure

The variable `data` is an `xarray.Dataset`. An `xarray.Dataset` is essentially a dictionary-like data container that organises the raster dataset into "dimensions", "coordinates", "data variables" and "attributes".

The "dimensions" represent the absolute size of the data, i.e. the number of time steps (how many different scenes are available in the dataset) and the absolute number of pixels in the x and y (lon and lat) directions of the dataset.

The "coordinates" represent the actual values of the dimensions. These are stored as `xarray.DataArray` objects, which at their core are built on raw `numpy` arrays. To see a preview of the contained values, click the database symbol on the right of the `xarray.DataArray`. [The section below](#index_xarray) presents how to index and actually work with the values of the different `xarray.DataArray` objects of a dataset.
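To make the four building blocks tangible without a datacube connection, the following minimal sketch builds a small synthetic `xarray.Dataset` by hand. The variable name `toy`, the grid values and the `crs` attribute are assumptions chosen for illustration; they are not read from the eo2cube.

```python
import numpy as np
import xarray as xr

# Hypothetical toy grid: 2 time steps, 3 rows (y) and 4 columns (x).
times = np.array(["2020-04-01", "2020-04-06"], dtype="datetime64[ns]")
y = np.array([50.0, 49.9, 49.8])
x = np.array([9.8, 9.9, 10.0, 10.1])

rng = np.random.default_rng(seed=0)
shape = (times.size, y.size, x.size)

toy = xr.Dataset(
    # Data variables: one 3-D array per band, all sharing the same dimensions.
    data_vars={
        band: (("time", "y", "x"), rng.integers(0, 10000, size=shape, dtype=np.int16))
        for band in ("blue", "green", "red")
    },
    # Coordinates: the actual values along each dimension.
    coords={"time": times, "y": y, "x": x},
    # Attributes: free-form metadata (the value here is an assumption).
    attrs={"crs": "EPSG:4326"},
)

print(dict(toy.sizes))            # dimension names and their lengths
print(list(toy.coords))           # coordinate names
print(list(toy.data_vars))        # data variable (band) names
print(toy.attrs)                  # metadata dictionary
print(type(toy["red"].values))    # the underlying numpy array of one band
```

Each band in `toy` is an `xarray.DataArray`, and its `.values` attribute exposes the raw `numpy` array underneath.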


In [9]:
data.dims

Frozen(SortedKeysDict({'time': 4, 'y': 5284, 'x': 10310}))
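The length of an individual dimension can also be queried by name. The sketch below assumes a small, hypothetical stand-in for `data` (named `toy`) so that it runs without a datacube connection, and uses the `sizes` accessor, which maps dimension names to lengths.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for `data` with much smaller dimensions.
times = np.arange("2020-04-01", "2020-04-05", dtype="datetime64[D]").astype("datetime64[ns]")
toy = xr.Dataset(
    {"red": (("time", "y", "x"), np.zeros((4, 5, 6), dtype=np.int16))},
    coords={
        "time": times,
        "y": np.linspace(50.19, 49.74, 5),
        "x": np.linspace(9.85, 11.27, 6),
    },
)

n_time = toy.sizes["time"]                          # number of time steps
pixels_per_scene = toy.sizes["y"] * toy.sizes["x"]  # pixels in one time step

# Each dimension has as many coordinate values as its length.
coords_match = all(toy[dim].size == length for dim, length in toy.sizes.items())

print(n_time, pixels_per_scene, coords_match)
```

Applied to the dataset loaded above, the same arithmetic gives 5284 × 10310 = 54,478,040 pixels per time step.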

<a id='index_xarray'></a>
## Index `xarray`
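As a minimal sketch of what indexing can look like, the example below contrasts position-based selection (`isel`) with label-based selection (`sel`). It assumes a small synthetic dataset named `toy` rather than the eo2cube data, and all coordinate values are made up for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset: 3 time steps, 2 rows (y) and 4 columns (x).
times = np.array(["2020-04-01", "2020-04-03", "2020-04-06"], dtype="datetime64[ns]")
toy = xr.Dataset(
    {"red": (("time", "y", "x"), np.arange(24, dtype=np.int16).reshape(3, 2, 4))},
    coords={"time": times, "y": [50.0, 49.9], "x": [9.8, 9.9, 10.0, 10.1]},
)

# Position-based: take the first time step; the time dimension is dropped.
first_scene = toy.isel(time=0)

# Label-based: keep all time steps inside a date range (both ends inclusive).
later_scenes = toy.sel(time=slice("2020-04-02", "2020-04-06"))

# Label-based: keep a range of x coordinates (both ends inclusive).
window = toy.sel(x=slice(9.9, 10.0))

# Label-based with tolerance: pick the pixel closest to a given location.
nearest = toy["red"].sel(x=9.93, y=49.96, method="nearest")

print(dict(first_scene.sizes))
print(later_scenes["time"].values)
print(window["x"].values)
print(nearest.values)
```

`isel` is convenient when the position in the array is known, while `sel` is usually the safer choice for raster data because it refers to dates and map coordinates directly.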

## Recommended next steps

To continue with the beginner's guide, work through the following notebooks in this order:

1. [Jupyter Notebooks](01_jupyter_introduction.ipynb)
2. [eo2cube](02_eo2cube.ipynb)
3. [Products and Measurements](03_products_and_measurements.ipynb)
4. [Loading data and introduction to xarray](04_loading_data_and_basic_xarray.ipynb)
5. **Advanced xarray operations (this notebook)**
6. [Plotting data](06_plotting.ipynb)
7. [Basic analysis of remote sensing data](07_basic_analysis.ipynb)
8. [Parallel processing with Dask](08_parallel_processing_with_dask.ipynb)

***

## Additional information

<font size="2">This notebook, intended for use in the Open Data Cube entities of the [Department of Remote Sensing](http://remote-sensing.org/), [University of Wuerzburg](https://www.uni-wuerzburg.de/startseite/), is adapted from [Geoscience Australia](https://github.com/GeoscienceAustralia/dea-notebooks), published under the Apache License, Version 2.0. Thanks! </font>

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Australia data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.


**Contact:** If you would like to report an issue with this notebook, you can file one on [GitHub](https://github.com).

**Last modified:** February 2021