In [39]:
import sys
sys.path.append("../../") 

# Loading Rainfall Data near Lake Chad  


<br>  

## Creating the datacube object 


The following code connects to the datacube, using `chad_rainfall` as the app name.

In [40]:
import datacube
dc = datacube.Datacube(app = "chad_rainfall", config = '/home/localuser/.datacube.conf') 

This object is the main interface to your stored and ingested data. It can handle complicated tasks such as reprojecting data with varying resolutions and orientations, and it can be used to explore existing datasets. In this notebook, it is only used to load data from the datacube.

<br>  

##  Loading GPM data

A small dataset is easier to work with than the entirety of Lake Chad. The region we're about to load contains GPM measurements for a small area of Lake Chad near the mouth of its largest contributing river. The code below displays the bounds of the region but doesn't load it. 

In [41]:
latitude_bounds = (12.75, 13.0)
longitude_bounds = (14.25, 14.5)

In [42]:
from utils.notebooks.display_map import display_map
display_map(latitude = latitude_bounds, longitude = longitude_bounds)

<br>  

## Specify Parameters for the Load  

In [43]:
## Define Geographic boundaries using a (min,max) tuple.
latitude_bounds = (12.75, 13.0)
longitude_bounds = (14.25, 14.5)

## Specify a date range using a (min,max) tuple  
from datetime import datetime
time = (datetime(2015,1,1), datetime(2016,1,2))

<br>  

It's simple to intuit what the **latitude**, **longitude**, and **time** bounds will get you: a bounded, gridded dataset containing our rainy season. Each square in the diagram below represents the smallest spatial unit in our imagery. This smallest unit is often referred to as a **pixel**.
  
<br>  

![img](./diagrams/rainy2.png)
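
As a rough illustration of how these bounds turn into a grid, the sketch below counts the pixels and time steps that fall inside them. It uses only the standard library and rests on two assumptions that are not stated in this notebook's text: pixels are 0.1 degrees wide with centres at 0.05, 0.15, ... degrees, and each daily acquisition is stamped just before midday (as the first timestamp in the dataset readout later in this notebook suggests). The helper names `pixel_centers` and `daily_stamps` are hypothetical and are not part of the datacube API.

```python
from datetime import datetime, timedelta


def pixel_centers(lower, upper, step=10, offset=5):
    """Return pixel centres (in degrees) that lie within [lower, upper].

    Works in integer hundredths of a degree to avoid floating-point drift.
    ASSUMPTION: centres sit at offset + k * step hundredths (0.05, 0.15, ...).
    """
    lo = round(lower * 100)
    hi = round(upper * 100)
    first = lo + (offset - lo) % step  # first centre at or above the lower bound
    return [c / 100 for c in range(first, hi + 1, step)]


# ASSUMPTION: one acquisition per day, stamped at 11:59:59.5.
MIDDAY_STAMP = timedelta(hours=11, minutes=59, seconds=59, milliseconds=500)


def daily_stamps(start, end, stamp=MIDDAY_STAMP):
    """Return one timestamp per day that falls within [start, end]."""
    day = datetime(start.year, start.month, start.day)
    stamps = []
    while day + stamp <= end:
        if day + stamp >= start:
            stamps.append(day + stamp)
        day += timedelta(days=1)
    return stamps


# Same bounds as the cell above, listed in ascending order here.
lat_centers = pixel_centers(12.75, 13.0)
lon_centers = pixel_centers(14.25, 14.5)
stamps = daily_stamps(datetime(2015, 1, 1), datetime(2016, 1, 2))

print(lat_centers)   # [12.75, 12.85, 12.95]
print(lon_centers)   # [14.25, 14.35, 14.45]
print(len(stamps))   # 366
```

Under those assumptions the bounds cover a 3 by 3 block of pixels and 366 daily time steps. The acquisition stamped on 2016-01-02 falls after the upper time bound (midnight on that date), so it is excluded.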


In [44]:
## Define the product name assigned when the data was indexed or ingested, and the platform that captured it.
product = 'gpm_imerg_gis_daily_global'
platform = 'GPM'  

While defining space and time bounds is simple to understand, it may be harder to pick up on what **product** and **platform** are. These names are defined when data is brought into the system.

Platform tells you how or where the data is produced. Product is a key used to choose which representation of that platform's data you wish to load.  

For the sake of this tutorial, think of **product** and **platform** as shorthand names we use to look up data that is:  

- produced on a **GPM** platform.
  
- represented using **gpm_imerg_gis_daily_global** settings or types.
   
The representation reflects preferences that define, for example, the resolution of each pixel, how pixels are sampled, and the geometric projections the grid of pixels undergoes to assume its current shape. Scripts for adding more general product types/representations are available [here](), but aren't necessary for understanding this stage of the tutorial.  

## Loading the data  

In [45]:
# Load precipitation data using the parameters defined above
gpm_data = dc.load(latitude = latitude_bounds, longitude = longitude_bounds, time = time, product = product, platform = platform)

The code above should have loaded an [xarray]() containing your GPM data.  An xarray data-structure is essentially a wrapper for high-dimensional data-arrays. One of its main uses is the coupling of different data with a shared set of coordinates.
  
Conceptually, you can imagine GPM's xarray looking like this:  

<br>

![img](diagrams/gpm_01.png)  
  
<br>

Each latitude-longitude coordinate pair will have `total_precipitation`, `liquid_precipitation`, `ice_precipitation`, and `percent_liquid` measurements taken for it.  

An xarray Dataset stores each of these measurements in a separate grid and shares a single set of coordinates among all of them.  
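
To make the shared-coordinate idea concrete, here is a minimal sketch that builds a toy Dataset by hand rather than loading it from the datacube. It assumes `numpy` and `xarray` are installed. The values are made up for illustration (two time steps, two of the four variables), and the `toy_*` names are hypothetical.

```python
import numpy as np
import xarray as xr

# Illustrative coordinates; only two time steps to keep the example small.
toy_time = np.array(["2015-01-01", "2015-01-02"], dtype="datetime64[ns]")
toy_lat = [12.95, 12.85, 12.75]
toy_lon = [14.25, 14.35, 14.45]

toy_dims = ("time", "latitude", "longitude")
toy_shape = (len(toy_time), len(toy_lat), len(toy_lon))

# Each variable is its own grid, but all of them reuse the same coordinates.
toy = xr.Dataset(
    data_vars={
        "total_precipitation": (toy_dims, np.zeros(toy_shape, dtype="int32")),
        "percent_liquid": (toy_dims, np.full(toy_shape, 15, dtype="uint8")),
    },
    coords={"time": toy_time, "latitude": toy_lat, "longitude": toy_lon},
)

# Selecting one pixel by coordinate value slices every variable at once,
# leaving a time series per variable.
toy_pixel = toy.sel(latitude=12.85, longitude=14.35)
print(toy_pixel)
```

Because the coordinates are shared, a single `sel` call returns the matching time series for both variables without indexing each grid separately.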

To get detailed information about an xarray, pass it to `print()` for a readout of its structure.  
<br>

In [46]:
print( gpm_data )

<xarray.Dataset>
Dimensions:               (latitude: 3, longitude: 3, time: 366)
Coordinates:
  * time                  (time) datetime64[ns] 2015-01-01T11:59:59.500000 ...
  * latitude              (latitude) float64 12.95 12.85 12.75
  * longitude             (longitude) float64 14.25 14.35 14.45
Data variables:
    total_precipitation   (time, latitude, longitude) int32 0 0 0 0 0 0 0 0 ...
    liquid_precipitation  (time, latitude, longitude) int32 0 0 0 0 0 0 0 0 ...
    ice_precipitation     (time, latitude, longitude) int32 0 0 0 0 0 0 0 0 ...
    percent_liquid        (time, latitude, longitude) uint8 15 15 15 15 15 ...
Attributes:
    crs:      EPSG:4326


<br>
Using the readout above, we can quickly summarize the dataset we just loaded by examining:  

- **Coordinates**  
a readout of individual coordinates. In this case, it's three lists of `latitude`, `longitude`, and `time` values.  

- **Dimensions**  
a readout of how large each dimension is. In this case, we've loaded a 3 by 3 grid of pixels with 366 daily time steps running from the start of 2015 to the start of 2016.  

- **Data Variables**  
a readout of what sort of data is stored. In this case, each `latitude`, `longitude`, and `time` point stores four types of data, one for each of the `total_precipitation`, `liquid_precipitation`, `ice_precipitation`, and `percent_liquid` variables.  

- **Data Size**  
Each entry has an associated size/type, e.g. `int32`, `uint8`, `float64`. You can use this to profile the memory footprint of your object.
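
As a worked example of that kind of profiling, the sketch below estimates the in-memory size of the dataset from nothing but the dimension sizes and dtypes in the readout above. It is plain Python arithmetic, not a datacube or xarray call, and it ignores any overhead from attributes or indexes. On a loaded object, xarray's `nbytes` property (for example `gpm_data.nbytes`) should report a comparable figure.

```python
# Dimension sizes and dtypes copied from the readout above.
dims = {"time": 366, "latitude": 3, "longitude": 3}
itemsize = {"int32": 4, "uint8": 1, "float64": 8, "datetime64[ns]": 8}  # bytes

data_vars = {
    "total_precipitation": "int32",
    "liquid_precipitation": "int32",
    "ice_precipitation": "int32",
    "percent_liquid": "uint8",
}
coords = {
    "time": "datetime64[ns]",
    "latitude": "float64",
    "longitude": "float64",
}

# Every data variable spans the full (time, latitude, longitude) grid.
cells = dims["time"] * dims["latitude"] * dims["longitude"]
data_bytes = sum(cells * itemsize[dtype] for dtype in data_vars.values())

# Each coordinate is a one-dimensional array along its own dimension.
coord_bytes = sum(dims[name] * itemsize[dtype] for name, dtype in coords.items())

total_bytes = data_bytes + coord_bytes
print(cells)        # 3294
print(data_bytes)   # 42822
print(coord_bytes)  # 2976
print(total_bytes)  # 45798
```

At roughly 45 KB this subset is tiny. The same arithmetic scales linearly with the number of pixels and time steps, which makes it a quick check before loading a larger region.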

