# Chapter 4a - Python Tools: xarray

This chapter, divided in two, covers two libraries that are essential to satellite data analysis and visualization: __xarray__ and __matplotlib__. In Chapter 4a we cover the basics of xarray with examples, and in Chapter 4b we make customized visualizations of data with matplotlib.

Although we show examples here, we invite you to edit the cells and rerun them to better grasp their use.
    
***
<img src='./figures/xarray_logo.png'>
## <font color=#d55121>xarray</font>   
    
__xarray__ is an open source `Python` library designed to work with (read, write, analyze, visualize, etc.) sets of labelled multi-dimensional arrays and the metadata common in the Earth sciences. Its main data structure, the __Dataset__, is built to mirror a netCDF file. Because it was built on top of the library <font color=#31909f>__Pandas__</font>, which processes labelled tabular data, it inherits several of its methods and functionality.

For this reason, when importing __xarray__, we will also import __numpy__ and __pandas__, so that all their methods are available. <font color=#d55121>__Test this:__</font> Run the next cell to import these libraries with their conventional nicknames, although feel free to choose your own. Note that when you run an import cell, no output is displayed other than a number in the [ ] on the left side of the cell.


In [None]:
import numpy as np
import pandas as pd
import xarray as xr

# this library helps to make your code execution less messy
import warnings
warnings.simplefilter('ignore') # filter some warning messages

### <font color=#184d68> Reading and exploring data sets</font>
    
<font color=#d55121>__Run the next cell:__</font>  Let's start by reading and exploring the content of a netCDF file stored locally. __It is so easy!__

Once the content is displayed, you can click on the file and disk icons on the left to get more details on each parameter.

Also note that the data <font color=#31909f>__variable__</font> (_sst_) has 3 <font color=#31909f>__dimensions__</font> (_latitude, longitude and time_), and that each dimension has a <font color=#31909f>__coordinate__</font> variable associated with it. The variables, as well as the file itself, carry metadata called attributes.

In [None]:
ds = xr.open_dataset('./data/HadISST_sst_2000-2020.nc') # read a netCDF file
ds.close() # close the file so it can be used by you or others; this is good practice
ds  # display the content of the dataset object
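
To make the vocabulary above concrete, here is a minimal sketch that builds a tiny dataset by hand, with no file needed. The name `toy`, the values, and the attribute strings are made up for illustration. Only the structure (a variable, its dimensions, its coordinates, and attributes) mirrors what the HadISST file displays.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Coordinates: one labelled axis per dimension (illustrative values)
time = pd.date_range("2000-01-01", periods=2, freq="MS")
latitude = [10.0, 0.0, -10.0]
longitude = [100.0, 110.0]

# A dataset with one data variable, 'sst', defined on (time, latitude, longitude)
toy = xr.Dataset(
    data_vars={
        "sst": (
            ("time", "latitude", "longitude"),
            np.full((2, 3, 2), 20.0),
            {"units": "degC"},  # attributes of the variable
        )
    },
    coords={"time": time, "latitude": latitude, "longitude": longitude},
    attrs={"title": "toy SST dataset"},  # attributes of the whole dataset
)

print(toy.sst.dims)            # names of the dimensions of 'sst'
print(toy.sst.shape)           # number of points along each dimension
print(list(toy.coords))        # coordinate variables
print(toy.sst.attrs["units"])  # variable-level metadata
print(toy.attrs["title"])      # file-level metadata
```

The same accessors (`.dims`, `.coords`, `.attrs`) work on `ds` and `ds.sst` from the cell above.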

__xarray__ can also read data online. We are going to learn how to read data from the cloud in the application chapters, but for now we will illustrate the capability of xarray and Python to read from an online file.  <font color=#d55121>__Run the next cell.__</font>  

In [None]:
# assign a string variable with the url of the datafile
url = 'https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/GDS2/L4/GLOB/CMC/CMC0.2deg/v2/2011/305/20111101120000-CMC-L4_GHRSST-SSTfnd-CMC0.2deg-GLOB-v02.0-fv02.0.nc'
ds_sst = xr.open_dataset(url) # opens it the same way as a local file
ds_sst

### <font color=#184d68> Visualizing data</font>
    
A picture is worth a thousand _attributes_. Sometimes what we need is a quick visualization of our data, and __xarray__ is there to help. The next cells show a quick visualization of each of the two open datasets. __Yes! It is that easy!__ (We'll get more sophisticated in Chapter 4b.)

In [None]:
ds_sst.analysed_sst.plot() # note that we need to choose one of the variables to be displayed

In [None]:
ds.sst[0,:,:].plot() # in addition to choosing the variable, we choose a time index to visualize the spatial data at that time

### <font color=#184d68>Some basic methods of dataset</font>
   
__xarray__ also lets you operate on the dataset in simple ways. Many operations are built as methods of the dataset class, and can be accessed by adding a . after the dataset. <font color=#d55121>__Test this:__</font> In the next cell, we use the mean method to make a time series of sea surface temperature over the entire globe, and plot it. __In one line!__

In [None]:
ds.sst.mean(dim=['latitude','longitude']).plot() # in this line we select a variable, average over the spatial dimensions, and plot the result
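
A reduction such as `.mean(dim=...)` removes the dimensions it averages over and keeps the rest. The sketch below shows this on synthetic data (the array `toy_sst` and its values are invented for illustration). It also shows, as an optional extension that the cell above does not use, xarray's `.weighted()` with cosine-of-latitude weights, which is one common way to account for grid cells shrinking toward the poles on a regular latitude-longitude grid.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic field: 10 at +-60 degrees, 30 at the equator, plus 1 degree per month
time = pd.date_range("2000-01-01", periods=3, freq="MS")
latitude = np.array([60.0, 0.0, -60.0])
longitude = np.array([0.0, 180.0])
by_latitude = np.array([10.0, 30.0, 10.0])
values = (
    by_latitude[np.newaxis, :, np.newaxis]
    + np.arange(3.0)[:, np.newaxis, np.newaxis]
    + np.zeros((1, 1, 2))
)
toy_sst = xr.DataArray(
    values,
    dims=("time", "latitude", "longitude"),
    coords={"time": time, "latitude": latitude, "longitude": longitude},
    name="sst",
)

# Plain average over the spatial dimensions: only 'time' remains
simple_mean = toy_sst.mean(dim=["latitude", "longitude"])
print(simple_mean.dims)

# Area-weighted average (assumption: a regular grid, weights ~ cos(latitude))
weights = np.cos(np.deg2rad(toy_sst.latitude))
weighted_mean = toy_sst.weighted(weights).mean(dim=["latitude", "longitude"])

print(simple_mean.values)
print(weighted_mean.values)
```

In this toy case the weighted average is higher than the plain one, because the warm equatorial row counts for more than the two high-latitude rows.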

### <font color=#184d68>Selecting data</font>

Sometimes we want to visualize or operate only on a portion of the data. In the next cells we illustrate the method __.sel__, which selects along a dimension, in this case specified as a _slice_ of the coordinate data.

In [None]:
ds.sst.sel(time=slice('2012-01-01','2013-12-31')).mean(dim=['time']).plot() # here we select a period of time

In [None]:
ds.sst.sel(latitude=slice(50,-50)).mean(dim=['time']).plot() # here we select a range of latitudes. Note that we need to go from 50 to -50 because the coordinate data goes from 90 to -90
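
The direction of the slice matters, as the comment above points out. Here is a minimal sketch on synthetic data (the array `toy_sst` is invented for illustration) showing a time slice, a latitude slice in the right and wrong order, and `method="nearest"`, an option of `.sel` for picking the closest grid point to a value that is not exactly on the grid.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of monthly data on latitudes running from 90 down to -90
time = pd.date_range("2012-01-01", periods=24, freq="MS")
latitude = np.arange(90.0, -91.0, -10.0)
toy_sst = xr.DataArray(
    np.zeros((time.size, latitude.size)),
    dims=("time", "latitude"),
    coords={"time": time, "latitude": latitude},
    name="sst",
)

# Select one year with date strings
year_2013 = toy_sst.sel(time=slice("2013-01-01", "2013-12-31"))
print(year_2013.sizes["time"])

# The slice must follow the order of the coordinate (here, descending)
mid_latitudes = toy_sst.sel(latitude=slice(50, -50))
wrong_order = toy_sst.sel(latitude=slice(-50, 50))
print(mid_latitudes.sizes["latitude"])  # latitudes 50 ... -50
print(wrong_order.sizes["latitude"])    # empty selection

# Pick the grid point closest to 42 degrees
nearest = toy_sst.sel(latitude=42.0, method="nearest")
print(float(nearest.latitude))
```

An empty plot or an empty array after `.sel` is often a sign that the slice runs against the direction of the coordinate.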

Another useful way to select data is the method __.where__, which, instead of selecting by a coordinate, selects by a condition on the data or the coordinates. <font color=#d55121>__Test this:__</font> In the next cell we extract the _ocean mask_ contained in the NASA surface temperature data.

In [None]:
ds_sst.analysed_sst.where(ds_sst.mask==1).plot() # In this line we use .where to keep the points where the variable 'mask' equals 1,
# apply that selection to the variable 'analysed_sst', and plot the result. Try changing the value for mask - for example 2 is land, 8 is ice.
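
Unlike `.sel`, `.where` keeps the shape of the array and fills the points that fail the condition with NaN. The sketch below illustrates this on a tiny synthetic grid. The arrays `toy_temp` and `toy_mask` and their values are made up, with the mask codes borrowed from the comment above (1 ocean, 2 land, 8 ice).

```python
import numpy as np
import xarray as xr

coords = {"latitude": [10.0, 0.0], "longitude": [0.0, 1.0, 2.0]}
toy_temp = xr.DataArray(
    np.array([[280.0, 285.0, 290.0], [275.0, 282.0, 288.0]]),
    dims=("latitude", "longitude"),
    coords=coords,
    name="temperature",
)
toy_mask = xr.DataArray(
    np.array([[1, 2, 1], [1, 1, 8]]),
    dims=("latitude", "longitude"),
    coords=coords,
    name="mask",
)

# Keep ocean points only: same shape, NaN where the mask is not 1
ocean_only = toy_temp.where(toy_mask == 1)
print(ocean_only.shape)
print(int(ocean_only.isnull().sum()))  # number of masked-out points

# The condition can also be on the data itself
warm_only = toy_temp.where(toy_temp > 281.0)

# drop=True trims rows and columns where the condition is never met
ice_only = toy_temp.where(toy_mask == 8, drop=True)
print(ice_only.shape)
```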

### <font color=#184d68>Operating over and between datasets or data arrays</font>
    
There are two main ways to operate: over a single variable, or between variables. We have already seen how to operate over a single variable using a method. In the next example we compare two years of temperature. For that, we average over one year, and then subtract another annual average. That simple!

In [None]:
# comparing 2015 and 2012 sea surface temperatures
(ds.sst.sel(time=slice('2015-01-01','2015-12-31')).mean(dim=['time'])
-ds.sst.sel(time=slice('2012-01-01','2012-12-31')).mean(dim=['time'])).plot() # note that the expression can be split over two lines,
# which makes it easier to read
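
Arithmetic between two data arrays is matched by coordinate labels, not by position. The following sketch, on synthetic data (all names and values here are invented for illustration), repeats the annual-difference pattern and then shows what happens when the two operands only partly overlap: by default only the shared labels are kept.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Four years of monthly data where the value is simply (year - 2012)
time = pd.date_range("2012-01-01", "2015-12-01", freq="MS")
values = (time.year.values - 2012).astype(float)[:, np.newaxis, np.newaxis] + np.zeros((1, 2, 2))
toy_sst = xr.DataArray(
    values,
    dims=("time", "latitude", "longitude"),
    coords={"time": time, "latitude": [10.0, -10.0], "longitude": [0.0, 180.0]},
    name="sst",
)

# Same pattern as the cell above: annual mean minus annual mean
mean_2015 = toy_sst.sel(time=slice("2015-01-01", "2015-12-31")).mean(dim="time")
mean_2012 = toy_sst.sel(time=slice("2012-01-01", "2012-12-31")).mean(dim="time")
difference = mean_2015 - mean_2012
print(difference.dims)    # the time dimension is gone
print(difference.values)  # 3 everywhere in this toy case

# Operands with partly overlapping coordinates: only shared labels survive
a = xr.DataArray([10.0, 20.0, 30.0], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [1, 2, 3]})
partial = a - b
print(partial.x.values)
print(partial.values)
```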

We will cover more examples of methods and operations over datasets in the following chapters. If you want to learn more, which we recommend given the many awesome capabilities of xarray, please look at the __Resources__ section below. 

***

### <font color=#184d68>Saving your datasets</font>
There is one more thing you should learn here. You may want to save one of the intermediate steps done here and in the following chapters, so there is no need to obtain or reprocess the data in the future. <font color=#d55121>__The next cell__</font> shows you how in two simple steps:

- Assign the outcome of an operation to a variable, which will be a new xarray object (here a data array)
- Save it to a new netcdf file

In [None]:
# same operation as before, minus the plotting method
my_ds = (ds.sst.sel(time=slice('2015-01-01','2015-12-31')).mean(dim=['time'])-ds.sst.sel(time=slice('2012-01-01','2012-12-31')).mean(dim=['time']))
# save the new object `my_ds` to a file in the directory data
my_ds.to_netcdf('./data/Global_SST_2015-2012.nc')
# explore the content of `my_ds`. Note that the time dimension does not exist anymore
my_ds
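
To check that a saved file holds what you expect, you can read it back. The sketch below does a round trip with a small synthetic array written to a temporary directory, so it does not touch the `data` folder. The array `toy_diff`, its values, and the file name are invented for illustration, and the code assumes a netCDF backend (such as netCDF4, h5netcdf, or scipy) is installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# A small 2-D field standing in for the difference map saved above
toy_diff = xr.DataArray(
    np.array([[0.5, -0.2], [1.0, 0.0]]),
    dims=("latitude", "longitude"),
    coords={"latitude": [10.0, -10.0], "longitude": [0.0, 180.0]},
    name="sst_difference",
    attrs={"units": "degC"},
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_difference.nc")
    toy_diff.to_netcdf(path)            # write to disk
    reloaded = xr.load_dataarray(path)  # read into memory and close the file

print(reloaded.equals(toy_diff))  # compares values, dimensions, and coordinates
print(reloaded.name, reloaded.attrs)
```

`xr.load_dataarray` is used here instead of `xr.open_dataarray` so that the data is fully in memory before the temporary file is deleted.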

*** 

## <font color=#d55121>Resources</font> 

The __xarray__ official site: [http://xarray.pydata.org/en/stable/](http://xarray.pydata.org/en/stable/)

Great intro to __xarray__ capabilities: [https://www.youtube.com/watch?v=Dgr_d8iEWk4&t=908s](https://www.youtube.com/watch?v=Dgr_d8iEWk4&t=908s)

If you really want to dig deep: [https://www.youtube.com/watch?v=ww4EYv20Ucw](https://www.youtube.com/watch?v=ww4EYv20Ucw)

A step-by-step guide to __xarray__ handling of netCDF files, and many of the methods seen here, like .sel and .where: [https://rabernat.github.io/research_computing_2018/xarray.html](https://rabernat.github.io/research_computing_2018/xarray.html)

Sometimes, the best way to learn how to do something is to go directly to the reference page for a function or method. There you can see what arguments, types of data, and outputs to expect. Most of the time, these pages have useful examples.

__.where()__: [http://xarray.pydata.org/en/stable/generated/xarray.DataArray.where.html](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.where.html)

__.sel()__: [http://xarray.pydata.org/en/stable/generated/xarray.DataArray.sel.html](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.sel.html)

__.mean()__: [http://xarray.pydata.org/en/stable/generated/xarray.DataArray.mean.html](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.mean.html)
