# <font color='green' size='8'> Data Analysis & Manipulation with Xarray </font>
Shanice Bailey

## Overview
> Xarray is a NumFOCUS-sponsored project with an active open-source development community around the world. It is a Python package for efficiently analyzing and manipulating labelled, multi-dimensional data. Xarray works with both small and large datasets and is applicable in many scientific fields, such as the natural sciences, neuroscience, astrophysics, and biomedical engineering. This curriculum introduces the fundamentals of Xarray so that you can apply it to your own data. It borrows heavily from Xarray's documentation, which can be found [here](http://xarray.pydata.org/en/stable/).

## Lesson: Xarray Fundamentals
> <font color='Blue' size='4'> 1. DataArray/Dataset Creation <br>
2. Basic Indexing and Interpolation <br>
3. Basic Computation <br>
4. Basic Plotting 
    </font>

## Prerequisites
> 1. Basic Python (familiarity with pandas and NumPy) <br>

## <font color='blue'><font size='6'> <center> 1. DataArray / Dataset Creation </center> </font></font>

## <font size='6'> Goals:
> * Create xarray `DataArrays` and `Datasets` out of raw numpy arrays
> * Create xarray objects with and without indexes
> * Load xarray datasets from netCDF files and OPeNDAP servers
> * View and set attributes    
</font>

`DataArrays` and `Datasets` in Xarray are analogous to `Series` and `DataFrame` in Pandas. DataArrays are labelled, multi-dimensional arrays with the following key properties:
- `values`: a numpy.ndarray holding the array’s values
- `dims`: dimension names for each axis (e.g., `('x', 'y', 'z')`)
- `coords`: a dict-like container of arrays (_coordinates_) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings)
- `attrs` : dict to hold arbitrary metadata (_attributes_)

Dimensions provide names that xarray uses instead of the axis argument found in many numpy functions. Coordinates enable fast label-based indexing and alignment, building on the functionality of the index found on a pandas DataFrame or Series. DataArray objects also can have a name and can hold arbitrary metadata in the form of their attrs property.
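As a minimal sketch of how dimension names stand in for numpy's `axis` argument, the example below builds a small array of made-up values and reduces it by name. The names `temps`, `station`, and `day` are illustrative assumptions and are not part of the lesson data.

```python
import numpy as np
import xarray as xr

# Made-up values: 2 stations x 3 days (illustrative only)
raw = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

temps = xr.DataArray(
    raw,
    dims=["station", "day"],
    coords={"station": ["a", "b"], "day": [1, 2, 3]},
    name="temperature",
    attrs={"units": "degC"},
)

# numpy reduces by axis position ...
mean_np = raw.mean(axis=1)

# ... while xarray reduces by dimension name
mean_xr = temps.mean(dim="day")

# Label-based selection uses the coordinate values, not integer positions
station_b = temps.sel(station="b")
```

Reducing over `dim="day"` leaves a one-dimensional result along `station`, so the code does not need to know which axis position `day` occupies.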

## Create a DataArray

In [3]:
import xarray as xr
xr.set_options(display_style='text')

<xarray.core.options.set_options at 0x7f0cfa7c9c50>

In [4]:
da = xr.DataArray([9,0,2,1,0], dims=['x'], coords={'x':[10,20,30,40,50]})
da

> <font size='4'>**Explore the properties of your DataArray by typing `da.values`, `da.data`, `da.dims`, `da.coords` and `da.indexes`.**</font>
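One way to explore these properties, and to see what changes when no coordinates are supplied, is sketched below. The variable `da_noindex` is a hypothetical name introduced only for this comparison.

```python
import xarray as xr

# Same toy array as above, with a coordinate (and hence an index) on 'x'
da = xr.DataArray([9, 0, 2, 1, 0], dims=["x"],
                  coords={"x": [10, 20, 30, 40, 50]})

print(da.values)   # the underlying numpy array
print(da.dims)     # ('x',)
print(da.coords)   # coordinate variables
print(da.indexes)  # pandas indexes built from the coordinates

# Without coords, the dimension still has a name but no index
da_noindex = xr.DataArray([9, 0, 2, 1, 0], dims=["x"])

# Positional selection works either way
first = da_noindex.isel(x=0)

# Label-based selection relies on the coordinate values
by_label = da.sel(x=30)
```

Note that these properties are accessed as attributes, without parentheses.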

<font size='4'>Now let's create a DataArray with real data. Download the npz Argo float file using this link: http://www.ldeo.columbia.edu/~rpa/argo_float_4901412.npz. </font>
```
!wget http://www.ldeo.columbia.edu/~rpa/argo_float_4901412.npz
```

In [7]:
import numpy as np

In [10]:
#Load in argo data
argo_data = np.load('argo_float_4901412.npz')

#Read the data keys
list(argo_data.keys())

['S', 'T', 'levels', 'lon', 'date', 'P', 'lat']

In [13]:
# Create arrays with the keys' values
S = argo_data.f.S
T = argo_data.f.T
P = argo_data.f.P
level = argo_data.f.levels
lon = argo_data.f.lon
lat = argo_data.f.lat
date = argo_data.f.date

#print the shape of the vars
print(S.shape, lon.shape, date.shape)

(78, 75) (75,) (75,)
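If the download is not available, the same loading pattern can be tried on a synthetic file. The sketch below writes a small `.npz` archive with made-up arrays (the file name `toy_float.npz` and its contents are assumptions, not the real Argo data) and reads it back both by key and through the `.f` accessor used above.

```python
import os
import tempfile
import numpy as np

# Synthetic stand-in for the Argo file (hypothetical name and values)
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "toy_float.npz")
np.savez(path, S=np.ones((4, 3)), levels=np.arange(4), date=np.arange(3))

toy = np.load(path)

print(toy.files)  # names of the stored arrays

# Two equivalent ways to pull out one array
S_by_key = toy["S"]
S_by_attr = toy.f.S
```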


In [14]:
#create data array for salinity variable that has 2 dimensions 
#(we can name dims whatever we want, but best practice is to label dims with practical names)
da_salinity = xr.DataArray(S, dims=['level', 'date'], coords={'level':level, 'date':date})

da_salinity

In [16]:
#set some attributes, like units
da_salinity.attrs['units'] = 'PSU'
da_salinity.attrs['standard_name'] = 'sea_water_salinity'
da_salinity

## Create a Dataset
We will now create a Dataset from the Argo float data.

In [17]:
ds = xr.Dataset(data_vars={'salinity': (('level', 'date'), S), 
                           'temperature': (('level', 'date'), T),
                           'pressure': (('level', 'date'), P)},
                coords={'level': level, 'date': date})

# Each variable is given as (dims, data): here S, T and P all use the dimensions ('level', 'date').
# Variables in a Dataset may have different dims, but any dimension they share
# must have the same length and the same coordinate values.

ds

> <font size='3'> **You can look at individual variables (var) and explore their properties using the `ds.<var>` or `ds['<var>']` syntax.** </font>
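The two access styles can be compared on a small synthetic Dataset, sketched below. The values, the name `toy_ds`, and the 1-D `lon` variable are made up for illustration and only mimic the layout of the Argo arrays.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the Argo arrays (made-up shapes and values)
level = np.arange(4)
date = np.array(["2012-01-01", "2012-01-11", "2012-01-21"],
                dtype="datetime64[ns]")
S = np.full((4, 3), 35.0)
lon = np.array([-40.0, -39.5, -39.0])

toy_ds = xr.Dataset(
    data_vars={
        "salinity": (("level", "date"), S),
        "lon": (("date",), lon),  # a 1-D variable sharing only 'date'
    },
    coords={"level": level, "date": date},
)

# Attribute-style and dictionary-style access return the same DataArray
sal_attr = toy_ds.salinity
sal_item = toy_ds["salinity"]

# Each variable is a DataArray with its own dims and attrs
toy_ds["salinity"].attrs["units"] = "PSU"
```

Dictionary-style access is the safer habit when a variable name could clash with an existing Dataset attribute or method.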

## <font color='blue'><font size='6'> <center> 2. Basic Indexing and Interpolation </center> </font></font>

## <font color='blue'><font size='6'> <center> 3. Basic Computation </center> </font></font>

## <font color='blue'><font size='6'> <center> 4. Basic Plotting </center> </font></font>

## <font color='blue'> 