# Exercise 1: Introduction to xarray


## Aim: Learn what xarray is and how to create and inspect a `DataArray`.

### Issues Covered:
- Importing `xarray`
- Loading a dataset using `xr.open_dataset()`
- Creating a `DataArray`
- Indexing, using `.loc[]`, `.isel()` and `.sel()`

## 1. Introduction to multidimensional arrays

- Unlabelled N-dimensional arrays of numbers are the most widely used data structure in scientific computing.
- These arrays carry no meaningful metadata, so users must remember for themselves what each axis and index position represents.

<img src="../images/multidimensional_array.png" width="800"/>
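
To make the second point concrete, here is a minimal sketch using made-up values (the array shape and the dimension names are assumptions chosen for illustration). With a bare NumPy array you have to remember which axis is which; with a labelled `DataArray` you can name the dimension instead.

```python
import numpy as np
import xarray as xr

# Hypothetical data: 2 time steps x 3 latitudes x 4 longitudes
values = np.arange(24, dtype=float).reshape(2, 3, 4)

# Bare NumPy: you must remember that axis 0 is time
numpy_mean = values.mean(axis=0)

# xarray: the same reduction refers to the dimension by name
labelled = xr.DataArray(values, dims=("time", "lat", "lon"))
xarray_mean = labelled.mean(dim="time")

print(numpy_mean.shape)  # (3, 4)
print(xarray_mean.dims)  # ('lat', 'lon')
```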

Can you think of any reasons why xarray might be preferred to pandas when working with multi-dimensional data such as climate model output?
(Hint: how many dimensions does a pandas dataframe have?)

In [67]:
# xarray is designed to handle data with any number of dimensions;
# pandas is designed for 1D (Series) and 2D (DataFrame) structures.
# xarray lets you work with labelled dimensions and coordinates;
# pandas can only represent additional dimensions through a `MultiIndex`.
# xarray is built on the netCDF data model and understands CF conventions;
# pandas doesn't natively support netCDF or CF conventions.
# xarray supports metadata (attributes) attached to datasets and variables;
# pandas metadata support is minimal.
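
A small sketch of the dimensionality point, using a toy array (all values and coordinate labels below are invented for illustration). A three-dimensional `DataArray` keeps `time`, `lat` and `lon` as separate named dimensions, whereas converting it to pandas with `to_series()` flattens it into a single column indexed by a `MultiIndex`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 3D array: 2 times x 3 latitudes x 4 longitudes
da = xr.DataArray(
    np.zeros((2, 3, 4)),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2065-01-01", periods=2),
        "lat": [-10.0, 0.0, 10.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    name="tas",
)

# pandas has no third axis, so the three dimensions become index levels
series = da.to_series()

print(da.dims)                      # ('time', 'lat', 'lon')
print(type(series.index).__name__)  # MultiIndex
print(len(series))                  # 24
```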

## 2. xarray architecture

1. Open the `'../data/tas_rcp45_2055_mon_avg_change.nc'` dataset and load it into an xarray `Dataset` called `ds`.
(Hint: Don't forget to import any packages you need)

In [6]:
import xarray as xr
ds = xr.open_dataset('../data/tas_rcp45_2055_mon_avg_change.nc')

2. Display the dataset and look at its contents (dimensions, coordinates, data variables and attributes).

In [7]:
ds

3. What are the dimensions and variables in this dataset? What does each represent? 

In [35]:
# The dimensions are `time`, `lat` and `lon`. 
# The primary variable is `tas` representing the temperature anomaly across these dimensions.
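
The same information can be read programmatically rather than from the printed summary. The sketch below builds a small stand-in `Dataset` in memory (the name `toy_ds` and all of its values are hypothetical, not taken from the exercise file) and lists its data variables and dimension sizes.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the exercise dataset: one variable on (time, lat, lon)
toy_ds = xr.Dataset(
    data_vars={"tas": (("time", "lat", "lon"), np.zeros((2, 2, 3)))},
    coords={
        "time": pd.date_range("2065-01-01", periods=2),
        "lat": [-78.5, -76.5],
        "lon": [9.0, 11.0, 13.0],
    },
)

variable_names = list(toy_ds.data_vars)  # names of the data variables
dimension_sizes = dict(toy_ds.sizes)     # dimension name -> length

print(variable_names)  # ['tas']
print(dimension_sizes)
```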

4. Find the name of the Data Variable, and use it to extract a `DataArray` called `temperature`.

In [28]:
# tas = temperature anomaly
temperature = ds["tas"]

5. Take a look at the `temperature` data array and inspect its dimensions, coordinates and attributes. What are the specific dimensions and coordinates associated with it? What metadata (attributes) is provided?

In [36]:
# It is a 3D array with dimensions (time, lat, lon)
temperature
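
A `DataArray` can also be created from scratch, which makes it easier to see where dimensions, coordinates and attributes come from. This is a minimal sketch with invented values; the name `toy` and the attribute contents are assumptions, not what the exercise file contains.

```python
import numpy as np
import xarray as xr

# Hypothetical values and labels, for illustration only
toy = xr.DataArray(
    data=np.array([[1.5, 2.0, 2.5], [3.0, 3.5, 4.0]]),
    dims=("lat", "lon"),
    coords={"lat": [-78.5, -76.5], "lon": [9.0, 11.0, 13.0]},
    attrs={"units": "K", "long_name": "toy temperature anomaly"},
    name="tas",
)

print(toy.dims)            # ('lat', 'lon')
print(toy.shape)           # (2, 3)
print(sorted(toy.coords))  # ['lat', 'lon']
print(toy.attrs["units"])  # K
```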

## 3. Label-based indexing

6. Select a subset of the `temperature` array using positional (integer) indexing to get the data at position [0,0,0].

In [31]:
temperature[0,0,0]

7. Use `.loc` to select the data for all lat and lon values at the time `2065-01-30`.

In [54]:
temperature.loc['2065-01-30', :, :]
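
With `.loc` the labels are still given in dimension order, so you need to know that `time` comes first. The sketch below, on a toy array with invented dates and coordinates, shows that the `.loc` selection matches the equivalent `.sel` call, which names the dimension instead.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy array; the dates and coordinates are invented
toy = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2065-01-30", periods=2, freq="D"),
        "lat": [-10.0, 0.0, 10.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
)

# .loc takes labels in the order of the dimensions: (time, lat, lon)
by_loc = toy.loc["2065-01-30", :, :]

# .sel names the dimension, so unselected dimensions can be left out
by_sel = toy.sel(time="2065-01-30")

print(by_loc.equals(by_sel))  # True
```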

8. It's not ideal to have to keep track of which dimension is in which position. Instead, use `.isel` with the dimension names to get the data at the first time, lat and lon positions.

In [56]:
temperature.isel(time=0, lat=0, lon=0)
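
The benefit of naming dimensions shows up as soon as the axis order changes. In this sketch (toy values, hypothetical variable names) the same `.isel` call returns the same element before and after a transpose, while plain positional indexing silently picks a different one.

```python
import numpy as np
import xarray as xr

# Hypothetical toy array with dimensions (time, lat, lon)
toy = xr.DataArray(np.arange(24).reshape(2, 3, 4), dims=("time", "lat", "lon"))

positional = toy[1, 2, 0].item()
named = toy.isel(time=1, lat=2, lon=0).item()

# Reorder the axes to (lon, lat, time)
reordered = toy.transpose("lon", "lat", "time")
named_after = reordered.isel(time=1, lat=2, lon=0).item()
positional_after = reordered[1, 2, 0].item()  # now means lon=1, lat=2, time=0

print(positional, named)  # 20 20
print(named_after)        # 20
print(positional_after)   # 9
```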

9. The previous method still selects by integer position. Use `.sel` with the named dimensions to select by label instead, and find the data at `time=2065-12-30`, `lat=-78.5`, `lon=11.0`.

In [65]:
temperature.sel(time='2065-12-30', lat=-78.5, lon=11.0)
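
By default `.sel` needs labels that exist exactly in the coordinate. The sketch below uses a toy array with invented coordinates to show an exact lookup, a failed lookup for a label that is not present, and the `method="nearest"` option that falls back to the closest label.

```python
import numpy as np
import xarray as xr

# Hypothetical toy array; coordinate values are invented
toy = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("lat", "lon"),
    coords={"lat": [-78.5, -76.5], "lon": [9.0, 11.0, 13.0]},
)

# Exact labels: both values exist in the coordinates
exact = toy.sel(lat=-78.5, lon=11.0)

# A label that is not in the coordinate raises KeyError
try:
    toy.sel(lat=-78.0, lon=11.0)
    exact_lookup_failed = False
except KeyError:
    exact_lookup_failed = True

# method="nearest" picks the closest available label instead
nearest = toy.sel(lat=-78.0, lon=11.2, method="nearest")

print(exact.item())                                  # 1.0
print(exact_lookup_failed)                           # True
print(nearest["lat"].item(), nearest["lon"].item())  # -78.5 11.0
```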