# Exercise 1: Introduction to xarray


## Aim: Learn about what xarray is and how to create and look at a `DataArray`.

### Issues Covered:
- Importing `xarray`
- Loading a dataset using `xr.open_dataset()`
- Creating a `DataArray`
- Indexing, using `.loc`, `.isel()` and `.sel()`

## 1. Introduction to multidimensional arrays

- Unlabelled N-dimensional arrays of numbers are the most widely used data structure in scientific computing
- These arrays lack meaningful metadata, so users must keep track of what each axis and index means themselves

<img src="../images/multidimensional_array.png" width="800"/>

Can you think of any reasons why xarray might be preferred to pandas when working with multi-dimensional data like climate models?
(Hint: how many dimensions does a pandas dataframe have?)

In [1]:
# xarray is designed to handle data with multiple dimensions.
# pandas is designed for 1D (Series) and 2D (DataFrame) structures.
# xarray allows you to work with labelled dimensions and coordinates.
# pandas can only represent extra dimensions through a 'MultiIndex'.
# xarray is built on the netCDF data model and understands CF conventions.
# pandas doesn't natively support netCDF or CF conventions.
# xarray supports metadata attached to datasets.
# pandas metadata support is minimal.
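
To make the comparison concrete, here is a minimal sketch of creating a labelled 3D `DataArray` by hand. The values, the coordinate labels and the names `toy`, `time`, `lat` and `lon` are all made up for illustration and are not part of the exercise data.

```python
import numpy as np
import xarray as xr

# Toy data: 2 time steps x 3 latitudes x 4 longitudes (values are made up)
values = np.arange(24, dtype=float).reshape(2, 3, 4)

toy = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={
        "time": [0, 1],
        "lat": [-10.0, 0.0, 10.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    attrs={"units": "K"},
    name="toy_temperature",
)

# Reduce over a dimension by name rather than by axis number
time_mean = toy.mean(dim="time")
print(time_mean.dims)  # ('lat', 'lon')
```

Holding the same data in a single pandas object would mean flattening the three dimensions into a `MultiIndex`, whereas here each axis keeps its own name and labels.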

## 2. Opening and Exploring Datasets

1. Open the `'../data/xbhubo.pgc0apr.nc'` dataset and load it into an xarray `Dataset` called `ds`.
(Hint: Don't forget to import any packages you need).
This file comes from a HadCM3 model run carried out as part of the RAPID study: https://catalogue.ceda.ac.uk/uuid/6bbab8394124b252f8b1b036f9eb6b6b/

In [2]:
import xarray as xr
ds = xr.open_dataset('../data/xbhubo.pgc0apr.nc')

2. Look at the parameters of the dataset.

In [3]:
ds

3. What are the dimensions and variables in this dataset? What does each represent? 

In [4]:
# There are four data variables: temp, salinity, ucurr and vcurr.
# t (time) and depth are dimensions used by all of these variables.
# temp and salinity are on a different grid from ucurr and vcurr. This is why there are both longitude and longitude_1: they are dimensions that belong to different variables.
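
The same information can also be read off programmatically. The sketch below uses a small hypothetical stand-in, `toy_ds`, rather than the real file; its shapes are invented, and the `latitude_1`/`longitude_1` names are assumed here only to mimic a second grid.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a dataset whose variables sit on two grids
toy_ds = xr.Dataset(
    {
        "temp": (
            ("t", "depth", "latitude", "longitude"),
            np.zeros((1, 2, 3, 4)),
        ),
        "ucurr": (
            ("t", "depth", "latitude_1", "longitude_1"),
            np.zeros((1, 2, 2, 4)),
        ),
    }
)

# Show which dimensions each data variable uses
for name, var in toy_ds.data_vars.items():
    print(name, var.dims)

# Length of every dimension in the dataset
print(dict(toy_ds.sizes))
```

Running the same loop over `ds` should show which of the real variables share a grid.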

4. Find the name of the temperature data variable, and use it to extract a `DataArray` called `temperature`.

In [5]:
# temp = sea surface temperature.
temperature = ds["temp"]

5. Take a look at the `temperature` data array and inspect its dimensions, coordinates and attributes. What are the specific dimensions and coordinates associated with it? What metadata (attributes) is provided?

In [6]:
# It is a 4D array with dimensions (t, depth, latitude, longitude)
temperature
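
Besides reading the printed summary, each piece can be pulled out through its own property. This is a rough sketch on a made-up array (`toy`, with invented coordinates and attributes); the same properties should be available on `temperature`.

```python
import numpy as np
import xarray as xr

# Made-up array with a few coordinates and attributes
toy = xr.DataArray(
    np.zeros((1, 2, 3)),
    dims=("t", "depth", "latitude"),
    coords={"depth": [5.0, 15.0], "latitude": [-1.25, 0.0, 1.25]},
    attrs={"units": "degC", "long_name": "toy temperature"},
)

print(toy.dims)          # dimension names, in axis order
print(toy.shape)         # length of each dimension
print(list(toy.coords))  # names of the coordinate variables
print(toy.attrs)         # free-form metadata dictionary
```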

## 3. Label-based indexing

6. Select a subset of the `temperature` array using positional (integer) indexing to get the data at position [0,0,0,0].

In [7]:
temperature[0,0,0,0]

7. Use `.loc` to find the temperature 5 meters below the sea surface in the South Atlantic, where latitude is -50.625 and longitude is 0.

In [8]:
temperature.loc[:, 5, -50.625, 0]
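
As a small illustration of how `.loc` relates to positional indexing, the sketch below builds a made-up 4D array (`toy`, with invented values and a coordinate layout assumed to resemble the exercise data) and picks the same element both ways.

```python
import numpy as np
import xarray as xr

# Made-up 4D array; coordinate values are chosen for illustration only
toy = xr.DataArray(
    np.arange(12.0).reshape(1, 2, 3, 2),
    dims=("t", "depth", "latitude", "longitude"),
    coords={
        "t": [0],
        "depth": [5.0, 15.0],
        "latitude": [-50.625, 0.0, 50.625],
        "longitude": [0.0, 180.0],
    },
)

# .loc takes coordinate *values*, in dimension order, inside square brackets
by_label = toy.loc[:, 5.0, -50.625, 0.0]

# The same element chosen by integer *position*
by_position = toy[:, 0, 0, 0]

print(by_label.values, by_position.values)
```

Note that `.loc` still depends on knowing the order of the dimensions; only the values passed in are labels rather than positions.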

8. It's not ideal to have to keep track of which dimension is in which position. Instead, use `.isel` to use the dimension names to get the data in the same place: this is depth position 0, latitude position 31 and longitude position 0.

In [9]:
temperature.isel(depth=0, latitude=31, longitude=0)

9. The previous method is still referring to a selection by integer position. Use `.sel` to give a labelled index with the named dimension to find the data at `depth=5`, `latitude=-50.625`, `longitude=0`.

In [10]:
temperature.sel(depth=5, latitude=-50.625, longitude=0)
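
The sketch below contrasts `.isel` and `.sel` on a made-up 2D array (`toy`, with invented values). It also shows the `method="nearest"` option of `.sel`, which may be useful when a grid's coordinate values are awkward to type exactly; whether it is needed for the exercise data is an assumption.

```python
import numpy as np
import xarray as xr

# Made-up 2D array with labelled depth and latitude
toy = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("depth", "latitude"),
    coords={"depth": [5.0, 15.0], "latitude": [-50.625, 0.0, 50.625]},
)

# Dimension names + integer positions
by_name_and_position = toy.isel(depth=0, latitude=0)

# Dimension names + coordinate labels (exact match required by default)
by_name_and_label = toy.sel(depth=5.0, latitude=-50.625)

# method="nearest" picks the closest coordinate when there is no exact match
nearest = toy.sel(depth=14.0, latitude=1.0, method="nearest")

print(by_name_and_position.item(), by_name_and_label.item(), nearest.item())
```

Because both methods take keyword arguments, the order in which the dimensions are listed does not matter.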