# Exercise 1a: Introduction to xarray


## Aim: Learn what xarray is and how to create and inspect a `DataArray`.

Find the teaching resources here: https://tutorial.xarray.dev/fundamentals/01_data_structures.html and https://tutorial.xarray.dev/fundamentals/01_datastructures.html.

### Topics Covered:
- Importing `xarray`
- Loading a dataset using `xarray.open_dataset()`
- Creating a `DataArray`

## 1. Introduction to multidimensional arrays

- Unlabelled N-dimensional arrays of numbers are the most widely used data structure in scientific computing
- These arrays lack meaningful metadata, so users must keep track themselves of what each axis and index represents

<img src="../images/multidimensional_array.png" width="800"/>
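
As a minimal sketch of the difference, the block below wraps a small NumPy array in an `xarray.DataArray` with named dimensions, coordinate labels and attributes. The values, coordinate labels and the `units`/`long_name` attributes are invented for illustration and are not taken from the HadCM3 file used later.

```python
import numpy as np
import xarray as xr

# Plain NumPy array: the shape (2, 3, 4) says nothing about what each axis means
values = np.arange(24, dtype="float32").reshape(2, 3, 4)

# The same numbers wrapped in a labelled DataArray (illustrative labels and metadata)
temp = xr.DataArray(
    values,
    dims=("depth", "latitude", "longitude"),
    coords={
        "depth": [5.0, 15.0],
        "latitude": [-1.25, 0.0, 1.25],
        "longitude": [0.0, 1.25, 2.5, 3.75],
    },
    attrs={"units": "degC", "long_name": "illustrative temperature"},
    name="temp",
)

# Select by coordinate label instead of remembering that depth is axis 0
surface = temp.sel(depth=5.0)

# Reduce over a dimension by name instead of by axis number
zonal_mean = temp.mean(dim="longitude")

print(surface.dims)     # ('latitude', 'longitude')
print(zonal_mean.dims)  # ('depth', 'latitude')
```

With the plain array, the equivalent operations would be `values[0]` and `values.mean(axis=2)`, which only make sense if you remember the axis order.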

Q1. Can you think of any reasons why xarray might be preferred to pandas when working with multidimensional data such as climate model output?
(Hint: how many dimensions does a pandas `DataFrame` have?)
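
To illustrate the hint, the sketch below (with invented values) converts a small 3-D `DataArray` to pandas. Because a pandas `Series` is one-dimensional, the three dimensions are flattened into a `MultiIndex` with one row per grid point; `xr.DataArray.from_series` rebuilds the labelled 3-D array.

```python
import numpy as np
import xarray as xr

# Small illustrative 3-D array (depth x latitude x longitude)
da = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    dims=("depth", "latitude", "longitude"),
    coords={
        "depth": [5.0, 15.0],
        "latitude": [-1.25, 0.0, 1.25],
        "longitude": [0.0, 1.25, 2.5, 3.75],
    },
)

# pandas has no native 3-D structure, so every grid point becomes one row
series = da.to_series()
print(list(series.index.names))  # ['depth', 'latitude', 'longitude']
print(len(series))               # 24

# Converting back recovers the labelled 3-D layout
restored = xr.DataArray.from_series(series)
print(restored.dims)             # ('depth', 'latitude', 'longitude')
```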

## 2. Opening and Exploring Datasets

Q2. Open the `'../data/xbhubo.pgc0apr.nc'` dataset and load it into an xarray `Dataset` called `ds`.
(Hint: don't forget to import any packages you need.)
This file contains output from a HadCM3 model run carried out as part of the RAPID study: https://catalogue.ceda.ac.uk/uuid/6bbab8394124b252f8b1b036f9eb6b6b/

In [1]:
import xarray as xr

ds = xr.open_dataset('../data/xbhubo.pgc0apr.nc')
ds

Q3. Look at the structure of the dataset: its dimensions, coordinates, data variables and attributes.

Q4. What are the dimensions and variables in this dataset? What does each represent? 

In [13]:
ds.dims



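In [14]:
# `ds.var` is the variance *method*, so evaluating it only shows a bound method.
# Printing the Dataset (or using `ds.data_vars`) lists the data variables instead.
print(ds)

<xarray.Dataset> Size: 13MB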
Dimensions:      (longitude: 288, latitude: 144, depth: 20, t: 1,
                  longitude_1: 288, latitude_1: 143)
Coordinates:
  * longitude    (longitude) float32 1kB 0.0 1.25 2.5 3.75 ... 356.2 357.5 358.8
  * latitude     (latitude) float32 576B -89.38 -88.12 -86.88 ... 88.12 89.38
  * depth        (depth) float32 80B 5.0 15.0 25.0 ... 4.577e+03 5.192e+03
  * t            (t) object 8B 1920-04-16 00:00:00
  * longitude_1  (longitude_1) float32 1kB 0.625 1.875 3.125 ... 358.1 359.4
  * latitude_1   (latitude_1) float32 572B -88.75 -87.5 -86.25 ... 87.5 88.75
Data variables:
    temp         (t, depth, latitude, longitude) float32 3MB ...
    salinity     (t, depth, latitude, longitude) float32 3MB ...
    ucurr        (t, depth, latitude_1, longitude_1) float32 3MB ...
    vcurr        (t, depth, latitude_1, longitude_1) float32 3MB ...
Attributes:
    history:      Tue Sep 12 11:49:35 BST 2006 - CONVSH V1.91 1
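
The sketch below builds a tiny hypothetical `Dataset` with the same layout as the RAPID file (one variable on the `latitude`/`longitude` grid, one on the offset `latitude_1`/`longitude_1` grid; the sizes and values are invented) and shows a few ways to list its dimensions and variables. `ds.sizes` gives the dimension lengths as a mapping; recent xarray releases warn that `ds.dims` will eventually return only the dimension names. Note that `ds.var` is xarray's variance reduction, not a list of variables, which is why evaluating it returns a bound method; the variables are in `ds.data_vars`.

```python
import numpy as np
import xarray as xr

# Hypothetical miniature of the RAPID file layout (sizes and values invented)
ds = xr.Dataset(
    data_vars={
        "temp": (("t", "depth", "latitude", "longitude"),
                 np.zeros((1, 2, 3, 4), dtype="float32")),
        "ucurr": (("t", "depth", "latitude_1", "longitude_1"),
                  np.zeros((1, 2, 2, 4), dtype="float32")),
    },
    coords={
        "depth": [5.0, 15.0],
        "latitude": [-1.25, 0.0, 1.25],
        "longitude": [0.0, 1.25, 2.5, 3.75],
        "latitude_1": [-0.625, 0.625],
        "longitude_1": [0.625, 1.875, 3.125, 4.375],
    },
)

print(dict(ds.sizes))      # dimension name -> length
print(list(ds.data_vars))  # ['temp', 'ucurr']
print(list(ds.coords))     # coordinate variable names

# `ds.var` is the variance method, not the variable list
print(callable(ds.var))    # True
```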

Q5. Find the name of the temperature data variable, and use it to extract a `DataArray` called `temperature`.

In [16]:
temperature = ds["temp"]
print(temperature)

<xarray.DataArray 'temp' (t: 1, depth: 20, latitude: 144, longitude: 288)> Size: 3MB
[829440 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 1kB 0.0 1.25 2.5 3.75 ... 356.2 357.5 358.8
  * latitude   (latitude) float32 576B -89.38 -88.12 -86.88 ... 88.12 89.38
  * depth      (depth) float32 80B 5.0 15.0 25.0 ... 4.577e+03 5.192e+03
  * t          (t) object 8B 1920-04-16 00:00:00
Attributes:
    source:     Unified Model Output:
    name:       temp
    title:      POTENTIAL TEMPERATURE (OCEAN)  DEG.C
    date:       01/12/99
    time:       00:00
    long_name:  POTENTIAL TEMPERATURE (OCEAN)  DEG.C
    units:      degC
    valid_min:  -1.7999878
    valid_max:  35.0495
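
Dictionary-style indexing (`ds["temp"]`) and attribute-style access (`ds.temp`) both return the variable as a `DataArray`, carrying its coordinates and attributes with it. The sketch below checks this on a small invented `Dataset`. Dictionary-style indexing is the safer habit, since attribute access cannot reach a variable whose name clashes with a `Dataset` method (for example, a variable called `var`) or is not a valid Python identifier.

```python
import numpy as np
import xarray as xr

# Small invented Dataset with one variable and its metadata
ds = xr.Dataset(
    {"temp": (("depth", "latitude"),
              np.array([[20.0, 21.0], [4.0, 5.0]]),
              {"units": "degC"})},
    coords={"depth": [5.0, 15.0], "latitude": [-1.25, 1.25]},
)

temperature = ds["temp"]  # dictionary-style access
same = ds.temp            # attribute-style access

print(type(temperature).__name__)   # DataArray
print(temperature.attrs["units"])   # degC
print(temperature.identical(same))  # True
```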


Q6. Take a look at the `temperature` data array and inspect its dimensions, coordinates and attributes. What are the specific dimensions and coordinates associated with it? What metadata (attributes) is provided?

In [10]:
temperature.coords

Coordinates:
  * longitude  (longitude) float32 1kB 0.0 1.25 2.5 3.75 ... 356.2 357.5 358.8
  * latitude   (latitude) float32 576B -89.38 -88.12 -86.88 ... 88.12 89.38
  * depth      (depth) float32 80B 5.0 15.0 25.0 ... 4.577e+03 5.192e+03
  * t          (t) object 8B 1920-04-16 00:00:00

In [11]:
temperature.attrs

{'source': 'Unified Model Output:',
 'name': 'temp',
 'title': 'POTENTIAL TEMPERATURE (OCEAN)  DEG.C',
 'date': '01/12/99',
 'time': '00:00',
 'long_name': 'POTENTIAL TEMPERATURE (OCEAN)  DEG.C',
 'units': 'degC',
 'valid_min': np.float32(-1.7999878),
 'valid_max': np.float32(35.0495)}

In [12]:
temperature.dims

('t', 'depth', 'latitude', 'longitude')
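
Coordinates and attributes are more than documentation: xarray uses the coordinate labels for selection, and `.attrs` behaves like a dictionary (handy, for example, for building plot labels). The sketch below uses an invented depth/latitude field; `method="nearest"` in `.sel` picks the closest coordinate label rather than requiring an exact match.

```python
import numpy as np
import xarray as xr

# Invented miniature temperature field with metadata in .attrs
temperature = xr.DataArray(
    np.array([[20.0, 21.0, 22.0], [10.0, 11.0, 12.0], [2.0, 3.0, 4.0]]),
    dims=("depth", "latitude"),
    coords={"depth": [5.0, 15.0, 25.0], "latitude": [-1.25, 0.0, 1.25]},
    attrs={"units": "degC", "long_name": "illustrative potential temperature"},
)

print(dict(temperature.sizes))      # {'depth': 3, 'latitude': 3}
print(temperature["depth"].values)  # coordinate labels as a NumPy array

# Label-based selection: the depth level nearest to 12.0 is 15.0
profile = temperature.sel(depth=12.0, method="nearest")
print(float(profile["depth"]))      # 15.0

# Attributes behave like a dict, e.g. for building an axis label
label = f"{temperature.attrs['long_name']} [{temperature.attrs['units']}]"
print(label)
```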

Q7. Find out what dimensions and coordinates exist in your dataset. Which latitude and longitude variables are associated with the ocean temperature variable?

In [7]:
# `temp` uses the `latitude` and `longitude` coordinates (points at the centres of the grid cells).
# Other variables, such as `ucurr` and `vcurr`, use `latitude_1` and `longitude_1` instead (points offset by half a grid cell, on the cell edges).
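
Rather than reading the summary by eye, you can also check programmatically which horizontal coordinates each variable uses. The sketch below uses a hypothetical miniature dataset that mimics the file's two grids (`latitude`/`longitude` for the tracers, `latitude_1`/`longitude_1` for the currents); the sizes and values are invented.

```python
import numpy as np
import xarray as xr

# Hypothetical miniature with the two horizontal grids seen in the RAPID file
ds = xr.Dataset(
    data_vars={
        "temp": (("depth", "latitude", "longitude"), np.zeros((2, 3, 4))),
        "salinity": (("depth", "latitude", "longitude"), np.zeros((2, 3, 4))),
        "ucurr": (("depth", "latitude_1", "longitude_1"), np.zeros((2, 2, 4))),
        "vcurr": (("depth", "latitude_1", "longitude_1"), np.zeros((2, 2, 4))),
    },
    coords={
        "depth": [5.0, 15.0],
        "latitude": [-1.25, 0.0, 1.25],
        "longitude": [0.0, 1.25, 2.5, 3.75],
        "latitude_1": [-0.625, 0.625],
        "longitude_1": [0.625, 1.875, 3.125, 4.375],
    },
)

# Map each variable to the horizontal dimensions it is defined on
horizontal = {
    name: [dim for dim in var.dims if dim.startswith(("latitude", "longitude"))]
    for name, var in ds.data_vars.items()
}
for name, dims in horizontal.items():
    print(name, dims)
```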