# 3-Arrays: Working with multi-dimensional arrays

In this section, we introduce the Python package `xarray`.

`xarray` is useful for working with labeled multi-dimensional arrays, and it includes functions for advanced analytics and visualization.

`xarray` is heavily inspired by pandas and it uses pandas internally. 

We'll look at its two high-level data structures: `Dataset` and `DataArray`.

## Opening a `Dataset`

For this example, we'll use the file `t2m_1990-1999.nc`, which contains 10 years of 2-m temperature data for a domain around Barcelona.

First, use the `open_dataset` function to create an instance of a `Dataset` defined in `xarray`:


In [None]:
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import os

path  = "./3-Arrays"
fname = "t2m_1990-1999.nc"
era5_file = os.path.join(path,fname)
ds = xr.open_dataset(era5_file)

In [None]:
ds

Each variable in a `Dataset` can be accessed as a `DataArray` instance:

In [None]:
ds.t2m

In [None]:
ds.t2m.latitude

In [None]:
ds.t2m.attrs['units']

In [None]:
ds.t2m.max(dim='time')
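
To make the relationship between the two structures concrete without needing the NetCDF file, here is a minimal sketch that builds a small `Dataset` in memory. The grid values, dates and the name `demo` are made up for illustration; only the variable name `t2m` and the dimension names mirror the file used above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical coordinates: 4 daily time steps on a 2 x 3 grid
time = pd.date_range("1990-01-01", periods=4, freq="D")
lat = [41.5, 41.25]
lon = [2.0, 2.25, 2.5]

# Synthetic temperatures in kelvin (random numbers, not real data)
rng = np.random.default_rng(0)
values = 280.0 + rng.random((4, 2, 3))

demo = xr.Dataset(
    data_vars={"t2m": (("time", "latitude", "longitude"), values, {"units": "K"})},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

# Dictionary-style access is equivalent to attribute access (demo.t2m)
t2m_demo = demo["t2m"]

# Reducing over "time" leaves a 2-D (latitude, longitude) DataArray
t2m_max = t2m_demo.max(dim="time")

print(type(t2m_demo).__name__)
print(t2m_demo.attrs["units"])
print(t2m_max.dims)
```

Dictionary-style access (`demo["t2m"]`) is the safer choice when a variable name clashes with a `Dataset` method or is not a valid Python identifier.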

## Selecting data

Use `sel` to select by dimension name and coordinate label.

For example, get the time series of 2-m temperature at (approximately) the location of Barcelona:

In [None]:
t2m_bcn = ds.t2m.sel(latitude=41.5,longitude=2.25)

In [None]:
#Barcelona coordinates
lat_bcn, lon_bcn = (41.3851, 2.1734)
t2m_bcn = ds.t2m.sel(latitude=lat_bcn,longitude=lon_bcn, method='nearest')

In [None]:
t2m_bcn
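
`sel` is not limited to spatial coordinates. The sketch below uses a synthetic daily series (the values and the name `da` are invented for the example) to show label-based selection along `time` with a partial date string and with a slice, next to position-based selection with `isel`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series: 60 days starting on 1990-01-01
time = pd.date_range("1990-01-01", periods=60, freq="D")
da = xr.DataArray(np.arange(60.0), coords={"time": time}, dims="time", name="t2m")

# Partial date string: every time step in January 1990
january = da.sel(time="1990-01")

# Label-based slices include both end points
first_week = da.sel(time=slice("1990-01-01", "1990-01-07"))

# isel selects by integer position instead of by label
first_step = da.isel(time=0)

print(january.sizes["time"], first_week.sizes["time"], float(first_step))
```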

## Interpolating

Use `interp` for multidimensional interpolation of a `DataArray`:

In [None]:
#By default it uses linear interpolation
t2m = ds.t2m.interp(latitude=lat_bcn,longitude=lon_bcn)

In [None]:
t2m

In [None]:
np.abs(t2m-t2m_bcn).max()
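
The difference computed above is non-zero because nearest-neighbour selection returns the value at a grid point, while linear interpolation weights the surrounding grid points. As a rough illustration of that idea (a hand-written bilinear formula on a synthetic field, not xarray's internal implementation), consider:

```python
import numpy as np
import xarray as xr

# Hypothetical regular grid and a field that varies linearly in space (K)
lat = np.array([41.0, 41.25, 41.5, 41.75])
lon = np.array([2.0, 2.25, 2.5])
field = 280.0 + 2.0 * lat[:, None] + 3.0 * lon[None, :]
da = xr.DataArray(
    field,
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
)

lat_bcn, lon_bcn = 41.3851, 2.1734

# Nearest neighbour: value at the closest grid point
nearest = da.sel(latitude=lat_bcn, longitude=lon_bcn, method="nearest")

# Bilinear interpolation: weight the four grid points around the target
i = np.searchsorted(lat, lat_bcn) - 1  # grid latitude just below the target
j = np.searchsorted(lon, lon_bcn) - 1  # grid longitude just below the target
wy = (lat_bcn - lat[i]) / (lat[i + 1] - lat[i])
wx = (lon_bcn - lon[j]) / (lon[j + 1] - lon[j])
bilinear = (
    (1 - wy) * (1 - wx) * field[i, j]
    + (1 - wy) * wx * field[i, j + 1]
    + wy * (1 - wx) * field[i + 1, j]
    + wy * wx * field[i + 1, j + 1]
)

# For a linear field, the bilinear estimate matches the analytic value
exact = 280.0 + 2.0 * lat_bcn + 3.0 * lon_bcn
print(float(nearest), bilinear, exact)
```

Because the synthetic field is linear, the bilinear estimate reproduces the analytic value, whereas the nearest-neighbour value is offset by the distance to the closest grid point.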

## Group Data By Date

As in `pandas`, we can use `resample`:

In [None]:
# Bin the data by calendar month; this returns a resample object, and no values are computed until an aggregation such as mean() is applied
t2m.resample(time='M')

In [None]:
t2m_bcn.resample(time='M').mean()

In [None]:
t2m.plot(figsize=(15,5))
t2m.resample(time='1M').min().plot(drawstyle='steps-pre')
t2m.resample(time='1M').max().plot(drawstyle='steps-pre')
plt.show()

In [None]:
t2m.groupby('time.month').mean().plot()
plt.show()
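
`resample` and `groupby('time.month')` answer different questions: the first produces one value per calendar month of the record, the second pools every January, every February, and so on into 12 values. The following sketch shows this on a synthetic two-year daily series. The data and variable names are invented, and the frequency alias `"MS"` (month start) is used here instead of `'M'` only because it labels each bin by its first day; that is a choice made for this example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series for 1990-1991 whose value is the month number
time = pd.date_range("1990-01-01", "1991-12-31", freq="D")
da = xr.DataArray(np.asarray(time.month, dtype=float), coords={"time": time}, dims="time")

# One mean per calendar month in the record: 2 years x 12 months
monthly_series = da.resample(time="MS").mean()

# One mean per month of the year, pooling both years together
monthly_clim = da.groupby("time.month").mean()

print(monthly_series.sizes["time"])
print(monthly_clim.sizes["month"])
```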

In [None]:
#Change units from K to C
t2m -= 273.15
t2m.attrs['units'] = "C"
t2m.attrs['long_name'] = '2 metre temperature'

t2m.groupby('time.month').mean().plot()
t2m.groupby('time.month').max().plot()
t2m.groupby('time.month').min().plot()

plt.show()

In [None]:
t2m.groupby('time.season').min()
t2m.groupby('time.season').max()

## Plotting time series

In [None]:
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15,5))

t2m.plot(ax=ax1)
t2m.groupby('time.month').mean().plot(ax=ax2)

# Each Axes has a title, set via set_title()
ax1.set_title("hourly time series")
ax2.set_title("monthly average")

plt.show()

## Opening multiple files as a single dataset

In [None]:
import glob 

path      = "3-Arrays"
fnames    = os.path.join(path,"*nc")
nc_files  = glob.glob(fnames)
nc_files

In [None]:
ds = xr.open_mfdataset(nc_files)
ds
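
By default, `open_mfdataset` combines the files by their coordinate values, so the order in which `glob` returns the file names should not matter. The sketch below illustrates the same idea on two small in-memory datasets with `xr.combine_by_coords`. The helper `make_part` and the data are hypothetical, and no files are read.

```python
import numpy as np
import pandas as pd
import xarray as xr

def make_part(start, periods):
    """Build a small synthetic dataset covering `periods` days from `start`."""
    time = pd.date_range(start, periods=periods, freq="D")
    return xr.Dataset(
        {"t2m": ("time", np.full(periods, 280.0))},
        coords={"time": time},
    )

part_a = make_part("1990-01-01", 5)  # 1990-01-01 to 1990-01-05
part_b = make_part("1990-01-06", 5)  # 1990-01-06 to 1990-01-10

# Pass the parts in the "wrong" order: they are arranged by their time coordinate
combined = xr.combine_by_coords([part_b, part_a])

print(combined.sizes["time"])
print(combined.indexes["time"].is_monotonic_increasing)
```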

In [None]:
#Interpolate and change units
t2m = ds.t2m.interp(latitude=lat_bcn,longitude=lon_bcn)

t2m -= 273.15
t2m.attrs['units'] = "C"
t2m.attrs['long_name'] = '2 metre temperature'

In [None]:
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(12,6))

t2m.plot(ax=ax1)
t2m.groupby('time.month').mean().plot(ax=ax2)

# Each Axes has a title, set via set_title()
ax1.set_title("hourly time series")
ax2.set_title("monthly average")

plt.tight_layout()
plt.show()

In [None]:
#annual average temperature
t2m.resample(time="A").mean().plot()
plt.show()

## Accessing remote data

`xarray` includes support for OPeNDAP (via the netCDF4 library or Pydap), which lets us access large datasets over HTTP.

For example, we can access a GFS forecast:

In [120]:
url = "https://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/gfs20191204/gfs_0p25_12z"

ds = xr.open_dataset(url)