# Discover xarray

In this notebook we use the example of "simulating" the kinetic energy for different masses and velocities. We start off doing the calculation in `numpy` and gradually transition to `xarray`.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

Let's say we are interested in simulating the kinetic energy for all combinations of the following values: 

In [None]:
v_mps = np.linspace(0, 10)
m_kg = np.linspace(3, 17)

In [None]:
v_mps

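We can do that with `numpy` broadcasting: `np.expand_dims` turns the masses into a column, so the product has one row per mass and one column per velocity.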

In [None]:
Ekin_J = 0.5 * np.expand_dims(m_kg, 1) * v_mps**2
Ekin_J
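
To see why `np.expand_dims` is needed, it may help to look at the shapes involved. The following is a small sketch with shorter arrays (3 masses and 4 velocities, sizes chosen only for illustration; the names `m_small`, `v_small`, `m_column` and `E_small` are not part of the notebook):

```python
import numpy as np

# Shorter arrays than in the notebook, purely for illustration
m_small = np.linspace(3, 17, 3)   # shape (3,)
v_small = np.linspace(0, 10, 4)   # shape (4,)

# expand_dims adds a second axis, turning the masses into a column
m_column = np.expand_dims(m_small, 1)

# Broadcasting the column against the velocities gives
# one row per mass and one column per velocity
E_small = 0.5 * m_column * v_small**2
print(m_column.shape, v_small.shape, E_small.shape)
```

Without the extra axis, `numpy` would multiply the two arrays element by element (or fail if their lengths differ) instead of forming all combinations.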

Plotting it gives us some insight, but the result is hard to interpret: the axes have no labels, and the ticks show array indices rather than the values of v and m.

In [None]:
plt.pcolormesh(Ekin_J)

Doing calculations is possible but requires remembering which axis is which:

In [None]:
# Average over all speeds (axis 1 is the velocity axis)
Ekin_J.mean(axis=1)
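
The risk of mixing up the axes can be illustrated with a short sketch. It rebuilds the same array under the hypothetical names `m`, `v` and `E` and averages over each axis in turn:

```python
import numpy as np

v = np.linspace(0, 10)                 # 50 velocities
m = np.linspace(3, 17)                 # 50 masses
E = 0.5 * np.expand_dims(m, 1) * v**2  # axis 0 is mass, axis 1 is velocity

mean_over_v = E.mean(axis=1)  # one value per mass
mean_over_m = E.mean(axis=0)  # one value per velocity

# Both results have the same shape, so picking the wrong axis raises no error
print(mean_over_v.shape, mean_over_m.shape)
print(np.allclose(mean_over_v, mean_over_m))
```

Because both axes happen to have the same length here, nothing in the shapes tells you whether you averaged over the intended axis.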

## Labeling the axes

We can use `xarray` to give names to the axes (now called dims), which makes interpretation much easier. 

In [None]:
import xarray as xr

In [None]:
v_mps = xr.DataArray(np.linspace(0, 10), dims="v_mps")
m_kg = xr.DataArray(np.linspace(3, 17), dims="m_kg")

Now you can see that the dim has the name `v_mps` and has a length of 50.

In [None]:
v_mps

Calculations are now simpler, because `xarray` knows which axes to match and which to broadcast.

In [None]:
Ekin_J = 0.5 * m_kg * v_mps**2
Ekin_J

Plotting will now also indicate the name of the dim.

In [None]:
Ekin_J.plot()

Calculations can now be done without remembering the order of the axes as you just supply the name of the dim.

In [None]:
Ekin_J.mean("v_mps")
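
A small sketch of why the name is enough: the result of a reduction by name should not depend on the order in which the axes are stored. The variable names `m`, `v`, `E` and `E_swapped` are only for illustration:

```python
import numpy as np
import xarray as xr

v = xr.DataArray(np.linspace(0, 10), dims="v_mps")
m = xr.DataArray(np.linspace(3, 17), dims="m_kg")
E = 0.5 * m * v**2

# Swap the axis order; the dimension names travel with the data
E_swapped = E.transpose("v_mps", "m_kg")

# The same call works on both arrays, whatever the axis order
mean_a = E.mean("v_mps")
mean_b = E_swapped.mean("v_mps")
print(E.dims, E_swapped.dims)
print(np.allclose(mean_a, mean_b))
```

With plain `numpy`, the transposed array would have needed `axis=0` instead of `axis=1` to give the same answer.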

## Adding coordinates

It would be great if the plot directly showed the right ticks, taken from the mass and velocity values rather than from the array indices. In `xarray` this is done with a coordinate: an array that "labels" a dimension.

In [None]:
v_mps_array = np.linspace(0, 10)
v_mps = xr.DataArray(v_mps_array, dims="v_mps", coords={"v_mps": v_mps_array})
m_kg_array = np.linspace(3, 17)
m_kg = xr.DataArray(m_kg_array, dims="m_kg", coords={"m_kg": m_kg_array})

You can now see that `v_mps` is shown in bold, which means a coordinate is associated with that dimension.

In [None]:
v_mps

In [None]:
Ekin_J = 0.5 * m_kg * v_mps**2
Ekin_J

Plotting now uses the coordinate values for the ticks of the x and y axes.

In [None]:
Ekin_J.plot()
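
Coordinates are not only useful for plotting. As a brief sketch that goes slightly beyond this notebook, they also let you look up values by their label with `.sel`. The coarser grids and the names `v_values`, `m_values`, `E` and `E_point` are assumptions made for this example:

```python
import numpy as np
import xarray as xr

# Coarser grids than in the notebook, so the values are easy to read
v_values = np.linspace(0, 10, 11)  # 0, 1, ..., 10 m/s
m_values = np.linspace(3, 17, 8)   # 3, 5, ..., 17 kg
v = xr.DataArray(v_values, dims="v_mps", coords={"v_mps": v_values})
m = xr.DataArray(m_values, dims="m_kg", coords={"m_kg": m_values})
E = 0.5 * m * v**2

# Look up by coordinate value instead of by integer position;
# method="nearest" avoids relying on exact floating point matches
E_point = E.sel(m_kg=5.0, v_mps=4.0, method="nearest")
print(float(E_point))
```

The selected value corresponds to a mass of 5 kg moving at 4 m/s, independent of where those values sit in the array.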

## Adding metadata

Let's add labels and units so that the plot is labeled correctly right away.\
`long_name` and `units` are attributes of a `DataArray` that `xarray` uses when labeling plots.

In [None]:
Ekin_J.v_mps.attrs = {"long_name": "Velocity", "units": "m/s"}
Ekin_J.m_kg.attrs = {"long_name": "Mass", "units": "kg"}
Ekin_J.attrs = {"long_name": "Energy", "units": "J"}
Ekin_J

Clicking the document icon next to a coordinate allows you to inspect its attrs.

This gives us the desired plot directly.

In [None]:
Ekin_J.plot()

## Using a dataset

The workflow can be simplified by using a `Dataset` from `xarray`. You can see the `Dataset` as a dictionary holding many `DataArrays`, which can share dimensions and coordinates.

You can read from and write to a dataset using the `ds["name"]` notation. The `ds.name` notation only supports reading.
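
The difference between the two notations can be sketched as follows. The name `ds_demo` is hypothetical, and the behaviour shown for attribute-style writing is what recent `xarray` versions do; older versions may behave differently:

```python
import numpy as np
import xarray as xr

ds_demo = xr.Dataset()
ds_demo["v_mps"] = np.linspace(0, 10)  # writing with item notation works

# Reading works with both notations and gives the same DataArray
same = ds_demo["v_mps"].identical(ds_demo.v_mps)
print(same)

# Writing a new variable with attribute notation is rejected
try:
    ds_demo.m_kg = np.linspace(3, 17)
    write_failed = False
except AttributeError as err:
    write_failed = True
    print(err)
```

Note that setting `ds.v_mps.attrs = ...` is still fine: there `ds.v_mps` is only read, and the assignment targets the `attrs` of the returned `DataArray`.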

Assigning a one-dimensional `numpy` array to a dataset directly sets it as a coordinate, with a dimension of the same name.

In [None]:
ds = xr.Dataset()
ds["v_mps"] = np.linspace(0, 10)
ds.v_mps.attrs = {"long_name": "Velocity", "units": "m/s"}
ds["m_kg"] = np.linspace(3, 17)
ds.m_kg.attrs = {"long_name": "Mass", "units": "kg"}
ds["Ekin_J"] = 0.5 * ds.m_kg * ds.v_mps**2
ds.Ekin_J.attrs = {"long_name": "Energy", "units": "J"}
ds
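
To check how the dataset sorted the variables, you can list its coordinates and data variables. This sketch rebuilds the dataset under the hypothetical name `ds_check` and leaves out the attrs for brevity:

```python
import numpy as np
import xarray as xr

ds_check = xr.Dataset()
ds_check["v_mps"] = np.linspace(0, 10)
ds_check["m_kg"] = np.linspace(3, 17)
ds_check["Ekin_J"] = 0.5 * ds_check.m_kg * ds_check.v_mps**2

print(list(ds_check.coords))     # names of the coordinates
print(list(ds_check.data_vars))  # names of the data variables
print(dict(ds_check.sizes))      # length of each dimension
```

The two input arrays end up as coordinates, while the computed energy is stored as a data variable that shares their dimensions.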

In [None]:
ds.Ekin_J.plot()

## Using a function (clean coding)

Now let's clean up the code by making use of a function:

In [None]:
def kinetic_energy_J(m_kg, v_mps):
    return 0.5 * m_kg * v_mps**2


ds = xr.Dataset()
ds["v_mps"] = np.linspace(0, 10)
ds.v_mps.attrs = {"long_name": "Velocity", "units": "m/s"}
ds["m_kg"] = np.linspace(3, 17)
ds.m_kg.attrs = {"long_name": "Mass", "units": "kg"}
ds["Ekin_J"] = kinetic_energy_J(ds.m_kg, ds.v_mps)
ds.Ekin_J.attrs = {"long_name": "Energy", "units": "J"}
ds

## Using xr.apply_ufunc (clean coding and very powerful)

The function call can also be made using `xr.apply_ufunc`. Even though the advantage is not clear yet, it will be in the next notebook.

In [None]:
def kinetic_energy_J(m_kg, v_mps):
    return 0.5 * m_kg * v_mps**2


ds = xr.Dataset()
ds["v_mps"] = np.linspace(0, 10)
ds.v_mps.attrs = {"long_name": "Velocity", "units": "m/s"}
ds["m_kg"] = np.linspace(3, 17)
ds.m_kg.attrs = {"long_name": "Mass", "units": "kg"}
ds["Ekin_J"] = xr.apply_ufunc(kinetic_energy_J, ds.m_kg, ds.v_mps)
ds.Ekin_J.attrs = {"long_name": "Energy", "units": "J"}
ds
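
For a function this simple, the direct call and `xr.apply_ufunc` should give the same result. A quick sketch to confirm this, using standalone arrays with the hypothetical names `m`, `v`, `direct` and `via_ufunc`:

```python
import numpy as np
import xarray as xr


def kinetic_energy_J(m_kg, v_mps):
    return 0.5 * m_kg * v_mps**2


m = xr.DataArray(np.linspace(3, 17), dims="m_kg")
v = xr.DataArray(np.linspace(0, 10), dims="v_mps")

# Direct call: the function receives DataArrays
direct = kinetic_energy_J(m, v)

# apply_ufunc: xarray broadcasts the inputs and passes plain arrays
via_ufunc = xr.apply_ufunc(kinetic_energy_J, m, v)

print(direct.dims, via_ufunc.dims)
print(np.allclose(direct, via_ufunc))
```

The difference is that with `xr.apply_ufunc` the function itself only needs to work on `numpy` arrays, while `xarray` takes care of aligning and broadcasting the dimensions.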