# xarray for computations

First of all, [`xarray`](http://xarray.pydata.org/en/stable/index.html)
is a **wonderful** tool for creating - and interacting with -
*labeled multidimensional data*. I turn to `xarray` any time I have multidimensional data.

If you're familiar with [`pandas`](https://pandas.pydata.org/),
then you've probably grown to love its API, and you have more than likely
grown used to having *labeled data*. I bet that, like me, when you see unlabeled data,
like *raw `numpy` arrays*, your heart stops for a moment.

The purpose of this tutorial is to (hopefully) show you
that performing computations using `xarray` is easy.
Using [`xarray` data structures](http://xarray.pydata.org/en/stable/data-structures.html)
makes your *multidimensional computations* much more expressive,
easier to understand, and easier to develop!
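
As a small illustration of what "more expressive" can mean, here is a minimal sketch that computes the same quantity with plain `numpy` and with `xarray`. The toy array and the names `raw`, `labeled`, `z_mean_numpy`, and `z_mean_xarray` are made up for this example and are not used elsewhere in the tutorial.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 4 time steps of (x, y, z) readings.
raw = np.arange(12.0).reshape(4, 3)

# With plain numpy you have to remember that axis 0 is time
# and that column 2 holds the z readings.
z_mean_numpy = raw[:, 2].mean()

# With xarray the same intent is spelled out by name.
labeled = xr.DataArray(
    raw,
    dims=("time", "axis"),
    coords={"time": [0.0, 0.1, 0.2, 0.3], "axis": ["x", "y", "z"]},
)
z_mean_xarray = labeled.sel(axis="z").mean("time")

print(z_mean_numpy, float(z_mean_xarray))
```

Both lines compute the same number, but the second one still reads correctly if the array is later transposed or gains another dimension.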

## Defining the problem

Imagine we're working on a _modeling and simulation_ project
where we want to model the path of a projectile in 3D over time.
Simple enough, right? Once we get the basics working, we'll
add more complexity.

For example, later on we might want to investigate

* the effects of the projectile on the surrounding air temperature
* the effects of wind on the projectile's path
* etc.

## Setting up the simulation

This is a _modeling and simulation_ project.
So, our first step is to simulate some *real world* data.

So far, we know that we want to model a projectile's path over *time*,
so we know that we'll have to create some *time* data.
But what else will we have to create?

Imagine that our _projectile_ has some sensors on board.
For now, imagine that it has an *accelerometer* that informs
us of its *x-, y-, and z-acceleration* over time.
So we'll create some *acceleration* data as well.

We can add more later. Let's keep things simple for now.

In [None]:
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

def gen_linspace_values(start, stop, n):
    """Generate linearly spaced values, then add some noise."""
    values, step = np.linspace(start, stop, n, retstep=True)
    # Standard deviation of the noise: one fifth of the spacing.
    noise = step / 5
    noise_values = np.random.default_rng().normal(
        0.0, noise, size=values.shape
    )
    values = values + noise_values
    return values

def gen_centered_values(center, shape, noise):
    """Create normally distributed values about a center."""
    noise_values = np.random.default_rng().normal(0, noise, size=shape)
    return np.array(center) + noise_values

### Time

Time will be stored in an [`xarray.DataArray`](http://xarray.pydata.org/en/stable/data-structures.html#dataarray),
**not a `numpy.ndarray`**! This is because we can store so much more relevant info in an `xarray.DataArray`.

Let's also add some *noise* to the time measurements.

In [None]:
time = xr.DataArray(
    gen_linspace_values(0, 10, 100),
    dims="time",
    attrs={
        "units":"s",
        "long_name":"Time"
    },
)
time.plot()
plt.show()
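
To make the "more relevant info" concrete, here is a minimal sketch of reading that information back from a `DataArray`. It uses a small noise-free stand-in named `t` (a name chosen only for this example) rather than the `time` array above.

```python
import numpy as np
import xarray as xr

# A small stand-in for the tutorial's time array (no noise, 5 samples).
t = xr.DataArray(
    np.linspace(0, 10, 5),
    dims="time",
    attrs={"units": "s", "long_name": "Time"},
)

print(t.dims)            # the dimension names
print(t.attrs["units"])  # metadata travels with the data
print(t.values)          # the underlying numpy array is still available
```

The attributes are also what `.plot()` draws on when it labels the axes, which is one reason to fill them in early.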

### X-, Y-, and Z-axes

We are dealing with 3D coordinates.
It may seem trivial to define this data structure,
but believe me, it makes things much more expressive and clear.

In [None]:
axis = xr.DataArray(
    ["x", "y", "z"],
    dims="axis",
    attrs={
        "long_name":"Axis"
    }
)
axis

### Acceleration

This simulation takes place on the Earth, so we have to take *gravity* into account.
Let's say that gravity acts in the *z-direction*.
Let us assume, for now, that acceleration in both the *x-* and *y-* directions
is zero.

As mentioned earlier, acceleration is being recorded via onboard sensors,
so there will be some noise in the recordings. Let's go ahead and create some
acceleration data with some noise.

In [None]:
acceleration = xr.DataArray(
    gen_centered_values(
        [0.0, 0.0, -9.81],
        (len(time), len(axis)),
        0.5,
    ),
    coords=[time, axis],
)

acceleration.plot(hue="axis")
plt.show()
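
One quick sanity check on data like this is to average over time and look at each axis by label. The sketch below is self-contained and uses a seeded stand-in array named `acc` (an assumption of this example, with more samples than the tutorial's array so the averages settle down).

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)  # seeded only to make this sketch repeatable

# Stand-in for the acceleration array: 1000 noisy samples per axis.
acc = xr.DataArray(
    np.array([0.0, 0.0, -9.81]) + rng.normal(0.0, 0.5, size=(1000, 3)),
    dims=("time", "axis"),
    coords={"axis": ["x", "y", "z"]},
)

# Average over time, leaving one value per axis.
mean_acc = acc.mean("time")

# Select the vertical component by label, not by position.
z_mean = float(mean_acc.sel(axis="z"))
print(mean_acc.values, z_mean)
```

The x and y averages should sit near zero and the z average near -9.81, with the exact values depending on the random draw.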

In [None]:
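# Time step between consecutive samples. prepend=0 keeps dt the same
# length as time (the first step is measured from t = 0).
# see https://stackoverflow.com/a/59378218/8863304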
dt = xr.DataArray(
    np.diff(time, prepend=0),
    coords=[time],
    attrs={
        "units":"s",
        "long_name":"Change in Time"
    },
    name="delta_time"
)
dt.plot()
plt.show()
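
The `prepend` argument of `np.diff` is what keeps the step array aligned with the time array. A minimal sketch on a hand-written array `t` (hypothetical values, chosen so the differences are easy to read):

```python
import numpy as np

t = np.array([0.0, 0.1, 0.3, 0.6])

# Without prepend, the result has one element fewer than the input.
print(np.diff(t).shape)

# With prepend=0, a 0 is placed in front before differencing,
# so the output lines up one-to-one with t.
dt = np.diff(t, prepend=0)
print(dt)
```

Because the first step is measured from zero, the steps add up to the last time value, which is convenient when they are later used as integration weights.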

In [None]:
simulation = xr.Dataset(
    {
        "time":time,
        "axis":axis,
        "acceleration":acceleration,
    }
)
simulation

In [None]:
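initial_velocity = np.array([5.0, 40.0, 5.0])  # ordered as (x, y, z)

# Integrate acceleration over time only. Without a dim, cumsum would
# also accumulate across the x, y, z axis.
simulation["velocity"] = (
    (dt * simulation.acceleration).cumsum("time") + initial_velocity
)

simulation.velocity.plot(hue="axis")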
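
Integrating by cumulative sum only makes sense along `time`, so it helps to name that dimension when calling `cumsum`. The sketch below uses a hypothetical toy array (constant unit acceleration, unit time steps, and the made-up names `acc`, `step`, and `vel`) where the expected result is easy to check by eye.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: unit acceleration on two axes at 3 times.
acc = xr.DataArray(
    np.ones((3, 2)),
    dims=("time", "axis"),
    coords={"time": [1.0, 2.0, 3.0], "axis": ["x", "z"]},
)
step = xr.DataArray(
    np.ones(3), dims="time", coords={"time": [1.0, 2.0, 3.0]}
)

# Accumulate along time only: each axis is integrated independently.
vel = (step * acc).cumsum("time")
print(vel.values)

# With no dimension given, the accumulation is not restricted to time,
# so it is worth comparing the two results on a toy array like this.
print((step * acc).cumsum().values)
```

With the dimension named, both columns grow as 1, 2, 3, which is what a constant unit acceleration should give on each axis.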

In [None]:
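initial_position = np.array([0.0, 40.0, 0.0])  # ordered as (x, y, z)

# Integrate velocity over time only, for the same reason as above.
simulation["position"] = (
    (dt * simulation.velocity).cumsum("time") + initial_position
)

simulation.position.plot(hue="axis")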

In [None]:
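# Clamp negative coordinates to 0.0. In this setup that is expected to
# affect only z, once the projectile reaches the ground.
simulation["position"] = simulation.position.where(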
    simulation.position >= 0.0,
    0.0
)
simulation.position.plot(hue="axis")
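
The direction of `where` is easy to get backwards: it keeps values where the condition is true and substitutes the second argument everywhere else. A minimal sketch with a hypothetical 1-D `height` array:

```python
import xarray as xr

# Hypothetical heights of a projectile that dips below ground level.
height = xr.DataArray([3.0, 1.0, -0.5, -2.0], dims="time")

# Keep values where the condition holds, use 0.0 everywhere else.
clamped = height.where(height >= 0.0, 0.0)
print(clamped.values)
```

The two negative entries become 0.0 and the positive ones pass through unchanged.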

## Dimensions / Coordinates

In [None]:
time_values = np.linspace(0, 10)

x_values = np.arange(0, 20)
y_values = np.arange(0, 20)

time_da = xr.DataArray(
    time_values,
    dims="time",
    attrs={"units":"s"},
    name="time",
)

x_da = xr.DataArray(
    x_values,
    dims="x",
    attrs={"units":"m"},
    name="x",
)

y_da = xr.DataArray(
    y_values,
    dims="y",
    attrs={"units":"m"},
    name="y",
)

In [None]:
base_temperature = 20.0

# Start with a uniform temperature field of shape (time, x, y).
temperature_values = np.full(
    (len(time_values), len(x_values), len(y_values)),
    base_temperature,
)

temperature_da = xr.DataArray(
    temperature_values,
    coords=[time_da, x_da, y_da],
    name="temperature"
)

mask = (
    (temperature_da.time == 0)
    & (temperature_da.x <= 5)
    & (temperature_da.y <= 5)
)

initial_temperature = 1000

temperature_da = xr.where(
    mask,
    initial_temperature,
    temperature_da
)

temperature_da.plot.hist()
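
The mask in the cell above is built from three 1-D comparisons, one per coordinate, and `xarray` broadcasts them by dimension name. Here is a minimal sketch on a much smaller, hypothetical field (the names `field`, `mask`, and `heated` belong to this example only):

```python
import numpy as np
import xarray as xr

field = xr.DataArray(
    np.full((2, 3, 3), 20.0),
    dims=("time", "x", "y"),
    coords={"time": [0.0, 1.0], "x": [0, 1, 2], "y": [0, 1, 2]},
)

# Each comparison is 1-D; combining them broadcasts to (time, x, y).
mask = (field.time == 0) & (field.x <= 1) & (field.y <= 1)

# Use 1000.0 where the mask is true and the original field elsewhere.
heated = xr.where(mask, 1000.0, field)

print(mask.dims, int(mask.sum()))
print(heated.sel(time=0.0).values)
```

Only the 2 by 2 corner at the first time step is selected, so four cells change and the rest of the field keeps its base value.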

In [None]:
y_da.plot()