# Shapes in ML4Science

[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/holl-/ML4Science/blob/main/docs/Shapes.ipynb)
&nbsp; • &nbsp; [🌐 **ML4Science**](https://github.com/holl-/ML4Science)
&nbsp; • &nbsp; [📖 **Documentation**](https://holl-.github.io/ML4Science/)
&nbsp; • &nbsp; [🔗 **API**](https://holl-.github.io/ML4Science/ml4s)
&nbsp; • &nbsp; [**▶ Videos**]()
&nbsp; • &nbsp; [<img src="images/colab_logo_small.png" height=4>](https://colab.research.google.com/github/holl-/ML4Science/blob/main/docs/Examples.ipynb) [**Examples**](https://holl-.github.io/ML4Science/Examples.html)


```python
# !pip install ml4s
from ml4s import math
```

## Dimension Types

The largest difference between ML4Science and its backend libraries, such as PyTorch or JAX, lies in the tensor shapes.
When using [ML4Science's tensors](Tensors.html), every dimension must be assigned a name and a type.
The following dimension types are available:

* *batch* dimensions can be added to any code in order to parallelize it. This is their only function. The code should always give the exact same result as if it was called sequentially on all slices and the results were stacked along the batch dimension.
* *channel* dimensions list components of one object, such as a pixel, grid cell or particle. Typical examples include color channels or (x,y,z) components of a vector.
* *spatial* dimensions denote grid dimensions. Typically, elements are equally spaced along spatial dimensions, enabling operations such as convolutions or FFTs. The resolution of an image or lattice is usually expressed via spatial dimensions.
* *instance* dimensions enumerate objects that are not regularly ordered, such as moving particles or finite elements.
* *dual* dimensions represent function inputs and are typically used to denote the columns of matrices. See [the matrix documentation](Matrices.html) for more.
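The batch rule in the first bullet can be checked mechanically: one vectorized call must match calling the function on every slice and stacking the results. The following sketch uses plain NumPy instead of ML4Science, with axis 0 standing in for a batch dimension; `normalize` is a hypothetical example function.

```python
import numpy as np

def normalize(v):
    """Hypothetical example function: scale entries along the last axis to sum to one."""
    return v / v.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
data = rng.uniform(1.0, 2.0, size=(100, 3))  # axis 0 plays the role of a batch dimension

vectorized = normalize(data)                             # one parallel call
sequential = np.stack([normalize(row) for row in data])  # slice by slice, then stack

# The batch contract: both strategies give the same result.
assert np.allclose(vectorized, sequential)
```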

```python
from ml4s.math import batch, channel, spatial, instance, dual
BATCH = batch(examples=100)
BATCH
```

```
(examplesᵇ=100)
```

Here, we have created a [`Shape`](https://holl-.github.io/ML4Science/ml4s/math/index.html#ml4s.math.Shape) containing a single *batch* dimension with name `examples`.
Note the superscript `b`, which indicates a batch dimension. Analogously, the other superscripts are `c` for *channel*, `s` for *spatial*, `i` for *instance* and `d` for *dual*.
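To make the notation concrete, here is a minimal sketch of how such a string could be assembled from (name, type, size) triples. `format_shape` is a hypothetical helper, not part of ML4Science, and it ignores item names such as `red,green,blue` that appear later in this guide.

```python
# Superscript letter printed after each dimension name, keyed by dimension type.
SUPERSCRIPTS = {"batch": "ᵇ", "dual": "ᵈ", "instance": "ⁱ", "spatial": "ˢ", "channel": "ᶜ"}

def format_shape(dims):
    """Render (name, type, size) triples in the '(nameᵗ=size, ...)' style."""
    return "(" + ", ".join(f"{name}{SUPERSCRIPTS[kind]}={size}" for name, kind, size in dims) + ")"

print(format_shape([("examples", "batch", 100)]))                 # (examplesᵇ=100)
print(format_shape([("x", "spatial", 28), ("y", "spatial", 28)]))  # (xˢ=28, yˢ=28)
```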

We can now use this shape to construct tensors:

```python
x = math.zeros(BATCH)
x
```

```
(examplesᵇ=100) const 0.0
```

Let's create a tensor with this batch and multiple spatial dimensions!
We can pass multiple shapes to tensor constructors and can construct multiple dimensions of the same type in one call.

```python
x = math.ones(BATCH, spatial(x=28, y=28))
x
```

```
(examplesᵇ=100, xˢ=28, yˢ=28) const 1.0
```

We can retrieve the `Shape` of `x` using either `x.shape` or [`math.shape(x)`](https://holl-.github.io/ML4Science/ml4s/math/index.html#ml4s.math.shape), which also works on primitive types.

```python
x.shape
```

```
(examplesᵇ=100, xˢ=28, yˢ=28)
```

The dimension constructors, such as `math.spatial`, can also be used to select only the dimensions of their type from an object.

```python
spatial(x)
```

```
(xˢ=28, yˢ=28)
```

There are additional filter functions, such as the [`non_*`](https://holl-.github.io/ML4Science/ml4s/math/index.html#ml4s.math.non_batch) family (`non_batch`, `non_channel`, `non_spatial`, `non_instance`, `non_dual`), which each exclude one dimension type, as well as [`primal`](https://holl-.github.io/ML4Science/ml4s/math/index.html#ml4s.math.primal), which excludes *batch* and *dual* dimensions.

This way, we can easily construct a tensor without the batch dimension.

```python
from ml4s.math import non_batch, non_channel, non_spatial, non_instance, non_dual, primal, non_primal
math.random_uniform(non_batch(x))
```

```
(xˢ=28, yˢ=28) 0.522 ± 0.285 (3e-04...1e+00)
```

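Conceptually, these filters are predicates on the dimension type. The sketch below models a shape as a tuple of hypothetical `Dim` records and reimplements the three kinds of filters in plain Python. ML4Science's own `Shape` is richer than this, and the `neighbors` dual dimension is made up for illustration.

```python
from typing import NamedTuple

class Dim(NamedTuple):
    """Hypothetical stand-in for one named, typed dimension."""
    name: str
    dim_type: str  # 'batch', 'dual', 'instance', 'spatial' or 'channel'
    size: int

def only_type(dims, dim_type):
    """Keep dimensions of one type (the role of spatial(x), channel(x), ...)."""
    return tuple(d for d in dims if d.dim_type == dim_type)

def non_type(dims, dim_type):
    """Drop dimensions of one type (the role of non_batch(x), non_spatial(x), ...)."""
    return tuple(d for d in dims if d.dim_type != dim_type)

def primal_dims(dims):
    """Drop batch and dual dimensions (the role of primal(x))."""
    return tuple(d for d in dims if d.dim_type not in ("batch", "dual"))

shape = (
    Dim("examples", "batch", 100),
    Dim("x", "spatial", 28),
    Dim("y", "spatial", 28),
    Dim("vector", "channel", 2),
    Dim("neighbors", "dual", 4),
)
print([d.name for d in only_type(shape, "spatial")])  # ['x', 'y']
print([d.name for d in non_type(shape, "batch")])     # ['x', 'y', 'vector', 'neighbors']
print([d.name for d in primal_dims(shape)])           # ['x', 'y', 'vector']
```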
## Automatic Reshaping

One major advantage of naming all dimensions is that reshaping operations can be performed under the hood.
Assume we have a tensor with dimensions `a,b` and another with the reverse dimension order.

```python
t1 = math.random_normal(channel(a=2, b=3))
t2 = math.random_normal(channel(b=3, a=2))
```

When combining them in a tensor operation, ML4Science automatically transposes the tensors to match.

```python
t1 + t2
```

```
(-1.495, 1.671, -0.905, 0.743, -1.208, -0.851) (aᶜ=2, bᶜ=3)
```

The resulting dimension order is generally undefined.
However, this is of no consequence, because dimensions are never referenced by their index in the shape.
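For comparison, this is what the automatic transposition replaces in plain NumPy, where axes have no names and the labels must be tracked by hand. `align` is a hypothetical helper; ML4Science does not necessarily implement alignment this way.

```python
import numpy as np

def align(array, names, target_names):
    """Transpose `array`, whose axes are labelled by `names`, into `target_names` order."""
    return np.transpose(array, [names.index(n) for n in target_names])

rng = np.random.default_rng(0)
t1 = rng.normal(size=(2, 3))  # axes ('a', 'b')
t2 = rng.normal(size=(3, 2))  # axes ('b', 'a')

result = t1 + align(t2, ("b", "a"), ("a", "b"))
print(result.shape)  # (2, 3)
```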

When one of the tensors is missing a dimension, it will be added automatically.
In these cases, you can think of the value being constant along the missing dimension (like with [singleton dimensions in NumPy](https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html)).

```python
t1 = math.random_normal(channel(a=2))
t2 = math.random_normal(channel(b=3))
t1 + t2
```

```
(2.287, 0.391, 0.760, 2.096, 0.200, 0.569) (aᶜ=2, bᶜ=3)
```

Here, we created a 2D tensor from two 1D tensors. No manual reshaping required.
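In plain NumPy, the same result requires explicit singleton axes before broadcasting can take over. The hypothetical `expand_to` helper below appends a size-1 axis for each missing name and then orders the axes by name. This is roughly the bookkeeping that named dimensions take off your hands.

```python
import numpy as np

def expand_to(array, names, target_names):
    """Add size-1 axes for names missing from `names`, then order axes as `target_names`."""
    for n in target_names:
        if n not in names:
            array = array[..., np.newaxis]  # append a singleton axis for the missing name
            names = names + (n,)
    return np.transpose(array, [names.index(n) for n in target_names])

rng = np.random.default_rng(0)
t1 = rng.normal(size=2)  # axes ('a',)
t2 = rng.normal(size=3)  # axes ('b',)

target = ("a", "b")
result = expand_to(t1, ("a",), target) + expand_to(t2, ("b",), target)
print(result.shape)  # (2, 3)
```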

## Selecting and Combining Dimensions

All tensor creation functions accept a variable number of `Shape` objects as input and concatenate the dimensions internally.
This can also be done explicitly using [`concat_shapes()`](ml4s/math/#ml4s.math.concat_shapes).

```python
b = batch(examples=16)
s = spatial(x=28, y=28)
c = channel(channels='red,green,blue')
math.concat_shapes(s, c, b)
```

```
(xˢ=28, yˢ=28, channelsᶜ=red,green,blue, examplesᵇ=16)
```

This preserves the dimension order and fails if multiple dimensions with the same name are given.
Alternatively, [`merge_shapes()`](ml4s/math/#ml4s.math.merge_shapes) can be used, which groups dimensions by type and allows the same dimension to appear in multiple inputs.

```python
s = math.merge_shapes(s, c, b)
s
```

```
(examplesᵇ=16, xˢ=28, yˢ=28, channelsᶜ=red,green,blue)
```

This can also be done using the `&` operator.
Notice how, in both cases, the *batch* dimension is moved to the first place even though `b` is passed last.

```python
spatial(x=28, y=28) & c & b
```

```
(examplesᵇ=16, xˢ=28, yˢ=28, channelsᶜ=red,green,blue)
```
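The difference between the two functions can be summarized in a small pure-Python sketch. `concat_dims` and `merge_dims` are hypothetical stand-ins for `concat_shapes()` and `merge_shapes()`, not their actual implementation. The type order batch, dual, instance, spatial, channel is an assumption that agrees with the outputs above.

```python
from typing import NamedTuple

class Dim(NamedTuple):
    """Hypothetical stand-in for one named, typed dimension."""
    name: str
    dim_type: str
    size: int

# Assumed grouping order used when merging (batch first, as in the outputs above).
TYPE_ORDER = ("batch", "dual", "instance", "spatial", "channel")

def concat_dims(*shapes):
    """Keep the given order and refuse duplicate names."""
    dims = [d for shape in shapes for d in shape]
    names = [d.name for d in dims]
    if len(names) != len(set(names)):
        raise ValueError(f"duplicate dimension names in {names}")
    return tuple(dims)

def merge_dims(*shapes):
    """Group by type; a dimension may appear in several inputs if the definitions agree."""
    merged = {}
    for d in (d for shape in shapes for d in shape):
        if d.name in merged and merged[d.name] != d:
            raise ValueError(f"conflicting definitions of {d.name!r}")
        merged.setdefault(d.name, d)
    # sorted() is stable, so dimensions of the same type keep their first-seen order.
    return tuple(sorted(merged.values(), key=lambda d: TYPE_ORDER.index(d.dim_type)))

b = (Dim("examples", "batch", 16),)
s = (Dim("x", "spatial", 28), Dim("y", "spatial", 28))
c = (Dim("channels", "channel", 3),)
print([d.name for d in concat_dims(s, c, b)])    # ['x', 'y', 'channels', 'examples']
print([d.name for d in merge_dims(s, c, b, s)])  # ['examples', 'x', 'y', 'channels']
```

Passing `s` twice is fine for `merge_dims`, whereas `concat_dims(s, c, b, s)` raises a `ValueError`.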

Filtering shapes for specific dimensions can be done using `Shape[name]`, [`Shape.only()`](ml4s/math/#ml4s.math.Shape.only) and [`Shape.without()`](ml4s/math/#ml4s.math.Shape.without).

```python
s['x']
```

```
(xˢ=28)
```

```python
s.only('x,y')
```

```
(xˢ=28, yˢ=28)
```

```python
s.without('x,y')
```

```
(examplesᵇ=16, channelsᶜ=red,green,blue)
```

```python
s.only(spatial)
```

```
(xˢ=28, yˢ=28)
```

Selecting only one type of dimension can also be done using the corresponding dimension constructor or `Shape` property.

```python
s.spatial
```

```
(xˢ=28, yˢ=28)
```

```python
spatial(s)
```

```
(xˢ=28, yˢ=28)
```

```python
s.non_spatial
```

```
(examplesᵇ=16, channelsᶜ=red,green,blue)
```

```python
non_spatial(s)
```

```
(examplesᵇ=16, channelsᶜ=red,green,blue)
```

## Properties of Shapes

[`Shape`](ml4s/math/#ml4s.math.Shape) objects are *immutable*. Do not attempt to change any property of a `Shape` directly.
The sizes of all dimensions can be retrieved as a `tuple` using `Shape.sizes`. The result is equal to what NumPy or any of the other backends would return for `tensor.shape`.

```python
s.sizes
```

```
(16, 28, 28, 3)
```

Likewise, the names of the dimensions can be read using `Shape.names`.

```python
s.names
```

```
('examples', 'x', 'y', 'channels')
```

For single-dimension shapes, the properties `name` and `size` return the value directly.
To get the size of a specific dimension, you can either select the dimension by name and read its `size`, or iterate over the shape:

```python
s['x'].size
```

```
28
```

```python
for dim in s:
    print(dim.name, dim.size, dim.dim_type.__name__)
```

```
examples 16 batch
x 28 spatial
y 28 spatial
channels 3 channel
```


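The number of dimensions and the total number of elements can be retrieved using `len(Shape)` and `Shape.volume`, respectively. Combined with a filter, `volume` counts the elements spanned by a subset of dimensions, such as all non-batch dimensions.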

```python
len(s)
```

```
4
```

```python
s.non_batch.volume
```

```
2352
```
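Since `sizes` behaves like a NumPy shape tuple, the volume is simply the product of the sizes. The plain-Python sketch below recomputes the numbers for `s` by hand; the batch dimension is identified by a hypothetical `batch_names` set.

```python
from math import prod

import numpy as np

names = ("examples", "x", "y", "channels")
sizes = (16, 28, 28, 3)
batch_names = {"examples"}  # names of the batch dimensions

volume = prod(sizes)  # total number of elements
non_batch_volume = prod(size for name, size in zip(names, sizes) if name not in batch_names)

print(len(sizes), volume, non_batch_volume)  # 4 37632 2352
# A NumPy array with these sizes holds the same number of elements.
assert np.zeros(sizes).size == volume
```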

## Changing Dimensions

The names and types of dimensions can be changed, but this always returns a new object and leaves the original unaltered.
Assume we want to rename the `channels` dimension from above to `color`.

```python
math.rename_dims(s, 'channels', 'color')
```

```
(examplesᵇ=16, xˢ=28, yˢ=28, colorᶜ=red,green,blue)
```

The same can be done for tensors.

```python
math.rename_dims(math.zeros(s), 'channels', 'color')
```

```
(examplesᵇ=16, xˢ=28, yˢ=28, colorᶜ=red,green,blue) const 0.0
```

To change the type, you may use `replace_dims()`, which is an alias for `rename_dims()` but clarifies the intended use.

```python
math.replace_dims(s, 'channels', batch('channels'))
```

```
(examplesᵇ=16, xˢ=28, yˢ=28, channelsᵇ=3)
```
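Renaming and retyping both produce new objects because shapes are immutable. A minimal pure-Python model of this behavior uses `NamedTuple`, whose `_replace()` returns a modified copy. `Dim` and `rename_dim` are hypothetical and only mirror the semantics described above.

```python
from typing import NamedTuple

class Dim(NamedTuple):
    """Hypothetical stand-in for one named, typed dimension (immutable)."""
    name: str
    dim_type: str
    size: int

def rename_dim(dims, old, new_name=None, new_type=None):
    """Return a new tuple in which dimension `old` gets a new name and/or type."""
    return tuple(
        d._replace(name=new_name or d.name, dim_type=new_type or d.dim_type) if d.name == old else d
        for d in dims
    )

s = (Dim("examples", "batch", 16), Dim("channels", "channel", 3))
renamed = rename_dim(s, "channels", new_name="color")
retyped = rename_dim(s, "channels", new_type="batch")

print(s[1])        # Dim(name='channels', dim_type='channel', size=3)  (original unchanged)
print(renamed[1])  # Dim(name='color', dim_type='channel', size=3)
print(retyped[1])  # Dim(name='channels', dim_type='batch', size=3)
```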

## Response to Dimension Types by Function

The dimension type indicates what role a dimension plays.
Many `math` functions behave differently depending on the types of the given dimensions.

Vector operations like [`vec_length`](ml4s/math#ml4s.math.vec_length) or [`rotate_vector`](ml4s/math#ml4s.math.rotate_vector) require the input to have a *channel* dimension to list the vector components.

Spatial operations like [`fft`](ml4s/math#ml4s.math.fft) or [`convolve`](ml4s/math#ml4s.math.convolve),
as well as finite differences
[`spatial_gradient`](ml4s/math#ml4s.math.spatial_gradient), [`laplace`](ml4s/math#ml4s.math.laplace),
[`fourier_laplace`](ml4s/math#ml4s.math.fourier_laplace), [`fourier_poisson`](ml4s/math#ml4s.math.fourier_poisson),
and resampling operations like
[`downsample2x`](ml4s/math#ml4s.math.downsample2x),
[`upsample2x`](ml4s/math#ml4s.math.upsample2x),
[`grid_sample`](ml4s/math#ml4s.math.grid_sample) act only on *spatial* dimensions.
Their dimensionality (1D/2D/3D/etc.) [depends on the number of spatial dimensions](N_Dimensional.html) of the input.
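To illustrate why the number of spatial dimensions determines the dimensionality, here is a sketch of a finite-difference Laplacian that loops over every axis of a NumPy array. It assumes periodic boundaries and unit grid spacing by default, and it treats all axes as spatial. `periodic_laplace` is a hypothetical function, not `math.laplace`, which acts only on the spatial dimensions of its input.

```python
import numpy as np

def periodic_laplace(grid, spacing=1.0):
    """Second-order finite-difference Laplacian summed over every axis, periodic boundaries."""
    result = np.zeros_like(grid, dtype=float)
    for axis in range(grid.ndim):  # one stencil contribution per (spatial) axis
        result += np.roll(grid, 1, axis) + np.roll(grid, -1, axis) - 2 * grid
    return result / spacing**2

# The same code handles 1D, 2D and 3D grids.
for shape in [(8,), (8, 8), (8, 8, 8)]:
    delta = np.zeros(shape)
    delta[(4,) * len(shape)] = 1.0
    lap = periodic_laplace(delta)
    print(len(shape), lap[(4,) * len(shape)])  # prints 1 -2.0, then 2 -4.0, then 3 -6.0
```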

Almost all functions ignore dual dimensions, treating them like batch dimensions. The main exception is [matrix multiplication](Matrices.html), `matrix @ vector`, which reduces the dual dimensions of the matrix against the corresponding primal dimensions of the vector.
Dual dimensions are created by certain operations like [`pairwise_distances`](ml4s/math#ml4s.math.pairwise_distances).
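In plain NumPy, the row and column roles of a pairwise matrix exist only by convention. The sketch below builds a pairwise distance matrix for four made-up points and contracts its column axis (the role a dual dimension plays) against a per-point vector. It illustrates the idea only and does not call `pairwise_distances`.

```python
import numpy as np

# Made-up data: 4 points along an instance-like axis, 2 vector components (channel-like axis).
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])

# Rows enumerate the points; columns play the role of the dual dimension (one per input point).
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
distances = np.sqrt((diff**2).sum(axis=-1))  # shape (4, 4)

# matrix @ vector contracts the column (dual) axis against the vector's point axis.
values = np.array([1.0, 2.0, 3.0, 4.0])  # one value per point
weighted = distances @ values            # shape (4,)
print(weighted[0])  # 28.0
```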

All functions ignore *batch* dimensions by default.
This also applies to functions that reduce all dimensions in NumPy or PyTorch, such as
[`sum`](ml4s/math#ml4s.math.sum), [`mean`](ml4s/math#ml4s.math.mean), [`std`](ml4s/math#ml4s.math.std),
[`any`](ml4s/math#ml4s.math.any), [`all`](ml4s/math#ml4s.math.all),
[`max`](ml4s/math#ml4s.math.max), [`min`](ml4s/math#ml4s.math.min) and many more, as well as loss functions like the [`l2_loss`](ml4s/math#ml4s.math.l2_loss).
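In NumPy terms, a batch-preserving reduction sums over every axis except the batch axes. The hypothetical `sum_non_batch` below mimics that default behavior; axis 0 plays the role of the `examples` dimension.

```python
import numpy as np

def sum_non_batch(x, batch_axes=(0,)):
    """Sum over every axis except the batch axes."""
    axes = tuple(a for a in range(x.ndim) if a not in batch_axes)
    return x.sum(axis=axes)

x = np.ones((100, 28, 28))      # axis 0 plays the role of the 'examples' batch dimension
per_example = sum_non_batch(x)  # one value per example
print(per_example.shape)        # (100,)
print(per_example[0])           # 784.0
```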

The elementary functions
[`gather`](ml4s/math#ml4s.math.gather) and
[`scatter`](ml4s/math#ml4s.math.scatter) act on *spatial* or *instance* dimensions of the grid.
The indices are listed along *instance* dimensions and the index components along a single *channel* dimension.
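This index layout can be mimicked with NumPy advanced indexing: rows of `indices` enumerate the points (instance dimension) and columns hold the index components (channel dimension). This is a sketch of the concept, not of `math.gather` or `math.scatter`; the scatter part uses `np.add.at` so that repeated indices would accumulate.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)  # values on a 3x4 grid with two spatial dimensions

# One row per point (instance dimension), one column per index component (channel dimension).
indices = np.array([[0, 1],
                    [2, 3],
                    [1, 0]])

# gather: read one grid value per point.
gathered = grid[tuple(indices.T)]
print(gathered.tolist())  # [1, 11, 4]

# scatter: add one count per point into an empty grid of the same shape.
counts = np.zeros_like(grid)
np.add.at(counts, tuple(indices.T), 1)
print(counts.sum())  # 3
```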

## Further Reading

Dimension names play an important role in [slicing tensors](Introduction.html#Slicing).
To make your code more readable, you can also name slices along dimensions.

The number of spatial dimensions dictates what dimensionality (1D, 2D, 3D) your code works in.
You can therefore write code that [works in 1D, 2D, 3D and beyond](N_Dimensional.html).

Dual dimensions are used to represent [columns of matrices](Matrices.html#Primal-and-Dual-Dimensions).

Stacking tensors with the same dimension names but different sizes results in [non-uniform shapes](Non_Uniform.html).
