# ML4Science Quickstart

[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/holl-/ML4Science/blob/main/docs/Introduction.ipynb) 
&nbsp; • &nbsp; [🌐 **ML4Science**](https://github.com/holl-/ML4Science)
&nbsp; • &nbsp; [📖 **Documentation**](https://holl-.github.io/ML4Science/)
&nbsp; • &nbsp; [🔗 **API**](https://holl-.github.io/ML4Science/ml4s)
&nbsp; • &nbsp; [**▶ Videos**]()
&nbsp; • &nbsp; [<img src="images/colab_logo_small.png" height=4>](https://colab.research.google.com/github/holl-/ML4Science/blob/main/docs/Examples.ipynb) [**Examples**](https://holl-.github.io/ML4Science/Examples.html)



## Installation

Install ML4Science with [pip](https://pypi.org/project/pip/) on [Python 3.6](https://www.python.org/downloads/) and later:

In [2]:
%%capture
!pip install ml4s

Install [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/install) or [Jax](https://github.com/google/jax#installation) to enable machine learning capabilities and GPU execution.
See the [detailed installation instructions](https://holl-.github.io/ML4Science/Installation_Instructions.html).

In [3]:
from ml4s import math

## Usage without ML4Science's Tensors

You can call many functions on native tensors directly.
ML4Science will dispatch the call to the corresponding library and return the result as another native tensor.

In [4]:
math.sin(1.)

0.841471

In [5]:
from jax import numpy as jnp
math.sin(jnp.asarray([1.]))

DeviceArray([0.841471], dtype=float32)

In [6]:
import torch
math.sin(torch.tensor([1.]))

tensor([0.8415], device='cuda:0')

In [7]:
import tensorflow as tf
math.sin(tf.constant([1.]))

<tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.84147096], dtype=float32)>

In [8]:
import numpy as np
math.sin(np.asarray([1.]))

array([0.841471], dtype=float32)
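
The dispatch idea can be illustrated without ML4Science. The following is a minimal sketch, not the library's implementation: the function `sin` below is a hypothetical stand-in that inspects the input type and forwards the call, so the result is of the same kind as the input.

```python
import math as pymath

import numpy as np


def sin(x):
    """Hypothetical dispatcher: forward the call to whichever library owns `x`."""
    if isinstance(x, np.ndarray):
        return np.sin(x)        # NumPy array in, NumPy array out
    if isinstance(x, (int, float)):
        return pymath.sin(x)    # Python number in, Python float out
    raise TypeError(f"No backend registered for {type(x).__name__}")


scalar_result = sin(1.0)
array_result = sin(np.asarray([1.0], dtype=np.float32))
print(type(scalar_result).__name__, type(array_result).__name__, array_result.dtype)
```

A real dispatcher would also have to recognize PyTorch, TensorFlow and Jax tensors; this sketch only covers Python numbers and NumPy arrays.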

## ML4Science's `Tensor`

For more advanced operations, we recommend using [ML4Science's tensors](Tensors.html).
While ML4Science includes a [unified low-level API](https://holl-.github.io/ML4Science/ml4s/backend/#ml4s.backend.Backend) that behaves much like NumPy, using it correctly (so that the code is actually compatible with all libraries) is difficult.
Instead, ML4Science provides a higher-level API that makes writing unified code easy. It consists of the [`Tensor` class](https://holl-.github.io/ML4Science/ml4s/math/#ml4s.math.Tensor), the [`math`](https://holl-.github.io/ML4Science/ml4s/math) functions and other odds and ends.
Tensors can be created by wrapping an existing backend-specific tensor or array:


In [9]:
torch_tensor = torch.tensor([1, 2, 3])
math.tensor(torch_tensor)

(1, 2, 3) int64

In [10]:
math.wrap(torch_tensor)

(1, 2, 3) int64

The difference between `tensor` and `wrap` is that `wrap` keeps the original data you pass in, while `tensor` converts the data to the default backend, which can be set using [`math.use()`](https://holl-.github.io/ML4Science/ml4s/math/#ml4s.math.use).

In [11]:
math.use('jax')
math.wrap(torch_tensor).default_backend

torch

In [12]:
math.tensor(torch_tensor).default_backend

jax

The last `tensor` call converted the PyTorch tensor to a Jax `DeviceArray` using a no-copy routine from [`dlpack`](https://github.com/dmlc/dlpack) under the hood.
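
To see what "no-copy" means for a DLPack exchange, here is a small sketch that uses only NumPy (it assumes NumPy 1.22 or later, where `np.from_dlpack` is available). It does not reproduce the PyTorch-to-Jax conversion above; it only shows that the importing side receives a view of the same memory rather than a copy.

```python
import numpy as np

source = np.arange(3, dtype=np.float32)

# np.from_dlpack accepts any object that implements the __dlpack__ protocol.
view = np.from_dlpack(source)

# Both arrays refer to the same underlying buffer.
shares = np.shares_memory(source, view)
print(shares)
```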

## Dimension Types

For tensors with more than one dimension, you have to specify a name and type for each.
Possible types are *batch* for parallelizing code, *channel* for listing features (color channels or x/y/z components) and *spatial* for equally-spaced sample points (width/height of an image, 1D time series, etc.).
For an exhaustive list, see [here](Shapes.html).

In [13]:
from ml4s.math import batch, spatial, channel
torch_tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])
math.wrap(torch_tensor, batch('dim1'), channel('dim2'))

(1, 2, 3); (4, 5, 6) (dim1ᵇ=2, dim2ᶜ=3) int64

The superscripts `b` and `c` denote the dimension type.
When creating a new tensor from scratch, we also need to specify the size along each dimension:

In [14]:
math.random_uniform(batch(dim1=2), channel(dim2=3))

(0.072, 0.020, 0.077); (0.879, 0.165, 0.102) (dim1ᵇ=2, dim2ᶜ=3)

When passing tensors to a neural network, the tensors are transposed to match the preferred dimension order (`BHWC` for TensorFlow/Jax, `BCHW` for PyTorch).
For example, we can pass any number of batch and channel dimensions to an MLP.

In [15]:
from ml4s import nn
mlp = nn.mlp(in_channels=6, out_channels=3, layers=[64, 64])
data = math.random_normal(batch(b1=4, b2=10), channel(c1=2, c2=3))
math.native_call(mlp, data)

(b1ᵇ=4, b2ᵇ=10, vectorᶜ=3) -1.04e-04 ± 3.0e-01 (-1e+00...9e-01)

The network here is a standard fully-connected network module with two hidden layers of 64 neurons each.
The native tensor that is passed to the network has shape (40, 6) as all batch dimensions are compressed into the first and all channel dimensions into the last dimension.
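
The packing described here can be sketched in plain NumPy. This is an illustration under assumptions, not ML4Science's code: the axis order `(b1, b2, c1, c2)` and the random matrix `weights`, which stands in for the network, are chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(4, 10, 2, 3))            # assumed axis order: (b1, b2, c1, c2)

batch_shape, channel_shape = data.shape[:2], data.shape[2:]

# Merge all batch axes into the first axis and all channel axes into the last one.
native_in = data.reshape(int(np.prod(batch_shape)), int(np.prod(channel_shape)))

weights = rng.normal(size=(6, 3))                # stand-in for a network with 6 inputs, 3 outputs
native_out = native_in @ weights

# Restore the original batch axes on the output.
result = native_out.reshape(*batch_shape, -1)
print(native_in.shape, native_out.shape, result.shape)
```

The network only ever sees a two-dimensional array, and the batch axes are restored on the output afterwards.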

For a network acting on spatial data, we would add *spatial* dimensions.

In [16]:
net = nn.u_net(in_channels=6, out_channels=3, in_spatial=2)
data = math.random_normal(batch(b1=4, b2=10), channel(c1=2, c2=3), spatial(x=28, y=28))
math.native_call(net, data)

(b1ᵇ=4, b2ᵇ=10, xˢ=28, yˢ=28, vectorᶜ=3) -0.004 ± 0.322 (-2e+00...2e+00)

In this example, we ran a 2D [U-Net](https://en.wikipedia.org/wiki/U-Net).
For a 1D or 3D variant, we would pass `in_spatial=1` or `3`, respectively, and add the corresponding number of spatial dimensions to `data`.
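
The two native layouts mentioned earlier (`BCHW` and `BHWC`) differ only in where the packed channel axis sits. The NumPy sketch below assumes the data is stored as `(b1, b2, c1, c2, x, y)`; that storage order is an assumption made for the example and says nothing about how ML4Science lays out memory internally.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(4, 10, 2, 3, 28, 28))    # assumed axis order: (b1, b2, c1, c2, x, y)

# Merge the batch axes and the channel axes: this is already the BCHW layout.
bchw = data.reshape(4 * 10, 2 * 3, 28, 28)

# Move the channel axis to the end to obtain the BHWC layout.
bhwc = np.moveaxis(bchw, 1, -1)

print(bchw.shape, bhwc.shape)
```

Both arrays hold the same values; only the axis a convolution treats as "channels" has moved.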

## Slicing

Slicing in ML4Science is done by dimension names.
Say we have a set of images:

In [17]:
images = math.random_uniform(batch(set=4), spatial(x=28, y=28), channel(channels=3))
images

(setᵇ=4, xˢ=28, yˢ=28, channelsᶜ=3) 0.502 ± 0.288 (1e-04...1e+00)

The red, green and blue components are stored inside the `channels` dimension.
To get just the red component of the last entry in the set, we can write:

In [18]:
images.set[-1].channels[0]

(xˢ=28, yˢ=28) 0.501 ± 0.291 (2e-03...1e+00)

Or we can slice using a dictionary:

In [19]:
images[{'set': -1, 'channels': 0}]

(xˢ=28, yˢ=28) 0.501 ± 0.291 (2e-03...1e+00)

Slicing the NumPy way, i.e. `images[-1, :, :, 0]`, is not supported because the order of dimensions generally depends on which backend you use.
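
Why names are safer than positions can be shown with a small NumPy sketch. The helper `slice_by_name` is hypothetical and not part of ML4Science; it translates a `{name: index}` dictionary into a positional index for whatever axis order the array happens to have.

```python
import numpy as np


def slice_by_name(array, dim_names, selection):
    """Index `array` with {dimension name: index or slice}, whatever the axis order."""
    index = tuple(selection.get(name, slice(None)) for name in dim_names)
    return array[index]


rng = np.random.default_rng(0)
bhwc = rng.uniform(size=(4, 28, 28, 3))          # axes: set, x, y, channels
bchw = np.moveaxis(bhwc, -1, 1)                  # axes: set, channels, x, y

selection = {'set': -1, 'channels': 0}
red_a = slice_by_name(bhwc, ('set', 'x', 'y', 'channels'), selection)
red_b = slice_by_name(bchw, ('set', 'channels', 'x', 'y'), selection)

same = np.array_equal(red_a, red_b)
print(red_a.shape, same)
```

The same `selection` dictionary works for both layouts, whereas a positional index such as `[-1, :, :, 0]` would pick different data from `bchw`.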

To make your code easier to read, you may name slices along dimensions as well.
In the above example, we might name the red, green and blue channels explicitly:

In [20]:
images = math.random_uniform(batch(set=4), spatial(x=28, y=28), channel(channels='red,green,blue'))
images.set[-1].channels['red']
images[{'set': -1, 'channels': 'red'}]

(xˢ=28, yˢ=28) 0.500 ± 0.276 (2e-03...1e+00)

To select multiple items by index, use the syntax `tensor.<dim>[start:end:step]` where `start >= 0`, `end` and `step > 0` are integers.

In [21]:
images.x[1:3]

(setᵇ=4, xˢ=2, yˢ=28, channelsᶜ=red,green,blue) 0.509 ± 0.282 (4e-03...1e+00)

To select multiple named slices, pass a tuple, list, or comma-separated string.

In [22]:
images.channels['red,blue']

(setᵇ=4, xˢ=28, yˢ=28, channelsᶜ=red,blue) 0.503 ± 0.288 (2e-04...1e+00)
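
For comparison, here is roughly what range and multi-item selection amount to on a plain NumPy array. The helper `select_items` and the tuple `item_names` are hypothetical names introduced for this sketch; ML4Science stores the item names in the tensor's shape instead.

```python
import numpy as np

item_names = ('red', 'green', 'blue')            # names of the entries along the channel axis


def select_items(array, axis, names):
    """Keep the named entries along `axis`; `names` is a sequence or a comma-separated string."""
    if isinstance(names, str):
        names = [name.strip() for name in names.split(',')]
    indices = [item_names.index(name) for name in names]
    return np.take(array, indices, axis=axis)


images = np.random.default_rng(0).uniform(size=(4, 28, 28, 3))   # axes: set, x, y, channels

x_range = images[:, 1:3]                          # positional counterpart of images.x[1:3]
red_blue = select_items(images, axis=-1, names='red,blue')
print(x_range.shape, red_blue.shape)
```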

You can iterate along a dimension or unstack a tensor along a dimension.

In [26]:
for image in images.set:
    print(math.mean(image))

0.5032147
0.50452465
0.49833444
0.5087652


In [24]:
list(images.set)

[(xˢ=28, yˢ=28, channelsᶜ=red,green,blue) 0.503 ± 0.292 (4e-04...1e+00),
 (xˢ=28, yˢ=28, channelsᶜ=red,green,blue) 0.505 ± 0.287 (6e-04...1e+00),
 (xˢ=28, yˢ=28, channelsᶜ=red,green,blue) 0.498 ± 0.290 (1e-04...1e+00),
 (xˢ=28, yˢ=28, channelsᶜ=red,green,blue) 0.509 ± 0.284 (2e-04...1e+00)]

You can even convert named slices to a `dict` or use them as keyword arguments.

In [29]:
dict(images.set)

{0: (xˢ=28, yˢ=28, channelsᶜ=red,green,blue) 0.503 ± 0.292 (4e-04...1e+00),
 1: (xˢ=28, yˢ=28, channelsᶜ=red,green,blue) 0.505 ± 0.287 (6e-04...1e+00),
 2: (xˢ=28, yˢ=28, channelsᶜ=red,green,blue) 0.498 ± 0.290 (1e-04...1e+00),
 3: (xˢ=28, yˢ=28, channelsᶜ=red,green,blue) 0.509 ± 0.284 (2e-04...1e+00)}

In [31]:
def color_transform(red, green, blue):
    print(f"Red: {math.mean(red)}, Green: {math.mean(green)}, Blue: {math.mean(blue)}")

color_transform(**dict(images.channels))

Red: (0.485, 0.500, 0.503, 0.500) along setᵇ, Green: (0.500, 0.507, 0.503, 0.511) along setᵇ, Blue: (0.525, 0.507, 0.489, 0.515) along setᵇ
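
The keyword-argument pattern relies only on a mapping from item names to slices. As a rough NumPy analogue (the helpers `unstack_named` and `channel_means` are hypothetical and exist only for this sketch), unstacking along the channel axis yields a `dict` that can be unpacked with `**`:

```python
import numpy as np


def unstack_named(array, axis, names):
    """Split `array` along `axis` into a dict that maps each item name to its slice."""
    return {name: np.take(array, i, axis=axis) for i, name in enumerate(names)}


def channel_means(red, green, blue):
    """Average each color over the spatial axes, keeping the set axis."""
    return {'red': red.mean(axis=(1, 2)),
            'green': green.mean(axis=(1, 2)),
            'blue': blue.mean(axis=(1, 2))}


images = np.random.default_rng(0).uniform(size=(4, 28, 28, 3))   # axes: set, x, y, channels
channels = unstack_named(images, axis=-1, names=('red', 'green', 'blue'))
means = channel_means(**channels)
print({name: value.shape for name, value in means.items()})
```

As in the output above, each color ends up with one mean per entry of the set.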


## Further Reading

Learn more about the [dimension types](Shapes.html) and how to efficiently [operate on tensors](Tensors.html).

ML4Science unifies [data types](Data_Types.html) as well and lets you set the floating point precision globally or by context.

While the dimensionality of neural networks must be specified during network creation, this is not the case for math functions.
These [automatically adapt to the number of spatial dimensions of the data that is passed in](N_Dimensional.html).
