# What's new in scipp

This page highlights feature additions and discusses major changes from recent releases.
For a full list of changes see the [Release Notes](https://scipp.github.io/about/release-notes.html).

In [None]:
import numpy as np
import scipp as sc

## General

### Bound method equivalents to many free functions

<div class="alert alert-info">

**New in 0.8**

Many functions that were previously available only as free functions can now also be used as methods of variables and data arrays.
See the [documentation for individual classes](https://scipp.github.io/reference/api.html#classes) for a full list.

</div>

Example:

In [None]:
var = sc.arange(dim='x', unit='m', start=0, stop=12)
var.sum()  # Previously sc.sum(var)

Note that `sc.sum(var)` will continue to be supported as well.

### Python-like shallow/deep copy mechanism

<div class="alert alert-info">

**New in 0.7**

The most significant change in the scipp 0.7 release is a fundamental rework of all scipp data structures (variables, data arrays, and datasets).
These now behave mostly like nested Python objects, i.e., sub-objects are shared by default.
Previously there was no sharing mechanism and scipp always made deep-copies.
Some of the effects are exemplified in the following.

</div>

#### Variables

For variables on their own, the new and old implementations mostly yield the same user experience.
Previously, views of variables, such as created when slicing a variable along a dimension, returned a different type &ndash; `VariableView` &ndash; which kept alive the original `Variable`.
This asymmetry is now gone.
Slices or other views of variables are now also of type `Variable`, and all views share ownership of the underlying data.

If a variable refers only to a section of the underlying data buffer, this is now indicated in the HTML view as part of the size in the title line, here *"16 Bytes out of 96 Bytes"*.
This makes it possible to identify "small" variables that keep potentially large buffers alive:

In [None]:
var = sc.arange(dim='x', unit='m', start=0, stop=12)
var['x', 4:6]

To create a variable with sole ownership of a buffer, use the `copy()` method:

In [None]:
var['x', 4:6].copy()

By default, `copy()` returns a deep copy.
Shallow copies can be made by specifying `deep=False`, which preserves shared ownership of underlying buffers:

In [None]:
shallow_copy = var['x', 4:6].copy(deep=False)
shallow_copy
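
numpy arrays follow the same shallow/deep distinction, so the byte count in the title line is easy to reproduce there. The following is a numpy analogy, not scipp code. `base` and `nbytes` are numpy attributes, and the buffer mirrors the 12 `int64` values of `var`:

```python
import numpy as np

# Twelve int64 values, as in sc.arange(dim='x', start=0, stop=12): 12 * 8 = 96 bytes.
buffer = np.arange(12, dtype=np.int64)

# Slicing returns a view: 2 elements (16 bytes) that keep the full buffer alive.
view = buffer[4:6]
print(f"{view.nbytes} Bytes out of {view.base.nbytes} Bytes")  # 16 Bytes out of 96 Bytes

# A deep copy owns a new, compact buffer of its own.
deep = view.copy()
print(deep.base is None)  # True: no reference to the original buffer

# Writing through the view modifies the shared buffer; the copy is unaffected.
view[0] = -1
print(buffer[4], deep[0])  # -1 4
```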

#### Data arrays

The move away from the previous "always deep copy" mechanism avoids a number of critical issues.
However, the new sharing mechanism means that extra care is now needed in some cases, just as when working with any other Python library.
Consider the following example, using the same variable for data and a coordinate:

In [None]:
da = sc.DataArray(data=var, coords={'x': var})
da += 666 * sc.units.m
da

The modification also affected the coordinate, which may be unintended.
However, if we think of data arrays and coordinate dicts as Python-like objects, this behavior should not be surprising.

Note that the original `var` is also affected:

In [None]:
var

To avoid this, use `copy()`, e.g.:

In [None]:
da = sc.DataArray(data=var.copy(), coords={'x': var.copy()})
da += 666 * sc.units.m
da

Besides the more standard and pythonic behavior, this has another advantage: creating data arrays from variables is now cheap, because it no longer forces copies of potentially large objects.
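
The aliasing above is the same as when plain Python containers hold numpy arrays. In the sketch below, a dict serves as a hypothetical stand-in for a data array (it is not scipp API). In-place operators modify the shared buffer, and copies decouple it:

```python
import numpy as np

var = np.arange(12.0)

# Hypothetical stand-in for a data array: data and coord are the *same* object.
da = {'data': var, 'coords': {'x': var}}
da['data'] += 666.0  # in-place: modifies the one shared buffer
print(da['coords']['x'][0], var[0])  # 666.0 666.0

# Passing copies gives independent buffers, at the cost of allocating them.
var2 = np.arange(12.0)
da2 = {'data': var2.copy(), 'coords': {'x': var2.copy()}}
da2['data'] += 666.0
print(da2['coords']['x'][0], var2[0])  # 0.0 0.0
```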

A related change is the introduction of read-only flags.
Consider the following attempt to modify the data via a slice:

In [None]:
try:
    da['x', 0].data = var['x', 2]
except sc.DataArrayError as e:
    print(e)

Since `da['x',0]` is itself a data array, assigning to the `data` property would repoint the data to whatever is given on the right-hand side.
However, this would not affect `da`, and the attempt to change the data would silently do nothing, since the temporary `da['x',0]` disappears immediately.
The read-only flag protects us from this.

To actually modify the slice, use `__setitem__` instead:

In [None]:
da['x', 0] = var['x', 2]

Read-only flags were also introduced for variables, meta-data dicts (`coords`, `masks`, and `attrs` properties), data arrays and datasets.
The flags solve a number of conceptual issues and serve as a safeguard against hidden bugs.
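
The pitfall that the read-only flag guards against can be sketched in plain Python. `Holder` is a hypothetical class, not scipp's implementation. Its slices share the parent's buffer, rebinding `data` on a slice is refused, and `__setitem__` copies values into the shared buffer instead:

```python
import numpy as np


class Holder:
    """Hypothetical stand-in for a data array holding a reference to a buffer."""

    def __init__(self, data, readonly=False):
        self._data = data
        self._readonly = readonly

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        # Rebinding `data` on a temporary slice would be lost silently, so refuse.
        if self._readonly:
            raise RuntimeError("read-only: cannot rebind data of a slice")
        self._data = value

    def __getitem__(self, i):
        # The slice shares the parent's buffer (a numpy view) and is read-only.
        return Holder(self._data[i:i + 1], readonly=True)

    def __setitem__(self, i, value):
        # Copy values into the existing buffer instead of rebinding it.
        self._data[i:i + 1] = value


h = Holder(np.zeros(4))
refused = False
try:
    h[0].data = np.array([5.0])  # would only affect a temporary
except RuntimeError as e:
    refused = True
    print(e)

h[0] = 5.0  # modifies h's buffer
print(h.data)  # [5. 0. 0. 0.]
```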

#### Datasets

Just as creating data arrays from variables is now cheap (no deep copies), inserting items into datasets no longer incurs potentially expensive deep copies:

In [None]:
ds = sc.Dataset()
ds['a'] = da  # shallow copy

Note that while the buffers are shared, the meta-data dicts such as `coords`, `masks`, or `attrs` are not.
Compare:

In [None]:
ds['a'].attrs['attr'] = 1.2 * sc.units.m
'attr' in da.attrs  # the attrs *dict* is copied

with

In [None]:
da.coords['x'] *= -1
ds.coords['x']  # the coords *dict* is copied, but the 'x' coordinate references same buffer
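
Inserting into a dataset therefore behaves like a shallow copy of a Python dict: the dict itself is new, but the values it holds are the same objects. In the numpy-based illustration below, plain dicts stand in for the `coords` of `da` and `ds`:

```python
import numpy as np

x = np.arange(1.0, 5.0)
da_coords = {'x': x}

# Inserting into a dataset copies the dict, not the arrays it refers to.
ds_coords = dict(da_coords)

ds_coords['extra'] = np.ones(4)  # new entry exists only in the copied dict
print('extra' in da_coords)      # False

da_coords['x'] *= -1             # in-place: both dicts see the same buffer
print(ds_coords['x'])            # [-1. -2. -3. -4.]
```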

### Indexing

#### Ellipsis

<div class="alert alert-info">

**New in 0.8**
    
Indexing with ellipsis (`...`) is now supported.
This can be used, e.g., to replace data in an existing object without re-pointing the underlying reference to the object given on the right-hand side.

</div>

Example:

In [None]:
var1 = sc.ones(dims=['x'], shape=[4])
var2 = var1 + var1
da = sc.DataArray(data=sc.zeros(dims=['x'], shape=[4]))
da.data = var1  # replace data variable
da.data[...] = var2  # assign to slice, copy into existing data variable
var1  # now holds values of var2

Changing `var2` has no effect on `da.data`:

In [None]:
var2 += 2222.0
da
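
In numpy terms, `da.data = ...` corresponds to rebinding a reference, and `da.data[...] = ...` corresponds to copying into the existing buffer. The sketch below uses a dict as a hypothetical stand-in for `da`:

```python
import numpy as np

var1 = np.ones(4)
var2 = var1 + var1
holder = {'data': np.zeros(4)}  # hypothetical stand-in for `da`

holder['data'] = var1        # rebind: holder now refers to var1
holder['data'][...] = var2   # copy var2's values into var1's buffer
print(var1)                  # [2. 2. 2. 2.]

var2 += 2222.0               # var2 is still a separate buffer
print(holder['data'])        # [2. 2. 2. 2.]
```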

#### Label-based indexing

<div class="alert alert-info">

**New in 0.5**
    
Indexing based on coordinate values is now possible:

- Works just like positional indexing (with integers).
- Use a scalar variable (instead of an integer) as the index to use label-based indexing.
- Works with single values as well as slices (`:` notation).

See [Label-based indexing](https://scipp.github.io/user-guide/slicing.html#Label-based-indexing) for more details.
    
</div>

Example:

In [None]:
da = sc.DataArray(data=sc.zeros(dims=['x', 'day'], shape=(4, 3)))
da.coords['x'] = sc.linspace(dim='x', unit='m', start=0.1, stop=0.2, num=5)
da.coords['day'] = sc.array(dims=['day'], values=[1, 7, 31])

In [None]:
da['day', sc.scalar(7)]

In [None]:
da['x', 0.13 * sc.units.m]  # selects bin containing this value
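
Conceptually, label-based indexing maps a label to a position. The numpy sketch below covers the two cases above. It assumes exact matching for regular coordinates and, for bin-edge coordinates, bins that are closed on the left and open on the right. It does not reproduce scipp's own lookup code:

```python
import numpy as np

x_edges = np.linspace(0.1, 0.2, num=5)  # 4 bins: [0.1, 0.125), [0.125, 0.15), ...
day = np.array([1, 7, 31])

# Regular coordinate: find the position whose value matches the label exactly.
(day_index,) = np.flatnonzero(day == 7)
print(day_index)  # 1

# Bin-edge coordinate: find the bin whose edges enclose the label.
x_index = np.searchsorted(x_edges, 0.13, side='right') - 1
print(x_index)  # 1
```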

### Support for datetime64

<div class="alert alert-info">

**New in 0.6**
    
- Previously, time-related information such as sample-temperature logs was stored as integers.
- Added support for datetime64, compatible with [np.datetime64](https://numpy.org/doc/stable/reference/arrays.datetime.html).
- Time differences (`np.timedelta64`) are not used. Instead we simply use integers, since in combination with scipp's units this provides everything we need.

</div>

Example:

In [None]:
var = sc.array(dims=['time'],
               values=np.arange(np.datetime64('2021-01-01T12:00:00'),
                                np.datetime64('2021-01-01T12:04:00')))

Datetimes and integers with time units interoperate naturally.
We can offset a datetime by adding a duration:

In [None]:
var + 123 * sc.Unit('s')

Or subtract datetimes to obtain a duration:

In [None]:
var['time', 10] - var['time', 0]

`to_unit` can be used to convert to a different precision:

In [None]:
sc.to_unit(var, 'ms')
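
numpy's own `datetime64` arithmetic behaves analogously, with one difference noted above: numpy expresses durations as `timedelta64`, whereas scipp uses integers with a time unit. A numpy sketch of the three operations:

```python
import numpy as np

times = np.arange(np.datetime64('2021-01-01T12:00:00'),
                  np.datetime64('2021-01-01T12:04:00'))  # one value per second
print(len(times), times.dtype)  # 240 datetime64[s]

# Offset by a duration; numpy needs a timedelta64 here.
shifted = times + np.timedelta64(123, 's')
print(shifted[0])  # 2021-01-01T12:02:03

# The difference of two datetimes is a duration.
print(times[10] - times[0])  # 10 seconds

# Change precision, analogous to sc.to_unit(var, 'ms').
times_ms = times.astype('datetime64[ms]')
print(times_ms.dtype)  # datetime64[ms]
```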

### Operations

#### Creation functions

<div class="alert alert-info">

**New in 0.5**

For convenience and similarity to `numpy` we added [functions that create variables](../reference/api.rst#creation-functions).
Our intention is to fully replace the need to use `sc.Variable` directly, but at this point this has not been rolled out to our documentation pages.

</div>

Examples:

In [None]:
sc.array(dims=['x'], values=np.array([1, 2, 3]))

In [None]:
sc.zeros(dims=['x'], shape=[3])

In [None]:
sc.scalar(17)

All of these also accept optional keyword arguments such as `unit` and `dtype`.
Note that scalars can still be created by multiplying a value with a unit:

In [None]:
1.2 * sc.units.m

<div class="alert alert-info">

**New in 0.7**
    
More creation functions were added:

- Added `zeros_like`, `ones_like`, and `empty_like`.
- Added `linspace`, `logspace`, `geomspace`, and `arange`.

</div>

<div class="alert alert-info">

**New in 0.8**
    
More creation functions were added:

- Added `full` and `full_like`.

</div>

#### Unit conversion

<div class="alert alert-info">

**New in 0.6**

Conversions between different unit scales (not to be confused with [conversions provided by scippneutron](https://scipp.github.io/scippneutron/user-guide/unit-conversions.html)) are now supported.
`to_unit` provides conversion of variables between, e.g., `mm` and `m`.

</div>

<div class="alert alert-info">

**New in 0.7**

- `to_unit` can now avoid making a copy if the input already has the desired unit.
  This can be used as a cheap way to ensure inputs have expected units.
- `to_unit` now also works for binned data, converting the unit of the underlying events in the bins.
    
</div>

<div class="alert alert-info">

**New in 0.8**

- `to_unit` now has a `copy` argument.
   By default, `copy=True` and `to_unit` makes a copy even if the input already has the desired unit.
   For a cheap way to ensure inputs have expected units use `copy=False` to avoid copies if possible.
    
</div>

Example:

In [None]:
var = sc.array(dims=['x'], unit='mm', values=[3.2, 5.4, 7.6])
m = sc.to_unit(var, 'm')
m

With `copy=False`, no copy is made if the input already has the requested unit:

In [None]:
sc.to_unit(m, 'm', copy=False)  # no copy

Conversions also work for more specialized units such as electron-volt:

In [None]:
sc.to_unit(sc.scalar(1.0, unit='nJ'), unit='meV')
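
The conversion itself is a multiplication by a ratio of scale factors. The sketch below shows that arithmetic and the effect of a `copy` switch. The `convert_sketch` function and `SCALE` table are hypothetical illustrations, not scipp's implementation:

```python
import numpy as np

# Hypothetical table of scale factors relative to SI units (illustration only).
EV_IN_J = 1.602176634e-19  # exact in the 2019 SI
SCALE = {'mm': 1e-3, 'm': 1.0, 'nJ': 1e-9, 'meV': 1e-3 * EV_IN_J}


def convert_sketch(values, unit, target, copy=True):
    """Hypothetical helper: rescale `values` from `unit` to `target`."""
    if unit == target and not copy:
        return values  # cheap path: no new array is created
    # Multiplication always allocates a new result, i.e., a copy.
    return np.asarray(values) * (SCALE[unit] / SCALE[target])


mm = np.array([3.2, 5.4, 7.6])
m = convert_sketch(mm, 'mm', 'm')
print(m)                                             # [0.0032 0.0054 0.0076]
print(convert_sketch(m, 'm', 'm', copy=False) is m)  # True
print(convert_sketch(1.0, 'nJ', 'meV'))              # about 6.2415e12
```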

#### `from_pandas` and `from_xarray`

<div class="alert alert-info">

**New in 0.8**

- `from_pandas` for converting `pandas.DataFrame` to `scipp.Dataset`.
- `from_xarray` for converting `xarray.DataArray` or `xarray.Dataset` to `scipp.DataArray` or `scipp.Dataset`, respectively.

Both functions are available in the `compat` submodule.

</div>

### Shape operations

#### `fold` and `flatten`

<div class="alert alert-info">

**New in 0.6**

`fold` and `flatten`, which are similar to [numpy.reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html), have been added.
In contrast to `reshape`, `fold` and `flatten` support data arrays and also handle meta data such as coords, masks, and attrs.

</div>

<div class="alert alert-info">

**New in 0.7**

- `fold` now always returns views of data and all meta data instead of making deep copies.
- `flatten` also preserves reshaped data as a view, but unlike `fold` the same is not true for meta data in general, since it may require duplication in the flatten operation.

</div>

Example:

In [None]:
var = sc.ones(dims=['pixel'], shape=[100])
xy = sc.fold(var, dim='pixel', sizes={'x': 10, 'y': 10})
xy = sc.DataArray(data=xy,
                  coords={
                      'x': sc.array(dims=['x'], values=np.arange(10)),
                      'y': sc.array(dims=['y'], values=np.arange(10))
                  })
xy

Folding copies neither data nor meta data, for example:

In [None]:
xy['y', 4] *= 0.0  # affects var (scipp-0.7 and higher)
var.plot()

The reverse of `fold` is `flatten`:

In [None]:
flat = sc.flatten(xy, to='pixel')
flat

Flattening does not copy the data, but meta data may get copied if the operation needs to duplicate values:

In [None]:
flat['pixel', 0] = 22  # modifies var (scipp-0.7 and higher)
var.plot()
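
numpy's `reshape` shows in a similar way why data stays shared while meta data may be duplicated. The numpy sketch below is not scipp code. It assumes `x` is the outer and `y` the inner dimension, as in `sizes={'x': 10, 'y': 10}`:

```python
import numpy as np

var = np.ones(100)

# fold ~ reshape into a view with dims ('x', 'y'), sizes (10, 10), 'y' innermost.
xy = var.reshape(10, 10)
xy[:, 4] *= 0.0  # like xy['y', 4] *= 0.0; zeroes var[4], var[14], ..., var[94]

# flatten ~ reshape back; the data is still the same buffer.
flat = xy.reshape(100)
print(np.shares_memory(flat, var))  # True

# 1-D coords, however, must be duplicated to cover every pixel.
x = np.arange(10)
y = np.arange(10)
pixel_x = np.repeat(x, 10)  # x varies slowest: 0, 0, ..., 1, 1, ...
pixel_y = np.tile(y, 10)    # y varies fastest: 0, 1, ..., 9, 0, 1, ...
```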

### Vectors and matrices

#### General

<div class="alert alert-info">

**New in 0.7**

Several improvements for working with (3-D position) vectors and (3-D rotation) matrices are part of this release:

- Creation functions were added:
  - `vector` (a single vector)
  - `vectors` (array of vectors)
  - `matrix` (a single matrix)
  - `matrices` (array of matrices)
- Direct creation and initialization of 2-D (or higher) arrays of matrices and vectors is now possible from numpy arrays.
- The `values` property now returns a numpy array with ndim+1 (vectors) or ndim+2 (matrices) axes, with the inner 1 (vectors) or 2 (matrices) axes corresponding to the vector or matrix axes.
- Vector or matrix elements can now be accessed and modified directly using the new `fields` property of variables.
  `fields` provides access to vector elements `x`, `y`, and `z` or matrix elements `xx`, `xy`, ..., `zz`.
    
</div>

<div class="alert alert-info">

**New in 0.8**

The `fields` property can now be iterated and behaves similarly to a `dict` with fixed keys.

</div>

In [None]:
sc.vector(value=[1, 2, 3])

In [None]:
vecs = sc.vectors(dims=['x'], unit='m', values=np.arange(12).reshape(4, 3))
vecs

In [None]:
vecs.values

In [None]:
vecs.fields.y

In [None]:
vecs.fields.z += 0.666 * sc.units.m
vecs
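
In numpy terms, each field corresponds to a column of the `values` array, which has one extra trailing axis of length 3. The numpy sketch below mirrors the example above, without units:

```python
import numpy as np

# Four 3-vectors, mirroring np.arange(12).reshape(4, 3) from the example above.
values = np.arange(12.0).reshape(4, 3)

# Field "y" corresponds to column 1 of the values array: 1, 4, 7, 10.
y = values[:, 1]
print(y)

# Columns are views, so in-place updates modify the stored vectors.
values[:, 2] += 0.666
print(values[0])  # x, y, z of the first vector: 0, 1, 2.666
```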

<div class="alert alert-info">

**New in 0.8**
    
The `cross` function to compute the cross product of vectors was added.

</div>

In [None]:
sc.cross(vecs, vecs['x', 0])
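
The same operation in plain numpy is shown below, where the single vector broadcasts against the array. The checks confirm the defining properties of the cross product. Units are omitted, and the `z` offset from above is not applied:

```python
import numpy as np

vecs = np.arange(12.0).reshape(4, 3)

# Cross product of every vector with the first one; the single vector broadcasts.
crossed = np.cross(vecs, vecs[0])

# Each result is perpendicular to both inputs, and v x v vanishes.
perp_to_vecs = np.allclose(np.einsum('ij,ij->i', crossed, vecs), 0.0)
perp_to_first = np.allclose(crossed @ vecs[0], 0.0)
print(perp_to_vecs, perp_to_first)  # True True
```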

#### `scipp.spatial.transform`

<div class="alert alert-info">

**New in 0.8**
    
The `scipp.spatial.transform` (in the style of `scipy.spatial.transform`) submodule was added.
This now provides:
- `from_rotvec` to create rotation matrices from rotation vectors.
- `as_rotvec` to convert rotation matrices into rotation vectors.

</div>

As an example, the following creates a rotation matrix for rotation around the `x`-axis by 30 degrees:

In [None]:
from scipp.spatial.transform import from_rotvec

rot = from_rotvec(sc.vector(value=[30.0, 0, 0], unit='deg'))
rot
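
For reference, such a matrix can be computed with Rodrigues' rotation formula. The numpy sketch below uses hypothetical helper names and is not scipp's implementation. It assumes the usual right-handed, active-rotation convention of `scipy.spatial.transform`:

```python
import numpy as np


def rotvec_to_matrix(rotvec_deg):
    """Hypothetical helper: rotation vector (angle * axis, degrees) -> 3x3 matrix."""
    rotvec = np.deg2rad(np.asarray(rotvec_deg, dtype=float))
    angle = np.linalg.norm(rotvec)
    if angle == 0.0:
        return np.eye(3)
    k = rotvec / angle  # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # cross-product matrix of k
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def matrix_to_rotvec_deg(rot):
    """Hypothetical inverse, valid for angles strictly between 0 and 180 degrees."""
    angle = np.arccos((np.trace(rot) - 1.0) / 2.0)
    axis = np.array([rot[2, 1] - rot[1, 2],
                     rot[0, 2] - rot[2, 0],
                     rot[1, 0] - rot[0, 1]]) / (2.0 * np.sin(angle))
    return np.rad2deg(angle) * axis


rot = rotvec_to_matrix([30.0, 0.0, 0.0])
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
expected = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
print(np.allclose(rot, expected))  # True
print(matrix_to_rotvec_deg(rot))   # approximately [30, 0, 0]
```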

### Coordinate transformations

<div class="alert alert-info">

**New in 0.8**

The `transform_coords` function has been added (also available as method of data arrays and datasets).
It is a tool for transforming one or more input coordinates into one or more output coordinates. It automatically handles:

- Renaming of dimensions, if dimension-coordinates are transformed.
- Conversion of coordinates consumed by the transformation into attributes, so that they do not interfere with follow-up operations.
- Conversion of event-coordinates of binned data, if present.

See [Coordinate transformations](../user-guide/coordinate-transformations.ipynb) for a full description.

</div>
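
One way to picture the mechanism is as follows. Each output coordinate is computed by a function of named inputs, intermediate results are computed on demand, and consumed inputs are moved from coordinates to attributes. The pure-Python sketch below is hypothetical: names, graph format, and behavior are illustrative only. It also omits dimension renaming and binned data:

```python
import numpy as np


def _lookup(name, coords, attrs):
    # Inputs may already have been consumed (moved to attrs) by an earlier step.
    return coords[name] if name in coords else attrs[name]


def transform_coords_sketch(coords, attrs, target, graph):
    """Hypothetical: compute `target` via `graph`, moving consumed inputs to attrs."""
    if target in coords or target in attrs:
        return
    func, inputs = graph[target]
    for name in inputs:  # compute intermediate coordinates first
        transform_coords_sketch(coords, attrs, name, graph)
    coords[target] = func(*(_lookup(n, coords, attrs) for n in inputs))
    for name in inputs:
        if name in coords:
            attrs[name] = coords.pop(name)


coords = {'x': np.array([3.0, 6.0]), 'y': np.array([4.0, 8.0]), 't': np.array([1.0, 2.0])}
attrs = {}
graph = {
    'distance': (lambda x, y: np.hypot(x, y), ('x', 'y')),
    'speed': (lambda distance, t: distance / t, ('distance', 't')),
}
transform_coords_sketch(coords, attrs, 'speed', graph)
print(sorted(coords), sorted(attrs))  # ['speed'] ['distance', 't', 'x', 'y']
print(coords['speed'])                # [5. 5.]
```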

### Physical constants

<div class="alert alert-info">

**New in 0.8**
    
The `scipp.constants` (in the style of `scipy.constants`) submodule was added, providing physical constants from CODATA 2018.
For full details see the [module's documentation](../generated/modules/scipp.constants.rst).

</div>

Examples:

In [None]:
from scipp.constants import hbar, m_e, physical_constants

In [None]:
hbar

In [None]:
m_e

In [None]:
physical_constants('speed of light in vacuum')

In [None]:
physical_constants('neutron mass', with_variance=True)
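
As a small worked example of the values involved: since the 2019 SI redefinition the Planck constant is exact, and the reduced Planck constant follows from it as $\hbar = h / (2\pi)$. This plain-Python check does not use `scipp.constants`:

```python
import math

h = 6.62607015e-34  # Planck constant in J s (exact, CODATA 2018)
hbar = h / (2 * math.pi)
print(f"{hbar:.8e} J s")  # 1.05457182e-34 J s
```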

## Plotting

<div class="alert alert-info">

**New in 0.7**

- Plots now support a `redraw()` method for updating existing plots with new data, without recreating the plot.

</div>

<div class="alert alert-info">

**New in 0.8**

- Plotting 1-D binned (event) data is now supported.

</div>

## Binned data

### Buffer and meta data access

<div class="alert alert-info">

**New in 0.6**

</div>

<div class="alert alert-info">

**New in 0.7**

- The internal buffer holding the "events" underlying binned data can now be accessed directly using the new `events` property.
- The HTML view now works for binned meta data, such as `binned.bins.coords['time']`.

</div>

<div class="alert alert-info">

**New in 0.8**

The mean of bins can now be computed using `binned.bins.mean()`.
This should generally be used instead of `binned.bins.sum()` if the data is not "summable", i.e., typically for anything that does not have the unit "counts".

</div>
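
The reason is that summing, e.g., temperatures does not yield a temperature, whereas averaging does. The numpy sketch below uses hypothetical event values and `np.bincount` to aggregate events per bin:

```python
import numpy as np

# Hypothetical events: temperature readings (K) and the bin each one falls into.
temperature = np.array([100.0, 102.0, 104.0, 110.0, 108.0])
bin_index = np.array([0, 0, 0, 1, 1])

counts = np.bincount(bin_index)                     # events per bin: [3 2]
sums = np.bincount(bin_index, weights=temperature)  # [306. 218.], not a temperature
means = sums / counts                               # [102. 109.], meaningful per bin
print(means)
```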

Consider the following example, representing a time series of temperature measurements on an x-y plane:

In [None]:
import numpy as np

N = 800
data = sc.DataArray(
    data=sc.Variable(dims=['time'], values=100 + np.random.rand(N) * 10, unit='K'),
    coords={
        'x': sc.Variable(dims=['time'], unit='m', values=np.random.rand(N)),
        'y': sc.Variable(dims=['time'], unit='m', values=np.random.rand(N)),
        'time': sc.Variable(dims=['time'], values=(10000 * np.random.rand(N)).astype('datetime64[s]')),
    })
binned = sc.bin(data,
                edges=[sc.linspace(dim='x', unit='m', start=0.0, stop=1.0, num=5),
                       sc.linspace(dim='y', unit='m', start=0.0, stop=1.0, num=5)])
binned

In [None]:
sc.show(binned)

The underlying flat list of data points (events) can be accessed using the `events` property:

In [None]:
binned.events

In [None]:
sc.show(binned.events)

Note that there is no guarantee about the order of events in this internal buffer.
Furthermore, it includes potential reserved zones between bins as well as events from bins that are not part of the current variable, for example:

In [None]:
x_slice = binned['x', 0]
x_slice.events  # ALL events, including those at different `x`

The `events` property should thus be used with care.
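
A simplified model helps explain both points: binned data can be thought of as a flat buffer plus a range of buffer indices per bin. The numpy sketch below uses a hypothetical layout that mirrors this `begin`/`end` idea; it does not reflect scipp's actual internals. Positions 2 and 5 belong to no bin, and slicing selects ranges without shrinking the buffer:

```python
import numpy as np

# Flat event buffer; positions 2 and 5 are reserved space belonging to no bin.
buffer = np.array([0.3, 0.1, 0.0, 0.2, 0.9, 0.0])

# Hypothetical layout: bin i covers buffer[begin[i]:end[i]].
begin = np.array([0, 3])
end = np.array([2, 5])

# Slicing keeps only bin 1's range, but the buffer itself is unchanged.
sliced_begin, sliced_end = begin[1:], end[1:]
in_slice = np.concatenate([buffer[b:e] for b, e in zip(sliced_begin, sliced_end)])
print(in_slice)     # [0.2 0.9]
print(buffer.size)  # 6, analogous to `x_slice.events` still holding everything
```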

`events` is `None` for data that is not binned:

In [None]:
binned.bins.mean().events is None

The `events` property could be used, e.g., to access the `x` coordinate of the events, using `binned.events.coords['x']`.
However, this does not reveal which bin an event belongs to, so it is not sufficient if, e.g., a computation needs information that is available on a per-bin basis.

To allow for this, the `bins` property now provides properties `data`, `coords`, `masks`, and `attrs` *of the bins* that behave like the properties of a data array *while retaining the binned structure*:

In [None]:
binned.bins.coords['time']

In [None]:
sc.show(binned.bins.coords['time'])

Compare this to the following, accessing the same coordinate via the `events` property, and make sure you understand the difference:

In [None]:
binned.events.coords['time']

We can use this in our example to correct for a hypothetical clock error that depends on the x-y bin:

In [None]:
clock_correction = sc.array(dims=['x', 'y'], unit='s', values=(100 * np.random.rand(4, 4)).astype('int64'))
clock_correction

In [None]:
binned.bins.coords['time'] += clock_correction
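
Conceptually, the update adds each bin's correction to every event in that bin. The numpy sketch below uses hypothetical values (a 2x2 grid instead of the 4x4 grid above), and each event stores the flat index of its x-y bin:

```python
import numpy as np

# Hypothetical event times (s) and the flat index of the x-y bin of each event.
event_time = np.array([10, 20, 30, 40])
event_bin = np.array([0, 3, 3, 1])

# One correction per bin: a 2x2 grid flattened to 4 bins, in seconds.
clock_correction = np.array([[5, 7], [11, 13]]).ravel()

# Each event receives the correction of its own bin.
event_time = event_time + clock_correction[event_bin]
print(event_time)  # [15 33 43 47]
```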

The properties can also be used to add or delete meta data entries:

In [None]:
del binned.bins.coords['x']

## Performance

<div class="alert alert-info">

**New in 0.7**

- `sort` is now considerably faster for data with many rows.
- Reduction operations such as `sum` and `mean` are now also multi-threaded and thus considerably faster.

</div>