# What's new in scipp 0.7.0

For a full list of changes see the [Release Notes](https://scipp.github.io/about/release-notes.html#v0-7-june-2021).

## Various

- Added `zeros_like`, `ones_like`, and `empty_like`.
- Added `linspace`, `logspace`, `geomspace`, and `arange`.
- Plots now support a `redraw()` method, which updates an existing plot with new data without recreating it.

## Performance

- `sort` is now considerably faster for data with more rows.
- Reduction operations such as `sum` and `mean` are now also multi-threaded and thus considerably faster.

## Python-like shallow/deep copy mechanism

The most significant change in this scipp release is a fundamental rework of all scipp data structures (variables, data arrays, and datasets).
These now behave mostly like nested Python objects, i.e., sub-objects are shared by default.
Previously, there was no sharing mechanism and scipp always made deep copies.
The following sections illustrate some of the effects.
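
As a point of reference, the plain-Python behavior that the new data structures mirror can be sketched with ordinary lists and dicts. This uses only the standard library, not scipp, and the names `values`, `container`, and `independent` are purely illustrative:

```python
import copy

# A "buffer" and a container that holds it, standing in for a variable
# and a data array. These are plain Python objects, not scipp types.
values = [0, 1, 2]
container = {'data': values}            # stores a reference, no copy is made

values[0] = 99                          # modify the original in place
shared_value = container['data'][0]     # the container sees the change

independent = copy.deepcopy(container)  # explicit deep copy
values[1] = -1                          # not visible in the deep copy
```

After this runs, `container['data']` reflects both modifications, while `independent` only contains the state at the time of the deep copy.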

### Variables

For variables on their own, the new and old implementations mostly yield the same user experience.
Previously, views of variables, such as those created by slicing a variable along a dimension, had a different type, `VariableView`, which kept the original `Variable` alive.
This asymmetry is now gone.
Slices or other views of variables are now also of type `Variable`, and all views share ownership of the underlying data.

In [None]:
import numpy as np
import scipp as sc
if not sc.__version__.startswith('0.7'):
    print(f'This notebook was made for scipp-0.7 and will likely not work with your version ({sc.__version__}).')

If a variable refers only to a section of the underlying data buffer, this is now indicated in the title line of the HTML view as part of the size, here *"16 Bytes out of 96 Bytes"*.
This allows for identification of "small" variables that keep alive potentially large buffers:

In [None]:
var = sc.arange(dim='x', unit='m', start=0, stop=12)
var['x', 4:6]

To create a variable with sole ownership of a buffer, use the `copy()` method:

In [None]:
var['x', 4:6].copy()

By default, `copy()` returns a deep copy.
Shallow copies can be made by specifying `deep=False`, which preserves shared ownership of underlying buffers:

In [None]:
shallow_copy = var['x', 4:6].copy(deep=False)
shallow_copy
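
The size bookkeeping and the effect of copying have a close analogue in Python's built-in `memoryview`, sketched here with the standard library only. This is an analogy rather than a description of how scipp is implemented. It assumes 8-byte integers, which matches the 96-byte buffer of the 12-element example above:

```python
from array import array

buffer = array('q', range(12))       # 12 signed 8-byte integers
whole = memoryview(buffer)
view = whole[4:6]                    # a slice is a view into the same buffer

view_bytes = view.nbytes             # size of the viewed section
total_bytes = whole.nbytes           # size of the full buffer kept alive

owned = array('q', view.tolist())    # copy: sole owner of a 2-element buffer
view[0] = 400                        # writes through to `buffer`, not to `owned`
```

The view reports 16 of 96 bytes, and only the explicit copy is decoupled from later writes to the original buffer.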

### Data arrays

The move away from the previous "always deep copy" mechanism avoids a number of critical issues.
However, as a result of the new sharing mechanism, extra care must now be taken in some cases, just as when working with any other Python library.
Consider the following example, using the same variable for data and a coordinate:

In [None]:
da = sc.DataArray(data=var, coords={'x':var})
da += 666 * sc.units.m
da

The modification unintentionally also affected the coordinate.
However, if we think of data arrays and coordinate dicts as Python-like objects, this behavior is not surprising.

Note that the original `var` is also affected:

In [None]:
var

Apart from the more standard and pythonic behavior, one advantage of this is that creating data arrays from variables is now cheap, since it no longer incurs copies of potentially large objects.

A related change is the introduction of read-only flags.
Consider the following attempt to modify the data via a slice:

In [None]:
try:
    da['x', 0].data = var['x', 2]
except sc.DataArrayError as e:
    print(e)

Since `da['x',0]` is itself a data array, assigning to the `data` property would repoint the data to whatever is given on the right-hand side.
However, this would not affect `da`, and the attempt to change the data would silently do nothing, since the temporary `da['x',0]` disappears immediately.
The read-only flag protects us from this.

To actually modify the slice, use `__setitem__` instead:

In [None]:
da['x', 0] = var['x', 2]
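
Why the property assignment would otherwise be lost can be sketched in plain Python. The `Container` and `SliceView` classes below are hypothetical stand-ins, not scipp types: `__getitem__` returns a fresh temporary object, so rebinding an attribute on it never reaches the parent, whereas `__setitem__` writes into the parent's storage.

```python
class SliceView:
    """Temporary object returned by slicing (hypothetical stand-in)."""
    def __init__(self, data):
        self.data = data


class Container:
    """Minimal container without any read-only protection."""
    def __init__(self, values):
        self.values = list(values)

    def __getitem__(self, index):
        # Each call creates a new temporary wrapper around one element.
        return SliceView(self.values[index])

    def __setitem__(self, index, value):
        # Writes directly into the container's own storage.
        self.values[index] = value


c = Container([0, 1, 2])
c[0].data = 42                   # rebinds an attribute of a temporary; `c` is unchanged
after_attribute = list(c.values)
c[0] = 42                        # __setitem__ modifies `c`
after_setitem = list(c.values)
```

In this unprotected sketch the first assignment is silently discarded. With read-only flags, scipp raises an error at that point instead.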

Read-only flags were also introduced for variables, meta-data dicts (`coords`, `masks`, and `attrs` properties), data arrays and datasets.
The flags solve a number of conceptual issues and serve as a safeguard against hidden bugs.

### Datasets

Just as creating data arrays from variables is now cheap (no deep copies), inserting items into datasets does not incur potentially expensive deep copies:

In [None]:
ds = sc.Dataset()
ds['a'] = da  # shallow copy

Note that while the buffers are shared, the meta-data dicts such as `coords`, `masks`, or `attrs` are not.
Compare:

In [None]:
ds['a'].attrs['attr'] = 1.2 * sc.units.m
'attr' in da.attrs  # the attrs *dict* is copied

with

In [None]:
da.coords['x'] *= -1
ds.coords['x']  # the coords *dict* is copied, but the 'x' coordinate references same buffer
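
The distinction between a copied dict and shared buffers matches a shallow copy of a Python dict. A minimal sketch with standard-library types only, where lists stand in for buffers and all names are illustrative:

```python
original = {'x': [1, 2, 3]}
inserted = dict(original)            # new dict, same value objects

inserted['attr'] = [1.2]             # new key exists only in the copied dict
key_leaked = 'attr' in original

# In-place modification of a shared value is visible through both dicts.
original['x'][:] = [-v for v in original['x']]
seen_through_copy = inserted['x']
```

Adding an entry affects only one dict, while an in-place change to a shared value is seen through both.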

### Improvements possible due to sharing

#### `to_unit`

`to_unit` can now avoid making a copy if the input already has the desired unit.
This can be used as a cheap way to ensure inputs have expected units:

In [None]:
sc.to_unit(var, 'm')  # no copy

#### `fold` and `flatten`

`fold` now always returns views of data and all meta data:

In [None]:
var = sc.ones(dims=['pixel'], shape=[100])
xy = sc.fold(var, dim='pixel', sizes={'x':10, 'y':10})
xy = sc.DataArray(data=xy,
                  coords={
                      'x':sc.array(dims=['x'], values=np.arange(10)),
                      'y':sc.array(dims=['y'], values=np.arange(10))})
xy['y',4] *= 0.0  # affects var
var.plot()

`flatten` also preserves reshaped data as a view. Unlike with `fold`, this does not hold for meta data in general, since flattening may require duplicating meta-data values:

In [None]:
flat = sc.flatten(xy, to='pixel')
flat['pixel', 0] = 22  # modifies var
var.plot()
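
The view semantics of `fold` and `flatten` resemble reshaping in NumPy, where `reshape` on a contiguous array returns a view. The sketch below is an analogy using NumPy only (no scipp calls); the row-major mapping of `pixel` onto (`x`, `y`) is an assumption of the sketch.

```python
import numpy as np

pixels = np.ones(100)
xy = pixels.reshape(10, 10)      # view on the same buffer, axes (x, y)
xy[:, 4] *= 0.0                  # zero y=4 for every x

flat = xy.reshape(100)           # reshaping a contiguous view is again a view
flat[0] = 22.0

shares = np.shares_memory(pixels, flat)
zeroed = pixels[4::10]           # elements that belonged to y=4
```

Both modifications end up in `pixels`, since neither reshape copied the data.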

## Vectors and matrices

Several improvements for working with (3-D position) vectors and (3-D rotation) matrices are part of this release:

- Creation functions were added:
  - `vector` (a single vector)
  - `vectors` (array of vectors)
  - `matrix` (a single matrix)
  - `matrices` (array of matrices)
- Direct creation and initialization of 2-D (or higher) arrays of matrices and vectors is now possible from numpy arrays.
- The `values` property now returns a numpy array with `ndim+1` (vectors) or `ndim+2` (matrices) axes; the innermost 1 (vectors) or 2 (matrices) axes correspond to the vector or matrix elements.
- Vector or matrix elements can now be accessed and modified directly using the new `fields` property of variables.
  `fields` provides access to vector elements `x`, `y`, and `z` or matrix elements `xx`, `xy`, ..., `zz`.

In [None]:
sc.vector(value=[1,2,3])

In [None]:
vecs = sc.vectors(dims=['x'], unit='m', values=np.arange(12).reshape(4,3))
vecs

In [None]:
vecs.values

In [None]:
vecs.fields.y

In [None]:
vecs.fields.z += 0.666 * sc.units.m
vecs

## Binned data buffer access

The internal buffer holding the "events" underlying binned data was previously available as `data.bins.constituents['data']`.
This can now be accessed directly using the new `events` property:

In [None]:
N = 800
data = sc.DataArray(
    data=sc.Variable(dims=['event'], values=np.random.rand(N), unit='K'),
    coords={
        'x':sc.Variable(dims=['event'], values=np.random.rand(N)),
        'y':sc.Variable(dims=['event'], values=np.random.rand(N)),
        'z':sc.Variable(dims=['event'], values=np.random.rand(N))
    })
binned = sc.bin(data, edges=[sc.linspace(dim='x', start=0.0, stop=1.0, num=5),
                             sc.linspace(dim='y', start=0.0, stop=1.0, num=5),
                             sc.linspace(dim='z', start=0.0, stop=1.0, num=5)])
binned.events

Note that there is no guarantee about the order of events in this internal buffer.
Furthermore, the buffer contains all data, potentially including reserved zones between bins and events from bins that are not part of the current variable. For example:

In [None]:
x_slice = binned['x', 0]
x_slice.events  # ALL events, including those at different `x`

The `events` property should thus be used with care.
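
One way to picture why a slice still exposes all events is a toy model in which each bin stores only a `(begin, end)` index range into one shared event buffer. The `ToyBinned` class below is a simplified illustration written for this note, not scipp's implementation: slicing selects index ranges while the buffer itself is passed along untouched.

```python
class ToyBinned:
    """Toy model: bins are (begin, end) ranges into a shared event list."""
    def __init__(self, events, ranges):
        self.events = events     # shared buffer, never sliced
        self.ranges = ranges     # one (begin, end) pair per bin

    def __getitem__(self, index):
        # Selecting a bin keeps a reference to the full buffer.
        return ToyBinned(self.events, [self.ranges[index]])

    def bin_contents(self):
        return [self.events[begin:end] for begin, end in self.ranges]


all_events = [0.1, 0.2, 0.7, 0.8, 0.9]
toy = ToyBinned(all_events, [(0, 2), (2, 5)])
first = toy[0]

visible = first.bin_contents()   # only the events of the selected bin
buffer_size = len(first.events)  # the buffer still holds every event
```

In this model, code that reads `first.events` directly sees five events even though the selected bin contains only two.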

`events` is `None` for data that is not binned:

In [None]:
binned.bins.sum().events is None