# Sparse data

Scipp can handle a certain type of sparse data, i.e., data that cannot directly be represented as a multi-dimensional array.
For applications that rely solely on dense arrays of data, this section can safely be ignored.

Scipp supports sparse data in the form of a multi-dimensional array of lists.
This could, e.g., be used to store data from an array of sensors/detectors that are read out independently, with potentially widely varying frequency.
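As a rough mental model only, an array of lists can be pictured as one variable-length list per detector. The sketch below is plain Python and does not reflect how scipp stores data; the names `events` and `record` are hypothetical and not part of any scipp API.

```python
# Conceptual sketch in plain Python; this is NOT how scipp stores data.
# One variable-length list of readings per detector (hypothetical names).
n_detectors = 4
events = [[] for _ in range(n_detectors)]  # every list starts out empty


def record(detector, reading):
    """Append a single reading to the list of the given detector."""
    events[detector].append(reading)


# Detectors are read out independently and at very different rates.
for reading in (0.1, 0.2, 0.3):
    record(0, reading)
record(3, 7.5)

lengths = [len(sublist) for sublist in events]
print(lengths)  # [3, 0, 0, 1]
```

The outer structure has a fixed shape, while the length of each inner list is free to differ.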

If data has a sparse dimension, it is always the innermost dimension of a variable.
Since we are not dealing with a dense array here, we cannot set the values for all `x` at once from a numpy array.
The recommended approach is to slice out all outer dimensions.
The remaining values (for a particular `x` in this case) are then a dense array with a list-like interface.
Initially all lists are empty:

In [None]:
import numpy as np
import scipp as sc

var = sc.Variable(dims=['x', 'y'],
                  shape=[4, sc.Dimensions.Sparse])
sc.show(var)
var

In [None]:
var['x', 0].values = np.arange(3)
var['x', 1].values.append(42)
var['x', 0].values.extend(np.ones(3))
var['x', 3].values = np.ones(6)
sc.show(var)
var

In [None]:
var['x', 0].values

In [None]:
var['x', 1].values

In [None]:
var['x', 2].values

Operations such as slicing the sparse dimension are ill-defined and are not supported:

In [None]:
try:
    var['y', 0]
except Exception as e:
    print(str(e))
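One way to see why such a slice is ill-defined, sketched in plain Python rather than with scipp: the sublists have different lengths, so a fixed position along the sparse dimension does not exist for every `x`. The `rows` list is hypothetical example data that mirrors the lists filled in above.

```python
# Plain-Python illustration (not scipp code): ragged rows share no common index.
rows = [[0, 1, 2, 1, 1, 1], [42], [], [1, 1, 1, 1, 1, 1]]

missing = []
for x, row in enumerate(rows):
    try:
        row[0]  # the analogue of asking for position 0 along 'y' at this x
    except IndexError:
        missing.append(x)

print(missing)  # [2]
```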

Operations between variables or datasets broadcast dense data into sparse dimensions:

In [None]:
scale = sc.Variable(dims=['x'], values=np.arange(2.0, 6))
var *= scale
var['x', 0].values

In [None]:
var['x', 1].values

In [None]:
var['x', 2].values
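The effect of this broadcast can be sketched in plain Python, assuming the lists filled in earlier: each dense value is applied to every element of the sublist at the same `x`. This is only an illustration of the semantics; scipp's implementation may differ, and `rows` and `scaled` are hypothetical names.

```python
# Plain-Python sketch of the broadcast; scipp's implementation may differ.
rows = [[0.0, 1.0, 2.0, 1.0, 1.0, 1.0], [42.0], [], [1.0] * 6]
scale = [2.0, 3.0, 4.0, 5.0]  # one dense value per x

# Each dense value multiplies every element of the sublist at the same x.
scaled = [[factor * value for value in row]
          for factor, row in zip(scale, rows)]

print(scaled[0])  # [0.0, 2.0, 4.0, 2.0, 2.0, 2.0]
print(scaled[1])  # [126.0]
print(scaled[2])  # []
```

An empty sublist stays empty, since there is nothing to scale.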

Sparse data in a dataset can be associated with a corresponding sparse coordinate and sparse labels.
These are specific to a particular data item:

In [None]:
d = sc.Dataset(
    {'dense': sc.Variable(['x', 'y'], values=np.ones(shape=(4, 3)))},
    coords={
        'x': sc.Variable(['x'], values=np.arange(4.0)),
        'y': sc.Variable(['y'], values=np.arange(3.0))})
# 'a': sparse data together with a sparse 'y' coordinate
d['a'] = sc.DataArray(data=var, coords={'y': var})
# Change the lengths of two sublists before reusing var
var['x', 0].values = np.arange(7)
var['x', 3].values = np.ones(2)
# 'b': sparse 'y' coordinate only, without data values
d['b'] = sc.DataArray(coords={'y': var})
sc.show(d)
d

The sparse coordinate shadows the global coordinate when accessed via the `coords` property of a data item. Compare accessing a dense item:

In [None]:
sc.show(d['dense'])

with

In [None]:
sc.show(d['a'])

In [None]:
d.coords['y']

In [None]:
d['a'].coords['y']

In [None]:
try:
    d['b'].coords['y']
except IndexError:
    print('Dense coord is meaningless for sparse data, so it is also hidden')

The sublists of a coordinate and of the corresponding values (and variances) must have matching lengths.
Scipp does not enforce this when sublists are modified, but *does* verify correctness in operations on variables or datasets.
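The kind of consistency check this implies can be sketched in plain Python. The helper `mismatched_rows` is hypothetical and not part of scipp; it only illustrates comparing sublist lengths position by position.

```python
# Hypothetical helper in plain Python; not part of scipp.
def mismatched_rows(coord, values):
    """Return the outer indices at which the sublist lengths differ."""
    if len(coord) != len(values):
        raise ValueError("outer dimensions differ")
    return [i for i, (c, v) in enumerate(zip(coord, values))
            if len(c) != len(v)]


coord = [[0.5, 1.5], [], [2.0]]
values = [[10.0, 20.0], [], []]  # the last sublist was modified on its own

print(mismatched_rows(coord, values))  # [2]
```

Running such a check only when an operation combines the lists, rather than on every modification, matches the behaviour described above.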