# Creating Arrays and Datasets

There are several ways to create data structures in Scipp.
`scipp.Variable` in particular can be created in many different ways.

## Variable

[Variables](./data-structures.rst#Variable) can be created using any of the dedicated [creation functions](../../reference/creation-functions.rst#creation-functions).
These fall into several categories, which are described in the following subsections.

### From Python Sequences or NumPy Arrays

#### Arrays

Variables can be constructed directly from NumPy arrays or from any Python object that can be used to create a NumPy array.
See [Array creation](https://numpy.org/doc/stable/user/basics.creation.html) for details.
Given such an object, an array variable can be created using [scipp.array](../../generated/functions/scipp.array.rst) (not to be confused with [data arrays](./data-structures.rst#Data-Array)!):

In [None]:
import scipp as sc
v1d = sc.array(dims=['x'], values=[1, 2, 3, 4])
v2d = sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]])
v3d = sc.array(dims=['x', 'y', 'z'], values=[[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

Alternatively, passing a NumPy array:

In [None]:
import numpy as np
a = np.array([[1, 2], [3, 4]])
v = sc.array(dims=['x', 'y'], values=a)

Note that the values are copied into the Scipp variable regardless of whether they are given as a NumPy array or as a Python list, which leads to some additional memory usage.

The `dtype` and unit of the variable are deduced automatically in the above cases:

In [None]:
v

If required, they can be specified directly:

In [None]:
sc.array(dims=['x', 'y'], values=[[1, 2], [3, 4]], dtype='float64', unit='m')

#### Scalars

Scalars are variables with no dimensions.
Among other ways, they can be constructed using [scipp.scalar](../../generated/functions/scipp.scalar.rst):

In [None]:
sc.scalar(3.41)

`scipp.scalar` will always produce a scalar variable, even when passed a sequence like a list:

In [None]:
sc.scalar([3.41])

In this case, the entire Python list is stored as the single value of a scalar variable, which is likely not what was intended.

### Generating Values

#### Range-Like Variables

1D ranges and similar sequences can be created directly in Scipp.
[scipp.linspace](../../generated/functions/scipp.linspace.rst) creates arrays with regularly spaced values with a given number of elements.
For example (click the stacked disks icon to see all values):

In [None]:
sc.linspace('x', start=-2, stop=5, num=6, unit='s')

[scipp.arange](../../generated/functions/scipp.arange.rst) similarly creates arrays, but with a given step size:

In [None]:
sc.arange('x', start=-2, stop=5, step=1.2, unit='K')

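The caveats described in [NumPy's documentation](https://numpy.org/doc/stable/user/basics.creation.html#d-array-creation-functions) apply to Scipp as well.
In particular, using `arange` with a non-integer step can lead to an unexpected number of elements because of floating-point rounding, so `linspace` is usually the better choice in that case.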

#### Filling with a Value

There are a number of functions to create N-D arrays filled with a fixed value, e.g. [scipp.zeros](../../generated/functions/scipp.zeros.rst) and [scipp.full](../../generated/functions/scipp.full.rst).
`scipp.zeros` creates a variable of any number of dimensions filled with zeros:

In [None]:
sc.zeros(dims=['x', 'y'], shape=[3, 4])

### Special DTypes

Scipp has a number of dtypes that require some form of conversion when creating variables.
Notable examples are date-times and 3-vectors, which are created with [scipp.datetimes](../../generated/functions/scipp.datetimes.rst) and [scipp.vectors](../../generated/functions/scipp.vectors.rst) or their scalar counterparts [scipp.datetime](../../generated/functions/scipp.datetime.rst) and [scipp.vector](../../generated/functions/scipp.vector.rst), as well as the types for spatial transformations in [scipp.spatial](../../generated/modules/scipp.spatial.rst).
While variables of all of these dtypes can be constructed using `scipp.array` and `scipp.scalar`, the specialized functions are more convenient and document their intent better.

`scipp.datetimes` constructs an array of points in time.
It can do so either by parsing strings:

In [None]:
sc.datetimes(dims=['t'], values=['2021-01-10T01:23:45', '2021-01-11T01:23:45'])

Or by converting numbers:

In [None]:
sc.datetimes(dims=['t'], values=[0, 1610288175], unit='s')

Note that the unit is mandatory in the second case; the values are then interpreted as the number of time units elapsed since the Unix epoch (1970-01-01T00:00:00).
See also [scipp.epoch](../../generated/functions/scipp.epoch.rst).

`scipp.vectors` creates an array of 3-vectors.
It does so by converting a sequence or array whose innermost dimension has a length of 3:

In [None]:
sc.vectors(dims=['x'], values=[[1, 2, 3], [4, 5, 6]])

## Data Arrays

There is essentially only one way to construct [data arrays](./data-structures.rst#Data-Array), namely their initializer:

In [None]:
x = sc.linspace('x', start=1.5, stop=3.0, num=4, unit='m')
a = sc.scalar('an attribute')
m = sc.array(dims=['x'], values=[True, False, True, False])
data = x ** 2
sc.DataArray(data, coords={'x': x}, attrs={'a': a}, masks={'m': m})

`coords`, `attrs`, and `masks` are optional but the `data` must always be given.
Note how the creation functions for `scipp.Variable` can be used to make the individual pieces of a data array.

## Dataset

[Datasets](./data-structures.rst#Dataset) are constructed by combining multiple data arrays or variables.
For instance, using the previously defined variables:

In [None]:
sc.Dataset({'data1': data, 'data2': -data}, coords={'x': x})

Or from data arrays:

In [None]:
da1 = sc.DataArray(data, coords={'x': x}, attrs={'a': a}, masks={'m': m})
da2 = sc.DataArray(-data, coords={'x': x})
sc.Dataset({'data1': da1, 'data2': da2})

## Any Data Structure

Any of `scipp.Variable`, `scipp.DataArray`, and `scipp.Dataset` can be created using the methods described in the following subsections.

### From Files

Scipp has a custom file format based on HDF5 which can store data structures.
See [Reading and Writing Files](../reading-and-writing-files.rst) for details.
In short, `scipp.io.open_hdf5` loads whatever Scipp object is stored in a given file.
For demonstration purposes, we use an in-memory `BytesIO` buffer here, but the same code works when passing a file name as a string to `to_hdf5` and `open_hdf5`.

In [None]:
from io import BytesIO
buffer = BytesIO()
v = sc.arange('x', start=1.0, stop=5.0, step=1.0, unit='s')
v.to_hdf5(buffer)
sc.io.open_hdf5(buffer)

### From Other Libraries

Scipp's data structures can be converted to and from certain other structures using the functions listed under [Compatibility](../../reference/free-functions.rst#compatibility).
For example, `scipp.compat.from_pandas` can convert a Pandas `DataFrame` into a Scipp dataset:

In [None]:
import pandas as pd
df = pd.DataFrame({'x': 10*np.arange(5), 'y': np.linspace(0.1, 0.5, 5)})
df

In [None]:
sc.compat.from_pandas(df)