# Scipp Part 1

## Getting help

- Scipp documentation is available at https://scipp.github.io/
- Join [#scipp](https://ess-eric.slack.com/archives/C01AAGCQEU8) in the ESS Slack workspace for updates, questions, and discussions.

In [None]:
import scipp as sc
import numpy as np

## Using Jupyter

- Press `shift-return` to run a cell and move to the next cell
- Press `ctrl-return` (`cmd-return` on macOS) to run a cell and keep focus on the current cell
- If things go wrong, `Kernel > Restart kernel and clear all outputs` is often helpful.
- Jupyter will automatically display the last (and only the last) object typed in a cell

In [None]:
a = 5
b = 4
a
b
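
Only `b` is displayed by the cell above, because it is the last expression in the cell. A minimal sketch of two common ways to show several values from one cell (plain Python, nothing scipp-specific; the name `both` is only for illustration):

```python
a = 5
b = 4

# Option 1: print each value explicitly.
print(a)
print(b)

# Option 2: make the last expression a tuple, so Jupyter displays both values.
both = (a, b)
both
```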

## Scipp crash course

- `scipp` stores data in a **multi-dimensional array** with **labeled (named) dimensions**.
  This is best imagined as `numpy` arrays, without the need to memorize and keep track of dimension order.
- Each array is combined with a **physical unit** into a **variable**.
- Variables are enhanced by **coordinates**.
  Each coordinate is also a variable.
  A variable with associated coordinates is called a **data array**.
- Multiple data arrays with aligned coordinates can be combined into a **dataset**.

Consider a 2-D numpy array:

In [None]:
a = np.random.rand(2,4)
a

Scipp variables enrich this with labelled dimensions and units, for clarity and safety.
Variables can be created from numpy arrays using `sc.array`:

In [None]:
var = sc.array(dims=['time','location'], values=a, unit='K')
var

Dimension labels are used for many operations. The simplest example is "slicing":

In [None]:
var['location',2:4]
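
To make the idea of slicing by label concrete, here is a rough plain-Python sketch of what such a lookup amounts to for a 2-D array: the label is translated to an axis position, and the slice is applied along that axis. The helper `slice_by_label` and the values are hypothetical and only illustrate the concept; this is not how scipp implements slicing.

```python
# Plain-Python stand-in for a 2-D array with dims ['time', 'location'].
dims = ['time', 'location']
values = [[0.1, 0.2, 0.3, 0.4],
          [0.5, 0.6, 0.7, 0.8]]

def slice_by_label(values, dims, dim, start, stop):
    """Slice a nested 2-D list along the dimension called `dim`."""
    axis = dims.index(dim)  # translate the label into an axis position
    if axis == 0:
        return values[start:stop]
    return [row[start:stop] for row in values]

sliced = slice_by_label(values, dims, 'location', 2, 4)
print(sliced)
```

The caller never needs to know whether `'location'` is the first or the second axis, which is the point of named dimensions.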

Data arrays are created from variables:

In [None]:
time =     sc.array(dims=['time'], unit=sc.units.s, values=[20,30])
location = sc.array(dims=['location'], unit=sc.units.m, values=np.arange(4))
array =    sc.DataArray(data=var, coords={'time':time, 'location':location})
array

Scalar variables are variables with zero dimensions.
There are two ways to create these: using `sc.scalar`, or by multiplying a value by a scipp unit:

In [None]:
windspeed = sc.scalar(1.2, unit='m/s') # see help(sc.scalar) for additional arguments
windspeed = 1.2 * sc.Unit('m/s')
windspeed

Data arrays also support **attributes** to store additional meta information:

In [None]:
array.attrs['windspeed'] = windspeed
array

## Exploring data

When working with a dataset the first step is usually to understand what the data and metadata contain.
In this chapter we explore how scipp supports this.

We start by loading some data, in this case measured with a prototype of the LoKI detectors at the LARMOR beamline:

In [None]:
data = sc.io.open_hdf5(filename='loki-at-larmor.hdf5')

Note that the exercises in the following are fictional and do not represent the actual data reduction workflow.

### Step 1: Use the HTML representation to see what the loaded data contains

The HTML representation is what Jupyter displays for a scipp object.
- Take some time to explore this view and try to understand all the information (dimensions, dtypes, units, ...).
- Note that sections can be expanded, and values can be shown by clicking the icons to the right.

In [None]:
data

### Step 2: Plot the data

Scipp objects can be plotted using the `plot()` method.
Alternatively `sc.plot.plot(obj)` can be used.
Since this is neutron-scattering data, we can also use the "instrument view", provided by `sc.neutron.instrument_view`.

- Plot the loaded data and familiarize yourself with the controls.
- Create the instrument view and familiarize yourself with the controls.

In [None]:
data.plot()

In [None]:
# pixel_size is optional, but on certain screens/systems the defaults don't work well
sc.neutron.instrument_view(data, pixel_size=0.01)

### Step 3: Exploring metadata

Above we saw that many attributes are scalar variables with `dtype=DataArray`.
The single value in a scalar variable is accessed using the `value` property.
Compare:

In [None]:
data.attrs['proton_charge_by_period']

In [None]:
data.attrs['proton_charge_by_period'].value

Exercises:
1. Find some attributes of `data` with `dtype=DataArray` and plot their `value`.
   Also try `sc.table(attr.value)` to show a table representation.
2. Find and plot a monitor.
3. Plot all the monitors on the same plot.
   Note that `sc.plot.plot()` can be used with a Python `dict` for this purpose: `sc.plot.plot({'a':something, 'b':something_else})`.
4. Convert all the monitors from `'tof'` to `'wavelength'` using, e.g., `mon1_wav = sc.neutron.convert(mon1, 'tof', 'wavelength')`.
5. Inspect the HTML view and note how the "unit conversion" changed the dimensions and units.
6. Re-plot all the monitors on the same plot, now in `'wavelength'`.
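
As background for the conversion exercise, the following plain-Python sketch shows the arithmetic that a time-of-flight to wavelength conversion is usually based on, assuming the de Broglie relation $\lambda = h t / (m_n L)$ for elastic scattering. The function `tof_to_wavelength`, the flight-path length, and the time-of-flight value are made up for illustration and are not taken from the loaded data or from scipp.

```python
# Physical constants (CODATA 2018 values).
PLANCK = 6.62607015e-34            # J s
NEUTRON_MASS = 1.67492749804e-27   # kg

def tof_to_wavelength(tof_us, flight_path_m):
    """Return the neutron wavelength in angstrom for a time-of-flight in microseconds."""
    tof_s = tof_us * 1e-6
    wavelength_m = PLANCK * tof_s / (NEUTRON_MASS * flight_path_m)
    return wavelength_m * 1e10

# Hypothetical example: 10 m flight path, 10000 us time-of-flight.
wavelength = tof_to_wavelength(10000.0, 10.0)
print(f'{wavelength:.3f} angstrom')
```

Because the result depends on the flight path, the converted coordinate carries a different unit and a different dimension label than the original `'tof'` coordinate.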

In [None]:
sc.plot.plot({f'monitor{i}':data.attrs[f'monitor{i}'].value for i in [1,2,3,4,5]})

In [None]:
sc.plot.plot({f'monitor{i}':sc.neutron.convert(data.attrs[f'monitor{i}'].value,'tof','wavelength') for i in [1,2,3,4,5]})

### Step 4: Fixing metadata

Exercises:
1. The sample position is wrong; shift the sample by `delta = sc.scalar(value=np.array([0.01,0.01,0.04]), unit=sc.units.m)`.
2. Because of a glitch in the timing system the time-of-flight has an offset of $2.3~\mu s$.
   Fix the corresponding coordinate.
3. Use the HTML view of `data` to verify that you applied the corrections/calibrations there, rather than in a copy.
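
The difference between correcting a copy and correcting the stored coordinate can be sketched with a plain Python `dict` standing in for the coordinates. The values are made up, and the sketch assumes that the recorded times are too large by the offset; check the sign convention for the real data.

```python
# Stand-in for the coordinates of `data`; the values are made up.
coords = {'tof': [5.0, 505.0, 1005.0]}   # microseconds
offset = 2.3                              # microseconds

# Computing the correction creates a new list; the stored coordinate is untouched.
shifted_copy = [t - offset for t in coords['tof']]
unchanged = coords['tof'][0]   # still 5.0, only the copy was modified

# Assigning the result back replaces the stored coordinate.
coords['tof'] = shifted_copy
print(unchanged, coords['tof'][0])
```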

### Step 5: A closer look at the data

The 2-D plot we obtained above by default is often not very enlightening.
Define:

In [None]:
counts = sc.sum(data, 'tof')

Exercises:
1. Create a plot of `counts` and also try the instrument view.
2. How many counts are there in total, in all spectra combined?
3. Plot a single spectrum of `data` as a 1-D plot using the slicing syntax to access the spectrum.

As seen in the instrument view the detectors consist of 4 layers of tubes, each containing 7 straws.
Let us try to split up our data, so we can compare layers.
There are other (and probably better) ways to do this, but here we try to define an integer variable containing a layer index:

In [None]:
z = sc.geometry.z(data.coords['position'])
near = sc.min(z)
far = sc.max(z)
layer = ((z-near)*120).astype(sc.dtype.int32)
layer.unit = ''
layer.plot()
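
The effect of the scale factor in `(z-near)*120` can be explored in plain Python before touching the real data. The sketch below uses made-up z positions (four layers spaced 0.03 m apart, with a little jitter) and a hypothetical helper `layer_index`; the real detector geometry will need a different factor.

```python
def layer_index(z, z_min, scale):
    """Map a z position (in metres) to an integer layer index by truncation."""
    return int((z - z_min) * scale)

# Hypothetical z positions: four layers spaced 0.03 m apart, with small jitter.
z_values = [5.000, 5.002, 5.030, 5.031, 5.060, 5.062, 5.090, 5.091]
z_min = min(z_values)

for scale in (20, 40):
    print(scale, [layer_index(z, z_min, scale) for z in z_values])
```

With these made-up positions, a scale of 20 merges neighbouring layers into two groups, while a scale of 40 separates all four.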

Exercises:
- Change the magic parameter `120` until pixels fall cleanly into layers, either 4 layers of tubes or 12 layers of straws.
- Store `layer` as a new coord in `data`.
- Use `sc.groupby(data, group='layer').sum('spectrum')` to group spectra into layers.
- Inspect and understand the HTML view of the result.
- Plot the result.
  There are two options:
  - Use `plot` with `projection='1d'`
  - Use `sc.plot.plot` after collapsing dimensions, `sc.collapse(grouped, keep='tof')`
- Bonus: When grouping by straw layers, there is a different number of straws in the center layer of each tube (3 instead of 2) due to the flower-pattern arrangement of straws.
  Define a helper data array with data set to one for each spectrum, group by layers and sum over spectrum as above, and use this result to normalize the layer-grouped data from above to spectrum count.

In [None]:
data.coords['layer'] = layer
grouped = sc.groupby(data, group='layer').sum('spectrum')
grouped.plot(projection='1d')
sc.plot.plot(sc.collapse(grouped, keep='tof'))

In [None]:
norm = sc.DataArray(data=layer*0+1, coords={'layer':layer})
norm = sc.groupby(norm, group='layer').sum('spectrum')
sc.plot.plot(sc.collapse(grouped/norm, keep='tof'))
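
Conceptually, grouping by layer, summing over spectra, and normalizing by the spectrum count amounts to the following plain-Python bookkeeping. The counts and layer indices are made up; this is a sketch of the idea, not of scipp's implementation.

```python
from collections import defaultdict

# Made-up per-spectrum counts and the layer index of each spectrum.
counts = [10.0, 12.0, 7.0, 9.0, 8.0]
layers = [0, 0, 1, 1, 1]

summed = defaultdict(float)   # total counts per layer
n_spectra = defaultdict(int)  # number of spectra per layer
for count, layer in zip(counts, layers):
    summed[layer] += count
    n_spectra[layer] += 1

# Normalize each layer's total by the number of contributing spectra.
normalized = {layer: summed[layer] / n_spectra[layer] for layer in summed}
print(dict(summed), normalized)
```

Here layer 1 has the larger total only because it contains more spectra; after normalization layer 0 has the higher average.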

In [None]:
#from mantid.kernel import config
#folder = '/folder/with/downloaded/files'
#config.appendDataSearchDir(folder)
#config.saveConfig(config.getUserFilename())

In [None]:
run_number = 49338
data = sc.neutron.load(filename=f'LARMOR000{run_number}')
edges = sc.array(dims=['tof'], unit='us', values=np.linspace(5.0, 100000.0, num=201))
data = sc.rebin(data, 'tof', edges)
data.to_hdf5(filename='loki-at-larmor.hdf5')

# Working with masks

1. Masking a prompt pulse.
   - Create a mask from the `'tof'` coord of `data` to mask the region between X and Y $\mu s$.
   - Plot the result to inspect the mask.
   - Pass a `dict` containing `counts` (computed above as `counts = sc.sum(data, 'tof')`) and the equivalent counts computed *after* masking to `sc.plot.plot`.
     Use this to verify that the prompt-pulse mask results in removal of counts.

- mask X range
- mask TOF, verify counts has changed
- mask based on counts (combined with shape)
- create circular mask
- create function to mask ring 
- mask based on scattering angle
- mask wedge (pick one)
- mask only front layer

Hints:
- `del`
- `<=`, `>`, `sc.less`, `sc.equal`

In [None]:
import scipp as sc
import numpy as np
#data = sc.neutron.load(filename='/mnt/extra/simon/MantidExternalData/MD5/e5c22cf69fdd0d007c29aa51c6537004')
run_number = 49338
data = sc.neutron.load(filename=f'LARMOR000{run_number}')

In [None]:
counts = sc.sum(data, 'tof')
data.masks['tof'] = data.coords['tof']['tof',1:] < 10000.0 * sc.units.us
sc.plot.plot({'orig':counts, 'masked':sc.sum(data,'tof')})
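
The slice `['tof',1:]` used above makes sense if `tof` is a bin-edge coordinate: N bins are delimited by N + 1 edges, so dropping the first edge leaves one upper edge per bin and yields a mask with the same length as the data. A plain-Python sketch with made-up edges and counts:

```python
# Made-up bin edges (microseconds) and counts; N + 1 edges delimit N bins.
edges = [0.0, 5000.0, 10000.0, 15000.0, 20000.0]
counts = [3.0, 4.0, 5.0, 6.0]

# Mask every bin whose upper edge lies below the threshold.
threshold = 10000.0
upper_edges = edges[1:]
mask = [upper < threshold for upper in upper_edges]

# Masked bins are skipped when summing.
total = sum(counts)
masked_total = sum(c for c, m in zip(counts, mask) if not m)
print(mask, total, masked_total)
```

Note that the bin whose upper edge equals the threshold is not masked, because the comparison is strict.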

In [None]:
data.plot()

In [None]:
pos = data.coords['position']
x = sc.geometry.x(pos)
y = sc.geometry.y(pos)
z = sc.geometry.z(pos)
data.coords['x'] = x
data.coords['y'] = y
data.coords['z'] = z
data

In [None]:
data.masks['x'] = x < -0.1 * sc.units.m
del data.masks['x']
data.masks['circle'] = sc.sqrt(x*x + y*y) < 0.1*sc.units.m
del data.masks['circle']
r = sc.sqrt(x*x + y*y)
data.masks['ring'] = (0.14*sc.units.m < r) & (r < 0.15*sc.units.m)
theta = sc.neutron.scattering_angle(data)
data.masks['theta'] = (0.01*sc.units.rad < theta) & (theta < 0.02*sc.units.rad)
phi = sc.atan2(y,x) * ((180.0 * sc.units.deg) / (np.pi * sc.units.rad))
data.masks['wedge'] = (10.0*sc.units.deg < phi) & (phi < 20.0*sc.units.deg)
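
The ring and wedge conditions above can be checked on a few hand-picked points before applying them to the full detector. The following plain-Python sketch uses hypothetical helpers `in_ring` and `in_wedge` and made-up pixel positions, with the same limits as in the cell above:

```python
import math

def in_ring(x, y, r_min, r_max):
    """True if the point (x, y) lies strictly between the radii r_min and r_max."""
    r = math.hypot(x, y)
    return r_min < r < r_max

def in_wedge(x, y, phi_min_deg, phi_max_deg):
    """True if the polar angle of (x, y), in degrees, lies strictly inside the range."""
    phi = math.degrees(math.atan2(y, x))
    return phi_min_deg < phi < phi_max_deg

# Made-up pixel positions in metres.
pixels = [(0.145, 0.0), (0.05, 0.05), (0.10, 0.0268)]
ring_mask = [in_ring(x, y, 0.14, 0.15) for x, y in pixels]
wedge_mask = [in_wedge(x, y, 10.0, 20.0) for x, y in pixels]
print(ring_mask, wedge_mask)
```

Wrapping each condition in a small function makes it easy to reuse the same shape with different limits.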

In [None]:
sc.neutron.instrument_view(sc.sum(data,'tof'))

In [None]:
import scipp as sc
import numpy as np
data = sc.neutron.load(filename='/home/simon/data/TrainingCourseData/EQSANS_6071_event.nxs')

In [None]:
data

In [None]:
49152//256

In [None]:
edges = sc.array(dims=['tof'], unit=sc.units.us, values=np.linspace(-1.0, 17000.0, num=1001))
hist = sc.histogram(data, edges)
ny = 256
nx = 49152 // ny
var = sc.reshape(hist.data, dims=['x','y','tof'],shape=(nx,ny,1000))
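
The reshape above splits the flat spectrum index into two indices. Assuming row-major ordering (the last dimension varies fastest, as in numpy), the mapping can be sketched in plain Python. Whether this ordering matches the physical detector layout is an assumption that should be checked against the instrument definition; `spectrum_to_xy` is a hypothetical helper.

```python
n_spectra = 49152
ny = 256
nx = n_spectra // ny

def spectrum_to_xy(index, ny):
    """Map a flat spectrum index to (x, y), assuming y varies fastest."""
    return divmod(index, ny)

print(nx)
print(spectrum_to_xy(0, ny), spectrum_to_xy(255, ny), spectrum_to_xy(256, ny))
```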

In [None]:
var.plot()

### Basic 1D and 2D plots

Plotting is mostly based on `matplotlib`.
Data structures with named dimensions, units, and coordinates allow for meaningful plots by default:

In [None]:
sample.plot()

Slicing can be used, e.g., to select and plot a single spectrum:

In [None]:
sample['spectrum', 59155].plot()

Plotting multiple spectra on the same plot is also possible by passing a Python `dict` to `scipp.plot.plot`:

In [None]:
from scipp.plot import plot
section = sample['tof', 100:150]
plot({'spec1':section['spectrum',59155],'spec2':section['spectrum',59255]})

### Debugging Detectors with Scipp

The LoKI detectors are tubes containing 7 straws each, and there are multiple layers of tubes.
This makes finding, e.g., broken straws in the instrument view difficult and tedious.
The default 2D representation of data is not adequate in this case:

In [None]:
sample

With scipp we can reshape our data to match this logical layer and sum, e.g., over time-of-flight and pixels within straws.
This yields:

In [None]:
import sys
sys.path.append('/home/simon/code/ess-legacy/sans')
from loki import LoKI
loki = LoKI()
spectrum_counts = sc.sum(sample, 'tof') # sum is optional, could also keep TOF
pixel_counts = loki.to_logical_dims(spectrum_counts) # reshape
pixel_counts

We can plot the counts in each straw by summing along the `'pixel'` dimension:

In [None]:
straw_counts = sc.sum(pixel_counts, 'pixel')
straw_counts.plot(norm='log')

If we instead plot `pixel_counts` without summing along straws, we obtain a plot with a slider along the third dimension.
A profile plot can be enabled as well:

In [None]:
pixel_counts.plot(axes={'x':'straw', 'y':'tube'})

In this case we observe 4 straws with 0 counts as well as 4 straws with very low counts.
We can define a mask for these using a small LoKI-specific helper:

In [None]:
pos = sc.neutron.position(sample)
x = sc.geometry.x(pos)
y = sc.geometry.y(pos)
counts = spectrum_counts.data
sample.masks['electronics-error'] = (sc.abs(x) < 0.2 * sc.units.m) \
                                  & (sc.abs(y) < 0.03 * sc.units.m) \
                                  & (counts == 0.0 * sc.units.counts)
print(f"Masking {sc.sum(sample.masks['electronics-error'], 'spectrum').value} bad pixels due to electronics error.")

In [None]:
# Note that this needs more tuning and masks too much. Better do this after moving detectors?
sample.masks['beam-stop'] = (sc.abs(x) < 0.03 * sc.units.m) & (y < 0.028 * sc.units.m) & (y > -0.016 * sc.units.m)

In [None]:
sample.masks['tube-ends'] = (x > 0.36 * sc.units.m) | (x < -0.36 * sc.units.m)

In [None]:
tof = sample.coords['tof']
sample.masks['prompt-pulse'] = (tof['tof',1:] < 1500.0 * sc.units.us) | \
                               ((tof['tof',:-1] > 17500.0 * sc.units.us) & \
                                (tof['tof',1:] < 19000.0 * sc.units.us))

In [None]:
(sample.masks['tube-ends'] | sample.masks['prompt-pulse']).plot()
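
The two masks combined above depend on different dimensions: `tube-ends` is derived from pixel positions, so it presumably has one value per spectrum, while `prompt-pulse` has one value per time-of-flight bin. Combining them with `|` therefore yields a 2-D mask. A plain-Python sketch of this broadcasting, with made-up mask values:

```python
# Made-up masks: one value per spectrum and one value per time-of-flight bin.
tube_ends = [True, False, False]            # dim 'spectrum'
prompt_pulse = [True, False, False, True]   # dim 'tof'

# The combined mask has one entry per (spectrum, tof) pair.
combined = [[s or t for t in prompt_pulse] for s in tube_ends]
for row in combined:
    print(row)
```

A masked spectrum masks its whole row, and a masked time-of-flight bin masks its whole column.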

In [None]:
loki.to_logical_dims(sample).plot(norm='log', vmin=1e0, vmax=1e2)

In [None]:
pixel_counts = loki.to_logical_dims(sc.sum(sample, 'tof'))
plot(pixel_counts, vmax=1000, axes={'y':'tube', 'x':'pixel'})

## Backup slides

### Straw plot against real X

In [None]:
from loki import LoKI
loki = LoKI()
from scipp.plot import plot
spectrum_counts = sc.sum(sample, 'tof') # sum is optional, could also keep TOF
spectrum_counts.coords['pixel'] = sc.geometry.x(sample.coords['position'])
pixel_counts = loki.to_logical_dims(spectrum_counts) # reshape
plot(pixel_counts, norm='log', axes={'y':'tube', 'x':'pixel'})

In [None]:
#filename = 'PG3_4844_event'
#tmp = sc.neutron.load(filename=f'{filename}.nxs').bins.sum()
filename = '/home/simon/data/TrainingCourseData/SXD23767.raw'
tmp = sc.neutron.load(filename=f'{filename}')
tmp = sc.sum(tmp, 'tof')
tmp.coords['theta'] = sc.neutron.scattering_angle(tmp)
pos = sc.neutron.position(tmp)
x = sc.geometry.x(pos)
y = sc.geometry.y(pos)
tmp.coords['phi'] = sc.atan(y/x) + np.pi * sc.units.rad
theta = sc.Variable(dims=['theta'], unit=sc.units.rad, values=np.linspace(0, np.pi/2, num=100))
phi = sc.Variable(dims=['phi'], unit=sc.units.rad, values=np.linspace(0, 2*np.pi, num=100))
binned = sc.bin(tmp, edges=[theta,phi])
binned.plot(resolution={'x':100,'y':100})
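
One thing to keep in mind with the `phi` coordinate above: `atan(y/x)` only returns angles between $-\pi/2$ and $\pi/2$, so after adding $\pi$ the values cover half of the range spanned by the `phi` bin edges, and points on opposite sides of the origin get the same angle. The plain-Python sketch below compares this with `atan2`, which distinguishes all four quadrants. The helper names and the test points are made up for illustration.

```python
import math

def phi_atan(x, y):
    """Azimuth from atan(y/x) shifted by pi, as a plain-Python stand-in."""
    return math.atan(y / x) + math.pi

def phi_atan2(x, y):
    """Azimuth from atan2, wrapped into [0, 2*pi)."""
    return math.atan2(y, x) % (2 * math.pi)

# Two made-up points on opposite sides of the origin.
for x, y in [(1.0, 1.0), (-1.0, -1.0)]:
    print(f'({x}, {y}): atan -> {phi_atan(x, y):.3f}, atan2 -> {phi_atan2(x, y):.3f}')
```

If a full-circle azimuth is wanted, an `atan2`-based coordinate may be the better choice.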

When built-in surface cuts are not flexible enough, `scipp` features such as `groupby` can be used to quickly extract groups of pixels:

In [None]:
from loki import LoKI
loki = LoKI()
sample.coords['layer'] = loki.layers()
sc.neutron.instrument_view(sc.groupby(sample, 'layer').copy(group=1), norm='log', pixel_size=0.01, bins=1)
#del sample.coords['layer']

Using the same mechanism we can create nearly arbitrary other visualizations of the instrument.
For example, we may want to inspect all pixels with low count rates, e.g., to find issues with detectors.
In this case we mask all pixels with fewer than 100 counts and can check whether we indeed masked all relevant features:

In [None]:
counts = sc.sum(sample.data, 'tof')
sample.coords['low-counts'] = counts < 100.0*sc.units.counts
sc.neutron.instrument_view(sc.sum(sc.groupby(sample, 'low-counts').copy(group=1), 'tof'), pixel_size=0.005)
#del sample.coords['low-counts']