# Dataset in a Nutshell - Part 3: Neutron Data

This is a solution for the final exercise in [Dataset in a Nutshell - Part 3](demo-part3.ipynb).

In [None]:
import numpy as np

import dataset as ds
from dataset import Dim, Coord, Data, Attr

### Converters from Mantid workspaces

For convenience and testing, very basic converters from Mantid's `EventWorkspace` and `Workspace2D` to `Dataset` are available:

In [None]:
import mantid.simpleapi as mantid
import dataset.compat.mantid as mantidcompat
filename = 'PG3_4844_event.nxs'
filename_vanadium = 'PG3_4866_event.nxs'

# Create a list of banks (one dataset per bank)
banks = []
bankids = [124, 144, 164, 184]
for bankid in bankids:
    bank = 'bank{}'.format(bankid)
    sampleWS = mantid.LoadEventNexus(filename, BankName=bank)
    vanadiumWS = mantid.LoadEventNexus(filename_vanadium, BankName=bank)

    d = mantidcompat.to_dataset(sampleWS, name='sample', drop_pulse_times=False)
    d.merge(mantidcompat.to_dataset(vanadiumWS, name='vanadium', drop_pulse_times=True))
    banks.append(d)

# Concatenate all banks into a single dataset
d = banks[0]
for d2 in banks[1:]:
    d = ds.concatenate(d, d2, Dim.Row)

# We add a new coordinate for the new dimension. Strictly speaking this is not necessary,
# but if we want to plot directly it is convenient.
d[Coord.Row] = ([Dim.Row], bankids)
# Concatenation turned the spectrum numbers into a 2D array (Row, Position). The plot
# helper does not support this yet, so we replace it with a 1D stand-in spectrum number
# that restarts at 0 in every bank. 1078 is the number of pixels per bank.
d[Coord.SpectrumNumber] = ([Dim.Position], np.arange(1078))
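
The spectrum-number workaround above can be mirrored with plain NumPy. This is a minimal sketch that assumes four banks of 1078 pixels each with made-up, globally unique spectrum numbers (the real numbers come from the files and are not reproduced here); `np.stack` plays the role of concatenating along the new `Dim.Row`.

```python
import numpy as np

# Hypothetical stand-in: 4 banks with 1078 pixels each. The spectrum numbers are
# made up, but unique across banks (bank i starts at 1 + i * 1078).
n_banks, n_pixels = 4, 1078
per_bank = [np.arange(n_pixels) + 1 + i * n_pixels for i in range(n_banks)]

# Stacking along a new outer axis (the analogue of Dim.Row) turns the
# spectrum-number coordinate into a 2D array of shape (n_banks, n_pixels).
spectrum_2d = np.stack(per_bank)

# 1D stand-in that restarts at 0 in every bank, like np.arange(1078) in the
# notebook; this only works because every bank has the same number of pixels.
fake_spectrum = np.arange(n_pixels)
```

The stand-in gives up uniqueness: the same value now labels one pixel in each bank, so the bank index (`Coord.Row`) is needed to identify a spectrum.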

### Nested data in a dataset: Instrument geometry and event lists

Some of the content of a workspace cannot easily be represented as a plain multi-dimensional array.
Several variables in the dataset obtained from the workspace therefore have the item type `Dataset`:

In [None]:
d

`Coord.ComponentInfo` is similar to `ComponentInfo` in Mantid.
It contains information about the components of the beamline.
For the time being, it only contains the positions of the source and the sample:

In [None]:
d[Coord.ComponentInfo].scalar
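
The source and sample positions are the ingredients for the flight paths that a unit conversion needs. As a hedged NumPy sketch, with hypothetical positions in metres (not read from `d[Coord.ComponentInfo]`) and one hypothetical detector pixel, the primary flight path L1, the secondary flight path L2, and the scattering angle 2θ follow from plain vector geometry:

```python
import numpy as np

# Hypothetical positions in metres; the real ones live in d[Coord.ComponentInfo]
# and the detector positions in the dataset, whose layout we do not reproduce.
source = np.array([0.0, 0.0, -60.0])
sample = np.array([0.0, 0.0, 0.0])
detector = np.array([1.0, 0.0, 1.0])

# Primary (source -> sample) and secondary (sample -> detector) flight paths.
l1 = np.linalg.norm(sample - source)
l2 = np.linalg.norm(detector - sample)

# Scattering angle 2theta: angle between incident and scattered directions.
beam_dir = (sample - source) / l1
scattered_dir = (detector - sample) / l2
two_theta = np.arccos(np.clip(np.dot(beam_dir, scattered_dir), -1.0, 1.0))
```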

*Bonus note 1:
For the most part, the structure of `ComponentInfo` (and `DetectorInfo`) in Mantid is easily represented by a `Dataset`, i.e., very little change is required.
For example, a scan is handled simply by an extra dimension of, e.g., the position and rotation variables.
By using `Dataset` for this, we can use exactly the same tools and do not need to implement or learn a new API.*

`Data.Events` is the equivalent of a vector of `EventList`s in Mantid.
Here, we choose to represent an event list as a `Dataset`:

In [None]:
d[Data.Events, 'sample'].data[500]

In principle, we could also choose other, more efficient event storage formats.
For now, using a dataset as an event list is convenient because it lets us reuse the same functionality, for example sorting and slicing:

In [None]:
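events = d[Data.Events, 'sample'].data[500]
# Sort the events of spectrum 500 by time-of-flight and keep all but the 100 earliest
# and the 100 latest events
d[Data.Events, 'sample'].data[500] = ds.sort(events, Data.Tof)[Dim.Event, 100:-100]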
ds.table(d[Data.Events, 'sample'].data[500])
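
Because the number of events differs from spectrum to spectrum, the event lists are ragged and do not fit a single rectangular array, which is why each one is stored as a nested item. A minimal NumPy sketch of the sort-and-trim step, assuming a hypothetical event list that stores time-of-flight and pulse time as separate arrays (the field names are ours, not the library's): sorting a whole table by one column amounts to computing one ordering and applying it to every column.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical stand-in for Data.Events: one event list per spectrum, each with
# a different number of events, so they cannot share one rectangular array.
event_lists = [
    {"tof": 20000.0 * rng.random(n), "pulse_time": rng.integers(0, 10**9, n)}
    for n in (500, 1200, 350)
]

# Analogue of ds.sort(events, Data.Tof)[Dim.Event, 100:-100] for spectrum 1:
# one ordering computed from the time-of-flight is applied to every field,
# then the 100 earliest and 100 latest events are dropped.
events = event_lists[1]
order = np.argsort(events["tof"], kind="stable")
trimmed = {name: values[order][100:-100] for name, values in events.items()}
```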

### From events to histogram

We sort the events by time-of-flight and histogram them, using bin edges from 1000 to 19950 in steps of 50:

In [None]:
ds.events.sort_by_tof(d)
coord = ds.Variable(Coord.Tof, [Dim.Tof], np.arange(1000.0, 20000.0, 50.0))
binned = ds.histogram(d, coord)
d.merge(binned)
d
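
For a single spectrum, the histogramming step corresponds to `np.histogram` with the same bin edges. The sketch below uses made-up event times; it also shows where the Tof extent 379 in the instrument-view cell comes from (380 edges, hence 379 bins). Note that `np.histogram` drops events outside the outermost edges; we assume, but have not verified, that `ds.histogram` behaves the same way.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
tof = 20000.0 * rng.random(10000)  # hypothetical event times of one spectrum

# Same bin edges as in the notebook: 380 edges, i.e., 379 bins of width 50.
edges = np.arange(1000.0, 20000.0, 50.0)
counts, _ = np.histogram(tof, bins=edges)
```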

In [None]:
# Coord.Row has to be given as an extra axis since the data now has a bank dimension (Dim.Row)
ds.plot(d.subset[Data.Value, 'sample'], axes=[Coord.Row, Coord.SpectrumNumber, Coord.Tof])

### Fake instrument view

Just for fun, we can quickly generate a crude "instrument view".
This works because each bank is a single rectangular panel of 154 x 7 pixels.
The four panels are handled by the extra dimension `Dim.Row`.

In [None]:
panel = ds.Dataset()
# 154 and 7 are the extents of a panel (154 * 7 = 1078 pixels), 379 is the number of Tof bins.
# Dim.Row (extent 4, one entry per bank) is kept as the outermost dimension
panel[Data.Value, 'sample'] = d[Data.Value, 'sample'].reshape([Dim.Row, Dim.X, Dim.Y, Dim.Tof], (4, 154, 7, 379))
panel[Coord.Tof] = d[Coord.Tof]
# The scale is arbitrary; real pixel positions from the instrument geometry could be used instead
panel[Coord.X] = ([Dim.X], np.arange(154))
panel[Coord.Y] = ([Dim.Y], np.arange(7))

# Showing panel 4 (Dim.Row index 3)
ds.plot(panel[Dim.Tof, 180:260][Dim.Row, 3], axes=[Coord.Tof, Coord.Y, Coord.X])
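
The reshape is a pure reinterpretation of the flattened pixel axis. In NumPy terms, with a hypothetical array of the same shape, it looks as follows; the sketch assumes, like the notebook, that the pixel index equals `x * 7 + y` (Y varying fastest), which we have not checked against the actual detector-ID layout of these banks.

```python
import numpy as np

n_banks, nx, ny, n_tof = 4, 154, 7, 379
# Hypothetical histogram with dimensions (Row, Position, Tof).
flat = np.arange(n_banks * nx * ny * n_tof, dtype=float).reshape(n_banks, nx * ny, n_tof)

# Split Position into (X, Y). With row-major (C) order, Y varies fastest,
# i.e., pixel index = x * ny + y.
panel = flat.reshape(n_banks, nx, ny, n_tof)

# Analogue of panel[Dim.Tof, 180:260][Dim.Row, 3]: an (X, Y, Tof) block of panel 4.
view = panel[3, :, :, 180:260]
```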

In [None]:
# Showing panel 3 (Dim.Row index 2); note the different time-of-flight range
ds.plot(panel[Dim.Tof, 260:300][Dim.Row, 2], axes=[Coord.Tof, Coord.Y, Coord.X])

In [None]:
# The event data is no longer needed after histogramming, so drop it
del d[Data.Events, 'sample']
del d[Data.Events, 'vanadium']

### Monitors

Monitors are not handled by the Mantid converter yet, but we can add some fake ones to demonstrate the versatility of `Dataset`.
Storing each monitor as a separate variable that contains a nested dataset gives us complete freedom and flexibility.

In [None]:
# Histogram-mode beam monitor
beam = ds.Dataset()
beam[Coord.Tof] = ([Dim.Tof], np.arange(1001.0))
beam[Data.Value] = ([Dim.Tof], np.random.rand(1000))
# Poisson statistics: the variance of a count equals the count
beam[Data.Variance] = beam[Data.Value]
beam[Data.Value].unit = ds.units.counts
beam[Data.Variance].unit = ds.units.counts * ds.units.counts

# Event-mode transmission monitor
transmission = ds.Dataset()
transmission[Data.Tof] = ([Dim.Event], 20000.0 * np.random.rand(123456))

# Beam profile monitor
profile = ds.Dataset()
profile[Coord.X] = ([Dim.X], np.arange(-0.1, 0.11, 0.01))
profile[Coord.Y] = ([Dim.Y], np.arange(-0.1, 0.11, 0.01))
profile[Data.Value] = ([Dim.Y, Dim.X], np.random.rand(20, 20))
# Add 1 to successively smaller centred squares to obtain a peaked profile
for i in range(1, 5):
    profile[Dim.X, i:-i][Dim.Y, i:-i] += 1.0
profile[Data.Value].unit = ds.units.counts

ds.plot(profile)
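
The fake monitors follow two common conventions: a histogram with N bins has N + 1 bin edges, and for counts the variance is set equal to the value (Poisson statistics). A NumPy mirror of the beam and profile monitors, using a seeded random generator instead of `np.random.rand`, makes the resulting shapes and the peaked profile easy to check:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Beam monitor: 1000 bins need 1001 bin edges, as in the notebook.
edges = np.arange(1001.0)
values = rng.random(1000)
# Poisson convention: variance equals the value.
variances = values.copy()

# Beam profile: uniform noise plus nested squares. Each iteration adds 1 to a
# smaller centred square, so the centre ends up 4 above the untouched border.
profile = rng.random((20, 20))
for i in range(1, 5):
    profile[i:-i, i:-i] += 1.0
```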

In [None]:
# Attach each monitor as a 0-D coordinate holding a nested dataset
d[Coord.Monitor, 'transmission'] = ([], transmission)
d[Coord.Monitor, 'beam'] = ([], beam)
d[Coord.Monitor, 'profile'] = ([], profile)
d

### Exercise 1
Normalize the sample data to the "beam" monitor.

### Solution 1
The binning of the monitor does not match that of the data, so we need to rebin the monitor before the division:

In [None]:
sample_over_beam = d.subset['sample'] / ds.rebin(d[Coord.Monitor, 'beam'].scalar, d[Coord.Tof])
sample_over_beam
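
Rebinning histogrammed counts to new edges is commonly done by assuming the counts are spread uniformly within each original bin. The following sketch implements that idea with a cumulative sum and linear interpolation; `rebin_counts` is a hypothetical helper, and we do not claim that `ds.rebin` uses exactly this algorithm. The monitor, not the data, is rebinned so that the data keeps its own binning.

```python
import numpy as np

def rebin_counts(old_edges, counts, new_edges):
    """Redistribute histogram counts onto new bin edges (hypothetical helper).

    Assumes counts are spread uniformly within each old bin and that
    old_edges is increasing; regions outside the old edges contribute zero.
    """
    # Cumulative counts at every old edge, starting from 0.
    cumulative = np.concatenate(([0.0], np.cumsum(counts)))
    # Interpolate the cumulative curve at the new edges (np.interp clamps
    # outside the old range), then take differences to get counts per new bin.
    return np.diff(np.interp(new_edges, old_edges, cumulative))

monitor_edges = np.arange(0.0, 5.0)      # 4 bins of width 1
monitor = np.array([1.0, 2.0, 3.0, 4.0])
data_edges = np.array([0.5, 2.5, 4.0])   # 2 bins that do not align with the monitor
data = np.array([6.0, 9.0])

monitor_rebinned = rebin_counts(monitor_edges, monitor, data_edges)
normalized = data / monitor_rebinned
```

Here the first new bin collects half of the first monitor bin, all of the second and half of the third (0.5 + 2 + 1.5 = 4).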

### Adding new dimensions

In [None]:
temp_scan = ds.concatenate(d, d * 0.8, Dim.Temperature)
temp_scan = ds.concatenate(temp_scan, temp_scan * 0.64, Dim.Temperature)
temp_scan[Coord.Temperature] = ([Dim.Temperature], [273.0, 180.0, 100.0, 4.3])
temp_scan
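
The two concatenations build four temperature points whose intensities scale as 1, 0.8, 0.64 and 0.512, since the second call appends the whole two-point scan multiplied by 0.64. In NumPy the first call corresponds to `np.stack` (new axis) and the second to `np.concatenate` (existing axis); the sketch uses a hypothetical three-element array in place of `d`:

```python
import numpy as np

d = np.ones(3)  # hypothetical stand-in for the dataset d

# First concatenation creates the temperature axis: factors 1 and 0.8.
scan = np.stack([d, d * 0.8])
# Second concatenation extends it with the scan scaled by 0.64 = 0.8**2,
# giving factors 1, 0.8, 0.64, 0.512.
scan = np.concatenate([scan, scan * 0.64])
temperature = np.array([273.0, 180.0, 100.0, 4.3])
```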

In [None]:
# Coord.Row and Coord.Temperature are both additional axes, so we now get two sliders
ds.plot(temp_scan.subset[Data.Value, 'sample'][Dim.Position, 20:], axes=[Coord.Row, Coord.SpectrumNumber, Coord.Temperature, Coord.Tof])

In [None]:
# Slice out a single pixel (position 500) of panel 4; collapse=Dim.Temperature shows all temperatures in one plot
ds.plot(temp_scan.subset[Data.Value, 'sample'][Dim.Position, 500][Dim.Row, 3], collapse=Dim.Temperature)

### Unit conversion

In [None]:
d = ds.convert(d, Dim.Tof, Dim.DSpacing)
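
The conversion relies on the standard time-of-flight relations for elastic scattering: the de Broglie relation λ = h t / (m_n L), with L = L1 + L2 the total flight path, and Bragg's law d = λ / (2 sin θ). A minimal sketch, assuming (not verified) that `ds.convert` uses these relations per pixel; the flight paths and angles below are hypothetical. Because every pixel has its own L and 2θ, the same TOF edges map to different d-spacing edges for every spectrum, which is the ragged coordinate that the next cell rebins away.

```python
import numpy as np

# Physical constants (CODATA 2018), SI units.
H = 6.62607015e-34             # Planck constant, J s
M_NEUTRON = 1.67492749804e-27  # neutron mass, kg

def tof_to_dspacing(tof_us, l_total_m, two_theta_rad):
    """Convert time-of-flight in microseconds to d-spacing in Angstrom."""
    # de Broglie: wavelength = h * t / (m_n * L)
    wavelength_m = H * (tof_us * 1e-6) / (M_NEUTRON * l_total_m)
    # Bragg: d = wavelength / (2 sin(theta)) with theta = two_theta / 2
    return wavelength_m / (2.0 * np.sin(two_theta_rad / 2.0)) * 1e10

tof_edges = np.arange(1000.0, 20000.0, 50.0)
# Two hypothetical pixels with different flight paths and scattering angles.
d_pixel_a = tof_to_dspacing(tof_edges, 62.0, np.deg2rad(90.0))
d_pixel_b = tof_to_dspacing(tof_edges, 62.5, np.deg2rad(95.0))
```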

In [None]:
# Plotting cannot handle ragged coordinates (different d-spacing edges for every spectrum) at
# this point, so rebin all spectra to the edges of the first spectrum of panel 4.
# Dim.Row must be sliced as well to obtain a 1D axis for rebinning
d = ds.rebin(d, d[Coord.DSpacing][Dim.Position, 0][Dim.Row, 3])

In [None]:
# Three axes, so this is a plot with a slider
ds.plot(d.subset[Data.Value, 'sample'], axes=[Coord.Row, Coord.SpectrumNumber, Coord.DSpacing])

In [None]:
# The second plot is in a separate cell since two plots with sliders in the same cell do not work currently
ds.plot(d.subset[Data.Value, 'vanadium'], axes=[Coord.Row, Coord.SpectrumNumber, Coord.DSpacing])

### Summing and normalizing

In [None]:
# Summing over Dim.Position yields one focused spectrum per bank
summed = ds.sum(d, Dim.Position)
# Loop over the banks and slice to obtain separate plots. Otherwise we would get
# a 2D plot, which is not very useful in this case.
for bank in range(len(bankids)):
    ds.plot(summed[Dim.Row, bank])

In [None]:
normalized = summed.subset['sample'] / summed.subset['vanadium']
# The `collapse` option shows all banks in a single plot
ds.plot(normalized, collapse=Dim.Row)
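
Summing over the pixel dimension and then dividing by the vanadium has a simple array analogue. The sketch uses hypothetical Poisson-distributed counts with dimensions (Row, Position, DSpacing) and assumes all pixels already share the same bin edges, as ensured by the rebin above; without that, bin i of different pixels would cover different d-spacing ranges and summing them would be meaningless.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_banks, n_pixels, n_bins = 4, 1078, 50
# Hypothetical histograms with dimensions (Row, Position, DSpacing).
sample = rng.poisson(5.0, size=(n_banks, n_pixels, n_bins)).astype(float)
vanadium = rng.poisson(50.0, size=(n_banks, n_pixels, n_bins)).astype(float)

# Analogue of ds.sum(d, Dim.Position): one focused spectrum per bank.
summed_sample = sample.sum(axis=1)
summed_vanadium = vanadium.sum(axis=1)

# Analogue of summed.subset['sample'] / summed.subset['vanadium'].
normalized = summed_sample / summed_vanadium
```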

### Exercise 2 (advanced)

Instead of loading only a single bank, load multiple, e.g., `bank124`, `bank144`, `bank164`, and `bank184`.
Modify everything in this notebook to work with the new multi-bank data, obtaining a separate focussed diffraction spectrum for each bank.

There is more than one option to solve this:
1. Concatenate the loaded data into a single dataset, resulting in more or larger dimensions.
2. Merge the loaded data into a single dataset, resulting in differently named variables for each bank.
3. Call the existing code as-is for each bank, e.g., by looping over a Python `list` of datasets.

Each of the approaches has its advantages and drawbacks.

Here we recommend option 1, which in itself can be implemented in one of two ways:
- Concatenate along a new dimension (`Dim.Bank` and `Coord.Bank` are not currently supported; use, e.g., `Dim.Row` instead).
- Concatenate along the existing dimension `Dim.Position`.
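
The difference between the two ways of concatenating, and the helper coordinate mentioned in the note below, can be illustrated with NumPy. The spectrum numbers here are hypothetical and deliberately non-contiguous between banks, to show where the large gaps in a plot would come from:

```python
import numpy as np

n_pixels = 1078
# Hypothetical, non-contiguous spectrum numbers for two banks.
bank_a = np.arange(n_pixels) + 1
bank_b = np.arange(n_pixels) + 50001

# First way: a new outer dimension (like Dim.Row) gives a 2D coordinate.
stacked = np.stack([bank_a, bank_b])             # shape (2, 1078)

# Second way: extend the existing Position dimension.
concatenated = np.concatenate([bank_a, bank_b])  # shape (2156,)

# Plotting against these spectrum numbers leaves a large gap between the
# banks; a contiguous helper coordinate avoids it.
helper = np.arange(concatenated.size)
```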

*Note: You will likely experience some minor problems with plotting, in particular with multi-dimensional coordinates in the first case (we suggest slicing manually until this is supported), and with large gaps in the second case (these can be avoided by adding a helper coordinate).*

*Bonus note for option 3: Unlike Mantid workspaces, datasets can safely be used in combination with Python containers. Do not try this with workspaces, since they are entangled with the `AnalysisDataService`.*