# Scaling

## MTZ IO

``ess.nmx`` provides ``MTZ`` IO helper functions.
They can be used as providers in a scaling workflow.

They wrap the ``MTZ`` IO functions of ``gemmi``.

In [None]:
from ess.nmx.mtz_io import read_mtz_file, mtz_to_pandas, MTZFilePath
from ess.nmx.data import get_small_mtz_samples


small_mtz_sample = get_small_mtz_samples()[0]
mtz = read_mtz_file(MTZFilePath(small_mtz_sample))
df = mtz_to_pandas(mtz)
df.head()

## Build Pipeline

The scaling routine consists of:
- Reducing each individual MTZ dataset
- Merging the MTZ datasets
- Reducing the merged MTZ dataset

These operations are performed on pandas dataframes, as recommended by ``gemmi``.
Because multiple MTZ files are expected, the file paths are supplied through a ``sciline.ParamTable``.
<!--TODO: Update it to use cyclebane instead of ParamTable if needed.-->
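The three steps listed above can be pictured with plain pandas. The following is only a minimal sketch and not the ``ess.nmx`` implementation: the column names (``H``, ``K``, ``L``, ``I``, ``SIGI``, ``I_div_SIGI``), the helper functions, and the specific reductions (adding a signal-to-noise column, averaging repeated reflections) are all assumptions made for illustration.

```python
import pandas as pd

# Two toy per-file tables standing in for MTZ files already converted to
# dataframes. Column names and values are illustrative assumptions.
file_a = pd.DataFrame(
    {"H": [1, 2], "K": [0, 1], "L": [0, 0], "I": [10.0, 4.0], "SIGI": [2.0, 1.0]}
)
file_b = pd.DataFrame(
    {"H": [1, 3], "K": [0, 1], "L": [0, 2], "I": [12.0, 9.0], "SIGI": [3.0, 3.0]}
)


def reduce_single(df: pd.DataFrame) -> pd.DataFrame:
    """Per-file step (assumed): add a derived signal-to-noise column."""
    out = df.copy()
    out["I_div_SIGI"] = out["I"] / out["SIGI"]
    return out


def merge(dfs: list) -> pd.DataFrame:
    """Merge step: stack the per-file tables into one dataframe."""
    return pd.concat(dfs, ignore_index=True)


def reduce_merged(df: pd.DataFrame) -> pd.DataFrame:
    """Merged step (assumed): average reflections sharing the same (H, K, L)."""
    return df.groupby(["H", "K", "L"], as_index=False)[["I", "I_div_SIGI"]].mean()


merged = merge([reduce_single(file_a), reduce_single(file_b)])
summary = reduce_merged(merged)
print(summary)
```

Each step takes and returns a dataframe, which is what lets such functions act as separate providers in a pipeline, with one per-file branch for every row of the parameter table.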

In [None]:
import sciline as sl
import scipp as sc

from ess.nmx.mtz_io import mtz_io_providers, mtz_io_params
from ess.nmx.mtz_io import MTZFileIndex, SpaceGroupDesc
from ess.nmx.scaling import scaling_providers, scaling_params
from ess.nmx.scaling import (
    WavelengthBinSize,
    FilteredEstimatedScaledIntensities,
    ReferenceWavelength,
    WavelengthBinCutProportion,
    NRoot,
    NRootStdDevCut,
)

pl = sl.Pipeline(
    providers=mtz_io_providers + scaling_providers,
    params={
        SpaceGroupDesc: "C 1 2 1",
        WavelengthBinSize: 500,
        ReferenceWavelength: sc.scalar(
            3, unit=sc.units.angstrom
        ),  # Remove it if you want to use the middle of the bin
        WavelengthBinCutProportion: 0.25,  # 0 < proportion < 0.5
        NRoot: 4,  # Increase this value to effectively remove more outliers on the right tail
        NRootStdDevCut: 1.0,  # Lower this value to remove more outliers
        **mtz_io_params,
        **scaling_params,
    },
)

file_path_table = sl.ParamTable(
    row_dim=MTZFileIndex, columns={MTZFilePath: get_small_mtz_samples()}
)

pl.set_param_table(file_path_table)
pl

## Build Workflow

In [None]:
scaling_nmx_workflow = pl.get(FilteredEstimatedScaledIntensities)
scaling_nmx_workflow.visualize(graph_attr={"rankdir": "LR"})

## Compute Desired Type
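The result computed in this section has passed an outlier filter controlled by ``NRoot`` and ``NRootStdDevCut``. Going only by the parameter comments in the pipeline cell, one plausible reading is that the n-th root of each value is taken and values whose root lies too far from the mean root are dropped. The sketch below illustrates that reading with the standard library only; the function ``cut_tails_sketch`` and the toy numbers are hypothetical, and the actual ``ess.nmx`` logic may differ.

```python
import statistics


def cut_tails_sketch(values, n_root=4, std_dev_cut=1.0):
    """Keep values whose n-th root is within std_dev_cut standard deviations
    of the mean n-th root. Assumes all values are non-negative."""
    roots = [v ** (1.0 / n_root) for v in values]
    mean = statistics.fmean(roots)
    std = statistics.pstdev(roots)  # population standard deviation
    half_window = std_dev_cut * std
    return [v for v, r in zip(values, roots) if abs(r - mean) <= half_window]


# Toy intensities with one large value on the right tail.
intensities = [1.0, 16.0, 16.0, 16.0, 81.0, 10000.0]

kept_default = cut_tails_sketch(intensities, n_root=4, std_dev_cut=1.0)
kept_strict = cut_tails_sketch(intensities, n_root=4, std_dev_cut=0.5)
print(kept_default)
print(kept_strict)
```

In this toy example the default cut drops only the largest value, while halving the cut also drops the smallest one, which matches the comment that lowering ``NRootStdDevCut`` removes more outliers.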

In [None]:
scaling_nmx_workflow.compute(FilteredEstimatedScaledIntensities)