# MTZ Data Types  

MTZ files use column types to specify what type of crystallographic data is contained within a given column (see [MTZ specification](https://www.ccp4.ac.uk/html/mtzformat.html#column-types)). This enables columns to have arbitrary names while ensuring that the column values are interpreted correctly.  

In order to ensure that MTZ data types behave as expected in ``rs.DataSet`` objects, we have implemented a set of custom ``pandas`` dtypes to represent the crystallographic data found in MTZ files. This facilitates MTZ file I/O, and makes it possible to write methods that operate only on expected types of crystallographic data. 

In [1]:
import reciprocalspaceship as rs
import numpy as np
from IPython.display import HTML

---
### Supported MTZ data types

The following MTZ dtypes are available for `rs.DataSet` and `rs.DataSeries` objects:

In [2]:
df = rs.summarize_mtz_dtypes(print_summary=False)
HTML(df.to_html(index=False))

| MTZ Code | Name | Class | Internal |
|---|---|---|---|
| D | AnomalousDifference | AnomalousDifferenceDtype | float32 |
| B | Batch | BatchDtype | int32 |
| K | FriedelIntensity | FriedelIntensityDtype | float32 |
| G | FriedelSFAmplitude | FriedelStructureFactorAmplitudeDtype | float32 |
| H | HKL | HKLIndexDtype | int32 |
| A | HendricksonLattman | HendricksonLattmanDtype | float32 |
| J | Intensity | IntensityDtype | float32 |
| I | MTZInt | MTZIntDtype | int32 |
| R | MTZReal | MTZRealDtype | float32 |
| Y | M/ISYM | M_IsymDtype | int32 |


Internally, these are all stored as `numpy` arrays of 32-bit integers or floats, because MTZ files hold only 32-bit values. Other data types can still be stored in an `rs.DataSet` column or `rs.DataSeries`, but only MTZ dtypes can be written out to an MTZ file.
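
To make the 32-bit limit concrete for the integer-backed dtypes (`HKL`, `Batch`, `MTZInt`, `M/ISYM`), here is a minimal sketch that flags values outside the signed 32-bit range. The helpers `fits_int32` and `out_of_range` are hypothetical names used only for illustration; they are not part of `reciprocalspaceship`, and the sketch does not show how the library itself treats out-of-range values.

```python
# Bounds of a signed 32-bit integer, the storage width listed above
# for the integer-backed MTZ dtypes.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_int32(value):
    """Hypothetical helper: True if `value` fits in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def out_of_range(values):
    """Hypothetical helper: return the values that would not fit in int32."""
    return [v for v in values if not fits_int32(v)]


# Typical Miller indices and batch numbers are far inside the range;
# only the last value exceeds it.
print(out_of_range([0, 120, INT32_MAX, INT32_MAX + 1]))  # [2147483648]
```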

---
### Specifying MTZ data types

It is possible to specify a dtype using the MTZ Code, Name, or Class from the above table:

In [3]:
data1 = rs.DataSeries([0, 1, 2], dtype="J")
data1

0   0.0
1   1.0
2   2.0
dtype: Intensity

In [4]:
data2 = rs.DataSeries([0, 1, 2], dtype="Intensity")
data2

0   0.0
1   1.0
2   2.0
dtype: Intensity

In [5]:
data3 = rs.DataSeries([0, 1, 2], dtype=rs.IntensityDtype())
data3

0   0.0
1   1.0
2   2.0
dtype: Intensity

If you already have an `rs.DataSeries`, it is possible to change it to an MTZ dtype:

In [6]:
data4 = rs.DataSeries([0, 1, 2], dtype=np.int64)
data4.astype("Intensity")

0   0.0
1   1.0
2   2.0
dtype: Intensity

In the example above, the `np.int64` array was converted into an array of `float32` values because that is the internal storage type for the `rs.IntensityDtype`.
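
A cast to a `float32`-backed dtype is lossless for small values like these, but not for every number. The sketch below uses only the standard-library `struct` module to round a number to the nearest 32-bit float. `as_float32` is a hypothetical helper written for this illustration and is not how `reciprocalspaceship` performs the conversion; it only shows the general properties of 32-bit floats.

```python
import struct


def as_float32(value):
    """Hypothetical helper: round `value` to the nearest IEEE 754 32-bit float.

    Packing with the "f" format stores the number in 4 bytes; unpacking
    returns the stored value as an ordinary Python float.
    """
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Integers up to 2**24 are represented exactly in float32.
print(as_float32(2**24) == 2**24)          # True

# 2**24 + 1 needs 25 significant bits, so it is rounded to a neighbor.
print(as_float32(2**24 + 1) == 2**24 + 1)  # False

# Most decimal fractions are also rounded when narrowed to 32 bits.
print(as_float32(0.1) == 0.1)              # False
```

For typical intensities this rounding is far below the measurement error, but it is worth remembering when comparing values before and after a round trip through an MTZ dtype.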

---
### Inferring MTZ data types

If data is read directly from an MTZ file, the proper MTZ data types will be set automatically. However, in order to facilitate working with MTZ dtypes, there is also support for inferring proper dtypes based on the underlying data and name of an `rs.DataSeries` or the columns of an `rs.DataSet`. Inferring the proper dtype is not always possible, but these functions are written to work for most common column names in MTZ files. If you come across common cases that do not seem to be supported, please feel free to file an [issue on GitHub](https://github.com/rs-station/reciprocalspaceship/issues).

Inferring dtype for `DataSeries`:

In [7]:
data = rs.DataSeries([0, 1, 2], name="SigI")
print(data)

0    0
1    1
2    2
Name: SigI, dtype: int64


In [8]:
data.infer_mtz_dtype()

0   0.0
1   1.0
2   2.0
Name: SigI, dtype: Stddev

It is also possible to infer the dtype for all of the columns in a `DataSet`. To illustrate this, we will read in an MTZ file, set all of the columns to the `object` dtype, and infer the correct dtypes: 

In [9]:
dataset = rs.read_mtz("../examples/data/HEWL_SSAD_24IDC.mtz")
dataset.dtypes

FreeR_flag              MTZInt
IMEAN                Intensity
SIGIMEAN                Stddev
I(+)          FriedelIntensity
SIGI(+)         StddevFriedelI
I(-)          FriedelIntensity
SIGI(-)         StddevFriedelI
N(+)                    MTZInt
N(-)                    MTZInt
dtype: object

In [10]:
dataset = dataset.astype(object)
dataset.dtypes

FreeR_flag    object
IMEAN         object
SIGIMEAN      object
I(+)          object
SIGI(+)       object
I(-)          object
SIGI(-)       object
N(+)          object
N(-)          object
dtype: object

In [11]:
dataset.infer_mtz_dtypes(inplace=True)
dataset.dtypes

FreeR_flag              MTZInt
IMEAN                Intensity
SIGIMEAN                Stddev
I(+)          FriedelIntensity
SIGI(+)         StddevFriedelI
I(-)          FriedelIntensity
SIGI(-)         StddevFriedelI
N(+)                    MTZInt
N(-)                    MTZInt
dtype: object
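
To give a feel for what name-based inference involves, the toy heuristic below reproduces the mapping shown in the outputs above for this particular file. It is an illustrative sketch only: `guess_mtz_dtype_name` and its rules are hypothetical, cover just the column names that appear on this page, and are not the rules `reciprocalspaceship` actually applies (which also take the underlying data into account).

```python
def guess_mtz_dtype_name(column_name):
    """Hypothetical toy heuristic: guess an MTZ dtype name from a column name.

    Returns None when no rule matches, mirroring the fact that inference
    is not always possible.
    """
    name = column_name.upper()
    # Columns ending in (+) or (-) hold one member of a Friedel pair.
    is_friedel = name.endswith("(+)") or name.endswith("(-)")

    if name.startswith("SIG"):
        # Uncertainty columns; this toy version only knows about intensities.
        return "StddevFriedelI" if is_friedel else "Stddev"
    if name.startswith("I"):
        return "FriedelIntensity" if is_friedel else "Intensity"
    if name.startswith("N(") or name.startswith("FREE"):
        # Observation counts and free-R flags are integer-valued.
        return "MTZInt"
    return None


for col in ["FreeR_flag", "IMEAN", "SIGIMEAN", "I(+)", "SIGI(+)", "N(-)", "Temp"]:
    print(col, "->", guess_mtz_dtype_name(col))
```

The final column, `Temp`, matches no rule and yields `None`, which is the kind of case where an explicit `astype()` call is the safer choice.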

---
### Switching between Friedel and non-Friedel data types

Several MTZ data types are specific to anomalous data pertaining to Friedel pairs. For applicable `rs.DataSeries` objects, it is possible to switch between the Friedel and non-Friedel versions of a data type. For data types without a Friedel equivalent, the `rs.DataSeries` is returned unchanged:

In [12]:
# Has Friedel-equivalent
rs.DataSeries([0, 1, 2], dtype="Intensity").to_friedel_dtype()

0   0.0
1   1.0
2   2.0
dtype: FriedelIntensity

In [13]:
# Has non-Friedel equivalent
rs.DataSeries([0, 1, 2], dtype="FriedelIntensity").from_friedel_dtype()

0   0.0
1   1.0
2   2.0
dtype: Intensity

In [14]:
# No Friedel-equivalent
rs.DataSeries([0, 1, 2], dtype="MTZInt").to_friedel_dtype()

0   0
1   1
2   2
dtype: MTZInt
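
The "convert if there is an equivalent, otherwise return unchanged" behavior can be pictured as a lookup with a fallback. The sketch below works on dtype names only; the dictionaries and both functions are hypothetical and are not the library's implementation. The `Intensity` pairing comes from the examples above, while the `SFAmplitude` name for the non-Friedel amplitude dtype is an assumption.

```python
# Hypothetical lookup table from non-Friedel to Friedel dtype names.
TO_FRIEDEL = {
    "Intensity": "FriedelIntensity",      # pairing shown in the examples above
    "SFAmplitude": "FriedelSFAmplitude",  # assumed name of the non-Friedel dtype
}
# Reverse table for the opposite direction.
FROM_FRIEDEL = {friedel: plain for plain, friedel in TO_FRIEDEL.items()}


def to_friedel_name(dtype_name):
    """Return the Friedel equivalent, or the input name if there is none."""
    return TO_FRIEDEL.get(dtype_name, dtype_name)


def from_friedel_name(dtype_name):
    """Return the non-Friedel equivalent, or the input name if there is none."""
    return FROM_FRIEDEL.get(dtype_name, dtype_name)


print(to_friedel_name("Intensity"))           # FriedelIntensity
print(from_friedel_name("FriedelIntensity"))  # Intensity
print(to_friedel_name("MTZInt"))              # MTZInt (no equivalent, unchanged)
```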

---
### Writing out MTZ files

Any data that will be written out to an MTZ file must have an MTZ data type.

In [15]:
dataset.dtypes

FreeR_flag              MTZInt
IMEAN                Intensity
SIGIMEAN                Stddev
I(+)          FriedelIntensity
SIGI(+)         StddevFriedelI
I(-)          FriedelIntensity
SIGI(-)         StddevFriedelI
N(+)                    MTZInt
N(-)                    MTZInt
dtype: object

In [16]:
dataset.write_mtz("temp.mtz")

If there is a non-MTZ dtype in a `DataSet`, `DataSet.write_mtz()` will raise a `ValueError`.

In [17]:
dataset["Temp"] = "string"
dataset.dtypes

FreeR_flag              MTZInt
IMEAN                Intensity
SIGIMEAN                Stddev
I(+)          FriedelIntensity
SIGI(+)         StddevFriedelI
I(-)          FriedelIntensity
SIGI(-)         StddevFriedelI
N(+)                    MTZInt
N(-)                    MTZInt
Temp                    object
dtype: object

In [18]:
dataset.write_mtz("temp.mtz")

ValueError: column Temp of type object cannot be written to an MTZ file. To skip columns without explicit MTZ dtypes, set skip_problem_mtztypes=True

As the error message states, it is still possible to write out the MTZ by setting `skip_problem_mtztypes=True`. This will skip any columns with non-MTZ data types:

In [19]:
dataset.write_mtz("temp.mtz", skip_problem_mtztypes=True)
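
Before writing, it can be useful to see which columns would be skipped. The sketch below partitions a plain dictionary of column names and dtype names, modeled on the `dataset.dtypes` output above. `split_writable` is a hypothetical helper, and `MTZ_DTYPE_NAMES` contains only the dtype names that appear on this page, so it may be incomplete; neither is part of `reciprocalspaceship`.

```python
# MTZ dtype names that appear on this page (assumed incomplete).
MTZ_DTYPE_NAMES = {
    "AnomalousDifference", "Batch", "FriedelIntensity", "FriedelSFAmplitude",
    "HKL", "HendricksonLattman", "Intensity", "MTZInt", "MTZReal", "M/ISYM",
    "Stddev", "StddevFriedelI",
}


def split_writable(column_dtypes):
    """Hypothetical helper: split columns into (writable, skipped) lists.

    `column_dtypes` maps column name -> dtype name. Columns whose dtype
    name is not a known MTZ dtype are the ones that would be skipped.
    """
    writable, skipped = [], []
    for column, dtype_name in column_dtypes.items():
        target = writable if dtype_name in MTZ_DTYPE_NAMES else skipped
        target.append(column)
    return writable, skipped


# Column/dtype pairs copied from the output above, including the
# string-valued "Temp" column.
columns = {
    "FreeR_flag": "MTZInt",
    "IMEAN": "Intensity",
    "SIGIMEAN": "Stddev",
    "I(+)": "FriedelIntensity",
    "SIGI(+)": "StddevFriedelI",
    "Temp": "object",
}
writable, skipped = split_writable(columns)
print(skipped)  # ['Temp']
```

With a real `rs.DataSet`, a similar mapping could be built from `dataset.dtypes` by converting each dtype to its string name.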