# Reading Data

In [None]:
from magentropy import MagentroData

## Constructor

Data must be read when a {{ MagentroData }} object is instantiated via the constructor. Supported
input formats are Quantum Design `.dat` data files (default), delimited files, and {{ DataFrame }}s.

### QD data files

The default arguments are configured for QD `.dat` files. These files are expected to consist of:

1. A header section with metadata. In particular, the sample mass should be given as
`INFO,<sample_mass>,SAMPLE_MASS`, where `<sample_mass>` is replaced by a decimal number.

2. A `\n[Data]\n` tag separating the header section from the data. (Here, `\n` indicates a newline.)

3. The delimited data. The default separator is `','`.

4. Data columns with names `'Comment'`, `'Temperature (K)'`, `'Magnetic Field (Oe)'`,
`'Moment (emu)'`, and `'M. Std. Err. (emu)'`.

In [None]:
magdata_dat = MagentroData('magdata.dat')
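
To make the expected file layout concrete, here is a minimal standard-library sketch that splits QD-style text into a sample mass and data rows. It is not how {{ MagentroData }} reads files internally; the helper name `split_qd_dat` and the sample contents (including the `[Header]` and `TITLE` lines) are made up for illustration.

```python
import csv
import io

# Hypothetical file contents following the layout described above.
SAMPLE_DAT = """[Header]
TITLE,Example measurement
INFO,12.5,SAMPLE_MASS
[Data]
Comment,Temperature (K),Magnetic Field (Oe),Moment (emu),M. Std. Err. (emu)
,2.0,100.0,0.0012,0.00001
,4.0,100.0,0.0010,0.00001
"""


def split_qd_dat(text, sep=","):
    """Split QD-style text into (sample_mass, list of row dicts)."""
    # The '\n[Data]\n' tag separates the header from the delimited data.
    header, _, data = text.partition("\n[Data]\n")
    if not data:
        raise ValueError("no '[Data]' tag found")

    # Look for a header line of the form INFO,<sample_mass>,SAMPLE_MASS.
    sample_mass = None
    for line in header.splitlines():
        parts = line.split(",")
        if len(parts) == 3 and parts[0] == "INFO" and parts[2] == "SAMPLE_MASS":
            sample_mass = float(parts[1])

    # The first data line holds the column names; values stay as strings.
    rows = list(csv.DictReader(io.StringIO(data), delimiter=sep))
    return sample_mass, rows


mass, rows = split_qd_dat(SAMPLE_DAT)
print(mass, len(rows), rows[0]["Temperature (K)"])
```

An empty `Comment` field, as in both rows here, corresponds to a measurement without a comment.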

```{tip}
If the column names, delimiter, or sample mass format are different, these can be set manually as
described [below](#delimited-files). The `**read_csv_kwargs` keyword arguments will also be applied
to the delimited data in `.dat` files.
```

### Delimited files

A delimited input file may be indicated by passing `qd_dat=False` to the constructor.
Column names can also be specified, and the comment or error column may be absent.
Here, the comment column is excluded.

In [None]:
magdata_csv = MagentroData(
    'magdata.csv', qd_dat=False,
    comment_col=None, T='T', H='H', M='M', M_err='M_err'
)

Since the sample mass is not present in delimited files, it is set to the default of 1.0.
This can be changed after instantiation (per-mass columns are updated accordingly):

In [None]:
print(magdata_csv.sample_mass)
magdata_csv.sample_mass = 0.1
print(magdata_csv.sample_mass)
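
The idea that per-mass values follow the sample mass can be sketched with a property setter. The class below is a toy stand-in, not part of `magentropy`; its name and attributes are hypothetical, and it assumes only that a per-mass value is the moment divided by the mass.

```python
class MassScaledMoments:
    """Toy stand-in (hypothetical, not part of magentropy) for per-mass updates."""

    def __init__(self, moments, sample_mass=1.0):
        self._moments = list(moments)
        self.sample_mass = sample_mass  # goes through the setter below

    @property
    def sample_mass(self):
        return self._sample_mass

    @sample_mass.setter
    def sample_mass(self, value):
        self._sample_mass = float(value)
        # Recompute the derived per-mass values whenever the mass changes.
        self._moments_per_mass = [m / self._sample_mass for m in self._moments]

    @property
    def moments_per_mass(self):
        # Return a copy so callers cannot modify the internal list.
        return list(self._moments_per_mass)


toy = MassScaledMoments([0.5, 1.0])
print(toy.moments_per_mass)  # per-mass values with the default mass of 1.0
toy.sample_mass = 0.5
print(toy.moments_per_mass)  # values are rescaled after the mass changes
```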

The mass can also be provided in the constructor itself:

In [None]:
magdata_csv = MagentroData(
    'magdata.csv', qd_dat=False,
    comment_col=None, T='T', H='H', M='M', M_err='M_err',
    sample_mass=0.1
)
magdata_csv.sample_mass

```{note}
It is not strictly necessary to set `qd_dat = False`. The delimited data will be read correctly,
though a warning will be printed, and of course the sample mass must still be set manually.
```

```{tip}
Delimited data is read using {func}`pandas.read_csv`. Keyword arguments can be passed to
{func}`pandas.read_csv` as additional keyword arguments to the constructor.
For example, if the file is tab-delimited, `magdata_tab = MagentroData(..., sep='\t')`.
These will be ignored if the input is a {{ DataFrame }}, described in the [next section](#dataframes).
```

### DataFrames

If data is in a {{ DataFrame }}, perhaps because preprocessing was required, the procedure
is exactly the same as for [delimited files](#delimited-files). Here, we create a new
{{ MagentroData }} instance using the raw data from `magdata_csv`:

In [None]:
magdata_df = MagentroData(
    magdata_csv.raw_df,
    comment_col=None, T='T', H='H', M='M', M_err='M_err',
    sample_mass=0.1
)

The column labels and sample mass are specified as before. The `qd_dat` parameter is ignored
because a {{ DataFrame }} is detected, so it need not be included.

## Missing values

If a comment column label is supplied, any row in which the comment column has a non-`NaN` value is
dropped. That is, rows with comments are removed, since comments in QD `.dat` output files indicate
measurement problems.

Additionally, any row containing a missing value in the temperature, field, or moment column is
dropped. If a moment error column is supplied, any row with a missing value or a value equal to
zero in the error column will be dropped.
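
The rules above can be sketched in plain Python as a row filter. This illustrates the logic only and is not the library's implementation; the helper names and the column labels `'Comment'`, `'T'`, `'H'`, `'M'`, and `'M_err'` are placeholders.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def keep_row(row, comment_col="Comment", err_col="M_err"):
    """Return True if a row survives the filtering rules described above."""
    # Rule 1: drop rows that carry a comment (only if a comment column is given).
    if comment_col is not None and not is_missing(row.get(comment_col)):
        return False
    # Rule 2: drop rows missing temperature, field, or moment.
    if any(is_missing(row.get(col)) for col in ("T", "H", "M")):
        return False
    # Rule 3: drop rows whose error is missing or zero (only if an error column is given).
    if err_col is not None:
        err = row.get(err_col)
        if is_missing(err) or err == 0:
            return False
    return True


rows = [
    {"Comment": None, "T": 2.0, "H": 100.0, "M": 1.2e-3, "M_err": 1e-5},     # kept
    {"Comment": "bad", "T": 4.0, "H": 100.0, "M": 1.0e-3, "M_err": 1e-5},    # has a comment
    {"Comment": None, "T": math.nan, "H": 100.0, "M": 9e-4, "M_err": 1e-5},  # missing T
    {"Comment": None, "T": 8.0, "H": 100.0, "M": 8e-4, "M_err": 0.0},        # zero error
]
kept = [row for row in rows if keep_row(row)]
print(len(kept))
```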

## Viewing data

Raw, converted (SI units), and processed (smoothed) data are available through the
attributes {{ raw_df }}, {{ converted_df }}, and {{ processed_df }}. For example:

In [None]:
magdata_dat.raw_df

Each {{ DataFrame }} attribute contains columns corresponding to temperature, magnetic field strength,
moment, moment error, moment per mass, moment per mass error, moment derivative with respect to
temperature, and magnetic entropy.

Units can be viewed as a second header level by appending `_with_units` to any of these attributes.

In [None]:
magdata_dat.raw_df_with_units

The same applies to the sample mass:

In [None]:
magdata_dat.sample_mass

In [None]:
magdata_dat.sample_mass_with_units

````{tip}
All {{ DataFrame }} attributes are immutable and return copies of the internal instance attributes.
If repeated access is required, for example to a {{ DataFrame }}'s columns, it is best to first save
the {{ DataFrame }} as a local variable to avoid repeatedly copying large amounts of data.

Don't do this:
```python
col_means = [magdata.raw_df['T'].mean(), magdata.raw_df['H'].mean(), magdata.raw_df['M'].mean()]
```
Instead, do this:
```python
raw_df = magdata.raw_df
col_means = [raw_df['T'].mean(), raw_df['H'].mean(), raw_df['M'].mean()]
```
````

## Simulating data

The class method {{ sim_data }} can be used to generate data for testing and examples.
A decreasing logistic function with a Gaussian "bump" whose center depends on the field strength
is used to "simulate" noisy data. The quotation marks indicate that the function has no
physical significance.

The following code returns a {{ DataFrame }} with columns `'T'`, `'H'`, `'M'`, and `'M_err'`.
This is the same data found in the `magdata.dat` and `magdata.csv` files used in
these examples.

In [None]:
import numpy as np

sim_df = MagentroData.sim_data(
    temps=np.linspace(1., 100., 100),
    fields=np.linspace(20., 100., 5),
    sigma_m=5e-5,
    random_seed=0
)
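
For intuition about the functional form described above, here is a rough standard-library sketch of a decreasing logistic curve with a field-dependent Gaussian bump and added Gaussian noise. It does not reproduce the output of {{ sim_data }}; every function name, parameter, and default value is hypothetical, and the linear dependence of the bump center on the field is an assumption.

```python
import math
import random


def toy_moment(temp, field, t_mid=50.0, width=10.0, bump_height=0.2, bump_width=5.0):
    """Decreasing logistic in temperature plus a field-dependent Gaussian bump.

    All parameter names and default values are hypothetical.
    """
    logistic = 1.0 / (1.0 + math.exp((temp - t_mid) / width))
    # Assumed for illustration: the bump center shifts linearly with the field.
    center = 0.5 * field
    bump = bump_height * math.exp(-((temp - center) ** 2) / (2.0 * bump_width ** 2))
    return logistic + bump


def simulate(temps, fields, sigma_m=5e-5, seed=0):
    """Return rows of (T, H, M, M_err) with Gaussian noise added to M."""
    rng = random.Random(seed)
    return [
        (t, h, toy_moment(t, h) + rng.gauss(0.0, sigma_m), sigma_m)
        for h in fields
        for t in temps
    ]


temps = [1.0 + i for i in range(100)]      # same grid as np.linspace(1., 100., 100)
fields = [20.0, 40.0, 60.0, 80.0, 100.0]   # same grid as np.linspace(20., 100., 5)
data = simulate(temps, fields)
print(len(data))
```

Fixing the seed makes the noise reproducible, which is the role `random_seed` appears to play in the call above.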

## Units and presets

It is possible to set presets and units during instantiation.
See {doc}`processing_data` and {doc}`units_and_conversions` for additional information.