# Tutorial

The `fitgrid` workflow consists of three steps:

1. Read in an epochs table, which creates `Epochs`.
2. Run a model using the `Epochs`, which creates a `FitGrid`.
3. Examine fit or diagnostic information using the `FitGrid`.

The commands below are in Python and were executed in Jupyter, but any Python environment will work. We recommend JupyterLab, Jupyter Notebook, or IPython.

## 1. Read an epochs table

In [None]:
import fitgrid

### Assumptions

- All epochs have the same time indices, so at each timepoint there are measurements from the same set of epochs.
- Epoch identifier values are unique; duplicates are not allowed.
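
Both assumptions can be checked before a table is handed to `fitgrid`. The following is a minimal sketch in plain pandas, not part of `fitgrid` itself. The helper `check_epochs_table` and the toy data are hypothetical, the identifiers are assumed to be ordinary columns, and the second assumption is interpreted as "each epoch identifier appears at most once per timepoint".

```python
import pandas as pd


def check_epochs_table(df, epoch_id, time):
    """Raise ValueError if the table violates either assumption."""
    # Each (epoch, time) pair may appear only once.
    if df.duplicated(subset=[epoch_id, time]).any():
        raise ValueError('duplicate epoch identifier at some timepoint')
    # Every epoch must cover exactly the same set of time values.
    time_sets = {frozenset(group[time]) for _, group in df.groupby(epoch_id)}
    if len(time_sets) > 1:
        raise ValueError('epochs do not share the same time indices')


# Toy table: two epochs with three samples each (hypothetical values).
good = pd.DataFrame({
    'Epoch': [1, 1, 1, 2, 2, 2],
    'Sample': [1, 2, 3, 1, 2, 3],
    'channel_A': [55, 54, 57, 43, 45, 41],
})
check_epochs_table(good, epoch_id='Epoch', time='Sample')  # passes silently

# Dropping the last row leaves epoch 2 with fewer timepoints than epoch 1.
bad = good.iloc[:-1]
try:
    check_epochs_table(bad, epoch_id='Epoch', time='Sample')
    ragged_detected = False
except ValueError:
    ragged_detected = True
```

If the identifiers are stored in the row index instead, calling `df.reset_index()` first makes them available as columns.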

Suppose you have an HDF5 file containing an epochs table of the following form:

| Epoch | Sample | predictor | channel_A | channel_B |
|-------|--------|----------:|----------:|----------:|
| 1     | 1      | 0.1       | 55        | 60        |
| 1     | 2      | 0.2       | 54        | 58        |
| 1     | 3      | 0.4       | 57        | 64        |
| 2     | 1      | 0.8       | 43        | 12        |
| 2     | 2      | 0.4       | 45        | 23        |
| 2     | 3      | 0.2       | 41        | 18        |

You need to tell `fitgrid`:

1. which index column is the epoch identifier
2. which index column is the time identifier
3. which columns you want to model (channels)

So to read the HDF5 file, you would call `fitgrid.epochs_from_hdf` as follows:

```python
epochs = fitgrid.epochs_from_hdf(
    filename='epochs.hdf',
    key=None,
    time='Sample',
    epoch_id='Epoch',
    channels=['channel_A', 'channel_B']
)
```
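
To make the three roles concrete, the table above can be rebuilt in pandas. This is only an illustration of the layout, assuming that "index column" means the identifiers are stored in the dataframe's row index; writing the table to an HDF5 file is omitted.

```python
import pandas as pd

table = pd.DataFrame({
    'Epoch': [1, 1, 1, 2, 2, 2],
    'Sample': [1, 2, 3, 1, 2, 3],
    'predictor': [0.1, 0.2, 0.4, 0.8, 0.4, 0.2],
    'channel_A': [55, 54, 57, 43, 45, 41],
    'channel_B': [60, 58, 64, 12, 23, 18],
})

# Move the epoch and time identifiers into the row index.
table = table.set_index(['Epoch', 'Sample'])

# The channels are the columns to be modeled; what remains are predictors.
channels = ['channel_A', 'channel_B']
predictors = [col for col in table.columns if col not in channels]
```

Here `Epoch` plays the role of `epoch_id`, `Sample` the role of `time`, and `channel_A` and `channel_B` are the `channels`.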

### Example

Take the following simulated dataset:

In [None]:
example_filename = 'example.h5'

In [None]:
import pandas as pd
pd.read_hdf(example_filename).head(5)

To read this dataset, run:

In [None]:
epochs = fitgrid.epochs_from_hdf(
    filename=example_filename,
    key=None,
    time='Time',
    epoch_id='Epoch_idx',
    channels=['channel0', 'channel1']
)

This creates an epochs object that can be used for modeling.

In addition to HDF5 files, epochs objects can be read from Feather files or from pandas DataFrames using `fitgrid.epochs_from_feather` and `fitgrid.epochs_from_dataframe`, respectively. For details, see the Reference section.

Kutas Lab-specific default values for `time`, `epoch_id`, and `channels` are available in `fitgrid.defaults`:

In [None]:
fitgrid.defaults.TIME

In [None]:
fitgrid.defaults.EPOCH_ID

In [None]:
fitgrid.defaults.CHANNELS

## 2. Run a model

Currently, linear regression (via `ols` from `statsmodels`) and linear mixed
models (via `lmer` from `lme4`) are available.

Running a model on the epochs creates a `FitGrid` object, containing fit
information, such as the betas, and diagnostic information,
such as $R^2$ in the case of linear regression.

### Linear regression

To run linear regression on the epochs, use the `lm` function:

In [None]:
lm_grid = fitgrid.lm(epochs, RHS='continuous + categorical')

`fitgrid.lm` runs a separate linear regression for each channel at each
timepoint, with that channel's data as the left-hand side and the right-hand
side given by the Patsy/R-style formula passed in the `RHS` parameter:

    channel0 ~ continuous + categorical
    channel1 ~ continuous + categorical
    ...
    channel31 ~ continuous + categorical

If you want to model only a specific subset of channels, pass the list of channels to the `LHS` parameter.
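
The idea of fitting one regression per cell of a time-by-channel grid can be sketched with NumPy alone. This is a conceptual illustration on simulated data, not how `fitgrid` is implemented: it uses a least-squares solve instead of `statsmodels`, keeps only the `continuous` predictor, and all array names and sizes are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_times, n_epochs = 5, 20
channels = ['channel0', 'channel1']

# Simulated predictor, one value per (timepoint, epoch).
continuous = rng.normal(size=(n_times, n_epochs))
# Simulated channel data with a true slope of 2.0 plus a little noise.
noise = rng.normal(scale=0.1, size=(n_times, n_epochs, len(channels)))
data = 2.0 * continuous[:, :, None] + noise

slopes = np.empty((n_times, len(channels)))
for t in range(n_times):
    # Design matrix for `channel ~ continuous`: intercept plus predictor.
    X = np.column_stack([np.ones(n_epochs), continuous[t]])
    for c in range(len(channels)):
        # Least-squares fit across epochs for this timepoint and channel.
        beta, *_ = np.linalg.lstsq(X, data[t, :, c], rcond=None)
        slopes[t, c] = beta[1]

# One slope estimate per cell of the time-by-channel grid.
slope_grid = pd.DataFrame(
    slopes,
    index=pd.Index(range(n_times), name='Time'),
    columns=channels,
)
```

Each entry of `slope_grid` is the estimated slope for one timepoint and one channel, which is the kind of time-by-channel result that the attributes in the next section return.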

### Mixed effects

Similarly, to fit a linear mixed model, use the `lmer` function:

In [None]:
lmer_grid = fitgrid.lmer(epochs, RHS='continuous + (continuous | categorical)')

Mixed models in particular can be slow to fit, so it may help to spread the
work across multiple processes. To do this, set `parallel` to `True` and
`n_cores` to the desired number of processes (the default is 4):

In [None]:
lmer_grid = fitgrid.lmer(
    epochs,
    RHS='continuous + (continuous | categorical)',
    parallel=True,
    n_cores=4,
)

## 3. Examine results


`FitGrid` objects, such as `lm_grid` or `lmer_grid` above, can be queried for
attributes just like a `fit` object from `statsmodels` (see Research context
for more background). For example:

### Betas

In [None]:
betas = lm_grid.params
betas.head(6)

### Adjusted $R^2$

In [None]:
rsquared_adj = lm_grid.rsquared_adj
rsquared_adj.head(6)
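
For reference, and assuming `rsquared_adj` follows the usual `statsmodels` definition for a model with an intercept, the adjusted value penalizes $R^2$ for the number of predictors:

$$
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
$$

Here $n$ is the number of observations entering a single fit and $p$ is the number of predictors, not counting the intercept.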

### Cook's distance

In [None]:
influence = lm_grid.get_influence()
cooks_distance = influence.cooks_distance
cooks_distance.head()

If you are using an interactive environment like Jupyter Notebook or IPython,
you can use tab completion to see what attributes are available:

```python
# type 'lm_grid.' and press Tab
lm_grid.<TAB>
```

Accessing an attribute of a `FitGrid` object returns either a pandas `DataFrame` of the
appropriate shape or another `FitGrid` object:

In [None]:
# this is a dataframe
lm_grid.params.head()

In [None]:
# this is a FitGrid
lm_grid.get_influence()

If a dataframe is returned, it is always in long form, and its outermost index
and columns match those of a single epoch: time as the index and channels as
the columns.
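
As a rough picture of that layout, the sketch below builds a small stand-in for a table of betas in plain pandas and pulls out one term across time. The level names `Time` and `params`, the term names, and the values are assumptions made for illustration; inspect the dataframe that `fitgrid` actually returns for the real names.

```python
import numpy as np
import pandas as pd

times = [0, 4, 8]
terms = ['Intercept', 'continuous']

# Outer index level: time. Inner level: model term. Columns: channels.
index = pd.MultiIndex.from_product([times, terms], names=['Time', 'params'])
betas = pd.DataFrame(
    np.arange(12, dtype=float).reshape(6, 2),
    index=index,
    columns=['channel0', 'channel1'],
)

# Select one term; the result is indexed by time, with channels as columns.
slope = betas.xs('continuous', level='params')
```

Selecting a single inner level this way yields a plain time-by-channel table that is convenient for plotting.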

You can also slice a `FitGrid` to get a smaller grid of the shape you want.
For example, to look at a single channel within a given time window, slice as
follows:

In [None]:
smaller_grid = lm_grid[25:75, 'channel0']
smaller_grid

Or multiple channels:

In [None]:
smaller_grid = lm_grid[25:75, ['channel0', 'channel1']]
smaller_grid

To include all timepoints or all channels, use a colon:

In [None]:
# all channels within certain timeframe
lm_grid[25:75, :]

In [None]:
# all timepoints, two channels
lm_grid[:, ['channel0', 'channel1']]
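
The two-axis indexing above resembles label-based `.loc` indexing on a pandas dataframe with time as the index and channels as columns. The sketch below shows only that pandas analogue on made-up data. It assumes, without confirmation from this tutorial, that `FitGrid` slicing treats `25:75` as time labels in the same way, so check the Reference section for the exact semantics.

```python
import numpy as np
import pandas as pd

times = np.arange(0, 101, 25)  # hypothetical time labels: 0, 25, 50, 75, 100
frame = pd.DataFrame(
    np.zeros((len(times), 3)),
    index=pd.Index(times, name='Time'),
    columns=['channel0', 'channel1', 'channel2'],
)

# Label-based slice: rows with time labels from 25 through 75, two channels.
window = frame.loc[25:75, ['channel0', 'channel1']]

# A bare colon keeps everything along that axis.
all_times = frame.loc[:, ['channel0', 'channel1']]
```

In pandas, a label-based slice includes both endpoints, so `window` contains the rows labeled 25, 50, and 75.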