### Data Loader

Before we can estimate any model, we should load in the data that we created in `linear.Rmd`. We'll reshape it so that we can sample random subjects in each batch.
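
As a small illustration of the reshaping step, here is a sketch on a toy long-format table. The column names `subject`, `time`, `taxon`, and `Freq` follow the code in this notebook; the values and taxon names are made up, and the real `linear.csv` may contain more columns.

```python
import pandas as pd

# Toy long-format table with made-up values: one row per (time, taxon) pair.
toy_long = pd.DataFrame({
    "subject": ["subject_1"] * 4,
    "time": [0, 0, 1, 1],
    "taxon": ["taxon_a", "taxon_b", "taxon_a", "taxon_b"],
    "Freq": [0.2, 0.8, 0.5, 0.5],
})

# Wide format: one row per time point, one column per taxon.
toy_wide = toy_long.pivot(index="time", columns="taxon", values="Freq")
print(toy_wide)

# Transposing gives a (taxa x time) array for this subject.
toy_array = toy_wide.to_numpy().T
print(toy_array.shape)
```

Each subject becomes one small matrix, which is the unit we later draw at random when forming batches.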

In [None]:
import pandas as pd

samples_df = pd.read_csv("../data/linear.csv")
print(samples_df)

samples_array = samples_df.pivot(index=["subject", "time", "class"], columns="taxon")


In [None]:
samples_df[samples_df["subject"] == "subject_1"]

samples_df["subject"].unique()

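Next, let's wrap the data in a `Dataset` that returns one subject at a time: the subject's microbiome profile (taxa by time points) and its class label. A model with a forward function can then use these batches to produce predicted probabilities for the two classes given the historical microbiome profile so far.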

In [None]:
import torch
import numpy as np
from torch.utils.data import Dataset, DataLoader

class LinearData(Dataset):
  def __init__(self, data):
    self.data = data
    self.subjects = data["subject"].unique()

  def __len__(self):
    return len(self.subjects)

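  def __getitem__(self, index):
    # Select all rows for one subject.
    samples = self.data[self.data["subject"] == self.subjects[index]]
    # Reshape to wide format: one row per time point, one column per taxon.
    x = samples.pivot(index="time", columns="taxon", values="Freq")
    # Use the first row's class as the subject's label.
    y = [samples["class"].values[0]]
    # Return the profile as a (taxa x time) array together with the label.
    return np.array(x).T, y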


dataset = LinearData(samples_df)
loader = DataLoader(dataset, batch_size=16)
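
To check what a batch looks like, the following sketch builds a toy dataset in the same long format and pulls one batch from a loader. `ToySubjectData` and all values are hypothetical stand-ins for `LinearData` and `linear.csv`. The sketch also makes two assumptions that the cell above does not: it converts features to `float32` tensors and labels to integers, and it passes `shuffle=True` so that subjects are drawn in random order.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

# Toy long-format data (made-up values): 4 subjects, 3 time points, 2 taxa.
rows = [
    {"subject": f"subject_{s}", "time": t, "taxon": f"taxon_{k}",
     "Freq": float(s + t + k), "class": s % 2}
    for s in range(1, 5) for t in range(3) for k in range(2)
]
toy_df = pd.DataFrame(rows)


class ToySubjectData(Dataset):
    """Hypothetical stand-in for LinearData: one item per subject."""

    def __init__(self, data):
        self.data = data
        self.subjects = data["subject"].unique()

    def __len__(self):
        return len(self.subjects)

    def __getitem__(self, index):
        samples = self.data[self.data["subject"] == self.subjects[index]]
        x = samples.pivot(index="time", columns="taxon", values="Freq")
        # Assumption: float32 features (taxa x time) and an integer label,
        # so that default collation stacks items into tensors.
        x = torch.tensor(x.to_numpy().T, dtype=torch.float32)
        y = torch.tensor(int(samples["class"].iloc[0]))
        return x, y


# shuffle=True draws subjects in random order on each pass over the data.
toy_loader = DataLoader(ToySubjectData(toy_df), batch_size=2, shuffle=True)
x_batch, y_batch = next(iter(toy_loader))
print(x_batch.shape, y_batch.shape)
```

The feature batch has shape (batch, taxa, time) and the label batch has one entry per subject.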

We can now train the model on batches from this data loader, using a PyTorch Lightning `Trainer`.
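
The model is not defined in this section, so the following is only a sketch of the kind of forward function and update step involved, written in plain PyTorch rather than Lightning. `LinearClassifier`, its dimensions, and the random batch are all hypothetical. In a Lightning module, the same forward pass and loss would typically live in `training_step`, and the `Trainer` would run the loop.

```python
import torch
from torch import nn


class LinearClassifier(nn.Module):
    """Hypothetical model: flatten a (taxa x time) profile, apply one linear layer."""

    def __init__(self, n_taxa, n_time, n_classes=2):
        super().__init__()
        self.linear = nn.Linear(n_taxa * n_time, n_classes)

    def forward(self, x):
        # x: (batch, n_taxa, n_time) -> logits: (batch, n_classes)
        return self.linear(x.flatten(start_dim=1))


torch.manual_seed(0)
model = LinearClassifier(n_taxa=5, n_time=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Made-up batch standing in for one batch from the loader.
x = torch.randn(16, 5, 10)
y = torch.randint(0, 2, (16,))

# One optimization step; a trainer repeats this over every batch and epoch.
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Softmax turns logits into predicted probabilities for the two classes.
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)
print(probs.shape)
```

Flattening the profile is the simplest choice for a linear model; a model that respects the time ordering would need a different architecture.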

For future reference, these are the packages we installed to run this code.

```
conda install conda-forge::pytorch-lightning
conda install conda-forge::pandas
```