### Data Loader

Before we can estimate any model, we should load in the data that we created in `linear.Rmd`. We'll reshape it so that we can sample random subjects in each batch.

In [None]:
import pandas as pd
from torch.utils.data import DataLoader
from transformer import LinearData
from transformer import Transformer

# use the data from ../generate
samples_df = pd.read_csv("../data/blooms.csv")

dataset = LinearData(samples_df)
loader = DataLoader(dataset, batch_size=16)
x, y = next(iter(loader))  # grab one batch to inspect its tensor shapes
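
`LinearData` is defined in `transformer.py` and is not shown here. To illustrate the reshaping step, the sketch below builds a hypothetical `SubjectSequences` dataset that yields one item per subject. It assumes a long-format table with columns `subject`, `time`, `label` and one numeric column per taxon. These column names, and reading each subject's label from its last time point, are assumptions, not the layout of `blooms.csv`. In `DataLoader`, `shuffle=True` is what makes each batch a random draw of subjects.

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class SubjectSequences(Dataset):
    """Hypothetical stand-in for LinearData: one (history, label) pair per subject.

    Assumes long-format rows with columns `subject`, `time`, `label`, plus one
    numeric column per taxon, and the same number of time points per subject.
    """

    def __init__(self, df: pd.DataFrame):
        taxa = [c for c in df.columns if c not in ("subject", "time", "label")]
        self.x, self.y = [], []
        # groupby keeps the within-subject time order set by sort_values
        for _, g in df.sort_values(["subject", "time"]).groupby("subject"):
            # (time, taxa) abundance matrix for this subject
            self.x.append(torch.tensor(g[taxa].to_numpy(), dtype=torch.float32))
            # assumption: the class label is read from the subject's last time point
            self.y.append(torch.tensor(int(g["label"].iloc[-1])))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]


# toy long-format data: 2 subjects x 3 time points x 2 taxa
toy_df = pd.DataFrame({
    "subject": [0, 0, 0, 1, 1, 1],
    "time": [0, 1, 2, 0, 1, 2],
    "taxon_a": [0.1, 0.2, 0.3, 0.5, 0.4, 0.3],
    "taxon_b": [0.9, 0.8, 0.7, 0.5, 0.6, 0.7],
    "label": [0, 0, 1, 0, 0, 0],
})
toy_data = SubjectSequences(toy_df)
# shuffle=True draws a fresh random order of subjects every epoch
toy_loader = DataLoader(toy_data, batch_size=2, shuffle=True)
toy_x, toy_y = next(iter(toy_loader))
print(toy_x.shape, toy_y.shape)  # torch.Size([2, 3, 2]) torch.Size([2])
```

The default collate function stacks items, so this layout needs equal-length histories. Variable-length subjects would need padding in a custom `collate_fn`.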


Next, let's define a model whose forward function returns predicted probabilities for the two classes, given the microbiome profile history observed so far.

In [None]:
import torch

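model = Transformer()
# sanity-check the forward pass on a random batch of shape (16, 50, 144);
# the second output holds the predicted class probabilities
z, probs = model(torch.randn((16, 50, 144)))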
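
The actual `Transformer` architecture lives in `transformer.py`. For intuition about how a forward pass can map a `(batch, time, features)` history to two class probabilities, here is a small hypothetical model, `TinySequenceClassifier`, built from PyTorch's `nn.TransformerEncoder`. It adds a learned positional embedding, mean-pools the encoded time steps into a summary vector `z`, and applies a softmax over two classes. The layer sizes and the mean-pooling choice are illustrative assumptions, not the notebook's model.

```python
import torch
from torch import nn


class TinySequenceClassifier(nn.Module):
    """Hypothetical sketch, not the Transformer class from transformer.py."""

    def __init__(self, n_features=144, d_model=32, n_heads=4, n_layers=2,
                 n_classes=2, max_len=512):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        # learned positional embedding so the encoder can tell time steps apart
        self.pos = nn.Parameter(0.02 * torch.randn(max_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=64, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):
        # x: (batch, time, features) -> h: (batch, time, d_model)
        h = self.embed(x) + self.pos[: x.shape[1]]
        # average the encoded time steps into one vector per sequence
        z = self.encoder(h).mean(dim=1)
        # class probabilities: (batch, n_classes), each row sums to 1
        probs = torch.softmax(self.head(z), dim=-1)
        return z, probs


tiny = TinySequenceClassifier()
z_demo, probs_demo = tiny(torch.randn((16, 50, 144)))
print(z_demo.shape, probs_demo.shape)  # torch.Size([16, 32]) torch.Size([16, 2])
```

Mean pooling is one simple way to summarize the history. A common alternative is to apply a causal mask and read out only the final time step's encoding.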

We can now train the model on batches from this data loader, using a Lightning `Trainer`.

In [None]:
import lightning as L
from transformer import LitTransformer

lit_model = LitTransformer(model)
trainer = L.Trainer(max_epochs=40)
trainer.fit(model=lit_model, train_dataloaders=loader)
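
`L.Trainer` hides the optimization loop. The sketch below writes one out in plain PyTorch to show roughly what `trainer.fit` automates, leaving aside device placement, logging, and checkpointing. It assumes, hypothetically, that `LitTransformer` minimizes the negative log-likelihood of the true class under `probs` using Adam. The toy model `MeanLinear`, the synthetic data, and the learning rate are all illustrative.

```python
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class MeanLinear(nn.Module):
    """Toy model with the same (z, probs) output convention."""

    def __init__(self, n_features, n_classes=2):
        super().__init__()
        self.linear = nn.Linear(n_features, n_classes)

    def forward(self, x):
        # average over time, then score each class
        z = self.linear(x.mean(dim=1))
        return z, torch.softmax(z, dim=-1)


def fit_plain(model, loader, max_epochs=40, lr=1e-3):
    """Minimal training loop; returns the mean loss of each epoch."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    model.train()
    for _ in range(max_epochs):
        total, n = 0.0, 0
        for x, y in loader:
            _, probs = model(x)
            # negative log-likelihood of the true class; clamp avoids log(0)
            loss = F.nll_loss(torch.log(probs.clamp_min(1e-8)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item() * len(y)
            n += len(y)
        history.append(total / n)
    return history


torch.manual_seed(0)
demo_x = torch.randn(64, 10, 5)
# synthetic label: is the time-averaged first feature positive?
demo_y = (demo_x[..., 0].mean(dim=1) > 0).long()
demo_loader = DataLoader(TensorDataset(demo_x, demo_y), batch_size=16, shuffle=True)
losses = fit_plain(MeanLinear(5), demo_loader, max_epochs=20, lr=0.05)
print(f"first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

If a model returns logits as well as probabilities, `F.cross_entropy` on the logits is the more numerically stable equivalent of taking the log of softmax outputs.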

In [None]:
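# predict with the trained network; eval() disables training-only layers such as dropout
lit_model.model.eval()
p_hat = []
with torch.no_grad():  # no gradients are needed for prediction
    for x, _ in loader:  # unshuffled, so rows follow dataset order
        p_hat.append(lit_model.model(x)[1])  # [1] = class probabilities

# move to CPU and convert to NumPy so pandas receives a plain 2-D array
pd.DataFrame(torch.cat(p_hat).cpu().numpy()).to_csv("../data/p_hat.csv")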
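
Because `loader` was built without `shuffle=True`, it yields items in dataset order, so row `i` of `p_hat.csv` corresponds to `dataset[i]`. If the training loader is ever switched to shuffling, predictions should come from a separate unshuffled loader. The hypothetical helper `predict_probs` below does exactly that and labels the columns. The column names `p_class0`/`p_class1` and the toy model `MeanSoftmax` are illustrative assumptions.

```python
import pandas as pd
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def predict_probs(model, dataset, batch_size=16):
    """Return class probabilities as a DataFrame, one row per dataset item, in order."""
    # shuffle=False keeps rows aligned with dataset indices
    ordered = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    model.eval()  # disable dropout and other training-only behavior
    batches = []
    with torch.no_grad():
        for x, _ in ordered:
            _, probs = model(x)
            batches.append(probs.cpu())
    p = torch.cat(batches).numpy()
    return pd.DataFrame(p, columns=[f"p_class{k}" for k in range(p.shape[1])])


class MeanSoftmax(nn.Module):
    """Toy model returning (z, probs), used only to exercise predict_probs."""

    def __init__(self, n_features, n_classes=2):
        super().__init__()
        self.linear = nn.Linear(n_features, n_classes)

    def forward(self, x):
        z = self.linear(x.mean(dim=1))
        return z, torch.softmax(z, dim=-1)


torch.manual_seed(0)
toy_set = TensorDataset(torch.randn(40, 10, 5), torch.zeros(40, dtype=torch.long))
toy_model = MeanSoftmax(5)
toy_p = predict_probs(toy_model, toy_set, batch_size=16)
print(toy_p.shape)  # (40, 2)
```

Here the last batch holds only 8 items (40 = 16 + 16 + 8), and the rows still line up with dataset indices because no sampler reorders them.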

For future reference, these are the packages we installed for this project.

```
conda install conda-forge::lightning
conda install conda-forge::pandas
conda install conda-forge::tensorboard
conda install pytorch::captum
```
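
Installed versions also matter when reproducing results. As a small sketch, the standard-library `importlib.metadata` module can record which versions of these packages ended up in the environment. The helper name `installed_versions` is our own, and `torch` is added to the list because `lightning` depends on it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for pkg, ver in installed_versions(
    ["lightning", "pandas", "tensorboard", "captum", "torch"]
).items():
    print(f"{pkg}: {ver or 'not installed'}")
```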