### Data Loader

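Before we can estimate any model, we need to load the data that we created in `linear.Rmd`. We reshape it into a `ConceptData` dataset so that a `DataLoader` can draw subjects in batches. The loader below does not pass `shuffle=True`, so batches follow the row order of the data rather than a random sample of subjects.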

In [None]:
import pandas as pd
from torch.utils.data import DataLoader
from concept import ConceptData

# use the data from ../generate
samples_df = pd.read_csv("../data/blooms.csv")
concepts = pd.read_csv("../data/concepts.csv")

dataset = ConceptData(samples_df, concepts)
loader = DataLoader(dataset, batch_size=16)
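
The `ConceptData` class lives in the `concept` module and is not shown here. As a rough illustration of what a `DataLoader` needs from it, here is a minimal sketch of a map-style dataset that returns one `(x, c, y)` triple per subject, matching the three values unpacked from each batch later in this notebook. The class name `ToyConceptData`, the tensor shapes, and the random values are all assumptions made for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyConceptData(Dataset):
    """Hypothetical stand-in for ConceptData: one item per subject."""

    def __init__(self, x, c, y):
        self.x, self.c, self.y = x, c, y

    def __len__(self):
        # number of subjects
        return len(self.x)

    def __getitem__(self, i):
        # features, concepts, and outcome for subject i
        return self.x[i], self.c[i], self.y[i]


torch.manual_seed(0)
n_subjects = 40
toy = ToyConceptData(
    torch.randn(n_subjects, 10),  # features (shape is an assumption)
    torch.randn(n_subjects, 3),   # concepts (shape is an assumption)
    torch.randn(n_subjects),      # outcome
)

# shuffle=True draws subjects in a new random order each epoch
toy_loader = DataLoader(toy, batch_size=16, shuffle=True)
x_batch, c_batch, y_batch = next(iter(toy_loader))
batch_shapes = (tuple(x_batch.shape), tuple(c_batch.shape), tuple(y_batch.shape))
n_batches = len(toy_loader)  # the last batch is smaller than 16
```

With `shuffle=True`, each epoch visits the subjects in a fresh random order. Leaving it at the default `False` keeps batches in row order, which is how the loader above behaves.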


We can now train the model based on the input data loader, using a lightning trainer.

In [None]:
import lightning as L
from concept import ConceptBottleneck, LitConcept

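# build the model with its default arguments and wrap it for Lightning
model = ConceptBottleneck()
lit_model = LitConcept(model)

# logs and checkpoints are written under concept_logs/
trainer = L.Trainer(max_epochs=40, default_root_dir="concept_logs")
trainer.fit(model=lit_model, train_dataloaders=loader)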
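
Conceptually, `trainer.fit` runs an optimization loop over the batches in the loader. Since `ConceptBottleneck` and `LitConcept` are not shown here, the following is only a sketch of what such a model and loop could look like in plain PyTorch. The class `ToyBottleneck`, the layer sizes, the synthetic data, and the equally weighted concept-plus-outcome loss are all assumptions, not the actual implementation.

```python
import torch
from torch import nn


class ToyBottleneck(nn.Module):
    """Hypothetical concept bottleneck: x -> concepts -> outcome."""

    def __init__(self, n_features=10, n_concepts=3):
        super().__init__()
        self.to_concepts = nn.Linear(n_features, n_concepts)
        self.to_outcome = nn.Linear(n_concepts, 1)

    def forward(self, x):
        c_hat = self.to_concepts(x)
        # the outcome head sees only the predicted concepts
        y_hat = self.to_outcome(c_hat).squeeze(-1)
        return c_hat, y_hat


torch.manual_seed(0)
x = torch.randn(64, 10)
c = x[:, :3] + 0.1 * torch.randn(64, 3)  # synthetic concepts
y = c.sum(dim=1)                          # synthetic outcome

toy_model = ToyBottleneck()
optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

losses = []
for epoch in range(40):
    for start in range(0, len(x), 16):
        xb = x[start:start + 16]
        cb = c[start:start + 16]
        yb = y[start:start + 16]
        c_hat, y_hat = toy_model(xb)
        # joint objective: concept loss plus outcome loss (equal weights assumed)
        loss = loss_fn(c_hat, cb) + loss_fn(y_hat, yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # record the loss of the last batch in each epoch
    losses.append(loss.item())
```

Lightning moves this loop, along with logging and checkpointing, into the `Trainer`, so the notebook only needs to supply the wrapped model and the loader.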

In [None]:
import torch

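# put layers such as dropout and batch norm into inference mode
lit_model.model.eval()
p_hat = []
with torch.no_grad():  # no gradients are needed for prediction
    for x, _, _ in loader:
        # keep the second element of the model output
        p_hat.append(lit_model.model(x)[1])

# stack the per-batch outputs and save them, one row per subject
pd.DataFrame(torch.cat(p_hat).numpy()).to_csv("../data/p_hat_concept.csv")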
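
The prediction cell follows a common pattern: switch to evaluation mode, disable gradient tracking, collect per-batch outputs, and concatenate them. The sketch below shows that pattern on a toy model and checks that the stacked batches match a single pass over all rows. The model `toy_model` and the data are hypothetical and unrelated to `ConceptBottleneck`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
x_all = torch.randn(40, 10)
# default shuffle=False, so batches follow the row order of x_all
toy_loader = DataLoader(TensorDataset(x_all), batch_size=16)

# hypothetical model with a dropout layer, to show why eval() matters
toy_model = nn.Sequential(nn.Linear(10, 8), nn.Dropout(0.5), nn.Linear(8, 1))

toy_model.eval()  # dropout is disabled in evaluation mode
batches = []
with torch.no_grad():  # no autograd graph is built
    for (xb,) in toy_loader:
        batches.append(toy_model(xb))

# one row per input row, in the same order as x_all
p_hat_toy = torch.cat(batches)

# reference: a single forward pass over all rows
with torch.no_grad():
    full_pass = toy_model(x_all)
```

Because the loader is not shuffled, row `i` of the concatenated output corresponds to row `i` of the input, which is what makes the saved CSV line up with the original data.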

For future reference, these are the packages we installed to run this notebook.

```
conda install conda-forge::lightning
conda install conda-forge::pandas
conda install conda-forge::tensorboard
conda install pytorch::captum
```