# Project Index

[Custom Model Notebook](../../notebooks/custom_model.ipynb)  
[Training Notebook](../../notebooks/train.ipynb)  
[Project Config Notebook](../../notebooks/project_config.ipynb)  
[Forgather Notebook](../../notebooks/forgather.ipynb)  

In [2]:
import forgather.nb.notebooks as nb

nb.display_project_index(config_template="", show_available_templates=True, show_pp_config=True, show_generated_code=True, pp_first=True)

# Training an Eye for Fashion

This project reproduces the configuration from a PyTorch tutorial, where a simple ML model is created and trained to recognize categories of clothing from the FashionMNIST dataset.

https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html

This tutorial was chosen because it is a relatively simple project that can be kept largely self-contained. Still, it is far more complex than the previous examples.

See also: https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html

## Custom Code

While Forgather is good at assembling objects, its configuration language is not practical for defining logic. For this, we have defined a custom model class in the project's 'model_src' directory, and we use Forgather to dynamically import this code and inject all of the required dependencies.

Unlike the previous projects, you will note that the "Modules" section is not empty and has a link to the model definition.
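
To make the idea of dynamic import with injected dependencies concrete, here is a minimal, standard-library-only sketch. It is an illustration, not Forgather's implementation: `TinyModel`, `import_from_path`, and `relu` are hypothetical names, and the module source is written to a temporary file purely so the example is self-contained. The loading logic resembles the `dynimport` helper that appears in the generated code further down.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a project source file such as model_src/mlp_model.py.
MODULE_SOURCE = '''
class TinyModel:
    """Receives its dependencies instead of creating them."""

    def __init__(self, d_input, d_output, activation_factory):
        self.d_input = d_input
        self.d_output = d_output
        # The factory is called here, so each instance gets its own activation.
        self.activation = activation_factory()
'''


def import_from_path(module_path, symbol):
    """Load a module from a file path and return one attribute from it."""
    module_name = Path(module_path).stem
    spec = importlib.util.spec_from_file_location(module_name, module_path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return getattr(module, symbol)


def relu(x):
    """Plain-Python ReLU, standing in for torch.nn.ReLU."""
    return max(0.0, x)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "tiny_model.py"
    path.write_text(MODULE_SOURCE)
    # The class is looked up by file path and name at run time.
    TinyModel = import_from_path(str(path), "TinyModel")
    # The caller decides which activation to inject.
    model = TinyModel(d_input=784, d_output=10, activation_factory=lambda: relu)

print(type(model).__name__, model.d_input, model.d_output, model.activation(-2.0))
```

Passing a factory rather than an instance mirrors the `activation_factory` argument in the configuration: the model decides when to create the activation, while the configuration decides which one.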

## Project Structure

Like the previous example, this project uses template inheritance: all of the configurations are derived from a common 'project.yaml' file.

Unlike the previous project, we make use of the templates library to help automate things. This project was actually more complex to set up than most, as there is no pre-existing template for torchvision projects, so the project template had to fill in the details.
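
As a rough analogy for how a derived template overrides values inherited from a base template, the sketch below merges two plain dictionaries. The numbers are the trainer arguments shown in the preprocessed config further down; the merge itself is only an illustration, since Forgather does this through template preprocessing rather than dictionary merging.

```python
# Defaults, as listed under "Minimal Trainer Defaults" in the preprocessed config.
minimal_trainer_defaults = {
    "logging_steps": 500,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "learning_rate": 5e-5,
    "num_train_epochs": 1,
}

# Overrides, as listed under "Fashion MNIST Trainer Overrides".
project_overrides = {
    "logging_steps": 100,
    "per_device_train_batch_size": 64,
    "per_device_eval_batch_size": 64,
    "learning_rate": 1.0e-3,
    "num_train_epochs": 5,
}

# Later entries win, so the project's values replace the defaults.
trainer_args = {**minimal_trainer_defaults, **project_overrides}

for name, value in trainer_args.items():
    print(f"{name}: {value}")
```

The merged values match the arguments passed to `MinimalTrainingArguments` in the generated code.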


---



#### Project Directory: "/home/dinalt/ai_assets/forgather/tutorials/project_gamma"

## Meta Config
Meta Config: [/home/dinalt/ai_assets/forgather/tutorials/project_gamma/meta.yaml](meta.yaml)

- [meta.yaml](meta.yaml)
    - [meta_defaults.yaml](../../forgather_workspace/meta_defaults.yaml)
        - [base_directories.yaml](../../forgather_workspace/base_directories.yaml)

Template Search Paths:
- [/home/dinalt/ai_assets/forgather/tutorials/project_gamma/templates](templates)
- [/home/dinalt/ai_assets/forgather/forgather_workspace](../../forgather_workspace)
- [/home/dinalt/ai_assets/forgather/templates/base](../../templates/base)

## Available Configurations
- [adam.yaml](templates/experiments/adam.yaml)
- [baseline.yaml](templates/experiments/baseline.yaml)

Default Configuration: baseline.yaml

Active Configuration: baseline.yaml

## Available Templates
- [base_directories.yaml](../../forgather_workspace/base_directories.yaml)
- [callbacks/base_callbacks.yaml](../../templates/base/callbacks/base_callbacks.yaml)
- [callbacks/loggers.yaml](../../templates/base/callbacks/loggers.yaml)
- [datasets/abstract/base_datasets.yaml](../../templates/base/datasets/abstract/base_datasets.yaml)
- [datasets/abstract/pretokenized_dataset.yaml](../../templates/base/datasets/abstract/pretokenized_dataset.yaml)
- [experiments/adam.yaml](templates/experiments/adam.yaml)
- [experiments/baseline.yaml](templates/experiments/baseline.yaml)
- [meta_defaults.yaml](../../forgather_workspace/meta_defaults.yaml)
- [model_test/base.yaml](../../templates/base/model_test/base.yaml)
- [model_test/sub_project.yaml](../../templates/base/model_test/sub_project.yaml)
- [models/abstract/base_language_model.yaml](../../templates/base/models/abstract/base_language_model.yaml)
- [models/abstract/causal_lm_from_config.yaml](../../templates/base/models/abstract/causal_lm_from_config.yaml)
- [models/abstract/causal_lm_from_pretrained.yaml](../../templates/base/models/abstract/causal_lm_from_pretrained.yaml)
- [models/abstract/custom_causal_lm.yaml](../../templates/base/models/abstract/custom_causal_lm.yaml)
- [models/abstract/dynamic_causal_lm.yaml](../../templates/base/models/abstract/dynamic_causal_lm.yaml)
- [models/abstract/load_model.yaml](../../templates/base/models/abstract/load_model.yaml)
- [project.yaml](templates/project.yaml)
- [trainers/accel_trainer.yaml](../../templates/base/trainers/accel_trainer.yaml)
- [trainers/base_trainer.yaml](../../templates/base/trainers/base_trainer.yaml)
- [trainers/hf_trainer.yaml](../../templates/base/trainers/hf_trainer.yaml)
- [trainers/minimal_trainer.yaml](../../templates/base/trainers/minimal_trainer.yaml)
- [trainers/simple_trainer.yaml](../../templates/base/trainers/simple_trainer.yaml)
- [trainers/trainer.yaml](../../templates/base/trainers/trainer.yaml)
- [types/meta_template.yaml](../../templates/base/types/meta_template.yaml)
- [types/model/model_type.yaml](../../templates/base/types/model/model_type.yaml)
- [types/tokenizer/bpe/bpe.yaml](../../templates/base/types/tokenizer/bpe/bpe.yaml)
- [types/tokenizer/tokenizer.yaml](../../templates/base/types/tokenizer/tokenizer.yaml)
- [types/training_script/causal_lm/causal_lm.yaml](../../templates/base/types/training_script/causal_lm/causal_lm.yaml)
- [types/training_script/training_script.yaml](../../templates/base/types/training_script/training_script.yaml)
- [types/type.yaml](../../templates/base/types/type.yaml)

## Included Templates
- [experiments/baseline.yaml](templates/experiments/baseline.yaml)
    - [project.yaml](templates/project.yaml)
        - [types/training_script/training_script.yaml](../../templates/base/types/training_script/training_script.yaml)
            - [types/type.yaml](../../templates/base/types/type.yaml)
                - [base_directories.yaml](../../forgather_workspace/base_directories.yaml)
            - [inc/formatting.jinja](../../templates/base/inc/formatting.jinja)
        - [project.trainer_config](templates/project.yaml)
            - [trainers/simple_trainer.yaml](../../templates/base/trainers/simple_trainer.yaml)
                - [trainers/minimal_trainer.yaml](../../templates/base/trainers/minimal_trainer.yaml)
## Preprocessed Config

```yaml
#---------------------------------------
#          Fashion MNIST Trainer         
#---------------------------------------
# 2024-08-17T01:33:47
# Description: Base configuration, based on Torch tutorial parameters.
# Project Dir: /home/dinalt/ai_assets/forgather/tutorials/project_gamma
# Current Working Dir: "/home/dinalt/ai_assets/forgather/tutorials/project_gamma"
# Forgather Config Dir: "/home/dinalt/.config/forgather"
# Model: base_model
# Hostname: hal9000
# Versions:
#     python: 3.10.13
#     torch: 2.3.1
#     transformers: 4.41.2
#     accelerate: 0.31.0

# Source: https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html

############# Config Vars ##############

# ns.forgather_dir: "/home/dinalt/ai_assets/forgather"
# ns.models_dir: "/home/dinalt/ai_assets/forgather/tutorials/project_gamma/output_models"
# ns.project_model_src_dir: "/home/dinalt/ai_assets/forgather/tutorials/project_gamma/model_src"
# ns.tokenizers_dir: "/home/dinalt/ai_assets/forgather/tokenizers"
# ns.datasets_dir: "/home/dinalt/ai_assets/forgather/datasets"
# ns.model_src_dir: "/home/dinalt/ai_assets/forgather/model_src"
# ns.output_dir: "./output_models/base_model"
# ns.logging_dir: "./output_models/base_model/runs/base_model_2024-08-17T01-33-47"
# ns.create_new_model: True
# ns.save_model: True
# ns.train: True
# ns.eval: False

####### Distributed Environment ########

.define: &distributed_env !singleton:forgather.ml.distributed:DistributedEnvironment@distributed_env

############# Dependencies #############



################ Model #################

.define: &model_constructor_args {}

.define: &loss_fn !factory:torch.nn:CrossEntropyLoss []

.define: &activation_factory !lambda:torch.nn:ReLU@activation_factory []

.define: &model_constructor !singleton:./model_src/mlp_model.py:MultilayerPerceptron
    # Defaults, from PyTorch Tutorial.
    d_input: 784 # The input image dimensions
    d_model: 512 # a.k.a "Hidden Dimension"
    d_output: 10 # The number of categories in the dataset
    activation_factory: *activation_factory
    loss_fn: *loss_fn    


# Copy model code to output directory.
.define: &model !singleton:forgather.ml.construct:dependency_list@model
    - *model_constructor
    - !singleton:forgather.ml.construct:copy_package_files
        - "./output_models/base_model"
        - *model_constructor


# Model does not have dynamic code generation.
.define: &model_code_writer null

############### Datasets ###############

.define: &transform !factory:torchvision.transforms:ToTensor@transform []

.define: &train_dataset !singleton:torchvision.datasets:FashionMNIST@train_dataset
    root: "data"
    train: True
    download: True
    transform: *transform

.define: &eval_dataset !singleton:torchvision.datasets:FashionMNIST@eval_dataset
    root: "data"
    train: False
    download: True
    transform: *transform

############ Data Collator #############

.define: &data_collator null

########## Trainer Callbacks ###########

.define: &trainer_callbacks []

############### Trainer ################

# Name: Simple Trainer
# Description: A simple trainer class, for illustration purposes.

# **Trainer Args**

.define: &trainer_args
    # Minimal Trainer Defaults
    # https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments
    output_dir: "./output_models/base_model"
    logging_dir: "./output_models/base_model/runs/base_model_2024-08-17T01-33-47"
    logging_steps: 500
    per_device_train_batch_size: 16
    per_device_eval_batch_size: 32
    learning_rate: 5e-5
    num_train_epochs: 1
    
    # Fashion MNIST Trainer Overrides
    logging_steps: 100
    per_device_train_batch_size: 64
    per_device_eval_batch_size: 64
    learning_rate: 1.0e-3
    num_train_epochs: 5

# **Trainer Constructor**

.define: &trainer !singleton:forgather.ml.simple_trainer:SimpleTrainer@trainer
    model: *model
    args: !singleton:forgather.ml.trainer_types:MinimalTrainingArguments@trainer_args
        <<: *trainer_args
    train_dataset: *train_dataset
    eval_dataset: *eval_dataset

#---------------------------------------
#          Configuration Output          
#---------------------------------------
meta: &meta_output !dict:@meta
    config_name: "Fashion MNIST Trainer"
    config_description: "Base configuration, based on Torch tutorial parameters."
    config_class: "type.training_script.torch_vision"
    project_dir: "."
    workspace_root: "/home/dinalt/ai_assets/forgather"
    forgather_dir: "/home/dinalt/ai_assets/forgather"
    models_dir: "./output_models"
    tokenizers_dir: "/home/dinalt/ai_assets/forgather/tokenizers"
    datasets_dir: "/home/dinalt/ai_assets/forgather/datasets"
    output_dir: "./output_models/base_model"
    model_src_dir: "/home/dinalt/ai_assets/forgather/model_src"
    logging_dir: "./output_models/base_model/runs/base_model_2024-08-17T01-33-47"
    create_new_model: "True"
    save_model: "True"
    train: "True"
    eval: "False"

main: !singleton:forgather.ml.training_script:TrainingScript@training_script
    meta: *meta_output
    do_save: True
    do_train: True
    do_eval: False
    # Init distributed environment before initializing anything which depends on it.
    distributed_env: *distributed_env
    trainer: *trainer
    pp_config: !var "pp_config"

model_code_writer: *model_code_writer
distributed_env: *distributed_env
model: *model
trainer: *trainer
train_dataset: *train_dataset
eval_dataset: *eval_dataset
data_collator: *data_collator
trainer_callbacks: *trainer_callbacks

```

### Config Metadata:

```python
{'config_class': 'type.training_script.torch_vision',
 'config_description': 'Base configuration, based on Torch tutorial '
                       'parameters.',
 'config_name': 'Fashion MNIST Trainer',
 'create_new_model': 'True',
 'datasets_dir': '/home/dinalt/ai_assets/forgather/datasets',
 'eval': 'False',
 'forgather_dir': '/home/dinalt/ai_assets/forgather',
 'logging_dir': './output_models/base_model/runs/base_model_2024-08-17T01-33-47',
 'model_src_dir': '/home/dinalt/ai_assets/forgather/model_src',
 'models_dir': './output_models',
 'output_dir': './output_models/base_model',
 'project_dir': '.',
 'save_model': 'True',
 'tokenizers_dir': '/home/dinalt/ai_assets/forgather/tokenizers',
 'train': 'True',
 'workspace_root': '/home/dinalt/ai_assets/forgather'}

```

## Modules
- [./model_src/mlp_model.py](model_src/mlp_model.py) : MultilayerPerceptron
    - [/home/dinalt/ai_assets/forgather/tutorials/project_gamma/./model_src/mlp_model.py](model_src/mlp_model.py) : mlp_model
## Output Targets
- meta
- main
- model_code_writer
- distributed_env
- model
- trainer
- train_dataset
- eval_dataset
- data_collator
- trainer_callbacks

## Generated Code

```python
from torch.nn import CrossEntropyLoss
from forgather.ml.construct import copy_package_files
from torchvision.transforms import ToTensor
from forgather.ml.distributed import DistributedEnvironment
from forgather.ml.construct import dependency_list
from forgather.ml.trainer_types import MinimalTrainingArguments
from torchvision.datasets import FashionMNIST
from forgather.ml.simple_trainer import SimpleTrainer
from torch.nn import ReLU
from forgather.ml.training_script import TrainingScript
from importlib.util import spec_from_file_location, module_from_spec
import os
import sys

# Import a dynamic module.
def dynimport(module, name, searchpath):
    module_path = module
    module_name = os.path.basename(module).split(".")[0]
    module_spec = spec_from_file_location(
        module_name,
        module_path,
        submodule_search_locations=searchpath,
    )
    mod = module_from_spec(module_spec)
    sys.modules[module_name] = mod
    module_spec.loader.exec_module(mod)
    for symbol in name.split("."):
        mod = getattr(mod, symbol)
    return mod

MultilayerPerceptron = lambda: dynimport("./model_src/mlp_model.py", "MultilayerPerceptron", [])

def construct(
    pp_config,
):
    meta = {
        'config_name': 'Fashion MNIST Trainer',
        'config_description': 'Base configuration, based on Torch tutorial parameters.',
        'config_class': 'type.training_script.torch_vision',
        'project_dir': '.',
        'workspace_root': '/home/dinalt/ai_assets/forgather',
        'forgather_dir': '/home/dinalt/ai_assets/forgather',
        'models_dir': './output_models',
        'tokenizers_dir': '/home/dinalt/ai_assets/forgather/tokenizers',
        'datasets_dir': '/home/dinalt/ai_assets/forgather/datasets',
        'output_dir': './output_models/base_model',
        'model_src_dir': '/home/dinalt/ai_assets/forgather/model_src',
        'logging_dir': './output_models/base_model/runs/base_model_2024-08-17T01-33-47',
        'create_new_model': 'True',
        'save_model': 'True',
        'train': 'True',
        'eval': 'False',
    }

    distributed_env = DistributedEnvironment()

    activation_factory = lambda: ReLU()

    alpha_ = MultilayerPerceptron()(
        d_input=784,
        d_model=512,
        d_output=10,
        activation_factory=activation_factory,
        loss_fn=CrossEntropyLoss(),
    )

    model = dependency_list(
        alpha_,
        copy_package_files(
            './output_models/base_model',
            alpha_,
        ),
    )

    trainer_args = MinimalTrainingArguments(
        output_dir='./output_models/base_model',
        logging_dir='./output_models/base_model/runs/base_model_2024-08-17T01-33-47',
        logging_steps=100,
        per_device_train_batch_size=64,
        per_device_eval_batch_size=64,
        learning_rate=0.001,
        num_train_epochs=5,
    )

    transform = lambda: ToTensor()

    train_dataset = FashionMNIST(
        root='data',
        train=True,
        download=True,
        transform=transform(),
    )

    eval_dataset = FashionMNIST(
        root='data',
        train=False,
        download=True,
        transform=transform(),
    )

    trainer = SimpleTrainer(
        model=model,
        args=trainer_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )

    training_script = TrainingScript(
        meta=meta,
        do_save=True,
        do_train=True,
        do_eval=False,
        distributed_env=distributed_env,
        trainer=trainer,
        pp_config=pp_config,
    )
    
    return training_script

```



## Construct Baseline Configuration

The "main" output is the training script, but we also request the metadata, trainer, model, and evaluation dataset as auxiliary outputs. All of the requested outputs are returned in a dictionary.

In [None]:
from forgather.project import Project
import forgather.nb.notebooks as nb
from pprint import pp

# Load default baseline config
proj = Project()

outputs = proj(["main", "meta", "trainer", "model", "eval_dataset"])

# For easier access...
training_script = outputs["main"]
project_metadata = outputs["meta"]
trainer = outputs["trainer"]
model = outputs["model"]
eval_dataset = outputs["eval_dataset"]

# Print the trainer, as it contains most of the components.
pp(trainer)

# If present, ignore the warning about missing 'image.so'; we don't use it.

## Examine Dataset and Predictions

We can take a look at what's in the dataset using [Meerkat](http://meerkat.wiki/docs/start/tutorials/tutorial-data-frames.html).

The raw images are 28x28 tensors (784 pixels, matching the model's `d_input`), which we can convert to a greyscale image for rendering with the PIL library.

In the table, the "label" is the ground-truth target the model is expected to predict, while the "prediction" is what the model thought the item was.

As the model has not yet been trained, it is to be expected that it will fail miserably. We will train the model and then retest it.
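
The prediction step boils down to scaling the pixels, running the model, and taking the index of the largest logit. The sketch below shows that arithmetic in plain Python, without PyTorch. The image and logits are made-up values, and `preprocess` and `argmax` are hypothetical helpers; the flattening step is included only to show where 784 comes from and is assumed to happen inside the real model, as it does in the PyTorch tutorial.

```python
# Class names for FashionMNIST, in label-index order.
CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def preprocess(image_rows):
    """Scale uint8 pixel values to [0, 1] and flatten 28x28 into 784 values."""
    return [pixel / 255 for row in image_rows for pixel in row]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# A made-up, uniformly grey 28x28 image.
image = [[128] * 28 for _ in range(28)]
model_input = preprocess(image)

# Made-up logits; a real model would compute these from model_input.
logits = [0.1, -1.2, 0.3, 0.0, 2.5, -0.7, 0.9, -2.0, 1.1, 0.4]
predicted = CLASSES[argmax(logits)]

print(len(model_input), predicted)  # prints "784 Coat"
```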

In [None]:
import torch
import os
import meerkat as mk
import torchvision.transforms as transforms
from PIL import Image

@torch.no_grad()
def predict(model, device, x):
    """
    Given a raw image, get the model's prediction.
    """
    # The image is stored as a 28x28 uint8 Tensor
    # Convert to float and add batch dimension
    model.to(device)
    model_input = (x / 255).unsqueeze(0).to(device)
    model.eval()

    # Get model's prediction logits for input
    logits = model(model_input)

    # Get the index for the strongest prediction.
    return logits.argmax(1).item()

def make_datapanel(eval_dataset, model, device):
    # Convert the test dataset's raw images into a Meerkat TensorColumn
    raw_img_column = mk.TensorColumn(eval_dataset.data)
    
    # This maps class indices to names
    classes = eval_dataset.classes

    # Create DataPanel
    dp = mk.DataPanel(
        {
            # Lazily look up the class name for each integer label
            "label": mk.TensorColumn(eval_dataset.targets).defer(lambda x: classes[x.item()]),
            # Lazy model inference; get model's prediction from raw image
            "prediction": raw_img_column.defer(lambda x: classes[predict(model, device, x)]),
            # Lazy conversion of raw images to images
            "image": raw_img_column.defer(lambda x: Image.fromarray(x.numpy()))
        }
    )
    return dp

# Construct the Meerkat DataPanel and display the first 10 cells.
dp = make_datapanel(eval_dataset, model, trainer.args.device)

# Note: You can change the selected slice to see other ranges of the dataset.
dp[:10]()

## Train Model

The training-script can be started by calling the "run()" method.
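
For a sense of how long the run will be, the step count can be estimated from the trainer arguments in the baseline configuration. This is a back-of-the-envelope sketch that assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept; the actual trainer may differ on any of these.

```python
import math

train_examples = 60_000   # size of the FashionMNIST training split
batch_size = 64           # per_device_train_batch_size
num_train_epochs = 5
logging_steps = 100

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * num_train_epochs
log_events = total_steps // logging_steps

print(steps_per_epoch, total_steps, log_events)  # prints "938 4690 46"
```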


In [None]:
training_script.run()

# Display this when done training.
nb.display_markdown(f"""
### Outputs
Note that we have automatically saved a copy of both the config and the model's source code with the weights.
- Output Directory: {project_metadata['output_dir']}
- Logging Directory: {project_metadata['logging_dir']}
- [Saved Model Source]({os.path.join(project_metadata['output_dir'], "mlp_model.py")})
- [Saved Configuration]({os.path.join(project_metadata['logging_dir'], "config.yaml")})

""")

### View Training Runs in TensorBoard

We have automatically logged the training session to TensorBoard.
Run the following cell and go to the provided link to see the results.

If the notebook is running on the same machine as the trainer, remove "--bind_all".

When done, **stop the cell**, as TensorBoard will be running synchronously in the cell and will block execution of other cells.

You can run TensorBoard in a terminal or in another notebook to avoid blocking this notebook.

In [None]:
!tensorboard --bind_all --logdir "{project_metadata['models_dir']}"

### Test the Trained Model

We will display the same 10 data-cells as before, but now that the model has been trained, it's much less awful at the task.

In [None]:
dp[:10]()

## Construct and Train with Adam Optimizer

We have an alternate configuration in which the original SGD optimizer is replaced with Adam.
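
To see why the choice of optimizer matters, the sketch below compares a single SGD update with a single Adam update on one scalar parameter. It is a plain-Python illustration of the two update rules, not the code used by the trainer; the hyperparameters are PyTorch's documented defaults for Adam, and the settings actually used in `adam.yaml` are not assumed here.

```python
import math


def sgd_step(param, grad, lr):
    """Plain SGD: step size is proportional to the gradient."""
    return param - lr * grad


def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


lr = 1e-3
results = {}
for grad in (0.01, 100.0):
    sgd_param = sgd_step(0.0, grad, lr)
    adam_param = adam_step(0.0, grad, {"t": 0, "m": 0.0, "v": 0.0}, lr=lr)
    results[grad] = (sgd_param, adam_param)
    print(f"grad={grad}: after SGD {sgd_param:.6f}, after Adam {adam_param:.6f}")
```

With SGD the first step scales with the gradient, while Adam's first step has a magnitude close to the learning rate regardless of the gradient's scale. This normalization is one reason the two configurations can behave differently with the same learning rate.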

In [None]:
nb.display_project_index(config_template="adam.yaml", show_pp_config=True, show_generated_code=True, pp_first=True)

In [None]:
from forgather.project import Project
import forgather.nb.notebooks as nb
from pprint import pp

proj = Project(config_name="adam.yaml")
outputs = proj(["main", "meta", "trainer", "model", "eval_dataset"])

# For easier access...
training_script = outputs["main"]
project_metadata = outputs["meta"]
trainer = outputs["trainer"]
model = outputs["model"]
eval_dataset = outputs["eval_dataset"]

In [None]:
training_script.run()

In [None]:
# Regenerate the DataPanel and show predictions.
dp = make_datapanel(eval_dataset, model, trainer.args.device)
dp[:10]()