# Project Index

In [1]:
import forgather.nb.notebooks as nb
nb.display_project_index()

## Tiny Llama

In this tutorial, we will train a very small Llama model (about 5M parameters) on 10% of the TinyStories dataset. On a single RTX 4090, this takes about three minutes. Once training is complete, we will load the model and use it for text generation, and the output will be reasonably coherent for a three-minute-old model.
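
The "about 5M parameters" figure can be checked by hand from the model settings in the preprocessed config below (hidden_size 256, intermediate_size 1024, 4 layers, 2 attention and 2 key/value heads) plus a 2,000-token vocabulary. The sketch below is a back-of-envelope count. It assumes untied input and output embeddings (the `LlamaConfig` default), no bias terms, and two RMSNorm weight vectors per decoder layer plus one final norm. `llama_param_count` is a hypothetical helper, not a transformers function.

```python
# Back-of-envelope parameter count for the Tiny Llama config (hypothetical helper).
# Assumptions: untied embeddings / lm_head, no biases, RMSNorm weights only.

def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_layers, num_heads, num_kv_heads):
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim
    embed = vocab_size * hidden_size               # embed_tokens
    attn = (hidden_size * hidden_size              # q_proj
            + 2 * hidden_size * kv_dim             # k_proj and v_proj
            + hidden_size * hidden_size)           # o_proj
    mlp = 3 * hidden_size * intermediate_size      # gate_proj, up_proj, down_proj
    norms = 2 * hidden_size                        # input + post-attention RMSNorm
    per_layer = attn + mlp + norms
    final_norm = hidden_size
    lm_head = hidden_size * vocab_size             # untied output projection
    return embed + num_layers * per_layer + final_norm + lm_head


total = llama_param_count(
    vocab_size=2000, hidden_size=256, intermediate_size=1024,
    num_layers=4, num_heads=2, num_kv_heads=2,
)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under those assumptions, the count comes to 5,220,608 parameters, in line with the 5.2M reported in the training log. The embedding and output head together account for about 1M of them, roughly a fifth of this small model.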

#### Project Directory: "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama"

## Meta Config
Meta Config: [/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/meta.yaml](meta.yaml)

- [meta.yaml](meta.yaml)
    - [meta_defaults.yaml](../../../forgather_workspace/meta_defaults.yaml)
        - [base_directories.yaml](../../../forgather_workspace/base_directories.yaml)

Template Search Paths:
- [/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/templates](templates)
- [/home/dinalt/ai_assets/forgather/forgather_workspace](../../../forgather_workspace)
- [/home/dinalt/ai_assets/forgather/templatelib/modellib](../../../templatelib/modellib)
- [/home/dinalt/ai_assets/forgather/templatelib/examples](../../../templatelib/examples)
- [/home/dinalt/ai_assets/forgather/templatelib/base](../../../templatelib/base)

## Available Configurations
- [train_bigger_llama.yaml](templates/configs/train_bigger_llama.yaml)
- [train_tiny_llama.yaml](templates/configs/train_tiny_llama.yaml)
- [train_bigger_llama_full.yaml](templates/configs/train_bigger_llama_full.yaml)
- [prompts.yaml](templates/configs/prompts.yaml)

Default Configuration: train_tiny_llama.yaml



---
This example makes extensive use of the Forgather templates library. Take a look at the various files that go into the configuration and compare them to the preprocessed output.

In [2]:
nb.display_config(config_template="", show_pp_config=True, show_generated_code=False)

## Included Templates
- [configs/train_tiny_llama.yaml](templates/configs/train_tiny_llama.yaml)
    - [project.yaml](templates/project.yaml)
        - [callbacks/loggers.yaml](../../../templatelib/base/callbacks/loggers.yaml)
            - [callbacks/base_callbacks.yaml](../../../templatelib/base/callbacks/base_callbacks.yaml)
                - [inc/formatting.jinja](../../../templatelib/base/inc/formatting.jinja)
        - [datasets/tiny_stories_abridged.yaml](../../../templatelib/examples/datasets/tiny_stories_abridged.yaml)
            - [datasets/tiny_stories.yaml](../../../templatelib/examples/datasets/tiny_stories.yaml)
                - [datasets//base_datasets.yaml](../../../templatelib/base/datasets/base_datasets.yaml)
        - [types/training_script/causal_lm/causal_lm.yaml](../../../templatelib/base/types/training_script/causal_lm/causal_lm.yaml)
            - [trainers/trainer.yaml](../../../templatelib/base/trainers/trainer.yaml)
                - [trainers/base_trainer.yaml](../../../templatelib/base/trainers/base_trainer.yaml)
                    - [trainers/minimal_trainer.yaml](../../../templatelib/base/trainers/minimal_trainer.yaml)
            - [models/causal_lm/load_model.yaml](../../../templatelib/base/models/causal_lm/load_model.yaml)
                - [models/causal_lm/from_pretrained.yaml](../../../templatelib/base/models/causal_lm/from_pretrained.yaml)
                    - [models/base_language_model.yaml](../../../templatelib/base/models/base_language_model.yaml)
            - [types/training_script/training_script.yaml](../../../templatelib/base/types/training_script/training_script.yaml)
                - [types/type.yaml](../../../templatelib/base/types/type.yaml)
                    - [base_directories.yaml](../../../forgather_workspace/base_directories.yaml)
        - [project.trainer_config](templates/project.yaml)
        - [project.model_config](templates/project.yaml)
            - [tokenizers/tiny_2k.yaml](../../../templatelib/examples/tokenizers/tiny_2k.yaml)
            - [models/llama.yaml](../../../templatelib/examples/models/llama.yaml)
                - [models/causal_lm/from_config.yaml](../../../templatelib/base/models/causal_lm/from_config.yaml)
### Config Metadata:

```python
{'config_class': 'type.training_script.causal_lm',
 'config_description': 'A demo of training a tiny llama model from scratch.',
 'config_name': 'Tiny Llama',
 'create_new_model': 'True',
 'datasets_dir': '/home/dinalt/ai_assets/forgather/datasets',
 'eval': 'False',
 'forgather_dir': '/home/dinalt/ai_assets/forgather',
 'logging_dir': './output_models/default_model/runs/log_2025-06-22T03-46-07',
 'model_src_dir': '/home/dinalt/ai_assets/forgather/model_src',
 'models_dir': './output_models',
 'output_dir': './output_models/default_model',
 'project_dir': '.',
 'save_model': 'True',
 'tokenizers_dir': '/home/dinalt/ai_assets/forgather/tokenizers',
 'train': 'True',
 'workspace_root': '/home/dinalt/ai_assets/forgather'}

```

## Modules
## Output Targets
- distributed_env
- model_constructor_args
- tokenizer
- model_code_generator
- model_code_writer
- model_config
- model
- train_source_dataset
- eval_source_dataset
- train_dataset_split
- eval_dataset_split
- preprocess_args
- train_dataset
- eval_dataset
- data_collator
- experiment_info
- trainer_callbacks
- optimizer
- lr_scheduler
- trainer_args
- model_preprocessor
- trainer
- meta
- main

## Preprocessed Config

```yaml
#---------------------------------------
#               Tiny Llama               
#---------------------------------------
# 2025-06-22T03:46:07
# Description: A demo of training a tiny llama model from scratch.
# Project Dir: /home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama
# Current Working Dir: "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama"
# Forgather Config Dir: "/home/dinalt/.config/forgather"
# Model: default_model
# Hostname: hal9000
# Versions:
#     python: 3.10.13
#     torch: 2.7.1
#     transformers: 4.51.3
#     accelerate: 1.7.0

############# Config Vars ##############

# ns.forgather_dir: "/home/dinalt/ai_assets/forgather"
# ns.models_dir: "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/output_models"
# ns.project_model_src_dir: "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/model_src"
# ns.tokenizers_dir: "/home/dinalt/ai_assets/forgather/tokenizers"
# ns.datasets_dir: "/home/dinalt/ai_assets/forgather/datasets"
# ns.model_src_dir: "/home/dinalt/ai_assets/forgather/model_src"
# ns.output_dir: "./output_models/default_model"
# ns.logging_dir: "./output_models/default_model/runs/log_2025-06-22T03-46-07"
# ns.create_new_model: True
# ns.save_model: True
# ns.train: True
# ns.eval: False
# ns.trust_remote_code: False

####### Distributed Environment ########

distributed_env: &distributed_env !singleton:forgather.ml.distributed:DistributedEnvironment@distributed_env

############# Dependencies #############



################ Model #################

# https://huggingface.co/docs/transformers/en/model_doc/auto
model_constructor_args: &model_constructor_args {}

# Name: Llama
# Description: Llama model

# model_def.source = ""
# model_def.model_config_cls = "transformers:LlamaConfig"

# **Tokenizer**

# Load custom tokenizer from sub-project definition
tokenizer: &tokenizer !singleton:forgather.ml.construct:load_from_config@tokenizer
    project_dir: "/home/dinalt/ai_assets/forgather/examples/tokenizers/tiny_stories_bpe"
    config_template: "2k.yaml"

# **Model Config**

# Model config dependencies

model_code_generator: &model_code_generator null

model_code_writer: &model_code_writer null    

# https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/configuration_llama.py
model_config: &model_config !singleton:transformers:LlamaConfig
    vocab_size: !singleton:len [ *tokenizer ]
    max_position_embeddings: !singleton:getattr [ *tokenizer, 'model_max_length' ]
    pad_token_id: !singleton:getattr [ *tokenizer, 'pad_token_id' ]
    bos_token_id: !singleton:getattr [ *tokenizer, 'bos_token_id' ]
    eos_token_id: !singleton:getattr [ *tokenizer, 'eos_token_id' ]

    # Tiny Llama overrides
    hidden_size: 256
    intermediate_size: 1024
    num_attention_heads: 2
    num_key_value_heads: 2
    num_hidden_layers: 4

# **Model Factory**

model: &model !lambda:transformers:AutoModelForCausalLM.from_config@model
    args:
        - *model_config
    kwargs:
        <<: *model_constructor_args

############### Datasets ###############

# Name: TinyStories Abridged
# Define: Abridged to 10% of original size; Dataset containing synthetically generated (by GPT-3.5 and GPT-4) short stories that only use a small vocabulary.
# Source: https://arxiv.org/abs/2305.07759
# Train Dataset: "roneneldan/TinyStories" : "train"
# Eval Dataset: "roneneldan/TinyStories" : "validation"

# **Source Datasets**

train_source_dataset: &train_source_dataset !singleton:datasets:load_dataset@train_source_dataset
    - "roneneldan/TinyStories"

eval_source_dataset: &eval_source_dataset !singleton:datasets:load_dataset@eval_source_dataset
    - "roneneldan/TinyStories"

# **Dataset Splits**

train_dataset_split: &train_dataset_split !singleton:operator:getitem
    - *train_source_dataset
    - "train"

eval_dataset_split: &eval_dataset_split !singleton:operator:getitem
    - *train_source_dataset
    - "validation"

# **Preprocess Dataset Args**

preprocess_args: &preprocess_args
    truncation: True

# **Preprocessed Datasets**

train_dataset: &train_dataset !singleton:forgather.ml.datasets:preprocess_dataset@train_dataset
    dataset: *train_dataset_split
    tokenizer: *tokenizer
    select_range: 0.1
    desc: "Tokenizing train"
    fn_kwargs:
        <<: *preprocess_args

eval_dataset: &eval_dataset !singleton:forgather.ml.datasets:preprocess_dataset@eval_dataset
    dataset: *eval_dataset_split
    tokenizer: *tokenizer
    select_range: 500
    desc: "Tokenizing validation split"
    fn_kwargs:
        <<: *preprocess_args

############ Data Collator #############

# Data collator for causal model
# Batches are dynamically padded to longest sequence
# labels are set to input_ids, with pad tokens set to -100
data_collator: &data_collator !singleton:forgather.ml.data_collator:DataCollatorForCausalLM@DataCollatorForCausalLM
    tokenizer: *tokenizer
    return_tensors: pt

    # Tiny Llama
    truncation: True
    max_length: 512

########## Trainer Callbacks ###########

# **Dependencies**

# Experiment tracking: Tensorboard SummaryWriter
.define: &summary_writer !singleton:torch.utils.tensorboard:SummaryWriter
    - "./output_models/default_model/runs/log_2025-06-22T03-46-07"

# Additional data to record to experiment loggers
experiment_info: &experiment_info !dict:@experiment_info
    date: "2025-06-22T03:46:07"
    name: "Tiny Llama"
    description: "A demo of training a tiny llama model from scratch."
    config: !var "pp_config"
    versions: {'python': '3.10.13', 'torch': '2.7.1', 'transformers': '4.51.3', 'accelerate': '1.7.0'}

# **Callback List**

trainer_callbacks: &trainer_callbacks !list:@trainer_callbacks
    # Log all training output to JSON
    - !singleton:forgather.ml.json_logger:JsonLogger
        <<: *experiment_info
    # Log configuration and metrics to Tensorboard file
    - !singleton:forgather.ml.tb_logger:TBLogger
        args: [ *summary_writer ]
        kwargs:
            <<: *experiment_info

############## Optimizer ###############

optimizer: &optimizer !lambda:torch:optim.AdamW
    lr: 1.0e-3

############# LR Scheduler #############

lr_scheduler: &lr_scheduler ~

############### Trainer ################

# Name: forgather.ml.trainer.Trainer
# Description: A lightweight, extensible trainer; does not support multiple GPUs

# **Trainer Args**

trainer_args: &trainer_args
    # Minimal Trainer Defaults
    # https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments
    output_dir: "./output_models/default_model"
    logging_dir: "./output_models/default_model/runs/log_2025-06-22T03-46-07"
    logging_steps: 500
    per_device_train_batch_size: 16
    per_device_eval_batch_size: 32
    num_train_epochs: 1
    # Base Trainer Defaults
    # https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments
    overwrite_output_dir: True
    eval_steps: 100
    eval_strategy: "steps"
    save_strategy: "no"
    logging_strategy: "steps"

    # Tiny Llama Project Overrides
    seed: 42
    per_device_train_batch_size: 32
    per_device_eval_batch_size: 64
    logging_steps: 100
    eval_steps: 500
    num_train_epochs: 1
    dataloader_num_workers: 1


model_preprocessor: &model_preprocessor !partial:call [ *model ]

# **Trainer Constructor**

trainer: &trainer !singleton:forgather.ml.trainer:Trainer@trainer
    model_init: *model_preprocessor
    args: !singleton:forgather.ml.trainer_types:TrainingArguments@trainer_args
        <<: *trainer_args
    data_collator: *data_collator
    train_dataset: *train_dataset
    eval_dataset: *eval_dataset
    processing_class: *tokenizer
    callbacks: *trainer_callbacks
    optimizer_factory: *optimizer
    lr_scheduler_factory: *lr_scheduler

#---------------------------------------
#          Configuration Output          
#---------------------------------------
meta: &meta_output !dict:@meta
    config_name: "Tiny Llama"
    config_description: "A demo of training a tiny llama model from scratch."
    config_class: "type.training_script.causal_lm"
    project_dir: "."
    workspace_root: "/home/dinalt/ai_assets/forgather"
    forgather_dir: "/home/dinalt/ai_assets/forgather"
    models_dir: "./output_models"
    tokenizers_dir: "/home/dinalt/ai_assets/forgather/tokenizers"
    datasets_dir: "/home/dinalt/ai_assets/forgather/datasets"
    output_dir: "./output_models/default_model"
    model_src_dir: "/home/dinalt/ai_assets/forgather/model_src"
    logging_dir: "./output_models/default_model/runs/log_2025-06-22T03-46-07"
    create_new_model: "True"
    save_model: "True"
    train: "True"
    eval: "False"

main: !singleton:forgather.ml.training_script:TrainingScript@training_script
    meta: *meta_output
    do_save: True
    do_train: True
    do_eval: False
    # Init distributed environment before initializing anything which depends on it.
    distributed_env: *distributed_env
    trainer: *trainer
    pp_config: !var "pp_config"

```
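
The data collator comments in the config above describe three behaviors: dynamic padding to the longest sequence in the batch, truncation (here `max_length: 512`), and labels copied from `input_ids` with padding replaced by -100, the index that PyTorch's cross-entropy loss ignores by default. Below is a minimal pure-Python sketch of that logic. `collate_causal_lm` is hypothetical, not Forgather's `DataCollatorForCausalLM`. It assumes right padding and returns lists rather than `pt` tensors. Labels are not shifted here because Hugging Face causal LM models shift them internally when computing the loss.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss


def collate_causal_lm(batch: List[List[int]], pad_token_id: int,
                      max_length: int = 512) -> Dict[str, List[List[int]]]:
    """Hypothetical sketch: truncate, right-pad to the longest sequence, mask pads."""
    seqs = [ids[:max_length] for ids in batch]
    longest = max(len(s) for s in seqs)
    input_ids, attention_mask, labels = [], [], []
    for s in seqs:
        n_pad = longest - len(s)
        input_ids.append(s + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(s) + [0] * n_pad)
        # Labels copy the real tokens; padded positions are excluded from the loss.
        labels.append(s + [IGNORE_INDEX] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# pad_token_id=1 is an illustrative value for this example.
batch = collate_causal_lm([[5, 6, 7], [8, 9]], pad_token_id=1)
print(batch)
```

Padding to the longest sequence in each batch, rather than always to 512, keeps short TinyStories batches small, which is part of why training is this fast.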



## Load Project

Load the default configuration.

In [3]:
from forgather.project import Project
import forgather.nb.notebooks as nb

# Load the default project, which is "train_tiny_llama.yaml"
proj = Project()

## Start Tensorboard

This project has been configured to log training to Tensorboard (TB). To watch the model's training progress in TB, run the following cell, which prints a command for starting the TB server. Then run that command from a shell.

When TB starts, it should print the URL for accessing it, e.g.:

```
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.16.2 at http://localhost:6006/ (Press CTRL+C to quit)
```

We will get back to this after training starts...

In [4]:
# Show the command to run Tensorboard. Set local_host=False to have Tensorboard listen on all network interfaces, e.g. when your browser is not on the same computer.
nb.display_tb_command(proj, local_host=True)

#### Tensorboard Command

```bash
tensorboard --logdir "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/output_models/default_model"
```
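
If you prefer to build the command yourself, the sketch below shows the idea behind the `local_host` switch. `tensorboard_command` is a hypothetical helper, not Forgather's `nb.display_tb_command`. It assumes that the only difference for remote access is TensorBoard's `--bind_all` flag (mentioned in the server message above), which makes the server listen on all network interfaces.

```python
import shlex


def tensorboard_command(logdir: str, local_host: bool = True) -> str:
    """Hypothetical helper: build a shell-safe TensorBoard command line."""
    cmd = ["tensorboard", "--logdir", logdir]
    if not local_host:
        # Listen on all network interfaces instead of localhost only.
        cmd.append("--bind_all")
    # shlex.join quotes any argument that needs it (e.g. paths with spaces).
    return shlex.join(cmd)


print(tensorboard_command("./output_models/default_model", local_host=False))
```

For example, `tensorboard_command("./output_models/default_model", local_host=False)` returns `tensorboard --logdir ./output_models/default_model --bind_all`.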

## Train Model

You have a few options for training the model.

1. Run it directly from the notebook. This works fine for this example, although for projects using multiple GPUs you will want one of the other options. To train from the notebook, just run the training cell below.
2. Generate a training script and run it from the shell. To do so, run the cell that calls `nb.generate_trainingscript()`, then run the generated shell script from a terminal.
3. Use the Forgather CLI:

```bash
# This assumes that you have added Forgather's 'bin' directory to your PATH. If you have not, prefix the command with the path to Forgather's 'bin' directory.

# Open a shell in this project's directory, then run this command:
fgcli.py train -d 0

# See fgcli.py --help for more details.
```

Once training starts, switch to Tensorboard in your browser. One of the first things you will want to do is enable automatic refresh. To do so, click the gear in the upper-right corner and check "Reload Data."

Once training has started, take a look at the "Text" tab. You will see that the preprocessed configuration has been logged automatically, along with the primary training artifacts.

Next, switch to the "Scalars" tab. You will see plots of the training and evaluation loss, which update automatically every 30 seconds. If you are not familiar with Tensorboard, now is a good time to explore its UI.

When training completes, the model will be automatically saved to the output directory ("./output_models/default_model").

In [5]:
# Train model in notebook.

# Construct the default target, "main," which is a training script.
training_script = proj()

# Start training the model.
training_script.run()

# Release resources
training_script = None

**** Training Script Started *****
config_name: Tiny Llama
config_description: A demo of training a tiny llama model from scratch.
output_dir: ./output_models/default_model
logging_dir: ./output_models/default_model/runs/log_2025-06-22T03-46-10


total_examples: 212,000
total_train_samples: 212,000
per_device_train_batch_size: 32
actual_per_device_batch_size: 32
total_train_batch_size: 32
max_steps: 6,625
total_parameters: 5.2M
trainable_parameters: 5.2M
model:
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(2000, 256, padding_idx=1)
    (layers): ModuleList(
      (0-3): 4 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=256, out_features=256, bias=False)
          (k_proj): Linear(in_features=256, out_features=256, bias=False)
          (v_proj): Linear(in_features=256, out_features=256, bias=False)
          (o_proj): Linear(in_features=256, out_features=256, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=256, out_features=1024, bias=False)
          (up_proj): Linear(in_features=256, out_features=1024, bias=False)
          (down_proj): Linear(in_features=1024, out_features=256, bias=False)
          (act_fn): Si

2025-06-22 03:47:30          500  0.08  eval-loss:  2.72421   
2025-06-22 03:47:33          600  0.09  train-loss: 2.74511   learning-rate: 9.09e-04
2025-06-22 03:47:35          700  0.11  train-loss: 2.63223   learning-rate: 8.94e-04
2025-06-22 03:47:37          800  0.12  train-loss: 2.60686   learning-rate: 8.79e-04
2025-06-22 03:47:39          900  0.14  train-loss: 2.50137   learning-rate: 8.64e-04
2025-06-22 03:47:41        1,000  0.15  train-loss: 2.3653    learning-rate: 8.49e-04


2025-06-22 03:47:42        1,000  0.15  eval-loss:  2.26289   
2025-06-22 03:47:44        1,100  0.17  train-loss: 2.38422   learning-rate: 8.34e-04
2025-06-22 03:47:46        1,200  0.18  train-loss: 2.3678    learning-rate: 8.19e-04
2025-06-22 03:47:48        1,300  0.2   train-loss: 2.32665   learning-rate: 8.04e-04
2025-06-22 03:47:50        1,400  0.21  train-loss: 2.31223   learning-rate: 7.89e-04
2025-06-22 03:47:52        1,500  0.23  train-loss: 2.28764   learning-rate: 7.74e-04


2025-06-22 03:47:53        1,500  0.23  eval-loss:  2.0599    
2025-06-22 03:47:55        1,600  0.24  train-loss: 2.25563   learning-rate: 7.58e-04
2025-06-22 03:47:57        1,700  0.26  train-loss: 2.20934   learning-rate: 7.43e-04
2025-06-22 03:47:59        1,800  0.27  train-loss: 2.14287   learning-rate: 7.28e-04
2025-06-22 03:48:01        1,900  0.29  train-loss: 2.13751   learning-rate: 7.13e-04
2025-06-22 03:48:03        2,000  0.3   train-loss: 2.18676   learning-rate: 6.98e-04


2025-06-22 03:48:04        2,000  0.3   eval-loss:  1.9571    
2025-06-22 03:48:06        2,100  0.32  train-loss: 2.13909   learning-rate: 6.83e-04
2025-06-22 03:48:08        2,200  0.33  train-loss: 2.08504   learning-rate: 6.68e-04
2025-06-22 03:48:10        2,300  0.35  train-loss: 2.04639   learning-rate: 6.53e-04
2025-06-22 03:48:12        2,400  0.36  train-loss: 2.10779   learning-rate: 6.38e-04
2025-06-22 03:48:14        2,500  0.38  train-loss: 2.07545   learning-rate: 6.23e-04


2025-06-22 03:48:14        2,500  0.38  eval-loss:  1.86501   
2025-06-22 03:48:16        2,600  0.39  train-loss: 2.07933   learning-rate: 6.08e-04
2025-06-22 03:48:19        2,700  0.41  train-loss: 2.01078   learning-rate: 5.92e-04
2025-06-22 03:48:21        2,800  0.42  train-loss: 2.06126   learning-rate: 5.77e-04
2025-06-22 03:48:23        2,900  0.44  train-loss: 1.97158   learning-rate: 5.62e-04
2025-06-22 03:48:25        3,000  0.45  train-loss: 1.85841   learning-rate: 5.47e-04


2025-06-22 03:48:25        3,000  0.45  eval-loss:  1.75771   
2025-06-22 03:48:27        3,100  0.47  train-loss: 1.95764   learning-rate: 5.32e-04
2025-06-22 03:48:29        3,200  0.48  train-loss: 2.04966   learning-rate: 5.17e-04
2025-06-22 03:48:31        3,300  0.5   train-loss: 1.92616   learning-rate: 5.02e-04
2025-06-22 03:48:34        3,400  0.51  train-loss: 1.8471    learning-rate: 4.87e-04
2025-06-22 03:48:36        3,500  0.53  train-loss: 1.8766    learning-rate: 4.72e-04


2025-06-22 03:48:36        3,500  0.53  eval-loss:  1.72352   
2025-06-22 03:48:38        3,600  0.54  train-loss: 1.95022   learning-rate: 4.57e-04
2025-06-22 03:48:40        3,700  0.56  train-loss: 1.85785   learning-rate: 4.42e-04
2025-06-22 03:48:43        3,800  0.57  train-loss: 1.86316   learning-rate: 4.26e-04
2025-06-22 03:48:45        3,900  0.59  train-loss: 1.91145   learning-rate: 4.11e-04
2025-06-22 03:48:47        4,000  0.6   train-loss: 1.94081   learning-rate: 3.96e-04


2025-06-22 03:48:47        4,000  0.6   eval-loss:  1.68185   
2025-06-22 03:48:49        4,100  0.62  train-loss: 1.84015   learning-rate: 3.81e-04
2025-06-22 03:48:52        4,200  0.63  train-loss: 1.80779   learning-rate: 3.66e-04
2025-06-22 03:48:54        4,300  0.65  train-loss: 1.8393    learning-rate: 3.51e-04
2025-06-22 03:48:56        4,400  0.66  train-loss: 1.89586   learning-rate: 3.36e-04
2025-06-22 03:48:58        4,500  0.68  train-loss: 1.82264   learning-rate: 3.21e-04


2025-06-22 03:48:58        4,500  0.68  eval-loss:  1.6471    
2025-06-22 03:49:01        4,600  0.69  train-loss: 1.75596   learning-rate: 3.06e-04
2025-06-22 03:49:03        4,700  0.71  train-loss: 1.76089   learning-rate: 2.91e-04
2025-06-22 03:49:05        4,800  0.72  train-loss: 1.79232   learning-rate: 2.75e-04
2025-06-22 03:49:07        4,900  0.74  train-loss: 1.80353   learning-rate: 2.60e-04
2025-06-22 03:49:10        5,000  0.75  train-loss: 1.80619   learning-rate: 2.45e-04


2025-06-22 03:49:10        5,000  0.75  eval-loss:  1.61131   
2025-06-22 03:49:12        5,100  0.77  train-loss: 1.80021   learning-rate: 2.30e-04
2025-06-22 03:49:14        5,200  0.78  train-loss: 1.70302   learning-rate: 2.15e-04
2025-06-22 03:49:17        5,300  0.8   train-loss: 1.70104   learning-rate: 2.00e-04
2025-06-22 03:49:19        5,400  0.82  train-loss: 1.73868   learning-rate: 1.85e-04
2025-06-22 03:49:21        5,500  0.83  train-loss: 1.71429   learning-rate: 1.70e-04


2025-06-22 03:49:21        5,500  0.83  eval-loss:  1.57267   
2025-06-22 03:49:23        5,600  0.85  train-loss: 1.75668   learning-rate: 1.55e-04
2025-06-22 03:49:25        5,700  0.86  train-loss: 1.78478   learning-rate: 1.40e-04
2025-06-22 03:49:28        5,800  0.88  train-loss: 1.73766   learning-rate: 1.25e-04
2025-06-22 03:49:30        5,900  0.89  train-loss: 1.75866   learning-rate: 1.09e-04
2025-06-22 03:49:32        6,000  0.91  train-loss: 1.66765   learning-rate: 9.43e-05


2025-06-22 03:49:32        6,000  0.91  eval-loss:  1.54673   
2025-06-22 03:49:34        6,100  0.92  train-loss: 1.63996   learning-rate: 7.92e-05
2025-06-22 03:49:37        6,200  0.94  train-loss: 1.70862   learning-rate: 6.42e-05
2025-06-22 03:49:39        6,300  0.95  train-loss: 1.67234   learning-rate: 4.91e-05
2025-06-22 03:49:41        6,400  0.97  train-loss: 1.66689   learning-rate: 3.40e-05
2025-06-22 03:49:43        6,500  0.98  train-loss: 1.69109   learning-rate: 1.89e-05


2025-06-22 03:49:43        6,500  0.98  eval-loss:  1.52501   
2025-06-22 03:49:46        6,600  1.0   train-loss: 1.64215   learning-rate: 3.77e-06
2025-06-22 03:49:46        6,625  1.0   train_runtime: 146.7 train_samples: 212,000 step: 6,625 train_samples_per_second: 1.445e+03 train_steps_per_second: 45.16 epoch: 1.0 
**** Training Completed *****
{'train_runtime': 146.69839358329773, 'train_samples': 212000, 'step': 6625, 'train_samples_per_second': 1445.142, 'train_steps_per_second': 45.161, 'epoch': 1.0}
Model saved to: ./output_models/default_model
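
Several numbers in this log can be reproduced by hand. With 212,000 training samples and a batch size of 32 on one GPU, one epoch is 212,000 / 32 = 6,625 steps. Although the config sets `lr_scheduler` to null (`~`), the logged learning rate is consistent with a linear decay from 1e-3 to 0 over those 6,625 steps, with no warmup. That schedule is inferred from the log, not taken from the trainer's source. Finally, the losses are mean per-token cross-entropy in nats, so exp(loss) gives perplexity.

```python
import math

total_train_samples = 212_000
batch_size = 32      # per_device_train_batch_size, single GPU
peak_lr = 1.0e-3     # optimizer lr from the config

# One epoch = ceil(samples / batch size) optimizer steps.
max_steps = math.ceil(total_train_samples / batch_size)


def linear_decay_lr(step: int) -> float:
    # Assumption inferred from the log: linear decay to 0 over max_steps, no warmup.
    return peak_lr * (1 - step / max_steps)


print("max_steps:", max_steps)
for step in (600, 3000, 6600):
    print(step, f"{linear_decay_lr(step):.2e}")

# Mean cross-entropy (nats per token) -> perplexity.
final_eval_loss = 1.52501
print(f"final eval perplexity ~ {math.exp(final_eval_loss):.2f}")
```

At steps 600, 3,000, and 6,600, the sketch gives 9.09e-04, 5.47e-04, and 3.77e-06, matching the logged values. The final eval loss of 1.525 corresponds to a perplexity of about 4.6.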


In [6]:
# Generate training script to run from shell.
nb.generate_trainingscript(proj, "0")

#### Generated Shell Script
[train_tiny_llama.sh](train_tiny_llama.sh)
```bash
#!/bin/bash
CUDA_VISIBLE_DEVICES='0' torchrun --standalone --nproc-per-node 'gpu' '/home/dinalt/ai_assets/forgather/scripts/train_script.py' -p '/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama' "train_tiny_llama.yaml"

```

## Load Trained Model

You can use the regular HF APIs to load the saved model and tokenizer.

In [7]:
from forgather.project import Project
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig, StoppingCriteria

model_path = "./output_models/default_model"
device = "cuda:0"

model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

## Setup Generation Config

Let's do something interesting with our newly trained model...

To speed things up a little, there is a project configuration that only defines some prompts and generation parameters. The next cell loads this configuration and prints it, but you can easily replace these with your own settings.

In [8]:
prompts, generation_config_args = Project("prompts.yaml")("testprompts", "generation_config")
prompts, generation_config_args

(['Alice was so tired when she got back home so she went',
  'Jack and Lily liked to watch the moon at night. They noticed that the moon changed its shape every night. Sometimes the moon was big and round, and sometimes it was',
  'Jack and Lily saw a rainbow after a rainy day.They were amazed by the colors. Jack said, "Look, Lily. A rainbow has',
  'Jack wanted to read a book, so he went to',
  '"Can cows fly?" Alice asked her mother.',
  '"What do birds like to eat?" Tom asked his mother.',
  '"What language do they speak in France?" Tom asked his mother.',
  'If I throw a ball up in the air, eventually it will',
  'It was winter and cold outside so his mother told him, "You should',
  'Lily likes cats and dogs. She asked her mom for a dog and her mom said no, so instead she asked',
  'Jack told Mary, "If you give me your banana, I\'ll give you my apple." Mary gave Jack her Banana, so',
  'On weekends Jack went to visit his grandmother whereas on weekdays he would go to school. Last 

In [9]:
# Construct a generation config object from a dictionary.
gen_config = GenerationConfig(
    pad_token_id=model.config.pad_token_id,
    bos_token_id=model.config.bos_token_id,
    eos_token_id=model.config.eos_token_id,
    **generation_config_args,
)

## Text Generation

This loop will use the newly trained model to generate text, seeded with the above prompts.

In [10]:
import torch

def generate_text(model, tokenizer, prompts, gen_config, max_new_tokens, device):
    model.to(device)
    model.eval()
    
    with torch.inference_mode():
        for prompt in prompts:
            tokenizer_outputs = tokenizer(
                [prompt],
                truncation=False,
                return_length=True,
                return_tensors="pt",
                return_attention_mask=True,
            )
        
            input_ids = tokenizer_outputs["input_ids"].to(device)
            attention_mask = tokenizer_outputs["attention_mask"].to(device)
            use_cache = getattr(model, "_supports_cache_class", False)
            outputs = model.generate(
                input_ids,
                attention_mask=attention_mask,
                generation_config=gen_config,
                return_dict_in_generate=True,
                use_cache=use_cache,
                past_key_values=None,
                max_new_tokens=max_new_tokens,
            )
    
            output_text = tokenizer.decode(
                outputs.sequences[0],
                skip_special_tokens=True,
            )
            yield prompt + " [START] " + output_text[len(prompt) + 1 :]

for s in generate_text(model, tokenizer, prompts, gen_config, 100, device):
    print(s)
    print(f"{'-' * 40}")

Alice was so tired when she got back home so she went [START] to sleep. She closed her eyes and dreamed of eating breakfast.

The next day, Alice went to the park to play. She saw a big dog, a dog, and a dog. The dog was very cute and had big teeth.

Alice had to go home. She was sad to see the dog go, but she was afraid to run away.

Alice's mom saw her and said,
----------------------------------------
Jack and Lily liked to watch the moon at night. They noticed that the moon changed its shape every night. Sometimes the moon was big and round, and sometimes it was [START] hard work. 

One night, Jack and Lily saw a big, shiny thing in the sky. It was a shiny thing that could make noises. Jack wanted to see what it could do. He picked it up and tried to pull it out. But the thing was not moving. 

Jack and Lily were scared. They did not know what to do. They tried to pull the button, but it was too heavy for them. They tried and tried, but it was too
----------------------------------
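
`generate_text` strips the prompt by slicing the decoded string at `len(prompt) + 1`. This assumes that the decoded text reproduces the prompt exactly, followed by one separator character. Because `generate()` on a decoder-only model returns the prompt tokens followed by the new tokens, an alternative is to slice token IDs before decoding, e.g. `outputs.sequences[0][input_ids.shape[1]:]`. The sketch below shows the idea with a toy vocabulary. `split_completion` and `TOY_VOCAB` are hypothetical stand-ins for a real tokenizer.

```python
from typing import Callable, List

# Toy vocabulary standing in for a real tokenizer (hypothetical).
TOY_VOCAB = {0: "Alice", 1: " went", 2: " to", 3: " sleep", 4: "."}


def decode(ids: List[int]) -> str:
    return "".join(TOY_VOCAB[i] for i in ids)


def split_completion(sequence: List[int], prompt_len: int,
                     decode_fn: Callable[[List[int]], str]) -> str:
    """Decode only the tokens generated after the prompt."""
    return decode_fn(sequence[prompt_len:])


prompt_ids = [0, 1]                 # "Alice went"
sequence = prompt_ids + [2, 3, 4]   # generate() output: prompt + new tokens
print(repr(split_completion(sequence, len(prompt_ids), decode)))
```

With a real BPE tokenizer, decoding the completion on its own can differ from the full decode in leading whitespace, so check the output for your tokenizer before relying on either approach.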

## Train Bigger Llama

Next, let's train a bigger version of the previous model. We will double the dimensions.

We will also extend the trainer callbacks to generate text every 1,000 steps. The samples appear under the "Text" tab in Tensorboard.

In [11]:
nb.display_config(config_template="train_bigger_llama.yaml", show_pp_config=True, show_generated_code=False)

## Included Templates
- [configs/train_bigger_llama.yaml](templates/configs/train_bigger_llama.yaml)
    - [bigger_llama_project.yaml](templates/bigger_llama_project.yaml)
        - [project.yaml](templates/project.yaml)
            - [callbacks/loggers.yaml](../../../templatelib/base/callbacks/loggers.yaml)
                - [callbacks/base_callbacks.yaml](../../../templatelib/base/callbacks/base_callbacks.yaml)
                    - [inc/formatting.jinja](../../../templatelib/base/inc/formatting.jinja)
            - [datasets/tiny_stories_abridged.yaml](../../../templatelib/examples/datasets/tiny_stories_abridged.yaml)
                - [datasets/tiny_stories.yaml](../../../templatelib/examples/datasets/tiny_stories.yaml)
                    - [datasets//base_datasets.yaml](../../../templatelib/base/datasets/base_datasets.yaml)
            - [types/training_script/causal_lm/causal_lm.yaml](../../../templatelib/base/types/training_script/causal_lm/causal_lm.yaml)
                - [trainers/trainer.yaml](../../../templatelib/base/trainers/trainer.yaml)
                    - [trainers/base_trainer.yaml](../../../templatelib/base/trainers/base_trainer.yaml)
                        - [trainers/minimal_trainer.yaml](../../../templatelib/base/trainers/minimal_trainer.yaml)
                - [models/causal_lm/load_model.yaml](../../../templatelib/base/models/causal_lm/load_model.yaml)
                    - [models/causal_lm/from_pretrained.yaml](../../../templatelib/base/models/causal_lm/from_pretrained.yaml)
                        - [models/base_language_model.yaml](../../../templatelib/base/models/base_language_model.yaml)
                - [types/training_script/training_script.yaml](../../../templatelib/base/types/training_script/training_script.yaml)
                    - [types/type.yaml](../../../templatelib/base/types/type.yaml)
                        - [base_directories.yaml](../../../forgather_workspace/base_directories.yaml)
            - [project.trainer_config](templates/project.yaml)
            - [project.model_config](templates/project.yaml)
                - [tokenizers/tiny_2k.yaml](../../../templatelib/examples/tokenizers/tiny_2k.yaml)
                - [models/llama.yaml](../../../templatelib/examples/models/llama.yaml)
                    - [models/causal_lm/from_config.yaml](../../../templatelib/base/models/causal_lm/from_config.yaml)
        - [biggerllama.logger_config](templates/bigger_llama_project.yaml)
            - [prompts/tiny_stories.yaml](../../../templatelib/examples/prompts/tiny_stories.yaml)
        - [biggerllama.model_config](templates/bigger_llama_project.yaml)
### Config Metadata:

```python
{'config_class': 'type.training_script.causal_lm',
 'config_description': 'A bigger Llama',
 'config_name': 'Bigger Llama',
 'create_new_model': 'True',
 'datasets_dir': '/home/dinalt/ai_assets/forgather/datasets',
 'eval': 'False',
 'forgather_dir': '/home/dinalt/ai_assets/forgather',
 'logging_dir': './output_models/bigger_llama/runs/bigger_llama_2025-06-22T03-50-42',
 'model_src_dir': '/home/dinalt/ai_assets/forgather/model_src',
 'models_dir': './output_models',
 'output_dir': './output_models/bigger_llama',
 'project_dir': '.',
 'save_model': 'True',
 'tokenizers_dir': '/home/dinalt/ai_assets/forgather/tokenizers',
 'train': 'True',
 'workspace_root': '/home/dinalt/ai_assets/forgather'}

```

## Modules
## Output Targets
- distributed_env
- model_constructor_args
- tokenizer
- model_code_generator
- model_code_writer
- model_config
- model
- train_source_dataset
- eval_source_dataset
- train_dataset_split
- eval_dataset_split
- preprocess_args
- train_dataset
- eval_dataset
- data_collator
- experiment_info
- testprompts
- generation_config
- trainer_callbacks
- optimizer
- lr_scheduler
- trainer_args
- model_preprocessor
- trainer
- meta
- main

## Preprocessed Config

```yaml
#---------------------------------------
#              Bigger Llama              
#---------------------------------------
# 2025-06-22T03:50:42
# Description: A bigger Llama
# Project Dir: /home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama
# Current Working Dir: "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama"
# Forgather Config Dir: "/home/dinalt/.config/forgather"
# Model: bigger_llama
# Hostname: hal9000
# Versions:
#     python: 3.10.13
#     torch: 2.7.1
#     transformers: 4.51.3
#     accelerate: 1.7.0

############# Config Vars ##############

# ns.forgather_dir: "/home/dinalt/ai_assets/forgather"
# ns.models_dir: "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/output_models"
# ns.project_model_src_dir: "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/model_src"
# ns.tokenizers_dir: "/home/dinalt/ai_assets/forgather/tokenizers"
# ns.datasets_dir: "/home/dinalt/ai_assets/forgather/datasets"
# ns.model_src_dir: "/home/dinalt/ai_assets/forgather/model_src"
# ns.output_dir: "./output_models/bigger_llama"
# ns.logging_dir: "./output_models/bigger_llama/runs/bigger_llama_2025-06-22T03-50-42"
# ns.create_new_model: True
# ns.save_model: True
# ns.train: True
# ns.eval: False
# ns.trust_remote_code: False

####### Distributed Environment ########

distributed_env: &distributed_env !singleton:forgather.ml.distributed:DistributedEnvironment@distributed_env

############# Dependencies #############



################ Model #################

# https://huggingface.co/docs/transformers/en/model_doc/auto
model_constructor_args: &model_constructor_args {}

# Name: Llama
# Description: Llama model

# model_def.source = ""
# model_def.model_config_cls = "transformers:LlamaConfig"

# **Tokenizer**

# Load custom tokenizer from sub-project definition
tokenizer: &tokenizer !singleton:forgather.ml.construct:load_from_config@tokenizer
    project_dir: "/home/dinalt/ai_assets/forgather/examples/tokenizers/tiny_stories_bpe"
    config_template: "2k.yaml"

# **Model Config**

# Model config dependencies

model_code_generator: &model_code_generator null

model_code_writer: &model_code_writer null    

# https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/configuration_llama.py
model_config: &model_config !singleton:transformers:LlamaConfig
    vocab_size: !singleton:len [ *tokenizer ]
    max_position_embeddings: !singleton:getattr [ *tokenizer, 'model_max_length' ]
    pad_token_id: !singleton:getattr [ *tokenizer, 'pad_token_id' ]
    bos_token_id: !singleton:getattr [ *tokenizer, 'bos_token_id' ]
    eos_token_id: !singleton:getattr [ *tokenizer, 'eos_token_id' ]

    # Tiny Llama overrides
    hidden_size: 256
    intermediate_size: 1024
    num_attention_heads: 2
    num_key_value_heads: 2
    num_hidden_layers: 4

    # Bigger Llama overrides
    hidden_size: 512
    intermediate_size: 2048
    num_attention_heads: 4
    num_key_value_heads: 4
    num_hidden_layers: 8

# **Model Factory**

model: &model !lambda:transformers:AutoModelForCausalLM.from_config@model
    args:
        - *model_config
    kwargs:
        <<: *model_constructor_args

############### Datasets ###############

# Name: TinyStories Abridged
# Description: Abridged to 10% of original size; Dataset containing synthetically generated (by GPT-3.5 and GPT-4) short stories that only use a small vocabulary.
# Source: https://arxiv.org/abs/2305.07759
# Train Dataset: "roneneldan/TinyStories" : "train"
# Eval Dataset: "roneneldan/TinyStories" : "validation"

# **Source Datasets**

train_source_dataset: &train_source_dataset !singleton:datasets:load_dataset@train_source_dataset
    - "roneneldan/TinyStories"

eval_source_dataset: &eval_source_dataset !singleton:datasets:load_dataset@eval_source_dataset
    - "roneneldan/TinyStories"

# **Dataset Splits**

train_dataset_split: &train_dataset_split !singleton:operator:getitem
    - *train_source_dataset
    - "train"

eval_dataset_split: &eval_dataset_split !singleton:operator:getitem
    - *train_source_dataset
    - "validation"

# **Preprocess Dataset Args**

preprocess_args: &preprocess_args
    truncation: True

# **Preprocessed Datasets**

train_dataset: &train_dataset !singleton:forgather.ml.datasets:preprocess_dataset@train_dataset
    dataset: *train_dataset_split
    tokenizer: *tokenizer
    select_range: 0.1
    desc: "Tokenizing train"
    fn_kwargs:
        <<: *preprocess_args

eval_dataset: &eval_dataset !singleton:forgather.ml.datasets:preprocess_dataset@eval_dataset
    dataset: *eval_dataset_split
    tokenizer: *tokenizer
    select_range: 500
    desc: "Tokenizing validation split"
    fn_kwargs:
        <<: *preprocess_args

############ Data Collator #############

# Data collator for causal model
# Batches are dynamically padded to longest sequence
# labels are set to input_ids, with pad tokens set to -100
data_collator: &data_collator !singleton:forgather.ml.data_collator:DataCollatorForCausalLM@DataCollatorForCausalLM
    tokenizer: *tokenizer
    return_tensors: pt

    # Tiny Llama
    truncation: True
    max_length: 512

########## Trainer Callbacks ###########

# **Dependencies**

# Experiment tracking: Tensorboard SummaryWriter
.define: &summary_writer !singleton:torch.utils.tensorboard:SummaryWriter
    - "./output_models/bigger_llama/runs/bigger_llama_2025-06-22T03-50-42"

# Additional data to record to experiment loggers
experiment_info: &experiment_info !dict:@experiment_info
    date: "2025-06-22T03:50:42"
    name: "Bigger Llama"
    description: "A bigger Llama"
    config: !var "pp_config"
    versions: {'python': '3.10.13', 'torch': '2.7.1', 'transformers': '4.51.3', 'accelerate': '1.7.0'}

# **Callback List**

# The model will be given the following prompts for text-gen at regular intervals.
testprompts: &testprompts !list:@testprompts
    # Test prompts from "https://arxiv.org/abs/2305.07759"
    - "Alice was so tired when she got back home so she went"
    - "Jack and Lily liked to watch the moon at night. They noticed that the moon changed its shape every night. Sometimes the moon was big and round, and sometimes it was"
    - "Jack and Lily saw a rainbow after a rainy day.They were amazed by the colors. Jack said, \"Look, Lily. A rainbow has"
    - "Jack wanted to read a book, so he went to"
    - "\"Can cows fly?\" Alice asked her mother."
    - "\"What do birds like to eat?\" Tom asked his mother."
    - "\"What language do they speak in France?\" Tom asked his mother."
    - "If I throw a ball up in the air, eventually it will"
    - "It was winter and cold outside so his mother told him, \"You should"
    - "Lily likes cats and dogs. She asked her mom for a dog and her mom said no, so instead she asked"
    - "Jack told Mary, \"If you give me your banana, I'll give you my apple.\" Mary gave Jack her Banana, so"
    - "On weekends Jack went to visit his grandmother whereas on weekdays he would go to school. Last weekend, when Jack was on his way to"
    - "Lily and Ben were having an argument. Ben said that cake is much better than ice cream and Lily said that"
    - "Lily and Ben are having an argument. They are trying to decide between the park and the swimming pool. Ben says, \"I want to go to the park\". Lily says"
    - "Jack's mother was not home, and his father was at home. When Jack came home, he said hello to"
    - "Lily doesn't like swimming. When her father wants to take her to the swimming pool, she says"
    - "Both Ben and Lily wanted cake. Father said that there was only one piece of cake left. They"
    - "Ben went to visit Lily in her house, but she was not at home. Ben knocked on the door,"

# Conservative text-generation parameters.
generation_config: &generation_config !dict:@generation_config
    identity: generation_config
    do_sample: True
    top_k: 20
    top_p: 0.9
    temperature: 0.7
    repetition_penalty: 1.15

trainer_callbacks: &trainer_callbacks !list:@trainer_callbacks
    # Log all training output to JSON
    - !singleton:forgather.ml.json_logger:JsonLogger
        <<: *experiment_info
    # Log configuration and metrics to Tensorboard file
    - !singleton:forgather.ml.tb_logger:TBLogger
        args: [ *summary_writer ]
        kwargs:
            <<: *experiment_info
    - !singleton:forgather.ml.textgen_callback:TextgenCallback
        summary_writer: *summary_writer
        prompts: *testprompts
        generation_config: *generation_config
        max_new_tokens: 40
        generation_steps: 1000

############## Optimizer ###############

optimizer: &optimizer !lambda:torch:optim.AdamW
    lr: 1.0e-3

############# LR Scheduler #############

lr_scheduler: &lr_scheduler ~

############### Trainer ################

# Name: forgather.ml.trainer.Trainer
# Description: A lightweight, extensible trainer; does not support multiple GPUs

# **Trainer Args**

trainer_args: &trainer_args
    # Minimal Trainer Defaults
    # https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments
    output_dir: "./output_models/bigger_llama"
    logging_dir: "./output_models/bigger_llama/runs/bigger_llama_2025-06-22T03-50-42"
    logging_steps: 500
    per_device_train_batch_size: 16
    per_device_eval_batch_size: 32
    num_train_epochs: 1
    # Base Trainer Defaults
    # https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments
    overwrite_output_dir: True
    eval_steps: 100
    eval_strategy: "steps"
    save_strategy: "no"
    logging_strategy: "steps"

    # Tiny Llama Project Overrides
    seed: 42
    per_device_train_batch_size: 32
    per_device_eval_batch_size: 64
    logging_steps: 100
    eval_steps: 500
    num_train_epochs: 1
    dataloader_num_workers: 1


model_preprocessor: &model_preprocessor !partial:call [ *model ]

# **Trainer Constructor**

trainer: &trainer !singleton:forgather.ml.trainer:Trainer@trainer
    model_init: *model_preprocessor
    args: !singleton:forgather.ml.trainer_types:TrainingArguments@trainer_args
        <<: *trainer_args
    data_collator: *data_collator
    train_dataset: *train_dataset
    eval_dataset: *eval_dataset
    processing_class: *tokenizer
    callbacks: *trainer_callbacks
    optimizer_factory: *optimizer
    lr_scheduler_factory: *lr_scheduler

#---------------------------------------
#          Configuration Output          
#---------------------------------------
meta: &meta_output !dict:@meta
    config_name: "Bigger Llama"
    config_description: "A bigger Llama"
    config_class: "type.training_script.causal_lm"
    project_dir: "."
    workspace_root: "/home/dinalt/ai_assets/forgather"
    forgather_dir: "/home/dinalt/ai_assets/forgather"
    models_dir: "./output_models"
    tokenizers_dir: "/home/dinalt/ai_assets/forgather/tokenizers"
    datasets_dir: "/home/dinalt/ai_assets/forgather/datasets"
    output_dir: "./output_models/bigger_llama"
    model_src_dir: "/home/dinalt/ai_assets/forgather/model_src"
    logging_dir: "./output_models/bigger_llama/runs/bigger_llama_2025-06-22T03-50-42"
    create_new_model: "True"
    save_model: "True"
    train: "True"
    eval: "False"

main: !singleton:forgather.ml.training_script:TrainingScript@training_script
    meta: *meta_output
    do_save: True
    do_train: True
    do_eval: False
    # Init distributed environment before initializing anything which depends on it.
    distributed_env: *distributed_env
    trainer: *trainer
    pp_config: !var "pp_config"

```
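The `data_collator` comments in the preprocessed config above describe the batching behavior: each batch is padded to its longest sequence, and `labels` copies `input_ids` with padding replaced by -100, the default `ignore_index` of PyTorch's cross-entropy loss. The minimal pure-Python sketch below illustrates that idea. It is not Forgather's `DataCollatorForCausalLM`. It works on lists rather than tensors, masks padding by position, and assumes right padding; the pad id of 0 in the usage is hypothetical.

```python
def collate_causal_lm(batch, pad_token_id, max_length=512, ignore_index=-100):
    """Truncate, right-pad to the longest sequence, and build causal-LM labels."""
    seqs = [list(ids[:max_length]) for ids in batch]  # truncation
    longest = max(len(s) for s in seqs)  # dynamic padding: longest in this batch
    input_ids, attention_mask, labels = [], [], []
    for s in seqs:
        pad = longest - len(s)
        input_ids.append(s + [pad_token_id] * pad)
        attention_mask.append([1] * len(s) + [0] * pad)
        # Labels copy input_ids; padded positions get ignore_index so the loss skips them.
        labels.append(s + [ignore_index] * pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

batch = collate_causal_lm([[5, 6, 7], [8]], pad_token_id=0)  # pad id 0 is hypothetical
print(batch)
```

The labels are not shifted here because Hugging Face causal-LM models shift them internally when computing the loss.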



## Start Tensorboard with CLI

This time, start Tensorboard using the Forgather CLI:

```bash
fgcli.py tb --all

# To bind to all network interfaces (e.g., for remote access)
fgcli.py tb --all -- --bind_all
```

The "--all" argument shows all models in the output directory, rather than only those from the default project. As above, if fgcli.py is not in your path, you can find it in the bin directory.

Be sure to check the generated sample text in Tensorboard. It gives a more subjective measure of the model's progress than the loss curves.

## Train the Bigger Model

If the model is still loaded on the GPU, shut down the notebook kernel first.

```bash
fgcli.py -t train_bigger_llama.yaml train -d 0
```

### Extra Credit

The learning rate is likely not ideal for this larger model. Create new configurations based on "train_bigger_llama.yaml" to experiment with different learning rates.

Hint: You will need to override the "optimizer" block.

```yaml
-- block optimizer
    == super()
    # Experiment overrides.
    lr: new_lr_here
-- endblock optimizer
```
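One way to choose candidate rates is to space them geometrically around the current `lr: 1.0e-3`. The sketch below generates a small log-spaced sweep and renders the override block from the hint for each value. The factor-of-two spacing is an arbitrary choice. How each rendered block is placed in a new configuration file, including which template that file extends, is not shown here.

```python
def lr_candidates(center=1.0e-3, factor=2.0, steps=2):
    """Geometric sweep: center * factor**k for k in [-steps, steps]."""
    return [center * factor ** k for k in range(-steps, steps + 1)]

def optimizer_override(lr):
    """Render the optimizer block override from the hint for one learning rate."""
    return "\n".join([
        "-- block optimizer",
        "    == super()",
        "    # Experiment overrides.",
        f"    lr: {lr:.1e}",  # e.g. 5.0e-04: explicit decimal point and exponent sign
        "-- endblock optimizer",
    ])

for lr in lr_candidates():
    print(optimizer_override(lr))
    print()
```

The values are formatted with an explicit decimal point and exponent sign, such as `5.0e-04`, matching the style of `lr: 1.0e-3` in the config. PyYAML's YAML 1.1 resolver does not parse a form like `5e-4` as a float.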

## Train on the Complete Dataset

Finally, we have a configuration to train the bigger model on the full Tiny Stories dataset.

In addition to switching to the full dataset, this configuration adds a custom learning rate scheduler. Take a look at the configuration.
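The exact scheduler is defined in `train_bigger_llama_full.yaml`, and its details are not shown in this listing. As a generic, hypothetical example of a schedule commonly used for runs like this one, the sketch below computes a linear-warmup, cosine-decay multiplier for the learning rate. The warmup length, total step count, and floor ratio are assumptions, not values from the configuration.

```python
import math

def warmup_cosine(step, warmup_steps, total_steps, min_ratio=0.1):
    """LR multiplier: linear warmup to 1.0, then cosine decay down to min_ratio."""
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_ratio + (1.0 - min_ratio) * 0.5 * (1.0 + math.cos(math.pi * progress))

base_lr = 1.0e-3  # the AdamW lr used in the bigger config
for step in (0, 499, 500, 5_000, 10_000):
    print(step, base_lr * warmup_cosine(step, warmup_steps=500, total_steps=10_000))
```

In PyTorch, a multiplier like this can be wrapped with `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: warmup_cosine(step, 500, 10_000))`, which scales each parameter group's initial learning rate by the returned value. How Forgather's `lr_scheduler_factory` expects a scheduler to be constructed is not covered by this sketch.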

In [12]:
nb.display_config(config_template="train_bigger_llama_full.yaml", show_pp_config=False, show_generated_code=False)

## Included Templates
- [configs/train_bigger_llama_full.yaml](templates/configs/train_bigger_llama_full.yaml)
    - [datasets/tiny_stories.yaml](../../../templatelib/examples/datasets/tiny_stories.yaml)
        - [datasets//base_datasets.yaml](../../../templatelib/base/datasets/base_datasets.yaml)
            - [inc/formatting.jinja](../../../templatelib/base/inc/formatting.jinja)
    - [bigger_llama_project.yaml](templates/bigger_llama_project.yaml)
        - [project.yaml](templates/project.yaml)
            - [callbacks/loggers.yaml](../../../templatelib/base/callbacks/loggers.yaml)
                - [callbacks/base_callbacks.yaml](../../../templatelib/base/callbacks/base_callbacks.yaml)
            - [datasets/tiny_stories_abridged.yaml](../../../templatelib/examples/datasets/tiny_stories_abridged.yaml)
            - [types/training_script/causal_lm/causal_lm.yaml](../../../templatelib/base/types/training_script/causal_lm/causal_lm.yaml)
                - [trainers/trainer.yaml](../../../templatelib/base/trainers/trainer.yaml)
                    - [trainers/base_trainer.yaml](../../../templatelib/base/trainers/base_trainer.yaml)
                        - [trainers/minimal_trainer.yaml](../../../templatelib/base/trainers/minimal_trainer.yaml)
                - [models/causal_lm/load_model.yaml](../../../templatelib/base/models/causal_lm/load_model.yaml)
                    - [models/causal_lm/from_pretrained.yaml](../../../templatelib/base/models/causal_lm/from_pretrained.yaml)
                        - [models/base_language_model.yaml](../../../templatelib/base/models/base_language_model.yaml)
                - [types/training_script/training_script.yaml](../../../templatelib/base/types/training_script/training_script.yaml)
                    - [types/type.yaml](../../../templatelib/base/types/type.yaml)
                        - [base_directories.yaml](../../../forgather_workspace/base_directories.yaml)
            - [project.trainer_config](templates/project.yaml)
            - [project.model_config](templates/project.yaml)
                - [tokenizers/tiny_2k.yaml](../../../templatelib/examples/tokenizers/tiny_2k.yaml)
                - [models/llama.yaml](../../../templatelib/examples/models/llama.yaml)
                    - [models/causal_lm/from_config.yaml](../../../templatelib/base/models/causal_lm/from_config.yaml)
        - [biggerllama.logger_config](templates/bigger_llama_project.yaml)
            - [prompts/tiny_stories.yaml](../../../templatelib/examples/prompts/tiny_stories.yaml)
        - [biggerllama.model_config](templates/bigger_llama_project.yaml)
### Config Metadata:

```python
{'config_class': 'type.training_script.causal_lm',
 'config_description': 'A bigger Llama with the full Tiny Stories Dataset',
 'config_name': 'Bigger Llama Full',
 'create_new_model': 'True',
 'datasets_dir': '/home/dinalt/ai_assets/forgather/datasets',
 'eval': 'False',
 'forgather_dir': '/home/dinalt/ai_assets/forgather',
 'logging_dir': './output_models/bigger_llama/runs/bigger_llama_full_2025-06-22T03-59-12',
 'model_src_dir': '/home/dinalt/ai_assets/forgather/model_src',
 'models_dir': './output_models',
 'output_dir': './output_models/bigger_llama',
 'project_dir': '.',
 'save_model': 'True',
 'tokenizers_dir': '/home/dinalt/ai_assets/forgather/tokenizers',
 'train': 'True',
 'workspace_root': '/home/dinalt/ai_assets/forgather'}

```

## Modules
## Output Targets
- distributed_env
- model_constructor_args
- tokenizer
- model_code_generator
- model_code_writer
- model_config
- model
- train_source_dataset
- eval_source_dataset
- train_dataset_split
- eval_dataset_split
- preprocess_args
- train_dataset
- eval_dataset
- data_collator
- experiment_info
- testprompts
- generation_config
- trainer_callbacks
- optimizer
- lr_scheduler
- trainer_args
- model_preprocessor
- trainer
- meta
- main



## Show Preprocessed Config

Rather than dumping the preprocessed config into the notebook, try viewing it from the CLI:

```bash
fgcli.py -t train_bigger_llama_full.yaml pp | less
```

## Train the Full Model

The full dataset is ten times the size of the abridged one, so this run will take considerably longer than the previous ones.

```bash
fgcli.py -t train_bigger_llama_full.yaml train -d 0
```

When done training, you can load the model into the notebook for experimentation.

Note that we saved it to a different path than the tiny model:

In [None]:
from forgather.project import Project
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig, StoppingCriteria

model_path = "./output_models/bigger_llama"
device = "cuda:0"

model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)