# Project Index

In [1]:
import forgather.nb.notebooks as nb
nb.display_project_index(show_available_templates=True)

## Tiny LLama

In this tutorial we will train a very small Llama model (about 5M parameters) on 10% of the Tiny Stories dataset. On a single RTX-4090, this takes about three minutes. Once training is complete, we will load the model and use it for text generation. The output is reasonably coherent for a three-minute-old model.

#### Project Directory: "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama"

## Meta Config
Meta Config: [/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/meta.yaml](meta.yaml)

- [meta.yaml](meta.yaml)
    - [meta_defaults.yaml](../../../forgather_workspace/meta_defaults.yaml)
        - [base_directories.yaml](../../../forgather_workspace/base_directories.yaml)

Template Search Paths:
- [/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/templates](templates)
- [/home/dinalt/ai_assets/forgather/forgather_workspace](../../../forgather_workspace)
- [/home/dinalt/ai_assets/forgather/templatelib/modellib](../../../templatelib/modellib)
- [/home/dinalt/ai_assets/forgather/templatelib/examples](../../../templatelib/examples)
- [/home/dinalt/ai_assets/forgather/templatelib/base](../../../templatelib/base)

## Available Configurations
- [train_hf_llama.yaml](templates/configs/train_hf_llama.yaml)
- [train_tiny_llama.yaml](templates/configs/train_tiny_llama.yaml)
- [experimental_llama.yaml](templates/configs/experimental_llama.yaml)

Default Configuration: train_tiny_llama.yaml

## Available Templates
- [base_directories.yaml](../../../forgather_workspace/base_directories.yaml)
- [meta_defaults.yaml](../../../forgather_workspace/meta_defaults.yaml)
- [datasets/llm_dataset_project.yaml](../../../templatelib/examples/datasets/llm_dataset_project.yaml)
- [prompts/tiny_stories.yaml](../../../templatelib/examples/prompts/tiny_stories.yaml)
- [prompts/short_stories.yaml](../../../templatelib/examples/prompts/short_stories.yaml)
- [tokenizers/tiny_2k.yaml](../../../templatelib/examples/tokenizers/tiny_2k.yaml)
- [tokenizers/tiny_8k.yaml](../../../templatelib/examples/tokenizers/tiny_8k.yaml)
- [tokenizers/wikitext/32k.yaml](../../../templatelib/examples/tokenizers/wikitext/32k.yaml)
- [tokenizers/wikitext/8k.yaml](../../../templatelib/examples/tokenizers/wikitext/8k.yaml)
- [trainers/base_trainer.yaml](../../../templatelib/base/trainers/base_trainer.yaml)
    - [trainers/trainer.yaml](../../../templatelib/base/trainers/trainer.yaml)
        - [project.trainer_config](templates/project.yaml)
        - [trainers/accel_trainer.yaml](../../../templatelib/base/trainers/accel_trainer.yaml)
        - [trainers/pipeline_trainer.yaml](../../../templatelib/base/trainers/pipeline_trainer.yaml)
    - [trainers/hf_trainer.yaml](../../../templatelib/base/trainers/hf_trainer.yaml)
- [models/base_language_model.yaml](../../../templatelib/base/models/base_language_model.yaml)
    - [models/causal_lm/from_pretrained.yaml](../../../templatelib/base/models/causal_lm/from_pretrained.yaml)
    - [models/causal_lm/from_pretrained_config.yaml](../../../templatelib/base/models/causal_lm/from_pretrained_config.yaml)
    - [models/causal_lm/custom.yaml](../../../templatelib/base/models/causal_lm/custom.yaml)
        - [models/causal_lm/custom_dynamic.yaml](../../../templatelib/base/models/causal_lm/custom_dynamic.yaml)
            - [models/deepone.yaml](../../../templatelib/examples/models/deepone.yaml)
            - [models/dynamic_causal_transformer.yaml](../../../templatelib/examples/models/dynamic_causal_transformer.yaml)
            - [models/dynamic_llama.yaml](../../../templatelib/examples/models/dynamic_llama.yaml)
                - [models/tiny_dynamic_llama.yaml](templates/models/tiny_dynamic_llama.yaml)
                    - [project.model_config](templates/project.yaml)
                        - [experiment.model_config](templates/configs/experimental_llama.yaml)
    - [models/causal_lm/from_config.yaml](../../../templatelib/base/models/causal_lm/from_config.yaml)
        - [models/gpt2.yaml](../../../templatelib/examples/models/gpt2.yaml)
        - [models/llama.yaml](../../../templatelib/examples/models/llama.yaml)
            - [models/tiny_hf_llama.yaml](templates/models/tiny_hf_llama.yaml)
- [callbacks/base_callbacks.yaml](../../../templatelib/base/callbacks/base_callbacks.yaml)
    - [callbacks/loggers.yaml](../../../templatelib/base/callbacks/loggers.yaml)
        - [project.logger_config](templates/project.yaml)
- [types/type.yaml](../../../templatelib/base/types/type.yaml)
    - [types/tokenizer/tokenizer.yaml](../../../templatelib/base/types/tokenizer/tokenizer.yaml)
        - [types/tokenizer/bpe/bpe.yaml](../../../templatelib/base/types/tokenizer/bpe/bpe.yaml)
    - [types/model/model_type.yaml](../../../templatelib/base/types/model/model_type.yaml)
    - [types/training_script/training_script.yaml](../../../templatelib/base/types/training_script/training_script.yaml)
        - [types/training_script/causal_lm/causal_lm.yaml](../../../templatelib/base/types/training_script/causal_lm/causal_lm.yaml)
            - [project.yaml](templates/project.yaml)
                - [configs/train_hf_llama.yaml](templates/configs/train_hf_llama.yaml)
                - [configs/train_tiny_llama.yaml](templates/configs/train_tiny_llama.yaml)
                - [configs/experimental_llama.yaml](templates/configs/experimental_llama.yaml)
    - [types/dataset/dataset_type.yaml](../../../templatelib/base/types/dataset/dataset_type.yaml)
        - [datasets/tokenized_dataset.yaml](../../../templatelib/base/datasets/tokenized_dataset.yaml)


---
This example makes extensive use of the Forgather templates library. Take a look at the various files which go into the configuration and compare these to the pre-processed output.

In [2]:
nb.display_config(config_template="", show_pp_config=True, show_generated_code=False)

## Included Templates
- [configs/train_tiny_llama.yaml](templates/configs/train_tiny_llama.yaml)
    - [project.yaml](templates/project.yaml)
        - [datasets/llm_dataset_project.yaml](../../../templatelib/examples/datasets/llm_dataset_project.yaml)
        - [types/training_script/causal_lm/causal_lm.yaml](../../../templatelib/base/types/training_script/causal_lm/causal_lm.yaml)
            - [trainers/trainer.yaml](../../../templatelib/base/trainers/trainer.yaml)
                - [trainers/base_trainer.yaml](../../../templatelib/base/trainers/base_trainer.yaml)
            - [callbacks/loggers.yaml](../../../templatelib/base/callbacks/loggers.yaml)
                - [callbacks/base_callbacks.yaml](../../../templatelib/base/callbacks/base_callbacks.yaml)
            - [types/training_script/training_script.yaml](../../../templatelib/base/types/training_script/training_script.yaml)
                - [types/type.yaml](../../../templatelib/base/types/type.yaml)
                    - [base_directories.yaml](../../../forgather_workspace/base_directories.yaml)
            - [inc/formatting.jinja](../../../templatelib/base/inc/formatting.jinja)
        - [project.logger_config](templates/project.yaml)
            - [prompts/tiny_stories.yaml](../../../templatelib/examples/prompts/tiny_stories.yaml)
        - [project.trainer_config](templates/project.yaml)
        - [project.model_config](templates/project.yaml)
            - [models/tiny_dynamic_llama.yaml](templates/models/tiny_dynamic_llama.yaml)
                - [tokenizers/tiny_2k.yaml](../../../templatelib/examples/tokenizers/tiny_2k.yaml)
                - [models/dynamic_llama.yaml](../../../templatelib/examples/models/dynamic_llama.yaml)
                    - [models/causal_lm/custom_dynamic.yaml](../../../templatelib/base/models/causal_lm/custom_dynamic.yaml)
                        - [models/causal_lm/custom.yaml](../../../templatelib/base/models/causal_lm/custom.yaml)
                            - [models/base_language_model.yaml](../../../templatelib/base/models/base_language_model.yaml)
### Config Metadata:

```python
{'config_class': 'type.training_script.causal_lm',
 'config_description': 'A demo of training a tiny llama model from scratch',
 'config_name': 'Tiny Llama',
 'datasets_dir': '/home/dinalt/ai_assets/forgather/datasets',
 'forgather_dir': '/home/dinalt/ai_assets/forgather',
 'logging_dir': './output_models/tiny_llama/runs/log_2025-08-24T09-51-27',
 'model_src_dir': '/home/dinalt/ai_assets/forgather/model_src',
 'models_dir': './output_models',
 'nproc_per_node': 1,
 'output_dir': './output_models/tiny_llama',
 'project_dir': '.',
 'tokenizers_dir': '/home/dinalt/ai_assets/forgather/tokenizers',
 'workspace_root': '/home/dinalt/ai_assets/forgather'}

```

## Modules
- [./output_models/tiny_llama/dynllama.py](output_models/tiny_llama/dynllama.py) : DynamicCausalLMConfig
- [./output_models/tiny_llama/dynllama.py](output_models/tiny_llama/dynllama.py) : DynamicCasualLM
## Output Targets
- distributed_env
- model_constructor_args
- tokenizer
- model_submodule_searchpath
- loss_fn
- layer_norm_factory
- feedforward_factory
- relative_pe
- attention_factory
- layer_factory
- layer_stack
- output_decoder
- absolute_pe
- input_encoder
- init_weights
- model_factory
- model_code_generator
- model_code_writer
- model_config
- pretrained_model
- model
- tokenizer_args
- train_dataset
- eval_dataset
- data_collator
- experiment_info
- testprompts
- generation_config
- trainer_callbacks
- optimizer
- lr_scheduler
- trainer_args
- model_preprocessor
- trainer
- dynamic_args
- meta
- main

## Preprocessed Config

```yaml
#---------------------------------------
#               Tiny Llama               
#---------------------------------------
# 2025-08-24T09:51:27
# Description: A demo of training a tiny llama model from scratch
# Project Dir: /home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama
# Current Working Dir: "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama"
# Forgather Config Dir: "/home/dinalt/.config/forgather"
# Model: tiny_llama
# Hostname: hal9000
# Versions:
#     python: 3.10.13
#     torch: 2.7.1
#     transformers: 4.51.3
#     accelerate: 1.7.0

############# Config Vars ##############

# ns.forgather_dir: "/home/dinalt/ai_assets/forgather"
# ns.models_dir: "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/output_models"
# ns.project_model_src_dir: "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/model_src"
# ns.tokenizers_dir: "/home/dinalt/ai_assets/forgather/tokenizers"
# ns.datasets_dir: "/home/dinalt/ai_assets/forgather/datasets"
# ns.model_src_dir: "/home/dinalt/ai_assets/forgather/model_src"
# ns.output_dir: "./output_models/tiny_llama"
# ns.logging_dir: "./output_models/tiny_llama/runs/log_2025-08-24T09-51-27"
# ns.nproc_per_node: 1
# ns.trust_remote_code: False

####### Distributed Environment ########

distributed_env: &distributed_env !singleton:forgather.ml.distributed:DistributedEnvironment@distributed_env

############# Dependencies #############



################ Model #################

# https://huggingface.co/docs/transformers/en/model_doc/auto
model_constructor_args: &model_constructor_args {}

# Name: Dynamic Llama
# Description: A Llama compatible dynamic model.
# model_def.cls = "DynamicCasualLM"
# model_def.cfg_cls = "DynamicCausalLMConfig"
# model_def.config_path = "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/output_models/tiny_llama/dynllama.py"
# model_def.model_path = "/home/dinalt/ai_assets/forgather/examples/tutorials/tiny_llama/output_models/tiny_llama/dynllama.py"
# model_def.short_name = "dynllama"
# model_def.model_type = "forgather-dynamic-causal-dynllama"
# model_def.model_path = "./output_models/tiny_llama/dynllama.py"
# model_def.model_template_searchpath = "/home/dinalt/ai_assets/forgather/modelsrc/templates"
# model_def.model_template_name = "hf_causal.py"
# model_def.name_policy = "named"

# **Tokenizer**

# Load custom tokenizer from sub-project definition
tokenizer: &tokenizer !singleton:forgather.ml.construct:load_from_config@tokenizer
    project_dir: "/home/dinalt/ai_assets/forgather/examples/tokenizers/tiny_stories_bpe"
    config_template: "2k.yaml"

# **Model Config**

# Model config dependencies

model_submodule_searchpath: &model_submodule_searchpath
    - "/home/dinalt/ai_assets/forgather/modelsrc/transformer"
    - "./output_models/tiny_llama"

loss_fn: &loss_fn !singleton:.causal_loss:CausalLoss@loss_fn []

layer_norm_factory: &layer_norm_factory !partial:torch.nn:RMSNorm@layer_norm_factory
    normalized_shape: !var "hidden_size"
    eps: !var "rms_norm_eps"

feedforward_factory: &feedforward_factory !partial:.glu_feedforward:GLUFeedforwardLayer@feedforward_factory
    d_model: !var "hidden_size"
    d_feedforward: !var "dim_feedforward"
    activation_factory: !partial:torch.nn.SiLU []
    dropout: !var "activation_dropout"

relative_pe: &relative_pe !singleton:.real_rotary_embeddings:RealRotaryPE@relative_pe
    d_head: !var "d_head"
    max_sequence_length: !var "max_sequence_length"
    rope_theta: !var "rope_theta"
    
attention_factory: &attention_factory !partial:.causal_rpe_attn:CausalRpeAttn@attention_factory
    d_model: !var "hidden_size"
    num_heads: !var "num_attention_heads"
    num_kv_heads: !var "num_kv_heads"
    dropout: !var "attention_dropout"
    bias: False
    sdpa_function: !partial:torch.nn.functional:scaled_dot_product_attention []
    pos_encoder: *relative_pe

layer_factory: &layer_factory !partial:.pre_ln_layer:PreLNLayer@layer_factory
    feedforward_factory: *feedforward_factory
    attention_factory: *attention_factory
    norm_factory: *layer_norm_factory
    dropout: !var "layer_dropout"
    residual_dropout: !var "residual_dropout"

layer_stack: &layer_stack !factory:.checkpoint_layer_stack:LayerStack@layer_stack
    layer_factory: *layer_factory
    num_hidden_layers: !var "num_hidden_layers"
    post_norm_factory: *layer_norm_factory
    enable_checkpoint: !var "enable_activation_checkpoint"
    checkpoint_stride: !var "checkpoint_stride"

output_decoder: &output_decoder !factory:torch.nn:Linear@output_decoder
    in_features: !var "hidden_size"
    out_features: !var "vocab_size"
    bias: False

absolute_pe: &absolute_pe null

input_encoder: &input_encoder !factory:.input_encoder:InputEncoder@input_encoder
    d_model: !var "hidden_size"
    vocab_size: !var "vocab_size"
    dropout: !var "embedding_dropout"
    positional_encoder: *absolute_pe
    scale_sqrt_d_model: False

# Init method based upon https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/llama3/model/model.py
init_weights: &init_weights !partial:.init_weights:init_weights_by_regex@init_weights
    # Note: Yaml treats single and double quotes differently WRT escapes. Use single
    # quotes for regex expressions, which prevents Yaml from interpreting escapes.
    # For a literal ' use ''
    regex_list:
        - [ 'norm', "pass" ]
        - [ 'bias', "zeros" ]
        - [ 'embedding\.weight', "init_embeddings" ]
        - [ 'up_proj|query_linear|key_linear|value_linear', "trunc_normal_magic" ]
        - [ 'gate_proj|down_proj|output_linear', "trunc_normal" ]
        - [ 'output_decoder', "init_output_layer" ]
    init_f_map:
        pass: !partial:.init_weights:init_pass
        zeros: !partial:torch.nn.init:zeros_ []
        init_embeddings: !partial:.llama_init:init_embeddings []
        trunc_normal_magic: !partial:.llama_init:trunc_normal_magic []
        trunc_normal: !partial:.llama_init:trunc_normal
            std: !call:.llama_init:llama_std [ !var "num_hidden_layers" ]
        init_output_layer: !partial:.llama_init:init_output_layer { d_model: !var "hidden_size" }
    # Print how each param is being initialized.
    debug: False

model_factory: &model_factory !factory:.causal_lm:CasualLM@model_factory
    loss_fn: *loss_fn
    input_encoder: *input_encoder
    output_decoder: *output_decoder
    layer_stack: *layer_stack
    init_weights: *init_weights

model_code_generator: &model_code_generator !meta:forgather.codegen:generate_code@model_code_generator
    searchpath: "/home/dinalt/ai_assets/forgather/modelsrc/templates"
    template_name: "hf_causal.py"
    name_policy: "named"
    obj: *model_factory
    # Template args
    model_type: "forgather-dynamic-causal-dynllama"
    # Dynamic Llama
    supports_gradient_checkpointing: True
    supports_sdpa: True

model_code_writer: &model_code_writer !singleton:forgather.ml.construct:write_file@model_code_writer
    data: *model_code_generator
    output_file: "./output_models/tiny_llama/dynllama.py"
    return_value: "Model constructor generated by Forgather 1.0"    

model_config: &model_config !singleton:./output_models/tiny_llama/dynllama.py:DynamicCausalLMConfig@model_config
    submodule_searchpath: *model_submodule_searchpath
    # Set auto-map for custom model; this ensures that the source code stays with the model.
    auto_map:
        AutoConfig: "dynllama.DynamicCausalLMConfig"
        AutoModel: "dynllama.DynamicCasualLM"
    # Get the vocab-size from the tokenizer definition.
    vocab_size: !singleton:len [ *tokenizer ]
    pad_token_id: !singleton:getattr [ *tokenizer, 'pad_token_id' ]
    bos_token_id: !singleton:getattr [ *tokenizer, 'bos_token_id' ]
    eos_token_id: !singleton:getattr [ *tokenizer, 'eos_token_id' ]
    # Add dependency on code generator
    code_generator: *model_code_writer
    hidden_size: 4096
    num_attention_heads: 32
    # Default to MHA when null
    num_kv_heads: null
    d_head: 128 # Must be hidden_size // num_attention_heads
    num_hidden_layers: 32
    max_sequence_length: !singleton:getattr
        - *tokenizer
        - "model_max_length"
    dim_feedforward: 11008
    rope_theta: 10000.0
    embedding_dropout: 0.0
    rms_norm_eps: 1.0e-05
    layer_dropout: 0.0
    residual_dropout: 0.0
    attention_dropout: 0.0
    activation_dropout: 0.0
    enable_activation_checkpoint: False
    checkpoint_stride: 1
    
    # Tiny Llama overrides
    hidden_size: 256
    dim_feedforward: 1024
    num_attention_heads: 2
    num_hidden_layers: 4
    d_head: 128 # Must be hidden_size // num_attention_heads

# **Model Factory**

pretrained_model: &pretrained_model !partial:./output_models/tiny_llama/dynllama.py:DynamicCasualLM@pretrained_model
    args:
        - *model_config
    kwargs:
        submodule_searchpath: *model_submodule_searchpath
        <<: *model_constructor_args

model: &model !partial:forgather.ml.construct:dependency_list@model
    - !factory:call [ *pretrained_model ]
    - !singleton:forgather.ml.construct:copy_package_files
        - "./output_models/tiny_llama"
        - *model_config

############### Datasets ###############

tokenizer_args: &tokenizer_args !dict
    truncation: True
    max_length: 512
# Load dataset from sub-project
.define: &dataset_dict !call:forgather:from_project
    project_dir: "/home/dinalt/ai_assets/forgather/examples/datasets/roneneldan"
    config_template: "tinystories-abridged.yaml"
    targets: [  "train_dataset", "eval_dataset" ] 
    preprocess_args: *tokenizer_args
    tokenizer: *tokenizer

train_dataset: &train_dataset !call:getitem [ *dataset_dict, 'train_dataset' ]
eval_dataset: &eval_dataset !call:getitem [ *dataset_dict, 'eval_dataset' ]

############ Data Collator #############

# Data collator for causal model
# Batches are dynamically padded to longest sequence
# labels are set to input_ids, with pad tokens set to -100
data_collator: &data_collator !singleton:forgather.ml.data_collator:DataCollatorForCausalLM@DataCollatorForCausalLM
    tokenizer: *tokenizer
    return_tensors: pt

    # Tiny Llama
    truncation: True
    max_length: 512

########## Trainer Callbacks ###########

# **Dependencies**

# Experiment tracking: Tensorboard SummaryWriter
.define: &summary_writer !singleton:torch.utils.tensorboard:SummaryWriter
    - "./output_models/tiny_llama/runs/log_2025-08-24T09-51-27"

# Additional data to record to experiment loggers
experiment_info: &experiment_info !dict:@experiment_info
    date: "2025-08-24T09:51:27"
    name: "Tiny Llama"
    description: "A demo of training a tiny llama model from scratch"
    config: !var "pp_config"
    versions: {'python': '3.10.13', 'torch': '2.7.1', 'transformers': '4.51.3', 'accelerate': '1.7.0'}

# **Callback List**

# The model will be given the following prompts for text-gen at regular intervals.
testprompts: &testprompts !list:@testprompts
    # Test prompts from "https://arxiv.org/abs/2305.07759"
    - "Alice was so tired when she got back home so she went"
    - "Jack and Lily liked to watch the moon at night. They noticed that the moon changed its shape every night. Sometimes the moon was big and round, and sometimes it was"
    - "Jack and Lily saw a rainbow after a rainy day.They were amazed by the colors. Jack said, \"Look, Lily. A rainbow has"
    - "Jack wanted to read a book, so he went to"
    - "\"Can cows fly?\" Alice asked her mother."
    - "\"What do birds like to eat?\" Tom asked his mother."
    - "\"What language do they speak in France?\" Tom asked his mother."
    - "If I throw a ball up in the air, eventually it will"
    - "It was winter and cold outside so his mother told him, \"You should"
    - "Lily likes cats and dogs. She asked her mom for a dog and her mom said no, so instead she asked"
    - "Jack told Mary, \"If you give me your banana, I'll give you my apple.\" Mary gave Jack her Banana, so"
    - "On weekends Jack went to visit his grandmother whereas on weekdays he would go to school. Last weekend, when Jack was on his way to"
    - "Lily and Ben were having an argument. Ben said that cake is much better than ice cream and Lily said that"
    - "Lily and Ben are having an argument. They are trying to decide between the park and the swimming pool. Ben says, \"I want to go to the park\". Lily says"
    - "Jack's mother was not home, and his father was at home. When Jack came home, he said hello to"
    - "Lily doesn't like swimming. When her father wants to take her to the swimming pool, she says"
    - "Both Ben and Lily wanted cake. Father said that there was only one piece of cake left. They"
    - "Ben went to visit Lily in her house, but she was not at home. Ben knocked on the door,"

# Conservative text-generation parameters.
generation_config: &generation_config !dict:@generation_config
    identity: generation_config
    do_sample: True
    top_k: 20
    top_p: 0.9
    temperature: 0.7
    repitition_penalty: 1.15

trainer_callbacks: &trainer_callbacks !list:@trainer_callbacks
    # Log all training output to JSON
    - !singleton:forgather.ml.trainer.callbacks:JsonLogger
        <<: *experiment_info
    # Log configuration and metrics to Tensorboard file
    - !singleton:forgather.ml.trainer.callbacks:TBLogger
        args: [ *summary_writer ]
        kwargs:
            <<: *experiment_info
    - !singleton:forgather.ml.trainer.callbacks:TextgenCallback
        summary_writer: *summary_writer
        prompts: *testprompts
        generation_config: *generation_config
        max_new_tokens: 40
        generation_steps: 1000

############## Optimizer ###############

optimizer: &optimizer !partial:torch:optim.AdamW
    lr: 1.0e-3

############# LR Scheduler #############

# https://arxiv.org/html/2503.02844v1
lr_scheduler: &lr_scheduler !lambda:forgather.ml.optim.infinite_lr_scheduler:InfiniteLRScheduler@lr_scheduler
    warmup_steps: 500
    cooldown_steps: 50000
    constant_lr: 1.0e-4

############### Trainer ################

# Name: Forgather Trainer
# Description: A lightweight, extensible trainer; does not support multiple GPUs
# Trainer Config Class: forgather.ml.trainer:TrainingArguments
# Trainer Class: forgather.ml.trainer:Trainer
# nproc_per_node: 1

# **Trainer Args**



trainer_args: &trainer_args !singleton:forgather.ml.trainer:TrainingArguments@trainer_args
    save_strategy: "no"
    max_steps: -1
    output_dir: "./output_models/tiny_llama"
    logging_dir: "./output_models/tiny_llama/runs/log_2025-08-24T09-51-27"
    # Tiny Llama Project Overrides
    save_strategy: "steps"
    save_steps: 10000
    # Safetensors can't handle tied parameters/buffers, so fallback to PyTorch format.
    save_safetensors: False
    seed: 42
    per_device_train_batch_size: 32
    per_device_eval_batch_size: 64
    logging_steps: 100
    eval_steps: 500
    num_train_epochs: 1
    dataloader_num_workers: 1

model_preprocessor: &model_preprocessor !partial:call
    - *model

# **Trainer Constructor**

trainer: &trainer !singleton:forgather.ml.trainer:Trainer@trainer
    args: *trainer_args
    model_init: *model_preprocessor
    data_collator: *data_collator
    train_dataset: *train_dataset
    eval_dataset: *eval_dataset
    processing_class: *tokenizer
    callbacks: *trainer_callbacks
    # Trainer Args
    optimizer_factory: *optimizer
    lr_scheduler_factory: *lr_scheduler

# **Dynamic Args**
dynamic_args: !dlist
    null: ~
    max_steps:
        names: "--max-steps"
        type: "int"
        help: "Set maximum training steps"
    save_strategy:
        names: "--save-strategy"
        choices: [ "no", "steps", "epoch" ]
        type: "str"
        help: "When to save checkpoints"

#---------------------------------------
#          Configuration Output          
#---------------------------------------
meta: &meta_output !dict:@meta
    config_name: "Tiny Llama"
    config_description: "A demo of training a tiny llama model from scratch"
    config_class: "type.training_script.causal_lm"
    project_dir: "."
    workspace_root: "/home/dinalt/ai_assets/forgather"
    forgather_dir: "/home/dinalt/ai_assets/forgather"
    models_dir: "./output_models"
    tokenizers_dir: "/home/dinalt/ai_assets/forgather/tokenizers"
    datasets_dir: "/home/dinalt/ai_assets/forgather/datasets"
    output_dir: "./output_models/tiny_llama"
    model_src_dir: "/home/dinalt/ai_assets/forgather/model_src"
    logging_dir: "./output_models/tiny_llama/runs/log_2025-08-24T09-51-27"
    nproc_per_node: 1

main: !singleton:forgather.ml.training_script:TrainingScript@training_script
    meta: *meta_output
    do_train: True
    do_save: False
    do_eval: False
    distributed_env: *distributed_env
    trainer: *trainer
    do_save: True

```
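The comments on the `data_collator` target describe its behavior: batches are padded dynamically to the longest sequence, and labels copy `input_ids` with pad positions set to -100. The following is a minimal pure-Python sketch of that idea, not Forgather's implementation. The function name `collate_causal_lm`, the right-side padding, and the plain-list return type are assumptions made for illustration.

```python
from typing import Dict, List

# -100 is the default ignore_index of PyTorch's cross-entropy loss.
IGNORE_INDEX = -100


def collate_causal_lm(
    batch: List[List[int]], pad_token_id: int, max_length: int = 512
) -> Dict[str, List[List[int]]]:
    """Hypothetical stand-in for the collator described in the config comments."""
    # Truncate each sequence, then pad to the longest sequence in this batch.
    truncated = [ids[:max_length] for ids in batch]
    longest = max(len(ids) for ids in truncated)

    input_ids, attention_mask, labels = [], [], []
    for ids in truncated:
        n_pad = longest - len(ids)
        # Assumption: padding is appended on the right.
        input_ids.append(ids + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Labels copy the inputs; padded positions are excluded from the loss.
        labels.append(ids + [IGNORE_INDEX] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


batch = collate_causal_lm([[5, 6, 7], [8, 9]], pad_token_id=0)
print(batch["input_ids"])  # [[5, 6, 7], [8, 9, 0]]
print(batch["labels"])     # [[5, 6, 7], [8, 9, -100]]
```

Padding to the longest sequence in each batch, rather than to a fixed `max_length`, avoids spending compute on pad tokens when most stories are short.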



## Load Project

Load the default configuration.

In [3]:
from forgather.project import Project
import forgather.nb.notebooks as nb

# Load the default project, which is "train_tiny_llama.yaml"
proj = Project()

## Start Tensorboard

This project has been configured to log training to Tensorboard (TB). To watch the model's training progress, start the TB server from a shell using either of the methods below.

Tensorboard can be started from a terminal like this:

```bash
# By default, Tensorboard binds only to localhost. To bind to all interfaces, add --bind_all
tensorboard --logdir "/path/to/model/log/directory" [--bind_all]
```

The CLI can also launch TB for you; it determines the path to the log directory automatically:

```bash
# --all : Watch all output model directories, otherwise just the one for the current configuration.
# -- : Any arguments after '--' are passed directly to tensorboard, for example "--bind_all"
cd PROJECT_DIR
cfcli.py tb [--all] [-- <tensorboard-args>]
```

When TB starts, it prints the URL at which it can be reached, for example:

```
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.16.2 at http://localhost:6006/ (Press CTRL+C to quit)
```
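If you would rather assemble the plain `tensorboard` command from Python, for example to paste into a terminal or hand to `subprocess`, a small helper is enough. This is only a sketch: `tensorboard_argv` is a hypothetical name, and the log directory shown is assumed to be the `runs` folder under this configuration's output directory.

```python
import shlex
from typing import List


def tensorboard_argv(logdir: str, bind_all: bool = False) -> List[str]:
    """Build the argument list for the tensorboard command (hypothetical helper)."""
    argv = ["tensorboard", "--logdir", logdir]
    if bind_all:
        # Listen on all interfaces instead of only localhost.
        argv.append("--bind_all")
    return argv


# Assumed log location: the "runs" folder under the model's output directory.
argv = tensorboard_argv("./output_models/tiny_llama/runs", bind_all=True)
print(shlex.join(argv))

# To start the server from Python instead of a shell, uncomment:
# import subprocess
# subprocess.Popen(argv)
```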

## Train Model

You have a few options for training the model.

1. Run it directly from the notebook. This should work fine with this example, although for projects using multiple GPUs, you will want to use one of the other options. To train from the notebook, just run the following cell.
2. You can generate a training script and run it from the shell. To do so, run the cell with "generate_trainingscript()," then run the generated shell script from a terminal.
3. You can use the Forgather CLI.

```bash
# Open a shell in this project's directory, then run this command:
cd PROJECT_DIR
forgather train

# See forgather --help for more details.
```

Once training starts, switch to Tensorboard in your browser. One of the first things you will want to do is enable automatic refresh. To do so, click the gear in the upper-right corner and check "Reload Data."

Once training has started, take a look at the "Text" tab. You will see that we have automatically logged the preprocessed configuration and dumped the primary training artifacts.

Next, switch to the "Scalars" tab. You will see a plot of train and evaluation loss which will automatically update every 30 seconds. If you are not familiar with Tensorboard, now would be a good time to play with the UI elements to see how they work.

When training completes, the model will be automatically saved to the output directory ("./output_models/tiny_llama").

In [4]:
# Train model in notebook.

# Construct the default target, "main," which is a training script.
training_script = proj()

# Start training the model.
training_script.run()

# Release resources
training_script = None

INFO:forgather.ml.training_script:**** Training Script Started *****
INFO:forgather.ml.training_script:config_name: Tiny Llama
INFO:forgather.ml.training_script:config_description: A demo of training a tiny llama model from scratch
INFO:forgather.ml.training_script:output_dir: ./output_models/tiny_llama
INFO:forgather.ml.training_script:logging_dir: ./output_models/tiny_llama/runs/log_2025-08-24T09-51-28
INFO:forgather.ml.trainer.trainer:Constructing model on default device and moving to 0


kwargs= {'model_type': 'forgather-dynamic-causal-dynllama', 'supports_gradient_checkpointing': True, 'supports_sdpa': True}



total_examples: 212,000
total_train_samples: 212,000
per_device_train_batch_size: 32
actual_per_device_batch_size: 32
total_train_batch_size: 32
max_steps: 6,625
total_parameters: 5.2M
trainable_parameters: 5.2M
model:
DynamicCasualLM(
  (causal_lm): CasualLM(
    loss_fn=CausalLoss()
    (input_encoder): InputEncoder(
      d_model=256, vocab_size=2000
      (dropout): Identity()
      (embedding): Embedding(2000, 256)
    )
    (output_decoder): Linear(in_features=256, out_features=2000, bias=False)
    (layer_stack): LayerStack(
      gradient_checkpointing=False, checkpoint_stride=1
      (layers): ModuleList(
        (0-3): 4 x PreLNLayer(
          (feedforward): GLUFeedforwardLayer(
            d_model=256, d_feedforward=1024
            (up_proj): Linear(in_features=256, out_features=1024, bias=False)
            (gate_proj): Linear(in_features=256, out_features=1024, bias=False)
            (down_proj): Linear(in_features=1024, out_features=256, bias=False)
            (activa

INFO:forgather.ml.trainer.trainer:Saving final checkpoint at step 6625
INFO:forgather.ml.trainer.base_trainer:Saving checkpoint at ./output_models/tiny_llama/checkpoints/checkpoint-6625
INFO:forgather.ml.trainer.base_trainer:Saved training state to ./output_models/tiny_llama/checkpoints/checkpoint-6625/training_state.pt
INFO:forgather.ml.training_script:**** Training Completed *****
INFO:forgather.ml.training_script:{'train_runtime': 135.47482442855835, 'train_samples': 212000, 'step': 6625, 'train_samples_per_second': 1564.866, 'train_steps_per_second': 48.902, 'epoch': 1.0}
INFO:forgather.ml.training_script:Model saved to: ./output_models/tiny_llama


2025-08-24 09:53:57        6,625  1.0   train_runtime: 135.5 train_samples: 212,000 step: 6,625 train_samples_per_second: 1.565e+03 train_steps_per_second: 48.9 epoch: 1.0 
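The figures in the log can be cross-checked by hand. The sketch below recomputes the parameter count from the shapes in the model printout and the step count from the dataset size and batch size. Several details are assumptions rather than facts confirmed by the output: that attention uses four bias-free `hidden_size x hidden_size` projections (query, key, value, output), that each layer has two RMSNorm weight vectors plus one final norm after the stack, that the rotary position encoder holds no trainable parameters, and that the embedding and output decoder are counted separately.

```python
# Values taken from the model printout and trainer log.
vocab_size = 2000
hidden_size = 256
dim_feedforward = 1024
num_hidden_layers = 4

embedding = vocab_size * hidden_size        # Embedding(2000, 256)
output_decoder = hidden_size * vocab_size   # Linear(256 -> 2000), no bias

# Assumption: query, key, value and output projections, all without bias.
attention = 4 * hidden_size * hidden_size
# up_proj, gate_proj and down_proj, all without bias.
feedforward = 3 * hidden_size * dim_feedforward
# Assumption: two RMSNorm weight vectors per pre-LN layer.
norms_per_layer = 2 * hidden_size

per_layer = attention + feedforward + norms_per_layer
final_norm = hidden_size  # Assumption: one post-stack RMSNorm.

total = embedding + output_decoder + num_hidden_layers * per_layer + final_norm
print(f"{total:,} parameters, about {total / 1e6:.1f}M")

# One epoch over 212,000 samples at a batch size of 32.
steps = 212_000 // 32
print(f"{steps:,} steps")
```

Under these assumptions the total comes to 5,220,608, which rounds to the reported 5.2M, and the step count matches the logged `max_steps` of 6,625.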


## Load Trained Model

You can use the regular HF APIs to load the saved model and tokenizer.

In [5]:
from forgather.project import Project
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig, StoppingCriteria
import torch

model_path = "./output_models/tiny_llama"

# Set device to run inference on
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

## Text Generation

This loop uses the newly trained model to generate text, seeded with the prompts defined in the cell below.

In [6]:
import torch

def generate_text(model, tokenizer, prompts, gen_config, max_new_tokens, device):
    model.to(device)
    model.eval()
    
    with torch.inference_mode():
        for prompt in prompts:
            tokenizer_outputs = tokenizer(
                [prompt],
                truncation=False,
                return_length=True,
                return_tensors="pt",
                return_attention_mask=True,
            )
        
            input_ids = tokenizer_outputs["input_ids"].to(device)
            attention_mask = tokenizer_outputs["attention_mask"].to(device)
            use_cache = getattr(model, "_supports_cache_class", False)
            outputs = model.generate(
                input_ids,
                attention_mask=attention_mask,
                generation_config=gen_config,
                return_dict_in_generate=True,
                use_cache=use_cache,
                past_key_values=None,
                max_new_tokens=max_new_tokens,
            )
    
            output_text = tokenizer.decode(
                outputs.sequences[0],
                skip_special_tokens=True,
            )
            yield prompt + " [START] " + output_text[len(prompt) + 1 :]

prompts = [
    'Alice was so tired when she got back home so she went',
    'Jack and Lily liked to watch the moon at night. They noticed that the moon changed its shape every night. Sometimes the moon was big and round, and sometimes it was',
    'Jack and Lily saw a rainbow after a rainy day. They were amazed by the colors. Jack said, "Look, Lily. A rainbow has',
    'Jack wanted to read a book, so he went to',
    '"Can cows fly?" Alice asked her mother.',
]

gen_config = GenerationConfig(
    pad_token_id=model.config.pad_token_id,
    bos_token_id=model.config.bos_token_id,
    eos_token_id=model.config.eos_token_id,
    do_sample=True,
    top_k=20,
    top_p=0.9,
    temperature=0.7,
    repetition_penalty=1.15,  # spelling matters: a misspelled keyword is not applied
)

# Use the device selected when the model was loaded, so this also runs without a GPU
for s in generate_text(model, tokenizer, prompts, gen_config, 100, device):
    print(s)
    print(f"{'-' * 40}")

Alice was so tired when she got back home so she went [START] to sleep. She closed her eyes and dreamt about her dream. Alice was always sleeping. Alice had a big dream about a brave little girl. In her dream, she saw a big, scary monster. The monster was very scary and scared. She had a big smile and a big smile.

The monster said, "I mustn't run away, little girl. The monster is not scary anymore." She was sad. She wanted to go away. The
----------------------------------------
Jack and Lily liked to watch the moon at night. They noticed that the moon changed its shape every night. Sometimes the moon was big and round, and sometimes it was [START] moon.

One night, they heard a loud noise outside. It was a big, scary monster! It was making a loud noise and a big monster appeared. The monster said, "Be careful, the moon. It can hurt you." Lily said, "Don't worry, monster. I will protect you."

But the moon did not move. The moon did not move. The moon was in a dark. The moon was
-----
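
The sampling settings passed to `GenerationConfig` above (temperature, top-k, top-p, and a repetition penalty) each reshape the next-token distribution before a token is drawn. The following is a minimal plain-Python sketch of those ideas on a toy list of logits. It is an illustration only, not the Hugging Face implementation, and the function names are hypothetical.

```python
import math


def apply_repetition_penalty(logits, seen_ids, penalty):
    """Make tokens that already appeared less likely (penalty > 1.0)."""
    out = list(logits)
    for i in set(seen_ids):
        # Positive logits shrink; negative logits become more negative.
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def filtered_distribution(logits, temperature=0.7, top_k=20, top_p=0.9):
    """Return {token_id: probability} after temperature, top-k, and top-p."""
    # Temperature below 1.0 sharpens the distribution; above 1.0 flattens it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = ranked[:top_k]
    kept_mass = sum(probs[i] for i in kept)

    # Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    nucleus, cumulative = [], 0.0
    for i in kept:
        nucleus.append(i)
        cumulative += probs[i] / kept_mass
        if cumulative >= top_p:
            break

    # Renormalize the survivors so they form a proper distribution.
    final_mass = sum(probs[i] for i in nucleus)
    return {i: probs[i] / final_mass for i in nucleus}


toy_logits = [2.0, 1.0, 0.0, -1.0, -2.0]
penalized = apply_repetition_penalty(toy_logits, seen_ids=[0], penalty=1.15)
for token_id, p in filtered_distribution(penalized, top_k=3).items():
    print(f"token {token_id}: {p:.3f}")
```

Lower temperature and smaller `top_k` or `top_p` values make the output more predictable, while the repetition penalty discourages loops such as the repeated sentences visible in the samples above.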

## Train Hugging Face Llama Model

Next, let's try training a Llama model using the Hugging Face implementation.

Train the model from the CLI:

```bash
forgather -t train_hf_llama.yaml train
```

In [None]:
nb.display_config(config_template="train_hf_llama.yaml", show_pp_config=True, show_generated_code=False)

## Let's See What Happens...

...if we replace the pre-layer-norm implementation (the `PreLNLayer` blocks shown in the model printout above) with a post-layer-norm implementation.

In [None]:
nb.display_config(config_template="experimental_llama.yaml", show_pp_config=True, show_generated_code=False)

```bash
forgather -t experimental_llama.yaml train
```

## Test Model With the Inference Server

There is a simple OpenAI-compatible inference server implementation in "tools/inference_server".

To host your newly trained model on the inference server:

```bash
./server.py server_configs/tiny_llama.yaml
```

From another session, you can perform text completion like this:

```bash
./client.py client_configs/tiny_llama.yaml --stream --completion "Once upon a time,"
```

The Tiny Llama model, trained on Tiny Stories, will not be very good at interactive chat, but you can test this with the following command:

```bash
./client.py client_configs/tiny_llama.yaml --stream --interactive
```

This server should also work with other OpenAI-compatible clients.
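
As an illustration of what such a client could look like, here is a minimal sketch that uses only the Python standard library to build and send a request in the shape of the OpenAI completions API. The base URL, port, and model name are assumptions made for this example. They are not taken from `server_configs/tiny_llama.yaml`, so adjust them to match your server.

```python
import json
import urllib.request

# ASSUMPTION: hypothetical host, port, and path prefix; check your server config.
BASE_URL = "http://localhost:8000/v1"


def build_completion_request(prompt, model="tiny_llama", max_tokens=100,
                             base_url=BASE_URL):
    """Build a POST request in the OpenAI completions format."""
    payload = {
        "model": model,  # ASSUMPTION: hypothetical model name
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, calling `complete("Once upon a time,")` should return the generated continuation as a string, provided the assumed URL and model name match your setup.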