# Project Index

[Custom Model Notebook](../../../notebooks/custom_model.ipynb)  
[Training Notebook](../../../notebooks/train.ipynb)  
[Project Config Notebook](../../../notebooks/project_config.ipynb)  
[Forgather Notebook](../../../notebooks/forgather.ipynb)  

In [4]:
import forgather.nb.notebooks as nb

nb.display_project_index(config_template="", show_pp_config=True, show_generated_code=False)

## Dynamic Models

This is a demonstration of how to perform model architecture experiments by using the configuration system to dynamically change module types.

As in most of the examples, we use "Tiny Causal" as a baseline, then make various changes for comparison.
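The idea of swapping module types can be illustrated without any framework. The following is a minimal sketch in plain Python, not Forgather's implementation: every class here is a hypothetical stand-in (only the names `PostLNLayer` and `LayerNorm` echo modules that appear in the preprocessed config), and `build_layers` is an illustrative helper. It shows how factories built with `functools.partial` let an experiment change one module type while the rest of the model definition stays the same.

```python
from functools import partial


class LayerNorm:
    def __init__(self, normalized_shape):
        self.normalized_shape = normalized_shape


class ReluFeedforward:
    def __init__(self, d_model, d_feedforward):
        self.d_model, self.d_feedforward = d_model, d_feedforward


class SwishFeedforward(ReluFeedforward):
    """Hypothetical variant; only the (omitted) activation would differ."""


class PostLNLayer:
    def __init__(self, feedforward_factory, norm_factory):
        # Each layer calls its factories, so every layer gets fresh sub-modules.
        self.feedforward = feedforward_factory()
        self.norm = norm_factory()


class PreLNLayer(PostLNLayer):
    """Hypothetical variant that would normalize before each sub-layer."""


def build_layers(layer_cls, feedforward_cls, hidden_size=256,
                 dim_feedforward=1024, num_layers=4):
    # Bind the shared hyperparameters once, then pass factories downward.
    norm_factory = partial(LayerNorm, normalized_shape=hidden_size)
    feedforward_factory = partial(
        feedforward_cls, d_model=hidden_size, d_feedforward=dim_feedforward
    )
    layer_factory = partial(
        layer_cls,
        feedforward_factory=feedforward_factory,
        norm_factory=norm_factory,
    )
    return [layer_factory() for _ in range(num_layers)]


# The baseline and a variant differ only in which classes are plugged in.
control = build_layers(PostLNLayer, ReluFeedforward)
variant = build_layers(PreLNLayer, SwishFeedforward)
print(type(control[0]).__name__, type(control[0].feedforward).__name__)
print(type(variant[0]).__name__, type(variant[0].feedforward).__name__)
```

In the project itself, the same substitution is expressed in the configuration templates rather than in Python code, so each experiment only has to name the module it replaces.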

### Common Configuration
- Tokenizer: tokenizers/tiny_2k_bpe.yaml
    - Vocabulary Size: 2000
    - Maximum Model Sequence: 2048
- Dataset: datasets/tiny/tiny_stories_abridged.yaml
    - Dataset ID: roneneldan/TinyStories
    - Reference: https://arxiv.org/abs/2305.07759
    - Train Select Range: 10% 
- Model:
    - Model Dimension: 256
    - MLP Dimension: 1024
    - Layers: 4
    - Heads: 2
    - Dropout Probabilities: 0.1 (embedding and layer), 0.0 (residual, attention, activation)
- Trainer:
    - Class: forgather.ml.trainer.Trainer
    - Epochs: 1
    - Initial Learning Rate: 1.0e-3
    - Train Batch Size: 32
    - LR Scheduler: Cosine

#### Project Directory: "/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models"

## Meta Config
Meta Config: [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/meta.yaml](meta.yaml)

- [meta.yaml](meta.yaml)
    - [meta_defaults.yaml](../../../forgather_workspace/meta_defaults.yaml)
        - [base_directories.yaml](../../../forgather_workspace/base_directories.yaml)

Template Search Paths:
- [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/templates](templates)
- [/home/dinalt/ai_assets/forgather/forgather_workspace](../../../forgather_workspace)
- [/home/dinalt/ai_assets/forgather/templates/tiny_experiments](../../../templates/tiny_experiments)
- [/home/dinalt/ai_assets/forgather/templates/modellib](../../../templates/modellib)
- [/home/dinalt/ai_assets/forgather/templates/base](../../../templates/base)

## Available Configurations
- [pre_ln.yaml](templates/experiments/pre_ln.yaml)
- [walsh_pe.yaml](templates/experiments/walsh_pe.yaml)
- [relu-glu.yaml](templates/experiments/relu-glu.yaml)
- [swish.yaml](templates/experiments/swish.yaml)
- [swi-glu.yaml](templates/experiments/swi-glu.yaml)
- [control.yaml](templates/experiments/control.yaml)

Default Configuration: control.yaml

Active Configuration: control.yaml

## Included Templates
- [experiments/control.yaml](templates/experiments/control.yaml)
    - [models/control.yaml](templates/models/control.yaml)
        - [models/tiny/tiny_causal.yaml](../../../templates/tiny_experiments/models/tiny/tiny_causal.yaml)
            - [tokenizers/tiny_2k.yaml](../../../templates/tiny_experiments/tokenizers/tiny_2k.yaml)
            - [models/dynamic_causal_transformer.yaml](../../../templates/modellib/models/dynamic_causal_transformer.yaml)
                - [models/abstract/dynamic_causal_lm.yaml](../../../templates/base/models/abstract/dynamic_causal_lm.yaml)
                    - [models/abstract/custom_causal_lm.yaml](../../../templates/base/models/abstract/custom_causal_lm.yaml)
                        - [models/abstract/base_language_model.yaml](../../../templates/base/models/abstract/base_language_model.yaml)
                            - [inc/formatting.jinja](../../../templates/base/inc/formatting.jinja)
    - [project.yaml](templates/project.yaml)
        - [projects/tiny.yaml](../../../templates/tiny_experiments/projects/tiny.yaml)
            - [prompts/tiny_stories.yaml](../../../templates/tiny_experiments/prompts/tiny_stories.yaml)
            - [types/training_script/causal_lm/causal_lm.yaml](../../../templates/base/types/training_script/causal_lm/causal_lm.yaml)
                - [trainers/trainer.yaml](../../../templates/base/trainers/trainer.yaml)
                    - [trainers/base_trainer.yaml](../../../templates/base/trainers/base_trainer.yaml)
                        - [trainers/minimal_trainer.yaml](../../../templates/base/trainers/minimal_trainer.yaml)
                - [callbacks/loggers.yaml](../../../templates/base/callbacks/loggers.yaml)
                    - [callbacks/base_callbacks.yaml](../../../templates/base/callbacks/base_callbacks.yaml)
                - [models/abstract/load_model.yaml](../../../templates/base/models/abstract/load_model.yaml)
                    - [models/abstract/causal_lm_from_pretrained.yaml](../../../templates/base/models/abstract/causal_lm_from_pretrained.yaml)
                - [types/training_script/training_script.yaml](../../../templates/base/types/training_script/training_script.yaml)
                    - [types/type.yaml](../../../templates/base/types/type.yaml)
                        - [base_directories.yaml](../../../forgather_workspace/base_directories.yaml)
            - [tiny.callbacks](../../../templates/tiny_experiments/projects/tiny.yaml)
            - [tiny.model_config](../../../templates/tiny_experiments/projects/tiny.yaml)
            - [tiny.trainer_config](../../../templates/tiny_experiments/projects/tiny.yaml)
            - [tiny.dataset_config](../../../templates/tiny_experiments/projects/tiny.yaml)
                - [datasets/tiny/tiny_stories_abridged.yaml](../../../templates/tiny_experiments/datasets/tiny/tiny_stories_abridged.yaml)
                    - [datasets/tiny/tiny_stories.yaml](../../../templates/tiny_experiments/datasets/tiny/tiny_stories.yaml)
                        - [datasets/abstract/base_datasets.yaml](../../../templates/base/datasets/abstract/base_datasets.yaml)
        - [project.trainer_config](templates/project.yaml)
### Config Metadata:

```python
{'config_class': 'type.training_script.causal_lm',
 'config_description': 'Tiny Causal; the baseline control',
 'config_name': 'Control',
 'create_new_model': 'True',
 'datasets_dir': '/home/dinalt/ai_assets/forgather/datasets',
 'eval': 'False',
 'forgather_dir': '/home/dinalt/ai_assets/forgather',
 'logging_dir': './output_models/tiny_causal/runs/control_2025-06-08T19-59-16',
 'model_src_dir': '/home/dinalt/ai_assets/forgather/model_src',
 'models_dir': './output_models',
 'output_dir': './output_models/tiny_causal',
 'project_dir': '.',
 'save_model': 'False',
 'tokenizers_dir': '/home/dinalt/ai_assets/forgather/tokenizers',
 'train': 'True',
 'workspace_root': '/home/dinalt/ai_assets/forgather'}

```

## Output Targets
- meta
- main
- model_code_writer
- distributed_env
- model
- trainer
- train_dataset
- eval_dataset
- data_collator
- trainer_callbacks
- trainer_args
- optimizer
- lr_scheduler
- model_constructor_args
- tokenizer

## Preprocessed Config

```yaml
#---------------------------------------
#                 Control                
#---------------------------------------
# 2025-06-08T19:59:16
# Description: Tiny Causal; the baseline control
# Project Dir: /home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models
# Current Working Dir: "/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models"
# Forgather Config Dir: "/home/dinalt/.config/forgather"
# Model: tiny_causal
# Hostname: hal9000
# Versions:
#     python: 3.10.13
#     torch: 2.7.0
#     transformers: 4.51.3
#     accelerate: 1.7.0

############# Config Vars ##############

# ns.forgather_dir: "/home/dinalt/ai_assets/forgather"
# ns.models_dir: "/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/output_models"
# ns.project_model_src_dir: "/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/model_src"
# ns.tokenizers_dir: "/home/dinalt/ai_assets/forgather/tokenizers"
# ns.datasets_dir: "/home/dinalt/ai_assets/forgather/datasets"
# ns.model_src_dir: "/home/dinalt/ai_assets/forgather/model_src"
# ns.output_dir: "./output_models/tiny_causal"
# ns.logging_dir: "./output_models/tiny_causal/runs/control_2025-06-08T19-59-16"
# ns.create_new_model: True
# ns.save_model: False
# ns.train: True
# ns.eval: False
# ns.trust_remote_code: True

####### Distributed Environment ########

.define: &distributed_env !singleton:forgather.ml.distributed:DistributedEnvironment@distributed_env

############# Dependencies #############

# The model will be given the following prompts for text-gen at regular intervals.
.define: &testprompts !list:@testprompts
    # Test prompts from "https://arxiv.org/abs/2305.07759"
    - "Alice was so tired when she got back home so she went"
    - "Jack and Lily liked to watch the moon at night. They noticed that the moon changed its shape every night. Sometimes the moon was big and round, and sometimes it was"
    - "Jack and Lily saw a rainbow after a rainy day.They were amazed by the colors. Jack said, \"Look, Lily. A rainbow has"
    - "Jack wanted to read a book, so he went to"
    - "\"Can cows fly?\" Alice asked her mother."
    - "\"What do birds like to eat?\" Tom asked his mother."
    - "\"What language do they speak in France?\" Tom asked his mother."
    - "If I throw a ball up in the air, eventually it will"
    - "It was winter and cold outside so his mother told him, \"You should"
    - "Lily likes cats and dogs. She asked her mom for a dog and her mom said no, so instead she asked"
    - "Jack told Mary, \"If you give me your banana, I'll give you my apple.\" Mary gave Jack her Banana, so"
    - "On weekends Jack went to visit his grandmother whereas on weekdays he would go to school. Last weekend, when Jack was on his way to"
    - "Lily and Ben were having an argument. Ben said that cake is much better than ice cream and Lily said that"
    - "Lily and Ben are having an argument. They are trying to decide between the park and the swimming pool. Ben says, \"I want to go to the park\". Lily says"
    - "Jack's mother was not home, and his father was at home. When Jack came home, he said hello to"
    - "Lily doesn't like swimming. When her father wants to take her to the swimming pool, she says"
    - "Both Ben and Lily wanted cake. Father said that there was only one piece of cake left. They"
    - "Ben went to visit Lily in her house, but she was not at home. Ben knocked on the door,"

# Conservative text-generation parameters.
.define: &generation_config !dict:@generation_config
    identity: generation_config
    do_sample: True
    top_k: 20
    top_p: 0.9
    temperature: 0.7
    repitition_penalty: 1.15

################ Model #################

# https://huggingface.co/docs/transformers/en/model_doc/auto
.define: &model_constructor_args {}

# Name: Tiny Causal
# Description: A scaled-down version of the base Causal Transformer
# model_def.cls = "DynamicCasualLM"
# model_def.cfg_cls = "DynamicCausalLMConfig"
# model_def.config_path = "/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/output_models/tiny_causal/tiny_causal_transformer.py"
# model_def.model_path = "/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/output_models/tiny_causal/tiny_causal_transformer.py"
# model_def.short_name = "tiny_causal_transformer"
# model_def.model_type = "forgather-dynamic-causal-tiny_causal_transformer"
# model_def.model_path = "./output_models/tiny_causal/tiny_causal_transformer.py"
# model_def.model_template_searchpath = "/home/dinalt/ai_assets/forgather/templates/dynamic_models"
# model_def.model_template_name = "causal_lm.py"
# model_def.name_policy = "named"

# **Tokenizer**

# Load custom tokenizer from sub-project definition
.define: &tokenizer !singleton:forgather.ml.construct:load_from_config@tokenizer
    project_dir: "/home/dinalt/ai_assets/forgather/examples/tokenizers/tiny_stories_bpe"
    config_template: "2k.yaml"

# **Model Config**

.define: &model_submodule_searchpath
    - "./model_src"
    - "/home/dinalt/ai_assets/forgather/model_src/bits"
    - "./output_models/tiny_causal"

.define: &loss_fn !singleton:.causal_loss:CausalLoss@loss_fn []

.define: &layer_norm_factory !lambda:torch.nn:LayerNorm@layer_norm_factory
    normalized_shape: !var "hidden_size"

.define: &feedforward_factory !lambda:.feedforward_layer:FeedforwardLayer@feedforward_factory
    d_model: !var "hidden_size"
    d_feedforward: !var "dim_feedforward"
    dropout: !var "activation_dropout"

.define: &attention_factory !lambda:.causal_multihead_attn:CausalMultiheadAttn@attention_factory
    d_model: !var "hidden_size"
    num_heads: !var "num_attention_heads"
    dropout: !var "attention_dropout"

.define: &layer_factory !lambda:.post_ln_layer:PostLNLayer@layer_factory
    feedforward_factory: *feedforward_factory
    attention_factory: *attention_factory
    norm_factory: *layer_norm_factory
    dropout: !var "layer_dropout"
    residual_dropout: !var "residual_dropout"

.define: &layer_stack !singleton:.layer_stack:LayerStack@layer_stack
    layer_factory: *layer_factory
    num_hidden_layers: !var "num_hidden_layers"

.define: &output_decoder !singleton:torch.nn:Linear@output_decoder
    - !var "hidden_size"
    - !var "vocab_size"

.define: &positional_encoder !singleton:.sinusoidal_pe:SinusoidalPE@positional_encoder
    d_model: !var "hidden_size"
    max_sequence_length: !var "max_sequence_length"

.define: &input_encoder !singleton:.input_encoder:InputEncoder@input_encoder
    d_model: !var "hidden_size"
    vocab_size: !var "vocab_size"
    dropout: !var "embedding_dropout"
    positional_encoder: *positional_encoder

.define: &init_weights !lambda:.init_weights:simple_weight_init@init_weights []

.define: &model_factory !singleton:.causal_lm:CasualLM@model_factory
    loss_fn: *loss_fn
    input_encoder: *input_encoder
    output_decoder: *output_decoder
    layer_stack: *layer_stack
    init_weights: *init_weights

.define: &model_code_generator !meta:forgather.codegen:generate_code@model_code_generator
    searchpath: "/home/dinalt/ai_assets/forgather/templates/dynamic_models"
    template_name: "causal_lm.py"
    name_policy: "named"
    obj: *model_factory
    # Template args
    model_type: "forgather-dynamic-causal-tiny_causal_transformer"

.define: &model_code_writer !singleton:forgather.ml.construct:write_file@model_code_writer
    data: *model_code_generator
    output_file: "./output_models/tiny_causal/tiny_causal_transformer.py"
    return_value: "Model constructor generated by Forgather 1.0"    

.define: &model_config !singleton:./output_models/tiny_causal/tiny_causal_transformer.py:DynamicCausalLMConfig@model_config
    submodule_searchpath: *model_submodule_searchpath
    # Set auto-map for custom model; this ensures that the source code stays with the model.
    auto_map:
        AutoConfig: "tiny_causal_transformer.DynamicCausalLMConfig"
        AutoModel: "tiny_causal_transformer.DynamicCasualLM"
    # Get the vocab-size from the tokenizer definition.
    vocab_size: !singleton:len [ *tokenizer ]
    pad_token_id: !singleton:getattr [ *tokenizer, 'pad_token_id' ]
    bos_token_id: !singleton:getattr [ *tokenizer, 'bos_token_id' ]
    eos_token_id: !singleton:getattr [ *tokenizer, 'eos_token_id' ]
    # Add dependency on code generator
    code_generator: *model_code_writer
    hidden_size: 512
    num_attention_heads: 8
    num_hidden_layers: 6
    max_sequence_length: !singleton:getattr
        - *tokenizer
        - "model_max_length"
    dim_feedforward: 2048
    embedding_dropout: 0.10
    layer_dropout: 0.10
    residual_dropout: 0.0
    attention_dropout: 0.0
    activation_dropout: 0.0
    
    # Tiny Causal overrides
    hidden_size: 256
    dim_feedforward: 1024
    num_attention_heads: 2
    num_hidden_layers: 4
    embedding_dropout: 0.1
    layer_dropout: 0.1

# **Model Factory**

.define: &pretrained_model !partial:./output_models/tiny_causal/tiny_causal_transformer.py:DynamicCasualLM@pretrained_model
    args:
        - *model_config
    kwargs:
        submodule_searchpath: *model_submodule_searchpath
        <<: *model_constructor_args

.define: &model !partial:forgather.ml.construct:dependency_list@model
    - !factory:call [ *pretrained_model ]
    - !singleton:forgather.ml.construct:copy_package_files
        - "./output_models/tiny_causal"
        - *model_config

############### Datasets ###############

# Name: TinyStories Abridged
# Define: Abridged to 10% of original size; Dataset containing synthetically generated (by GPT-3.5 and GPT-4) short stories that only use a small vocabulary.
# Source: https://arxiv.org/abs/2305.07759
# Train Dataset: "roneneldan/TinyStories" : "train"
# Eval Dataset: "roneneldan/TinyStories" : "validation"

# **Source Datasets**

.define: &train_source_dataset !singleton:datasets:load_dataset@train_source_dataset
    - "roneneldan/TinyStories"

.define: &eval_source_dataset !singleton:datasets:load_dataset@eval_source_dataset
    - "roneneldan/TinyStories"

# **Dataset Splits**

.define: &train_dataset_split !singleton:operator:getitem
    - *train_source_dataset
    - "train"

.define: &eval_dataset_split !singleton:operator:getitem
    - *train_source_dataset
    - "validation"

# **Preprocess Dataset Args**

.define: &preprocess_args
    truncation: True

# **Preprocessed Datasets**

.define: &train_dataset !singleton:forgather.ml.datasets:preprocess_dataset@train_dataset
    dataset: *train_dataset_split
    tokenizer: *tokenizer
    select_range: 0.1
    desc: "Tokenizing train"
    fn_kwargs:
        <<: *preprocess_args

.define: &eval_dataset !singleton:forgather.ml.datasets:preprocess_dataset@eval_dataset
    dataset: *eval_dataset_split
    tokenizer: *tokenizer
    select_range: 500
    desc: "Tokenizing validation split"
    fn_kwargs:
        <<: *preprocess_args

############ Data Collator #############

# Data collator for causal model
# Batches are dynamically padded to longest sequence
# labels are set to input_ids, with pad tokens set to -100
.define: &data_collator !singleton:forgather.ml.data_collator:DataCollatorForCausalLM@DataCollatorForCausalLM
    tokenizer: *tokenizer
    return_tensors: pt

    # Tiny Project
    # Limit maximum sequence length 512 tokens, at the data-collator level.
    truncate_to: 512

########## Trainer Callbacks ###########

# **Dependencies**

# Experiment tracking: Tensorboard SummaryWriter
.define: &summary_writer !singleton:torch.utils.tensorboard:SummaryWriter
    - "./output_models/tiny_causal/runs/control_2025-06-08T19-59-16"

# Additional data to record to experiment loggers
.define: &experiment_info !dict:@experiment_info
    date: "2025-06-08T19:59:16"
    name: "Control"
    description: "Tiny Causal; the baseline control"
    config: !var "pp_config"
    versions: {'python': '3.10.13', 'torch': '2.7.0', 'transformers': '4.51.3', 'accelerate': '1.7.0'}

.define: &text_gen_callback_args
    summary_writer: *summary_writer
    prompts: *testprompts
    generation_config: *generation_config
    max_new_tokens: 40
    generation_steps: 2000

# **Callback List**

.define: &trainer_callbacks !list:@trainer_callbacks
    # Log all training output to JSON
    - !singleton:forgather.ml.json_logger:JsonLogger
        <<: *experiment_info
    # Log configuration and metrics to Tensorboard file
    - !singleton:forgather.ml.tb_logger:TBLogger
        args: [ *summary_writer ]
        kwargs:
            <<: *experiment_info
    - !singleton:forgather.ml.textgen_callback:TextgenCallback
        <<: *text_gen_callback_args

############## Optimizer ###############

.define: &optimizer !lambda:torch:optim.AdamW
    lr: 1.0e-3

############# LR Scheduler #############

# https://arxiv.org/html/2503.02844v1
.define: &lr_scheduler !lambda:forgather.ml.optim.infinite_lr_scheduler:InfiniteLRScheduler@lr_scheduler
    warmup_steps: 5000
    cooldown_steps: 50000
    constant_lr: 1.0e-4

############### Trainer ################

# Name: Custom forgather.ml.trainer.Trainer
# Description: A lightweight, extensible trainer; does not support multiple GPUs

# **Trainer Args**

.define: &trainer_args
    # Minimal Trainer Defaults
    # https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments
    output_dir: "./output_models/tiny_causal"
    logging_dir: "./output_models/tiny_causal/runs/control_2025-06-08T19-59-16"
    logging_steps: 500
    per_device_train_batch_size: 16
    per_device_eval_batch_size: 32
    learning_rate: 5.0e-5
    num_train_epochs: 1
    # Base Trainer Defaults
    # https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments
    overwrite_output_dir: True
    eval_steps: 100
    eval_strategy: "steps"
    save_strategy: "no"
    logging_strategy: "steps"

    # Tiny Project Overrides
    seed: 42
    per_device_train_batch_size: 32
    per_device_eval_batch_size: 64
    logging_steps: 100
    eval_steps: 500
    learning_rate: 1.0e-3
    num_train_epochs: 1
    dataloader_num_workers: 1

    # max_steps: 500

# **Trainer Constructor**

.define: &trainer !singleton:forgather.ml.trainer:Trainer@trainer
    model_init: *model
    args: !singleton:forgather.ml.trainer_types:TrainingArguments@trainer_args
        <<: *trainer_args
    data_collator: *data_collator
    train_dataset: *train_dataset
    eval_dataset: *eval_dataset
    processing_class: *tokenizer
    callbacks: *trainer_callbacks
    optimizer_factory: *optimizer
    lr_scheduler_factory: *lr_scheduler

#---------------------------------------
#          Configuration Output          
#---------------------------------------
meta: &meta_output !dict:@meta
    config_name: "Control"
    config_description: "Tiny Causal; the baseline control"
    config_class: "type.training_script.causal_lm"
    project_dir: "."
    workspace_root: "/home/dinalt/ai_assets/forgather"
    forgather_dir: "/home/dinalt/ai_assets/forgather"
    models_dir: "./output_models"
    tokenizers_dir: "/home/dinalt/ai_assets/forgather/tokenizers"
    datasets_dir: "/home/dinalt/ai_assets/forgather/datasets"
    output_dir: "./output_models/tiny_causal"
    model_src_dir: "/home/dinalt/ai_assets/forgather/model_src"
    logging_dir: "./output_models/tiny_causal/runs/control_2025-06-08T19-59-16"
    create_new_model: "True"
    save_model: "False"
    train: "True"
    eval: "False"

main: !singleton:forgather.ml.training_script:TrainingScript@training_script
    meta: *meta_output
    do_save: False
    do_train: True
    do_eval: False
    # Init distributed environment before initializing anything which depends on it.
    distributed_env: *distributed_env
    trainer: *trainer
    pp_config: !var "pp_config"

model_code_writer: *model_code_writer
distributed_env: *distributed_env
model: *model
trainer: *trainer
train_dataset: *train_dataset
eval_dataset: *eval_dataset
data_collator: *data_collator
trainer_callbacks: *trainer_callbacks
trainer_args: *trainer_args
optimizer: *optimizer
lr_scheduler: *lr_scheduler
model_constructor_args: *model_constructor_args
tokenizer: *tokenizer

```
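Several keys in the preprocessed config above appear twice within the same mapping, for example `hidden_size` under the model config and `per_device_train_batch_size` under the trainer args. The training output (`d_model=256`, `per_device_train_batch_size: 32`) is consistent with the later occurrence taking effect. The sketch below models that behaviour with an ordinary dictionary merge; this is an assumption about the effective semantics, not a description of how the config loader is implemented, and the variable names are illustrative.

```python
# Values copied from the model config: defaults first, then the
# "Tiny Causal overrides" block.
model_defaults = {
    "hidden_size": 512,
    "num_attention_heads": 8,
    "num_hidden_layers": 6,
    "dim_feedforward": 2048,
}
tiny_causal_overrides = {
    "hidden_size": 256,
    "dim_feedforward": 1024,
    "num_attention_heads": 2,
    "num_hidden_layers": 4,
}

# Values copied from the trainer args: defaults first, then the
# "Tiny Project Overrides" block.
trainer_defaults = {
    "per_device_train_batch_size": 16,
    "learning_rate": 5.0e-5,
    "logging_steps": 500,
    "eval_steps": 100,
}
tiny_project_overrides = {
    "per_device_train_batch_size": 32,
    "learning_rate": 1.0e-3,
    "logging_steps": 100,
    "eval_steps": 500,
}

# Later keys replace earlier ones, mirroring "last occurrence wins".
effective_model = {**model_defaults, **tiny_causal_overrides}
effective_trainer = {**trainer_defaults, **tiny_project_overrides}

print(effective_model)
print(effective_trainer)
```

Reading the config this way, the values that matter for the run are those in the override blocks, which agree with the Common Configuration summary at the top of this page.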



In [4]:
from forgather.project import Project
import forgather.nb.notebooks as nb

# Load default baseline config
proj = Project()

In [5]:
# Train model
training_script = proj()
training_script.run()

**** Training Script Started *****
config_name: Control
config_description: Tiny Causal; the baseline control
output_dir: ./output_models/tiny_causal
logging_dir: ./output_models/tiny_causal/runs/control_2025-06-09T02-17-55



total_examples: 212,000
total_train_samples: 212,000
per_device_train_batch_size: 32
actual_per_device_batch_size: 32
total_train_batch_size: 32
max_steps: 6,625
total_parameters: 4.2M
trainable_parameters: 4.2M
model:
DynamicCasualLM(
  (causal_lm): CasualLM(
    loss_fn=CausalLoss()
    (input_encoder): InputEncoder(
      d_model=256, vocab_size=2000
      (dropout): Dropout(p=0.1, inplace=False)
      (embedding): Embedding(2000, 256)
      (positional_encoder): SinusoidalPE(d_model=256, max_sequence_length=2048)
    )
    (output_decoder): Linear(in_features=256, out_features=2000, bias=True)
    (layer_stack): LayerStack(
      (layers): ModuleDict(
        (0): PostLNLayer(
          (feedforward): FeedforwardLayer(
            d_model=256, d_feedforward=1024
            (linear1): Linear(in_features=256, out_features=1024, bias=True)
            (dropout): Identity()
            (activation): ReLU()
            (linear2): Linear(in_features=1024, out_features=256, bias=True)
  



2025-06-09 02:17:59          100  0.02  train-loss: 7.40545   learning-rate: 2.00e-05
2025-06-09 02:18:01          200  0.03  train-loss: 6.48599   learning-rate: 4.00e-05
2025-06-09 02:18:03          300  0.05  train-loss: 5.8017    learning-rate: 6.00e-05
2025-06-09 02:18:05          400  0.06  train-loss: 5.34018   learning-rate: 8.00e-05
2025-06-09 02:18:06          500  0.08  train-loss: 4.8715    learning-rate: 1.00e-04





2025-06-09 02:18:07          500  0.08  eval-loss:  5.18221   
2025-06-09 02:18:10          600  0.09  train-loss: 4.52426   learning-rate: 1.20e-04
2025-06-09 02:18:12          700  0.11  train-loss: 4.25684   learning-rate: 1.40e-04
2025-06-09 02:18:14          800  0.12  train-loss: 4.11578   learning-rate: 1.60e-04
2025-06-09 02:18:15          900  0.14  train-loss: 3.9168    learning-rate: 1.80e-04
2025-06-09 02:18:17        1,000  0.15  train-loss: 3.7224    learning-rate: 2.00e-04





2025-06-09 02:18:17        1,000  0.15  eval-loss:  4.03625   
2025-06-09 02:18:19        1,100  0.17  train-loss: 3.68163   learning-rate: 2.20e-04
2025-06-09 02:18:21        1,200  0.18  train-loss: 3.60762   learning-rate: 2.40e-04
2025-06-09 02:18:23        1,300  0.2   train-loss: 3.52088   learning-rate: 2.60e-04
2025-06-09 02:18:24        1,400  0.21  train-loss: 3.4514    learning-rate: 2.80e-04
2025-06-09 02:18:26        1,500  0.23  train-loss: 3.39678   learning-rate: 3.00e-04





2025-06-09 02:18:26        1,500  0.23  eval-loss:  3.5571    
2025-06-09 02:18:28        1,600  0.24  train-loss: 3.32648   learning-rate: 3.20e-04
2025-06-09 02:18:29        1,700  0.26  train-loss: 3.25139   learning-rate: 3.40e-04
2025-06-09 02:18:31        1,800  0.27  train-loss: 3.16096   learning-rate: 3.60e-04
2025-06-09 02:18:33        1,900  0.29  train-loss: 3.11015   learning-rate: 3.80e-04
2025-06-09 02:18:35        2,000  0.3   train-loss: 3.10194   learning-rate: 4.00e-04





2025-06-09 02:18:35        2,000  0.3   eval-loss:  3.18686   
2025-06-09 02:18:39        2,100  0.32  train-loss: 3.0213    learning-rate: 4.20e-04
2025-06-09 02:18:40        2,200  0.33  train-loss: 2.91586   learning-rate: 4.40e-04
2025-06-09 02:18:42        2,300  0.35  train-loss: 2.84726   learning-rate: 4.60e-04
2025-06-09 02:18:44        2,400  0.36  train-loss: 2.88745   learning-rate: 4.80e-04
2025-06-09 02:18:45        2,500  0.38  train-loss: 2.80327   learning-rate: 5.00e-04





2025-06-09 02:18:45        2,500  0.38  eval-loss:  2.8713    
2025-06-09 02:18:47        2,600  0.39  train-loss: 2.77034   learning-rate: 5.20e-04
2025-06-09 02:18:49        2,700  0.41  train-loss: 2.69055   learning-rate: 5.40e-04
2025-06-09 02:18:50        2,800  0.42  train-loss: 2.68562   learning-rate: 5.60e-04
2025-06-09 02:18:52        2,900  0.44  train-loss: 2.58216   learning-rate: 5.80e-04
2025-06-09 02:18:54        3,000  0.45  train-loss: 2.46821   learning-rate: 6.00e-04





2025-06-09 02:18:54        3,000  0.45  eval-loss:  2.59739   
2025-06-09 02:18:56        3,100  0.47  train-loss: 2.55538   learning-rate: 6.20e-04
2025-06-09 02:18:57        3,200  0.48  train-loss: 2.58311   learning-rate: 6.40e-04
2025-06-09 02:18:59        3,300  0.5   train-loss: 2.48109   learning-rate: 6.60e-04
2025-06-09 02:19:01        3,400  0.51  train-loss: 2.40831   learning-rate: 6.80e-04
2025-06-09 02:19:03        3,500  0.53  train-loss: 2.4158    learning-rate: 7.00e-04





2025-06-09 02:19:03        3,500  0.53  eval-loss:  2.44907   
2025-06-09 02:19:04        3,600  0.54  train-loss: 2.45379   learning-rate: 7.20e-04
2025-06-09 02:19:06        3,700  0.56  train-loss: 2.35934   learning-rate: 7.40e-04
2025-06-09 02:19:08        3,800  0.57  train-loss: 2.35905   learning-rate: 7.60e-04
2025-06-09 02:19:10        3,900  0.59  train-loss: 2.41208   learning-rate: 7.80e-04
2025-06-09 02:19:11        4,000  0.6   train-loss: 2.41811   learning-rate: 8.00e-04





2025-06-09 02:19:12        4,000  0.6   eval-loss:  2.3598    
2025-06-09 02:19:15        4,100  0.62  train-loss: 2.30591   learning-rate: 8.20e-04
2025-06-09 02:19:17        4,200  0.63  train-loss: 2.28682   learning-rate: 8.40e-04
2025-06-09 02:19:19        4,300  0.65  train-loss: 2.3223    learning-rate: 8.60e-04
2025-06-09 02:19:20        4,400  0.66  train-loss: 2.35272   learning-rate: 8.80e-04
2025-06-09 02:19:22        4,500  0.68  train-loss: 2.29876   learning-rate: 9.00e-04





2025-06-09 02:19:22        4,500  0.68  eval-loss:  2.31837   
2025-06-09 02:19:24        4,600  0.69  train-loss: 2.2197    learning-rate: 9.20e-04
2025-06-09 02:19:26        4,700  0.71  train-loss: 2.21872   learning-rate: 9.40e-04
2025-06-09 02:19:28        4,800  0.72  train-loss: 2.24251   learning-rate: 9.60e-04
2025-06-09 02:19:29        4,900  0.74  train-loss: 2.24681   learning-rate: 9.80e-04
2025-06-09 02:19:31        5,000  0.75  train-loss: 2.25256   learning-rate: 1.00e-03





2025-06-09 02:19:31        5,000  0.75  eval-loss:  2.2818    
2025-06-09 02:19:33        5,100  0.77  train-loss: 2.24558   learning-rate: 1.00e-03
2025-06-09 02:19:35        5,200  0.78  train-loss: 2.15454   learning-rate: 1.00e-03
2025-06-09 02:19:37        5,300  0.8   train-loss: 2.15086   learning-rate: 1.00e-03
2025-06-09 02:19:39        5,400  0.82  train-loss: 2.17233   learning-rate: 1.00e-03
2025-06-09 02:19:40        5,500  0.83  train-loss: 2.15159   learning-rate: 1.00e-03





2025-06-09 02:19:41        5,500  0.83  eval-loss:  2.20254   
2025-06-09 02:19:42        5,600  0.85  train-loss: 2.1846    learning-rate: 1.00e-03
2025-06-09 02:19:44        5,700  0.86  train-loss: 2.2152    learning-rate: 1.00e-03
2025-06-09 02:19:46        5,800  0.88  train-loss: 2.15351   learning-rate: 9.99e-04
2025-06-09 02:19:47        5,900  0.89  train-loss: 2.1599    learning-rate: 9.99e-04
2025-06-09 02:19:49        6,000  0.91  train-loss: 2.08832   learning-rate: 9.99e-04





2025-06-09 02:19:49        6,000  0.91  eval-loss:  2.18569   
2025-06-09 02:19:53        6,100  0.92  train-loss: 2.06596   learning-rate: 9.99e-04
2025-06-09 02:19:55        6,200  0.94  train-loss: 2.12255   learning-rate: 9.99e-04
2025-06-09 02:19:56        6,300  0.95  train-loss: 2.10061   learning-rate: 9.98e-04
2025-06-09 02:19:58        6,400  0.97  train-loss: 2.08376   learning-rate: 9.98e-04
2025-06-09 02:20:00        6,500  0.98  train-loss: 2.10188   learning-rate: 9.98e-04





2025-06-09 02:20:00        6,500  0.98  eval-loss:  2.109     
2025-06-09 02:20:02        6,600  1.0   train-loss: 2.03015   learning-rate: 9.98e-04
2025-06-09 02:20:02        6,625  1.0   train_runtime: 124.8 train_samples: 212,000 step: 6,625 train_samples_per_second: 1.699e+03 train_steps_per_second: 53.1 train_loss: 2.922 epoch: 1.0 
**** Training Completed *****
{'train_runtime': 124.75879192352295, 'train_samples': 212000, 'step': 6625, 'train_samples_per_second': 1699.279, 'train_steps_per_second': 53.102, 'train_loss': 2.9222500324249268, 'epoch': 1.0}
Model saved to: ./output_models/tiny_causal
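Two figures in the log above can be reproduced with a little arithmetic: the step count (212,000 samples at batch size 32) and the logged learning rates. The sketch below is an approximation, not the `InfiniteLRScheduler` implementation. It takes `warmup_steps`, `cooldown_steps`, and `constant_lr` from the preprocessed config and assumes a linear warmup followed by a cosine-shaped cooldown from the peak rate toward `constant_lr`; the cosine shape is an assumption chosen because it agrees with the logged values, and `sketch_lr` is a hypothetical helper name.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of optimizer steps needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def sketch_lr(step, max_lr=1.0e-3, constant_lr=1.0e-4,
              warmup_steps=5000, cooldown_steps=50000):
    """Linear warmup, then an assumed cosine cooldown toward constant_lr."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    # Progress through the cooldown phase, clamped at its end.
    t = min(step - warmup_steps, cooldown_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / cooldown_steps))
    return constant_lr + (max_lr - constant_lr) * cosine


print(steps_per_epoch(212_000, 32))
for step in (100, 2500, 5000, 5800, 6500):
    print(f"{step:>5}  {sketch_lr(step):.2e}")
```

With these assumptions the sketch gives 6,625 steps and learning rates of 2.00e-05, 5.00e-04, 1.00e-03, 9.99e-04, and 9.98e-04 at the listed steps, matching the log. Because the run ends after 6,625 steps, it never leaves the earliest part of the 50,000-step cooldown, which is why the rate stays close to 1.0e-3 after warmup.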


In [9]:
nb.generate_trainingscript(proj, "0")

#### Generated Shell Script
[control.sh](control.sh)
```bash
#!/bin/bash
CUDA_VISIBLE_DEVICES='0' torchrun --standalone --nproc-per-node 'gpu' '/home/dinalt/ai_assets/forgather/scripts/train_script.py' -p '/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models' "control.yaml"

```

In [7]:
nb.display_tb_command(proj, local_host=False)

#### Tensorboard Command

```bash
tensorboard --bind_all --logdir "/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/output_models/tiny_causal"
```