# Project Index

[Custom Model Notebook](../../../notebooks/custom_model.ipynb)  
[Training Notebook](../../../notebooks/train.ipynb)  
[Project Config Notebook](../../../notebooks/project_config.ipynb)  
[Forgather Notebook](../../../notebooks/forgather.ipynb)  

import forgather.ml.notebooks as nb

nb.display_project_index(config_template="", materialize=True, pp_first=False)


## Dynamic Models

This is a demonstration of how to perform model architecture experiments by using the configuration system to dynamically change module types.

As in most of the examples, we use "Tiny Causal" as a baseline, then make various changes for comparison.
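
To make the idea concrete, here is a minimal sketch of choosing a layer type by name. It is not Forgather's implementation: `LAYER_TYPES`, `build_stack`, and the scalar stand-in functions are hypothetical, and the "sublayer" is a toy function rather than attention or a feed-forward block. The two orderings shown are the usual post-LN and pre-LN residual arrangements.

```python
from typing import Callable, Dict, List

Fn = Callable[[float], float]


def post_ln_layer(sublayer: Fn, norm: Fn) -> Fn:
    """Post-LN ordering: normalize after the residual add."""
    return lambda x: norm(x + sublayer(x))


def pre_ln_layer(sublayer: Fn, norm: Fn) -> Fn:
    """Pre-LN ordering: normalize the sublayer input, leave the residual path untouched."""
    return lambda x: x + sublayer(norm(x))


# Hypothetical registry: the "configuration" selects a layer type by name.
LAYER_TYPES: Dict[str, Callable[[Fn, Fn], Fn]] = {
    "post_ln": post_ln_layer,
    "pre_ln": pre_ln_layer,
}


def build_stack(layer_type: str, num_layers: int, sublayer: Fn, norm: Fn) -> Fn:
    """Build a stack of identical layers; only the layer type differs between experiments."""
    layers: List[Fn] = [
        LAYER_TYPES[layer_type](sublayer, norm) for _ in range(num_layers)
    ]

    def forward(x: float) -> float:
        for layer in layers:
            x = layer(x)
        return x

    return forward


# Toy stand-ins: the "sublayer" doubles its input and the "norm" halves it.
double = lambda x: 2.0 * x
halve = lambda x: 0.5 * x

control = build_stack("post_ln", num_layers=4, sublayer=double, norm=halve)
variant = build_stack("pre_ln", num_layers=4, sublayer=double, norm=halve)
```

Everything except the layer type is shared between `control` and `variant`, which is the property that makes a baseline comparison meaningful.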

### Common Configuration
- Tokenizer: tokenizers/tiny_2k.yaml
    - Vocabulary Size: 2000
    - Maximum Model Sequence: 2048
- Dataset: datasets/tiny/tiny_stories_abridged.yaml
    - Dataset ID: roneneldan/TinyStories
    - Reference: https://arxiv.org/abs/2305.07759
    - Train Select Range: 10% 
- Model:
    - Model Dimension: 256
    - MLP Dimension: 1024
    - Layers: 4
    - Heads: 2
    - All Dropout Probabilities: 0.0
- Trainer:
    - Class: forgather.ml.trainer.Trainer
    - Epochs: 1
    - Initial Learning Rate: 1.0e-3
    - Train Batch Size: 32
    - LR Scheduler: Cosine
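
As a rough sanity check on model size, the dimensions above imply a parameter count that can be estimated by hand. The sketch below assumes biased linear layers, four attention projections (query, key, value, output), two LayerNorms per layer, a parameter-free sinusoidal positional encoding, and no weight tying between the embedding and the output layer. The project's actual modules may differ on any of these points, so treat the result as an estimate only.

```python
def estimate_params(vocab_size: int, d_model: int, d_ff: int, n_layers: int) -> int:
    """Estimate trainable parameters for a small decoder-only transformer."""
    embedding = vocab_size * d_model
    # Four d_model x d_model projections, each with a bias vector (assumption).
    attention = 4 * (d_model * d_model + d_model)
    # Two linear layers with biases: d_model -> d_ff -> d_model.
    feedforward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms per layer, each with a weight and a bias vector.
    layer_norms = 2 * (2 * d_model)
    per_layer = attention + feedforward + layer_norms
    # Untied output projection with bias (assumption).
    output_decoder = d_model * vocab_size + vocab_size
    return embedding + n_layers * per_layer + output_decoder


total = estimate_params(vocab_size=2000, d_model=256, d_ff=1024, n_layers=4)
print(f"{total:,}")  # 4,185,040
```

Under these assumptions the baseline has roughly 4.2 million parameters, about three quarters of them in the four transformer layers.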

#### Project Directory: "/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models"

## Meta Config
Meta Config: [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/meta.yaml](meta.yaml)

- [meta.yaml](meta.yaml)

Template Search Paths:
- [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/templates](templates)
- [/home/dinalt/ai_assets/forgather/templates/tiny_experiments](../../../templates/tiny_experiments)
- [/home/dinalt/ai_assets/forgather/templates/modellib](../../../templates/modellib)
- [/home/dinalt/ai_assets/forgather/templates/base](../../../templates/base)

## Available Configurations
- [pre_ln.yaml](templates/experiments/pre_ln.yaml)
- [walsh_pe.yaml](templates/experiments/walsh_pe.yaml)
- [relu-glu.yaml](templates/experiments/relu-glu.yaml)
- [swish.yaml](templates/experiments/swish.yaml)
- [swi-glu.yaml](templates/experiments/swi-glu.yaml)
- [control.yaml](templates/experiments/control.yaml)

Default Configuration: control.yaml

Active Configuration: control.yaml

## Available Templates
- [project.yaml](templates/project.yaml)
- [experiments/pre_ln.yaml](templates/experiments/pre_ln.yaml)
- [experiments/walsh_pe.yaml](templates/experiments/walsh_pe.yaml)
- [experiments/relu-glu.yaml](templates/experiments/relu-glu.yaml)
- [experiments/swish.yaml](templates/experiments/swish.yaml)
- [experiments/swi-glu.yaml](templates/experiments/swi-glu.yaml)
- [experiments/control.yaml](templates/experiments/control.yaml)
- [projects/tiny.yaml](../../../templates/tiny_experiments/projects/tiny.yaml)
- [datasets/tiny/tiny_stories.yaml](../../../templates/tiny_experiments/datasets/tiny/tiny_stories.yaml)
- [datasets/tiny/tiny_stories_abridged.yaml](../../../templates/tiny_experiments/datasets/tiny/tiny_stories_abridged.yaml)
- [models/tiny/tiny_causal.yaml](../../../templates/tiny_experiments/models/tiny/tiny_causal.yaml)
- [models/tiny/tiny_gpt2.yaml](../../../templates/tiny_experiments/models/tiny/tiny_gpt2.yaml)
- [models/tiny/tiny_llama.yaml](../../../templates/tiny_experiments/models/tiny/tiny_llama.yaml)
- [models/tiny/tiny_d128_l2.yaml](../../../templates/tiny_experiments/models/tiny/tiny_d128_l2.yaml)
- [prompts/tiny_stories.yaml](../../../templates/tiny_experiments/prompts/tiny_stories.yaml)
- [tokenizers/tiny_2k.yaml](../../../templates/tiny_experiments/tokenizers/tiny_2k.yaml)
- [tokenizers/tiny_8k.yaml](../../../templates/tiny_experiments/tokenizers/tiny_8k.yaml)
- [model_ctor/args.yaml](../../../templates/modellib/model_ctor/args.yaml)
- [models/dynamic_causal_transformer.yaml](../../../templates/modellib/models/dynamic_causal_transformer.yaml)
- [models/causal_transformer.yaml](../../../templates/modellib/models/causal_transformer.yaml)
- [models/gpt2.yaml](../../../templates/modellib/models/gpt2.yaml)
- [models/llama.yaml](../../../templates/modellib/models/llama.yaml)
- [trainers/accel_trainer.yaml](../../../templates/base/trainers/accel_trainer.yaml)
- [trainers/trainer.yaml](../../../templates/base/trainers/trainer.yaml)
- [trainers/hf_trainer.yaml](../../../templates/base/trainers/hf_trainer.yaml)
- [trainers/base_trainer.yaml](../../../templates/base/trainers/base_trainer.yaml)
- [datasets/abstract/pretokenized_dataset.yaml](../../../templates/base/datasets/abstract/pretokenized_dataset.yaml)
- [datasets/abstract/base_datasets.yaml](../../../templates/base/datasets/abstract/base_datasets.yaml)
- [models/abstract/causal_lm_from_config.yaml](../../../templates/base/models/abstract/causal_lm_from_config.yaml)
- [models/abstract/base_language_model.yaml](../../../templates/base/models/abstract/base_language_model.yaml)
- [models/abstract/custom_causal_lm.yaml](../../../templates/base/models/abstract/custom_causal_lm.yaml)
- [models/abstract/causal_lm_from_pretrained.yaml](../../../templates/base/models/abstract/causal_lm_from_pretrained.yaml)
- [models/abstract/dynamic_causal_lm.yaml](../../../templates/base/models/abstract/dynamic_causal_lm.yaml)
- [models/abstract/load_model.yaml](../../../templates/base/models/abstract/load_model.yaml)
- [callbacks/base_callbacks.yaml](../../../templates/base/callbacks/base_callbacks.yaml)
- [callbacks/loggers.yaml](../../../templates/base/callbacks/loggers.yaml)
- [types/meta_template.yaml](../../../templates/base/types/meta_template.yaml)
- [types/type.yaml](../../../templates/base/types/type.yaml)
- [types/tokenizer/tokenizer.yaml](../../../templates/base/types/tokenizer/tokenizer.yaml)
- [types/tokenizer/bpe/bpe.yaml](../../../templates/base/types/tokenizer/bpe/bpe.yaml)
- [types/model/model_type.yaml](../../../templates/base/types/model/model_type.yaml)
- [types/training_script/training_script.yaml](../../../templates/base/types/training_script/training_script.yaml)
- [types/training_script/causal_lm/causal_lm.yaml](../../../templates/base/types/training_script/causal_lm/causal_lm.yaml)

## Included Templates
- [experiments/control.yaml](templates/experiments/control.yaml)
    - [project.yaml](templates/project.yaml)
        - [projects/tiny.yaml](../../../templates/tiny_experiments/projects/tiny.yaml)
            - [datasets/tiny/tiny_stories_abridged.yaml](../../../templates/tiny_experiments/datasets/tiny/tiny_stories_abridged.yaml)
                - [datasets/tiny/tiny_stories.yaml](../../../templates/tiny_experiments/datasets/tiny/tiny_stories.yaml)
                    - [datasets/abstract/base_datasets.yaml](../../../templates/base/datasets/abstract/base_datasets.yaml)
                        - [inc/formatting.jinja](../../../templates/base/inc/formatting.jinja)
            - [prompts/tiny_stories.yaml](../../../templates/tiny_experiments/prompts/tiny_stories.yaml)
            - [types/training_script/causal_lm/causal_lm.yaml](../../../templates/base/types/training_script/causal_lm/causal_lm.yaml)
                - [trainers/trainer.yaml](../../../templates/base/trainers/trainer.yaml)
                    - [trainers/base_trainer.yaml](../../../templates/base/trainers/base_trainer.yaml)
                - [callbacks/loggers.yaml](../../../templates/base/callbacks/loggers.yaml)
                    - [callbacks/base_callbacks.yaml](../../../templates/base/callbacks/base_callbacks.yaml)
                - [models/abstract/load_model.yaml](../../../templates/base/models/abstract/load_model.yaml)
                    - [models/abstract/causal_lm_from_pretrained.yaml](../../../templates/base/models/abstract/causal_lm_from_pretrained.yaml)
                        - [models/abstract/base_language_model.yaml](../../../templates/base/models/abstract/base_language_model.yaml)
                - [types/training_script/training_script.yaml](../../../templates/base/types/training_script/training_script.yaml)
                    - [types/type.yaml](../../../templates/base/types/type.yaml)
            - [tiny.callbacks](../../../templates/tiny_experiments/projects/tiny.yaml)
            - [tiny.model_config](../../../templates/tiny_experiments/projects/tiny.yaml)
                - [models/tiny/tiny_causal.yaml](../../../templates/tiny_experiments/models/tiny/tiny_causal.yaml)
                    - [tokenizers/tiny_2k.yaml](../../../templates/tiny_experiments/tokenizers/tiny_2k.yaml)
                    - [models/dynamic_causal_transformer.yaml](../../../templates/modellib/models/dynamic_causal_transformer.yaml)
                        - [tokenizers/tiny_8k.yaml](../../../templates/tiny_experiments/tokenizers/tiny_8k.yaml)
                        - [models/abstract/dynamic_causal_lm.yaml](../../../templates/base/models/abstract/dynamic_causal_lm.yaml)
                            - [models/abstract/custom_causal_lm.yaml](../../../templates/base/models/abstract/custom_causal_lm.yaml)
            - [tiny.trainer_config](../../../templates/tiny_experiments/projects/tiny.yaml)
        - [project.model_config](templates/project.yaml)
        - [project.trainer_config](templates/project.yaml)
    - [experiment.model_config](templates/experiments/control.yaml)

### Config Metadata:

```python
{'config_description': 'Tiny Causal; the baseline control',
 'config_name': 'Control',
 'create_new_model': 'True',
 'datasets_dir': '../../../datasets',
 'eval': 'False',
 'logging_dir': './output_models/tiny_causal/runs/control_2024-08-10T19-34-05',
 'model_src_dir': '../../../model_src',
 'models_dir': './output_models',
 'output_dir': './output_models/tiny_causal',
 'project_dir': '.',
 'save_model': 'False',
 'tokenizers_dir': '../../../tokenizers',
 'train': 'True'}

```

## Modules
- [../../../model_src/dynamic_causal_lm.py](../../../model_src/dynamic_causal_lm.py) : DynamicCasualLM
    - [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/../../../model_src/dynamic_causal_lm.py](../../../model_src/dynamic_causal_lm.py) : dynamic_causal_lm
        - [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/../../../model_src/bits/causal_loss.py](../../../model_src/bits/causal_loss.py) : dynamic_causal_lm.causal_loss
        - [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/../../../model_src/bits/causal_multihead_attn.py](../../../model_src/bits/causal_multihead_attn.py) : dynamic_causal_lm.causal_multihead_attn
        - [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/../../../model_src/bits/post_ln_layer.py](../../../model_src/bits/post_ln_layer.py) : dynamic_causal_lm.post_ln_layer
        - [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/../../../model_src/bits/input_encoder.py](../../../model_src/bits/input_encoder.py) : dynamic_causal_lm.input_encoder
        - [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/../../../model_src/bits/init_weights.py](../../../model_src/bits/init_weights.py) : dynamic_causal_lm.init_weights
        - [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/../../../model_src/bits/causal_lm.py](../../../model_src/bits/causal_lm.py) : dynamic_causal_lm.causal_lm
        - [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/../../../model_src/bits/sinusoidal_pe.py](../../../model_src/bits/sinusoidal_pe.py) : dynamic_causal_lm.sinusoidal_pe
        - [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/../../../model_src/bits/layer_stack.py](../../../model_src/bits/layer_stack.py) : dynamic_causal_lm.layer_stack
        - [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/../../../model_src/bits/feedforward_layer.py](../../../model_src/bits/feedforward_layer.py) : dynamic_causal_lm.feedforward_layer
        - [/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models/./output_models/tiny_causal/model_factory.py](output_models/tiny_causal/model_factory.py) : dynamic_causal_lm.model_factory
- [../../../model_src/dynamic_causal_lm.py](../../../model_src/dynamic_causal_lm.py) : DynamicCausalLMConfig
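
The module list includes `sinusoidal_pe.py`, which the control configuration uses as its positional encoder. The project's source is not reproduced here, so the following is only a sketch of the standard sinusoidal formulation in plain Python. The function name `sinusoidal_pe` and the list-of-lists return type are illustrative, and it is an assumption that the project's `SinusoidalPE` follows the same formula.

```python
import math
from typing import List


def sinusoidal_pe(max_sequence_length: int, d_model: int) -> List[List[float]]:
    """Return a [max_sequence_length][d_model] table of sinusoidal position encodings."""
    table = []
    for pos in range(max_sequence_length):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            # Even index i uses sin, the following odd index uses cos of the same angle.
            angle = pos / (10000.0 ** (i / d_model))
            row[i] = math.sin(angle)
            if i + 1 < d_model:
                row[i + 1] = math.cos(angle)
        table.append(row)
    return table


# A short table at the baseline model dimension.
pe = sinusoidal_pe(max_sequence_length=8, d_model=256)
```

Because the table depends only on `d_model` and the sequence length, an encoder of this kind adds no trainable parameters.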

## Preprocessed Config

```yaml
#---------------------------------------
#                 Control                
#---------------------------------------
# 2024-08-10T19:34:05
# Description: Tiny Causal; the baseline control
# Project Dir: .
# Current Working Dir: "/home/dinalt/ai_assets/forgather/examples/trainers/dynamic_models"
# Forgather Config Dir: "/home/dinalt/.config/forgather"
# Model: tiny_causal
# Hostname: hal9000
# Versions:
#     python: 3.10.13
#     torch: 2.3.1
#     transformers: 4.41.2
#     accelerate: 0.31.0

############# Config Vars ##############

# ns.forgather_dir: "../../.."
# ns.models_dir: "./output_models"
# ns.project_model_src_dir: "./model_src"
# ns.tokenizers_dir: "../../../tokenizers"
# ns.datasets_dir: "../../../datasets"
# ns.model_src_dir: "../../../model_src"
# ns.output_dir: "./output_models/tiny_causal"
# ns.logging_dir: "./output_models/tiny_causal/runs/control_2024-08-10T19-34-05"
# ns.create_new_model: True
# ns.save_model: False
# ns.train: True
# ns.eval: False
# ns.trust_remote_code: True

####### Distributed Environment ########

.define: &distributed_env !singleton:forgather.ml.distributed:DistributedEnvironment@distributed_env

############# Dependencies #############

# The model will be given the following prompts for text-gen at regular intervals.
.define: &testprompts !list:@testprompts
    # Test prompts from "https://arxiv.org/abs/2305.07759"
    - "Alice was so tired when she got back home so she went"
    - "Jack and Lily liked to watch the moon at night. They noticed that the moon changed its shape every night. Sometimes the moon was big and round, and sometimes it was"
    - "Jack and Lily saw a rainbow after a rainy day.They were amazed by the colors. Jack said, \"Look, Lily. A rainbow has"
    - "Jack wanted to read a book, so he went to"
    - "\"Can cows fly?\" Alice asked her mother."
    - "\"What do birds like to eat?\" Tom asked his mother."
    - "\"What language do they speak in France?\" Tom asked his mother."
    - "If I throw a ball up in the air, eventually it will"
    - "It was winter and cold outside so his mother told him, \"You should"
    - "Lily likes cats and dogs. She asked her mom for a dog and her mom said no, so instead she asked"
    - "Jack told Mary, \"If you give me your banana, I'll give you my apple.\" Mary gave Jack her Banana, so"
    - "On weekends Jack went to visit his grandmother whereas on weekdays he would go to school. Last weekend, when Jack was on his way to"
    - "Lily and Ben were having an argument. Ben said that cake is much better than ice cream and Lily said that"
    - "Lily and Ben are having an argument. They are trying to decide between the park and the swimming pool. Ben says, \"I want to go to the park\". Lily says"
    - "Jack's mother was not home, and his father was at home. When Jack came home, he said hello to"
    - "Lily doesn't like swimming. When her father wants to take her to the swimming pool, she says"
    - "Both Ben and Lily wanted cake. Father said that there was only one piece of cake left. They"
    - "Ben went to visit Lily in her house, but she was not at home. Ben knocked on the door,"

# Conservative text-generation parameters.
.define: &generation_config !dict:@generation_config
    identity: generation_config
    do_sample: True
    top_k: 20
    top_p: 0.9
    temperature: 0.7
    repetition_penalty: 1.15

################ Model #################

# https://huggingface.co/docs/transformers/en/model_doc/auto
.define: &model_constructor_args {}

# Name: Tiny Causal
# Description: A scaled-down version of the base Causal Transformer
# model_def.cls = "DynamicCasualLM"
# model_def.cfg_cls = "DynamicCausalLMConfig"
# model_def.config_path = "../../../model_src/dynamic_causal_lm.py"
# model_def.model_path = "../../../model_src/dynamic_causal_lm.py"

# **Tokenizer**

# Load custom tokenizer from sub-project definition
.define: &tokenizer !singleton:forgather.ml.construct:load_from_config@tokenizer
    project_dir: "../../../examples/tokenizers/tiny_stories_bpe"
    config_template: "2k.yaml"

# **Model Config**

.define: &model_submodule_searchpath
    - "../../../model_src/bits"
    - "./output_models/tiny_causal"

.define: &loss_fn_factory !factory:.causal_loss:CausalLoss@loss_fn_factory []

.define: &layer_norm_factory !factory:torch.nn:LayerNorm@layer_norm_factory
    normalized_shape: !var "hidden_size"

.define: &feedforward_factory !factory:.feedforward_layer:FeedforwardLayer@feedforward_factory
    d_model: !var "hidden_size"
    d_feedforward: !var "dim_feedforward"
    dropout: !var "activation_dropout"

.define: &attention_factory !factory:.causal_multihead_attn:CausalMultiheadAttn@attention_factory
    d_model: !var "hidden_size"
    num_heads: !var "num_attention_heads"
    dropout: !var "attention_dropout"

.define: &layer_factory !lambda:.post_ln_layer:PostLNLayer@layer_factory
    feedforward: *feedforward_factory
    attention: *attention_factory
    norm1: *layer_norm_factory
    norm2: *layer_norm_factory
    dropout: !var "layer_dropout"
    residual_dropout: !var "residual_dropout"

.define: &layer_stack_factory !factory:.layer_stack:LayerStack@layer_stack_factory
    layer_factory: *layer_factory
    num_hidden_layers: !var "num_hidden_layers"

.define: &output_decoder_factory !factory:torch.nn:Linear@output_decoder_factory
    - !var "hidden_size"
    - !var "vocab_size"

.define: &positional_encoder_factory !factory:.sinusoidal_pe:SinusoidalPE@positional_encoder_factory
    d_model: !var "hidden_size"
    max_sequence_length: !var "max_sequence_length"

.define: &input_encoder_factory !factory:.input_encoder:InputEncoder@input_encoder_factory
    d_model: !var "hidden_size"
    vocab_size: !var "vocab_size"
    dropout: !var "embedding_dropout"
    positional_encoder: *positional_encoder_factory

.define: &init_weights_factory !factory:.init_weights:InitWeights@init_weights_factory
    std: !var "initializer_range"

.define: &model_factory !singleton:.causal_lm:CasualLM@model_factory
    loss_fn: *loss_fn_factory
    input_encoder: *input_encoder_factory
    output_decoder: *output_decoder_factory
    layer_stack: *layer_stack_factory
    init_weights: *init_weights_factory

.define: &model_code_writer !singleton:forgather.ml.construct:write_file@model_code_writer
    data: &model_code_generator !meta:forgather.codegen:generate_code@model_code_generator
        obj: *model_factory
        factory_name: "construct_model"
        relaxed_kwargs: True
        name_policy: "named"
    output_file: "./output_models/tiny_causal/model_factory.py"
    return_value: "Model constructor generated by Forgather 1.0"

.define: &model_config !singleton:../../../model_src/dynamic_causal_lm.py:DynamicCausalLMConfig@model_config
    submodule_searchpath: *model_submodule_searchpath
    # Set auto-map for custom model; this ensures that the source code stays with the model.
    auto_map:
        AutoConfig: "dynamic_causal_lm.DynamicCausalLMConfig"
        AutoModel: "dynamic_causal_lm.DynamicCasualLM"
    # Get the vocab-size from the tokenizer definition.
    vocab_size: !singleton:len [ *tokenizer ]
    pad_token_id: !singleton:getattr [ *tokenizer, 'pad_token_id' ]
    bos_token_id: !singleton:getattr [ *tokenizer, 'bos_token_id' ]
    eos_token_id: !singleton:getattr [ *tokenizer, 'eos_token_id' ]
    # Add dependency on code generator
    code_generator: *model_code_writer
    hidden_size: 512
    num_attention_heads: 8
    num_hidden_layers: 6
    max_sequence_length: !singleton:getattr
        - *tokenizer
        - "model_max_length"
    dim_feedforward: 2048
    initializer_range: 0.02
    embedding_dropout: 0.10
    layer_dropout: 0.10
    residual_dropout: 0.0
    attention_dropout: 0.0
    activation_dropout: 0.0
    
    # Tiny Causal overrides
    hidden_size: 256
    dim_feedforward: 1024
    num_attention_heads: 2
    num_hidden_layers: 4
    embedding_dropout: 0.0
    layer_dropout: 0.0

# **Model Constructor**

.define: &pretrained_model !singleton:../../../model_src/dynamic_causal_lm.py:DynamicCasualLM@pretrained_model
    args:
        - *model_config
    kwargs:
        submodule_searchpath: *model_submodule_searchpath
        <<: *model_constructor_args

.define: &model !singleton:forgather.ml.construct:dependency_list@model
    - *pretrained_model
    - !singleton:forgather.ml.construct:copy_package_files
        - "./output_models/tiny_causal"
        - *model_config
    - !singleton:forgather.ml.construct:copy_package_files
        - "./output_models/tiny_causal"
        - *pretrained_model

############### Datasets ###############

# Name: TinyStories Abridged
# Description: Abridged to 10% of original size; Dataset containing synthetically generated (by GPT-3.5 and GPT-4) short stories that only use a small vocabulary.
# Source: https://arxiv.org/abs/2305.07759
# Train Dataset: "roneneldan/TinyStories" : "train"
# Eval Dataset: "roneneldan/TinyStories" : "validation"

# **Source Datasets**

.define: &train_source_dataset !singleton:datasets:load_dataset@train_source_dataset
    - "roneneldan/TinyStories"

.define: &eval_source_dataset !singleton:datasets:load_dataset@eval_source_dataset
    - "roneneldan/TinyStories"

# **Dataset Splits**

.define: &train_dataset_split !singleton:operator:getitem
    - *train_source_dataset
    - "train"

.define: &eval_dataset_split !singleton:operator:getitem
    - *train_source_dataset
    - "validation"

# **Tokenize Args**

.define: &tokenize_args
    truncation: True

# **Tokenized Datasets**

.define: &train_dataset !singleton:forgather.ml.datasets:tokenize_dataset@train_dataset
    dataset: *train_dataset_split
    tokenizer: *tokenizer
    select_range: 0.1
    desc: "Tokenizing train"
    fn_kwargs:
        <<: *tokenize_args

.define: &eval_dataset !singleton:forgather.ml.datasets:tokenize_dataset@eval_dataset
    dataset: *eval_dataset_split
    tokenizer: *tokenizer
    select_range: 500
    desc: "Tokenizing validation split"
    fn_kwargs:
        <<: *tokenize_args

############ Data Collator #############

# Data collator for causal model
# Batches are dynamically padded to longest sequence
# labels are set to input_ids, with pad tokens set to -100
# https://huggingface.co/docs/transformers/en/main_classes/data_collator#transformers.DataCollatorForLanguageModeling
.define: &data_collator !singleton:transformers:DataCollatorForLanguageModeling@data_collator
    args:
        - *tokenizer
    kwargs:
        mlm: False
        return_tensors: pt

########## Trainer Callbacks ###########

# **Dependencies**

# Experiment tracking: Tensorboard SummaryWriter
.define: &summary_writer !singleton:torch.utils.tensorboard:SummaryWriter
    - "./output_models/tiny_causal/runs/control_2024-08-10T19-34-05"

# Additional data to record to experiment loggers
.define: &experiment_info !dict:@experiment_info
    date: "2024-08-10T19:34:05"
    name: "Control"
    description: "Tiny Causal; the baseline control"
    config: !var "pp_config"
    versions: {'python': '3.10.13', 'torch': '2.3.1', 'transformers': '4.41.2', 'accelerate': '0.31.0'}

.define: &text_gen_callback_args
    summary_writer: *summary_writer
    prompts: *testprompts
    generation_config: *generation_config
    max_new_tokens: 40
    generation_steps: 2000

# **Callback List**

.define: &trainer_callbacks !list:@trainer_callbacks
    # Log all training output to JSON
    - !singleton:forgather.ml.json_logger:JsonLogger
        <<: *experiment_info
    # Log configuration and metrics to Tensorboard file
    - !singleton:forgather.ml.tb_logger:TBLogger
        args: [ *summary_writer ]
        kwargs:
            <<: *experiment_info
    - !singleton:forgather.ml.textgen_callback:TextgenCallback
        <<: *text_gen_callback_args

############### Trainer ################

# Name: Custom forgather.ml.trainer.Trainer
# Description: A lightweight, extensible trainer; does not support multiple GPUs

# **Trainer Args**

.define: &trainer_args
    # Base Trainer Defaults
    # https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments
    output_dir: "./output_models/tiny_causal"
    logging_dir: "./output_models/tiny_causal/runs/control_2024-08-10T19-34-05"
    overwrite_output_dir: True
    per_device_train_batch_size: 16
    per_device_eval_batch_size: 32
    learning_rate: 1.0e-4
    num_train_epochs: 1
    eval_steps: 100
    logging_steps: 500
    eval_strategy: "steps"
    save_strategy: "no"
    logging_strategy: "steps"
    lr_scheduler_type: "cosine"

    # Tiny Project Overrides
    per_device_train_batch_size: 32
    per_device_eval_batch_size: 64
    logging_steps: 100
    eval_steps: 500
    learning_rate: 1.0e-3
    num_train_epochs: 1
    lr_scheduler_type: "cosine"

    # max_steps: 500

# **Trainer Constructor**

.define: &trainer !singleton:forgather.ml.trainer:Trainer@trainer
    model: *model
    args: !singleton:forgather.ml.trainer_types:TrainingArguments@trainer_args
        <<: *trainer_args
    data_collator: *data_collator
    train_dataset: *train_dataset
    eval_dataset: *eval_dataset
    tokenizer: *tokenizer
    callbacks: *trainer_callbacks

#---------------------------------------
#          Configuration Output          
#---------------------------------------
meta: &meta_output !dict:@meta
    config_name: "Control"
    config_description: "Tiny Causal; the baseline control"
    project_dir: "."
    models_dir: "./output_models"
    tokenizers_dir: "../../../tokenizers"
    datasets_dir: "../../../datasets"
    output_dir: "./output_models/tiny_causal"
    model_src_dir: "../../../model_src"
    logging_dir: "./output_models/tiny_causal/runs/control_2024-08-10T19-34-05"
    create_new_model: "True"
    save_model: "False"
    train: "True"
    eval: "False"

main: !singleton:forgather.ml.training_script:TrainingScript@training_script
    meta: *meta_output
    do_save: False
    do_train: True
    do_eval: False
    # Init distributed environment before initializing anything which depends on it.
    distributed_env: *distributed_env
    trainer: *trainer

```
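
The `model_config` and `trainer_args` blocks above list base defaults first and then repeat some keys under an overrides comment. Only the later value for each key appears once the configuration is loaded. A minimal sketch of that last-value-wins behaviour with plain dictionaries, using values copied from the config (the variable names are illustrative, and this does not show how Forgather itself resolves the keys):

```python
# Base defaults, as listed first in the preprocessed config.
base_model_defaults = {
    "hidden_size": 512,
    "num_attention_heads": 8,
    "num_hidden_layers": 6,
    "dim_feedforward": 2048,
    "embedding_dropout": 0.10,
    "layer_dropout": 0.10,
}

# "Tiny Causal overrides", listed later under the same keys.
tiny_causal_overrides = {
    "hidden_size": 256,
    "dim_feedforward": 1024,
    "num_attention_heads": 2,
    "num_hidden_layers": 4,
    "embedding_dropout": 0.0,
    "layer_dropout": 0.0,
}

# Later entries replace earlier ones when the mappings are merged.
model_config = {**base_model_defaults, **tiny_causal_overrides}

base_trainer_defaults = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "learning_rate": 1.0e-4,
    "eval_steps": 100,
    "logging_steps": 500,
}

tiny_project_overrides = {
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 64,
    "logging_steps": 100,
    "eval_steps": 500,
    "learning_rate": 1.0e-3,
}

trainer_args = {**base_trainer_defaults, **tiny_project_overrides}
```

Keys that are not overridden, such as `initializer_range` in the model config, simply keep their base values.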

## Loaded Configuration to YAML

```yaml
.define: &meta !singleton:named_dict@meta
    config_name: 'Control'
    config_description: 'Tiny Causal; the baseline control'
    project_dir: '.'
    models_dir: './output_models'
    tokenizers_dir: '../../../tokenizers'
    datasets_dir: '../../../datasets'
    output_dir: './output_models/tiny_causal'
    model_src_dir: '../../../model_src'
    logging_dir: './output_models/tiny_causal/runs/control_2024-08-10T19-34-05'
    create_new_model: 'True'
    save_model: 'False'
    train: 'True'
    eval: 'False'

.define: &distributed_env !singleton:forgather.ml.distributed:DistributedEnvironment@distributed_env []

.define: &tokenizer !singleton:forgather.ml.construct:load_from_config@tokenizer
    project_dir: '../../../examples/tokenizers/tiny_stories_bpe'
    config_template: '2k.yaml'

.define: &loss_fn_factory !factory:.causal_loss:CausalLoss@loss_fn_factory []

.define: &positional_encoder_factory !factory:.sinusoidal_pe:SinusoidalPE@positional_encoder_factory
    d_model: !var hidden_size
    max_sequence_length: !var max_sequence_length

.define: &input_encoder_factory !factory:.input_encoder:InputEncoder@input_encoder_factory
    d_model: !var hidden_size
    vocab_size: !var vocab_size
    dropout: !var embedding_dropout
    positional_encoder: *positional_encoder_factory

.define: &output_decoder_factory !factory:torch.nn:Linear@output_decoder_factory
    - !var hidden_size
    - !var vocab_size

.define: &feedforward_factory !factory:.feedforward_layer:FeedforwardLayer@feedforward_factory
    d_model: !var hidden_size
    d_feedforward: !var dim_feedforward
    dropout: !var activation_dropout

.define: &attention_factory !factory:.causal_multihead_attn:CausalMultiheadAttn@attention_factory
    d_model: !var hidden_size
    num_heads: !var num_attention_heads
    dropout: !var attention_dropout

.define: &layer_norm_factory !factory:torch.nn:LayerNorm@layer_norm_factory
    normalized_shape: !var hidden_size

.define: &layer_factory !lambda:.post_ln_layer:PostLNLayer@layer_factory
    feedforward: *feedforward_factory
    attention: *attention_factory
    norm1: *layer_norm_factory
    norm2: *layer_norm_factory
    dropout: !var layer_dropout
    residual_dropout: !var residual_dropout

.define: &layer_stack_factory !factory:.layer_stack:LayerStack@layer_stack_factory
    layer_factory: *layer_factory
    num_hidden_layers: !var num_hidden_layers

.define: &init_weights_factory !factory:.init_weights:InitWeights@init_weights_factory
    std: !var initializer_range

.define: &model_factory !singleton:.causal_lm:CasualLM@model_factory
    loss_fn: *loss_fn_factory
    input_encoder: *input_encoder_factory
    output_decoder: *output_decoder_factory
    layer_stack: *layer_stack_factory
    init_weights: *init_weights_factory

.define: &model_code_generator !meta:forgather.codegen:generate_code@model_code_generator
    obj: *model_factory
    factory_name: 'construct_model'
    relaxed_kwargs: True
    name_policy: 'named'

.define: &model_code_writer !singleton:forgather.ml.construct:write_file@model_code_writer
    data: *model_code_generator
    output_file: './output_models/tiny_causal/model_factory.py'
    return_value: 'Model constructor generated by Forgather 1.0'

.define: &model_config !singleton:../../../model_src/dynamic_causal_lm.py:DynamicCausalLMConfig@model_config
    auto_map: 
        AutoConfig: 'dynamic_causal_lm.DynamicCausalLMConfig'
        AutoModel: 'dynamic_causal_lm.DynamicCasualLM'
    vocab_size: !singleton:len
        - *tokenizer
    pad_token_id: !singleton:getattr
        - *tokenizer
        - 'pad_token_id'
    bos_token_id: !singleton:getattr
        - *tokenizer
        - 'bos_token_id'
    eos_token_id: !singleton:getattr
        - *tokenizer
        - 'eos_token_id'
    code_generator: *model_code_writer
    hidden_size: 256
    num_attention_heads: 2
    num_hidden_layers: 4
    max_sequence_length: !singleton:getattr
        - *tokenizer
        - 'model_max_length'
    dim_feedforward: 1024
    initializer_range: 0.02
    embedding_dropout: 0.0
    layer_dropout: 0.0
    residual_dropout: 0.0
    attention_dropout: 0.0
    activation_dropout: 0.0

.define: &pretrained_model !singleton:../../../model_src/dynamic_causal_lm.py:DynamicCasualLM@pretrained_model
    - *model_config

.define: &model !singleton:forgather.ml.construct:dependency_list@model
    - *pretrained_model
    - !singleton:forgather.ml.construct:copy_package_files
        - './output_models/tiny_causal'
        - *model_config
    - !singleton:forgather.ml.construct:copy_package_files
        - './output_models/tiny_causal'
        - *pretrained_model

.define: &trainer_args !singleton:forgather.ml.trainer_types:TrainingArguments@trainer_args
    output_dir: './output_models/tiny_causal'
    logging_dir: './output_models/tiny_causal/runs/control_2024-08-10T19-34-05'
    overwrite_output_dir: True
    per_device_train_batch_size: 32
    per_device_eval_batch_size: 64
    learning_rate: 0.001
    num_train_epochs: 1
    eval_steps: 500
    logging_steps: 100
    eval_strategy: 'steps'
    save_strategy: 'no'
    logging_strategy: 'steps'
    lr_scheduler_type: 'cosine'

.define: &data_collator !singleton:transformers:DataCollatorForLanguageModeling@data_collator
    args:
        - *tokenizer
    kwargs:
        mlm: False
        return_tensors: 'pt'

.define: &train_source_dataset !singleton:datasets:load_dataset@train_source_dataset
    - 'roneneldan/TinyStories'

.define: &train_dataset !singleton:forgather.ml.datasets:tokenize_dataset@train_dataset
    dataset: !singleton:operator:getitem
        - *train_source_dataset
        - 'train'
    tokenizer: *tokenizer
    select_range: 0.1
    desc: 'Tokenizing train'
    fn_kwargs: 
        truncation: True

.define: &eval_dataset !singleton:forgather.ml.datasets:tokenize_dataset@eval_dataset
    dataset: !singleton:operator:getitem
        - *train_source_dataset
        - 'validation'
    tokenizer: *tokenizer
    select_range: 500
    desc: 'Tokenizing validation split'
    fn_kwargs: 
        truncation: True

.define: &alpha_ !singleton:torch.utils.tensorboard:SummaryWriter
    - './output_models/tiny_causal/runs/control_2024-08-10T19-34-05'

.define: &testprompts !singleton:named_list@testprompts
    - 'Alice was so tired when she got back home so she went'
    - 'Jack and Lily liked to watch the moon at night. They noticed that the moon changed its shape every night. Sometimes the moon was big and round, and sometimes it was'
    - 'Jack and Lily saw a rainbow after a rainy day. They were amazed by the colors. Jack said, "Look, Lily. A rainbow has'
    - 'Jack wanted to read a book, so he went to'
    - '"Can cows fly?" Alice asked her mother.'
    - '"What do birds like to eat?" Tom asked his mother.'
    - '"What language do they speak in France?" Tom asked his mother.'
    - 'If I throw a ball up in the air, eventually it will'
    - 'It was winter and cold outside so his mother told him, "You should'
    - 'Lily likes cats and dogs. She asked her mom for a dog and her mom said no, so instead she asked'
    - 'Jack told Mary, "If you give me your banana, I\'ll give you my apple." Mary gave Jack her Banana, so'
    - 'On weekends Jack went to visit his grandmother whereas on weekdays he would go to school. Last weekend, when Jack was on his way to'
    - 'Lily and Ben were having an argument. Ben said that cake is much better than ice cream and Lily said that'
    - 'Lily and Ben are having an argument. They are trying to decide between the park and the swimming pool. Ben says, "I want to go to the park". Lily says'
    - "Jack's mother was not home, and his father was at home. When Jack came home, he said hello to"
    - "Lily doesn't like swimming. When her father wants to take her to the swimming pool, she says"
    - 'Both Ben and Lily wanted cake. Father said that there was only one piece of cake left. They'
    - 'Ben went to visit Lily in her house, but she was not at home. Ben knocked on the door,'

.define: &generation_config !singleton:named_dict@generation_config
    identity: 'generation_config'
    do_sample: True
    top_k: 20
    top_p: 0.9
    temperature: 0.7
    repetition_penalty: 1.15

.define: &trainer_callbacks !singleton:named_list@trainer_callbacks
    - !singleton:forgather.ml.json_logger:JsonLogger
        date: '2024-08-10T19:34:05'
        name: 'Control'
        description: 'Tiny Causal; the baseline control'
        config: !var pp_config
        versions: 
            python: '3.10.13'
            torch: '2.3.1'
            transformers: '4.41.2'
            accelerate: '0.31.0'
    - !singleton:forgather.ml.tb_logger:TBLogger
        args:
            - *alpha_
        kwargs:
            date: '2024-08-10T19:34:05'
            name: 'Control'
            description: 'Tiny Causal; the baseline control'
            config: !var pp_config
            versions: 
                python: '3.10.13'
                torch: '2.3.1'
                transformers: '4.41.2'
                accelerate: '0.31.0'
    - !singleton:forgather.ml.textgen_callback:TextgenCallback
        summary_writer: *alpha_
        prompts: *testprompts
        generation_config: *generation_config
        max_new_tokens: 40
        generation_steps: 2000

.define: &trainer !singleton:forgather.ml.trainer:Trainer@trainer
    model: *model
    args: *trainer_args
    data_collator: *data_collator
    train_dataset: *train_dataset
    eval_dataset: *eval_dataset
    tokenizer: *tokenizer
    callbacks: *trainer_callbacks

.define: &training_script !singleton:forgather.ml.training_script:TrainingScript@training_script
    meta: *meta
    do_save: False
    do_train: True
    do_eval: False
    distributed_env: *distributed_env
    trainer: *trainer


meta: *meta
main: *training_script

```
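
The trainer arguments above pair an initial learning rate of 1.0e-3 with a cosine scheduler over a single epoch. As a rough sketch of what that implies, the snippet below computes the number of optimizer steps from the 211,971 training rows reported in the constructed project and evaluates a standard half-cosine decay. The decay formula is an assumption (the usual cosine-to-zero schedule with no warmup); the Forgather trainer's own scheduler may differ in detail, and `cosine_lr` is a hypothetical helper, not part of the library.

```python
import math

# Values taken from the configuration and the constructed-project dump.
train_rows = 211_971   # num_rows of the tokenized train split
batch_size = 32        # per_device_train_batch_size (single device)
base_lr = 1.0e-3       # learning_rate

# dataloader_drop_last is False, so the final partial batch still counts.
total_steps = math.ceil(train_rows / batch_size)


def cosine_lr(step, total=total_steps, lr0=base_lr):
    """Assumed half-cosine decay from lr0 to 0 with no warmup."""
    progress = min(step, total) / total
    return 0.5 * lr0 * (1.0 + math.cos(math.pi * progress))


# Number of steps that are multiples of eval_steps / generation_steps.
eval_points = total_steps // 500
textgen_points = total_steps // 2000

for step in (0, total_steps // 2, total_steps):
    print(f"step {step:5d}: lr = {cosine_lr(step):.6f}")
```

Under these assumptions the run has 6625 steps, 13 of which are multiples of `eval_steps` and 3 of which are multiples of `generation_steps`.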

### Generated Source Code

```python
from forgather.ml.construct import load_from_config
from transformers import DataCollatorForLanguageModeling
from forgather.ml.json_logger import JsonLogger
from datasets import load_dataset
from forgather.ml.trainer_types import TrainingArguments
from forgather.ml.datasets import tokenize_dataset
from forgather.ml.textgen_callback import TextgenCallback
from forgather.ml.distributed import DistributedEnvironment
from torch.utils.tensorboard import SummaryWriter
from forgather.ml.training_script import TrainingScript
from forgather.ml.construct import dependency_list
from forgather.ml.construct import copy_package_files
from forgather.ml.construct import write_file
from forgather.ml.trainer import Trainer
from forgather.ml.tb_logger import TBLogger
from importlib.util import spec_from_file_location, module_from_spec
import os
import sys

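# Import a symbol from a Python source file. The file is loaded as a package whose
# search path is `searchpath`, so its relative imports resolve against those directories.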
def dynimport(module, name, searchpath):
    module_path = module
    module_name = os.path.basename(module).split(".")[0]
    module_spec = spec_from_file_location(
        module_name,
        module_path,
        submodule_search_locations=searchpath,
    )
    mod = module_from_spec(module_spec)
    sys.modules[module_name] = mod
    module_spec.loader.exec_module(mod)
    for symbol in name.split("."):
        mod = getattr(mod, symbol)
    return mod

DynamicCausalLMConfig = lambda: dynimport("../../../model_src/dynamic_causal_lm.py", "DynamicCausalLMConfig", ['../../../model_src/bits', './output_models/tiny_causal'])
DynamicCasualLM = lambda: dynimport("../../../model_src/dynamic_causal_lm.py", "DynamicCasualLM", ['../../../model_src/bits', './output_models/tiny_causal'])

def construct(
    pp_config,
):
    meta = {
        'config_name': 'Control',
        'config_description': 'Tiny Causal; the baseline control',
        'project_dir': '.',
        'models_dir': './output_models',
        'tokenizers_dir': '../../../tokenizers',
        'datasets_dir': '../../../datasets',
        'output_dir': './output_models/tiny_causal',
        'model_src_dir': '../../../model_src',
        'logging_dir': './output_models/tiny_causal/runs/control_2024-08-10T19-34-05',
        'create_new_model': 'True',
        'save_model': 'False',
        'train': 'True',
        'eval': 'False',
    }

    distributed_env = DistributedEnvironment()

    tokenizer = load_from_config(
        project_dir='../../../examples/tokenizers/tiny_stories_bpe',
        config_template='2k.yaml',
    )

    model_code_writer = write_file(
        data=(
            'from .causal_loss import CausalLoss\n'
            'from .causal_multihead_attn import CausalMultiheadAttn\n'
            'from .post_ln_layer import PostLNLayer\n'
            'from .input_encoder import InputEncoder\n'
            'from .init_weights import InitWeights\n'
            'from .causal_lm import CasualLM\n'
            'from torch.nn import LayerNorm\n'
            'from .sinusoidal_pe import SinusoidalPE\n'
            'from .layer_stack import LayerStack\n'
            'from torch.nn import Linear\n'
            'from .feedforward_layer import FeedforwardLayer\n'
            '\n'
            'def construct_model(\n'
            '    vocab_size,\n'
            '    dim_feedforward,\n'
            '    residual_dropout,\n'
            '    num_hidden_layers,\n'
            '    initializer_range,\n'
            '    activation_dropout,\n'
            '    layer_dropout,\n'
            '    embedding_dropout,\n'
            '    attention_dropout,\n'
            '    max_sequence_length,\n'
            '    hidden_size,\n'
            '    num_attention_heads,\n'
            '    **kwargs\n'
            '):\n'
            '    loss_fn_factory = lambda: CausalLoss()\n'
            '\n'
            '    positional_encoder_factory = lambda: SinusoidalPE(\n'
            '        d_model=hidden_size,\n'
            '        max_sequence_length=max_sequence_length,\n'
            '    )\n'
            '\n'
            '    input_encoder_factory = lambda: InputEncoder(\n'
            '        d_model=hidden_size,\n'
            '        vocab_size=vocab_size,\n'
            '        dropout=embedding_dropout,\n'
            '        positional_encoder=positional_encoder_factory(),\n'
            '    )\n'
            '\n'
            '    output_decoder_factory = lambda: Linear(\n'
            '        hidden_size,\n'
            '        vocab_size,\n'
            '    )\n'
            '\n'
            '    feedforward_factory = lambda: FeedforwardLayer(\n'
            '        d_model=hidden_size,\n'
            '        d_feedforward=dim_feedforward,\n'
            '        dropout=activation_dropout,\n'
            '    )\n'
            '\n'
            '    attention_factory = lambda: CausalMultiheadAttn(\n'
            '        d_model=hidden_size,\n'
            '        num_heads=num_attention_heads,\n'
            '        dropout=attention_dropout,\n'
            '    )\n'
            '\n'
            '    layer_norm_factory = lambda: LayerNorm(\n'
            '        normalized_shape=hidden_size,\n'
            '    )\n'
            '\n'
            '    layer_factory = lambda: PostLNLayer(\n'
            '        feedforward=feedforward_factory(),\n'
            '        attention=attention_factory(),\n'
            '        norm1=layer_norm_factory(),\n'
            '        norm2=layer_norm_factory(),\n'
            '        dropout=layer_dropout,\n'
            '        residual_dropout=residual_dropout,\n'
            '    )\n'
            '\n'
            '    layer_stack_factory = lambda: LayerStack(\n'
            '        layer_factory=layer_factory,\n'
            '        num_hidden_layers=num_hidden_layers,\n'
            '    )\n'
            '\n'
            '    init_weights_factory = lambda: InitWeights(\n'
            '        std=initializer_range,\n'
            '    )\n'
            '\n'
            '    model_factory = CasualLM(\n'
            '        loss_fn=loss_fn_factory(),\n'
            '        input_encoder=input_encoder_factory(),\n'
            '        output_decoder=output_decoder_factory(),\n'
            '        layer_stack=layer_stack_factory(),\n'
            '        init_weights=init_weights_factory(),\n'
            '    )\n'
            '    \n'
            '    return model_factory'
        ),
        output_file='./output_models/tiny_causal/model_factory.py',
        return_value='Model constructor generated by Forgather 1.0',
    )

    model_config = DynamicCausalLMConfig()(
        auto_map={
            'AutoConfig': 'dynamic_causal_lm.DynamicCausalLMConfig',
            'AutoModel': 'dynamic_causal_lm.DynamicCasualLM',
        },
        vocab_size=len(
            tokenizer,
        ),
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        code_generator=model_code_writer,
        hidden_size=256,
        num_attention_heads=2,
        num_hidden_layers=4,
        max_sequence_length=tokenizer.model_max_length,
        dim_feedforward=1024,
        initializer_range=0.02,
        embedding_dropout=0.0,
        layer_dropout=0.0,
        residual_dropout=0.0,
        attention_dropout=0.0,
        activation_dropout=0.0,
    )

    pretrained_model = DynamicCasualLM()(
        model_config,
    )

    model = dependency_list(
        pretrained_model,
        copy_package_files(
            './output_models/tiny_causal',
            model_config,
        ),
        copy_package_files(
            './output_models/tiny_causal',
            pretrained_model,
        ),
    )

    trainer_args = TrainingArguments(
        output_dir='./output_models/tiny_causal',
        logging_dir='./output_models/tiny_causal/runs/control_2024-08-10T19-34-05',
        overwrite_output_dir=True,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=64,
        learning_rate=0.001,
        num_train_epochs=1,
        eval_steps=500,
        logging_steps=100,
        eval_strategy='steps',
        save_strategy='no',
        logging_strategy='steps',
        lr_scheduler_type='cosine',
    )

    data_collator = DataCollatorForLanguageModeling(
        tokenizer,
        mlm=False,
        return_tensors='pt',
    )

    train_source_dataset = load_dataset(
        'roneneldan/TinyStories',
    )

    train_dataset = tokenize_dataset(
        dataset=train_source_dataset['train'],
        tokenizer=tokenizer,
        select_range=0.1,
        desc='Tokenizing train',
        fn_kwargs={
            'truncation': True,
        },
    )

    eval_dataset = tokenize_dataset(
        dataset=train_source_dataset['validation'],
        tokenizer=tokenizer,
        select_range=500,
        desc='Tokenizing validation split',
        fn_kwargs={
            'truncation': True,
        },
    )

    alpha_ = SummaryWriter(
        './output_models/tiny_causal/runs/control_2024-08-10T19-34-05',
    )

    testprompts = [
        'Alice was so tired when she got back home so she went',
        'Jack and Lily liked to watch the moon at night. They noticed that the moon changed its shape every night. Sometimes the moon was big and round, and sometimes it was',
        'Jack and Lily saw a rainbow after a rainy day. They were amazed by the colors. Jack said, "Look, Lily. A rainbow has',
        'Jack wanted to read a book, so he went to',
        '"Can cows fly?" Alice asked her mother.',
        '"What do birds like to eat?" Tom asked his mother.',
        '"What language do they speak in France?" Tom asked his mother.',
        'If I throw a ball up in the air, eventually it will',
        'It was winter and cold outside so his mother told him, "You should',
        'Lily likes cats and dogs. She asked her mom for a dog and her mom said no, so instead she asked',
        'Jack told Mary, "If you give me your banana, I\'ll give you my apple." Mary gave Jack her Banana, so',
        'On weekends Jack went to visit his grandmother whereas on weekdays he would go to school. Last weekend, when Jack was on his way to',
        'Lily and Ben were having an argument. Ben said that cake is much better than ice cream and Lily said that',
        'Lily and Ben are having an argument. They are trying to decide between the park and the swimming pool. Ben says, "I want to go to the park". Lily says',
        "Jack's mother was not home, and his father was at home. When Jack came home, he said hello to",
        "Lily doesn't like swimming. When her father wants to take her to the swimming pool, she says",
        'Both Ben and Lily wanted cake. Father said that there was only one piece of cake left. They',
        'Ben went to visit Lily in her house, but she was not at home. Ben knocked on the door,',
    ]

    generation_config = {
        'identity': 'generation_config',
        'do_sample': True,
        'top_k': 20,
        'top_p': 0.9,
        'temperature': 0.7,
        'repetition_penalty': 1.15,
    }

    trainer_callbacks = [
        JsonLogger(
            date='2024-08-10T19:34:05',
            name='Control',
            description='Tiny Causal; the baseline control',
            config=pp_config,
            versions={
                'python': '3.10.13',
                'torch': '2.3.1',
                'transformers': '4.41.2',
                'accelerate': '0.31.0',
            },
        ),
        TBLogger(
            alpha_,
            date='2024-08-10T19:34:05',
            name='Control',
            description='Tiny Causal; the baseline control',
            config=pp_config,
            versions={
                'python': '3.10.13',
                'torch': '2.3.1',
                'transformers': '4.41.2',
                'accelerate': '0.31.0',
            },
        ),
        TextgenCallback(
            summary_writer=alpha_,
            prompts=testprompts,
            generation_config=generation_config,
            max_new_tokens=40,
            generation_steps=2000,
        ),
    ]

    trainer = Trainer(
        model=model,
        args=trainer_args,
        data_collator=data_collator,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,
        callbacks=trainer_callbacks,
    )

    training_script = TrainingScript(
        meta=meta,
        do_save=False,
        do_train=True,
        do_eval=False,
        distributed_env=distributed_env,
        trainer=trainer,
    )
    
    return {
        'meta': meta,
        'main': training_script,
    }

```
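
The `dynimport` helper in the generated code passes `submodule_search_locations` to `spec_from_file_location`, which makes the loaded file behave like a package. That is what lets a dynamically loaded module use relative imports such as `from .causal_loss import CausalLoss` against the listed directories. The following sketch reproduces the helper and exercises it on two throwaway files; the file names `demo_dynamic_module.py` and `demo_bits_helper.py` and the class `DemoModel` are hypothetical stand-ins, not files from the project.

```python
import os
import sys
import tempfile
from importlib.util import spec_from_file_location, module_from_spec


def dynimport(module, name, searchpath):
    """Same logic as the generated helper: load a file as a package-like module."""
    module_name = os.path.basename(module).split(".")[0]
    spec = spec_from_file_location(
        module_name, module, submodule_search_locations=searchpath
    )
    mod = module_from_spec(spec)
    # Register before executing so relative imports can find the parent package.
    sys.modules[module_name] = mod
    spec.loader.exec_module(mod)
    for symbol in name.split("."):
        mod = getattr(mod, symbol)
    return mod


with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as bits_dir:
    # Hypothetical stand-in for a file living in one of the search-path directories.
    with open(os.path.join(bits_dir, "demo_bits_helper.py"), "w") as f:
        f.write("SCALE = 16.0\n")

    # Hypothetical stand-in for the dynamically imported model source file.
    main_path = os.path.join(src_dir, "demo_dynamic_module.py")
    with open(main_path, "w") as f:
        f.write(
            "from .demo_bits_helper import SCALE\n"
            "class DemoModel:\n"
            "    scale = SCALE\n"
        )

    DemoModel = dynimport(main_path, "DemoModel", [bits_dir])
    demo_scale = DemoModel.scale

print(demo_scale)
```

The relative import succeeds even though the two files live in different directories, because the search path, not the file's own location, defines where submodules are found.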

## Constructed Project

```python
{'main': TrainingScript(meta={'config_description': 'Tiny Causal; the baseline '
                                                    'control',
                              'config_name': 'Control',
                              'create_new_model': 'True',
                              'datasets_dir': '../../../datasets',
                              'eval': 'False',
                              'logging_dir': './output_models/tiny_causal/runs/control_2024-08-10T19-34-05',
                              'model_src_dir': '../../../model_src',
                              'models_dir': './output_models',
                              'output_dir': './output_models/tiny_causal',
                              'project_dir': '.',
                              'save_model': 'False',
                              'tokenizers_dir': '../../../tokenizers',
                              'train': 'True'},
                        do_save=False,
                        do_train=True,
                        do_eval=False,
                        distributed_env=DistributedEnvironment(rank=0, local_rank=0, world_size=1, local_world_size=1, master_addr=localhost, master_port=29501, backend=None),
                        trainer=Trainer(model=DynamicCasualLM(
  (causal_lm): CasualLM(
    loss_fn=CausalLoss(), init_weights=InitWeights(std=0.02)
    (input_encoder): InputEncoder(
      d_model=256, vocab_size=2000, embedding_scale=16.0
      (dropout): Identity()
      (embedding): Embedding(2000, 256)
      (positional_encoder): SinusoidalPE(d_model=256, max_sequence_length=2048)
    )
    (output_decoder): Linear(in_features=256, out_features=2000, bias=True)
    (layer_stack): LayerStack(
      (layers): ModuleList(
        (0-3): 4 x PostLNLayer(
          (feedforward): FeedforwardLayer(
            d_model=256, d_feedforward=1024
            (linear1): Linear(in_features=256, out_features=1024, bias=True)
            (dropout): Identity()
            (activation): ReLU()
            (linear2): Linear(in_features=1024, out_features=256, bias=True)
          )
          (attention): CausalMultiheadAttn(
            d_model=256, num_heads=2
            (query_linear): Linear(in_features=256, out_features=256, bias=True)
            (key_linear): Linear(in_features=256, out_features=256, bias=True)
            (value_linear): Linear(in_features=256, out_features=256, bias=True)
            (output_linear): Linear(in_features=256, out_features=256, bias=True)
            (dropout): Identity()
          )
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout): Identity()
          (residual_dropout): Identity()
        )
      )
    )
  )
),args=TrainingArguments(per_device_train_batch_size=32, output_dir='./output_models/tiny_causal', overwrite_output_dir=True, per_device_eval_batch_size=64, max_steps=-1, logging_steps=100, eval_steps=500, save_steps=500, learning_rate=0.001, num_train_epochs=1, seed=-1, lr_scheduler_type='cosine', warmup_steps=0, device=0, logging_dir='./output_models/tiny_causal/runs/control_2024-08-10T19-34-05', dataloader_num_workers=0, dataloader_pin_memory=True, dataloader_persistent_workers=False, dataloader_prefetch_factor=None, dataloader_drop_last=False, eval_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_strategy=<IntervalStrategy.STEPS: 'steps'>, save_strategy=<IntervalStrategy.NO: 'no'>, logging_first_step=False, eval_delay=0, save_total_limit=2, use_cpu=False, torch_compile=False, torch_compile_backend='inductor', torch_compile_mode=None),data_collator=DataCollatorForLanguageModeling(tokenizer=PreTrainedTokenizerFast(name_or_path='../../../tokenizers/tiny_stories_2k', vocab_size=2000, model_max_length=2048, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|BOS|>', 'eos_token': '<|EOS|>', 'unk_token': '<|UNK|>', 'pad_token': '<|PAD|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	0: AddedToken("<|BOS|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<|PAD|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("<|EOS|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	3: AddedToken("<|UNK|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}, mlm=False, mlm_probability=0.15, pad_to_multiple_of=None, tf_experimental_compile=False, return_tensors='pt'),train_dataset=Dataset({
    features: ['input_ids'],
    num_rows: 211971
}),eval_dataset=Dataset({
    features: ['input_ids'],
    num_rows: 500
}),tokenizer=PreTrainedTokenizerFast(name_or_path='../../../tokenizers/tiny_stories_2k', vocab_size=2000, model_max_length=2048, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|BOS|>', 'eos_token': '<|EOS|>', 'unk_token': '<|UNK|>', 'pad_token': '<|PAD|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	0: AddedToken("<|BOS|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<|PAD|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("<|EOS|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	3: AddedToken("<|UNK|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
},model_init=None,callbacks=[<forgather.ml.default_callbacks.ProgressCallback object at 0x7f0f26f6caf0>, <forgather.ml.default_callbacks.InfoCallback object at 0x7f0e5a93ddb0>, <forgather.ml.json_logger.JsonLogger object at 0x7f0e5a96c3d0>, <forgather.ml.tb_logger.TBLogger object at 0x7f0e5a96dba0>, <forgather.ml.textgen_callback.TextgenCallback object at 0x7f0e583d3940>],)),
 'meta': {'config_description': 'Tiny Causal; the baseline control',
          'config_name': 'Control',
          'create_new_model': 'True',
          'datasets_dir': '../../../datasets',
          'eval': 'False',
          'logging_dir': './output_models/tiny_causal/runs/control_2024-08-10T19-34-05',
          'model_src_dir': '../../../model_src',
          'models_dir': './output_models',
          'output_dir': './output_models/tiny_causal',
          'project_dir': '.',
          'save_model': 'False',
          'tokenizers_dir': '../../../tokenizers',
          'train': 'True'}}

```
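
The module tree printed above is enough to estimate the model's size by hand. The sketch below tallies trainable parameters from the printed shapes. It assumes that `SinusoidalPE` and `CausalLoss` hold no trainable parameters and that the embedding and output decoder are not weight-tied (they are printed as separate modules); neither assumption is confirmed by the dump itself.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of a dense layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


# Shapes read from the printed module tree.
d_model, d_ff, vocab, n_layers = 256, 1024, 2000, 4

embedding = vocab * d_model                      # Embedding(2000, 256)
output_decoder = linear_params(d_model, vocab)   # Linear(256 -> 2000)

feedforward = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)
attention = 4 * linear_params(d_model, d_model)  # query, key, value, output
layer_norms = 2 * (2 * d_model)                  # norm1 + norm2, weight and bias each

per_layer = feedforward + attention + layer_norms
total = embedding + output_decoder + n_layers * per_layer

print(f"per layer: {per_layer:,}")
print(f"total:     {total:,}")
```

Under these assumptions each `PostLNLayer` contributes 789,760 parameters and the whole model about 4.19 million, of which roughly a quarter sits in the embedding and output projection.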

