# Project Index
Use this notebook to debug configurations and inspect a project's configuration details.

Debug configuration here: [Configuration Notebook](project_config.ipynb)

In [1]:
# Set defaults
#default_projects_directory = '/home/dinalt/ai_assets/projects/experiments'
default_projects_directory = '../examples/trainers'
default_project = "tiny_models"
config_template = ""

from ipyfilechooser import FileChooser
import os
fc = FileChooser(
    os.path.join(default_projects_directory, default_project), show_only_dirs=True,
    title="Select a Project Directory", select_default=True)
display(fc)

FileChooser(path='/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models', filename='', title='Select …

## Project Info

Note: This cell fully constructs the configuration, which is required to resolve dynamic imports.
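
As a rough illustration of why construction is needed, the sketch below shows one way a `path/to/file.py:Name` reference (the form used by the `!callable:` tags further down) could be resolved with the standard library. The `load_symbol` helper and the `toy_model.py` file are hypothetical and are not part of Forgather; the actual resolution logic may differ.

```python
import importlib.util
import os
import tempfile


def load_symbol(reference: str):
    """Resolve a hypothetical '<file path>:<name>' reference to a Python object."""
    file_path, _, symbol_name = reference.rpartition(":")
    module_name = os.path.splitext(os.path.basename(file_path))[0]
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    # The module must be executed before its attributes can be looked up.
    spec.loader.exec_module(module)
    return getattr(module, symbol_name)


# Demonstration with a throwaway module standing in for real model source.
with tempfile.TemporaryDirectory() as tmp_dir:
    source_path = os.path.join(tmp_dir, "toy_model.py")
    with open(source_path, "w") as f:
        f.write("class ToyModel:\n    hidden_size = 256\n")
    ToyModel = load_symbol(f"{source_path}:ToyModel")
    print(ToyModel.__name__, ToyModel.hidden_size)
```

Because the class only exists once its source file has been executed, a tool that wants to list such sub-modules has to go through the import step first.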

In [5]:
import sys, os
modules_path = os.path.join('..', 'src')
if modules_path not in sys.path: sys.path.insert(0, modules_path)
from pprint import pformat, pp
from IPython import display as ds
from forgather.config import (
    ConfigEnvironment,
    pconfig
)
from aiws.config import preprocessor_globals, MetaConfig
import aiws.notebooks as nb

# Fail early if the selected project directory is missing.
assert os.path.exists(fc.selected_path), "Project directory does not exist."

meta = MetaConfig(fc.selected_path)
default_config = meta.default_config()
# Assumption: an empty config_template means "use the project's default configuration".
if not config_template: config_template = default_config
config_template_path = meta.config_path(config_template)

# Create the configuration environment; it must exist before any template is loaded.
environment = ConfigEnvironment(
    searchpath=meta.searchpath,
    globals = preprocessor_globals(fc.selected_path),
)

# Load the template, then fully construct the configuration.
config, pp_config = environment.load(config_template_path).get()
main_output = config.main(pp_config=pp_config)

nb.show_project_readme(fc.selected_path)
nb.display_meta(meta, "### Meta Config\n")
nb.list_templates(meta.find_templates(meta.config_prefix), "### Available Configurations\n")

print('-' * 60)
print(f"Default Configuration: {default_config}")
def list_templates(prefix):
    nb.list_templates(meta.find_templates(prefix), "### Templates\n")
list_templates('')

nb.display_referenced_templates_tree(environment, config_template_path, "### Included Templates\n")
nb.display_referenced_source_list(config, title="### Sub-Modules\n", deep=True)
nb.display_preprocessed_template(environment, config_template_path, title="### Preprocessed Configuration\n")

Repo card metadata block was not found. Setting CardData to empty.


## Tiny Models

A collection of tiny models to train on the Tiny Stories dataset with the tiny_stories_2k tokenizer.

This allows for direct comparison of model architectures.

### Featuring
- Tiny Causal Transformer -- an example custom transformer model
- Tiny Llama
- Tiny GPT2

### Common Configuration
- Tokenizer: tokenizers/tiny_2k_bpe.yaml
    - Vocabulary Size: 2000
    - Maximum Model Sequence: 2048
- Dataset: datasets/tiny/tiny_stories_abridged.yaml
    - Dataset ID: roneneldan/TinyStories
    - Reference: https://arxiv.org/abs/2305.07759
    - Train Select Range: 10% 
- Model:
    - Model Dimension: 256
    - MLP Dimension: 1024
    - Layers: 4
    - Heads: 2
    - All Dropout Probabilities: 0.0
- Trainer:
    - Class: aiws.trainer.Trainer
    - Epochs: 1
    - Learning Rate: 1.0e-3
    - Batch Size: 16
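
To give a feel for how small these models are, here is a back-of-the-envelope parameter estimate from the dimensions listed above. It is only a sketch under stated assumptions: untied input and output embeddings, bias terms on every linear layer, two layer norms per layer, and no learned positional parameters. The function `estimate_parameters` is hypothetical, and the real models may count differently.

```python
def estimate_parameters(vocab_size, d_model, d_feedforward, num_layers):
    """Estimate the parameter count of a plain transformer decoder (assumptions above)."""
    # Token embedding table.
    embedding = vocab_size * d_model
    # Query, key, value and output projections, each with a bias vector.
    attention = 4 * (d_model * d_model + d_model)
    # Two linear layers: d_model -> d_feedforward -> d_model, each with bias.
    feedforward = (d_model * d_feedforward + d_feedforward) + (d_feedforward * d_model + d_model)
    # Two layer norms per layer, each with a weight and a bias vector.
    layer_norms = 2 * (2 * d_model)
    per_layer = attention + feedforward + layer_norms
    # Output projection back to the vocabulary, with bias.
    output_decoder = d_model * vocab_size + vocab_size
    return embedding + num_layers * per_layer + output_decoder


# Values from the common configuration: vocabulary 2000, dimension 256, MLP 1024, 4 layers.
total = estimate_parameters(vocab_size=2000, d_model=256, d_feedforward=1024, num_layers=4)
print(f"{total:,}")  # prints 4,185,040
```

The number of attention heads does not appear because splitting the model dimension across heads leaves the projection sizes unchanged.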

### Meta Config
Project Directory: /home/dinalt/ai_assets/forgather/examples/trainers/tiny_models

Meta Config: [/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/meta.yaml](../examples/trainers/tiny_models/meta.yaml)

Template Search Paths:
- [/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/templates](../examples/trainers/tiny_models/templates)
- [/home/dinalt/ai_assets/forgather/templates](../templates)
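
The order of the search paths matters if lookups stop at the first match, as they do in Jinja2's `FileSystemLoader`. The sketch below assumes that first-match behaviour; `resolve_template` is a hypothetical helper written for illustration, not the project's actual loader.

```python
import os
import tempfile


def resolve_template(name, searchpath):
    """Return the first existing file called `name` under the ordered search paths."""
    for directory in searchpath:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    project_templates = os.path.join(root, "project", "templates")
    shared_templates = os.path.join(root, "templates")
    for directory in (project_templates, shared_templates):
        os.makedirs(os.path.join(directory, "trainers"))

    # 'project.yaml' exists in both places; 'trainers/trainer.yaml' only in the shared library.
    open(os.path.join(project_templates, "project.yaml"), "w").close()
    open(os.path.join(shared_templates, "project.yaml"), "w").close()
    open(os.path.join(shared_templates, "trainers", "trainer.yaml"), "w").close()

    searchpath = [project_templates, shared_templates]
    found_project = resolve_template("project.yaml", searchpath)
    found_trainer = resolve_template(os.path.join("trainers", "trainer.yaml"), searchpath)
    missing = resolve_template("missing.yaml", searchpath)

    print(os.path.relpath(found_project, root))
    print(os.path.relpath(found_trainer, root))
    print(missing)
```

Under this assumption a project can shadow a shared template simply by placing a file with the same relative name in its own templates directory.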


### Available Configurations
- [tiny_causal.yaml](../examples/trainers/tiny_models/templates/configs/tiny_causal.yaml)
- [tiny_gpt2.yaml](../examples/trainers/tiny_models/templates/configs/tiny_gpt2.yaml)
- [tiny_llama.yaml](../examples/trainers/tiny_models/templates/configs/tiny_llama.yaml)


------------------------------------------------------------
Default Configuration: tiny_causal.yaml


### Templates
- [project.yaml](../examples/trainers/tiny_models/templates/project.yaml)
- [configs/tiny_causal.yaml](../examples/trainers/tiny_models/templates/configs/tiny_causal.yaml)
- [configs/tiny_gpt2.yaml](../examples/trainers/tiny_models/templates/configs/tiny_gpt2.yaml)
- [configs/tiny_llama.yaml](../examples/trainers/tiny_models/templates/configs/tiny_llama.yaml)
- [trainers/accel_trainer.yaml](../templates/trainers/accel_trainer.yaml)
- [trainers/trainer.yaml](../templates/trainers/trainer.yaml)
- [trainers/hf_trainer.yaml](../templates/trainers/hf_trainer.yaml)
- [trainers/base_trainer.yaml](../templates/trainers/base_trainer.yaml)
- [model_ctor/args.yaml](../templates/model_ctor/args.yaml)
- [projects/tiny.yaml](../templates/projects/tiny.yaml)
- [datasets/abstract/pretokenized_dataset.yaml](../templates/datasets/abstract/pretokenized_dataset.yaml)
- [datasets/abstract/base_datasets.yaml](../templates/datasets/abstract/base_datasets.yaml)
- [datasets/tiny/tiny_stories.yaml](../templates/datasets/tiny/tiny_stories.yaml)
- [datasets/tiny/tiny_stories_abridged.yaml](../templates/datasets/tiny/tiny_stories_abridged.yaml)
- [models/dynamic_lm.yaml](../templates/models/dynamic_lm.yaml)
- [models/causal_transformer.yaml](../templates/models/causal_transformer.yaml)
- [models/gpt2.yaml](../templates/models/gpt2.yaml)
- [models/llama.yaml](../templates/models/llama.yaml)
- [models/abstract/causal_lm_from_config.yaml](../templates/models/abstract/causal_lm_from_config.yaml)
- [models/abstract/base_language_model.yaml](../templates/models/abstract/base_language_model.yaml)
- [models/abstract/custom_causal_lm.yaml](../templates/models/abstract/custom_causal_lm.yaml)
- [models/abstract/causal_lm_from_pretrained.yaml](../templates/models/abstract/causal_lm_from_pretrained.yaml)
- [models/abstract/load_model.yaml](../templates/models/abstract/load_model.yaml)
- [models/tiny/tiny_causal.yaml](../templates/models/tiny/tiny_causal.yaml)
- [models/tiny/tiny_gpt2.yaml](../templates/models/tiny/tiny_gpt2.yaml)
- [models/tiny/tiny_llama.yaml](../templates/models/tiny/tiny_llama.yaml)
- [models/tiny/tiny_d128_l2.yaml](../templates/models/tiny/tiny_d128_l2.yaml)
- [prompts/tiny_stories.yaml](../templates/prompts/tiny_stories.yaml)
- [callbacks/base_callbacks.yaml](../templates/callbacks/base_callbacks.yaml)
- [callbacks/loggers.yaml](../templates/callbacks/loggers.yaml)
- [types/meta_template.yaml](../templates/types/meta_template.yaml)
- [types/type.yaml](../templates/types/type.yaml)
- [types/tokenizer/tokenizer.yaml](../templates/types/tokenizer/tokenizer.yaml)
- [types/tokenizer/bpe/bpe.yaml](../templates/types/tokenizer/bpe/bpe.yaml)
- [types/model/model_type.yaml](../templates/types/model/model_type.yaml)
- [types/training_script/training_script.yaml](../templates/types/training_script/training_script.yaml)
- [types/training_script/causal_lm/causal_lm.yaml](../templates/types/training_script/causal_lm/causal_lm.yaml)
- [paths/example_paths.yaml](../templates/paths/example_paths.yaml)
- [tokenizers/tiny_2k.yaml](../templates/tokenizers/tiny_2k.yaml)


### Included Templates
- [configs/tiny_causal.yaml](../examples/trainers/tiny_models/templates/configs/tiny_causal.yaml)
    - [project.yaml](../examples/trainers/tiny_models/templates/project.yaml)
        - [projects/tiny.yaml](../templates/projects/tiny.yaml)
            - [types/training_script/causal_lm/causal_lm.yaml](../templates/types/training_script/causal_lm/causal_lm.yaml)
                - [types/training_script/training_script.yaml](../templates/types/training_script/training_script.yaml)
                    - [types/type.yaml](../templates/types/type.yaml)
                        - [inc/formatting.jinja](../templates/inc/formatting.jinja)
                    - [inc/formatting.jinja](../templates/inc/formatting.jinja)
                - [inc/formatting.jinja](../templates/inc/formatting.jinja)
                - [models/abstract/load_model.yaml](../templates/models/abstract/load_model.yaml)
                    - [models/abstract/causal_lm_from_pretrained.yaml](../templates/models/abstract/causal_lm_from_pretrained.yaml)
                        - [models/abstract/base_language_model.yaml](../templates/models/abstract/base_language_model.yaml)
                            - [inc/formatting.jinja](../templates/inc/formatting.jinja)
                - [callbacks/loggers.yaml](../templates/callbacks/loggers.yaml)
                    - [callbacks/base_callbacks.yaml](../templates/callbacks/base_callbacks.yaml)
                        - [inc/formatting.jinja](../templates/inc/formatting.jinja)
                - [trainers/trainer.yaml](../templates/trainers/trainer.yaml)
                    - [trainers/base_trainer.yaml](../templates/trainers/base_trainer.yaml)
                        - [inc/formatting.jinja](../templates/inc/formatting.jinja)
            - [paths/example_paths.yaml](../templates/paths/example_paths.yaml)
            - [prompts/tiny_stories.yaml](../templates/prompts/tiny_stories.yaml)
            - [datasets/tiny/tiny_stories_abridged.yaml](../templates/datasets/tiny/tiny_stories_abridged.yaml)
                - [datasets/tiny/tiny_stories.yaml](../templates/datasets/tiny/tiny_stories.yaml)
                    - [datasets/abstract/base_datasets.yaml](../templates/datasets/abstract/base_datasets.yaml)
                        - [inc/formatting.jinja](../templates/inc/formatting.jinja)
            - [tiny.trainer_config](../templates/projects/tiny.yaml)
            - [tiny.model_config](../templates/projects/tiny.yaml)
                - [models/tiny/tiny_causal.yaml](../templates/models/tiny/tiny_causal.yaml)
                    - [models/dynamic_lm.yaml](../templates/models/dynamic_lm.yaml)
                        - [models/abstract/custom_causal_lm.yaml](../templates/models/abstract/custom_causal_lm.yaml)
                            - [models/abstract/base_language_model.yaml](../templates/models/abstract/base_language_model.yaml)
                                - [inc/formatting.jinja](../templates/inc/formatting.jinja)
                        - [tokenizers/tiny_2k.yaml](../templates/tokenizers/tiny_2k.yaml)
            - [tiny.callbacks](../templates/projects/tiny.yaml)
                - [callbacks/loggers.yaml](../templates/callbacks/loggers.yaml)
                    - [callbacks/base_callbacks.yaml](../templates/callbacks/base_callbacks.yaml)
                        - [inc/formatting.jinja](../templates/inc/formatting.jinja)
        - [project.trainer_config](../examples/trainers/tiny_models/templates/project.yaml)
            - [tiny.trainer_config](../templates/projects/tiny.yaml)
    - [experiment.model_config](../examples/trainers/tiny_models/templates/configs/tiny_causal.yaml)
        - [models/tiny/tiny_causal.yaml](../templates/models/tiny/tiny_causal.yaml)
            - [models/dynamic_lm.yaml](../templates/models/dynamic_lm.yaml)
                - [models/abstract/custom_causal_lm.yaml](../templates/models/abstract/custom_causal_lm.yaml)
                    - [models/abstract/base_language_model.yaml](../templates/models/abstract/base_language_model.yaml)
                        - [inc/formatting.jinja](../templates/inc/formatting.jinja)
                - [tokenizers/tiny_2k.yaml](../templates/tokenizers/tiny_2k.yaml)


### Sub-Modules
- [/home/dinalt/ai_assets/forgather/model_src/dynamic_causal_lm.py](../model_src/dynamic_causal_lm.py) : DynamicCasualLM
    - [/home/dinalt/ai_assets/forgather/model_src/dynamic_causal_lm.py](../model_src/dynamic_causal_lm.py) : dynamic_causal_lm
        - [/home/dinalt/ai_assets/forgather/model_src/bits/materialize.py](../model_src/bits/materialize.py) : dynamic_causal_lm.materialize
        - [/home/dinalt/ai_assets/forgather/model_src/bits/causal_loss.py](../model_src/bits/causal_loss.py) : dynamic_causal_lm.causal_loss
        - [/home/dinalt/ai_assets/forgather/model_src/bits/sinusoidal_pe.py](../model_src/bits/sinusoidal_pe.py) : dynamic_causal_lm.sinusoidal_pe
        - [/home/dinalt/ai_assets/forgather/model_src/bits/input_encoder.py](../model_src/bits/input_encoder.py) : dynamic_causal_lm.input_encoder
        - [/home/dinalt/ai_assets/forgather/model_src/bits/causal_layer_stack.py](../model_src/bits/causal_layer_stack.py) : dynamic_causal_lm.causal_layer_stack
        - [/home/dinalt/ai_assets/forgather/model_src/bits/feedforward_layer.py](../model_src/bits/feedforward_layer.py) : dynamic_causal_lm.feedforward_layer
        - [/home/dinalt/ai_assets/forgather/model_src/bits/causal_multihead_attn.py](../model_src/bits/causal_multihead_attn.py) : dynamic_causal_lm.causal_multihead_attn
        - [/home/dinalt/ai_assets/forgather/model_src/bits/post_ln_layer.py](../model_src/bits/post_ln_layer.py) : dynamic_causal_lm.post_ln_layer
        - [/home/dinalt/ai_assets/forgather/model_src/bits/init_weights.py](../model_src/bits/init_weights.py) : dynamic_causal_lm.init_weights
        - [/home/dinalt/ai_assets/forgather/model_src/bits/causal_lm.py](../model_src/bits/causal_lm.py) : dynamic_causal_lm.causal_lm
- [/home/dinalt/ai_assets/forgather/model_src/dynamic_causal_lm.py](../model_src/dynamic_causal_lm.py) : DynamicCausalLMConfig


### Preprocessed Configuration
```yaml
#---------------------------------------
#               Tiny Causal              
#---------------------------------------
# 2024-07-29T02:53:24
# Description: A tiny causal transformer.
# Project Dir: /home/dinalt/ai_assets/forgather/examples/trainers/tiny_models
# Current Working Dir: "/home/dinalt/ai_assets/forgather/notebooks"
# Forgather Config Dir: "/home/dinalt/.config/forgather"
# Model: tiny_causal
# Hostname: hal9000
# Versions:
#     python: 3.10.13
#     torch: 2.3.1
#     transformers: 4.41.2
#     accelerate: 0.31.0

############# Config Vars ##############

# ns.models_dir: "/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/output_models"
# ns.tokenizers_dir: "/home/dinalt/ai_assets/forgather/tokenizers"
# ns.datasets_dir: "/home/dinalt/ai_assets/forgather/datasets"
# ns.model_src_dir: "/home/dinalt/ai_assets/forgather/model_src"
# ns.output_dir: "/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/output_models/tiny_causal"
# ns.logging_dir: "/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/output_models/tiny_causal/runs/log_2024-07-29T02-53-24"
# ns.create_new_model: True
# ns.save_model: True
# ns.train: True
# ns.eval: False
# ns.trust_remote_code: True

####### Distributed Environment ########

.define: &distributed_env !callable:aiws.distributed:DistributedEnvironment

############# Dependencies #############

# The model will be given the following prompts for text-gen at regular intervals.
.define: &testprompts
    # Test prompts from "https://arxiv.org/abs/2305.07759"
    - "Alice was so tired when she got back home so she went"
    - "Jack and Lily liked to watch the moon at night. They noticed that the moon changed its shape every night. Sometimes the moon was big and round, and sometimes it was"
    - "Jack and Lily saw a rainbow after a rainy day.They were amazed by the colors. Jack said, \"Look, Lily. A rainbow has"
    - "Jack wanted to read a book, so he went to"
    - "\"Can cows fly?\" Alice asked her mother."
    - "\"What do birds like to eat?\" Tom asked his mother."
    - "\"What language do they speak in France?\" Tom asked his mother."
    - "If I throw a ball up in the air, eventually it will"
    - "It was winter and cold outside so his mother told him, \"You should"
    - "Lily likes cats and dogs. She asked her mom for a dog and her mom said no, so instead she asked"
    - "Jack told Mary, \"If you give me your banana, I'll give you my apple.\" Mary gave Jack her Banana, so"
    - "On weekends Jack went to visit his grandmother whereas on weekdays he would go to school. Last weekend, when Jack was on his way to"
    - "Lily and Ben were having an argument. Ben said that cake is much better than ice cream and Lily said that"
    - "Lily and Ben are having an argument. They are trying to decide between the park and the swimming pool. Ben says, \"I want to go to the park\". Lily says"
    - "Jack's mother was not home, and his father was at home. When Jack came home, he said hello to"
    - "Lily doesn't like swimming. When her father wants to take her to the swimming pool, she says"
    - "Both Ben and Lily wanted cake. Father said that there was only one piece of cake left. They"
    - "Ben went to visit Lily in her house, but she was not at home. Ben knocked on the door,"

# Conservative text-generation parameters.
.define: &generation_config
    do_sample: True
    top_k: 20
    top_p: 0.9
    temperature: 0.7
    repetition_penalty: 1.15

################ Model #################

# https://huggingface.co/docs/transformers/en/model_doc/auto
.define: &model_constructor_args {}

# Name: Tiny Causal
# Description: A scaled-down version of the base Causal Transformer
# model_def.cls = "DynamicCasualLM"
# model_def.cfg_cls = "DynamicCausalLMConfig"
# model_def.config_path = "/home/dinalt/ai_assets/forgather/model_src/dynamic_causal_lm.py"
# model_def.model_path = "/home/dinalt/ai_assets/forgather/model_src/dynamic_causal_lm.py"

# **Tokenizer**

# Load custom tokenizer from sub-project definition
.define: &tokenizer !callable:aiws.construct:load_from_config
    project_dir: "/home/dinalt/ai_assets/forgather/examples/tokenizers/tiny_stories_bpe"
    config_template: "2k.yaml"

# **Model Config**

.define: &loss_fn_factory !callable:.causal_loss:CausalLoss []

.define: &layer_norm_factory !callable:torch.nn:LayerNorm
    - !key "hidden_size"

.define: &feedforward_factory !callable:.feedforward_layer:FeedforwardLayer
    d_model: !key "hidden_size"
    d_feedforward: !key "dim_feedforward"
    dropout: !key "activation_dropout"

.define: &attention_factory !callable:.causal_multihead_attn:CausalMultiheadAttn
    d_model: !key "hidden_size"
    num_heads: !key "num_attention_heads"
    dropout: !key "attention_dropout"

.define: &layer_factory !callable:.post_ln_layer:PostLNLayer
    as_lambda: True
    feedforward: *feedforward_factory
    attention: *attention_factory
    norm1: *layer_norm_factory
    norm2: *layer_norm_factory
    dropout: !key "layer_dropout"
    residual_dropout: !key "residual_dropout"

.define: &layer_stack_factory !callable:.causal_layer_stack:CausalLayerStack
    layer_factory: *layer_factory
    num_hidden_layers: !key "num_hidden_layers"

.define: &output_decoder_factory !callable:torch.nn:Linear
    - !key "hidden_size"
    - !key "vocab_size"

.define: &positional_encoder_factory !callable:.sinusoidal_pe:SinusoidalPE
    d_model: !key "hidden_size"
    max_sequence_length: !key "max_sequence_length"

.define: &input_encoder_factory !callable:.input_encoder:InputEncoder
    d_model: !key "hidden_size"
    vocab_size: !key "vocab_size"
    dropout: !key "embedding_dropout"
    positional_encoder: *positional_encoder_factory

.define: &init_weights_factory !callable:.init_weights:InitWeights
    std: !key "initializer_range"

.define: &model_factory !callable:.causal_lm:CasualLM
    as_lambda: True
    loss_fn: *loss_fn_factory
    input_encoder: *input_encoder_factory
    output_decoder: *output_decoder_factory
    layer_stack: *layer_stack_factory
    init_weights: *init_weights_factory

.define: &model_config
    # Set auto-map for custom model; this ensures that the source code stays with the model.
    auto_map:
        AutoConfig: "dynamic_causal_lm.DynamicCausalLMConfig"
        AutoModel: "dynamic_causal_lm.DynamicCasualLM"
    # Get the vocab-size from the tokenizer definition.
    vocab_size: !callable:forgather.construct:length [ *tokenizer ]
    pad_token_id: !callable:forgather.construct:get_attr [ *tokenizer, 'pad_token_id' ]
    bos_token_id: !callable:forgather.construct:get_attr [ *tokenizer, 'bos_token_id' ]
    eos_token_id: !callable:forgather.construct:get_attr [ *tokenizer, 'eos_token_id' ]
    # Convert model definition to a JSON compatible encoding for the configuration to store.
    model_definition: !callable:forgather.latent:Latent.to_serailizable
        - *model_factory
    hidden_size: 512
    num_attention_heads: 8
    num_hidden_layers: 6
    max_sequence_length: !callable:forgather.construct:get_attr
        - *tokenizer
        - "model_max_length"
    dim_feedforward: 2048
    initializer_range: 0.02
    embedding_dropout: 0.10
    layer_dropout: 0.10
    residual_dropout: 0.0
    attention_dropout: 0.0
    activation_dropout: 0.0
    
    # Tiny Causal overrides
    hidden_size: 256
    dim_feedforward: 1024
    num_attention_heads: 2
    num_hidden_layers: 4
    embedding_dropout: 0.0
    layer_dropout: 0.0
# Add 'bits' to the model's submodule search path.
.define: &model_submodule_searchpath
    - "/home/dinalt/ai_assets/forgather/model_src/bits"

# **Model Constructor**

# Custom transformer model; registers for AutoClass and will save code with weights.
.define: &model !callable:aiws.construct:copy_package_files
    # Source files will be copied to model directory.
    - "/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/output_models/tiny_causal"
    # Construct model from configuration.
    - !callable:/home/dinalt/ai_assets/forgather/model_src/dynamic_causal_lm.py:DynamicCasualLM
        args:
            - !callable:aiws.construct:copy_package_files
                # Source files will be copied to model directory.
                - "/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/output_models/tiny_causal"
                # Construct configuration from config-args.
                - !callable:/home/dinalt/ai_assets/forgather/model_src/dynamic_causal_lm.py:DynamicCausalLMConfig
                    submodule_searchpath: *model_submodule_searchpath
                    <<: *model_config
        kwargs:
            submodule_searchpath: *model_submodule_searchpath
            # Constructor args specify things like flash-attention.
            <<: *model_constructor_args

############### Datasets ###############

# Name: TinyStories Abridged
# Description: Abridged to 10% of original size; Dataset containing synthetically generated (by GPT-3.5 and GPT-4) short stories that only use a small vocabulary.
# Source: https://arxiv.org/abs/2305.07759
# Train Dataset: "roneneldan/TinyStories" : "train"
# Eval Dataset: "roneneldan/TinyStories" : "validation"

# **Source Datasets**

.define: &train_source_dataset !callable:datasets:load_dataset
    args:
        - "roneneldan/TinyStories"

.define: &eval_source_dataset !callable:datasets:load_dataset
    args:
        - "roneneldan/TinyStories"

# **Dataset Splits**

.define: &train_dataset_split !callable:forgather.construct:get_item
    - *train_source_dataset
    - "train"

.define: &eval_dataset_split !callable:forgather.construct:get_item
    - *eval_source_dataset
    - "validation"

# **Tokenize Args**

.define: &tokenize_args
    truncation: True

# **Tokenized Datasets**

.define: &train_dataset !callable:aiws.datasets:tokenize_dataset
    dataset: *train_dataset_split
    tokenizer: *tokenizer
    select_range: 0.1
    desc: "Tokenizing train"
    fn_kwargs:
        <<: *tokenize_args

.define: &eval_dataset !callable:aiws.datasets:tokenize_dataset
    dataset: *eval_dataset_split
    tokenizer: *tokenizer
    select_range: 500
    desc: "Tokenizing validation split"
    fn_kwargs:
        <<: *tokenize_args

############ Data Collator #############

# Data collator for causal model
# Batches are dynamically padded to longest sequence
# labels are set to input_ids, with pad tokens set to -100
# https://huggingface.co/docs/transformers/en/main_classes/data_collator#transformers.DataCollatorForLanguageModeling
.define: &data_collator !callable:transformers:DataCollatorForLanguageModeling
    args:
        - *tokenizer
    kwargs:
        mlm: False
        return_tensors: pt

########## Trainer Callbacks ###########

# **Dependencies**

# Experiment tracking: Tensorboard SummaryWriter
.define: &summary_writer !callable:torch.utils.tensorboard:SummaryWriter
    - "/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/output_models/tiny_causal/runs/log_2024-07-29T02-53-24"

# Additional data to record to experiment loggers
.define: &experiment_info
    date: "2024-07-29T02:53:24"
    name: "Tiny Causal"
    description: "A tiny causal transformer."
    config: !callable:pp_config
    versions: {'python': '3.10.13', 'torch': '2.3.1', 'transformers': '4.41.2', 'accelerate': '0.31.0'}

.define: &text_gen_callback_args
    summary_writer: *summary_writer
    prompts: *testprompts
    generation_config: *generation_config
    max_new_tokens: 40
    generation_steps: 2000

# **Callback List**

.define: &trainer_callbacks
    # Log all training output to JSON
    - !callable:aiws.json_logger:JsonLogger
        <<: *experiment_info
    # Log configuration and metrics to Tensorboard file
    - !callable:aiws.tb_logger:TBLogger
        args: [ *summary_writer ]
        kwargs:
            <<: *experiment_info
    - !callable:aiws.textgen_callback:TextgenCallback
        <<: *text_gen_callback_args

############### Trainer ################

# Name: Custom aiws.trainer.Trainer
# Description: A lightweight, extensible trainer; does not support multiple GPUs

# **Trainer Args**

.define: &trainer_args
    # Base Trainer Defaults
    # https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments
    output_dir: "/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/output_models/tiny_causal"
    logging_dir: "/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/output_models/tiny_causal/runs/log_2024-07-29T02-53-24"
    overwrite_output_dir: True
    per_device_train_batch_size: 16
    per_device_eval_batch_size: 16
    learning_rate: 1.0e-3
    num_train_epochs: 1
    eval_steps: 100
    logging_steps: 500
    eval_strategy: "steps"
    save_strategy: "no"
    logging_strategy: "steps"
    lr_scheduler_type: "constant"

    # Tiny Project Overrides
    per_device_train_batch_size: 32
    per_device_eval_batch_size: 64
    logging_steps: 100
    eval_steps: 500
    learning_rate: 1.0e-3
    num_train_epochs: 1
    lr_scheduler_type: "cosine"

# **Trainer Constructor**

.define: &trainer !callable:aiws.trainer:Trainer
    model: *model
    args: !callable:aiws.trainer_types:TrainingArguments
        <<: *trainer_args
    data_collator: *data_collator
    train_dataset: *train_dataset
    eval_dataset: *eval_dataset
    tokenizer: *tokenizer
    callbacks: *trainer_callbacks

#---------------------------------------
#          Configuration Output          
#---------------------------------------
meta: &meta_output
    config_name: "Tiny Causal"
    config_description: "A tiny causal transformer."
    project_dir: "/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models"
    models_dir: "/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/output_models"
    tokenizers_dir: "/home/dinalt/ai_assets/forgather/tokenizers"
    datasets_dir: "/home/dinalt/ai_assets/forgather/datasets"
    output_dir: "/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/output_models/tiny_causal"
    model_src_dir: "/home/dinalt/ai_assets/forgather/model_src"
    logging_dir: "/home/dinalt/ai_assets/forgather/examples/trainers/tiny_models/output_models/tiny_causal/runs/log_2024-07-29T02-53-24"
    create_new_model: "True"
    save_model: "True"
    train: "True"
    eval: "False"

main: !callable:aiws.training_script:TrainingScript
    meta: *meta_output
    do_save: True
    do_train: True
    do_eval: False
    # Init distributed environment before initializing anything which depends on it.
    distributed_env: *distributed_env
    trainer: *trainer
```
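
In the listing above, `trainer_args` repeats several keys: once under "Base Trainer Defaults" and again under "Tiny Project Overrides". The sketch below shows the values that result if the loader keeps the last occurrence of a duplicated key, which is PyYAML's default behaviour; this is an assumption about the loader in use, and stricter YAML parsers may reject duplicate keys outright. The dictionary names are illustrative only.

```python
# Values copied from the "Base Trainer Defaults" part of trainer_args.
base_trainer_defaults = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "learning_rate": 1.0e-3,
    "num_train_epochs": 1,
    "eval_steps": 100,
    "logging_steps": 500,
    "lr_scheduler_type": "constant",
}

# Values copied from the "Tiny Project Overrides" part of trainer_args.
tiny_project_overrides = {
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 64,
    "logging_steps": 100,
    "eval_steps": 500,
    "learning_rate": 1.0e-3,
    "num_train_epochs": 1,
    "lr_scheduler_type": "cosine",
}

# Later entries replace earlier ones, mirroring last-key-wins loading.
effective = {**base_trainer_defaults, **tiny_project_overrides}
for key, value in effective.items():
    print(f"{key}: {value}")
```

Read this way, the run uses a training batch size of 32, an evaluation batch size of 64, and a cosine learning-rate schedule. The same reading applies to the "Tiny Causal overrides" in `model_config`.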
