# Project Index

[Custom Model Notebook](../../../notebooks/custom_model.ipynb)  
[Training Notebook](../../../notebooks/train.ipynb)  
[Project Config Notebook](../../../notebooks/project_config.ipynb)  
[Forgather Notebook](../../../notebooks/forgather.ipynb)  

In [4]:
import forgather.nb.notebooks as nb

nb.display_project_index(
    config_template="control.yaml",
    show_available_templates=False,
    show_pp_config=False,
    show_loaded_config=False,
    show_generated_code=False,
    materialize=False,
    pp_first=False,
)

## Test Various Weight Initialization Methods

### Control

This uses the standard PyTorch initialization methods for Linear and Embedding layers.

Torch uses code equivalent to the following for initializing linear layers:

```python
stdv = 1. / math.sqrt(self.weight.size(1))
self.weight.data.uniform_(-stdv, stdv)
```
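
Recent PyTorch releases implement the `nn.Linear` weight reset as `kaiming_uniform_(weight, a=math.sqrt(5))` rather than the literal lines above. The pure-Python sketch below reproduces only the bound arithmetic to show why the two forms agree. The helper names and the `fan_in` value are illustrative and are not part of PyTorch or this project.

```python
import math


def kaiming_uniform_bound(fan_in: int, a: float) -> float:
    """Uniform bound for Kaiming init with a leaky-ReLU gain (illustrative helper)."""
    gain = math.sqrt(2.0 / (1.0 + a * a))  # leaky-ReLU gain for negative slope a
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std  # uniform(-b, b) has std b / sqrt(3)


def torch_default_bound(fan_in: int) -> float:
    """Bound from the snippet above: 1 / sqrt(fan_in)."""
    return 1.0 / math.sqrt(fan_in)


fan_in = 256  # illustrative layer width, not taken from this project
assert math.isclose(
    kaiming_uniform_bound(fan_in, math.sqrt(5)), torch_default_bound(fan_in)
)
print(torch_default_bound(fan_in))  # 0.0625
```

With `a = sqrt(5)` the gain is `sqrt(1/3)`, which cancels the `sqrt(3)` factor and leaves `1 / sqrt(fan_in)`.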

For interesting discussions of this method, see:

- https://github.com/pytorch/pytorch/issues/57109
- https://soumith.ch/files/20141213_gplus_nninit_discussion.htm

### Regex

This is the same initialization as "Control," but it uses regular expressions to control how the parameters are initialized.

This is more complex, but far more flexible.
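
To illustrate the general idea of regex-driven initialization, here is a minimal sketch that maps parameter names to an initializer label using the first matching pattern. This is not the project's implementation: the rule table, the initializer labels, and the parameter names are all hypothetical. The actual names for a model can be listed with the loop in "Dump Model Param Names".

```python
import re

# Hypothetical (pattern, initializer label) rules; the first match wins.
INIT_RULES = [
    (r"bias$", "zeros"),
    (r"feedforward.*\.weight$", "torch_linear"),
    (r"attention.*\.weight$", "torch_linear"),
    (r"embedding\.weight$", "normal"),
]


def select_init(param_name: str, rules=INIT_RULES, default: str = "skip") -> str:
    """Return the label of the first rule whose pattern matches the name."""
    for pattern, init_name in rules:
        if re.search(pattern, param_name):
            return init_name
    return default


# Hypothetical parameter names, for illustration only.
names = [
    "input_encoder.embedding.weight",
    "layer_stack.0.attention.query_linear.weight",
    "layer_stack.0.attention.query_linear.bias",
    "layer_stack.0.feedforward.linear1.weight",
    "layer_stack.0.norm1.weight",
]
for name in names:
    print(f"{name} -> {select_init(name)}")
```

Changing how one group of layers is initialized then only requires editing the matching rule, which is where the extra flexibility comes from.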

### Xavier Uniform

Here, we use regex again, but use it to replace the torch default init with Xavier Uniform initialization.

This performs relatively poorly.

### Xavier Uniform No Feedforward

This is the same as Xavier Uniform, except for the feedforward layers, which are initialized with the "torch" method.

This demonstrates that the primary issue is with using fan-out to compute the scaling factor, where fan-out is 4x fan-in.

Note that fan-in equals fan-out for square matrices, like those used by the attention layers, so there the two methods differ only by a constant factor.

The only difference in this case is with the initialization of the output layers.
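
The effect of matrix shape on the two scaling rules can be checked with a few lines of arithmetic. The sketch below compares the Xavier Uniform bound, `sqrt(6 / (fan_in + fan_out))`, with the "torch" bound, `1 / sqrt(fan_in)`. The width `d_model = 256` and the shape labels are illustrative assumptions, not values read from this project's configuration.

```python
import math


def xavier_uniform_bound(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Uniform bound used by Xavier (Glorot) uniform initialization."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def torch_default_bound(fan_in: int) -> float:
    """Uniform bound of the default "torch" linear init."""
    return 1.0 / math.sqrt(fan_in)


d_model = 256       # illustrative width (assumption)
d_ff = 4 * d_model  # feedforward inner width, 4x the model width

# (fan_in, fan_out) for each kind of layer
shapes = {
    "attention (square)": (d_model, d_model),
    "feedforward up": (d_model, d_ff),
    "feedforward down": (d_ff, d_model),
}

ratios = {}
for label, (fan_in, fan_out) in shapes.items():
    ratios[label] = xavier_uniform_bound(fan_in, fan_out) / torch_default_bound(fan_in)
    print(f"{label}: Xavier/torch bound ratio = {ratios[label]:.3f}")
```

Under these assumptions the ratio is √3 ≈ 1.732 for any square matrix, regardless of width, while the two feedforward shapes give about 1.095 and 2.191. Only the non-square layers change their scale relative to the attention layers.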

### Deepnet

DeepNet: Scaling Transformers to 1,000 Layers  
https://arxiv.org/pdf/2203.00555

Here we try the method described in the above paper. Among other changes, it rescales the initialization of the feedforward layers and of the 
attention value and output layers by "beta," which is computed from the number of transformer layers, and scales the residuals by "alpha," 
also derived from the number of layers.

Even though this uses Xavier Uniform initialization, it performs on par with the control, and so does not show the issue identified when 
testing with plain Xavier Uniform initialization.
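
For reference, the DeepNet paper gives closed-form constants for a decoder-only stack of N layers: alpha = (2N)^(1/4) and beta = (8N)^(-1/4). The sketch below computes them. The function name is hypothetical, and whether this project's templates use exactly these decoder-only constants is an assumption.

```python
import math


def deepnet_decoder_only(num_layers: int) -> tuple[float, float]:
    """Return (alpha, beta) for a decoder-only model, per the DeepNet paper.

    alpha scales the residual branch: x = LayerNorm(alpha * x + sublayer(x)).
    beta is the gain applied to the Xavier init of the feedforward layers and
    of the attention value and output projections.
    """
    alpha = (2.0 * num_layers) ** 0.25
    beta = (8.0 * num_layers) ** -0.25
    return alpha, beta


for n in (2, 4, 8):  # illustrative layer counts
    alpha, beta = deepnet_decoder_only(n)
    print(f"layers={n}: alpha={alpha:.4f}, beta={beta:.4f}")
```

As the layer count grows, alpha increases and beta shrinks, so deeper models start with smaller sublayer weights and a more heavily weighted residual path.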

### Deepnet Init

This uses the deepnet initialization method, but not the residual scaling factor. Performance is close to the other good methods.

### Deepnet Torch

This is the same as Deepnet, but we replace Xavier Uniform with the "Torch" method. Again, similar performance.



#### Project Directory: "/home/dinalt/ai_assets/forgather/examples/trainers/init_weights"

## Meta Config
Meta Config: [/home/dinalt/ai_assets/forgather/examples/trainers/init_weights/meta.yaml](meta.yaml)

- [meta.yaml](meta.yaml)
    - [meta_defaults.yaml](../../../forgather_workspace/meta_defaults.yaml)
        - [base_directories.yaml](../../../forgather_workspace/base_directories.yaml)

Template Search Paths:
- [/home/dinalt/ai_assets/forgather/examples/trainers/init_weights/templates](templates)
- [/home/dinalt/ai_assets/forgather/forgather_workspace](../../../forgather_workspace)
- [/home/dinalt/ai_assets/forgather/templates/tiny_experiments](../../../templates/tiny_experiments)
- [/home/dinalt/ai_assets/forgather/templates/modellib](../../../templates/modellib)
- [/home/dinalt/ai_assets/forgather/templates/base](../../../templates/base)

## Available Configurations
- [deepnet_torch.yaml](templates/configs/deepnet_torch.yaml)
- [deepnet.yaml](templates/configs/deepnet.yaml)
- [xavier_uniform_noff.yaml](templates/configs/xavier_uniform_noff.yaml)
- [regex.yaml](templates/configs/regex.yaml)
- [xavier_uniform.yaml](templates/configs/xavier_uniform.yaml)
- [deepnet_init.yaml](templates/configs/deepnet_init.yaml)
- [he_relu.yaml](templates/configs/he_relu.yaml)
- [control.yaml](templates/configs/control.yaml)

Default Configuration: control.yaml

Active Configuration: control.yaml

## Included Templates
- [configs/control.yaml](templates/configs/control.yaml)
    - [project.yaml](templates/project.yaml)
        - [projects/tiny.yaml](../../../templates/tiny_experiments/projects/tiny.yaml)
            - [prompts/tiny_stories.yaml](../../../templates/tiny_experiments/prompts/tiny_stories.yaml)
            - [types/training_script/causal_lm/causal_lm.yaml](../../../templates/base/types/training_script/causal_lm/causal_lm.yaml)
                - [trainers/trainer.yaml](../../../templates/base/trainers/trainer.yaml)
                    - [trainers/base_trainer.yaml](../../../templates/base/trainers/base_trainer.yaml)
                        - [trainers/minimal_trainer.yaml](../../../templates/base/trainers/minimal_trainer.yaml)
                - [callbacks/loggers.yaml](../../../templates/base/callbacks/loggers.yaml)
                    - [callbacks/base_callbacks.yaml](../../../templates/base/callbacks/base_callbacks.yaml)
                - [models/abstract/load_model.yaml](../../../templates/base/models/abstract/load_model.yaml)
                    - [models/abstract/causal_lm_from_pretrained.yaml](../../../templates/base/models/abstract/causal_lm_from_pretrained.yaml)
                        - [models/abstract/base_language_model.yaml](../../../templates/base/models/abstract/base_language_model.yaml)
                - [types/training_script/training_script.yaml](../../../templates/base/types/training_script/training_script.yaml)
                    - [types/type.yaml](../../../templates/base/types/type.yaml)
                        - [base_directories.yaml](../../../forgather_workspace/base_directories.yaml)
                - [inc/formatting.jinja](../../../templates/base/inc/formatting.jinja)
            - [tiny.callbacks](../../../templates/tiny_experiments/projects/tiny.yaml)
            - [tiny.model_config](../../../templates/tiny_experiments/projects/tiny.yaml)
                - [models/tiny/tiny_causal.yaml](../../../templates/tiny_experiments/models/tiny/tiny_causal.yaml)
                    - [tokenizers/tiny_2k.yaml](../../../templates/tiny_experiments/tokenizers/tiny_2k.yaml)
                    - [models/dynamic_causal_transformer.yaml](../../../templates/modellib/models/dynamic_causal_transformer.yaml)
                        - [models/abstract/dynamic_causal_lm.yaml](../../../templates/base/models/abstract/dynamic_causal_lm.yaml)
                            - [models/abstract/custom_causal_lm.yaml](../../../templates/base/models/abstract/custom_causal_lm.yaml)
            - [tiny.trainer_config](../../../templates/tiny_experiments/projects/tiny.yaml)
            - [tiny.dataset_config](../../../templates/tiny_experiments/projects/tiny.yaml)
                - [datasets/tiny/tiny_stories_abridged.yaml](../../../templates/tiny_experiments/datasets/tiny/tiny_stories_abridged.yaml)
                    - [datasets/tiny/tiny_stories.yaml](../../../templates/tiny_experiments/datasets/tiny/tiny_stories.yaml)
                        - [datasets/abstract/base_datasets.yaml](../../../templates/base/datasets/abstract/base_datasets.yaml)
        - [project.model_config](templates/project.yaml)
        - [project.trainer_config](templates/project.yaml)
    - [experiment.model_config](templates/configs/control.yaml)
### Config Metadata:

```python
{'config_class': 'type.training_script.causal_lm',
 'config_description': 'Baseline Simple Init',
 'config_name': 'Control',
 'create_new_model': 'True',
 'datasets_dir': '/home/dinalt/ai_assets/forgather/datasets',
 'eval': 'False',
 'forgather_dir': '/home/dinalt/ai_assets/forgather',
 'logging_dir': './output_models/simple_init/runs/simple_init_2025-05-24T20-19-08',
 'model_src_dir': '/home/dinalt/ai_assets/forgather/model_src',
 'models_dir': './output_models',
 'output_dir': './output_models/simple_init',
 'project_dir': '.',
 'save_model': 'False',
 'tokenizers_dir': '/home/dinalt/ai_assets/forgather/tokenizers',
 'train': 'True',
 'workspace_root': '/home/dinalt/ai_assets/forgather'}

```

## Modules
- [./output_models/simple_init/tiny_causal_transformer.py](output_models/simple_init/tiny_causal_transformer.py) : DynamicCasualLM
    - [/home/dinalt/ai_assets/forgather/examples/trainers/init_weights/./output_models/xavier_uniform_noff/tiny_causal_transformer.py](output_models/xavier_uniform_noff/tiny_causal_transformer.py) : tiny_causal_transformer
        - [/home/dinalt/ai_assets/forgather/model_src/bits/causal_lm.py](../../../model_src/bits/causal_lm.py) : tiny_causal_transformer.causal_lm
        - [/home/dinalt/ai_assets/forgather/model_src/bits/causal_loss.py](../../../model_src/bits/causal_loss.py) : tiny_causal_transformer.causal_loss
        - [/home/dinalt/ai_assets/forgather/model_src/bits/causal_multihead_attn.py](../../../model_src/bits/causal_multihead_attn.py) : tiny_causal_transformer.causal_multihead_attn
        - [/home/dinalt/ai_assets/forgather/model_src/bits/feedforward_layer.py](../../../model_src/bits/feedforward_layer.py) : tiny_causal_transformer.feedforward_layer
        - [/home/dinalt/ai_assets/forgather/model_src/bits/init_weights.py](../../../model_src/bits/init_weights.py) : tiny_causal_transformer.init_weights
        - [/home/dinalt/ai_assets/forgather/model_src/bits/input_encoder.py](../../../model_src/bits/input_encoder.py) : tiny_causal_transformer.input_encoder
        - [/home/dinalt/ai_assets/forgather/model_src/bits/layer_stack.py](../../../model_src/bits/layer_stack.py) : tiny_causal_transformer.layer_stack
        - [/home/dinalt/ai_assets/forgather/model_src/bits/post_ln_layer.py](../../../model_src/bits/post_ln_layer.py) : tiny_causal_transformer.post_ln_layer
        - [/home/dinalt/ai_assets/forgather/model_src/bits/sinusoidal_pe.py](../../../model_src/bits/sinusoidal_pe.py) : tiny_causal_transformer.sinusoidal_pe
- [./output_models/simple_init/tiny_causal_transformer.py](output_models/simple_init/tiny_causal_transformer.py) : DynamicCausalLMConfig
## Output Targets
- meta
- main
- model_code_writer
- distributed_env
- model
- trainer
- train_dataset
- eval_dataset
- data_collator
- trainer_callbacks
- trainer_args
- optimizer
- lr_scheduler
- model_constructor_args
- tokenizer



## Construct Project

In [1]:
import forgather.nb.notebooks as nb
from forgather import Project

# Pass config name
proj = Project("xavier_uniform_noff.yaml")

## Dump Model Param Names

In [None]:
model = proj("model")
for name, param in model.named_parameters():
    print(name)

## Train Model in Notebook
This only works for a single GPU.

In [None]:
# Use default config and default output target (training script, in this example).
training_script = proj()
training_script.run()

## Start Tensorboard

In [6]:
# Show command to run tensorboard; local_host should be false if tensorboard should run on all network interfaces.
nb.display_tb_command(proj, local_host=False)

#### Tensorboard Command

```bash
tensorboard --bind_all --logdir "/home/dinalt/ai_assets/forgather/examples/trainers/init_weights/output_models/simple_init"
```

## Generate Training Script
The preferred way of running training is via the command-line. This generates a simple bash script to train the model.

In [None]:
# The second arg specifies which GPUs may be used. For example, "0,2" only allows the first and third GPU.
# Note that multi-GPU training requires a trainer implementation which supports this, e.g. "accel_trainer".
nb.generate_trainingscript(proj, "0")