[Bug Report] Cannot find train_batch_size in LanguageModelSAERunnerConfig.__init__() #161

Closed

MClarke1991 opened this issue May 24, 2024 · 2 comments

@MClarke1991
Bug

When trying to run "tutorials/training_a_sparse_autoencoder.ipynb" in a Google Colab notebook with the SAELens Python package installed, I get the following error:

TypeError                                 Traceback (most recent call last)

/tmp/ipykernel_2837/4019108468.py in <cell line: 1>()
----> 1 cfg = LanguageModelSAERunnerConfig(
      2     # Data Generating Function (Model + Training Distibuion)
      3     model_name="tiny-stories-1L-21M",  # our model (more options here: https://neelnanda-io.github.io/TransformerLens/generated/model_properties_table.html)
      4     hook_point="blocks.0.hook_mlp_out",  # A valid hook point (see more details here: https://neelnanda-io.github.io/TransformerLens/generated/demos/Main_Demo.html#Hook-Points)
      5     hook_point_layer=0,  # Only one layer in the model.

TypeError: LanguageModelSAERunnerConfig.__init__() got an unexpected keyword argument 'train_batch_size'

Code example

cfg = LanguageModelSAERunnerConfig(
    # Data Generating Function (Model + Training Distribution)
    model_name="tiny-stories-1L-21M",  # our model (more options here: https://neelnanda-io.github.io/TransformerLens/generated/model_properties_table.html)
    hook_point="blocks.0.hook_mlp_out",  # A valid hook point (see more details here: https://neelnanda-io.github.io/TransformerLens/generated/demos/Main_Demo.html#Hook-Points)
    hook_point_layer=0,  # Only one layer in the model.
    d_in=1024,  # the width of the mlp output.
    dataset_path="apollo-research/roneneldan-TinyStories-tokenizer-gpt2",  # this is a tokenized language dataset on Huggingface for the Tiny Stories corpus.
    is_dataset_tokenized=True,
    streaming=True,  # we could pre-download the token dataset if it were small.
    # SAE Parameters
    mse_loss_normalization=None,  # We won't normalize the MSE loss.
    expansion_factor=16,  # the width of the SAE. Larger will result in better stats but slower training.
    b_dec_init_method="zeros",  # The geometric median can be used to initialize the decoder weights.
    apply_b_dec_to_input=False,  # We won't apply the decoder weights to the input.
    normalize_sae_decoder=False,
    scale_sparsity_penalty_by_decoder_norm=True,
    decoder_heuristic_init=True,
    init_encoder_as_decoder_transpose=True,
    normalize_activations=True,
    # Training Parameters
    lr=5e-5,  # the lower the better, but we'll go fairly high to speed up the tutorial.
    adam_beta1=0.9,  # adam params (default, but once upon a time we experimented with these.)
    adam_beta2=0.999,
    lr_scheduler_name="constant",  # constant learning rate with warm-up. There could be better schedules out there.
    lr_warm_up_steps=lr_warm_up_steps,  # this can help avoid too many dead features initially.
    lr_decay_steps=lr_decay_steps,  # this will help us avoid overfitting.
    l1_coefficient=5,  # will control how sparse the feature activations are
    l1_warm_up_steps=l1_warm_up_steps,  # this can help avoid too many dead features initially.
    lp_norm=1.0,  # the L1 penalty (and not a Lp for p < 1)
    train_batch_size_tokens=batch_size,
    context_size=256,  # will control the length of the prompts we feed to the model. Larger is better but slower, so for the tutorial we'll use a short one.
    # Activation Store Parameters
    n_batches_in_buffer=64,  # controls how many activations we store / shuffle.
    training_tokens=total_training_tokens,  # 100 million tokens is quite a few, but we want to see good stats. Get a coffee, come back.
    store_batch_size_prompts=16,
    # Resampling protocol
    use_ghost_grads=False,  # we don't use ghost grads anymore.
    feature_sampling_window=1000,  # this controls our reporting of feature sparsity stats
    dead_feature_window=1000,  # would affect resampling or ghost grads if we were using it.
    dead_feature_threshold=1e-4,  # would affect resampling or ghost grads if we were using it.
    # WANDB
    log_to_wandb=True,  # always use wandb unless you are just testing code.
    wandb_project="sae_lens_tutorial",
    wandb_log_frequency=30,
    eval_every_n_wandb_logs=20,
    # Misc
    device=device,
    seed=42,
    n_checkpoints=0,
    checkpoint_path="checkpoints",
    dtype=torch.float32,
)

System info

Google Colab notebook

Checklist

  • I have checked that there is no similar issue in the repo (required)
@4gatepylon

4gatepylon commented May 29, 2024

@MClarke1991 I think you can fix it if you replace the key train_batch_size with train_batch_size_tokens. You might need to change the dtype parameter to be just the string "float32". AFAIK this is a documentation issue.

Edit: You also need to change hook_point_layer to hook_layer and hook_point to hook_name. These are breaking changes as of version 3.0. You can look at which version you have installed with pip3 list | grep sae-lens.

@jbloomAus (Owner)

thanks @4gatepylon !
