
Introducing HookedEncoder #276

Merged: 27 commits merged into TransformerLensOrg:main from rusheb-bert-WIP on May 19, 2023
Conversation

@rusheb (Collaborator) commented on May 18, 2023

Description

This feature was co-authored with @MatthewBaggins. Thanks also to @luciaquirke and @jbloomAus for helpful discussions throughout.

Closes issue #258.

This PR introduces HookedEncoder, a BERT-style encoder that inherits from HookedRootModule. Weights can be loaded from the Hugging Face bert-base-cased pretrained model.

Unlike HookedTransformer, it does not (yet) do any pre-processing of the weights (e.g. folding LayerNorm). Another difference is that the model can currently only be run on tokens, not strings or lists of strings. The supported task/architecture is masked language modelling; next sentence prediction, causal language modelling, and other tasks are not supported. HookedEncoder also does not contain dropout layers, which may lead to inconsistent results if it is used for pretraining.
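
For orientation, here is a minimal usage sketch based on the description above. It assumes that from_pretrained and run_with_cache mirror the HookedTransformer interface and that the forward pass returns masked-LM logits of shape [batch, pos, d_vocab]; since strings aren't accepted directly yet, it uses the matching Hugging Face tokenizer to produce token ids. Treat it as an illustration rather than the exact API.

```python
from transformers import AutoTokenizer
from transformer_lens import HookedEncoder

# Strings aren't accepted directly yet, so tokenize with the matching
# Hugging Face tokenizer and pass token ids to the model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
bert = HookedEncoder.from_pretrained("bert-base-cased")

prompt = "The capital of France is [MASK]."
tokens = tokenizer(prompt, return_tensors="pt")["input_ids"]

# Assumed to return masked-LM logits of shape [batch, pos, d_vocab]
# plus a cache of all hook activations, as with HookedTransformer.
logits, cache = bert.run_with_cache(tokens)

mask_pos = (tokens[0] == tokenizer.mask_token_id).nonzero().item()
predicted_token = logits[0, mask_pos].argmax().item()
print(tokenizer.decode(predicted_token))  # hopefully "Paris"
```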

This is an MVP implementation that serves as a starting point to iterate on. I've tried to keep the scope as small as possible for a number of reasons:

  • Getting earlyish feedback on this PR
  • Getting earlier user feedback
  • Reducing the risk of getting distracted by other priorities. (Personally, I won't have much time to work on this over the next two months.)

Notes

  • In the end, based on the discussion in #262 (Introduce HookedEncoderConfig, issue #258), I decided to reuse HookedTransformerConfig rather than creating a new HookedEncoderConfig class. The configs could still be separated later if the need arises; a rough sketch of a reused config follows this list.
  • I chose the masked language modelling task based on feedback from people who want to do research using BERT.
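
To make the config-reuse decision concrete, here is an illustrative bert-base-sized configuration expressed with HookedTransformerConfig. The field values are the standard bert-base hyperparameters rather than values copied from this PR, so treat it as an example of the pattern, not the canonical config.

```python
from transformer_lens import HookedTransformerConfig

# Illustrative bert-base-style config reusing HookedTransformerConfig
# (values are the usual bert-base hyperparameters, not taken from the PR).
bert_base_cfg = HookedTransformerConfig(
    n_layers=12,
    d_model=768,
    n_ctx=512,
    d_head=64,
    n_heads=12,
    d_mlp=3072,
    d_vocab=28996,  # bert-base-cased vocabulary size
    act_fn="gelu",
)
```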

Key uncertainties

  • I'm really not sure about the naming. Currently I have named the main transformer class HookedEncoder, but I've prefixed the new component names with Bert.
  • I'm hopeful that, like HookedTransformer, this will be easy to extend to support more variations on the architecture, but I'd like to hear if anybody thinks otherwise.

Summary of Changes

  • Add new class HookedEncoder
  • Add new components
    • TokenTypeEmbed
    • BertEmbed
    • BertMLMHead
    • BertBlock
  • Add an additive_attention_mask parameter to the forward method of the Attention component (see the sketch after this list)
  • Add BERT config and state dict to loading_from_pretrained
  • Extract methods from HookedTransformer for reuse:
    • devices.move_to_and_update_config
    • loading.fill_missing_keys
  • Add demo notebook demos/BERT.ipynb
  • Update Available Models list in Main Demo
  • Testing
    • Unit and acceptance tests for HookedEncoder and sub-components
    • New demo in demos/BERT.ipynb also acts as a test
    • I also added some tests for existing components e.g. HookedTransformerConfig
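
As a pointer for the additive_attention_mask change listed above, the general pattern for an additive attention mask is sketched below: a [batch, pos] padding mask is turned into a tensor of zeros (attend) and large negative values (ignore) that is added to the attention scores before the softmax. The helper name and shapes here are illustrative, not the PR's exact signature.

```python
import torch

def additive_attention_mask(padding_mask: torch.Tensor) -> torch.Tensor:
    """Convert a [batch, pos] mask (1 = real token, 0 = padding) into an
    additive mask broadcastable over attention scores of shape
    [batch, head, query_pos, key_pos]."""
    # 0 where attention is allowed, a very large negative number where it
    # isn't, so masked positions vanish after the softmax.
    neg_inf = torch.finfo(torch.float32).min
    return (1 - padding_mask.float())[:, None, None, :] * neg_inf

# Usage inside an attention layer (illustrative):
# scores = scores + additive_attention_mask(padding_mask)
# pattern = scores.softmax(dim=-1)
```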

Future work

  • Add support for different tasks, e.g. Next Sentence Prediction, Causal Language Modelling
  • Add more models: bert-base-uncased, bert-large-cased, bert-large-uncased
  • Add preprocessing of weights, including LayerNorm folding (a sketch of the folding idea follows this list)
  • Accept strings as input and add tokenization helpers from HookedTransformer
  • Add support for training/finetuning (most notably, dropouts)
  • Add examples of research using BERT to the demo notebooks
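
For reference, the idea behind the LayerNorm-folding item above (not implemented in this PR) is that a LayerNorm's learned scale and bias can be absorbed into the linear layer that reads from it, leaving only the centring and normalisation behind. A minimal sketch of that algebra, with illustrative names and shapes:

```python
import torch

def fold_layer_norm_into_linear(gamma, beta, W, b):
    """Fold LayerNorm's learned scale (gamma) and bias (beta) into the
    linear layer y = x @ W + b that consumes the LayerNorm output.
    Shapes: gamma, beta: [d_in]; W: [d_in, d_out]; b: [d_out]."""
    W_folded = W * gamma[:, None]   # scale folds into the weight rows
    b_folded = b + beta @ W         # bias folds into the linear layer's bias
    return W_folded, b_folded

# Sanity check: folding leaves the composed function unchanged.
d_in, d_out = 4, 3
gamma, beta = torch.rand(d_in), torch.rand(d_in)
W, b = torch.rand(d_in, d_out), torch.rand(d_out)
x_hat = torch.rand(2, d_in)  # pretend this is already centred and normalised
W_f, b_f = fold_layer_norm_into_linear(gamma, beta, W, b)
assert torch.allclose((gamma * x_hat + beta) @ W + b, x_hat @ W_f + b_f, atol=1e-6)
```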

Type of change


  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@jbloomAus (Collaborator) commented:

Thanks @rusheb and @MatthewBaggins ! This is stellar work! I can't wait to see subsequent investigations!

@rusheb mentioned this pull request on May 19, 2023
@rusheb merged commit c268a71 into TransformerLensOrg:main on May 19, 2023
4 checks passed
@rusheb deleted the rusheb-bert-WIP branch on May 19, 2023 at 09:31