
added transformers_config for passing arguments to the transformer #268

Merged (25 commits) on Jul 8, 2021

Conversation

KennethEnevoldsen
Contributor

Added transformers_config to allow the user to pass arguments to the transformer's forward pass, most notably output_attentions.

For convenience, I used this example to test the code:

import spacy

nlp = spacy.blank("en")

# Construction via add_pipe with custom config
config = {
    "model": {
        "@architectures": "spacy-transformers.TransformerModel.v1",
        "name": "bert-base-uncased",
        "transformers_config": {"output_attentions": True},
    }
}
transformer = nlp.add_pipe("transformer", config=config)
transformer.model.initialize()


doc = nlp("This is a sentence.")

which gives you:

len(doc._.trf_data.attention) # 12
doc._.trf_data.attention[-1].shape  # (1, 12, 7, 7) last layer of attention 
len(doc._.trf_data.tensors) # 2 
doc._.trf_data.tensors[0].shape # (1, 7, 768) <-- wordpiece embedding
doc._.trf_data.tensors[1].shape  # (1, 768) <-- assuming this is the pooled embedding?
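
For what it's worth, here is a rough sketch of how the attention arrays could be consumed downstream (assuming they convert to numpy, as they do in the test above; the variable names are just for illustration):

import numpy

# Average the last layer's attention over heads and look at how much
# attention each wordpiece receives in total.
last_layer = numpy.asarray(doc._.trf_data.attention[-1])  # (1, 12, 7, 7): batch, heads, from, to
mean_over_heads = last_layer[0].mean(axis=0)              # (7, 7): from-wordpiece x to-wordpiece
received = mean_over_heads.sum(axis=0)                     # total attention each wordpiece receives
print(received.argmax())                                   # index of the most-attended wordpiece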

Sidenote: it took me quite a while to find the default config string. It might be ideal to make it a standalone file and load it in?
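
E.g. a minimal sketch of what I mean, assuming the default config lives in a file called default_config.cfg next to the module (the path and file name are hypothetical):

from pathlib import Path

from thinc.api import Config

# Load the default config from a standalone file instead of an embedded Python string.
DEFAULT_CONFIG_PATH = Path(__file__).parent / "default_config.cfg"
DEFAULT_CONFIG = Config().from_disk(DEFAULT_CONFIG_PATH)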

svlandeg added the enhancement label on May 3, 2021
@honnibal
Member

honnibal commented May 6, 2021

@KennethEnevoldsen Thanks for this, and sorry for the delay in reviewing it. I think it'll be a really nice feature; it's just a question of getting the details around backwards compatibility right.

@KennethEnevoldsen
Contributor Author

Thanks, @honnibal, and completely understandable. This feature could potentially be integrated nicely with displacy for displaying the attention given to each word. Not entirely sure, though; the jury still seems to be out on how interpretable these attention weights are.

@svlandeg
Member

svlandeg commented Jul 5, 2021

It looks like a binary file, .DS_Store, got committed by accident...

@svlandeg
Member

svlandeg commented Jul 5, 2021

@KennethEnevoldsen : Apologies again for the late follow-up! I just did one final round of review and think we should be able to wrap this up soon :-)

@KennethEnevoldsen
Contributor Author

No problem, I will follow up on this during the weekend.

@svlandeg
Member

svlandeg commented Jul 8, 2021

Thanks again for your contribution and patience! I think this looks good. I just made an internal note about the documentation and moving v1 to legacy; we'll take care of that in a separate PR.
