ValueError while trying to finetune BERT on multi-label text classification #6483
-
At this point I am not sure if this is a bug or a feature that I have not fully understood how to use. Apologies if it is the latter and the correct avenue is Stack Overflow. I am trying to connect textcat with the transformer component to fine-tune BERT on a multi-label text classification problem. I am using the TextCatCNN architecture for textcat, even though it might be overkill (a simple dense layer with sigmoid activations should do), but I thought I'd start with a pre-registered architecture. I have not managed to work out whether the error I am getting is because I am doing something wrong or because there is a bug, hence the issue.

How to reproduce the behaviour

Running my setup throws: …

Your Environment
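For background on the "simple dense layer with sigmoid activations" remark: multi-label textcat differs from exclusive-class textcat in that each label gets an independent sigmoid probability rather than a softmax over all labels, so several labels can be active at once. A minimal stdlib sketch of that scoring step (the label names and threshold are illustrative, not from the thread):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function mapping a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_scores(logits: dict, threshold: float = 0.5) -> dict:
    """Score each label independently, so any number of labels can be active."""
    return {label: sigmoid(z) >= threshold for label, z in logits.items()}

# Hypothetical raw scores from a dense output layer
print(multilabel_scores({"POSITIVE": 2.0, "TECH": 0.5, "SPORT": -3.0}))
# → {'POSITIVE': True, 'TECH': True, 'SPORT': False}
```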
Replies: 5 comments
-
There's a few things here:

While it's true that you need to use …

Are you actually interested in any of the other components from …?

With spaCy 3, we really recommend using the new config system to train your custom pipelines. The config can source components from existing models if you want to build on top of the pretrained weights of …

If you want to create an entirely new model with a transformer and a textcat, you can get a basic config with this command: …

In the resulting file, where it reads …, you can change the name to any other model from the HF library. Once you start training your …

That said - we do need to make sure that error is a little more user-friendly ;-)
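For anyone landing here later, this is a sketch of the relevant pieces such a config typically contains, assuming spacy-transformers is installed. The exact `@architectures` version suffix depends on your installed versions, and `bert-base-uncased` is just one example of a model name:

```ini
[nlp]
lang = "en"
pipeline = ["transformer","textcat_multilabel"]

[components.transformer]
factory = "transformer"

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "bert-base-uncased"

[components.textcat_multilabel]
factory = "textcat_multilabel"
```

The `name` value is the line referred to above: you can change it to any other model on the Hugging Face hub.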
-
Thanks for the quick response Sofie, and for the explanation and guidance. I am also experimenting with the config files, which seem to work, but I am trying to port a class that wraps a spaCy v2 custom training loop inside it. I made some progress with your suggestion, but I am still missing something to make it work. My current implementation throws yet another ValueError, which is less cryptic but also hard to resolve, as there is no obvious parameter …

This is where I am following your suggestion: …

This now throws: …
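For anyone else porting a v2-style training loop: in v3 the loop is built around `Example` objects, and `nlp.initialize` deduces the label set from them. A minimal sketch with `textcat_multilabel` (the toy data and label names are mine, not from the thread):

```python
import spacy
from spacy.training import Example

# Toy multi-label data (hypothetical): several labels can be 1.0 at once,
# which is what distinguishes multi-label from exclusive-class textcat.
TRAIN_DATA = [
    ("great and very useful library", {"cats": {"POSITIVE": 1.0, "TECH": 1.0}}),
    ("terrible weather today", {"cats": {"POSITIVE": 0.0, "TECH": 0.0}}),
]

nlp = spacy.blank("en")
nlp.add_pipe("textcat_multilabel")

# Build Example objects up front; initialize() deduces the labels from them,
# so there is no need to call add_label by hand.
examples = [
    Example.from_dict(nlp.make_doc(text), annots) for text, annots in TRAIN_DATA
]
optimizer = nlp.initialize(get_examples=lambda: examples)

for _ in range(5):
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)
print(sorted(losses))  # ['textcat_multilabel']
```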
-
Could you try changing

…

to

…

The `textcat` labels are included in your examples, so they should be deduced automatically; you don't have to call `add_label` specifically or define the `labels` variable.

Also - make sure that your transformer is BEFORE the `textcat` in the pipeline, because the `TransformerListener` assumes that the transformer has already processed that batch of text.
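On the ordering point: `add_pipe` lets you control position with arguments such as `first` and `before`. A tiny illustration using `tok2vec` as a stand-in for the transformer, since its listener pattern is analogous (this sketch only shows pipe ordering, not the actual transformer setup):

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("textcat_multilabel")
# The feature producer must run before any listener-style consumer, just as
# the transformer must come before a TransformerListener-based textcat.
nlp.add_pipe("tok2vec", first=True)
print(nlp.pipe_names)  # ['tok2vec', 'textcat_multilabel']
```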
-
Aha, so that did the trick 🙏 Is there any way I can help with documentation to make that journey easier for people in the future? Or do you want to disincentivise the custom training loop use case?
-
Happy to hear it worked! It's just that there are a lot of small details to get right when writing your own custom training loop. I do think the documentation is already extensive, but if you find specific spots where rephrasing or adding some explanation would help, feel free to submit a PR!

And yes, we really do want to encourage people to use the config system more. You get easy support for disabling components, sourcing them from a different model, keeping pretraining and training steps aligned, making sure your experiments are reproducible, etc. So we do focus more of the v3 documentation around that.