EntityRecognizer throws IndexError when used in pipeline with Transformer and custom span getter during training:
File "/home/---/---/research_spacy_ru/.venv/lib/python3.8/site-packages/spacy/language.py", line 1122, in update
proc.update(examples, sgd=None, losses=losses, **component_cfg[name])
File "spacy/pipeline/transition_parser.pyx", line 416, in spacy.pipeline.transition_parser.Parser.update
File "spacy/ml/parser_model.pyx", line 293, in spacy.ml.parser_model.ParserStepModel.finish_steps
File "spacy/ml/parser_model.pyx", line 456, in spacy.ml.parser_model.precompute_hiddens.begin_update.backward
File "/home/---/---/research_spacy_ru/.venv/lib/python3.8/site-packages/spacy/ml/_precomputable_affine.py", line 49, in backward
Xf = X[ids]
IndexError: index 221 is out of bounds for axis 0 with size 221
How to reproduce the behaviour
I created my custom span_getter: https://gist.github.com/tomateit/06e53b108f764e7240ea7ae8e2e830fd
It adapts the number of words in each span to the corresponding number of word pieces, so that each span fits better into the transformer window.
The pipeline works with this function; the exception is thrown only on some documents.
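For readers who cannot open the gist, the general idea of sizing spans by word pieces rather than by words can be sketched in plain Python. This is only an illustration and not the code from the gist: the function name `split_by_wordpiece_budget`, the `piece_counts` input (word pieces per token) and the `max_pieces` budget are all hypothetical, and no spaCy or tokenizer API is used.

```python
from typing import List, Tuple


def split_by_wordpiece_budget(
    piece_counts: List[int], max_pieces: int
) -> List[Tuple[int, int]]:
    """Group consecutive tokens into (start, end) spans, end exclusive.

    piece_counts[i] is the assumed number of word pieces for token i.
    A span is closed as soon as adding the next token would exceed
    max_pieces. A single token that is larger than the budget still
    gets a span of its own, so no token is ever dropped.
    """
    spans: List[Tuple[int, int]] = []
    start = 0  # index of the first token in the current span
    used = 0   # word pieces accumulated in the current span
    for i, n_pieces in enumerate(piece_counts):
        # Close the current span only if it already holds a token.
        if i > start and used + n_pieces > max_pieces:
            spans.append((start, i))
            start = i
            used = 0
        used += n_pieces
    # Flush the trailing span, if any tokens remain.
    if start < len(piece_counts):
        spans.append((start, len(piece_counts)))
    return spans


# Five tokens with 1, 3, 2, 2 and 1 word pieces, budget of 4 pieces.
example_spans = split_by_wordpiece_budget([1, 3, 2, 2, 1], max_pieces=4)
print(example_spans)
```

In this sketch the spans are contiguous and the last one ends exactly at the number of tokens, which is the property most worth checking in any custom span getter.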
Thanks for the report and sorry it's taken us a long time to follow up on this. Unfortunately, because the issue is happening deep in the spaCy internals and your custom code isn't very simple, it's hard to be sure what's going on here.
Can you create a small example we can run to reproduce the problem? A repo like the one you linked to with a project file would be great, but that repo's project file doesn't seem to work and doesn't use Transformers anyway.
I add my span getter (with more comments to make its algorithm clearer)
I alter the config to use my transformer of choice
And the error remains.
P.S. The repo I linked in my first message does use a transformer config. In the project file it is run with "train_trf" rather than "train", so that both configs can be used.
I plug it into a simple transformer + NER pipeline like this: https://github.com/tomateit/natasha-spacy/blob/transformer-pipeline/project/config_trf.cfg
(in my tests I disabled all but transformer and NER)
This error is emitted at the line https://github.com/explosion/spaCy/blob/master/spacy/ml/_precomputable_affine.py#L49
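Since the reported index (221) equals the array size (221), it is one past the last valid position. One sanity check that might help narrow this down, though it is not confirmed to be the cause, is to verify that the spans returned for a document are contiguous and end exactly at the token count. The helper below is a hypothetical plain-Python sketch: `check_span_coverage` is not a spaCy function, and it assumes the getter is meant to produce non-overlapping `(start, end)` token offsets with an exclusive end. Getters that deliberately produce overlapping windows would need a different check.

```python
from typing import List, Tuple


def check_span_coverage(spans: List[Tuple[int, int]], n_tokens: int) -> List[str]:
    """Return human-readable problems found in a list of token spans.

    Assumes spans are (start, end) pairs with an exclusive end and are
    expected to tile the range 0..n_tokens without gaps or overlaps.
    An empty list means no problem was found.
    """
    problems: List[str] = []
    expected_start = 0
    for start, end in spans:
        if start != expected_start:
            problems.append(
                f"span ({start}, {end}) starts at {start}, expected {expected_start}"
            )
        if end <= start:
            problems.append(f"span ({start}, {end}) is empty or reversed")
        if end > n_tokens:
            problems.append(
                f"span ({start}, {end}) ends past the last token ({n_tokens})"
            )
        expected_start = end
    if expected_start != n_tokens:
        problems.append(
            f"spans end at {expected_start}, but the doc has {n_tokens} tokens"
        )
    return problems


# A span that runs one token past the end of a 5-token document.
for problem in check_span_coverage([(0, 2), (2, 6)], n_tokens=5):
    print(problem)
```

Running such a check on only the documents that trigger the exception would show whether the span boundaries differ from those of the documents that train without error.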
Your Environment