After Spacy 2.3 upgrade, command-line spacy train command keeps failing #5620

Closed
mbrunecky opened this issue Jun 20, 2020 · 9 comments · Fixed by #5624
Labels
bug Bugs and behaviour differing from documentation feat / cli Feature: Command-line interface training Training and updating models

Comments

@mbrunecky

How to reproduce the behaviour

I have been using Spacy 2.2.3 and the command below to train my models for almost a month. I decided to add GPU, and updated my Spacy installation using
pip install -D spacy[CUDA101]
This updated my install to Spacy 2.3. (I then repeatedly uninstalled and reinstalled with pip --no-cache-dir install -U spacy[cuda101], but that did not change the 'new' behavior.)
I also updated the Spacy models (see environment below) and ran pip install -U spacy-lookups-data.

My training command now fails:

py -m spacy train en C:\Work\ML\Spacy\dataset\model C:\Work\ML\Spacy\dataset\train C:\Work\ML\Spacy\dataset\valid -v en_core_web_md
Training pipeline: ['tagger', 'parser', 'ner']
Starting with blank model 'en'
Loading vector from model 'en_core_web_md'
Traceback (most recent call last):
  File "C:\Program Files\Python\lib\runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python\lib\site-packages\spacy\__main__.py", line 33, in <module>
    plac.call(commands[command], sys.argv[1:])
  File "C:\Program Files\Python\lib\site-packages\plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "C:\Program Files\Python\lib\site-packages\plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "C:\Program Files\Python\lib\site-packages\spacy\cli\train.py", line 266, in train
    _load_vectors(nlp, vectors)
  File "C:\Program Files\Python\lib\site-packages\spacy\cli\train.py", line 645, in _load_vectors
    util.load_model(vectors, vocab=nlp.vocab)
  File "C:\Program Files\Python\lib\site-packages\spacy\util.py", line 170, in load_model
    return load_model_from_package(name, **overrides)
  File "C:\Program Files\Python\lib\site-packages\spacy\util.py", line 191, in load_model_from_package
    return cls.load(**overrides)
  File "C:\Program Files\Python\lib\site-packages\en_core_web_md\__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "C:\Program Files\Python\lib\site-packages\spacy\util.py", line 235, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "C:\Program Files\Python\lib\site-packages\spacy\util.py", line 216, in load_model_from_path
    component = nlp.create_pipe(factory, config=config)
  File "C:\Program Files\Python\lib\site-packages\spacy\language.py", line 309, in create_pipe
    return factory(self, **config)
  File "C:\Program Files\Python\lib\site-packages\spacy\language.py", line 1080, in factory
    return obj.from_nlp(nlp, **cfg)
  File "pipes.pyx", line 62, in spacy.pipeline.pipes.Pipe.from_nlp
  File "pipes.pyx", line 378, in spacy.pipeline.pipes.Tagger.__init__
TypeError: __init__() got multiple values for keyword argument 'vocab'
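
The traceback above ends with a component constructor receiving vocab twice. The mechanism can be sketched in plain Python, assuming the component is built with vocab as a positional argument while the overrides passed to load_model also contain a vocab key. The Tagger and Pipeline classes and the create_component helper below are hypothetical stand-ins, not spaCy's code, and the guard shown is only one possible fix, not necessarily what the linked pull request does.

```python
class Tagger:
    """Hypothetical stand-in for a pipeline component (not spaCy's class)."""

    def __init__(self, vocab, **cfg):
        self.vocab = vocab
        self.cfg = cfg

    @classmethod
    def from_nlp(cls, nlp, **cfg):
        # vocab is passed positionally here, so a "vocab" key inside cfg
        # supplies the same parameter a second time.
        return cls(nlp.vocab, **cfg)


class Pipeline:
    """Hypothetical stand-in for the nlp object; it only carries a vocab."""

    def __init__(self, vocab):
        self.vocab = vocab


def create_component(nlp, **overrides):
    """Forward overrides to the component, dropping the clashing key."""
    cfg = {key: value for key, value in overrides.items() if key != "vocab"}
    return Tagger.from_nlp(nlp, **cfg)


nlp = Pipeline(vocab={"words": 0})

try:
    # The vocab override collides with the positional vocab argument.
    Tagger.from_nlp(nlp, vocab=nlp.vocab)
    collided = False
except TypeError:
    collided = True

# The guarded call filters the override out and succeeds.
component = create_component(nlp, vocab=nlp.vocab)
```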

When trying to train using an existing model (this never worked in 2.2.3), I get:

py -m spacy train en C:\Work\ML\Spacy\dataset\model C:\Work\ML\Spacy\dataset\train C:\Work\ML\Spacy\dataset\valid -m en_core_web_md

✘ Can't find model meta.json
en_core_web_md

(the file is there, in C:\Program Files\Python\Lib\site-packages\en_core_web_md\meta.json)

Your Environment

  • Operating System: Windows 10 (update 2004)
  • Python Version Used: Python 3.8.1
  • spaCy Version Used: 2.3.0 (fresh update from 2.2.3)
  • Environment Information:
    Installation performed as Administrator, NO venv
    py -m spacy validate
    ✔ Loaded compatibility table
    ====================== Installed models (spaCy v2.3.0) ======================
    ℹ spaCy installation: C:\Program Files\Python\lib\site-packages\spacy

    TYPE      NAME             MODEL            VERSION
    package   en-core-web-sm   en_core_web_sm   2.3.0   ✔
    package   en-core-web-md   en_core_web_md   2.3.0   ✔
    package   en-core-web-lg   en_core_web_lg   2.3.0   ✔

@svlandeg svlandeg added feat / cli Feature: Command-line interface training Training and updating models labels Jun 21, 2020
@mbrunecky
Author

A minor correction. My second complaint, the command
py -m spacy train en C:\Work\ML\Spacy\dataset\model C:\Work\ML\Spacy\dataset\train C:\Work\ML\Spacy\dataset\valid -m en_core_web_md
was INCORRECT.
I intended to use the 'base' model option, -b en_core_web_md (I got the option name confused). However, using -b en_core_web_md still ends in an error, which I thought had been fixed in 2.3 (per the error discussion).
I train for NER using two entity names (NAME_FROM, NAME_TO), and with -b en_core_web_md I still get:

Traceback (most recent call last):
  File "C:\Program Files\Python\lib\site-packages\spacy\cli\train.py", line 425, in train
    nlp.update(
  File "C:\Program Files\Python\lib\site-packages\spacy\language.py", line 526, in update
    proc.update(docs, golds, sgd=get_grads, losses=losses, **kwargs)
  File "nn_parser.pyx", line 446, in spacy.syntax.nn_parser.Parser.update
  File "nn_parser.pyx", line 548, in spacy.syntax.nn_parser.Parser._init_gold_batch
  File "ner.pyx", line 107, in spacy.syntax.ner.BiluoPushDown.preprocess_gold
  File "ner.pyx", line 165, in spacy.syntax.ner.BiluoPushDown.lookup_transition
KeyError: "[E022] Could not find a transition with the name 'B-NAME_FROM' in the NER model."

So the only option that seems to allow me to continue with Spacy using vectors is using one of my older (Spacy 2.2.3) models as 'base', and keep re-training them. However, the message Spacy is giving me is not very encouraging:
UserWarning: [W031] Model 'en_model' (0.0.0) requires spaCy v2.2 and is incompatible with the current spaCy version (2.3.0). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate

@adrianeboyd
Contributor

Thanks for the report. The train command error with -v does look like a bug. If you want to go back to v2.2 in the meantime, you can install v2.2 with cupy like this:

pip install spacy[cuda101]==2.2.4

The warning above is correct: v2.2 and v2.3 models aren't compatible, so you need to be sure the models you're using match the spacy version you're using.
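
A rough way to check this before training is to read the spacy_version requirement from a model's meta.json and compare it with the installed spaCy version. The sketch below is a heuristic that assumes the requirement string begins with a lower bound such as ">=2.2.0" and that matching major.minor numbers is what matters; the function names are hypothetical, and python -m spacy validate remains the supported check.

```python
import json
import re
from pathlib import Path


def required_minor(meta_path):
    """Return the (major, minor) spaCy version named in a model's meta.json."""
    meta = json.loads(Path(meta_path).read_text(encoding="utf8"))
    # "spacy_version" is a requirement string such as ">=2.3.0,<2.4.0";
    # only the first version number in it is used here (an assumption).
    match = re.search(r"(\d+)\.(\d+)", meta.get("spacy_version", ""))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_compatible(meta_path, spacy_version):
    """Heuristic: the model's major.minor must equal the installed one."""
    installed = tuple(int(part) for part in spacy_version.split(".")[:2])
    return required_minor(meta_path) == installed
```

With spaCy installed, is_compatible(path_to_meta, spacy.__version__) would give a quick yes/no answer for a custom model directory.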

@adrianeboyd adrianeboyd added the bug Bugs and behaviour differing from documentation label Jun 22, 2020
@adrianeboyd
Contributor

This looks like an unintended side effect of #5374.

@mbrunecky
Author

Thank you for the suggestion to revert to Spacy 2.2.4:
pip install spacy[cuda101]==2.2.4
I just wonder whether reinstalling CUDA will 'break' it again, since after installing spacy[cuda101] I had to remove CUDA and reinstall it (per another bug report).

That said, I would prefer NOT to have to revert to 2.2.4. I was able to re-train one of my 2.1.3 models (using it as 'base'). My hope was that 2.3 had fixed the memory leak in beam search. In 2.3 it seems better, though it is still leaking - see bug ... (I need NER prediction confidence, and I really cannot reload the model in the prediction server every 500 or so requests).
However, with the -v option broken, my only option in 2.3 is using a -b (base) model, which has another problem: re-training can NOT introduce any new entity types; when I try, I get the old error:

File "ner.pyx", line 165, in spacy.syntax.ner.BiluoPushDown.lookup_transition
KeyError: "[E022] Could not find a transition with the name 'B-NAME_FROM' in the NER model."

Since bug 3782 has been closed, should I submit it again? Or is it an enhancement request "please allow adding new entity(ies) in re-training a base model"?

@adrianeboyd
Contributor

You shouldn't have to modify your CUDA/cupy installation when upgrading spacy. It should be relatively independent. You can uninstall cupy-cuda101 and spacy will then just run on CPU with numpy, and then reinstall cupy-cuda101 and spacy will detect it again if you enable the GPU.
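
To see which backend an environment would fall back to, one option is to check whether the cupy module is importable. This is a minimal sketch using only the standard library; the function names are hypothetical, and it only tests that cupy is installed, not that a working GPU is present.

```python
import importlib.util


def cupy_installed():
    """Return True if the cupy module can be found in this environment."""
    return importlib.util.find_spec("cupy") is not None


def describe_backend(use_gpu):
    """Say which backend a run would be expected to use."""
    if use_gpu and cupy_installed():
        return "GPU (cupy)"
    return "CPU (numpy)"
```

Inside spaCy itself, spacy.prefer_gpu() returns True when the GPU was activated and False when it fell back to CPU.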

The beam search memory leak should be fixed in 2.2.4.

If you'd like to use 2.3.0, you can apply this very short patch and install spacy from source (or just edit this one file in your spacy install) to fix the -v option: https://github.com/explosion/spaCy/pull/5624/files

If you'd like to use the train CLI -b option with new labels, you can load the model, add the new labels with nlp.get_pipe("ner").add_label("LABEL"), save it with nlp.to_disk(), and then use that as the base model. It's hard for the train CLI to cover every possible training scenario (it's already too complicated to be honest), but you can make a copy of spacy/cli/train.py, adjust the imports, and use it as an independent script so it's easy to edit however you need.
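
The label step described above could look roughly like this. It is a sketch rather than a tested recipe: the add_ner_labels helper and the base_with_labels directory name are hypothetical, while get_pipe, add_label and to_disk are the calls named in the comment. spaCy is imported only inside main() so the helper can be exercised without it.

```python
from pathlib import Path


def add_ner_labels(nlp, labels, output_dir):
    """Add entity labels to the pipeline's NER component and save the model."""
    ner = nlp.get_pipe("ner")
    for label in labels:
        ner.add_label(label)
    output_dir = Path(output_dir)
    # Create the target directory up front so nested paths also work.
    output_dir.mkdir(parents=True, exist_ok=True)
    nlp.to_disk(output_dir)
    return output_dir


def main():
    # Requires spaCy and a model that matches the installed spaCy version.
    import spacy

    nlp = spacy.load("en_core_web_md")
    # "base_with_labels" is a hypothetical output directory name.
    add_ner_labels(nlp, ["NAME_FROM", "NAME_TO"], "base_with_labels")
```

After calling main(), the saved directory could be passed to the train command's -b option in place of en_core_web_md.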

@mbrunecky
Author

Thank you. I will try to 'patch' the 2.3 ... the fix seems pretty simple/obvious.
That way I can keep using the vectors from spaCy models such as en_core_web_md.

I got the GPU working, but I hoped for a better performance boost (but I did not spend that much on the GPU either). Perhaps I can improve it with some parameter tuning. Right now Spacy reports about 30,000 GPU WPS, but the GPU utilization is very low, about 20% CUDA use on my GeForce GTX 1660 card. Looks like I will get 2x the speed...

"The beam search memory leak should be fixed in 2.2.4."
Looking at my server: after loading the model it uses ~1.2 GB, and after making 4000 page predictions it grows to ~2.7 GB. That is better than in 2.2.3 but still unacceptable. Is there a bug number for this problem?

And thank you very much for the advice on adding the labels. CLI 'train' too complicated? I guess 'complicated' is a relative term. I may try it, though I do not trust my Python skills yet. Perhaps, almost 20 years ago, when my co-worker Mark Lutz was enthusiastically writing the first Python book, I should have joined him :-).

@adrianeboyd
Contributor

In terms of the memory usage, maybe #5083 is relevant? The vocab in the model is not static, and the memory usage will grow to some extent as you use it on texts with tokens it hasn't seen before. If the memory usage doesn't look like it's explained by this, you can open a new issue with a minimal example that demonstrates the problem and we'll try to look into it. It could be related to the tee problem mentioned in #5083?
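
One way to see whether vocab growth accounts for the memory increase is to count vocab entries before and after a batch of predictions. This is a sketch with hypothetical helper names; it assumes only that len(nlp.vocab) and len(nlp.vocab.strings) report the number of lexemes and stored strings, and it does not measure memory directly.

```python
def vocab_sizes(nlp):
    """Return (number of lexemes, number of stored strings) for a pipeline."""
    return len(nlp.vocab), len(nlp.vocab.strings)


def measure_growth(nlp, texts):
    """Process texts and report how many lexemes and strings were added."""
    lex_before, str_before = vocab_sizes(nlp)
    for text in texts:
        nlp(text)
    lex_after, str_after = vocab_sizes(nlp)
    return lex_after - lex_before, str_after - str_before
```

If both counts level off while the process memory keeps climbing, the growth is probably not explained by the vocab.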

Most of the development has focused on efficient CPU implementations and the cupy/GPU implementation is not particularly efficient at this point. I think I normally see about a 2-4x difference between CPU and GPU, but it'll depend on your system. It's probably not going to be a huge difference.

With the train CLI there are so many possible combinations of options that it's hard to keep everything tested sufficiently. There are several configurations that we test thoroughly when training the provided models (we also use the train CLI directly for internal training), but some like this -v bug still end up falling through the cracks. The current train CLI is going to be replaced with training from config files in spacy v3, which should hopefully be easier to maintain.

@mbrunecky
Author

mbrunecky commented Jun 25, 2020 via email

@github-actions
Contributor

github-actions bot commented Nov 4, 2021

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 4, 2021