v2: Take existing model and retrain NER #1130

Closed
kootenpv opened this issue Jun 13, 2017 · 6 comments
Labels
🌙 nightly Discussion and contributions related to nightly builds usage General spaCy usage

Comments

@kootenpv
Contributor

kootenpv commented Jun 13, 2017

I've seen https://alpha.spacy.io/docs/usage/training-ner and I really like it.

I was going to try to take the existing alpha model (which already contains deps/NER), and I was hoping it is possible to train a few iterations over a small set of my data.

As mentioned in the documentation, it is advised to annotate the data with one model and then overwrite some things (rather than taking a loaded model as a starting point; I tried that as well, but it does not train that way either).

So, I made sure tags, heads and deps are correct in reformat_train_data, on a small data set, and tried to train a model from scratch, changing the code to:

nlp = English(pipeline=['tensorizer', 'tagger', 'parser', 'ner'])

I expected that then everything should work automatically.

The error:

<ipython-input-161-08855558d30e> in main(model_dir)
     12         return reformat_train_data(nlp.tokenizer, train_data)
     13 
---> 14     optimizer = nlp.begin_training(get_data)
     15 
     16     for itn in range(100):

~/python/lib/python3.7/site-packages/spacy/language.py in begin_training(self, get_gold_tuples, **cfg)
    336             if hasattr(proc, 'begin_training'):
    337                 context = proc.begin_training(get_gold_tuples(),
--> 338                                               pipeline=self.pipeline)
    339                 contexts.append(context)
    340         learn_rate = util.env_opt('learn_rate', 0.001)

~/python/lib/python3.7/site-packages/spacy/pipeline.pyx in spacy.pipeline.NeuralTagger.begin_training (spacy/pipeline.cpp:13901)()

~/python/lib/python3.7/site-packages/spacy/morphology.pyx in spacy.morphology.Morphology.__init__ (spacy/morphology.cpp:4655)()

~/python/lib/python3.7/site-packages/spacy/morphology.pyx in spacy.morphology.Morphology.add_special_case (spacy/morphology.cpp:5625)()

KeyError: 4062917326063685704

Am I trying something that has no chance to work?

The normal example works fine; it just seems that introducing the tagger and parser into the pipeline does not currently work?

@twielfaert

Reminds me of #1052 in v1.x, which was caused by a missing mapping in the tag_map.
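
For anyone hitting the same KeyError, here is a minimal sketch of how one might check training data for tags that have no entry in the tag map. It is plain Python and does not call spaCy: the helper `find_unmapped_tags`, the toy `tag_map` and the `(words, tags)` layout of `train_data` are all assumptions made for illustration, not spaCy's actual data structures.

```python
# Hypothetical helper, not part of spaCy: lists tags used in the training
# data that have no entry in the tag map.
def find_unmapped_tags(train_data, tag_map):
    """Return a sorted list of tags in train_data that are missing from tag_map."""
    missing = set()
    for _words, tags in train_data:
        for tag in tags:
            if tag not in tag_map:
                missing.add(tag)
    return sorted(missing)


# Toy tag map (fine-grained tag -> attributes); the real one is much larger.
tag_map = {
    "DT": {"pos": "DET"},
    "NN": {"pos": "NOUN"},
    "VBZ": {"pos": "VERB"},
}

# Assumed layout: one (words, tags) pair per sentence.
train_data = [
    (["The", "cat", "sleeps"], ["DT", "NN", "VBZ"]),
    (["Cats", "sleep"], ["NNS", "VBP"]),
]

missing_tags = find_unmapped_tags(train_data, tag_map)
print(missing_tags)  # ['NNS', 'VBP']
```

If the list is non-empty, either the tags in the training data or the tag map would need adjusting before training the tagger.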

@honnibal honnibal added the usage General spaCy usage label Jun 15, 2017
@honnibal
Member

I think @twielfaert's analysis sounds correct -- likely something missing from the tag map. By the way, the capability to add new entity labels to a pre-trained model is temporarily not working. We need to be able to resize the output weights, which isn't wired up yet.
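
To illustrate what resizing the output weights means in general, here is a rough sketch in plain Python. It is not spaCy's implementation. It assumes a simple linear output layer with one weight row and one bias per label, and the function names `resize_output_layer` and `scores` are hypothetical. New labels get zero-initialised rows, so the scores for existing labels stay the same.

```python
# Generic illustration only; spaCy's models are structured differently.
def resize_output_layer(weights, biases, n_new_labels):
    """Return copies of weights and biases with extra zero rows for new labels."""
    n_hidden = len(weights[0]) if weights else 0
    new_weights = [row[:] for row in weights]
    new_biases = biases[:]
    for _ in range(n_new_labels):
        new_weights.append([0.0] * n_hidden)
        new_biases.append(0.0)
    return new_weights, new_biases


def scores(weights, biases, hidden):
    """Compute one linear score per label for a hidden vector."""
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, biases)
    ]


# Two existing labels, hidden size 2.
weights = [[0.5, -1.0], [2.0, 0.25]]
biases = [0.1, -0.2]
hidden = [1.0, 2.0]

old_scores = scores(weights, biases, hidden)
new_weights, new_biases = resize_output_layer(weights, biases, 1)
new_scores = scores(new_weights, new_biases, hidden)

# The first two scores match old_scores; the new label starts at 0.0.
print(len(new_weights), new_scores[2])
```

The new row only becomes useful after further training on examples that carry the new label.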

@kootenpv
Contributor Author

@abhishekgupta10 Just follow what it says here: https://alpha.spacy.io/docs/usage/training-ner

@mikeatm

mikeatm commented Jul 8, 2017

@honnibal Is there a specific issue tracking the capability to add new entity labels to pre-trained models for v2?
It's a feature that I'm looking forward to.

@ines ines added the 🌙 nightly Discussion and contributions related to nightly builds label Oct 16, 2017
@ines
Member

ines commented Oct 27, 2017

Sorry about the messy training examples and docs! I spent the past few days going over all examples, cleaning them up and adding more documentation.

Here's the new training examples directory:
https://github.com/explosion/spaCy/tree/develop/examples/training

The current state only works with the spaCy version on develop – which will be released as soon as the new models are done training. The new docs are already in the website directory on develop, but not live yet, since we want to push the new version first.

(Unless there are serious bugs or problems, the upcoming alpha version will probably also be the version we'll promote to the release candidate 🎉 )

@ines ines closed this as completed Oct 27, 2017
@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 8, 2018