How to save and load EntityRecognizer? #1174

Closed
huangenyan opened this issue Jul 4, 2017 · 5 comments
Labels: usage (General spaCy usage)

Comments

@huangenyan

I have trained my own EntityRecognizer, but EntityRecognizer does not currently provide from_disk and to_disk methods. How can I save my trained EntityRecognizer and reuse it later?

@huangenyan
Author

BTW, EntityRecognizer.load raises the following error, and I don't know why.

Traceback (most recent call last):
  File "/Users/me/Developer/PycharmProjects/CardTrainNER/usemodel.py", line 36, in <module>
    trained_ner = load_ner_model(nlp.vocab, './data/ner')
  File "/Users/me/Developer/PycharmProjects/CardTrainNER/usemodel.py", line 31, in load_ner_model
    return EntityRecognizer.load(path=Path(path), vocab=vocab, require=True)
  File "spacy/syntax/parser.pyx", line 155, in spacy.syntax.parser.Parser.load (spacy/syntax/parser.cpp:7600)
  File "spacy/syntax/parser.pyx", line 178, in spacy.syntax.parser.Parser.__init__ (spacy/syntax/parser.cpp:7971)
  File "spacy/syntax/ner.pyx", line 84, in spacy.syntax.ner.BiluoPushDown.get_actions (spacy/syntax/ner.cpp:4379)
KeyError: 1

@huangenyan
Author

I just figured out how to load the model.
In spacy/syntax/ner.pyx, in cdef class BiluoPushDown

    @classmethod
    def get_actions(cls, **kwargs):
        actions = kwargs.get('actions',
                    {
                        MISSING: [''],
                        BEGIN: [],
                        IN: [],
                        LAST: [],
                        UNIT: [],
                        OUT: ['']
                    })
        seen_entities = set()
        for entity_type in kwargs.get('entity_types', []):
            if entity_type in seen_entities:
                continue
            seen_entities.add(entity_type)
            for action in (BEGIN, IN, LAST, UNIT):
                actions[action].append(entity_type)
...

You can change the last line from actions[action].append(entity_type) to actions[str(action)].append(entity_type) to make loading work. However, this change triggers an error when you save the model, so you need the original code for saving a model and the modified version for loading one. The key type changes from int to str when loading, which is a really weird bug...

I'm not an expert on Cython. I hope someone can help fix this.
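
For anyone hitting the same KeyError: an int-to-str key change like this is what a JSON round trip produces, because JSON object keys are always strings. Whether that is the actual cause in spaCy 1.x is an assumption, not something confirmed in this thread. The sketch below uses made-up integer values for the action constants and a hypothetical helper, normalize_action_keys, to show the effect and a way to convert the keys back so the same lookup works for both saving and loading.

```python
import json

# Assumed stand-ins for the integer action IDs in ner.pyx; the real
# values are defined in spaCy's Cython source.
MISSING, BEGIN, IN, LAST, UNIT, OUT = range(6)


def default_actions():
    """Build the same default table that get_actions() starts from."""
    return {MISSING: [''], BEGIN: [], IN: [], LAST: [], UNIT: [], OUT: ['']}


def normalize_action_keys(actions):
    """Hypothetical helper: return a copy whose keys are ints,
    whether they arrived as ints or as strs."""
    return {int(key): list(values) for key, values in actions.items()}


# Simulate saving the table to JSON and loading it again.
loaded = json.loads(json.dumps(default_actions()))

# Every key is now a str, so loaded[BEGIN] would raise KeyError.
keys_are_strings = all(isinstance(key, str) for key in loaded)

# Converting the keys back makes integer lookups work again.
fixed = normalize_action_keys(loaded)
fixed[BEGIN].append('PERSON')
```

With the keys normalized once after loading, the original actions[action].append(entity_type) line would not need two different versions.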

honnibal added the "usage" label (General spaCy usage) on Jul 6, 2017
@honnibal
Member

honnibal commented Jul 6, 2017

Saving and loading was definitely a weak point in the 1.x versions. It's much better in 2.x, so you might want to have a look at the alpha version of that. The stable release of 2.0 should be out in early August.

For now: have a look here for an example of saving the entity recognizer:

https://github.com/explosion/spaCy/blob/master/spacy/language.py#L355

The part that's confusing is that you need to save out the vocab and its member vocab.strings as well, because you need the string-to-int mapping at test-time to match what you had during training. (This is fixed in v2)
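
To illustrate why the string-to-int mapping has to be saved along with the model, here is a toy sketch. ToyStringStore is a hypothetical stand-in written for this example, not spaCy's StringStore, and the labels CARD and BANK are made up. It only shows that IDs assigned in a different order no longer match the ones used during training, while a table restored from the saved strings does.

```python
import json


class ToyStringStore:
    """Toy string-to-int table (hypothetical, not spaCy's StringStore)."""

    def __init__(self, strings=()):
        self._ids = {}
        self._strings = []
        for string in strings:
            self.add(string)

    def add(self, string):
        """Return the ID for `string`, assigning the next free one if new."""
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def dumps(self):
        """Serialize the strings in ID order."""
        return json.dumps(self._strings)

    @classmethod
    def loads(cls, data):
        """Rebuild a table with the same IDs from dumps() output."""
        return cls(json.loads(data))


# "Training": labels get IDs in the order they are first seen.
train_store = ToyStringStore()
card_id = train_store.add('CARD')
bank_id = train_store.add('BANK')

# A fresh table filled in a different order disagrees about the IDs.
fresh_store = ToyStringStore(['BANK', 'CARD'])

# A table restored from the saved strings agrees with training.
restored_store = ToyStringStore.loads(train_store.dumps())
```

A model that learned weights against card_id would look up the wrong label with fresh_store, which is the mismatch the vocab files are there to prevent.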

@asyd

asyd commented Jul 9, 2017

Is it already fixed in v2? I used this sample to train NER, but when I use from_disk I also get an exception related to vocab:

import spacy
print(spacy.__version__)
nlp = spacy.load('en_core_web_sm')
nlp.from_disk('/home/asyd/Work/AI/nlp/data/model')

2.0.0a0
[..]

No such file or directory: '/home/asyd/Work/AI/nlp/data/model/tagger/tag_map'
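
One way to narrow down an error like this is to check which files the saved directory actually contains before calling from_disk. A minimal sketch using only the standard library: the helper name missing_model_files is hypothetical, and the default expected path is simply the one named in the error message above, since the full set of files spaCy expects is not listed in this thread.

```python
from pathlib import Path


def missing_model_files(model_dir, expected=("tagger/tag_map",)):
    """Hypothetical helper: return the expected relative paths that
    do not exist under model_dir."""
    root = Path(model_dir)
    return [rel for rel in expected if not (root / rel).exists()]
```

If the list is non-empty, the directory was probably saved without those components, for example by saving only the entity recognizer rather than the whole pipeline.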

@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked this as resolved and limited conversation to collaborators on May 8, 2018