KeyError when adding special tokens #624

Closed
fmfn opened this issue Nov 11, 2016 · 5 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

@fmfn

fmfn commented Nov 11, 2016

I am trying to run the example of adding special tokens to the tokenizer and am getting the following KeyError:

<ipython-input-3-3c4362d5406f> in <module>()
      8             POS: u'VERB'},
      9         {
---> 10             ORTH: u'me'}])
     11 assert [w.text for w in nlp(u'gimme that')] == [u'gim', u'me', u'that']
     12 assert [w.lemma_ for w in nlp(u'gimme that')] == [u'give', u'-PRON-', u'that']

/Users/<user>/venvs/general/lib/python3.5/site-packages/spacy/tokenizer.pyx in spacy.tokenizer.Tokenizer.add_special_case (spacy/tokenizer.cpp:8460)()

/Users/<user>/venvs/general/lib/python3.5/site-packages/spacy/vocab.pyx in spacy.vocab.Vocab.make_fused_token (spacy/vocab.cpp:7879)()

KeyError: 'F'

The code used is the following:

import spacy
from spacy.attrs import ORTH, POS, LEMMA

nlp = spacy.load("en", parser=False)

assert [w.text for w in nlp(u'gimme that')] == [u'gimme', u'that']
nlp.tokenizer.add_special_case(u'gimme',
    [
        {
            ORTH: u'gim',
            LEMMA: u'give',
            POS: u'VERB'},
        {
            ORTH: u'me'}])
assert [w.text for w in nlp(u'gimme that')] == [u'gim', u'me', u'that']
assert [w.lemma_ for w in nlp(u'gimme that')] == [u'give', u'-PRON-', u'that']

Am I missing something here?

System info:

  • macOS
  • Python 3.5.2
  • spaCy 1.2.0
@honnibal honnibal added the bug Bugs and behaviour differing from documentation label Nov 11, 2016
@honnibal
Member

Sorry about this — the docs got a bit ahead of the code here. The docs describe how the feature should work, and will work shortly (I'll probably fix it over the weekend).

At the moment you can use the key "F" instead of ORTH, "L" instead of LEMMA, and "pos" instead of POS.

@fmfn
Author

fmfn commented Nov 11, 2016

Nice!

I got it to work by tracing the make_fused_token method and working backwards, which led me to pass 'F', but 'L' and 'P' were better hidden.

Thanks for the lightning reply and superb work.

@fmfn
Author

fmfn commented Nov 11, 2016

After changing it to:

nlp.tokenizer.add_special_case(
    u'gimme',
    [
        {
            "F": u'gim',
            "L": u'give',
            "pos": u'VERB'
        },
        {
            "F": u'me',
        }
    ]
)

I get:

KeyError                                  Traceback (most recent call last)
<ipython-input-6-df7b9eb25a34> in <module>()
      8         },
      9         {
---> 10             "F": u'me',
     11         }
     12     ]

/Users/<>/venvs/general/lib/python3.5/site-packages/spacy/tokenizer.pyx in spacy.tokenizer.Tokenizer.add_special_case (spacy/tokenizer.cpp:8460)()

/Users/<>/venvs/general/lib/python3.5/site-packages/spacy/vocab.pyx in spacy.vocab.Vocab.make_fused_token (spacy/vocab.cpp:7907)()

/Users/<>/venvs/general/lib/python3.5/site-packages/spacy/morphology.pyx in spacy.morphology.Morphology.assign_tag (spacy/morphology.cpp:3919)()

KeyError: 97

@honnibal
Member

This should now be fixed on master. Thanks for your patience.
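
For anyone landing here later, here is a minimal sketch of the originally documented form that the fix restores, i.e. the snippet from the report above using the ORTH/LEMMA/POS constants from spacy.attrs. It assumes spaCy 1.x with the English model installed, as in the original report:

import spacy
from spacy.attrs import ORTH, LEMMA, POS

# Load the English model; the parser isn't needed for tokenizer special cases.
nlp = spacy.load("en", parser=False)

# Register a special case so "gimme" is split into "gim" + "me",
# attaching a lemma and a coarse POS tag to the first piece.
nlp.tokenizer.add_special_case(u'gimme', [
    {ORTH: u'gim', LEMMA: u'give', POS: u'VERB'},
    {ORTH: u'me'},
])

assert [w.text for w in nlp(u'gimme that')] == [u'gim', u'me', u'that']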

@lock

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018