KeyError when adding special tokens #624

Closed
fmfn opened this issue Nov 11, 2016 · 5 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

@fmfn

fmfn commented Nov 11, 2016

I am trying to run the example of adding special tokens to the tokenizer and am getting the following KeyError:

<ipython-input-3-3c4362d5406f> in <module>()
      8             POS: u'VERB'},
      9         {
---> 10             ORTH: u'me'}])
     11 assert [w.text for w in nlp(u'gimme that')] == [u'gim', u'me', u'that']
     12 assert [w.lemma_ for w in nlp(u'gimme that')] == [u'give', u'-PRON-', u'that']

/Users/<user>/venvs/general/lib/python3.5/site-packages/spacy/tokenizer.pyx in spacy.tokenizer.Tokenizer.add_special_case (spacy/tokenizer.cpp:8460)()

/Users/<user>/venvs/general/lib/python3.5/site-packages/spacy/vocab.pyx in spacy.vocab.Vocab.make_fused_token (spacy/vocab.cpp:7879)()

KeyError: 'F'

The code used is the following:

import spacy
from spacy.attrs import ORTH, POS, LEMMA

nlp = spacy.load("en", parser=False)

assert [w.text for w in nlp(u'gimme that')] == [u'gimme', u'that']
nlp.tokenizer.add_special_case(u'gimme',
    [
        {
            ORTH: u'gim',
            LEMMA: u'give',
            POS: u'VERB'},
        {
            ORTH: u'me'}])
assert [w.text for w in nlp(u'gimme that')] == [u'gim', u'me', u'that']
assert [w.lemma_ for w in nlp(u'gimme that')] == [u'give', u'-PRON-', u'that']

Am I missing something here?

System info:

  • macOS
  • Python 3.5.2
  • spaCy 1.2.0
@honnibal honnibal added the bug Bugs and behaviour differing from documentation label Nov 11, 2016
@honnibal
Member

Sorry about this — the docs got a bit ahead of the code here. The docs describe how the feature should work, and will work shortly (I'll probably fix it over the weekend).

At the moment you can use the key "F" instead of ORTH, "L" instead of LEMMA, and "pos" instead of POS.

@fmfn
Author

fmfn commented Nov 11, 2016

Nice!

I got it to work by tracing the make_fused_token method and working backwards, which led me to pass 'F', but 'L' and 'P' were better hidden.

Thanks for the lightning reply and superb work.

@fmfn
Author

fmfn commented Nov 11, 2016

After changing it to:

nlp.tokenizer.add_special_case(
    u'gimme',
    [
        {
            "F": u'gim',
            "L": u'give',
            "pos": u'VERB'
        },
        {
            "F": u'me',
        }
    ]
)

I get:

KeyError                                  Traceback (most recent call last)
<ipython-input-6-df7b9eb25a34> in <module>()
      8         },
      9         {
---> 10             "F": u'me',
     11         }
     12     ]

/Users/<>/venvs/general/lib/python3.5/site-packages/spacy/tokenizer.pyx in spacy.tokenizer.Tokenizer.add_special_case (spacy/tokenizer.cpp:8460)()

/Users/<>/venvs/general/lib/python3.5/site-packages/spacy/vocab.pyx in spacy.vocab.Vocab.make_fused_token (spacy/vocab.cpp:7907)()

/Users/<>/venvs/general/lib/python3.5/site-packages/spacy/morphology.pyx in spacy.morphology.Morphology.assign_tag (spacy/morphology.cpp:3919)()

KeyError: 97

@honnibal
Member

This should now be fixed on master. Thanks for your patience.
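
For anyone landing here later, here is a minimal sketch of the originally documented form that the fix restores, i.e. the snippet from the report above using the ORTH/LEMMA/POS constants from spacy.attrs. It assumes spaCy 1.x with the English model installed, as in the original report:

import spacy
from spacy.attrs import ORTH, LEMMA, POS

# Load the English model; the parser isn't needed for tokenizer special cases.
nlp = spacy.load("en", parser=False)

# Register a special case so "gimme" is split into "gim" + "me",
# attaching a lemma and a coarse POS tag to the first piece.
nlp.tokenizer.add_special_case(u'gimme', [
    {ORTH: u'gim', LEMMA: u'give', POS: u'VERB'},
    {ORTH: u'me'},
])

assert [w.text for w in nlp(u'gimme that')] == [u'gim', u'me', u'that']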

@lock

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018