# Modify spaCy Lemmas

  * spaCy uses a simple dictionary lookup to find the lemma of a word.
  * Since version 2.2, spaCy's lemmas are stored in a binary file as part of the model.
  * The optional package `spacy-lookups-data` can be installed to get the lemma dictionaries as files that can be modified.
  * The lemma lookup table of a model can be replaced by custom lemmas.
  
See also: https://github.com/explosion/spaCy/issues/2668
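
The lookup idea from the list above can be illustrated with a plain Python dictionary. This is a toy sketch, not spaCy's implementation: `lookup_lemma` and `toy_table` are hypothetical names, the table entries are made up for the example, and the fallback to the unchanged word for missing entries is an assumption about how a lookup-based lemmatizer typically behaves.

```python
def lookup_lemma(word, table):
    """Return the lemma stored for `word`, or `word` itself if there is no entry."""
    return table.get(word, word)


# Hypothetical toy table mapping word forms to lemmas
toy_table = {"wohnt": "wohnen", "war": "sein", "Häuser": "Haus"}

tokens = ["Dieser", "Gärtner", "wohnt", "im", "Haus"]
lemmas = [lookup_lemma(token, toy_table) for token in tokens]
print(" ".join(lemmas))  # prints: Dieser Gärtner wohnen im Haus
```

Because each word form maps to exactly one lemma regardless of context, correcting a wrong lemma in such a scheme comes down to changing a single dictionary entry.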
  
To modify the lemmas, follow these steps:

**1: Install spacy-lookups-data**

```
    conda install spacy-lookups-data
```

**2: Edit the lemma dictionary**

Find the location of the lemma file, e.g. for 'de':

```python
import spacy_lookups_data

fname = spacy_lookups_data.de['lemma_lookup']
print(f"Lemma file: {fname}")
```

```
Lemma file: C:\Users\Jens\Anaconda3\lib\site-packages\spacy_lookups_data\data\de_lemma_lookup.json
```


  * Go to that directory.
  * Unzip the file you want to change. (The files are compressed with gzip; if yours already ends in `.json`, as in the output above, skip this step.)
  * Open it and edit the lemmas you want to change (or apply the changes on the fly as shown below).
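
The manual edit described above can also be scripted. The following is a minimal sketch using only the standard library; `load_lemma_file` and `save_lemma_file` are hypothetical helpers (not part of spaCy or `spacy-lookups-data`), and the assumption is that a lemma file is a flat JSON object of word-to-lemma pairs that is gzip-compressed when its name ends in `.gz`. The demo works on a temporary file with a made-up entry rather than on the installed package data.

```python
import gzip
import json
import tempfile
from pathlib import Path


def load_lemma_file(path):
    """Read a lemma table from a .json or .json.gz file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def save_lemma_file(table, path):
    """Write a lemma table back, compressing it if the name ends in .gz."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "wt", encoding="utf-8") as f:
        json.dump(table, f, ensure_ascii=False)


# Demo on a temporary file; the wrong lemma "Sonn" is invented for illustration
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_lemma_lookup.json.gz"
    save_lemma_file({"Sonne": "Sonn"}, demo_path)

    table = load_lemma_file(demo_path)
    table["Sonne"] = "Sonne"  # correct the entry
    save_lemma_file(table, demo_path)

    reloaded = load_lemma_file(demo_path)
```

Before overwriting a file inside `site-packages`, it is probably wise to keep a backup copy, since reinstalling or upgrading the package would replace the edited file.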
  
**3: Load the nlp object with your custom lemmas**

```python
import json

import spacy
import spacy_lookups_data

fname = spacy_lookups_data.de['lemma_lookup']

# Read the lemma dictionary; explicit UTF-8 keeps umlauts intact on every platform
with open(fname, 'r', encoding='utf-8') as file:
    lemma_lookup = json.load(file)

# optional: correct lemmas on the fly
lemma_lookup['Sonne'] = 'Sonne'

nlp = spacy.load('de')

# Swap the model's lookup table for the custom one.
# Must be executed before the first call to nlp.
_ = nlp.vocab.lookups.remove_table('lemma_lookup')
_ = nlp.vocab.lookups.add_table('lemma_lookup', lemma_lookup)
```
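
When more than a handful of lemmas need correcting, it may be convenient to keep the corrections in a separate dictionary and merge them into the lookup table before it is added to the model. The following is a minimal sketch under that assumption; `apply_overrides` and the sample data are hypothetical and not part of spaCy or `spacy-lookups-data`.

```python
def apply_overrides(lemma_lookup, overrides):
    """Return a copy of `lemma_lookup` with `overrides` applied,
    plus a report of the entries whose lemma actually changed."""
    merged = dict(lemma_lookup)
    changed = {}
    for word, lemma in overrides.items():
        old = merged.get(word)  # None if the word had no entry yet
        if old != lemma:
            changed[word] = (old, lemma)
        merged[word] = lemma
    return merged, changed


# Made-up sample data standing in for the real lemma table
sample_lookup = {"Sonne": "Sonn", "wohnt": "wohnen"}
overrides = {"Sonne": "Sonne", "wohnt": "wohnen", "Gärtner": "Gärtner"}

merged, changed = apply_overrides(sample_lookup, overrides)
for word, (old, new) in changed.items():
    print(f"{word}: {old!r} -> {new!r}")
```

The merged dictionary could then be passed to `nlp.vocab.lookups.add_table('lemma_lookup', merged)` in place of `lemma_lookup`, and the `changed` report gives a quick record of which entries differ from the shipped table.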

**4: Test the changes**

```python
texts = ["Ich sonne mich in der Sonne.",
         "Dieser Gärtner wohnt im Haus.",
         "Das war ein abendfüllendes Programm."]

for text in texts:
    doc = nlp(text)
    print(" ".join([t.lemma_ for t in doc]))
```

```
Ich sonne sich in der Sonne .
Dieser Gärtner wohnen im Haus .
der sein einen abendfüllend Programm .
```
