# <span style="color:blue"> B. Specific details for programmers: how it works</span>

## <span style="color:purple">Morphological analysis</span>

During the (full) morphological analysis, words will be segmented into morphemes and related linguistic units (e.g. roots/lemmas, suffixes), grammatical categories related to these morphemes (such as part of speech, form name) will be determined, and, in case of ambiguities, correct analyses for each word will be chosen based on contextual cues.

The core of morphological analysis is the VabamorfTagger. VabamorfTagger uses `'words'` and `'sentences'` as input layers, and tags the `'morph_analysis'` layer on the `'words'` layer. Morphological processing is done sentence-by-sentence, so the information about sentence boundaries is also required. 

In [1]:
from estnltk import Text
from estnltk.taggers import VabamorfTagger

# Initialize new morphological analyser
morph_tagger = VabamorfTagger()

# Prepare text
text=Text("Tõmba äpp kohe alla!")
# Tag layers required by morph analysis
text.tag_layer(['words','sentences'])

# Tag morph analysis
morph_tagger.tag( text )

# Results
text.morph_analysis

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,5

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Tõmba,Tõmba,tõmbama,tõmba,['tõmba'],0.0,,o,V
äpp,äpp,äpp,äpp,['äpp'],0.0,,sg n,S
kohe,kohe,kohe,kohe,['kohe'],0.0,,,D
alla,alla,alla,alla,['alla'],0.0,,,D
!,!,!,!,['!'],,,,Z


Note that unlike previous layers (sentences, paragraphs), the morphological analysis layer is ambiguous. This means that one word can have more than one analysis. While VabamorfTagger also includes a disambiguation step, some linguistically difficult cases still remain ambiguous even after the disambiguation.

#### Using the lexicon extended with slang words

The parameter `slang_lex` switches on an extended version of Vabamorf's lexicon, which contains extra entries for analysing the most common spoken-language and slang words. Example usage:

In [2]:
from estnltk import Text
from estnltk.taggers import VabamorfTagger

# Initialize new morphological analyser that uses lexicon extended with slang words
morph_tagger = VabamorfTagger(slang_lex=True)

# Prepare text
text=Text("Ok, pandikunn läks kidraga restosse saundi tekitama.")
# Tag layers required by morph analysis
text.tag_layer(['words','sentences'])

# Tag morph analysis
morph_tagger.tag( text )

# Results
text.morph_analysis

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,9

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Ok,Ok,ok,ok,['ok'],0,,,I
",",",",",",",","[',']",,,,Z
pandikunn,pandikunn,pandikunn,pandi_kunn,"['pandi', 'kunn']",0,,sg n,S
läks,läks,minema,mine,['mine'],s,,s,V
kidraga,kidraga,kidra,kidra,['kidra'],ga,,sg kom,S
restosse,restosse,resto,resto,['resto'],sse,,sg ill,S
saundi,saundi,saund,saund,['saund'],0,,sg p,S
tekitama,tekitama,tekitama,tekita,['tekita'],ma,,ma,V
,tekitama,tekkima,tekki,['tekki'],tama,,tama,V
.,.,.,.,['.'],,,,Z


More technically: the standard lexicon excludes so-called `nosp` ('not for speller') words, but the slang lexicon includes them. A detailed description of the types of words included in Vabamorf's lexicon can be found at [https://github.com/Filosoft/vabamorf/blob/master/doc/morfi_leksikoni_kirjeldus.html](https://nbviewer.jupyter.org/github/Filosoft/vabamorf/blob/master/doc/morfi_leksikoni_kirjeldus.html) (in Estonian). The section _"Changing Vabamorf's binary lexicons"_ below shows how to manually change Vabamorf's lexicons.

#### VabamorfTagger's components

Under the hood, VabamorfTagger applies the following processing steps:

  1. **Morphological analysis** of all the words in the input text, including guessing analyses for unknown words and proper names. This is done by applying a special tagger called `VabamorfAnalyzer`;
  
  2. **Post-processing of analyses**, which includes correcting words that contain numbers, correcting the part of speech of compound tokens (such as names with initials, emoticons, abbreviations, numerics etc.), and marking some of the words to be ignored during the morphological disambiguation. This is done by `PostMorphAnalysisTagger`;

  3. (optional) **Text-based pre-disambiguation of proper nouns**, which involves disambiguating proper name analyses based on lemma counts in the whole text. For this, a special component `CorpusBasedMorphDisambiguator` is applied. By default, this step is _disabled_, but it can be enabled by setting `predisambiguate=True`;

  4. **Morphological disambiguation**, which involves picking out the most probable analyses for each word (based on the sentence context). This is done by a special tagger called `VabamorfDisambiguator`;
  
  5. (optional) **Text-based post-disambiguation**, which involves disambiguating remaining ambiguous analyses based on lemma counts in the whole text. For this, a special component `CorpusBasedMorphDisambiguator` is applied. By default, this step is _disabled_, but it can be enabled by setting `postdisambiguate=True`;

  6. **Reordering ambiguities**, which involves sorting remaining ambiguous analyses by their corpus frequency, using the frequencies [obtained from the Estonian UD corpus](https://github.com/estnltk/ambiguous-morph-reordering/). This is done by a special tagger called `MorphAnalysisReorderer`.

If you need to morphologically analyse non-standard varieties of Estonian (such as Internet slang, transcripts of spoken language, or texts written in a dialect), you may want to decompose the morphological analysis process into substeps, make step-wise adaptations, and finally tie these adaptations together into a custom morphological analyser. For these purposes, we will also introduce the sub-components of the morphological analyser.

### VabamorfAnalyzer

VabamorfAnalyzer generates morphological analyses for all the words in the input text. It also uses special heuristics to guess analyses for unknown words and proper names.

VabamorfAnalyzer also takes into account the word normalization provided in the `'words'` layer: if a word's attribute `'normalized_form'` contains a normalization for the word, then this normalization is used as the input to morphological analysis. If there is more than one `'normalized_form'`, then analyses will be generated for all of the normalized forms.

In [3]:
from estnltk import Text
from estnltk.taggers import VabamorfAnalyzer

# Initialize morphological analyser
morph_analyzer = VabamorfAnalyzer()

text=Text("Lae 2pp kohe alla!")
# Tag layers required by morph analysis
text.tag_layer(['words','sentences'])

# Tag morph analysis
morph_analyzer.tag( text )

text
Lae 2pp kohe alla!

layer name,attributes,parent,enveloping,ambiguous,span count
sentences,,,words,False,1
tokens,,,,False,5
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,True,5
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,5


Examining the results, we can note that the resulting layer is ambiguous, and contains guesses:

In [4]:
text.morph_analysis

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,5

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Lae,Lae,Laad,Laad,['Laad'],0.0,,sg g,H
,Lae,Lae,Lae,['Lae'],0.0,,sg g,H
,Lae,Lae,Lae,['Lae'],0.0,,sg n,H
,Lae,Lagi,Lagi,['Lagi'],0.0,,sg g,H
,Lae,laad,laad,['laad'],0.0,,sg g,S
,Lae,laadima,laadi,['laadi'],0.0,,o,V
,Lae,lagi,lagi,['lagi'],0.0,,sg g,S
2pp,2pp,2pp,2pp,['2pp'],0.0,,?,Y
kohe,kohe,kohe,kohe,['kohe'],0.0,,sg n,A
,kohe,kohe,kohe,['kohe'],0.0,,,D


Morphological categories (possible values for `partofspeech` and `form`) used in analyses are described [here](https://github.com/Filosoft/vabamorf/blob/master/doc/tagset.html) and [here](http://www.filosoft.ee/html_morf_et/morfoutinfo.html#2) (in Estonian).

Notes about the attribute `normalized_text`:

   * In usual situations (when _word normalization_ is not attempted), `normalized_text` equals `text`;

   * However, if you have modified the `'words'` layer (e.g. by adding more than one `'normalized_form'` for a word), then the attribute `normalized_text` indicates which `'normalized_form'` (or the surface form) was used as the basis for generating the current analysis.

#### Analysis parameters

At the initialization of VabamorfAnalyzer, you can also change the analysis parameters: set boolean values for keyword arguments `propername`, `guess`, `compound` and `phonetic`.

Parameters `guess` and `propername` can be used to switch guessing off:

In [5]:
# Remove old morph analysis layer
text.pop_layer('morph_analysis')

# Initialize morphological analyser that does not guess proper names
morph_analyzer = VabamorfAnalyzer(propername=False)

# Tag new morph analysis
morph_analyzer.tag( text )

text.morph_analysis

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,5

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Lae,Lae,laad,laad,['laad'],0.0,,sg g,S
,Lae,laadima,laadi,['laadi'],0.0,,o,V
,Lae,lagi,lagi,['lagi'],0.0,,sg g,S
2pp,2pp,2pp,2pp,['2pp'],0.0,,?,Y
kohe,kohe,kohe,kohe,['kohe'],0.0,,sg n,A
,kohe,kohe,kohe,['kohe'],0.0,,,D
alla,alla,alla,alla,['alla'],0.0,,,D
,alla,alla,alla,['alla'],0.0,,,K
!,!,!,!,['!'],,,,Z


In [6]:
# Remove old morph analysis layer
text.pop_layer('morph_analysis')

# Initialize morphological analyser that does not guess unknown words nor proper names
morph_analyzer = VabamorfAnalyzer(guess=False, propername=False)

# Tag new morph analysis
morph_analyzer.tag( text )

text.morph_analysis

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,5

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Lae,Lae,laad,laad,['laad'],0.0,,sg g,S
,Lae,laadima,laadi,['laadi'],0.0,,o,V
,Lae,lagi,lagi,['lagi'],0.0,,sg g,S
2pp,,,,,,,,
kohe,kohe,kohe,kohe,['kohe'],0.0,,sg n,A
,kohe,kohe,kohe,['kohe'],0.0,,,D
alla,alla,alla,alla,['alla'],0.0,,,D
,alla,alla,alla,['alla'],0.0,,,K
!,,,,,,,,


After guessing has been switched off, unknown words receive analyses where all attribute values are set to `None`. It is important to keep in mind that before the text can be morphologically disambiguated, these `None`-analyses must be replaced with proper analyses, or otherwise, the disambiguator throws an exception.
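
Before disambiguation, it can be useful to list the words that still lack proper analyses. The following is a minimal sketch in plain Python, not EstNLTK code: it assumes the analyses have been copied into a list of `(word, analyses)` pairs, where each analysis is a dict of attribute values. This data structure and the helper name `find_unanalysed_words` are hypothetical and chosen only for illustration.

```python
# Attribute names of the 'morph_analysis' layer, as shown in the outputs above
ATTRIBUTES = ('normalized_text', 'lemma', 'root', 'root_tokens',
              'ending', 'clitic', 'form', 'partofspeech')

def find_unanalysed_words(words):
    """Return the words that have no analyses or only None-analyses.

    `words` is a hypothetical stand-in for the layer: a list of
    (word_text, list_of_analysis_dicts) pairs.
    """
    missing = []
    for word_text, analyses in words:
        has_real_analysis = any(
            any(analysis.get(attr) is not None for attr in ATTRIBUTES)
            for analysis in analyses
        )
        if not has_real_analysis:
            missing.append(word_text)
    return missing

# Stand-in data mimicking the output with guess=False, propername=False
none_analysis = dict.fromkeys(ATTRIBUTES)  # all attribute values are None
words = [
    ('Lae', [{'lemma': 'laad', 'partofspeech': 'S'},
             {'lemma': 'laadima', 'partofspeech': 'V'},
             {'lemma': 'lagi', 'partofspeech': 'S'}]),
    ('2pp', [none_analysis]),
    ('kohe', [{'lemma': 'kohe', 'partofspeech': 'A'},
              {'lemma': 'kohe', 'partofspeech': 'D'}]),
    ('alla', [{'lemma': 'alla', 'partofspeech': 'D'},
              {'lemma': 'alla', 'partofspeech': 'K'}]),
    ('!', [none_analysis]),
]
print(find_unanalysed_words(words))
```

For the stand-in data, the function reports `2pp` and `!`, the two words that would need replacement analyses before the disambiguator can be applied.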

### PostMorphAnalysisTagger

PostMorphAnalysisTagger provides post-corrections to morphological analyses before the disambiguation process. For instance, it fixes the part of speech of compound tokens, such as abbreviations and names with initials:

In [7]:
text=Text("Raamatu toim. J. K. Köstrimäe")
# Tag layers required by morph analysis
text.tag_layer(['words','sentences'])

# Initialize morphological analyser with the default settings
morph_analyzer = VabamorfAnalyzer()
# Tag morph analysis
morph_analyzer.tag( text )

# Examine results of analysis
text.morph_analysis

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,3

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Raamatu,Raamatu,Raamat,Raamat,['Raamat'],0,,sg g,H
,Raamatu,raamat,raamat,['raamat'],0,,sg g,S
toim.,toim.,toim,toim,['toim'],0,,sg n,S
J. K. Köstrimäe,J. K. Köstrimäe,J. K. köstrimägi,J. K. _köstri_mägi,"['J. K. ', 'köstri', 'mägi']",0,,sg g,S


Note that because PostMorphAnalysisTagger is a `Retagger`, we need to call the method **`retag()`** to apply it:

In [8]:
# Create new postanalysis tagger
from estnltk.taggers import PostMorphAnalysisTagger

postanalysis_tagger = PostMorphAnalysisTagger()

# Rewrite morph_analysis with fixes
postanalysis_tagger.retag(text)

# Re-examine the results
text.morph_analysis

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech, _ignore",words,,True,3

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech,_ignore
Raamatu,Raamatu,Raamat,Raamat,['Raamat'],0,,sg g,H,False
,Raamatu,raamat,raamat,['raamat'],0,,sg g,S,False
toim.,toim.,toim,toim,['toim'],0,,sg n,Y,False
J. K. Köstrimäe,J. K. Köstrimäe,J. K. köstrimägi,J. _K. _Köstri_mägi,"['J. K. ', 'köstri', 'mägi']",0,,sg g,H,False


In the previous example, part of speech of the abbreviation was fixed to 'Y', and part of speech of the name with initials was fixed to 'H'.

Additionally, `PostMorphAnalysisTagger` adds a temporary attribute `'_ignore'` that tells the disambiguator which words should be ignored during the disambiguation.

Upon initialization of `PostMorphAnalysisTagger`, flags can be used for turning different types of post-corrections off (by default all corrections are turned on). The following boolean flags can be used:

 * `ignore_emoticons` -- if set, all emoticons (compound tokens of type `emoticon`) will be ignored during the disambiguation -- that is, their analyses will be marked with `_ignore=True`;


 * `ignore_xml_tags` -- if set, all xml tags (compound tokens of type `xml_tag`) will be ignored during the disambiguation -- that is, their analyses will be marked with `_ignore=True`;


 * `ignore_hashtags` -- if set, all hashtags (compound tokens of type `hashtag`) will be ignored during the disambiguation -- that is, their analyses will be marked with `_ignore=True`;


 * `fix_names_with_initials` -- names with initials (compound tokens of type `name_with_initial`) will have their partofspeech changed to 'H' and root strings normalized: underscores added between different parts of the name, and name start positions converted to uppercase;


 * `fix_emoticons` -- all emoticons will have their part of speech changed to 'Z';

 
 * `fix_www_addresses` -- all web addresses will have their part of speech changed to 'H';


 * `fix_email_addresses` -- all email addresses will have their part of speech changed to 'H';
 
 
 * `fix_hashtags_and_usernames` -- hashtags and Twitter-style username mentions will have their part of speech changed to 'H';


 * `fix_abbreviations` -- abbreviations with postags 'S' and 'H' will have their part of speech changed to 'Y';


 * `fix_number_postags` -- number tokens and numbers with percentages will have their part of speech changed from 'Y' to 'N';
 
 
 * `fix_number_analyses_using_rules` -- number tokens will be fixed using rules loaded from a CSV file. For instance, this will correct numbers with case endings (such as _'10e krooniga'_ and _'6t krooni'_) so that they will receive proper morphological form and part of speech. The constructor takes the name of the corrections CSV file as a parameter `number_analysis_rules`, and if no name is provided, [the default set of rules](https://github.com/estnltk/estnltk/tree/c42f8ace79bb3d0517c2f39c4de38e779e74b3dd/estnltk/taggers/morph_analysis/number_fixes) is used. In the CSV file, each line represents a single rule in the format:
 
         number_regexp,number_suffix,pos,form,ending
 
   For instance, the rule:
 
         (|[2-9]|([1-9][0-9]*[02-9]))6$,t,N,sg p,t
   
   defines a fix for tokens consisting of a number that ends with `6`, followed by the suffix `t` (excluding tokens ending with `16t`) -- such tokens will have their part of speech fixed to `N`, form fixed to `sg p`, and ending fixed to `t`;
   
   You can find the default rules in the file [number_analysis_rules.csv](https://github.com/estnltk/estnltk/tree/c42f8ace79bb3d0517c2f39c4de38e779e74b3dd/estnltk/taggers/morph_analysis/number_fixes).
 
 
 * `remove_duplicates` -- duplicate morphological analyses will be removed;
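
The rule format described under `fix_number_analyses_using_rules` can be tried out in plain Python. The following is a minimal sketch, not EstNLTK's actual implementation: the helper names `load_rules` and `apply_rules`, the way a token is split into a number and a suffix, and the use of full-string matching on the number part are all assumptions made for illustration.

```python
import csv
import io
import re

# One rule line in the format: number_regexp,number_suffix,pos,form,ending
RULES_CSV = "(|[2-9]|([1-9][0-9]*[02-9]))6$,t,N,sg p,t\n"

def load_rules(csv_text):
    """Parse rule lines into (compiled_regexp, suffix, pos, form, ending)."""
    rules = []
    for row in csv.reader(io.StringIO(csv_text)):
        number_regexp, suffix, pos, form, ending = row
        rules.append((re.compile(number_regexp), suffix, pos, form, ending))
    return rules

def apply_rules(token, rules):
    """Return the corrected attributes for a token, or None if no rule applies."""
    # Assumption: a token is a run of digits followed by a non-digit suffix
    parts = re.fullmatch(r"([0-9]+)([^0-9]*)", token)
    if parts is None:
        return None
    number, suffix = parts.groups()
    for regexp, rule_suffix, pos, form, ending in rules:
        # Assumption: the number part must match the rule's regexp as a whole
        if suffix == rule_suffix and regexp.fullmatch(number):
            return {'partofspeech': pos, 'form': form, 'ending': ending}
    return None

rules = load_rules(RULES_CSV)
for token in ['6t', '26t', '106t', '16t', '116t']:
    print(token, apply_rules(token, rules))
```

Under these assumptions, `6t`, `26t` and `106t` receive `partofspeech='N'`, `form='sg p'` and `ending='t'`, while `16t` and `116t` are left untouched, which agrees with the exclusion of tokens ending with `16t`.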

Finally: if you need to change the configuration of `PostMorphAnalysisTagger` (or create your own Tagger for post-corrections), you can use VabamorfTagger's argument `postanalysis_tagger` to replace the default post-corrector with your own:

In [9]:
from estnltk.taggers import VabamorfTagger
from estnltk.taggers import PostMorphAnalysisTagger

# Create a post-corrector that does not fix part of speech of emoticons
postanalysis_tagger_2 = PostMorphAnalysisTagger( fix_emoticons=False )

# Initialize VabamorfTagger (analyzer + disambiguator) with new post-corrector
vabamorf_tagger = VabamorfTagger(postanalysis_tagger = postanalysis_tagger_2)

# Input text
text=Text("Äge värk :P")
# Tag layers required by morph analysis
text.tag_layer(['words','sentences'])

# Tag morph analysis
vabamorf_tagger.tag( text )

# Examine results
text.morph_analysis

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,3

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Äge,Äge,äge,äge,['äge'],0,,sg n,A
värk,värk,värk,värk,['värk'],0,,sg n,S
:P,:P,P,P,['P'],0,,?,Y


### VabamorfDisambiguator

VabamorfDisambiguator completes the morphological analysis, and picks out analyses that are most probable for a word in a sentence context. 

In [10]:
text=Text("Tõmba äpp kohe alla!")

# Tag layers required by morph analysis
text.tag_layer(['words','sentences'])

# Tag morph analysis
morph_analyzer.tag( text )

# Examine that the results are ambiguous
text.morph_analysis

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,5

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Tõmba,Tõmba,tõmbama,tõmba,['tõmba'],0.0,,o,V
äpp,äpp,äpp,äpp,['äpp'],0.0,,sg n,S
kohe,kohe,kohe,kohe,['kohe'],0.0,,sg n,A
,kohe,kohe,kohe,['kohe'],0.0,,,D
alla,alla,alla,alla,['alla'],0.0,,,D
,alla,alla,alla,['alla'],0.0,,,K
!,!,!,!,['!'],,,,Z


In [11]:
# Now, create a new disambiguator
from estnltk.taggers import VabamorfDisambiguator

vabamorf_disambiguator = VabamorfDisambiguator()

# disambiguate the text
vabamorf_disambiguator.retag( text )

# re-examine the results
text.morph_analysis

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,5

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Tõmba,Tõmba,tõmbama,tõmba,['tõmba'],0.0,,o,V
äpp,äpp,äpp,äpp,['äpp'],0.0,,sg n,S
kohe,kohe,kohe,kohe,['kohe'],0.0,,,D
alla,alla,alla,alla,['alla'],0.0,,,D
!,!,!,!,['!'],,,,Z


If the input layer contains the attribute `'_ignore'`, then words whose analyses have `_ignore=True` will be completely ignored during the disambiguation. This can be used to exclude XML tags and other similar "text formatting elements" from being analysed.

Important note: when changing `'_ignore'` values, please keep in mind that a selective ignoring of word's analyses is currently not allowed -- either all analyses of the word should be set to `_ignore=True`, or none of them.

Also note that the attribute `'_ignore'` is temporary, and it will be removed during the disambiguation. So, the new 'morph_analysis' layer created by the disambiguator does not have the attribute.
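
The all-or-nothing rule for `'_ignore'` values can be checked before disambiguation. The following is a minimal sketch in plain Python, not EstNLTK code: it assumes the analyses have been copied into a list of `(word, analyses)` pairs, where each analysis is a dict. This data structure and the helper name `find_inconsistent_ignores` are hypothetical.

```python
def find_inconsistent_ignores(words):
    """Return the words whose analyses mix _ignore=True and _ignore=False.

    `words` is a hypothetical stand-in for the layer: a list of
    (word_text, list_of_analysis_dicts) pairs.
    """
    inconsistent = []
    for word_text, analyses in words:
        # Collect the distinct _ignore values over all analyses of the word
        flags = {bool(analysis.get('_ignore', False)) for analysis in analyses}
        if len(flags) > 1:
            inconsistent.append(word_text)
    return inconsistent

words = [
    # All analyses ignored: allowed
    ('<b>', [{'partofspeech': 'Z', '_ignore': True}]),
    # Only one of two analyses ignored: not allowed
    ('kohe', [{'partofspeech': 'A', '_ignore': True},
              {'partofspeech': 'D', '_ignore': False}]),
    # No analyses ignored: allowed
    ('alla', [{'partofspeech': 'D', '_ignore': False},
              {'partofspeech': 'K', '_ignore': False}]),
]
print(find_inconsistent_ignores(words))
```

Here only `kohe` is reported, because one of its two analyses is marked as ignored and the other is not.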

**(!)** It is important to reiterate: if you apply the morphological analyzer with guessing switched off, and/or you remove analyses during the post-correction, you must make sure that all words have morphological analyses before applying the disambiguation. If VabamorfDisambiguator stumbles upon words that do not have analyses (or have analyses with all attributes set to `None`), it raises an exception.

### CorpusBasedMorphDisambiguator ( _text-based disambiguation_ )

See tutorial [B_07b_morph_analysis_with_corpus-based_disambiguation.ipynb](B_07b_morph_analysis_with_corpus-based_disambiguation.ipynb) for details.

### MorphAnalysisReorderer

See tutorial [B_07c_morph_analysis_reordering.ipynb](B_07c_morph_analysis_reordering.ipynb) for details.

---

### [Advanced] _Changing Vabamorf's binary lexicons_

Under the hood, `VabamorfTagger`, `VabamorfAnalyzer` and `VabamorfDisambiguator` are using Filosoft's [Vabamorf](https://github.com/Filosoft/vabamorf) tool for performing morphological analysis and disambiguation.
The Vabamorf tool has [two binary lexicon files](https://github.com/Filosoft/vabamorf/tree/master/dct/binary): `et.dct` (analyzer's lexicon), and `et3.dct` (disambiguator's lexicon). 
If required, you can also change these files inside EstNLTK.

#### Changing lexicon directory

First of all, EstNLTK comes with a few pre-compiled Vabamorf's lexicons, and their directories are listed in the variable `VM_LEXICONS`:

In [12]:
from estnltk.vabamorf.morf import VM_LEXICONS
VM_LEXICONS

['2020-01-22_nosp', '2020-01-22_sp']

Directories are named with ISO dates, indicating when the corresponding lexicons were last updated or introduced into EstNLTK. 
In addition, directory names can have suffixes `nosp` vs `sp`:
 * `sp` -- the standard language lexicon. This is the default lexicon directory used by EstNLTK; 
 * `nosp` -- a variant of the standard lexicon that has been extended with non-standard and slang words, such as _kodukas_ , _mõnsa_ , _mersu_ , _kippelt_ . This lexicon can be useful for analysing Internet language texts, but as the label `nosp` ( _no speller_ ) indicates -- you should not use this lexicon in a speller, as it can give you bad spelling suggestions;
 
   A detailed description of the different types of words included in Vabamorf's lexicon can be found [here](https://nbviewer.jupyter.org/github/Filosoft/vabamorf/blob/master/doc/morfi_leksikoni_kirjeldus.html) (in Estonian).

By default, binary lexicons are taken from the directory whose name bears the latest date. 
If the latest directories also have suffixes `nosp` and `sp`, then `sp` is chosen.
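
The default selection rule described above (latest date first, then `sp` preferred over `nosp`) can be sketched in plain Python. This is only an illustration of the rule as stated, not EstNLTK's actual code; the function name `pick_default_lexicon_dir` is hypothetical.

```python
import re

def pick_default_lexicon_dir(dir_names):
    """Pick the directory with the latest ISO date, preferring 'sp' over 'nosp'."""
    dated = []
    for name in dir_names:
        match = re.fullmatch(r'(\d{4}-\d{2}-\d{2})(?:_(sp|nosp))?', name)
        if match:
            dated.append((match.group(1), match.group(2), name))
    if not dated:
        raise ValueError('no dated lexicon directories found')
    # ISO dates sort chronologically when compared as strings
    latest = max(date for date, _, _ in dated)
    candidates = [(suffix, name) for date, suffix, name in dated if date == latest]
    for suffix, name in candidates:
        if suffix == 'sp':
            return name
    # No 'sp' variant for the latest date: fall back to the first candidate
    return candidates[0][1]

print(pick_default_lexicon_dir(['2020-01-22_nosp', '2020-01-22_sp']))
```

For the two directories listed in `VM_LEXICONS` above, the sketch selects `2020-01-22_sp`.
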
You can change which directory is used in the following manner. 
First, create a new `VabamorfInstance` that uses a different lexicon directory instead of the default one:

```python
from estnltk.vabamorf.morf import Vabamorf as VabamorfInstance
# Use a "no speller" lexicon directory instead of the default one
nosp_vm_instance = VabamorfInstance( lexicon_dir='2020-01-22_nosp' )
```
Now, if you create an instance of `VabamorfTagger`, `VabamorfAnalyzer` or `VabamorfDisambiguator`, you can use the constructor parameter `vm_instance` to override the default `VabamorfInstance`, so that the different binary lexicons take effect in morphological analysis. For example:

```python
from estnltk.taggers import VabamorfTagger
new_vm_tagger = VabamorfTagger( vm_instance=nosp_vm_instance )
# Tag text with new lexicons
new_vm_tagger.tag( ... )
```

Note: `VabamorfInstance`'s parameter `lexicon_dir` can also be a full path to your own directory. In that case, the directory must contain binary lexicon files `et.dct` and `et3.dct`.

#### Specifying binary lexicon names separately

If required, you can also fully specify different names and locations for new Vabamorf's binary lexicons.
For this, you need to create a new `VabamorfInstance` by specifying parameters `lex_path` (analyzer's lexicon) and `disamb_lex_path` (disambiguator's lexicon):

```python
from estnltk.vabamorf.morf import Vabamorf as VabamorfInstance
# paths to new lexicons:
new_lex_path = 'et_new.dct'
new_disamb_lex_path = 'et3_new.dct'
new_vm_instance = VabamorfInstance( lex_path=new_lex_path, disamb_lex_path=new_disamb_lex_path )
```

Now, if you create an instance of `VabamorfTagger`, `VabamorfAnalyzer` or `VabamorfDisambiguator`, you can use the constructor parameter `vm_instance` to override the default `VabamorfInstance`, so that custom lexicons are used in morphological analysis. For example:

```python
from estnltk.taggers import VabamorfTagger
new_vm_tagger = VabamorfTagger( vm_instance=new_vm_instance )
# Tag text with new lexicons
new_vm_tagger.tag( ... )
```

Note that if the parameter `lexicon_dir` is also specified, parameters `lex_path` and `disamb_lex_path` will override it.

**What to keep in mind when using customized lexicons:**

 * If you decide to change lexicons, you should update both the analyzer's lexicon and the disambiguator's lexicon, and **change both lexicons simultaneously**. Changing only one of the lexicons is risky and may reduce the quality of analysis/disambiguation;
 * Vabamorf's lexicon updates can go hand-in-hand with source updates, and as a result, backward compatibility of binary lexicons is not guaranteed. Still, there are some configurations that should be compatible:
     * EstNLTK v1.6.0b to v1.6.5b (probably also v1.4, although it has not been tested) <=> Vabamorf's source and lexicons up to [the commit c9f63652e1](https://github.com/Filosoft/vabamorf/tree/c9f63652e185b52bfb83f0ed6ac9ed4131d0bc03) (incl);
     * EstNLTK v1.6.6b <=> Vabamorf's source and lexicons from [the commit e80a2c4e91](https://github.com/Filosoft/vabamorf/tree/e80a2c4e91f149a4a1446be60a05c5e30a5a403a) to [the commit 7a44b62dba](https://github.com/Filosoft/vabamorf/tree/7a44b62dba66cd39116edaad57db4f7c6afb34d9) (incl);