In [1]:
from estnltk.validators.word_validator import Token

`Token` is a subclass of `str`.

In [2]:
t = Token('seda-ja-teist')
t

'seda-ja-teist'

`Token.replace` returns a `Token`.

In [3]:
t.replace('-', '')

'sedajateist'

`Token.split` returns a `list` of `Token` objects.

In [4]:
t.split('-')

['seda', 'ja', 'teist']
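The built-in `str.replace` and `str.split` always return plain `str` objects, even when called on a subclass. To return `Token`, a subclass has to override those methods and wrap the results. Here is a minimal sketch of that pattern. It uses a hypothetical `TokenSketch` class and is not estnltk's own code:

```python
class TokenSketch(str):
    """Hypothetical str subclass whose replace and split keep the subclass type."""

    def replace(self, old, new, count=-1):
        # str.replace returns a plain str; wrap the result back into the subclass
        return TokenSketch(super().replace(old, new, count))

    def split(self, sep=None, maxsplit=-1):
        # Wrap every piece produced by str.split
        return [TokenSketch(part) for part in super().split(sep, maxsplit)]


t = TokenSketch('seda-ja-teist')
print(type(t.replace('-', '')).__name__)  # TokenSketch
print(t.split('-'))                       # ['seda', 'ja', 'teist']
```

Because `TokenSketch` inherits `__repr__` from `str`, the wrapped pieces still display like ordinary strings.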

In [5]:
t.analysis

[{'clitic': '',
  'ending': 'st',
  'form': 'pl el',
  'lemma': 'seda-ja-sina',
  'partofspeech': 'P',
  'root': 'se+da-ja-sina',
  'root_tokens': ['seda', 'ja', 'sina']},
 {'clitic': '',
  'ending': 'ist',
  'form': 'pl el',
  'lemma': 'seda-ja-tee',
  'partofspeech': 'S',
  'root': 'se+da-ja-tee',
  'root_tokens': ['seda', 'ja', 'tee']},
 {'clitic': '',
  'ending': '0',
  'form': '',
  'lemma': 'seda-ja-teist',
  'partofspeech': 'D',
  'root': 'se+da-ja-teist',
  'root_tokens': ['seda', 'ja', 'teist']},
 {'clitic': '',
  'ending': 't',
  'form': 'sg p',
  'lemma': 'seda-ja-teine',
  'partofspeech': 'O',
  'root': 'se+da-ja-teine',
  'root_tokens': ['seda', 'ja', 'teine']},
 {'clitic': '',
  'ending': 't',
  'form': 'sg p',
  'lemma': 'seda-ja-teine',
  'partofspeech': 'P',
  'root': 'se+da-ja-teine',
  'root_tokens': ['seda', 'ja', 'teine']}]

In [6]:
t.part_of_speeches

{'D', 'O', 'P', 'S'}

In [7]:
t.lemmas()

{'seda-ja-sina', 'seda-ja-tee', 'seda-ja-teine', 'seda-ja-teist'}

In [8]:
t.lemmas('P')

{'seda-ja-sina', 'seda-ja-teine'}
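The outputs of `part_of_speeches` and `lemmas` match set comprehensions over the `analysis` dictionaries. The sketch below assumes that relationship and is not estnltk's implementation. The `analysis` list is a trimmed copy of the output above that keeps only the `lemma` and `partofspeech` fields. `lemmas` is shown here as a hypothetical standalone function with an optional part-of-speech filter.

```python
# Trimmed copy of t.analysis for 'seda-ja-teist'
analysis = [
    {'lemma': 'seda-ja-sina', 'partofspeech': 'P'},
    {'lemma': 'seda-ja-tee', 'partofspeech': 'S'},
    {'lemma': 'seda-ja-teist', 'partofspeech': 'D'},
    {'lemma': 'seda-ja-teine', 'partofspeech': 'O'},
    {'lemma': 'seda-ja-teine', 'partofspeech': 'P'},
]


def part_of_speeches(analysis):
    """Set of all part-of-speech tags over all analyses."""
    return {a['partofspeech'] for a in analysis}


def lemmas(analysis, pos=None):
    """Set of lemmas, optionally restricted to analyses with the given part of speech."""
    return {a['lemma'] for a in analysis if pos is None or a['partofspeech'] == pos}


print(sorted(part_of_speeches(analysis)))  # ['D', 'O', 'P', 'S']
print(sorted(lemmas(analysis, 'P')))       # ['seda-ja-sina', 'seda-ja-teine']
```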

A **case** is one of the following values, available as `all_cases`:

In [9]:
t.all_cases

frozenset({'ab',
           'abl',
           'ad',
           'adt',
           'adt_or_ill',
           'all',
           'el',
           'es',
           'g',
           'ill',
           'in',
           'kom',
           'n',
           'p',
           'ter',
           'tr'})

If a token has the case `'adt'` or `'ill'`, then `'adt_or_ill'` is also added to its set of cases.

In [10]:
t.cases()

{'el', 'p'}

In [11]:
t.cases('S')

{'el'}

In [12]:
Token('lauta').cases()

{'ab', 'adt', 'adt_or_ill', 'p'}
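The outputs above are consistent with this approach: read the case from the `form` field of each analysis (`'pl el'` gives `'el'`, `'sg p'` gives `'p'`), then apply the `'adt_or_ill'` rule. Here is a sketch under that assumption. `cases` is a hypothetical standalone helper, the `form` layout is assumed to be whitespace-separated, and the last example uses a made-up analysis.

```python
# Case labels as listed by all_cases
ALL_CASES = frozenset({'ab', 'abl', 'ad', 'adt', 'adt_or_ill', 'all', 'el', 'es',
                       'g', 'ill', 'in', 'kom', 'n', 'p', 'ter', 'tr'})

# Trimmed copy of t.analysis for 'seda-ja-teist'
analysis = [
    {'partofspeech': 'P', 'form': 'pl el'},
    {'partofspeech': 'S', 'form': 'pl el'},
    {'partofspeech': 'D', 'form': ''},
    {'partofspeech': 'O', 'form': 'sg p'},
    {'partofspeech': 'P', 'form': 'sg p'},
]


def cases(analysis, pos=None):
    """Collect case labels from 'form' fields, optionally filtered by part of speech."""
    result = set()
    for a in analysis:
        if pos is not None and a['partofspeech'] != pos:
            continue
        # Number markers such as 'sg'/'pl' are not in ALL_CASES and are skipped
        result.update(f for f in a['form'].split() if f in ALL_CASES)
    # 'adt' or 'ill' implies the combined label 'adt_or_ill'
    if result & {'adt', 'ill'}:
        result.add('adt_or_ill')
    return result


print(sorted(cases(analysis)))       # ['el', 'p']
print(sorted(cases(analysis, 'S')))  # ['el']
# Hypothetical analysis whose form is 'adt'
print(sorted(cases([{'partofspeech': 'S', 'form': 'adt'}])))  # ['adt', 'adt_or_ill']
```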

A token is a **word** if it has at least one analysis where the part of speech is not `'Y'` or `'Z'`.

In [13]:
t.is_word

True
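In the Vabamorf tag set, `'Y'` marks abbreviations and `'Z'` marks punctuation. The word rule is then a single `any` over the analyses. This is a hypothetical standalone function, not the `is_word` property itself.

```python
def is_word(analysis):
    """True if at least one analysis has a part of speech other than 'Y' or 'Z'."""
    return any(a['partofspeech'] not in {'Y', 'Z'} for a in analysis)


# Parts of speech taken from t.analysis for 'seda-ja-teist'
seda_ja_teist = [{'partofspeech': pos} for pos in ('P', 'S', 'D', 'O', 'P')]
print(is_word(seda_ja_teist))            # True
print(is_word([{'partofspeech': 'Z'}]))  # False
```

A token with no analyses at all is not a word under this reading, because `any` of an empty sequence is `False`.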

A token is a **conjunction** if it has at least one analysis where the part of speech is `'J'`.

In [14]:
t.is_conjunction

False
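The conjunction rule has the same shape, with `'J'` as the tag to look for. Here it is as a hypothetical standalone function. The `'J'` example is made up.

```python
def is_conjunction(analysis):
    """True if at least one analysis has the part of speech 'J'."""
    return any(a['partofspeech'] == 'J' for a in analysis)


# Parts of speech taken from t.analysis for 'seda-ja-teist'
seda_ja_teist = [{'partofspeech': pos} for pos in ('P', 'S', 'D', 'O', 'P')]
print(is_conjunction(seda_ja_teist))            # False
print(is_conjunction([{'partofspeech': 'J'}]))  # True
```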

The **normal** form of a token is the token with stammering and hyphens removed, provided the result is a word. Otherwise, the normal form is the token itself.

In [15]:
Token('v-v-v-ve-ve-ve-vere-taoline').normal

'veretaoline'
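One simple heuristic reproduces the example above. Split on hyphens, treat a part as stammer when the next part starts with it, and join the remaining parts. This sketch is an assumption, and estnltk's actual stammer detection may differ. `is_word` is a hypothetical callback standing in for the morphological word check.

```python
def remove_stammer_and_hyphens(token):
    """Drop hyphen-separated parts that the following part starts with, then join."""
    parts = token.split('-')
    kept = []
    for i, part in enumerate(parts):
        # Treat a part as stammer if the next part begins with it ('v' -> 've' -> 'vere')
        if i + 1 < len(parts) and parts[i + 1].startswith(part):
            continue
        kept.append(part)
    return ''.join(kept)


def normal(token, is_word):
    """Return the de-stammered, de-hyphenated form if it is a word, else the token."""
    candidate = remove_stammer_and_hyphens(token)
    return candidate if is_word(candidate) else token


print(normal('v-v-v-ve-ve-ve-vere-taoline', is_word=lambda s: True))  # veretaoline
print(normal('seda-ja-teist', is_word=lambda s: False))              # seda-ja-teist
```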

A token is a **simple pronoun** if it has an analysis where the part of speech is `'P'` and the lemma is listed in `estnltk/estnltk/rewriting/syntax_preprocessing/rules_files/pronouns.csv`.
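The simple-pronoun check combines a part-of-speech test and a lemma lookup on the same analysis. Here is a sketch. `PRONOUN_LEMMAS` is a hypothetical hand-picked subset standing in for the contents of `pronouns.csv`, whose column layout is not shown here, so the sketch does not parse the file.

```python
# Hypothetical subset of pronoun lemmas; the real list comes from pronouns.csv
PRONOUN_LEMMAS = {'mina', 'sina', 'tema', 'see'}


def is_simple_pronoun(analysis, pronoun_lemmas=PRONOUN_LEMMAS):
    """True if some analysis is tagged 'P' and its lemma is a listed pronoun."""
    return any(a['partofspeech'] == 'P' and a['lemma'] in pronoun_lemmas
               for a in analysis)


print(is_simple_pronoun([{'partofspeech': 'P', 'lemma': 'see'}]))  # True
print(is_simple_pronoun([{'partofspeech': 'S', 'lemma': 'see'}]))  # False: not tagged 'P'
```

Both conditions must hold on the same analysis. A token with one `'P'` analysis and a different analysis whose lemma happens to be listed does not qualify.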

A token is a **pronoun** if
- it is a simple pronoun

or
- the normal form of the token consists of hyphen-separated parts, the last of which is a simple pronoun, and
    - one of the parts has the lemma `'teadma'`

    or

    - all the parts are either conjunctions or simple pronouns, and the simple pronouns share a common case

    or

    - the last part has the case `'ter'`, `'es'`, `'ab'`, or `'kom'`, and every other part is either a conjunction or a simple pronoun with the case `'g'`.
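The pronoun rules above can be sketched as a function over per-part summaries. The `Part` dataclass and `is_pronoun` function are hypothetical. In estnltk, these facts would come from analysing each hyphen-separated part of the normal form, and the caller is assumed to have done that split already. The example parts are made up to exercise each rule.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """Hypothetical summary of one hyphen-separated part of a token's normal form."""
    lemmas: set = field(default_factory=set)
    cases: set = field(default_factory=set)
    is_conjunction: bool = False
    is_simple_pronoun: bool = False


LAST_PART_CASES = {'ter', 'es', 'ab', 'kom'}


def is_pronoun(is_simple, parts):
    if is_simple:
        return True
    # Every compound rule requires the last part to be a simple pronoun
    if not parts or not parts[-1].is_simple_pronoun:
        return False
    # Rule 1: some part has the lemma 'teadma'
    if any('teadma' in p.lemmas for p in parts):
        return True
    # Rule 2: only conjunctions and simple pronouns, and the pronouns share a case
    if all(p.is_conjunction or p.is_simple_pronoun for p in parts):
        pronoun_cases = [p.cases for p in parts if p.is_simple_pronoun]
        if set.intersection(*pronoun_cases):
            return True
    # Rule 3: last part in ter/es/ab/kom, the rest conjunctions or genitive pronouns
    return bool(parts[-1].cases & LAST_PART_CASES) and all(
        p.is_conjunction or (p.is_simple_pronoun and 'g' in p.cases)
        for p in parts[:-1]
    )


pron = lambda *cs: Part(cases=set(cs), is_simple_pronoun=True)
print(is_pronoun(False, [pron('n'), Part(is_conjunction=True), pron('n', 'g')]))  # True (rule 2)
print(is_pronoun(False, [pron('g'), pron('kom')]))                              # True (rule 3)
print(is_pronoun(False, [Part(lemmas={'maja'}, cases={'n'}), pron('n')]))       # False
```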

In [16]:
t.is_pronoun

True