<a href="https://colab.research.google.com/github/elleish/apertium_lexc_tools/blob/main/examples.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Python morphological generator based on Apertium

<ul><li>Apertium git <a href="https://github.com/apertium/apertium-sah">github.com/apertium/apertium-sah</a></li>
<li>Apertium online service <a href="https://beta.apertium.org/index.eng.html#analysis?aLang=sah">beta.apertium.org</a></li>
<li>Apertium explained <a href="https://blogs.helsinki.fi/language-technology/files/2016/09/FINMT2016-francis-tyers.pdf">blogs.helsinki.fi/language-technology</a></li>
<li>Starting a new language with HFST <a href="https://wiki.apertium.org/wiki/Starting_a_new_language_with_HFST">wiki.apertium.org</a></li>
</ul>

In [1]:
!git clone http://github.com/elleish/apertium_lexc_tools

Cloning into 'apertium_lexc_tools'...
remote: Enumerating objects: 86, done.
remote: Counting objects: 100% (86/86), done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 86 (delta 49), reused 14 (delta 7), pack-reused 0
Receiving objects: 100% (86/86), 64.67 KiB | 2.69 MiB/s, done.
Resolving deltas: 100% (49/49), done.


In [2]:
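import apertium_lexc_tools.lexc_parser as lp

# Fetch the Sakha .lexc source from the apertium-sah repository
Sakha = lp.download('Sakha')
# Build the tree of lexicons used by the cells that follow
tree = lp.Tree(Sakha)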

The .lexc file of  Sakha downloaded from https://raw.githubusercontent.com/apertium/apertium-sah/master/apertium-sah.sah.lexc


In [4]:
print('------------------------------------')
print('Parts of speech')
print('------------------------------------')
print(tree.tree['Root'])

------------------------------------
Parts of speech
------------------------------------
['Miscellaneous', 'Copula', 'Conjunctions', 'Postpositions', 'Determiners', 'Pronouns', 'Numerals', 'Nouns', 'ProperNouns', 'Adjectives', 'Adverbs', 'Verbs', 'Interjections', 'Abbreviations', 'Punctuation', 'Digits', 'Guesser', 'Modals']
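
The list above corresponds to the entries of the `Root` lexicon in the .lexc file. As a rough illustration of where such a mapping comes from, here is a minimal sketch of a lexc reader. It is not the parser in `apertium_lexc_tools`; the function name `parse_lexc` and the sample entries are hypothetical, and the sketch only assumes the standard lexc layout (`LEXICON Name` headers, entries ending in `;`, comments starting with `!`).

```python
def parse_lexc(text):
    """Toy lexc reader: map each LEXICON name to its list of entry strings."""
    lexicons = {}
    current = None
    for raw in text.splitlines():
        # Drop '!' comments (this sketch ignores the escaped form '%!').
        line = raw.split('!', 1)[0].strip()
        if not line:
            continue
        if line.startswith('LEXICON'):
            current = line.split()[1]
            lexicons[current] = []
        elif current is not None and line.endswith(';'):
            lexicons[current].append(line[:-1].strip())
    return lexicons


# Made-up sample in lexc syntax, not copied from apertium-sah.
sample = """
LEXICON Root
Nouns ;
Verbs ;

LEXICON Nouns
ат:ат N1 ; ! sample entry
суол:суол N1 ; ! sample entry

LEXICON Verbs
бар:бар V-IV ; ! sample entry
"""

lex = parse_lexc(sample)
print(lex['Root'])        # ['Nouns', 'Verbs']
print(len(lex['Nouns']))  # 2
```

Each entry keeps its continuation class (`N1`, `V-IV`) as the last token, which is what links a lemma to the inflection subtrees printed further down.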


In [5]:
print('------------------------------------')
print('Count the lemmas in a specific part of speech')
print('------------------------------------')
print(len(tree.tree['Nouns']))

------------------------------------
Count the lemmas in a specific part of speech
------------------------------------
12890
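
The same count can be taken for every part of speech at once. The sketch below assumes only what the cells above show: that `tree.tree` behaves like a dict, that `tree.tree['Root']` is a list of names, and that the values support `len()`. The helper `count_entries` and the toy dict are hypothetical; names without an entry of their own are counted as 0.

```python
def count_entries(tree_dict, root='Root'):
    """Return {lexicon name: number of entries} for each name listed under root."""
    return {name: len(tree_dict.get(name, [])) for name in tree_dict.get(root, [])}


# Toy stand-in with the same shape as tree.tree (made-up contents).
toy = {
    'Root': ['Nouns', 'Verbs', 'Guesser'],
    'Nouns': ['a N1', 'b N1', 'c N5'],
    'Verbs': ['d V-IV'],
}

counts = count_entries(toy)
# Largest lexicons first.
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f'{name:10} {n}')
```

With the real data, the call would presumably be `count_entries(tree.tree)`.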


In [6]:
print('------------------------------------')
print('List of tree nodes')
print('------------------------------------')
print(tree.tree.keys())

------------------------------------
List of tree nodes
------------------------------------
dict_keys(['__empty', 'Multichar_Symbols', 'Root', 'CLIT-EMPH', 'CLITICS-NO-COP', 'CLITICS-INCL-COP', 'COPULA', 'CASES-OBL', 'CASES-POSS-3SP', 'CASES-POSS-12SG', 'CASES-NOM', 'POSS-OBL', 'POSS-PX3PL-OBL-SG', 'POSS-PX3PL-OBL-PL', 'POSS-OBL-PL', 'POSS-OBL-SG', 'POSS-NOM', 'POSS-NOM-ENDINGS', 'POSS-PX3PL-NOM-SG', 'POSS-PX3PL-NOM-PL', 'POSS-NOM-PL', 'POSS-NOM-SG', 'ATTR-SUBST', 'GENPOSS-ETC', 'CASES-ETC', 'N-INFL-COMMON-SG', 'N-INFL-COMMON-PL', 'CASES', 'GER-SUBST', 'GER-SUBST-NOM', 'SUBST', 'LII-POSTPOSITION', 'FULL-NOMINAL-INFLECTION', 'N1', 'N1-IRREG-PL', 'N-COMPOUND-PX-COMMON', 'N5', 'NP-COMMON', 'NP-ANT-M', 'NP-ANT-F', 'NP-PAT-VICH', 'NP-COG-OBIN-FEM', 'NP-COG-OB', 'NP-COG-IN', 'NP-COG-M', 'NP-COG-MF', 'NP-PAT-M', 'NP-TOP', 'NP-TOP-RUS', 'NP-TOP-ASSR', 'NP-TOP-COMPOUND', 'NP-TOP-ABBR', 'NP-ORG', 'NP-AL', 'A1', 'A2', 'A3', 'A4', 'A9', 'NUM', 'NUM-DIGIT', 'NUM-ORD', 'NUM-COLL', 'NUM-ROMAN', 'PRO
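
With this many nodes, grouping the names by their prefix can make the list easier to scan. This is a small sketch using a handful of names copied from the output above; the helper `group_by_prefix` is hypothetical and not part of `apertium_lexc_tools`.

```python
from collections import defaultdict


def group_by_prefix(names):
    """Group node names by the part before the first hyphen."""
    groups = defaultdict(list)
    for name in names:
        groups[name.split('-', 1)[0]].append(name)
    return dict(groups)


names = ['CASES-OBL', 'CASES-NOM', 'POSS-OBL', 'POSS-NOM',
         'N1', 'NP-TOP', 'NP-ORG', 'NUM', 'NUM-ORD']
groups = group_by_prefix(names)
print(groups['NP'])   # ['NP-TOP', 'NP-ORG']
print(groups['NUM'])  # ['NUM', 'NUM-ORD']
```

On the real tree the input would be `tree.tree.keys()`.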

In [7]:
print('------------------------------------')
print('N1')
print('------------------------------------')
# print_tree prints the tree itself and returns None, so no outer print() is needed
tree.print_tree('N1', depth_restrict=3)

------------------------------------
N1
------------------------------------
┠── <n> :  SUBST
┃  ┠── <n> :  N-INFL-COMMON-SG
┃  ┃  ┠── <n> :  POSS-NOM-SG
┃  ┃  ┠── <n> :  POSS-OBL-SG
┃  ┃  ┖── <n> :  CASES
┃  ┖── <n><pl> : >{L}{A}р N-INFL-COMMON-PL
┃  ┃  ┠── <n><pl> : >{L}{A}р POSS-NOM-PL
┃  ┃  ┠── <n><pl> : >{L}{A}р POSS-OBL-PL
┃  ┃  ┖── <n><pl> : >{L}{A}р CASES
┖── <n> :  LII-POSTPOSITION
┃  ┖── <n>+лыы<post> : >{L}{I}{I} 
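
Each line of the tree above shows the tags and the surface material accumulated on the path from `N1`, followed by the next continuation lexicon. The sketch below illustrates that idea with a depth-limited recursive walk. It is not the implementation of `print_tree`; the function `walk`, the `(tags, surface, continuation)` entry layout, and the toy lexicons are assumptions made for illustration. `#` is the lexc end-of-word continuation.

```python
def walk(lexicons, name, depth, tags='', surface='', level=0):
    """Yield (level, tags, surface, continuation) for paths up to `depth` levels deep."""
    if level >= depth:
        return
    for t, s, cont in lexicons.get(name, []):
        new_tags, new_surface = tags + t, surface + s
        yield level, new_tags, new_surface, cont
        if cont != '#':  # '#' ends the word, so there is nothing to descend into
            yield from walk(lexicons, cont, depth, new_tags, new_surface, level + 1)


# Toy lexicons loosely modelled on the N1 output above (made-up structure).
toy = {
    'N1': [('<n>', '', 'SUBST')],
    'SUBST': [('', '', 'N-INFL-COMMON-SG'),
              ('<pl>', '>{L}{A}р', 'N-INFL-COMMON-PL')],
    'N-INFL-COMMON-PL': [('', '', 'CASES')],
    'CASES': [('<nom>', '', '#')],
}

rows = list(walk(toy, 'N1', depth=3))
for level, tags, surface, cont in rows:
    print('   ' * level + f'{tags} : {surface} {cont}')
```

Raising `depth` lets the walk reach `CASES` and append `<nom>`, in the same way that a larger `depth_restrict` exposes deeper levels of the real tree.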


In [8]:
print('------------------------------------')
print('V-IV')
print('------------------------------------')
# print_tree prints the tree itself, so no outer print() is needed
tree.print_tree('V-IV', depth_restrict=3)

------------------------------------
V-IV
------------------------------------
┠── <v><iv> :  V-COMMON
┃  ┠── <v><iv> :  V-FINITE-REGULAR_NEGATIVE
┃  ┃  ┖── <v><iv><ifi> : >{T} V-PERS-IFI
┃  ┠── <v><iv><neg> : >{B}{A} V-FINITE-REGULAR_NEGATIVE
┃  ┃  ┖── <v><iv><neg><ifi> : >{B}{A}>{T} V-PERS-IFI
┃  ┠── <v><iv> :  V-FINITE-IRREGULAR_NEGATIVE
┃  ┃  ┠── <v><iv><aor> : >{A}{р} V-PERS-S1
┃  ┃  ┠── <v><iv><neg><aor> : >{B}{A}т V-PERS-S1
┃  ┃  ┠── <v><iv><past> : >{B}{I}т V-PERS-S1
┃  ┃  ┠── <v><iv><neg><past> : >{B}{A}т{A}х V-PERS-S1
┃  ┃  ┠── <v><iv><plu> : >{B}{I}т V-PERS-S2
┃  ┃  ┠── <v><iv><neg><plu> : >{B}{A}т{A}х V-PERS-S2
┃  ┃  ┠── <v><iv><pii> : >{A}{Р} V-PERS-S2
┃  ┃  ┠── <v><iv><neg><pii> : >{B}{A}т V-PERS-S2
┃  ┃  ┠── <v><iv><epis> : >{B}{I}тт{A}{A}х V-PERS-S1
┃  ┃  ┠── <v><iv><aor><nec> : >{A}р>д{A}{A}х V-PERS-S1
┃  ┃  ┠── <v><iv><fut><nec> : >{I}{A}х>т{A}{A}х V-PERS-S1
┃  ┃  ┠── <v><iv><past><ded> : >{T}{A}х V-PERS-S2
┃  ┃  ┠── <v><iv><fut> : >{I}{A}х V-PERS-S2
┃  ┃  ┠── <v><iv>

In [9]:
print('------------------------------------')
print('N1')
print('------------------------------------')
# A larger depth_restrict expands the tree to deeper levels
tree.print_tree('N1', depth_restrict=16)

------------------------------------
N1
------------------------------------
┠── <n> :  SUBST
┃  ┠── <n> :  N-INFL-COMMON-SG
┃  ┃  ┠── <n> :  POSS-NOM-SG
┃  ┃  ┃  ┠── <n> :  POSS-NOM
┃  ┃  ┃  ┃  ┠── <n> :  CASES-ETC
┃  ┃  ┃  ┃  ┃  ┖── <n> :  GENPOSS-ETC
┃  ┃  ┃  ┃  ┃  ┃  ┖── <n><loc> : >{T}{A}{A}ҕ{I} ATTR-SUBST
┃  ┃  ┃  ┃  ┃  ┃  ┃  ┠── <n><loc><subst> : >{T}{A}{A}ҕ{I} CASES-NOM
┃  ┃  ┃  ┃  ┃  ┃  ┃  ┃  ┖── <n><loc><subst><nom> : >{T}{A}{A}ҕ{I} 
┃  ┃  ┃  ┃  ┃  ┃  ┃  ┠── <n><loc><subst> : >{T}{A}{A}ҕ{I} CASES-OBL
┃  ┃  ┃  ┃  ┃  ┃  ┃  ┠── <n><loc><subst><pl> : >{T}{A}{A}ҕ{I}>{L}{A}р CASES-NOM
┃  ┃  ┃  ┃  ┃  ┃  ┃  ┃  ┖── <n><loc><subst><pl><nom> : >{T}{A}{A}ҕ{I}>{L}{A}р 
┃  ┃  ┃  ┃  ┃  ┃  ┃  ┖── <n><loc><subst><pl> : >{T}{A}{A}ҕ{I}>{L}{A}р CASES-OBL
┃  ┃  ┃  ┃  ┖── <n> :  POSS-NOM-ENDINGS
┃  ┃  ┃  ┃  ┃  ┠── <n><px1sg> : >{i}м CASES-NOM
┃  ┃  ┃  ┃  ┃  ┃  ┖── <n><px1sg><nom> : >{i}м 
┃  ┃  ┃  ┃  ┃  ┠── <n><px2sg> : >{i}ҥ CASES-NOM
┃  ┃  ┃  ┃  ┃  ┃  ┖── <n><px2sg><nom> : >{i}ҥ 
┃  ┃  ┃  ┃  ┃  