Highly-Language-Independent-Word-Lemmatization-Using-a-Machine-Learning-Classifier

Highly Language-Independent Word Lemmatization Using a Machine-Learning Classifier

In short, we train a RandomForest multiclass-multiputput Classifier on lemma-wordform wordpairs encoded by character embeddings obtained from the charcter cooccurrence matrix and test this approach on 25 languages. We compared our results with UDPipe (http://ufal.mff.cuni.cz/udpipe).

The dictionaries for 23 languages were taken from https://github.com/michmech/lemmatization-lists for which we thank Michal Měchura a lot! Russian dictionary source is http://odict.ru and for Turkish it is Zargan dictionary

The method is fully described in the article https://www.cys.cic.ipn.mx/ojs/index.php/CyS/article/view/3775.

It would be nice if you cite our article as:

@ARTICLE {iskanderakhmetovalexandrpakirinaualiyevaalexandergelbukh2020, author = "Iskander Akhmetov, Alexandr Pak, Irina Ualiyeva, Alexander Gelbukh", title = "Highly Language-Independent Word Lemmatization Using a Machine-Learning Classifier", journal = "Computacion y Sistemas", year = "2020", volume = "24", number = "3", pages = "1353-1364", month = "sep" }

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
DS_lemm		DS_lemm
README.md		README.md
Vec_lemmatizer_new2020-tree_classification_char_emb-bulk_all_lang-udpipe.ipynb		Vec_lemmatizer_new2020-tree_classification_char_emb-bulk_all_lang-udpipe.ipynb
lang_codes_dic.csv		lang_codes_dic.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Highly-Language-Independent-Word-Lemmatization-Using-a-Machine-Learning-Classifier

About

Releases

Packages

Languages

iskander-akhmetov/Highly-Language-Independent-Word-Lemmatization-Using-a-Machine-Learning-Classifier

Folders and files

Latest commit

History

Repository files navigation

Highly-Language-Independent-Word-Lemmatization-Using-a-Machine-Learning-Classifier

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages