Skip to content

A Python package to transliterate various Cyrillic alphabets to and from the Latin alphabet using the American Library Association-Library of Congress Romanization tables.

License

Notifications You must be signed in to change notification settings

ian-nai/domovyk

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Domovyk

A Python package to transliterate various Cyrillic alphabets to and from the Latin alphabet using the American Library Association-Library of Congress Romanization tables.

domovyk statue

Domovyk is named after household spirits from Slavic mythology.

Installation

Install using pip:

pip3 install domovyk

Or download the GitHub repo and the install the only dependency (for sentence tokenization):

pip3 install nltk

Usage

The package uses the ALA-LC Romanization tables to transliterate each of the following languages:

  • Belarusian
  • Bulgarian
  • Carpatho-Rusyn
  • Church Slavonic
  • Macedonian
  • Russian
  • Serbian
  • Ukrainian

Four functions can be called for each language:

  • transliterate(var, lang) - Transliterates a Cyrillic string or list of strings (var) from a specified language (lang) into the Latin alphabet. Returns a string.
  • translatinate(var, lang) - Transliterates a string or list of strings in the Latin alphabet (var) to a specified Cyrillic script (lang). Returns a string.
  • transliterateSents(var, lang) - Tokenizes a given Cyrillic (var) into a list of sentences, then transliterates those sentences from a specified language (lang) into the Latin alphabet. Returns a list.
  • translatinateSents(var, lang) - Tokenizes a given string in the Latin alphabet (var) into a list of sentences, then transliterates those sentences to a specified Cyrillic script (lang). Returns a list.

Some usage examples are below:

from domovyk import translit

belarusian_to_latin = translit.transliterate('Як справы?', 'bel')
latin_to_macedonian = translit.translatinate('Hi, how are you?', 'mac')
ukrainian_to_latin_sents = translit.transliterateSents('Єхидна, ґава, їжак ще й шиплячі плазуни бігцем форсують Янцзи.', 'ukr')
latin_to_russian_sents = translit.translatinateSents('Hello, how are you doing?', 'rus')

The languages can be called in the code using the following abbreviations:

  • Belarusian - 'bel'
  • Bulgarian - 'bulg'
  • Carpatho-Rusyn - 'carp'
  • Church Slavonic - 'church'
  • Macedonian - 'mac'
  • Russian 'rus'
  • Serbian - 'serb'
  • Ukrainian - 'ukr'

About

A Python package to transliterate various Cyrillic alphabets to and from the Latin alphabet using the American Library Association-Library of Congress Romanization tables.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages