petertretyakov/Iuliia


Iuliia — Swift Version

Transliterate Cyrillic → Latin in every possible way

This is a Swift package for Iuliia by Anton Zhiyanov. It requires Swift 5.3 and Xcode 12, because Swift 5.3 is the first version in which SPM supports package resources (here, the JSON schema files and localization files).

Transliteration means representing Cyrillic data (mainly names and geographic locations) with Latin letters. It is used for international passports, visas, green cards, driving licenses, mail and goods delivery, etc.

Iuliia makes transliteration as easy as calling iuliia.translate() in your favorite programming language.

Why use Iuliia:

  • 20 transliteration schemas (rule sets), including all main international and Russian standards.
  • Correctly implements not only the base mapping, but all the special rules for letter combinations and word endings (AFAIK, Iuliia is the only library which does so).
  • Simple API and zero third-party dependencies.

Supported schemas

Actual schemas:

And deprecated ones:

For schema details and other information, see https://dangry.ru/iuliia (in Russian).

Basic Usage

let iuliia = try! Iuliia(name: .wikipedia) // Parses schema file and initializes Schema
iuliia.translate("Юлия") // → Iuliia

Custom Schemas

You can create your own schema JSON file with the following structure:

{
    "name": "Your Schema name", 
    "mapping": {
        "а": "a",
        "б": "b",
        "в": "v",
        .
        .
        .
        "э": "e",
        "ю": "yu",
        "я": "ya"
    },
    "prev_mapping": {
        "е": "ye",
        "ае": "ye"
    },
    "next_mapping": {
        "ъа": "y",
        "ъи": "y"
    },
    "ending_mapping": {
        "ий": "y",
        "ый": "y"
    }
}
| Key | Required | Type | Description | Comment |
| --- | --- | --- | --- | --- |
| name | NO | String | Readable title for the schema | "Custom" by default |
| mapping | YES | [String: String] | Key: Cyrillic letter; value: Latin representation | Only one character per key is allowed; keys with more than one character are omitted during transliteration. To define custom transliteration logic for a sequence of characters, use prev_mapping, next_mapping and ending_mapping. |
| prev_mapping | NO | [String: String] | Key: 1 or 2 Cyrillic letters; value: Latin representation | Mapping for letters with respect to the previous sibling. A one-letter key applies at the beginning of a word. In the schema above, any е at the beginning of a word or after а is transliterated as ye. |
| next_mapping | NO | [String: String] | Key: 2 Cyrillic letters; value: Latin representation | Mapping for letters with respect to the next sibling. In the schema above, any ъ before а or и is transliterated as y. |
| ending_mapping | NO | [String: String] | Key: any number of Cyrillic letters; value: Latin representation | Mapping for word endings. In the schema above, any word ending in ий or ый ends with just y. |

For example, if you want to transliterate a sequence of two Cyrillic characters into one Latin character (кс → x is the common case), you can achieve this with the following prev_mapping and next_mapping structure:

{
    "prev_mapping": {
        "кс": "x"
    },
    "next_mapping": {
        "кс": ""
    }
}
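To make the rule semantics above concrete, here is a minimal, self-contained sketch of how the four mappings could be resolved for a single lowercase word. It is not the package's implementation: SketchSchema, transliterate(word:using:) and demoSchema are hypothetical names, and the precedence (previous sibling, then next sibling, then base mapping) and the pass-through of unmapped characters are assumptions made for illustration.

```swift
// Illustrative sketch only; all names here are hypothetical, not the package API.
struct SketchSchema {
    var mapping: [String: String]
    var prevMapping: [String: String] = [:]
    var nextMapping: [String: String] = [:]
    var endingMapping: [String: String] = [:]
}

func transliterate(word: String, using schema: SketchSchema) -> String {
    var letters = Array(word)
    var ending = ""

    // Assumption: the longest matching ending wins, and a word must be
    // longer than the ending itself for the ending rule to apply.
    let candidates = schema.endingMapping.keys.sorted { $0.count > $1.count }
    for key in candidates
    where letters.count > key.count && String(letters.suffix(key.count)) == key {
        letters.removeLast(key.count)
        ending = schema.endingMapping[key] ?? ""
        break
    }

    var result = ""
    for (index, letter) in letters.enumerated() {
        let current = String(letter)
        // An empty "previous" at the start of a word makes one-letter
        // prev_mapping keys match word-initial letters.
        let previous = index > 0 ? String(letters[index - 1]) : ""
        let next = index + 1 < letters.count ? String(letters[index + 1]) : ""

        if let latin = schema.prevMapping[previous + current] {
            result += latin
        } else if let latin = schema.nextMapping[current + next] {
            result += latin
        } else {
            // Assumption: characters without a mapping pass through unchanged.
            result += schema.mapping[current] ?? current
        }
    }
    return result + ending
}

let demoSchema = SketchSchema(
    mapping: ["а": "a", "в": "v", "е": "e", "ж": "zh", "и": "i", "к": "k",
              "н": "n", "о": "o", "с": "s", "т": "t"],
    prevMapping: ["е": "ye", "ае": "ye", "кс": "x"],
    nextMapping: ["ъа": "y", "кс": ""],
    endingMapping: ["ий": "y", "ый": "y"]
)

print(transliterate(word: "такси", using: demoSchema)) // taxi
print(transliterate(word: "еж", using: demoSchema))    // yezh
print(transliterate(word: "новый", using: demoSchema)) // novy
```

In the такси case, к is emitted as an empty string by the next_mapping entry and с becomes x through the prev_mapping entry, which is how the pair collapses into a single Latin letter.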

To use your schema with Iuliia, initialize it with the schema URL:

let iuliia = try! Iuliia(schemaURL: URL(fileURLWithPath: "/path/to/your/custom/schema.json"))

Additionally, you can create a Swift Schema object and initialize Iuliia with it.

let schema = Schema(
    name: "My Custom Schema",
    letters: [ ... ],
    previous: [ ... ],
    next: [ ... ],
    ending: [ ... ]
)
let iuliia = Iuliia(schema: schema)

As with JSON schema files, only letters is required; the other parameters are optional.

Additional features

  • You can check whether a Schema.Name is actual or deprecated with the isActual and isDeprecated boolean variables.
  • The full localized schema name (English or Russian) is available in the Schema.name property, or in Schema.Name.title for any pre-built schema.
  • Check the IuliiaError enum for the list of errors that can occur during initialization. In principle they can also be thrown when initializing pre-built schemas, but those are all tested, so only some black magic can produce them.

Issues and limitations

In general:

  • Only the Russian subset of Cyrillic is supported in pre-built schemas.
  • Does not support composite Unicode characters (e.g., the precomposed Ё, U+0401, is supported, but the decomposed sequence Е + combining diaeresis U+0308 is not).
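One possible workaround for the composite-character limitation is to normalize input to its precomposed (NFC) form before transliterating. The sketch below uses Foundation's precomposedStringWithCanonicalMapping; the sample string is made up for illustration, and whether this step is sufficient for every input to the package has not been verified here.

```swift
import Foundation

// Decomposed input: Cyrillic Е (U+0415) followed by a combining diaeresis (U+0308).
let decomposed = "\u{0415}\u{0308}лка"

// NFC normalization composes the pair into the single scalar Ё (U+0401).
let composed = decomposed.precomposedStringWithCanonicalMapping

// Swift's == treats both strings as equal, so compare scalar counts instead.
print(decomposed.unicodeScalars.count) // 5
print(composed.unicodeScalars.count)   // 4
```

The normalized string can then be passed to translate() in place of the raw input.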

Schema-specific:

  • BS 2979:1958. This schema defines two alternative translations for Ы: Ы → Ȳ (used by the Oxford University Press) and Ы → UI (used by the British Library). Iuliia uses Ы → Ȳ.
  • GOST R 7.0.34-2014. This schema defines alternatives for many letters, but does not specify when to use which. Therefore, Iuliia uses the first of the suggested translations for each such letter.
  • MVD-310. This schema defines the "С between two vowels → SS" rule. There is no such rule in other schemas, and MVD-310 itself is deprecated, so I decided to ignore this specific rule for the sake of code simplicity.

If you find any problems while working with Iuliia, feel free to create an issue in this repository.
