
A library to filter Unicode characters based on language locales!



stargate-rewritten/Language-Linter


Language Linter

A language-based approach to Unicode filtering.

This project is a WORK IN PROGRESS!
No stable versions are available yet!

Purpose

Language-Linter (LangLint) is a library designed to help projects filter Unicode characters based on linguistic characteristics.
Given an ISO 639 language identifier, LangLint's methods make it possible to identify and manage characters that belong to other languages.

For example, in ar (Arabic) environments, users will be able to use 'هذه الرموز', but not '这些符号'.
Conversely, in zh (Chinese) environments, users will be able to use '这些符号', but not 'هذه الرموز'.

Depending on configuration, Basic Latin characters (U+0000 to U+007F) may be whitelisted regardless of locale.
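The locale check described above can be sketched in Java using the JDK's built-in `Character.UnicodeScript`. This is a minimal, non-authoritative illustration, not LangLint's API: the class `LocaleFilterSketch`, the method `isAllowed`, and the tiny hand-written language table are all hypothetical, and treating script-neutral `COMMON` characters (spaces, digits, most punctuation) as always allowed is an assumption.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class LocaleFilterSketch {
    // Hypothetical, hand-written language -> scripts table (illustration only).
    private static final Map<String, Set<Character.UnicodeScript>> SCRIPTS_BY_LANGUAGE = Map.of(
            "ar", EnumSet.of(Character.UnicodeScript.ARABIC),
            "zh", EnumSet.of(Character.UnicodeScript.HAN),
            "ru", EnumSet.of(Character.UnicodeScript.CYRILLIC));

    /**
     * Returns true if every code point in text belongs to a script of the given
     * ISO 639-1 language, is script-neutral (COMMON), or, when whitelistBasicLatin
     * is set, lies in the Basic Latin range U+0000..U+007F.
     */
    static boolean isAllowed(String text, String iso6391, boolean whitelistBasicLatin) {
        Set<Character.UnicodeScript> allowed =
                SCRIPTS_BY_LANGUAGE.getOrDefault(iso6391, EnumSet.noneOf(Character.UnicodeScript.class));
        return text.codePoints().allMatch(cp -> {
            if (whitelistBasicLatin && cp <= 0x7F) {
                return true; // assumed Basic Latin whitelist
            }
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            return script == Character.UnicodeScript.COMMON || allowed.contains(script);
        });
    }
}
```

With this sketch, `isAllowed("هذه الرموز", "ar", true)` is true while `isAllowed("这些符号", "ar", true)` is false, mirroring the Arabic/Chinese example above.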

Background

Original Use Case:

Development of this library was started by the SG-Rewritten Project to accommodate the use case outlined below:

Under its default configuration, Stargate allows its end users to name their own gates, networks, and so on.
Gate names identify individual portals, while network names identify groups of portals.
In both cases, there are legitimate situations in which other players need to retype these names.

Accordingly, such strings must be memorable, legible, and, most importantly, reproducible by other players.
It follows that they should exclude Unicode characters that most of the relevant user base cannot type.

SG supports multiple languages through an ISO 639-1 config option, so simply filtering out all non-Latin characters is not an option.
Instead, SG must filter out every Unicode character that belongs neither to the Latin script nor to the scripts of the configured locale.
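The "Latin plus target locale" rule could be sketched in Java as a sanitizer that drops disallowed code points from a gate or network name. This is a hypothetical helper, not SG's or LangLint's actual code: the class `GateNameSanitizerSketch` and method `strip` are invented for illustration, and keeping `COMMON` characters (spaces, digits, punctuation) is an assumption so that names like "Gate 2" survive.

```java
import java.util.Set;

final class GateNameSanitizerSketch {
    /**
     * Hypothetical helper: removes every code point whose script is neither
     * LATIN, COMMON (assumed allowed), nor one of the target locale's scripts.
     */
    static String strip(String name, Set<Character.UnicodeScript> localeScripts) {
        StringBuilder out = new StringBuilder(name.length());
        name.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script == Character.UnicodeScript.LATIN
                    || script == Character.UnicodeScript.COMMON
                    || localeScripts.contains(script)) {
                out.appendCodePoint(cp); // keep the allowed code point unchanged
            }
        });
        return out.toString();
    }
}
```

Stripping silently is only one design choice; a plugin might instead reject the name and report which characters were disallowed.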

Language-Linter was developed by SG-Rewritten to facilitate language-based Unicode filtering.

Beyond this, we have identified three other situations in which this library may be useful:

Other Possible Use Cases:

  • Helping chat filtering systems detect spam and filter bypasses.
  • Sanitizing user inputs to ensure they can be easily reproduced by other users.
  • Ensuring string legibility, e.g. preventing stacked combining marks such as t̶̪̅h̷̼͝í̴̼s̸̬̋.
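For the legibility case, one possible check (not necessarily the approach LangLint will take) is to cap how many combining marks may follow a single base character. The Java sketch below uses the JDK's `Character.getType`; the class `CombiningMarkCheckSketch`, the method `isLegible`, and the `maxMarks` threshold are hypothetical.

```java
final class CombiningMarkCheckSketch {
    /** True if cp is a combining mark (general category Mn, Me, or Mc). */
    private static boolean isMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
                || type == Character.ENCLOSING_MARK
                || type == Character.COMBINING_SPACING_MARK;
    }

    /** Returns true if no base character is followed by more than maxMarks consecutive marks. */
    static boolean isLegible(String text, int maxMarks) {
        int run = 0; // length of the current run of combining marks
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            run = isMark(cp) ? run + 1 : 0;
            if (run > maxMarks) {
                return false;
            }
            i += Character.charCount(cp); // advance by one full code point
        }
        return true;
    }
}
```

A threshold is used rather than rejecting all marks because many scripts (for example, Arabic vowel signs or Indic vowel signs) rely on combining marks for ordinary text.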

Features

  • Tools to convert ISO 639-1 codes to Unicode block aliases.
  • Tools to filter strings based on a given ISO 639-1 locale.
  • Future features TBD.
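Converting an ISO 639-1 code to Unicode block aliases might look roughly like the Java sketch below, which resolves block names through the JDK's `Character.UnicodeBlock.forName`. The class `BlockAliasSketch`, its methods, and the small language table are hypothetical and incomplete; they only illustrate the idea of looking up blocks per language and then filtering by block.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BlockAliasSketch {
    // Hypothetical ISO 639-1 -> Unicode block-name table (incomplete, illustration only).
    private static final Map<String, List<String>> BLOCK_NAMES = Map.of(
            "ar", List.of("Arabic", "Arabic Supplement"),
            "zh", List.of("CJK Unified Ideographs"),
            "el", List.of("Greek and Coptic"));

    /** Resolves a language's block names via Character.UnicodeBlock.forName. */
    static Set<Character.UnicodeBlock> blocksFor(String iso6391) {
        Set<Character.UnicodeBlock> blocks = new HashSet<>();
        for (String name : BLOCK_NAMES.getOrDefault(iso6391, List.of())) {
            blocks.add(Character.UnicodeBlock.forName(name));
        }
        return blocks;
    }

    /** True if every code point is in Basic Latin or in one of the language's blocks. */
    static boolean isAllowed(String text, String iso6391) {
        Set<Character.UnicodeBlock> blocks = blocksFor(iso6391);
        return text.codePoints().allMatch(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            return block == Character.UnicodeBlock.BASIC_LATIN || blocks.contains(block);
        });
    }
}
```

Blocks are contiguous code point ranges, so a single writing system is often spread across several of them (Latin letters, for instance, span Basic Latin, Latin-1 Supplement, Latin Extended-A, and more), which is one reason a script-based mapping can be more precise.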

How it works

LangLint maps languages to scripts and provides a collection of methods that operate on these mappings.
The mappings themselves are compiled from data in the Unicode Consortium's Common Locale Data Repository (CLDR).
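As a rough illustration of the compile step, the Java sketch below parses lines in an invented `language=Script Script` format into a language-to-scripts table, relying on the fact that the JDK's `Character.UnicodeScript.forName` accepts ISO 15924 four-letter aliases such as `Arab`, `Cyrl`, `Hani`, and `Latn`. The input format, the class `ScriptTableSketch`, and the method `parse` are hypothetical; they are not CLDR's file format or LangLint's code. CLDR also uses codes such as `Hans` and `Hant` (Simplified and Traditional Chinese), which are not Unicode script values, so `forName` would reject them and a real pipeline would need to normalize them first.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class ScriptTableSketch {
    /**
     * Parses hypothetical lines such as "sr=Cyrl Latn" into a language -> scripts map.
     * Malformed lines (no '=') are skipped; unknown script codes make forName throw.
     */
    static Map<String, Set<Character.UnicodeScript>> parse(Iterable<String> lines) {
        Map<String, Set<Character.UnicodeScript>> table = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("=", 2);
            if (parts.length != 2) {
                continue; // skip malformed lines
            }
            Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
            for (String code : parts[1].trim().split("\\s+")) {
                if (!code.isEmpty()) {
                    scripts.add(Character.UnicodeScript.forName(code)); // ISO 15924 alias
                }
            }
            table.put(parts[0].trim(), scripts);
        }
        return table;
    }
}
```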

Key concepts:

Languages
Scripts
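  • Every Unicode character is assigned to a script through its Script property; characters shared across writing systems, such as digits and most punctuation, are assigned to Common.
  • Scripts are groups of symbols with common histories; generally, each one is associated with one or more writing systems.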
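The Script property can be inspected directly from Java with `Character.UnicodeScript.of`. The small sketch below (the class `ScriptLookupSketch` and method `describe` are hypothetical) prints each code point with its script, which shows the two special values: `COMMON` for shared characters like spaces, and `INHERITED` for combining marks that take on the script of the character they attach to.

```java
final class ScriptLookupSketch {
    /** Describes each code point of text as "U+XXXX SCRIPT", one per line. */
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> out.append(
                String.format("U+%04X %s\n", cp, Character.UnicodeScript.of(cp))));
        return out.toString();
    }
}
```

For example, `describe("A\u0301")` yields `U+0041 LATIN` followed by `U+0301 INHERITED`.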

Documentation

Not yet available.
