Language development by @marrus-sh

The LANGDEV Project

GitHub repo for all language development by @marrus-sh.


This project is split into several branches, described below:


The data branch contains data files related to the various LANGDEV languages. These files are used to generate the LANGDEV dictionaries. The data branch also contains the language-subtag-registry, which defines the various LANGDEV subtags (see below), and a UCD to assist with integrating LANGDEV scripts with Unicode.

LANGDEV uses the LexisML Index Record (LREC) format in conjunction with HTML to keep track of languages' lexicons. Due to the scope of the project, however, the now-deprecated LexisML 1.0 and 2.0 formats are still sometimes in use.

Atom users may find the language-lrec package useful for viewing LREC documents.


The documentation branch contains documentation and information about each language or script. Documentation files are provided in GitHub-Flavored Markdown (GFM) and designed to render well through

The documentation branch also includes various standards related to language development, and information on character sets, including Unicode.


The tools branch contains various processing tools for dealing with the documents presented in data. See the GitHub Pages website to view some of these tools in action, or view the source in the master branch.


Font development for The LANGDEV Project takes place in the fonts branch, using FontForge.


The master branch merges together all of the above branches into one complete directory. It also contains the GitHub Pages data for

Because the contents of the master branch derive from the others, it may run slightly behind them from time to time. Those interested in the most up-to-date data should always check the appropriate branch instead.

Document naming and IETF language tags:

The organization of documents within The LANGDEV Project is based heavily on the IETF language tags. Three-letter codes are (hypothetical) primary language subtags, each referring to a language or language family. These will sometimes be nested to show ancestry; for example, CLASSICAL SEVENSI is in the JASTU-SEVENSI family of languages, so osv can be found within the jsv folder.

Four-letter codes beginning with a number (0009, 1der) or other five-to-eight–letter codes (proto, final) specify certain variants within the main language family; in particular, these are used to denote different stages of the language's development. Special variant codes of the form block--- are used to define development blocks, described below.
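
As a rough illustration of the subtag shapes described above, the following Python sketch sorts a single subtag into one of the categories by its form alone. The function name classify_subtag and the regular expressions are assumptions made for this example; they are not part of the project's tools, and the sketch does not consult the language subtag registry.

```python
import re

# Shapes taken from the description above; all names here are illustrative.
BLOCK = re.compile(r"block[0-9]{3}")          # e.g. block001
PRIMARY = re.compile(r"[a-z]{3}")             # e.g. osv, jsv
VARIANT_4 = re.compile(r"[0-9][a-z0-9]{3}")   # e.g. 0009, 1der
VARIANT_LONG = re.compile(r"[a-z0-9]{5,8}")   # e.g. proto, final


def classify_subtag(subtag: str) -> str:
    """Classify one subtag by shape only (no registry lookup)."""
    s = subtag.lower()
    # Check the block form first, since block001 also fits the
    # five-to-eight-character variant shape.
    if BLOCK.fullmatch(s):
        return "block"
    if PRIMARY.fullmatch(s):
        return "language"
    if VARIANT_4.fullmatch(s) or VARIANT_LONG.fullmatch(s):
        return "variant"
    return "unknown"


if __name__ == "__main__":
    for example in ("osv", "0009", "1der", "proto", "block001"):
        print(example, classify_subtag(example))
```

A shape check like this cannot tell whether a code is actually defined; for that, the subtag would still need to be looked up in the registry.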

development blocks

In the case of some languages, the total lexicon may be too large for meaningful development to take place in a reasonable manner. In these instances, the lexicon may be broken down into development blocks, which contain only a small subset of words within the lexicon. Development blocks are denoted with the variant subtag block---, where --- is replaced with a three-digit number from 001 to 999, with block000 being reserved for extralinguistic content (for example, orthographies or phonological information). Development blocks often only make sense when used with another variant subtag; for example, osv-0010-block001 refers to the first development block of CLASSICAL SEVENSI X, but osv-block001 is not valid.
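
The block rules above can be sketched in Python as follows. The function name parse_block and its return shape are assumptions made for this example. It only reports what it finds (the block number and whether another variant subtag accompanies it), leaving the decision about validity to the caller, since the text says blocks "often" rather than always require a companion variant.

```python
import re
from typing import Optional, Tuple

# block--- with a three-digit number; block000 is the reserved
# extralinguistic block described above.
BLOCK = re.compile(r"block([0-9]{3})")


def parse_block(tag: str) -> Optional[Tuple[int, bool]]:
    """Return (block_number, has_other_variant), or None if the tag
    carries no development-block subtag. Illustrative helper only."""
    _primary, *rest = tag.lower().split("-")
    block_number = None
    has_other_variant = False
    for subtag in rest:
        match = BLOCK.fullmatch(subtag)
        if match:
            block_number = int(match.group(1))
        else:
            has_other_variant = True
    if block_number is None:
        return None
    return block_number, has_other_variant


if __name__ == "__main__":
    print(parse_block("osv-0010-block001"))  # block used with a variant
    print(parse_block("osv-block001"))       # block used on its own
```

With the examples from the text, osv-0010-block001 reports an accompanying variant, while osv-block001 does not, which matches the case the text describes as not valid.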

creating compliant tags

The language tags used in The LANGDEV Project have not been registered and are thus not valid IETF tags. To remedy this, they should be prefixed with the string art-x-; for example, art-x-svi. User agents wishing to support the languages of The LANGDEV Project should treat language tags prefixed with art-x- according to the definitions in the provided language subtag registry if the remainder of the tag is valid according to the definitions in that file. Note that script subtags should be carried through in this process; for example, art-Latn-x-svi should be treated as svi-Latn.
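
A user agent might perform the mapping described above along these lines. This is a minimal Python sketch under stated assumptions: the function name to_langdev_tag is hypothetical, the check against the language subtag registry is left out, and placing the script subtag directly after the primary subtag when variants are present follows the usual IETF ordering rather than anything stated here.

```python
import re
from typing import Optional

SCRIPT = re.compile(r"[A-Za-z]{4}")  # four-letter script subtag, e.g. Latn


def to_langdev_tag(tag: str) -> Optional[str]:
    """Map art-x-... or art-Ssss-x-... to the LANGDEV tag it stands for.

    Returns None when the tag does not have that shape. Registry
    validation of the result is NOT performed in this sketch.
    """
    parts = tag.split("-")
    if len(parts) < 3 or parts[0].lower() != "art":
        return None
    rest = parts[1:]
    script = None
    if SCRIPT.fullmatch(rest[0]):
        script = rest[0].title()  # normalise case, e.g. latn -> Latn
        rest = rest[1:]
    if len(rest) < 2 or rest[0].lower() != "x":
        return None
    private = [s.lower() for s in rest[1:]]
    if not all(private):  # reject empty subtags such as "art-x-"
        return None
    primary, variants = private[0], private[1:]
    out = [primary]
    if script:
        # Assumption: the script follows the primary subtag.
        out.append(script)
    out.extend(variants)
    return "-".join(out)


if __name__ == "__main__":
    print(to_langdev_tag("art-x-svi"))
    print(to_langdev_tag("art-Latn-x-svi"))
```

The returned string would then be looked up in the registry; a tag whose remainder is not defined there should not be given LANGDEV-specific treatment.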


All files published to this git repository are, to the extent possible under law, in the public domain. For more information, see