Word lists for human-memorable encoding of large bases.

imuli/word-bases

word-base

This is a cross-linguistic mapping of numbers to semantic points, and those senses to lexemes in multiple languages.

Structure

For each base, there is a list of semantic points (from WordNet) in bases/base. A semantic point may be correlated across multiple word classes (verb, noun, adjective), in which case the primary point is listed first.

Each language has a tab-separated file at langs/language. It maps each semantic point to a single production word, and maps multiple recognition words (synonyms of that sense and homonyms of that word) back to the same point.
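As a rough illustration of how a base list and a language file could be used together, here is a minimal Python sketch. The column layout (semantic point, then production word, then any further accepted words), the sense identifiers, and the words in the toy table are all assumptions made for this example and are not taken from the repository's files. The sketch also treats each base entry as a single semantic point and ignores the correlated points across word classes.

```python
import csv
import io


def load_language(tsv_text):
    """Parse a language table.

    ASSUMED layout (not confirmed by the repository): each row is
    sense <TAB> production word <TAB> zero or more other accepted words.
    """
    production = {}   # sense -> the one word used when encoding
    recognition = {}  # any accepted word (lowercased) -> sense
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        sense, word, *others = row
        production[sense] = word
        for accepted in (word, *others):
            if accepted:
                # First sense to claim a word keeps it.
                recognition.setdefault(accepted.lower(), sense)
    return production, recognition


def encode(data, base, production):
    """Map each digit (here: a byte value) to its production word."""
    return [production[base[digit]] for digit in data]


def decode(words, base, recognition):
    """Map words back to digits, accepting any recognised word."""
    index = {sense: i for i, sense in enumerate(base)}
    return bytes(index[recognition[w.lower()]] for w in words)


# Toy base of 4 points instead of 256; identifiers and words are made up.
TOY_BASE = ["s0", "s1", "s2", "s3"]
TOY_TSV = "s0\tsun\tsol\ns1\tmoon\ns2\tstar\ns3\ttree\twood\n"

production, recognition = load_language(TOY_TSV)
encoded = encode(bytes([0, 3, 2]), TOY_BASE, production)
decoded = decode(["sol", "wood", "STAR"], TOY_BASE, recognition)
```

With a real 256-entry base, every byte value would have a word, so `encode` could take arbitrary bytes. Keeping the production mapping one-to-one while letting recognition be many-to-one is what lets a reader type a synonym and still recover the original number.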

Bases

256

The 256 base was generated by taking the most common 3-5 letter English words (from Google's ngram analysis) and keeping only those whose primary sense is present in every Multilingual Wordnet wordnet that covers more than 80% of its "core" vocabulary. The word "star" was added manually.
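The selection procedure described above can be sketched roughly as follows. This is not the script that produced the list: the function name, the input structures, and the choice to skip a sense that has already been selected are all assumptions for illustration, and the inputs are plain dictionaries rather than real ngram or wordnet data.

```python
def select_base(ranked_words, primary_sense, coverage, senses_by_language,
                size=256, threshold=0.8):
    """Pick up to `size` senses from frequency-ranked words.

    ranked_words:        words, most common first (hypothetical input)
    primary_sense:       word -> identifier of its primary sense
    coverage:            language -> fraction of "core" vocabulary covered
    senses_by_language:  language -> set of sense identifiers it contains
    """
    # Only languages above the coverage threshold take part in the filter.
    qualifying = [lang for lang, c in coverage.items() if c > threshold]
    chosen = []
    for word in ranked_words:
        if not 3 <= len(word) <= 5:
            continue  # keep only 3-5 letter words
        sense = primary_sense.get(word)
        if sense is None or sense in chosen:
            continue  # no known sense, or sense already selected (assumed rule)
        if all(sense in senses_by_language[lang] for lang in qualifying):
            chosen.append(sense)
            if len(chosen) == size:
                break
    return chosen


# Toy data, entirely made up for the example.
ranked = ["the", "of", "water", "house", "international", "sun", "aqua"]
primary = {"water": "n-water", "house": "n-house",
           "sun": "n-sun", "aqua": "n-water"}
coverage = {"aa": 0.95, "bb": 0.85, "cc": 0.40}
senses = {
    "aa": {"n-water", "n-house", "n-sun"},
    "bb": {"n-water", "n-sun"},
    "cc": {"n-water"},
}

selected = select_base(ranked, primary, coverage, senses)
```

In the toy data, language "cc" falls below the 80% threshold and is ignored, while "n-house" is dropped because language "bb" lacks it.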

4096

This base does not exist yet, but will likely be derived from the ~5000-word "core" vocabulary of Multilingual Wordnet.

Languages

English

English is at version 0.1.0: a set of 256 senses mapped to production words. It still needs additional recognition words and probably some rearrangement.
