Skip to content

dictionary database to index etymological and derivational relationships

License

Notifications You must be signed in to change notification settings

lady3bean/sem.io

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

44 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

# About

Sem.io is an ongoing effort to mine and organize dictionary data into an efficient database that intrinsically links words to their etymologies with ActiveRecord Associations.

Words have references to their word origins, derivations, and etymological relationships.

Words have spellings and ISO language codes that have been indexed with btree for faster lookup.

See the rake db:import task in /lib for the code for importing the ISO language code and word data while automatically assigning their relationships. With about 470,000 entries in tsv format, this took about 2 hours on my machine, or O(n2) time in the abstract.

# Next Steps

I will soon be deploying to a server so this can be consumed as API. Watch github.com/lady3bean/semio_frontend for updates on the frontend. I am using d3.js in Angular to build an interactive data visual for users to explore and see the relationships between words.

# Resources

Data is taken from Gerad de Melo’s Etymological Wordnet: www1.icsi.berkeley.edu/~demelo/etymwn/. So far, I have imported only the content with English as a language, and any words that those words reference in other languages. I am planning to source word definitions from Wordnik’s API: developer.wordnik.com and word lemmas and grammar data from LinguaSys’ Linguistic API: nlp.linguasys.com/docs/services/

About

dictionary database to index etymological and derivational relationships

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published