ninetypercentlanguage/wiktionary-orchestrator

list of words and working directory
|
|
v
wiktionary-download
|
|
v
downloaded pages
|
|
v
wiktionary-breakdown
|
|
v
tokens json files (each section for the target language, split by line into an array)
|
|
v
wiktionary-lemma
|
|
v
lemmas json files
|
|
v
lemma-regroup (re-runs download and breakdown for the lemmas, adding to the already created downloads and tokens directories)
|
|
v
wiktionary-definitions
|
|
v
definitions json
|
|
v
wiktionary-combine
|
|
v
final json files, one for each word:

[
    {
        "part_of_speech": string, // e.g. verb, adverb
        "lemmas": [
            {
                "word": string, // lemma for the word and part of speech (usually there is just one)
                "definitions": []string // the definitions of said lemma
            }
        ]
    }
]
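
As a rough illustration of consuming one of these final files, the sketch below parses the structure into typed records. It assumes each entry in `lemmas` is an object with `word` and `definitions` keys. The class names `Entry` and `Lemma`, the function `parse_entries`, and the sample data are hypothetical and not part of the project.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Lemma:
    word: str               # the lemma itself
    definitions: List[str]  # definitions of that lemma


@dataclass
class Entry:
    part_of_speech: str     # e.g. "verb", "adverb"
    lemmas: List[Lemma]     # usually a single lemma


def parse_entries(text: str) -> List[Entry]:
    """Parse the JSON text of one final file into a list of Entry records."""
    raw = json.loads(text)
    return [
        Entry(
            part_of_speech=item["part_of_speech"],
            lemmas=[
                Lemma(word=lemma["word"], definitions=list(lemma["definitions"]))
                for lemma in item["lemmas"]
            ],
        )
        for item in raw
    ]


# Made-up sample data, for illustration only (not real pipeline output).
SAMPLE = """
[
    {
        "part_of_speech": "verb",
        "lemmas": [
            {"word": "run", "definitions": ["to move quickly on foot"]}
        ]
    }
]
"""

entries = parse_entries(SAMPLE)
for entry in entries:
    for lemma in entry.lemmas:
        print(entry.part_of_speech, lemma.word, lemma.definitions)
```

A missing key raises `KeyError`, which may be preferable to silently accepting a malformed file.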

About

Script that runs the entire word-procuring pipeline
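
The stage ordering in the diagram above could be sketched as a small runner like the one below. This is only an illustration of the sequencing, not the repository's actual script: how each stage is really invoked is not described here, so stages are passed in as plain callables, and `run_pipeline`, `make_stub`, and the sample words are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, List, Sequence

# Stage names, in the order shown in the diagram.
STAGE_ORDER = [
    "wiktionary-download",
    "wiktionary-breakdown",
    "wiktionary-lemma",
    "lemma-regroup",
    "wiktionary-definitions",
    "wiktionary-combine",
]

# Assumption: every stage takes the word list and the working directory.
Stage = Callable[[Sequence[str], Path], None]


def run_pipeline(stages: Dict[str, Stage], words: Sequence[str], workdir: Path) -> List[str]:
    """Run every stage in diagram order and return the names that completed."""
    completed: List[str] = []
    for name in STAGE_ORDER:
        stages[name](words, workdir)  # raises KeyError if a stage is missing
        completed.append(name)
    return completed


def make_stub(name: str, log: List[str]) -> Stage:
    """Build a placeholder stage that only records that it was called."""
    def stage(words: Sequence[str], workdir: Path) -> None:
        log.append(name)
    return stage


# Demonstration with stub stages; real stages would do the actual work.
call_log: List[str] = []
stub_stages = {name: make_stub(name, call_log) for name in STAGE_ORDER}
finished = run_pipeline(stub_stages, ["example", "words"], Path("workdir"))
print(finished)
```

Keeping the order in one list makes the dependency between stages explicit: each stage reads what the previous one wrote into the working directory.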
