list of words and working directory
|
|
v
wiktionary-download
|
|
v
downloaded pages
|
|
v
wiktionary-breakdown
|
|
v
token JSON files (each section of the target language split line by line into an array)
|
|
v
wiktionary-lemma
|
|
v
lemma JSON files
|
|
v
lemma-regroup (re-runs wiktionary-download and wiktionary-breakdown for the lemmas, adding their output to the existing downloads and tokens directories)
|
|
v
wiktionary-definitions
|
|
v
definitions JSON files
|
|
v
wiktionary-combine
|
|
v
final JSON files, one for each word:
[
  {
    "part_of_speech": string, // e.g. verb, adverb
    "lemmas": [ // usually just one lemma per word and part of speech
      {
        "word": string, // lemma of the word for this part of speech
        "definitions": []string // definitions of this lemma
      }
    ]
  }
]
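The wiktionary-breakdown step can be sketched as follows. This is a minimal illustration, not the tool's actual code: it assumes the downloaded pages are raw wikitext, where each language starts at a level-2 heading such as ==English== and its subsections (===Verb===, ===Etymology===, ...) use deeper headings. The function name `breakdown` and the output shape {subsection heading: [lines]} are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Matches a wikitext heading such as "==English==" or "===Verb===".
HEADING = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")


def breakdown(wikitext: str, language: str) -> Dict[str, List[str]]:
    """Hypothetical sketch: split the target language's section of a page
    into {subsection heading: [non-empty lines]}."""
    sections: Dict[str, List[str]] = {}
    in_language = False
    current: Optional[str] = None
    for raw in wikitext.splitlines():
        line = raw.strip()
        m = HEADING.match(line)
        if m and len(m.group(1)) == 2:
            # A level-2 heading starts a new language section.
            in_language = m.group(2) == language
            current = None
            continue
        if not in_language:
            continue
        if m:
            # Deeper heading: start (or reuse) a subsection.
            current = m.group(2)
            sections.setdefault(current, [])
        elif current is not None and line and line != "----":
            # Keep non-empty lines; skip the "----" language separator
            # and any lines before the first subsection heading.
            sections[current].append(line)
    return sections
```

For simplicity, repeated headings (for example two Verb sections under different etymologies) are merged into one list here; a real implementation may need to keep them apart.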
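The final JSON layout produced by wiktionary-combine can be mirrored with typed dictionaries. The sketch below assumes a hypothetical intermediate mapping {part_of_speech: {lemma: [definition, ...]}} as input; the real lemma and definitions files may be shaped differently, and `combine` and `write_final` are illustrative names, not the pipeline's actual functions.

```python
import json
from typing import Dict, List, TypedDict


class Lemma(TypedDict):
    word: str               # lemma of the word for this part of speech
    definitions: List[str]  # definitions of this lemma


class Entry(TypedDict):
    part_of_speech: str     # e.g. "verb", "adverb"
    lemmas: List[Lemma]     # usually just one


# Hypothetical intermediate shape (an assumption, not the tools' real format):
# {part_of_speech: {lemma: [definition, ...]}}
Definitions = Dict[str, Dict[str, List[str]]]


def combine(definitions: Definitions) -> List[Entry]:
    """Turn the assumed intermediate mapping into the final per-word list."""
    entries: List[Entry] = []
    for pos, by_lemma in definitions.items():
        lemmas = [Lemma(word=w, definitions=list(d)) for w, d in by_lemma.items()]
        entries.append(Entry(part_of_speech=pos, lemmas=lemmas))
    return entries


def write_final(path: str, definitions: Definitions) -> None:
    """Write one word's final JSON file (UTF-8, non-ASCII kept as-is)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(combine(definitions), f, ensure_ascii=False, indent=2)
```

For the word "ran", for example, `combine({"verb": {"run": ["to move quickly on foot"]}})` yields a single verb entry whose only lemma is "run".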