An exhaustive word list for ispell and company-ispell.

hongyi-zhao/english-wordlist

See here for the motivation behind this repository. In short, the aim is to build an exhaustive English word list for ispell and company-ispell, hence the name american-english-exhaustive. The target audience is academic researchers and technology users for whom Emacs is a way of life.

The StarDict-related word lists are generated with pyglossary from the dictionaries downloaded from here. The following steps, which I've described here, are an example of how to create these word lists:

$ sudo apt-get install python3-tk tix
# pyenv python environment for this operation:
$ pyenv shell datasci
$ pip install gobject PyGObject pyglossary
$ mkdir -p ~/.stardict/dic && cd $_
# Repeat the following steps for the other StarDict dictionaries to generate their csv files.
$ curl -O http://download.huzheng.org/bigdict/stardict-Webster_s_Unabridged_3-2.4.2.tar.bz2
$ tar xvf stardict-Webster_s_Unabridged_3-2.4.2.tar.bz2
$ cd stardict-Webster_s_Unabridged_3-2.4.2
$ pyglossary Webster_s_Unabridged_3.ifo Webster_s_Unabridged_3.csv
# Extract the word list entries from the above csv files:
$ for i in *.csv; do grep -hv '^"#' "$i" | awk -F, '{sub(/^["]/,"",$1);sub(/["]$/,"",$1);print $1}' > "${i%.csv}.txt"; done
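
To see what the extraction loop does, here is a minimal sketch that runs it against a tiny hand-made CSV. The file name sample.csv and its three rows are hypothetical and only imitate the shape the loop expects (a leading line starting with `"#`, then one quoted headword per row); real pyglossary output may differ in detail.

```shell
# Work in a throwaway directory so no real word list is touched.
tmpdir=$(mktemp -d)
cd "$tmpdir"

# Hypothetical sample: one comment-style line, then "headword","definition" rows.
cat > sample.csv <<'EOF'
"#name","Sample dictionary"
"aardvark","a burrowing mammal"
"abacus","a counting frame"
EOF

# Same pipeline as above: drop lines starting with "#, take the first
# comma-separated field, and strip its surrounding double quotes.
for i in *.csv; do
  grep -hv '^"#' "$i" |
    awk -F, '{sub(/^["]/,"",$1); sub(/["]$/,"",$1); print $1}' > "${i%.csv}.txt"
done

# Show the extracted headwords, one per line.
cat sample.txt
```

Because awk splits on every comma, a headword that itself contains a comma would be cut at that comma, so the resulting list may be worth a spot check.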

The word list american-english-insane is generated by the SCOWL Custom List/Dictionary Creator with the following option:

[image: screenshot of the selected SCOWL option]

The word list words.txt comes from here. Finally, the word list american-english-exhaustive is generated by the following command:

# https://groups.google.com/g/comp.unix.shell/c/ha5t3U54GmY/m/oQ_wd0HOBAAJ
# https://groups.google.com/g/comp.unix.shell/c/ha5t3U54GmY/m/bpLYxoqEAAAJ
$ find ./source -type f -exec cat {} + | sort -uo american-english-exhaustive
# or
$ find ./source -type f -print0 | xargs -0 cat | sort -uo american-english-exhaustive
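
The merge step can be tried on a small scale first. The sketch below builds a hypothetical source directory with two tiny lists (a.txt and b.txt are made-up names), merges them the same way, and then checks the result. Setting LC_ALL=C is an assumption added here to make the sort order reproducible across machines; the original command uses the current locale.

```shell
# Build a throwaway ./source tree with two overlapping word lists.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/source"
printf 'banana\napple\n' > "$tmpdir/source/a.txt"
printf 'cherry\napple\n' > "$tmpdir/source/b.txt"

cd "$tmpdir"

# Concatenate every file under ./source, then sort and drop duplicates.
# The output file lives outside ./source, so find never reads it back in.
find ./source -type f -exec cat {} + | LC_ALL=C sort -uo merged-wordlist

# Print the merged list.
cat merged-wordlist

# sort -c -u succeeds only if the file is sorted with no repeated lines.
LC_ALL=C sort -c -u merged-wordlist && echo "sorted and unique"
```

The same `sort -c -u` check could be run against american-english-exhaustive after regenerating it, using the locale it was sorted under.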

Set the following variable in your Emacs initialization file to use american-english-exhaustive:

(setq ispell-alternate-dictionary (file-truename "/path/to/american-english-exhaustive"))
;or
;(setq company-ispell-dictionary (file-truename "/path/to/american-english-exhaustive"))
