GitHub - jmoy/norvig-spell: Different Implementations of Norvig's spellchecker

Implementations of Norvig's spelling corrector in various languages.

Languages

Python (minor modification of Norvig's original program)
C++14 (common header)
- undordered_map
- hat-trie
Haskell
- HashMap
- Trie
C

dynamic programming

hat-trie is the fast trie implementation by dcjones.

Building

The hat-trie library is included as a submodule. Doing

git submodule init
git submodule update
bash -c 'cd hat-trie && autoreconf -i && ./configure && make'

will build it.

The Haskell programs depend on the bytestring, bytestring-trie and unordered-containers packages.

The Java program requires Java 8 and Maven.

Once you have all the dependencies, running make at the top level will build all the programs and place them in bin/.

Each program can be run as

norvig_xx [training data set]

where [training data set] is a plain text file from which the program will learn word frequencies. The programs expect to be given one word per line on standard input and print

word, correction

pairs on standard output.

The folder data/ has a training file train.txt and a test file test.txt. The first is exactly Norvig's big.txt. The second is Norvig's test set but with multiword strings removed.

Performance

On Linux,running make benchmark creates a file benchmarks/all.md containing performance results.

The different implementations use different algorithms and code organization, so the benchmarks are not in any sense a comparison of the different languages.

This is what I get on my setup.

Version	Time (s)	Memory use (Max resident,M)	Lines of code
Python	12.6	75.5	31
C++ (unordered_map)	6.0	5.6	165
C++ (hat-trie)	3.5	3.6	166
Haskell (HashMap)	11.5	81.3	60
Haskell (Trie)	22.3	157.0	61
C (dynamic programming)	0.3	162.7	225
Racket (v.6.1.1)	16.2	277.7	73 (with tests)
Java	10.8	363.2	100

Name		Name	Last commit message	Last commit date
Latest commit History 76 Commits
benchmarks		benchmarks
bin		bin
c_dp		c_dp
cxx-common		cxx-common
cxx_hat		cxx_hat
cxx_umap		cxx_umap
data		data
haskell-trie		haskell-trie
haskell		haskell
hat-trie @ 9e8a566		hat-trie @ 9e8a566
java		java
python2		python2
racket		racket
util		util
.gitignore		.gitignore
.gitmodules		.gitmodules
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Languages

Building

Performance

About

Releases

Packages

Contributors 4

Languages

jmoy/norvig-spell

Folders and files

Latest commit

History

Repository files navigation

Languages

Building

Performance

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages