hspell fork -- free Hebrew spellchecker and morphology engine.

This is version 0.2 of Hspell, the free Hebrew spellchecker and morphology
engine.

It is a fully working Hebrew spellchecker, not a toy release. On typical
documents it should recognize the majority of correct words. However, users
of this release must take into account that it still will not recognize *all*
the correct words: the dictionary is admittedly not yet complete, and this
situation will be improved in the next releases. On the other hand, barring
bugs, hspell should not accept incorrect words as correct: extreme attention
has been given to the correctness and consistency of the dictionary.

You can get Hspell from:

Hspell was written by Nadav Har'El and Dan Kenigsberg.

People who wish to integrate Hspell's technology into their own or others' GPL
software (e.g., word processors, editors) are encouraged to do so, and are
welcome to contact us for help. People who wish to help us enlarge the word
lists are also encouraged to contact us, and their help will be appreciated
(see below for instructions on how you can help).

The rest of this README file explains Hspell's spelling standard (niqqud-less),
a bit about the technology behind Hspell, how to use the "hspell" program
(but see the manual page for more current information), and lists a few future

About Hspell's spelling standard

Hspell was designed to be 100% and strictly compliant with the official
niqqud-less spelling rules ("Ha-ktiv Khasar Ha-niqqud", colloquially known as
"Ktiv Male") published by the Academy of the Hebrew Language. This is both an
advantage and a disadvantage, depending on your viewpoint. It's an advantage
because it encourages a *correct* and consistent spelling style throughout
your writing. It is a disadvantage because a few of the Academy's official
spelling decisions are relatively unknown to the general public.

Users of Hspell (and all Hebrew writers, for that matter) are encouraged to
read the Academy's official niqqud-less spelling rules (which are printed at
the end of most modern Hebrew dictionaries), and to refer to Hebrew
dictionaries which use the niqqud-less spelling (such as Millon Ha-hove or
Rav Milim).

Future releases might include an option for alternative spelling standards.

The technology behind Hspell

The "hspell" program itself is a simple Perl script, but the real "brains"
behind it are the word lists (dictionary) provided by the Hspell project.

In order for it to be completely free of other people's copyright restrictions,
the Hspell project is a clean-room implementation, not based on other
companies' word lists, on other companies' spell checkers, or on copying of
printed dictionaries. The word list is also not based on automatic scanning
of available Hebrew documents (such as online newspapers), because there is
no way to guarantee that such a list will be correct (and not contain
misspellings, useless proper names, slang, and so on), complete (certain
inflections might not appear in the chosen samples) or consistent (especially
when it comes to niqqud-less spelling rules).

Instead, our idea was to write programs which know how to correctly inflect
Hebrew nouns and conjugate Hebrew verbs. The input to these programs is a
list of noun stems and verb roots, plus hints needed for the correct
inflection when these cannot be figured out automatically. These input files
are obviously an important part of the Hspell project. The "word list
generators" (written in Perl, and also part of the Hspell project) then
create the complete word list for use by the spellchecking program, hspell.
The generated lists are useful for much more than spellchecking, by the
way; see more on that below ("The future").

Although we wrote all of Hspell's code ourselves, we are truly indebted to
the old-style "open source" pioneers - people who wrote books instead of
hiding their knowledge in proprietary software. For the correct noun
inflections, Dr. Shaul Barkali's "The Complete Noun Book" has been a great
help. Prof. Uzzi Ornan's booklet "Verb Conjugation in Flow Charts" has been
instrumental in the implementation of verb conjugation, and Barkali's
"The Complete Verb Book" was used too.
During our work we have extensively used a number of Hebrew dictionaries,
including Even Shoshan, Millon Ha-hove and Rav-Milim, to ensure the correctness
of certain words. Various Hebrew newspapers and books, both printed and online,
were used for inspiration and for finding words we still do not recognize.
We wish to thank Cilla Tuviana and Dr. Zvi Har'El for their assistance with
some grammatical questions.

Using hspell

After unpacking the distribution and running "make" and "make install",
the hspell executable (a Perl script, actually) is installed (by default)
in /usr/local/bin, and the dictionary files are in /usr/local/share/hspell.

The "hspell" program can be used on any sort of text file containing Hebrew.
The file may also contain non-Hebrew characters, which hspell ignores. For
example, it works well on Hebrew text files, TeX/LaTeX files, and HTML. Running

	hspell filename

will check the spelling in filename and will output the list of incorrect
words (just like the old-fashioned UNIX "spell" program did). If run without
a file parameter, hspell reads from its standard input.
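
Because the output is a plain list of words, it combines well with standard
UNIX text tools. The following is only a sketch, not part of Hspell: the
function name tally_words is made up for this example, and it assumes that
hspell prints one unrecognized word per line, as spell did.

```shell
# Hypothetical helper (not part of Hspell): read words from standard input,
# one per line, and print "count word" pairs, most frequent first.
# LC_ALL=C makes the tools treat the ISO-8859-8 bytes as opaque data.
tally_words() {
    LC_ALL=C sort | LC_ALL=C uniq -c | LC_ALL=C sort -k1,1nr -k2 |
        LC_ALL=C awk '{ print $1, $2 }'
}
```

With hspell installed, "hspell filename | tally_words" would then show which
unrecognized words occur most often in filename.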

In the current release, hspell expects ISO-8859-8-encoded files. If files
using a different encoding (e.g., UTF-8) are to be checked, they must be
converted first to ISO-8859-8 (e.g., see iconv(1), recode(1)).
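
As a sketch of such a conversion with iconv(1): the function names below are
made up for this example, and the wrapper assumes that hspell is installed
and that it writes its output in ISO-8859-8, the same encoding it reads.

```shell
# Convert UTF-8 text on standard input to ISO-8859-8, the encoding
# that hspell expects.
utf8_to_hebrew() {
    iconv -f UTF-8 -t ISO-8859-8
}

# Convert ISO-8859-8 text on standard input back to UTF-8 for display.
hebrew_to_utf8() {
    iconv -f ISO-8859-8 -t UTF-8
}

# Hypothetical wrapper: spellcheck the UTF-8 file named by $1 and print
# the unrecognized words as UTF-8.
# Assumption: hspell's output uses the same encoding as its input.
hspell_utf8() {
    utf8_to_hebrew < "$1" | hspell | hebrew_to_utf8
}
```

Note that iconv stops with an error when the input contains a character that
has no ISO-8859-8 equivalent (typographic quotes, for example), so such
characters may need to be replaced before the conversion.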

If the "-c" option is given, hspell will suggest corrections for misspelled
words, whenever it can find such corrections. The correction mechanism in this
release is especially good at finding corrections for incorrect niqqud-less
spellings, with missing or extra 'immot-qri'a.
The "-v" (verbose) option will explain for each correct word why it was
recognized (show the basic noun, verb, etc., that this inflection relates to).

Because hspell's output is (naturally) in logical order, it is normally
useful to pipe it to bidiv or rev before viewing. For example:

	hspell -c filename | bidiv | less

Another convenient alternative is to run hspell on a BiDi-enabled terminal.

How *you* can help

As mentioned above, hspell does not yet cover all of modern Hebrew. A few types
of nouns are not inflected correctly, and therefore we cannot add them to the
dictionary. A few extremely irregular verbs do not fit in either. However, most
Hebrew words are inflected correctly, and all we need is to collect them all.

This is where you enter the picture. If you stumble upon a commonly used word
in modern Israeli Hebrew that hspell does not recognize, you are welcome to add
its stem to the appropriate dat file. Note that in some cases you should also
add flags to hint how the inflection should be done.
Run wolig/woo to inflect it (or simply "make"), and make sure the output is
correct. Since "open source" should not mean "low quality", you should examine
the outcome carefully to make sure the spelling is perfect: look up each and
every word in a good dictionary and/or consult grammar books.

If you send us lists of tested nouns or verbs, or even proper names, you will
be doing a great service to this GPLed project. Be sure not to copy words out
of a dictionary or other copyrighted word lists, but rather use your own
knowledge of Hebrew.

Also, we are very keen to know if you find a spelling error that has crept into
hspell's word lists.

The future

See the TODO file for a more detailed list of things that should be done
to improve the Hspell project. Here are just 3 ideas that will be relatively
easy to do and will greatly enhance the Hspell project's usefulness.

* Completeness and correctness: the word list generated by this release, over
  120,000 inflected words, isn't bad in the sense that in typical documents a
  majority of the correct words will be recognized as such. On the other hand,
  typical documents still contain a non-negligible number of correct words
  (especially nouns) that will not be recognized, and this should be fixed
  by adding more noun bases, verb roots and other miscellaneous words to the
  dictionary. The TODO file mentions a few other special cases that Hspell
  does not yet recognize as correct words, and that should be fixed.
  Also, in some cases hspell accepts words that it would be better to reject,
  for example it currently allows a "he" prefix even before the imperative of

* Morphology: The Hspell project's data files and generated word lists
  provide much more than a word list for spell checking: they also allow
  finding, for a given inflected word, its base form (e.g., the basic noun,
  the basic verb and the root, etc.), and vice versa: given a base word, one
  can find its inflections. Such a "morphology" engine can have many uses,
  including, for example, Hebrew search engines, Hebrew dictionaries (with
  definitions and/or translation to another language), and machine translation.
  The "-v" option to hspell is a small demonstration of what hspell's
  morphology engine is capable of. Even the spellchecker itself could benefit
  from more use of those techniques: e.g., it could explain how the corrections
  it suggests were derived.

* Compression: currently, "make install" installs roughly 1 MB of data
  files. We have designed a compression algorithm which takes this down
  to only 50 KB (!!), which is used if "make install_compressed" is done
  instead of "make install". However, our compression algorithm currently
  doesn't keep the base-word information that allows hspell's "-v" option
  to work. This should be fixed in a later release, because a 20-fold
  (or even "just" 10-fold) reduction in dictionary size will be very welcome
  to other projects (e.g., Linux distributions) that will want to distribute

Hspell's license

Hspell is free software, released under the GNU General Public License (GPL).
Note that not only the programs in the distribution, but also the dictionary
files and the generated word lists, are licensed under the GPL.
There is no warranty of any kind for the contents of this distribution.

See the LICENSE file for more information and the exact license terms.

Contacting the authors

Nadav Har'El:    nyh    @
Dan Kenigsberg:  danken @