The README file describes what Hspell is and what it includes. This file
explains how to build and install it.
===========================================================================
Hspell is normally installed and used in one of two ways:
1. Native Hspell: Hspell can be used as a command-line tool "hspell", and/or
using a library libhspell, together with a dictionary in Hspell's own
format.
2. Derivative dictionaries: Hspell's dictionary data is compiled into a
format used by some common multi-lingual spell-checker, such as aspell,
myspell or hunspell.
One benefit of the native Hspell method is much better performance: When
Hspell's native spell-checker is compared to hunspell, for example, it is
10 times smaller on disk, 10 times faster to start, uses half the memory,
and spell-checks hundreds of times (!) faster. Hspell's code also has
additional features that no multi-lingual spell-checker currently supports,
especially morphological analysis.
The benefit of generating dictionaries for one of the existing multi-lingual
spell-checkers like aspell or hunspell is obvious: no additional code needs
to be installed, so it will work on any system where such a multi-lingual
spell-checker works. Even more importantly: Large applications, such as OpenOffice,
Firefox and even Google's Gmail, which already use aspell or hunspell to
provide spell-checking for many languages, gain Hebrew spell-checking without
any extra effort.
============================================ Native Hspell ===============
Installing Hspell on a Unix-compatible system (Linux, Unix, Mac OS X) is
usually as simple as running
    ./configure
    make
    make install
Note that before running "make install", if you want to run the hspell
executable from the build directory, you must tell it to expect the dictionary
files in the current directory, rather than in their final location. Do this
by running "hspell -Dhebrew.wgz".
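As a minimal sketch of the note above, the following script runs the freshly built executable from the build directory. The file name sample.txt is a hypothetical placeholder for any Hebrew text file, and passing the file as an argument is an assumption about how you want to feed text to hspell.

```shell
#!/bin/sh
# Sketch: try the freshly built hspell from the build directory,
# before "make install" has put the dictionary in its final location.
# sample.txt is a hypothetical Hebrew text file; substitute your own.
INPUT="${INPUT:-sample.txt}"

if [ -x ./hspell ] && [ -f "$INPUT" ]; then
    # -Dhebrew.wgz tells hspell to expect its dictionary files
    # in the current directory.
    ./hspell -Dhebrew.wgz "$INPUT"
else
    echo "build hspell with make and provide a text file first" >&2
fi
```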
By default, Hspell is built for installation in the /usr/local tree. If you
want to install it somewhere else, use "./configure --prefix=/some/dir".
The --prefix option is just one of configure's usual options that give
you more control over the way that Hspell is compiled - run "./configure -h"
to see the entire list of these options.
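For illustration, a build into a private directory might look like the sketch below. The location $HOME/hspell is only an assumed example; any writable directory can be used as the prefix.

```shell
#!/bin/sh
# Sketch: build Hspell and install it under a custom prefix.
# PREFIX is a hypothetical location; the default tree is /usr/local.
PREFIX="${PREFIX:-$HOME/hspell}"

build_hspell() {
    ./configure --prefix="$PREFIX" &&
    make &&
    make install
}

# Run the build only from the top of the Hspell source tree.
if [ -x ./configure ]; then
    build_hspell
else
    echo "no ./configure here; run this from the Hspell source directory" >&2
fi
```

Installing under your home directory this way avoids the need for root permissions during "make install".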
In addition to configure's usual options, Hspell's configure adds a few
options whose names start with "--enable-", that enable optional features
in Hspell. These are the options you might want to use:
--enable-fatverb
    Allow "objective kinuyim" on all forms of verbs. Because this adds
    as many as 130,000 correct but very rarely-used (in modern texts)
    inflections, a compile-time option is present for enabling or
    disabling these forms. The default in this version is not to enable
    them.
--enable-linginfo
    Include a full morphological analyzer in "hspell -l", explaining how
    each correct word could be derived. This slows down the build and makes
    the installation about 4 times larger, but doesn't slow hspell if "-l"
    isn't used, so it is recommended.
These optional features are not turned on by default because they present
a feature/performance tradeoff (you get more features but slower build,
larger installation, and/or slower executable), or a feature/feature tradeoff
(when you add more rare word forms, you're allowing more spelling mistakes
to masquerade as real words).
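The following sketch shows one way to select these features at configure time. The CONFIGURE_FLAGS variable is a hypothetical helper introduced here for illustration; it is not something Hspell's build system reads.

```shell
#!/bin/sh
# Sketch: choose Hspell's optional features at configure time.
# CONFIGURE_FLAGS is a hypothetical helper variable for this script only.
CONFIGURE_FLAGS="--enable-linginfo"
# Uncomment to also accept the rare "objective kinuyim" verb forms:
# CONFIGURE_FLAGS="$CONFIGURE_FLAGS --enable-fatverb"

if [ -x ./configure ]; then
    # Left unquoted on purpose so that each flag becomes its own argument.
    ./configure $CONFIGURE_FLAGS && make
else
    echo "would run: ./configure $CONFIGURE_FLAGS" >&2
fi
```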
============================================ Derivative Dictionaries =====
After you run "configure" as explained above, the Makefile has additional
targets for creating dictionaries for several common multi-lingual
spell-checkers and applications:
Except where otherwise noted, these dictionaries faithfully reproduce all of
Hebrew's morphological richness as understood by the native Hspell spell-
checker. This includes correctly allowing the various prefixes used in Hebrew,
and not allowing them when they are not appropriate (e.g., the definite article
on a verb).
"make hunspell" -
    Creates the files "he.dic" and "he.aff".
    These dictionaries are uncompressed. It is recommended that they be
    compressed with "hzip". While hzip compression is not as good as
    aspell's prezip-bin (or our own wzip), this is the compression format
    which hunspell understands, and it can still compress he.dic to a tenth
    of its original size.
    If you package these files, please also package misc/Copyright,
    so that users know they were generated by Hspell.
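As a rough sketch, the hunspell files can be generated and compressed as follows. This assumes that "./configure" has already been run and that the hzip tool shipped with hunspell is in your PATH; the output names mentioned in the comment are an assumption about hzip's behavior, so check its manual page.

```shell
#!/bin/sh
# Sketch: build the hunspell dictionary and compress it with hzip.
# Assumes a configured Hspell tree and hzip (shipped with hunspell) in PATH.
DICT_FILES="he.dic he.aff"

if [ -f Makefile ] && command -v hzip >/dev/null 2>&1; then
    # hzip is assumed to write he.dic.hz and he.aff.hz next to the originals.
    make hunspell &&
    hzip $DICT_FILES
else
    echo "need a configured Hspell tree and hzip in PATH" >&2
fi
```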
"make aspell" -
    Creates the files "he_affix.dat" and "he.wl".
    Additionally, one can do "make he.rws" to create he.rws from he.wl.
    (rws is a dump of aspell's in-memory hash table, which allows aspell
    to mmap(2) the dictionary almost instantly, instead of reading it).
    Unfortunately, there is one case where the aspell dictionary cannot
    correctly reproduce Hspell (because it lacks hunspell's NEEDAFFIX
    extension): In Hebrew, the infinitive verb may be preceded by the
    prefixes lamed, bet, kaf or mem, but often must not come without any
    prefix. We have not yet found a way to express this in aspell,
    so words like ישון are incorrectly accepted as correct.
    If you package these files, please also package misc/Copyright,
    so that users know they were generated by Hspell.
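The two aspell steps can be chained as in the sketch below. It assumes that "./configure" has already been run; building he.rws presumably also requires aspell itself to be installed. The ASPELL_TARGETS variable is a hypothetical name used only in this script.

```shell
#!/bin/sh
# Sketch: build the aspell word list and affix file, then the rws dump.
# ASPELL_TARGETS is a hypothetical helper variable for this script only.
ASPELL_TARGETS="aspell he.rws"

if [ -f Makefile ]; then
    for target in $ASPELL_TARGETS; do
        # Stop at the first target that fails to build.
        make "$target" || break
    done
else
    echo "no Makefile here; run ./configure first" >&2
fi
```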
============================================ Additional Targets =====
Finally, there is a target, "make hif", for creating a full inflection list
which might be useful for other future applications besides spell-checking.
For more information, see README-hif.