tar xvzf hunspell-1.7.0.tar.gz
cd hunspell-1.7.0
autoreconf -vfi
./configure
make
sudo make install
hunspell -d en_US -l file.txt webpage.html document.odt
LibreOffice language technology – News & Best practices by László Németh, Tirana, 2018
New features and bug fixes by László Németh, supported by FSF.hu Foundation:
No more annoyingly long suggestion times, especially in languages with
compound word handling and complex morphology. Balanced multi-level
time limits now guarantee a suggestion time within half a second,
instead of seconds (or dozens of seconds or more in extreme cases),
even for longer misspellings.
Add SPELLML support for run-time dictionary extension with optional
affixation of user words. See the new "Grammar By" feature of
language-specific user dictionaries in LibreOffice 6.0:
Screencast with English example: https://www.youtube.com/watch?v=EsS3gaBTfOo
Screencast with German example: https://www.youtube.com/watch?v=aYVFDqCUb6I
Improved, highly customizable suggestions at the level of dictionary words:
Pronunciations and typical misspellings defined by optional "ph:" fields of
the dictionary words are used not only in n-gram suggestions, but also as
elements of the REP replacement list, which gets the highest priority in normal
suggestions. This also gives the best suggestions for short words.
More information: see "ph:" in man 5 hunspell.
Handling multiple-word suggestions is much easier. As in a
traditional spelling dictionary, for example, to get the correct suggestion
"a lot" for the typical misspelling "alot" in first place, it is now
enough to put the following line in the dic(tionary) file:
Limit compound overgeneration by dictionary-based word pairs:
It is now possible to filter bad compound words by listing
the correct word pairs with a space in the dictionary, as in a traditional
dictionary.
no n-gram and compound word suggestions if a "good" suggestion
exists, i.e. uppercase, REP, ph: or dictionary word pair suggestions
word pairs are always suggested if they exist in the dic file
word pairs have top priority in suggestions, and
these are the only suggestions if there is no other good suggestion.
dictionary word pairs separated by a dash instead of a space
are also handled specially in two-word suggestions (depending on the
limit bad suggestions by improved n-gram suggestion rules:
don't suggest capitalized dictionary words for lower
case misspellings in n-gram suggestions, except
- PHONE usage, or
- in the case of German, where not only proper
nouns are capitalized, or
- the capitalized word has special pronunciation
and don't suggest if the difference of lengths of misspellings and
suggestions is 5 or more characters.
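The two n-gram rules above can be illustrated with a minimal Python sketch. It is not Hunspell's implementation: the function names and the `allow_capitalized` switch (standing in for the PHONE, German and special-pronunciation exceptions) are hypothetical and chosen only for this example.

```python
MAX_LENGTH_DIFF = 5  # from the note above: differences of 5 or more are dropped


def keep_ngram_candidate(misspelling: str, candidate: str,
                         allow_capitalized: bool = False) -> bool:
    """Return True if the candidate passes both filters described above."""
    # Rule 1: a lowercase misspelling should not get a capitalized suggestion,
    # unless one of the exceptions applies (modelled here by allow_capitalized).
    if (misspelling.islower() and candidate[:1].isupper()
            and not allow_capitalized):
        return False
    # Rule 2: drop candidates whose length differs by 5 or more characters.
    if abs(len(misspelling) - len(candidate)) >= MAX_LENGTH_DIFF:
        return False
    return True


def filter_ngram_candidates(misspelling, candidates, allow_capitalized=False):
    """Keep only the candidates that pass keep_ngram_candidate, in order."""
    return [c for c in candidates
            if keep_ngram_candidate(misspelling, c, allow_capitalized)]
```

For example, for the lowercase misspelling "pari", the capitalized "Paris" and the much longer "parishioner" would both be dropped, while "pair" would be kept.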
Extend dotless i and dotted I rules to Crimean Tatar language
Allow dotted I in dictionary, and disable bad capitalization of i.
BREAK: extended recursive word breaking algorithm to handle words or
words with suffixes when they already contain word break characters,
for example, "e-mail" is a dictionary word with a word break character, and
it wasn't accepted before in compounds in some languages.
FORBIDDENWORD precedes BREAK: Now it's possible to forbid compound
forms recognized by BREAK word breaking by adding the bad compounds to
the dictionary with FORBIDDENWORD flags.
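The interaction of the two BREAK changes can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Hunspell's algorithm: the function `accepts`, the plain sets used as dictionary and forbidden list, and the single "-" break character are all hypothetical.

```python
def accepts(word, dictionary, forbidden, break_chars=("-",)):
    """Recursive word breaking where forbidden forms win over breaking."""
    # A forbidden word is rejected before any breaking is attempted.
    if word in forbidden:
        return False
    # A dictionary word is accepted as a whole, even if it contains
    # a break character itself (for example "e-mail").
    if word in dictionary:
        return True
    # Otherwise try every break position and check both parts recursively.
    for i, ch in enumerate(word):
        if ch in break_chars and 0 < i < len(word) - 1:
            left, right = word[:i], word[i + 1:]
            if (accepts(left, dictionary, forbidden, break_chars)
                    and accepts(right, dictionary, forbidden, break_chars)):
                return True
    return False
```

With a dictionary containing "e-mail" and "address", the form "e-mail-address" is accepted by breaking at the second dash, and it is rejected again once it is added to the forbidden set.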
Lower limit for the "doubletwochars" suggestion algorithm:
one of the typical misspellings recognized by the Hunspell suggestion
mechanism is syllable duplication. Alongside the old pattern
ABABA -> ABA, for example nutrITITIon -> nutrITIon, the
simpler ABAB -> AB pattern is now also recognized in non-starting position,
for example, regretTETEd -> regretTEd.
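The ABAB -> AB pattern can be illustrated with a short Python sketch. It is a simplified stand-in, not the code of Hunspell's doubletwochars; the function name is hypothetical, and a real checker would additionally test each candidate against the dictionary.

```python
def dedupe_repeated_pairs(word: str) -> list[str]:
    """Generate candidates by removing one copy of a doubled character pair."""
    candidates = []
    # Start at index 1 so a repetition at the very start of the word is skipped.
    for i in range(1, len(word) - 3):
        pair = word[i:i + 2]
        if pair == word[i + 2:i + 4]:
            # Drop the first copy of the pair and keep the rest of the word.
            candidate = word[:i] + word[i + 2:]
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates
```

Applied to the two examples above, "regretteted" yields "regretted" and "nutritition" yields "nutrition", while a pair doubled at the very start of a word produces no candidate.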
Lower limit for longswapchar and movechar: only distances of at most
4 characters are recognized, to avoid slow and bad suggestions.
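As an illustration of the distance limit, the following Python sketch swaps two non-adjacent characters that are at most 4 positions apart and keeps the results found in a word list. It is an assumption-laden sketch, not Hunspell's longswapchar: the constant name, the function name and the `known_words` set are hypothetical.

```python
MAX_CHAR_DISTANCE = 4  # hypothetical name for the 4-character limit noted above


def long_swap_candidates(word, known_words):
    """Swap two non-adjacent characters at most MAX_CHAR_DISTANCE apart."""
    found = []
    chars = list(word)
    for i in range(len(chars)):
        # Distance 1 is an adjacent swap, which this sketch leaves to a
        # separate rule; larger distances are capped by the limit.
        last = min(i + MAX_CHAR_DISTANCE, len(chars) - 1)
        for j in range(i + 2, last + 1):
            chars[i], chars[j] = chars[j], chars[i]
            candidate = "".join(chars)
            chars[i], chars[j] = chars[j], chars[i]  # restore the word
            if candidate in known_words and candidate not in found:
                found.append(candidate)
    return found
```

A swap over 4 positions ("ebcdaf" -> "abcdef") is still found, while a swap over 5 positions ("fbcdea" -> "abcdef") is no longer attempted, which is what keeps the search small.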
fix compound handling for new Hungarian orthography reform
Allow suggestion search for prefix + two suffixes:
Remove artificial performance limit to get correct
suggestions for relatively simple misspellings in
Hungarian, etc., when the word form contains prefix
and both derivative and inflectional suffixes, too:
lefikszálása -> lefixálása
Improvements for command-line Hunspell:
Remove false alarms during checking OpenDocument (ODF)
documents by ignoring <text:span> elements. (A lot of
<text:span> elements are created also within words
during text re-editing, which often resulted in a huge number of broken
words before this fix.)
List filenames during filtering multiple files in command-line:
$ hunspell -l *.odt
a.odt: mispelling
b.odt: egzample
$ hunspell -l -G *.odt
a.odt: good
b.odt: words
- Dictionary search by option -D doesn't wait for the standard input
(fixed by Siva Mahadevan)
makealias dictionary compression: add option --minimize-diff
to reuse free positions of alias lists to create minimal and
readable diffs for alias compressed dictionaries stored in
revision control systems, such as the dictionaries of LibreOffice.
Brazilian-Portuguese translation by Rafael Fontenelle
Catalan translation by robert dot buj at gmail
Minor bug fixes by several contributors, see git log
- Library changes:
- Performance improvements in suggest()
- Fixes regressions for Hungarian related to compounding.
- Fixes regressions for Korean related to ICONV.
- Command line tool:
- Added Tajik translation
- Fix searching for OOo dictionaries installed in the user folder
- Fix microsoft-cp1251 to cp1251. Dictionaries should not use the former.
- Changes in the library:
- Performance improvement in ngsuggest(), suggestions should be faster.
- Set MAXWORDLEN to 100 as in 1.3.3 for performance reasons.
- MAXWORDLEN can be set during build time with
- Fix crash when word with 102 consecutive X is spelled.
- Performance improvement in
- Changes in the command line tool:
- -D shows all loaded dictionaries instead of only the first.
- -D properly lists all available dictionaries on Windows.
- Many stability fixes
- Fixed compilation errors on various systems (Windows, FreeBSD)
- Small performance improvement compared to 1.4.0
- Added new API with C++ types (string, vector), yet full API backward compatibility with 1.4 is kept