
LanguageTool Change Log

4.3-SNAPSHOT (2018-09-26)

English

  • added and improved rules

Galician

  • added and improved rules

German

  • added and improved rules

German (simple)

  • added and improved rules

Portuguese

  • added and improved rules
  • improvements to disambiguation and segmentation
  • dictionary update

Russian

  • added and improved rules

General

HTTP API / LT server

  • The server can now accept JSON as the data parameter that describes markup. For example:
    {"annotation":[
      {"text": "A "},
      {"markup": "<b>"},
      {"text": "test"},
      {"markup": "</b>"}
    ]}
    
    With this input, LT will ignore the markup parts and run the check only on the text parts. The error offset positions will still refer to the original input including the markup, so that suggestions can easily be applied. You can optionally use interpretAs to have markup interpreted as whitespace, like this:
    {"markup": "<p>", "interpretAs": "\n\n"}
    
    Note that HTML entities (including &nbsp;) still need to be converted to Unicode characters before feeding them into LT.
    (Issue: https://github.com/languagetool-org/languagetool/issues/757)
  • The blockedReferrers setting now also considers the Origin header
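A minimal Python sketch of the annotation format described above. It assumes the annotation is sent to the /v2/check endpoint in the data parameter, together with language. The helper names (build_annotation, checked_text, to_original_offset, form_body) are hypothetical. The server already maps offsets back to the original input itself, so to_original_offset only illustrates why reported positions include the markup.

```python
import json
from urllib.parse import urlencode

def build_annotation(parts):
    """Build the annotation JSON from (kind, value[, interpret_as]) tuples.

    kind is "text" or "markup"; interpret_as is only used for markup parts.
    """
    annotation = []
    for kind, value, *rest in parts:
        entry = {kind: value}
        if kind == "markup" and rest:
            entry["interpretAs"] = rest[0]
        annotation.append(entry)
    return {"annotation": annotation}

def checked_text(annotation):
    """Text that gets checked: text parts as-is, markup replaced by its
    interpretAs value, or dropped when there is none."""
    return "".join(e["text"] if "text" in e else e.get("interpretAs", "")
                   for e in annotation["annotation"])

def to_original_offset(annotation, offset):
    """Map an offset inside a text part of checked_text() back to the
    original input, markup included (illustration only)."""
    checked_pos = original_pos = 0
    for e in annotation["annotation"]:
        if "text" in e:
            length = len(e["text"])
            if offset < checked_pos + length:
                return original_pos + (offset - checked_pos)
            checked_pos += length
            original_pos += length
        else:
            checked_pos += len(e.get("interpretAs", ""))
            original_pos += len(e["markup"])
    return original_pos

def form_body(annotation, language="en-US"):
    """URL-encoded form body for a POST request to /v2/check."""
    return urlencode({"language": language, "data": json.dumps(annotation)})
```

For the example above, checked_text() yields "A test", and offset 2 (the "t" of "test") maps back to offset 5 in "A <b>test</b>".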

4.2 (2018-06-26)

Breton

  • made many messages shorter
  • updated FSA spelling dictionary from An Drouizig Breton Spellchecker 0.15

Catalan

  • added and improved rules
  • added rules and updated the dictionary for the new diacritics rules (IEC 2017)

Dutch

  • added and improved rules

English

Esperanto

  • added and improved rules

German

  • added and improved rules
  • updated jwordsplitter to 4.4 to prevent excessively long processing times for artificially long compounds
  • prohibit.txt: lines ending with ".*" will prohibit all words starting with the previous string
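A sketch of the prohibit.txt matching described above. is_prohibited is a hypothetical helper, not LT's Java implementation, and treating lines that start with # as comments is an assumption. The example entries in the usage note are made up.

```python
def is_prohibited(word, prohibit_lines):
    """A plain line prohibits exactly that word; a line ending in ".*"
    prohibits every word that starts with the text before ".*"."""
    for line in prohibit_lines:
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: comment lines
            continue
        if line.endswith(".*"):
            if word.startswith(line[:-2]):
                return True
        elif word == line:
            return True
    return False
```

With the hypothetical entries ["Lehrzeile", "Lehrzeil.*"], both "Lehrzeile" and "Lehrzeilen" are prohibited. With only ["Lehrzeile"], "Lehrzeilen" would still be accepted.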

German (simple)

  • added and improved rules

Greek

  • added rules

Portuguese

  • added and improved rules

Russian

  • added and improved grammar and punctuation rules
  • upgraded the tagging and synthesizer dictionaries from AOT.ru rev.269 (extend tags, add missing tags)
  • spelling dictionary update

Spanish

  • added and improved a few rules

Ukrainian

  • dictionary update (more than 15k new words)
  • added and improved rules
  • some improvements to tokenization, tagging and disambiguation

HTTP API / LT server

  • The JSON contains a new section detectedLanguage (under language) that contains information about the automatically detected language. This way clients can suggest switching to that language, e.g. in cases where the user had selected the wrong language.
  • New optional configuration setting blockedReferrers: a comma-separated list of HTTP referrers that are blocked and will not be served
  • BETA: New optional configuration settings dbDriver, dbUrl, dbUsername, dbPassword to allow user-specific dictionaries
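Below is a hypothetical client-side check built on the new detectedLanguage section. The response shape in the docstring is an assumption based on the description above. Comparing only the base language (de vs. en) avoids suggesting a switch between variants such as de-DE and de-AT.

```python
def suggest_language_switch(response):
    """Return the detected language code if its base language differs from
    the one that was checked, else None.

    Assumed shape of the parsed JSON:
    {"language": {"code": "de-DE", "detectedLanguage": {"code": "en-US"}}}
    """
    language = response.get("language", {})
    detected = language.get("detectedLanguage")
    if not detected:
        return None

    def base(code):
        return (code or "").split("-")[0]

    if base(detected.get("code")) != base(language.get("code")):
        return detected.get("code")
    return None
```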

Java API

  • The parameters of the *SpellerRule classes (e.g. MorfologikRussianSpellerRule) have changed
  • LanguageIdentifier will now only consider the first 1000 characters when identifying the language of a text. This improves performance for long texts.

4.1 (2018-03-27)

Catalan

  • added and improved rules

Chinese

  • added some rules

Dutch

  • added and improved rules
  • added new Java rule NL_PREFERRED_WORD_RULE that suggests preferred words (e.g., 'fiets' for 'rijwiel')

English

  • all-uppercase words are now also spellchecked
  • added and improved rules
  • added remaining collocation rules (~130) contributed by Nicholas Walker (Bokomaru)

Esperanto

  • words written in the x-sistemo now get proper POS tags, so grammar mistakes such as ambaux virino (-> ambaux virinoj) and mi farigxis maljunan (-> mi farigxis maljuna) can now be found
  • added and improved rules
  • added many <url> to rules

French

  • improved suggestion for spelling mistakes (#912)

Galician

  • added a couple of rules

German

  • added and improved rules
  • New rule that checks coherent use of Du/du, Dich/dich etc. Assumes that the first use has 'correct' capitalization and suggests the same capitalization for subsequent uses.
  • New line extension -* for ignore.txt: entries ending with -* are ignored only if they are part of a hyphenated compound (e.g., Fair-Trade-* allows Fair-Trade-Kakao)
  • Added a new rule that tries to find compounds that are probably not correct, like Lehrzeile instead of Leerzeile, requires ngram data (rule id DE_PROHIBITED_COMPOUNDS)
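The -* semantics for ignore.txt can be sketched as follows. The sketch assumes that such an entry only matches as the leading part of a hyphenated compound, as in the Fair-Trade-Kakao example. is_ignored is a hypothetical helper.

```python
def is_ignored(token, ignore_entries):
    """Plain entries match a whole word; entries ending in "-*" match only
    when followed by more compound parts (Fair-Trade-* allows Fair-Trade-Kakao)."""
    for entry in ignore_entries:
        if entry.endswith("-*"):
            prefix = entry[:-1]  # keep the hyphen, e.g. "Fair-Trade-"
            if token.startswith(prefix) and len(token) > len(prefix):
                return True
        elif token == entry:
            return True
    return False
```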

German (simple)

  • added and improved rules

Portuguese

  • added and improved rules

Russian

  • sentence segmentation improvements
  • added and improved rules
  • upgraded the tagging and synthesizer dictionaries with extended POS tags from AOT.ru rev.269

Spanish

  • update to the part-of-speech dictionary

Ukrainian

  • dictionary update (~5K new lemmas)
  • compound word tagging improvements
  • many new disambiguation rules
  • several new barbarism and grammar rules

HTTP API / LT server

  • The server now returns HTTP error code 500 in case of a timeout (it used to return 503)

Java API

4.0 (2017-12-29)

Catalan

  • added and improved rules
  • updated and renamed dictionary: ca-ES.dict (external dependency: catalan-pos-dict 1.6)
  • added new dictionary for Valencian including most words from Diccionari Normatiu Valencià (AVL): ca-ES-valencia.dict (external dependency: catalan-pos-dict 1.6)

Dutch

  • added and improved rules

English

  • added and improved rules
  • removed the category MISC and moved the rules to more specific categories
  • added WordCoherencyRule, to detect cases where two different variants of a word are used in the same text (e.g. archaeology and archeology)
  • added approximately 70 collocation rules contributed by Nicholas Walker (Bokomaru)
  • added support for locale-specific spelling suggestions (locale-specific spelling_en-XY.txt files)
  • updated en_GB spellchecker dictionary from https://github.com/marcoagpinto/aoo-mozilla-en-dict
  • updated en_US spellchecker dictionary from http://wordlist.aspell.net (Version 2017.08.24)
  • updated en_CA spellchecker dictionary from http://wordlist.aspell.net (Version 2017.08.24)
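Here is a simplified sketch of the idea behind WordCoherencyRule. The first variant used in a text is taken as the writer's choice, and later uses of the other variant are reported. The VARIANTS list and incoherent_uses are hypothetical. The real rule works on LanguageTool's own tokens and data.

```python
import re

# Hypothetical variant pairs for illustration only.
VARIANTS = [("archaeology", "archeology")]

def incoherent_uses(text, pairs=VARIANTS):
    """Return (first_variant, conflicting_variant, offset) for each word that
    conflicts with the variant of the same pair used earlier in the text."""
    chosen = {}  # pair index -> variant seen first
    findings = []
    for match in re.finditer(r"\w+", text):
        word = match.group().lower()
        for i, pair in enumerate(pairs):
            if word in pair:
                first = chosen.setdefault(i, word)
                if word != first:
                    findings.append((first, word, match.start()))
    return findings
```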

French

  • LT now offers suggestions for spelling errors

Galician

  • added and improved rules, including:
    • grammar: agreement rules added (only number and gender agreement)
    • common normative errors: correction of Castilianisms, Lusitanianisms, hypergalicisms, archaisms and Anglicisms
    • style: barbarism, redundant expressions, and wordy expressions detection added
    • typography: spacing and number formatting improvements; chemical formulas; degree signs; dashes; punctuation; international system standards; and mathematical symbol formatting
  • development, punctuation and repetition rules categories added
  • multiword disambiguation added
  • disambiguation improvements
  • new word tokenizer
  • significant POS tagging and synthesizing improvements
  • spellchecking exceptions for:
    • abbreviations;
    • variables in formulas, units, and related statistical vocabulary;
    • common Latin, English and French expressions;
    • species scientific names;
    • famous personalities
  • updated Hunspell dictionaries to:
    • [gl-ES] Version 12.10 "Xoán Manuel Pintos"

German

  • added and improved rules
  • New rule that checks coherency of hyphen usage in compounds, e.g. it complains when "Ärzteverband" and "Ärzte-Verband" are both used in the same text. While both spellings are correct, it's probably a good idea to stick to one spelling.
  • improved POS tagging of hyphenated compounds (e.g., "CO2-arm" is recognized as a variant of "arm")

Polish

  • added rules
  • disambiguation improvements

Portuguese

  • added and improved rules
  • LibreOffice category rules moved to other categories
  • disambiguation improvements
  • updated Hunspell dictionaries to:
    • [pt-PT pos-AO] Dicionários Portugueses Complementares 2.2
    • [pt-AO pre-AO] Dicionários Portugueses Complementares 2.2
    • [pt-MZ pre-AO] Dicionários Natura 14.08.2017

Russian

  • added and improved grammar and punctuation rules
  • spelling dictionary update
  • new Russian-English false friends added (thanks to ZakShaker)

Serbian

  • initial support for Serbian by Zoltán Csala

Ukrainian

  • big dictionary update (~10K new lemmas)
  • improvements in tokenization
  • compound word tagging improvements
  • more than 350 new disambiguation rules
  • several new barbarism and grammar rules

General

  • Now runs with Java 9 (compilation with Maven still has issues with Java 9)
  • The spell checker tries harder to find suggestions for misspellings that have a Levenshtein distance larger than 2; the maximum Levenshtein distance is now 3. This way you now get a suggestion for e.g. algortherm (algorithm) or theromator (thermometer). In the worst case (every single word of a text misspelled), this has a performance penalty of about 30%.
  • Better support for Unicode codepoints greater than 0xFFFF
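To make the new distance limit concrete, here is plain Levenshtein distance in Python together with a hypothetical suggestions helper. LT's Morfologik-based speller may weight edits differently (for example, transpositions). Treat this as an illustration of the threshold only. Both examples above, algortherm/algorithm and theromator/thermometer, are exactly 3 edits apart.

```python
def levenshtein(a, b):
    """Classic edit distance: insertion, deletion and substitution cost 1 each."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb))) # substitution
        previous = current
    return previous[-1]

def suggestions(word, dictionary, max_distance=3):
    """Dictionary words within max_distance of word, closest first."""
    scored = sorted((levenshtein(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]
```

With max_distance=2, the old limit, suggestions("algortherm", ["algorithm"]) returns an empty list. With the new limit of 3, it returns ["algorithm"].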

word2vec

GUI (stand-alone version)

Java API

  • A RuleMatch can now have a URL, too. The URL usually points to a page that describes the error or grammar rule in more detail. Before, only the Rule could have a URL. A RuleMatch URL will overwrite the Rule URL in the JSON output.
  • A RuleMatch now also has information about the sentence the error occurred in (it used to have only position information and the caller was expected to find the error context and/or sentence position in the original text).

HTTP API / LT server

  • change in configuration: requestLimit and requestLimitPeriodInSeconds now both need to be set for the limit to work
  • new property key timeoutRequestLimit: similar to requestLimit, but this one limits not all requests but blocks once this many timeouts have been caused by the IP in the time span set by requestLimitPeriodInSeconds
  • new property key requestLimitInBytes: similar to requestLimit, but this one limits the aggregated size of requests caused by an IP in the time span set by requestLimitPeriodInSeconds
  • New property key maxErrorsPerWordRate: set the maximum allowed errors per word, e.g. 0.3 if the maximum is about one error per three words. More errors will stop the check with an exception. This is useful so no processing time gets wasted for texts with a huge amount of errors that are only caused by the wrong language being selected (leading to most words being detected as spelling errors).
  • The JSON output now contains a sentence property with the text of the sentence the error occurred in.
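The following sketch shows how a maxErrorsPerWordRate limit could be enforced. TooManyErrorsError and enforce_error_rate are hypothetical names, and treating a value of 0 as "no limit" is an assumption.

```python
class TooManyErrorsError(Exception):
    """Hypothetical stand-in for the exception that stops the check."""

def enforce_error_rate(error_count, word_count, max_errors_per_word_rate):
    """Raise if errors per word exceed the configured rate (0 = no limit, assumed)."""
    if max_errors_per_word_rate <= 0 or word_count == 0:
        return
    rate = error_count / word_count
    if rate > max_errors_per_word_rate:
        raise TooManyErrorsError(
            f"{rate:.2f} errors per word exceeds {max_errors_per_word_rate}")
```

With a rate of 0.3, a 10-word text with 3 errors is still checked, while 4 errors stop the check.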

3.9 (2017-09-26)

Breton

  • small rule improvements

Catalan

  • added and improved rules

Dutch

  • added and improved rules

English

  • added and improved rules

Esperanto

  • added and improved rules

French

  • added and improved rules
  • upgraded dictionaries to Dicollecte-6.1

German

  • added and improved rules
  • spell checker suggestions have been improved a lot by considering more words, especially compounds (de-DE only so far, not yet active for de-AT and de-CH) (https://github.com/languagetool-org/languagetool/issues/725)
  • added special dictionary extension files spelling-de-AT.txt and spelling-de-CH.txt for de-AT and de-CH that will be considered in addition to spelling.txt
  • updates according to "Amtliches Regelwerk der deutschen Rechtschreibung aktualisiert", 6/2017 (http://www.rechtschreibrat.com/DOX/rfdr_PM_2017-06-29_Aktualisierung_Regelwerk.pdf)
  • added POS tagging of alternative imperative forms such as "Geh" or "küss" (in addition to "Gehe"/"küsse")
  • introduced two new line endings ('?' and '$') for the data-file compounds.txt; these endings indicate that the mid-word parts of the compound need to be lower-cased (e.g., 'Geräte Wahl' -> 'Gerätewahl')

Portuguese

  • added and improved grammar and style rules, including:
    • grammar: general agreement rules, pronominal collocations, paronyms and homophones detection improvements; time agreement rules added
    • punctuation: greetings and farewell punctuation
    • style: puffery, weasel words, weak expressions, and biased opinion words detection added (disabled by default)
    • syntax: new category; fragment detection improvements
    • typography: spacing, number, and mathematical symbol formatting improvements
  • disambiguation improvements
  • false friends added
    • Portuguese to Galician (16 new pairs)
  • significant POS tagging and synthesizing improvements
  • spellchecking exceptions for abbreviations, variables in formulas, units, and related statistical vocabulary
  • updated Hunspell dictionaries to:
    • [pt-PT pos-AO] Dicionários Portugueses Complementares 2.0
    • [pt-AO pre-AO] Dicionários Portugueses Complementares 2.0
    • [pt-MZ pre-AO] Dicionários Natura 15.06.2017

Russian

  • spelling dictionary update
  • added and improved some rules

Spanish

  • added and improved some rules

Ukrainian

  • significant dictionary update:
    • more than 60K new words
    • some inflection adjustments
  • improved dynamic tagging for compound words
  • many new rules (barbarism, grammar, and spelling)
  • inflection agreement rule updates

Java API

  • AnnotatedText (built via AnnotatedTextBuilder) can now contain document-level meta data. This might be used by rules in the future.

3.8 (2017-06-27)

Catalan

  • added and improved rules
  • updated dictionary and rules for official names of Valencian municipalities

Chinese

  • added one rule

Dutch

  • added many rules (by Ruud Baars)
  • spelling dictionary update

English

  • added and improved rules

German

  • added and improved rules
  • improved messages for old spelling variants, e.g. Kuß now suggests only Kuss and also has a message explaining to the user that Kuß is an old spelling

Polish

  • added rules
  • added some common typos

Portuguese

  • added and improved grammar and style rules, including:
    • grammar: general agreement rules, contractions, pronominal collocations, compounding, and paronyms detection
    • style: wordy expressions detection added and significant redundant expressions detection improvements
    • punctuation: significant improvements
    • formal speech: archaisms, cacophonies, childish language and slang detection added
    • typography: international system standards, number and mathematical symbol formatting
    • misspellings: common misspellings of the names of famous foreign personalities
    • AO90: identify words with changed spelling
  • disambiguation improvements
  • false friends support added
    • Portuguese to Catalan (26 new pairs)
    • Portuguese to Spanish (7 new pairs)
  • spell checking exceptions for common Latin, English, and French expressions, species scientific names, and famous personalities
  • updated Hunspell dictionaries to:
    • [pt-PT pos-AO] Dicionários Portugueses Complementares 1.4
    • [pt-BR] VERO version 2.1.4

Russian

  • added and improved rules

Slovak

  • major rule updates by Matúš Matula

Ukrainian

  • Significant dictionary update:
    • thousands of new words
    • some inflection adjustments
  • Improved dynamic tagging for compound words
  • Many new rules (barbarism, grammar, and spelling)
  • New noun-verb agreement rule

HTTP API

  • The deprecated AfterTheDeadline mode has been removed
  • The apiVersion property of the JSON output is now a number instead of a string (issue #712)

Java API

  • Some deprecated methods and classes have been removed.

Internal

  • spelling.txt allows multi-word entries: the words/tokens (separated by " ") of one line are converted to a DisambiguationPatternRule in which each word is a case-sensitive and non-inflected PatternToken (result: the entire multi-word entry is ignored by the spell checker)
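The multi-word spelling.txt behaviour can be approximated as a case-sensitive, exact (non-inflected) match over a token list. ignored_positions is hypothetical. Unlike LT, it works on a simple list of words without whitespace tokens.

```python
def ignored_positions(tokens, spelling_lines):
    """Indices of tokens covered by a multi-word spelling.txt entry."""
    phrases = [line.strip().split(" ") for line in spelling_lines
               if " " in line.strip()]
    ignored = set()
    for phrase in phrases:
        n = len(phrase)
        for start in range(len(tokens) - n + 1):
            # exact, case-sensitive comparison of every token in the entry
            if tokens[start:start + n] == phrase:
                ignored.update(range(start, start + n))
    return ignored
```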

LT server

  • When running an LT server, the enabled/disabled rules loaded from a configuration file at startup will be the new default rules. Previously these rules were "forgotten" when a server query used the parameters for enabling and disabling rules. Now the rules from the query will be added to the rules from the configuration file.

3.7 (2017-03-27)

Breton

  • small rule improvements

Catalan

  • added and improved rules
  • updated dictionary

English

  • added and improved rules

French

  • improved rules
  • upgraded dictionaries to Dicollecte-6.0.2

German

Greek

  • added and improved rules

Italian

  • added one rule

Lithuanian, Malayalam, and Icelandic

  • Lithuanian, Malayalam, and Icelandic are not part of this release anymore. They still exist in the git repository and can be re-activated as soon as a new maintainer takes care of them.

Portuguese

  • added and improved grammar and style rules, including:
    • grammar: general agreement rules, 'crase', pronominal collocations, impersonal verbs, fragment, and paronyms detection improvements
    • capitalization: AO90 and AO45 rules
    • style: repetitions and barbarism detection
    • typography: number formatting, chemical formulas, degree signs, dash signs, and punctuation
    • semantics: wrong words in the context (22 confusion pairs), url validator and date checker improvements
    • registered brands category added
    • translation errors category added
  • false friends support added:
    • Portuguese to Spanish (186 new pairs)
    • Portuguese to English (156 new pairs)
    • Portuguese to French (78 new pairs)
    • Portuguese to German (16 new pairs)
    • Portuguese to Galician (9 new pairs)
  • spellchecking suggestions activated
  • updated Hunspell dictionary to:
    • [pt-PT pos-AO] Dicionários Portugueses Complementares 1.2
    • [pt-AO pre-AO] Dicionários Portugueses Complementares 1.2
    • [pt-MZ pre-AO] Dicionários Natura 18.02.2017

Russian

  • added and improved rules
  • updated tagger dictionary from AOT.ru rev.269 with extended POS tags

Ukrainian

  • Significant dictionary update:
    • many new words
    • some inflection adjustments
  • Many new rules (barbarism, punctuation, and grammar)
  • Improved dynamic tagging for compound words

LibreOffice / Apache OpenOffice Integration

  • Options dialog now uses system theme instead of Nimbus.

Command-line

  • Added a --languageModel option to the embedded server, thanks to Michał Janik (issue #404)

HTTP API

  • The 'AfterTheDeadline' mode has been deprecated and will be removed in the next version, unless users complain and present a valid use case.
  • The old XML-based API has been removed. The migration to the new JSON-based API is documented at https://languagetool.org/http-api/migration.php
  • Speed up with a cache for cases where the same sentences get checked again (e.g. due to a correction in a text that doesn't affect all sentences but causes the whole text to be re-checked)
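Below is a minimal sketch of a sentence-level result cache, keyed by language and sentence text. SentenceResultCache is a hypothetical Python class, not LT's Java ResultCache. It only shows why re-checking a text after a small edit becomes cheaper, because unchanged sentences are served from the cache.

```python
from collections import OrderedDict

class SentenceResultCache:
    """Tiny LRU cache mapping (language, sentence) to check results."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self._data = OrderedDict()
        self.hits = 0

    def get_or_check(self, language, sentence, check):
        key = (language, sentence)
        if key in self._data:
            self._data.move_to_end(key)      # mark as recently used
            self.hits += 1
            return self._data[key]
        result = check(sentence)             # only unseen sentences are checked
        self._data[key] = result
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)   # evict least recently used
        return result
```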

Java API

  • Some deprecated methods have been removed.
  • A new class ResultCache has been added to speed up the LT server
  • EnglishRule, GermanRule, CatalanRule, and FrenchRule are now deprecated. These are empty abstract classes that never had any real use. Rules that extend these classes will directly extend Rule or TextLevelRule in a future release.
  • All rules that work on the text level instead of the sentence level (e.g. word coherency) now extend TextLevelRule instead of Rule

Internal

  • OpenNLP has been updated from 1.6.0 to 1.7.2 (only used for English)

3.6 (2016-12-28)

Breton

  • small rule improvements

Catalan

  • added and improved rules

English

French

  • added and improved rules
  • upgraded dictionaries to Dicollecte-5.7

German

Lithuanian

  • The Lithuanian class has been deprecated. Lithuanian in LT hasn't been maintained for years and there's no new maintainer in sight. It also has very low usage on languagetool.org and very few error detection rules anyway, so we'll remove its support from LT in the next release.

Malayalam

  • The Malayalam class has been deprecated. Malayalam in LT hasn't been maintained for years and there's no new maintainer in sight. It also has very low usage on languagetool.org and very few error detection rules anyway, so we'll remove its support from LT in the next release.

Portuguese

  • general agreement rules added
    • number and gender words agreement
    • general subject-verb agreement
    • accentuated form confusion, 'dequeísmos' and many more
  • new compound form detection (pt-PT recognizes all compound verbal derivations)
  • duplications, redundancies, typography and semantics categories added
  • style category rules added
    • new word repetition rules, fragment detection, verbosity checks, passive voice and many others
  • new sentence disambiguator and new word tokenizer
  • sentence segmentation improvements
  • former rules and messages revision, improvement and classification
  • post-reform agreement support added and pre-reform components updated
    • European Portuguese specific rule group added
      • post-reform agreement by default
      • compound verbs, possessive pronouns, reflexive forms placement, gerund and more
    • pre-reform agreement locales support added
      • Angola, Cape Verde, East Timor, Guinea Bissau, Macau, Mozambique and São Tomé e Principe
    • base spelling dictionary and tagger update
    • variants dictionaries added and many part-of-speech fixes
  • Portuguese has been prepared to use ngram data, that means it has a confusion_sets.txt file where word pairs could be added. See http://wiki.languagetool.org/finding-errors-using-n-gram-data for more information but note that we cannot offer the required ngram data yet for Portuguese, as we rely on the Google ngram data and Portuguese isn't part of that.

Russian

  • added and improved many rules
  • added new rules with java filter
  • added new Java rule RussianWordCoherencyRule
  • added words suggested by users
  • improved disambiguation rules
  • updated tagger dictionary from AOT rev.268 with extended POS tags
  • improved SRX sentences segmentation
  • added removed.txt for words that need to be removed from the dictionary

Spanish

  • added and improved rules

Ukrainian

  • significant dictionary update
  • new adj/noun inflection rule
  • dynamic tagging improvements
  • disambiguation improvements
  • some improvements to existing rules
  • experimental noun/verb agreement rule

HTTP API

Java API

  • A new method for removing overlapping errors has been implemented. By default, it is enabled for the HTTP API and LibreOffice outputs, and disabled for the command-line output. If necessary, priorities for rules and categories can be set in Language.getPriorityForId(String id). The default value is 0; positive integers mean higher priority and negative integers mean lower priority.

  • Language.getShortName() has been deprecated, use Language.getShortCode() instead

  • Language.getShortNameWithCountryAndVariant() has been deprecated, use Language.getShortCodeWithCountryAndVariant() instead

  • Languages.getLanguageForShortName() has been deprecated, use Languages.getLanguageForShortCode() instead

  • The following languages have been unmaintained for a long time. A warning has been shown for some time on languagetool.org and in the stand-alone GUI for these languages. This warning has now been extended to Java in the form of a deprecation, i.e. the constructors of the following languages have been deprecated. That does not mean they are going to be removed in the next version, but it's a warning that we cannot offer support for them or guarantee they will be included in the future:

    • Belarusian
    • Swedish
    • Icelandic
    • Tagalog
    • Asturian
    • Danish
    • Slovenian

    If you're interested in contributing to one of these languages, please post to our forum at http://forum.languagetool.org.

  • The uppercase sentence start rule (id UPPERCASE_SENTENCE_START) now ignores immunized tokens - this way users can add lowercase words to disambiguation.xml so the rule won't complain about these lowercase words at the beginning of a sentence.
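One plausible way to remove overlapping errors using rule priorities (default 0, positive values rank higher, negative lower) is a greedy pass. This is an assumption for illustration, not LT's actual algorithm. Match and remove_overlaps are hypothetical.

```python
from typing import NamedTuple

class Match(NamedTuple):
    rule_id: str
    start: int        # inclusive
    end: int          # exclusive
    priority: int = 0

def remove_overlaps(matches):
    """Keep the higher-priority match whenever two matches overlap;
    on equal priority, the match that starts earlier wins."""
    kept = []
    for m in sorted(matches, key=lambda m: (-m.priority, m.start)):
        if all(m.end <= k.start or m.start >= k.end for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)
```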

Command-line

Internal

  • Apache commons-lang has been updated from 2.6 to commons-lang3 3.5
  • Updated lucene-gosen-ipadic to 6.2.1 (#376)

3.5 (2016-09-30)

Catalan

  • added and improved rules
  • added words suggested by users

English

Esperanto

  • added and improved rules

French

  • added and improved rules

German

  • added rules
  • fixed several false alarms

Polish

  • added and improved rules

Portuguese (European)

  • added and improved rules

Portuguese (Brazilian)

  • added rules

Russian

  • texts containing stress marks can now be checked
  • added and improved many new grammar and style rules
  • added words suggested by users
  • improved disambiguation rules
  • thanks to Konstantin Ladutenko for reviewing, testing, and improving rules and for feedback in the bug tracker

Spanish

  • added and improved rules

Ukrainian

  • added ~6k new words
  • added many new grammar and styling rules
  • added many new barbarism replacement suggestions
  • improved dynamic word tagging

General

  • Bugfix: avoid repeating the same suggestion
  • Enhancement: ignore e-mail addresses

Java API

  • Rule.getCorrectExamples() now returns a list of CorrectExamples instead of a list of Strings.

GUI (stand-alone version)

  • speed up for long texts with many errors (#530)
  • add new menu item for showing/hiding the result area

Command-line

3.4 (2016-06-27)

Catalan

  • added and improved rules
  • added words suggested by users

English

French

German

  • added some rules
  • improved handling of hyphenated compound words

Greek

  • added some rules

Polish

  • added and improved rules
  • removed some false alarms

Portuguese

  • added and improved rules

Spanish

Russian

  • rebuilt spellchecker dictionary
  • added words suggested by users
  • added and improved rules

Ukrainian

  • big dictionary update (thousands of new words and many fixes)
  • compound tagger improvements
  • several new rules and many improvements to existing ones
  • new token inflection agreement rule (still work-in-progress so turned off by default)
  • new replacement suggestions for barbarisms

Java API

  • some formerly deprecated code has been removed
  • all rules now have a category ("Misc" if the rule doesn't specify a category)
  • a new module languagetool-http-client has been added with a class RemoteLanguageTool that you can use to query a remote LanguageTool server via HTTP or HTTPS
  • removed the public modifier from LanguageComboBox

Embedded HTTPS server

  • The existing HTTP/HTTPS API will be replaced by a new one that returns JSON. This version of LanguageTool supports both APIs. The new API is prefixed with /v2/. It is documented at https://languagetool.org/http-api/swagger-ui/#/default. Please do not use the old XML-based HTTP API anymore. Information about migrating from the old to the new API can be found at https://languagetool.org/http-api/migration.php
  • Changed behaviour for OutOfMemory situations: the server process now stops instead of being in an unstable state
  • Missing parameters (like text) now cause a 400 Bad Request response (it used to produce 500 Internal Server Error)
  • New parameter preferredVariants to specify which variant is preferred when the language is auto-detected. Example: language=auto&preferredVariants=en-GB,de-AT - if English text is detected, British English will be used; if German text is detected, German (Austria) will be used.
  • Code refactorings: methods have been removed without being deprecated first, e.g. in LanguageToolHttpHandler
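The preferredVariants selection can be sketched as shown below. choose_variant is hypothetical and assumes that variants are matched on the part of the language code before the hyphen.

```python
def choose_variant(detected_code, preferred_variants):
    """With language=auto, return the preferred variant whose base language
    matches the detected one, or the detected code if none matches."""
    base = detected_code.split("-")[0]
    for variant in preferred_variants:
        if variant.split("-")[0] == base:
            return variant
    return detected_code
```

For preferredVariants=en-GB,de-AT, detected English gives en-GB, detected German gives de-AT, and detected French stays fr.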

Rule Syntax

  • groups of rules and categories are now required to have non-empty names to avoid user confusion

GUI (stand-alone version)

  • detect encoding of files with BOM header
  • add new menu to open recent files
  • add new configuration option to allow user to select the GUI language
  • preserve GUI state between program restarts

Command-line

  • detect encoding of files with BOM header when there is no encoding parameter

3.3 (2016-03-28)

Breton

  • small rule improvements

Catalan

  • added and improved rules
  • added words suggested by users
  • minor change in the format of the binary dictionary: POS tag and frequency data are no longer separated by a separator character.

Dutch

  • small rule improvements and URL updates, thanks to Koen Vervloesem

English

Esperanto

  • improved several rules

French

  • added and improved rules

German

Greek

  • new rule for checking the correct spelling of ordinal numerals
  • added new XML rules

Polish

  • added and improved a large number of rules, largely improved disambiguation
  • upgraded the tagging and synthesis dictionaries to Morfologik Polimorf 2.1
  • improved tokenization of number ranges (such as 1-1234 or 1--10)
  • added checks on date ranges

Portuguese

  • added and improved rules, improved categorization of rules

Russian

  • added and improved rules, improved categorization of rules
  • added words suggested by users

Spanish

  • added German false friends

Ukrainian

  • big dictionary update:
    • more than 202K lemmas
    • homonyms have been properly split
    • vocative case for inanimates has been added
    • list of barbarisms has been updated
  • improved some rules
  • improved sentence tokenization
  • improved dynamic tagging for compounds
  • some improvements for disambiguation

Java API

  • some formerly deprecated code has been removed
  • added acceptPhrases(List<String> phrases) to SpellingCheckRule so you can avoid false alarms on names and technical terms that consist of more than one word.

Embedded HTTPS server

  • Speed up for input with short sentences
  • Added new parameters enabledCategories and disabledCategories that take a comma-separated list of categories to enable/disable. Fixes https://github.com/languagetool-org/languagetool/pull/326.
  • The output now contains a shortmsg attribute if available, which is a short version of the msg attribute.
  • The output now contains a categoryid attribute if available. It is not supposed to change in future versions (while category might change).

Command-line

  • new parameters --enablecategories and --disablecategories to activate/deactivate all rules in a category (https://github.com/languagetool-org/languagetool/issues/66)
  • Bugfix: for files >= 64,000 bytes, the position information (fromx and tox) could be wrong. Also, rules that work across paragraphs like the German word coherency rule wouldn't work. Both bugs have been fixed but with the side-effect that large files will now be loaded into memory completely. If you're using LanguageTool on large files (several MB) you might need to split these files now before you check them. If you need the old behavior, use the --line-by-line switch. https://github.com/languagetool-org/languagetool/issues/254

Wikipedia

Core code

  • Fixed a bug where sentence and paragraph end tags were removed during disambiguation.
  • Fixed a bug with a possible NullPointerException for tokens containing soft hyphens that might be disambiguated.

Morfologik binary dictionaries

  • Updated the Morfologik library to version 2.1.0. The tools for building dictionaries (languagetool-tools) have been adapted to the new version. The format of the dictionaries has not changed, except for a minor change in Catalan.

3.2 (2015-12-29)

  • LanguageTool requires Java 8 now

Belarusian

Catalan

  • fixed false alarms
  • added new rules
  • added words suggested by users

Danish

  • updated hunspell dictionary to Version 2.3 (2015-11-15):
    • Corrections made regarding new spelling of 2012
    • General cleanup
    • A lot of compound flags added
  • fixed a bug where Hunspell flags were wrongly included in the tagger dictionary. For example:
    vintrenes+F+sub:bes:plu:utr:gen/115,70,85,976,941,947
    vinåndstermometrenes+F+sub:bes:plu:neu:gen/70,118,85,976
  • added new tags
  • updated and made adjustments for the changes introduced by the new spelling of 2012 and Hunspell-da 2.3

English

Esperanto

  • added/improved several rules

French

  • upgraded dictionaries to Dicollecte-5.5
  • added/improved several rules

German

  • added/improved a few rules
  • improved the agreement rule to detect errors like "Ich gebe dir ein kleine Kaninchen.", where the determiner is indefinite but the adjective only fits a definite determiner
  • added de/removed.txt so that incorrect readings of the POS tagger can be suppressed without rebuilding the binary dictionary

Italian

  • added an agreement rule

Portuguese

  • added/improved several rules

Russian

  • added/improved several rules
  • added words suggested by users to spellchecker dictionary

Ukrainian

  • big dictionary update: more than 10k new words, many fixes (the dictionary source is now available at https://github.com/arysin/dict_uk)
  • many new rules
  • improvements for euphony rules
  • improvements in dynamic compound tagger
  • new disambiguation rules

Rule Syntax

  • New rule syntax <regexp>...</regexp> as a simple alternative to <pattern><token>...</token></pattern>. Note that this syntax is limited: e.g., it cannot address POS tags, and the <suggestion> cannot change the case of the match. Available attributes: type, with the value smart (the default; treats a space in the regular expression as \s+ or a non-breaking space) or exact; and mark, to specify which part of the match gets underlined (the whole match by default; use 1 to underline only the first group, etc.)
  • Non-breaking spaces (\u00A0) are now treated like regular spaces. Before, using a non-breaking space could cause a rule not to match.
  • <filter> can now also be used in disambiguation.xml
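To illustrate the smart type and the mark attribute of the new <regexp> syntax, here is a minimal sketch that rewrites each literal space in a rule regex so that it matches either \s+ or a single non-breaking space, and then returns the part of the match that mark selects. The class and method names are hypothetical and this is not LanguageTool's implementation; for simplicity, spaces inside character classes or escape sequences are not treated specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmartRegexSketch {

    /**
     * Replaces each literal space in the rule regex with a group that matches
     * one or more whitespace characters or a single non-breaking space.
     * Assumption: spaces inside character classes are rewritten too.
     */
    public static String toSmart(String regex) {
        return regex.replace(" ", "(?:\\s+|\\u00A0)");
    }

    /**
     * Returns the text that would be underlined: the whole match for mark=0,
     * otherwise the given capture group. Returns null if there is no match.
     */
    public static String markedText(String regex, String text, int mark) {
        Matcher m = Pattern.compile(toSmart(regex)).matcher(text);
        return m.find() ? m.group(mark) : null;
    }
}
```

With this sketch, markedText("in (the) way", "in  the way", 1) returns "the": the doubled space is matched by \s+, and mark=1 selects the first group.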

Embedded HTTPS server

  • Speed-up when checking short sentences for de-DE, de-AT, and de-CH

Java API

  • GeneralCatalan has been removed, use Catalan instead
  • SuggestionExtractorTool and SuggestionExtractor have been removed
  • ConfusionProbabilityRule has been moved to package org.languagetool.rules.ngrams
  • ConfusionProbabilityRule.getWordTokenizer() is now called ConfusionProbabilityRule.getGoogleStyleWordTokenizer()
  • RuleAsXmlSerializer has been renamed to RuleMatchAsXmlSerializer
  • some formerly deprecated code has been removed
  • some code has been deprecated
  • StringTools.isWhitespace() now returns true for a token that is a non-breaking space or a narrow non-breaking space
  • RuleFilter is not an interface anymore but an abstract class
  • the LanguageModel interface has been redesigned, see BaseLanguageModel for a class similar to the previous implementation
  • Class BerkeleyLanguageModel was added to support BerkeleyLM language models. See https://github.com/adampauls/berkeleylm for the software and, e.g., http://tomato.banatao.berkeley.edu:8080/berkeleylm_binaries/ for pre-built models. To use these models, your language class needs to override the getLanguageModel(File) method. For now, we recommend that you continue using the Lucene-based models at http://languagetool.org/download/ngram-data/.
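Background for the StringTools.isWhitespace() entry above: Java's own Character.isWhitespace() deliberately returns false for the no-break space (U+00A0) and the narrow no-break space (U+202F). The following minimal sketch of a token check also accepts these two characters; the class and method names are hypothetical, and LanguageTool's actual StringTools.isWhitespace() may handle other cases differently.

```java
public class NbspWhitespaceSketch {

    // The two characters named in the changelog entry.
    private static final char NO_BREAK_SPACE = (char) 0x00A0;
    private static final char NARROW_NO_BREAK_SPACE = (char) 0x202F;

    /** Returns true if every character of the token is whitespace, counting both no-break spaces as whitespace. */
    public static boolean isWhitespaceToken(String token) {
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            // Character.isWhitespace() alone returns false for both no-break spaces
            boolean whitespace = Character.isWhitespace(c)
                    || c == NO_BREAK_SPACE
                    || c == NARROW_NO_BREAK_SPACE;
            if (!whitespace) {
                return false;
            }
        }
        return true;
    }
}
```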

LibreOffice / Apache OpenOffice Integration

  • fix: rules that are disabled by default could not be disabled again after they had been enabled

Internal

3.1 (2015-09-28)

Catalan

  • added new rules
  • fixed false alarms
  • added words suggested by users

English

French

  • upgraded Hunspell dictionary to Dicollecte-5.4.1
  • upgraded POS tag and Synthesizer dictionaries to Dicollecte-5.4
  • added/improved several rules
  • new filter for matching the part-of-speech of parts of words, e.g.:
       <filter class="org.languagetool.rules.fr.FrenchPartialPosTagFilter"
               args="no:1 regexp:(.*)-tu postag_regexp:V.*(ind|con|sub).*2\ss negate_pos:yes"/>
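The args value of this filter is a list of space-separated key:value pairs. A minimal sketch of how such a filter could decide whether to keep a match follows, under these assumptions: regexp must match the whole token (the one selected by no) and its first group is the part to look up; postag_regexp must match a whole POS tag; and negate_pos:yes keeps the match only if no tag of that part matches. The tagger map and the tags in the usage example are hypothetical stand-ins for a real dictionary lookup; this is not the code of FrenchPartialPosTagFilter.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartialPosTagSketch {

    /** Parses space-separated key:value pairs, splitting each pair at its first colon. */
    public static Map<String, String> parseArgs(String args) {
        Map<String, String> result = new HashMap<>();
        for (String pair : args.trim().split("\\s+")) {
            int colon = pair.indexOf(':');  // assumes every pair contains a colon
            result.put(pair.substring(0, colon), pair.substring(colon + 1));
        }
        return result;
    }

    /**
     * Decides whether to keep a rule match on the token selected by the "no" argument.
     * tagger is a hypothetical stand-in for a POS dictionary lookup (word -> tags).
     */
    public static boolean accept(String token, Map<String, String> args,
                                 Map<String, List<String>> tagger) {
        Matcher m = Pattern.compile(args.get("regexp")).matcher(token);
        if (!m.matches()) {
            return false;  // token lacks the expected shape, e.g. no "-tu" suffix
        }
        String part = m.group(1);  // e.g. "sais" for "sais-tu"
        Pattern posRegex = Pattern.compile(args.get("postag_regexp"));
        boolean anyTagMatches = tagger.getOrDefault(part, List.of()).stream()
                .anyMatch(tag -> posRegex.matcher(tag).matches());
        boolean negate = "yes".equals(args.get("negate_pos"));
        return negate ? !anyTagMatches : anyTagMatches;
    }
}
```

Under these assumptions, with the hypothetical tags "V ind pres 2 s" for "sais" and "V ind pres 3 s" for "mange", a match on "mange-tu" would be kept, while a match on "sais-tu" would be dropped because "sais" has a second-person tag.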

German

Japanese

  • major rule update with 700+ new rules, thanks to Shugyousha

Polish

  • added some compound prepositions to avoid false alarms (thanks to Sławek Borewicz)

Portuguese

  • added/improved several rules

Russian

  • added and improved a few rules
  • added a few false friends rules (Russian/English)

Ukrainian

  • significant dictionary update (fixes, lots of new adjectives and last names)

LibreOffice / Apache OpenOffice Integration

ngrams

Embedded server

API

  • several deprecated methods and classes have been removed
  • Rules can now override getAntiPatterns() with patterns to be ignored. See the javadoc for details of what needs to be considered to make this work. See org.languagetool.rules.de.CaseRule for an example.

Internal

  • updated to Lucene 5.2.1
  • updated to Apache OpenNLP 1.6.0

3.0 (2015-06-29)

Breton

  • updated FSA spelling dictionary from An Drouizig Breton Spellchecker 0.13
  • updated POS dictionary from Apertium (svn r61079)

Catalan

  • added new rules
  • fixed false alarms
  • added words suggested by users

English

  • added a few new rules
  • ConfusionProbabilityRule (only enabled with the --languagemodel option) has been rewritten. homophones.txt has been renamed to confusion_sets.txt and now has only a few items enabled by default; the rest are commented out to improve quality (fewer false alarms). Also see http://wiki.languagetool.org/finding-errors-using-big-data

German

  • fixed some false alarms
  • updated to jwordsplitter 4.1 for better compound splitting
  • the spell checker now offers correct suggestions for incorrect past tense forms like "gehte" -> "ging" (useful mostly for non-native speakers)
  • added word frequency information to improve spelling suggestions (but this won't help for compounds which are not in the dictionary)

Polish

  • added new rules
  • fixed dozens of false alarms

Portuguese

  • added/improved several rules (started adding morphological rules)

Russian

  • improved rules
  • updated spellchecker

Slovak

  • dictionary update and several new rules

Ukrainian

  • big dictionary update (thousands of new words, new tagging for pronouns)
  • improved sentence and word tokenization
  • improved tokenization and tagging of lowercase abbreviations
  • new grammar and styling rules
  • new spelling rules, especially for lowercase abbreviations with dots
  • improved compound word tagging
  • improved the coverage of some rules
  • many new barbarism replacement suggestions

Bug Fixes

  • UppercaseSentenceStartRule didn't properly reset its state, so different errors could be found when, e.g., JLanguageTool.check() was called twice with the same text.
  • Authenticator.setDefault() is now only called if it's allowed by the Java security manager. In rare cases, this might affect using external XML rule files as documented at http://wiki.languagetool.org/tips-and-tricks#toc9 (Github issue #255)

GUI (stand-alone version)

Embedded server

  • performance improvements for checking small texts when a new JLanguageTool object is created for every check, as the embedded server does (and as multithreaded LT users do in general)

Command-line

  • Fixed an error with the --api option that printed invalid XML for large documents or when the input was STDIN (Github issue #251)
  • Some information is now printed to STDERR instead of STDOUT, so that it is not mixed into the output of the --api option

API

  • added MultiThreadedJLanguageTool.shutdown() to clean up the thread pool
  • several deprecated methods and classes have been removed, e.g.
    • Language.REAL_LANGUAGES is now Languages.get()
    • Language.LANGUAGES is now Languages.getWithDemoLanguage() - but you will probably want to use Languages.get()
  • Other static methods from class Language have also been moved to Languages
  • Language.addExternalRuleFile() and Language.getExternalRuleFiles() have been removed. To add rules, load them with PatternRuleLoader and call JLanguageTool.addRule().
  • getAllRules(), getAllActiveRules(), and getPatternRulesByIdAndSubId() in class JLanguageTool used to call reset() for all rules. They no longer do; reset() is now called when one of the check() methods is called. This should not make a difference for common use cases.
  • Language.setName() has been removed. If you need to set the name, override the getName() method instead.
  • Rule.getCorrectExamples()/getIncorrectExamples(), PatternToken.getOrGroup()/getAndGroup() and RuleMatch.getSuggestedReplacements() now return an unmodifiable list
  • AbstractSimpleReplaceRule.getFileName() and AbstractWordCoherencyRule.getFileName() have been removed; subclasses are now responsible for loading their own data
  • Subclasses of AbstractCompoundRule are now responsible for loading the compound data themselves using CompoundRuleData
  • AbstractCompoundRule.setShort(String) has been removed; the value is now passed as a constructor parameter instead.

Internal

  • updated to language-detector 0.5

2.9.1 (2015-05-14)

LibreOffice / Apache OpenOffice Integration

Older versions

See CHANGES.txt for changes before 2.9.1.