Create a JUnit test that would check if the messages and titles of rules in LanguageTool are correct #63

milekpl · 2014-02-13T09:56:31Z

There may be typos in messages and rule titles, in particular in XML pattern rules. Create a check that would run LanguageTool on these messages. Note: some rule titles quote the error that they match, so the match of that very rule should be ignored.

Requires: knowledge of Java

czojo26 · 2016-12-20T08:36:34Z

Hi, I am Michał and I would like to contribute to this project. This is the first problem I would like to solve,
but I am not sure where this JUnit test should be placed. Which module would such a test fit in?

danielnaber · 2016-12-20T22:12:21Z

Thanks for your interest in LT. You could place the test in languagetool-standalone, like this test: https://github.com/languagetool-org/languagetool/blob/master/languagetool-standalone/src/test/java/org/languagetool/JLanguageToolTest.java#L40

janschreiber · 2016-12-21T13:28:39Z

Maybe this excellent idea can be expanded to run checks on the (supposedly) correct example sentences as well?

EgorNemchinov · 2018-03-02T08:36:22Z

Hello! May I take this issue?

danielnaber · 2018-03-02T09:01:11Z

> Hello! May I take this issue?

Sure! Let us know here or on the forum if you have questions.

EgorNemchinov · 2018-03-02T11:58:25Z

@danielnaber, my questions are mostly about understanding LT basics.
There are rules in languagetool-core which extend the "Rule" class. What is the title of a rule here? What does "in particular in XML pattern rules" mean? I want to understand what title and message mean when applied to the Rule class.
And the task is to apply all rules to titles of each of these rules?

There's a lot of documentation; it would be wonderful if you could point me to what to explore first.
Any information would be appreciated. Thank you.

danielnaber · 2018-03-02T12:32:40Z

title is a string shown in the configuration dialog where users can enable/disable rules. message is the string (often with a variable) that is shown to the user when a potential error is found.

> And the task is to apply all rules to titles of each of these rules?

Basically yes, but as titles are not complete sentences, we need to see if it makes sense. Maybe some rules are just not useful for this use case.

EgorNemchinov · 2018-03-02T19:05:18Z

All right, here are some intermediate results. I ran the English rules against their own descriptions.

There are cases where this approach finds errors, for example:
Phrase: Space character at the begin of paragraph
Rule: a/the + infinitive

Yes, rules that must be applied to a whole sentence often don't make sense here (like UPPERCASE_SENTENCE_START, SENTENCE_FRAGMENT). But let's suppose we can filter the rules by some category (I haven't explored the built-in categories yet).

One problem is that titles use brackets inconsistently, and there is no common pattern for writing bad and good samples.
Sometimes brackets are used to explain the rule:

Example №1: Who + verb (who know's/knows)
Example №2: whos NN (possessive)
But mostly brackets are used to show the right way:
Example №1: could of (could have)
Example №2: must be do (done)
Something similar applies to quotes; here are different use cases:
Example №1: we'Re' (we're) etc
Example №2: Replace '12 pm' with 'noon'
Example №3: Agreement: 'I is / you is / ... ' (at sentence start only)

I'll continue exploring this, but for now it's clear that there are a lot of cases to consider.
By the way, it seems to me that some rules are applied incorrectly, but I'll look into that more carefully.

danielnaber · 2018-03-02T19:46:09Z

Thanks for the update. It probably makes sense to focus on messages first, and care about titles later.

EgorNemchinov · 2018-03-06T17:45:30Z

Hey! Sorry, I've been at a hackathon for the last few days.
May I ask for some advice? How should I extract the message from a Rule object?
RuleMatch has a .getMessage() method, but Rule doesn't.
Am I missing something obvious? Should I look into the rules in XML format?

danielnaber · 2018-03-06T17:54:48Z

The message indeed doesn't depend on the rule but on the specific match. You can load the rules using org.languagetool.rules.patterns.PatternRuleLoader, each rule has at least one incorrect example which you can run to get a match with its message. (Not sure now if you even need PatternRuleLoader or whether you can iterate the rules of a language.)
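A minimal sketch of that flow, using stand-in types rather than LanguageTool's real classes: `ExampleRule`, `Found`, and the `checker` function are hypothetical names made up for illustration. It runs each rule's incorrect example through a checker and keeps the message of the match that the same rule produced.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class MessageCollector {

    /** Hypothetical stand-in for a rule: an id plus one incorrect example sentence. */
    public static final class ExampleRule {
        public final String id;
        public final String incorrectExample;

        public ExampleRule(String id, String incorrectExample) {
            this.id = id;
            this.incorrectExample = incorrectExample;
        }
    }

    /** Hypothetical stand-in for a match: which rule fired and the message it produced. */
    public static final class Found {
        public final String ruleId;
        public final String message;

        public Found(String ruleId, String message) {
            this.ruleId = ruleId;
            this.message = message;
        }
    }

    /**
     * Runs every rule's incorrect example through the checker and records the
     * first message produced by that same rule. Rules whose example yields no
     * match of their own are simply absent from the result.
     */
    public static Map<String, String> collectMessages(
            List<ExampleRule> rules,
            Function<String, List<Found>> checker) {
        Map<String, String> messages = new LinkedHashMap<>();
        for (ExampleRule rule : rules) {
            for (Found found : checker.apply(rule.incorrectExample)) {
                // Other rules may fire on the same sentence; only the rule's own match counts.
                if (found.ruleId.equals(rule.id)) {
                    messages.putIfAbsent(rule.id, found.message);
                }
            }
        }
        return messages;
    }
}
```

In a real test the checker would presumably be backed by LanguageTool itself; the sketch only shows why the message has to be taken from a match and not from the rule.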

EgorNemchinov · 2018-03-06T17:55:49Z

Thanks, will try!

EgorNemchinov · 2018-03-07T12:57:46Z

Even though there is a lot of noise (i.e. the RuleMatches found aren't really caused by mistakes but rather by the properties of rule messages), there is some signal.
There are also a lot of repeated whitespaces, unpaired brackets and suggestions to replace simple quotes with smart ones.
What I did was run all rules on a Rule message and exclude matches that were in the . Also I need to exclude the ones in single quotation marks.

For example

The term 'Anglo-Saxon' is generally used to describe 'a member of any of the West Germanic tribes
Message: Consider simply using of instead
- in the English speaking world*
  Message: Did you mean the adjective English-speaking?

So if we disable some rules and apply some conditions, the results might be reasonably useful. I can try.
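The filtering conditions discussed in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions: `Hit` is a hypothetical stand-in for a match (rule id plus character offsets), not a LanguageTool class, and the quote detection is deliberately naive (text between two straight single quotes), so apostrophes such as the one in "rules' messages" would confuse it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedSpanFilter {

    /** Hypothetical stand-in for a match: rule id plus offsets into the checked text. */
    public static final class Hit {
        public final String ruleId;
        public final int from; // inclusive start offset
        public final int to;   // exclusive end offset

        public Hit(String ruleId, int from, int to) {
            this.ruleId = ruleId;
            this.from = from;
            this.to = to;
        }
    }

    // Naive assumption: a quoted span is any text between two straight single quotes.
    private static final Pattern QUOTED = Pattern.compile("'[^']*'");

    /** Returns [start, end) offsets of every single-quoted span, quotes included. */
    public static List<int[]> quotedSpans(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    /**
     * Keeps only hits that are neither produced by the rule the text belongs to
     * nor located entirely inside a single-quoted span.
     */
    public static List<Hit> keepRelevant(String text, List<Hit> hits, String ownRuleId) {
        List<int[]> spans = quotedSpans(text);
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : hits) {
            if (hit.ruleId.equals(ownRuleId)) {
                continue; // the rule's own message quotes the error it matches
            }
            boolean insideQuotes = false;
            for (int[] span : spans) {
                if (hit.from >= span[0] && hit.to <= span[1]) {
                    insideQuotes = true;
                    break;
                }
            }
            if (!insideQuotes) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```

For the title "Replace '12 pm' with 'noon'", a hit covering "12 pm" would be dropped, while a hit on "Replace" from a different rule would be kept. Someone would still need to review whatever remains.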

danielnaber · 2018-03-07T13:15:23Z

Sounds useful. This is probably not something that will run on every test run, but maybe every few months, or before release. And someone will need to look at it anyway.

EgorNemchinov · 2018-03-07T19:30:14Z

Yeah, I agree. So, how do you see it? Should I just write a test? And how do I make sure it doesn't run every time?

danielnaber · 2018-03-07T19:35:25Z

Write a test and use the @Ignore annotation.

ales-blaze · 2018-11-06T05:37:03Z

May I work on this issue?

danielnaber · 2018-11-06T07:58:43Z

@ales-blaze Sure, feel free to give it a try.

milekpl added the easy fix label Feb 13, 2014

EgorNemchinov added a commit to EgorNemchinov/languagetool that referenced this issue Mar 8, 2018

Check rules' messages by LanguageTool: issue languagetool-org#63

177761a

EgorNemchinov mentioned this issue Mar 8, 2018

Check rules' messages by LanguageTool: issue #63 #935

Merged

danielnaber pushed a commit that referenced this issue Mar 9, 2018

Check rules' messages by LanguageTool: issue #63

97e673a

danielnaber closed this as completed Jan 1, 2023
