Create a JUnit test that would check if the messages and titles of rules in LanguageTool are correct #63

Closed
milekpl opened this issue Feb 13, 2014 · 18 comments

@milekpl
Member

milekpl commented Feb 13, 2014

There may be typos in messages and rule titles, in particular in XML pattern rules. Create a check that would run LanguageTool on these messages. Note: some rule titles quote the error that they match, so the match of that very rule should be ignored.

Requires: knowledge of Java
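A minimal sketch of the filtering step described above, written in plain Java so it runs without LanguageTool on the classpath. The `Match` class is a hypothetical stand-in for LanguageTool's `RuleMatch`. In a real test, the ID would presumably come from `match.getRule().getId()`, and the title from the rule being checked.

```java
import java.util.ArrayList;
import java.util.List;

final class SelfMatchFilter {

  // Hypothetical stand-in for LanguageTool's RuleMatch: only the ID of the
  // rule that produced the match and the match message are modeled.
  static final class Match {
    final String ruleId;
    final String message;

    Match(String ruleId, String message) {
      this.ruleId = ruleId;
      this.message = message;
    }
  }

  // Returns the matches produced by rules other than the one whose title or
  // message is being checked: a title may quote the very error its own rule
  // detects, so that rule's match is expected and should not be reported.
  static List<Match> withoutSelfMatches(String checkedRuleId, List<Match> matches) {
    List<Match> result = new ArrayList<>();
    for (Match m : matches) {
      if (!m.ruleId.equals(checkedRuleId)) {
        result.add(m);
      }
    }
    return result;
  }
}
```

Filtering by rule ID keeps genuine problems flagged by other rules while ignoring the intentional error quoted in the title.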

@czojo26
Contributor

czojo26 commented Dec 20, 2016

Hi, I am Michał and I would like to contribute to this project. This is the first problem I would like to solve,
but I am not sure where this JUnit test should be placed. Which module would such a test fit in?

@danielnaber
Member

Thanks for your interest in LT. You could place the test in languagetool-standalone, like this test: https://github.com/languagetool-org/languagetool/blob/master/languagetool-standalone/src/test/java/org/languagetool/JLanguageToolTest.java#L40

@janschreiber
Contributor

Maybe this excellent idea can be expanded to run checks on the (supposedly) correct example sentences as well?
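A rough sketch of how the same idea could cover correct example sentences: every supposedly correct sentence that still triggers a match gets reported. The `checker` function is a hypothetical hook standing in for a real LanguageTool check that returns the IDs of the rules that fired.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class CorrectExampleCheck {

  // checker: hypothetical hook that checks one sentence and returns the IDs
  // of the rules that matched it (empty list if nothing matched).
  // Returns each supposedly correct sentence that still got matches,
  // mapped to the IDs of the rules that fired, in input order.
  static Map<String, List<String>> falseAlarms(List<String> correctExamples,
                                                Function<String, List<String>> checker) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (String sentence : correctExamples) {
      List<String> hits = checker.apply(sentence);
      if (!hits.isEmpty()) {
        result.put(sentence, hits);
      }
    }
    return result;
  }
}
```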

@EgorNemchinov
Contributor

Hello! May I take this issue?

@danielnaber
Member

> Hello! May I take this issue?

Sure! Let us know here or on the forum if you have questions.

@EgorNemchinov
Contributor

EgorNemchinov commented Mar 2, 2018

@danielnaber, my questions are mostly about understanding the LT basics.
There are rules in languagetool-core that extend the "Rule" class. What is the title of a rule here? What does "in particular in XML pattern rules" mean? I want to understand what title and message mean as applied to the Rule class.
And the task is to apply all rules to titles of each of these rules?

There's a lot of documentation; it would be wonderful if you could tell me what to explore first.
Any information would be appreciated. Thank you

@danielnaber
Member

title is a string shown in the configuration dialog where users can enable/disable rules. message is the string (often with a variable) that is shown to the user when a potential error is found.

> And the task is to apply all rules to titles of each of these rules?

Basically yes, but as titles are not complete sentences, we need to see whether it makes sense. Maybe some rules are just not useful for this use case.

@EgorNemchinov
Contributor

All right, here are some intermediate results. I checked the descriptions of the English rules with the English rules themselves.

There are cases where this approach finds errors, for example:
Phrase: Space character at the begin of paragraph
Rule: a/the + infinitive

Yes, rules that apply to a whole sentence often don't make sense here (like UPPERCASE_SENTENCE_START and SENTENCE_FRAGMENT). But let's suppose we can filter the rules by category (I haven't explored the embedded categories yet).
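A minimal sketch of such a filter. Instead of filtering by category, this version uses a simple deny list of rule IDs (the two IDs named above); the class and method names are hypothetical, and the list would need to grow as more noisy rules are found.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class TitleRuleFilter {

  // Rules that assume a complete sentence, so they only add noise when
  // applied to short titles. IDs taken from the comment above; extend as needed.
  static final Set<String> SENTENCE_LEVEL_RULES =
      Set.of("UPPERCASE_SENTENCE_START", "SENTENCE_FRAGMENT");

  // Keeps only the rule IDs that are not on the deny list, preserving order.
  static List<String> relevantRuleIds(List<String> matchedRuleIds) {
    return matchedRuleIds.stream()
        .filter(id -> !SENTENCE_LEVEL_RULES.contains(id))
        .collect(Collectors.toList());
  }
}
```

A deny list is easy to review by hand; filtering by category, as suggested above, would scale better if whole categories turn out to be irrelevant for titles.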

One problem is that brackets, and the overall pattern for writing incorrect and correct samples in titles, are used inconsistently.
Sometimes brackets are used to explain the rule:

  • Example №1: Who + verb (who know's/knows)
  • Example №2: whos NN (possessive)

But mostly brackets are used to show the correct form:

  • Example №1: could of (could have)
  • Example №2: must be do (done)

A similar thing applies to quotes; here are different use cases:

  • Example №1: we'Re' (we're) etc
  • Example №2: Replace '12 pm' with 'noon'
  • Example №3: Agreement: 'I is / you is / ... ' (at sentence start only)
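One possible pre-processing heuristic for the titles listed above, sketched as a hypothetical helper: drop parenthesized and single-quoted segments (which usually hold the sample error or its correction) before checking the rest. It cannot tell an explanatory bracket from a correction, and apostrophes inside words such as "don't" would be mistaken for quotes.

```java
final class TitleCleaner {

  // Hypothetical helper: removes "(...)" and '...' segments from a rule
  // title, then collapses whitespace. Nested brackets are not handled.
  static String stripSamples(String title) {
    String s = title.replaceAll("\\([^()]*\\)", " ");
    s = s.replaceAll("'[^']*'", " ");
    return s.replaceAll("\\s+", " ").trim();
  }
}
```

For "could of (could have)" this leaves "could of", which is exactly what that rule matches, so the rule's own match would still need to be ignored.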

I'll continue exploring this, but for now it's clear there are a lot of cases to consider.
By the way, it seems to me that some rules are applied incorrectly, but I'll look into that more carefully.

@danielnaber
Member

Thanks for the update. It probably makes sense to focus on messages first, and care about titles later.

@EgorNemchinov
Contributor

Hey! Sorry, I spent the last few days at a hackathon.
May I ask for some advice? How should I extract the message from a Rule object?
RuleMatch has a .getMessage() method, but Rule doesn't.
Am I missing something obvious? Should I look into the rules in XML format?

@danielnaber
Member

The message indeed doesn't depend on the rule but on the specific match. You can load the rules using org.languagetool.rules.patterns.PatternRuleLoader, each rule has at least one incorrect example which you can run to get a match with its message. (Not sure now if you even need PatternRuleLoader or whether you can iterate the rules of a language.)
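The incorrect examples in the XML pattern rules mark the error span with `<marker>...</marker>`, so the tags have to be removed before the sentence is checked. A minimal sketch with hypothetical helper names: it returns the plain sentence plus the marked span, which could then be compared against the offsets of the resulting `RuleMatch` before taking its `getMessage()`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MarkerStripper {

  private static final Pattern MARKER = Pattern.compile("<marker>(.*?)</marker>");

  // Returns the example with <marker> tags removed, i.e. the plain
  // sentence to pass to the checker.
  static String plainText(String example) {
    return MARKER.matcher(example).replaceAll("$1");
  }

  // Returns {start, end} of the first marked span in the plain text,
  // or null if the example has no marker. No tag precedes the first
  // match, so its start offset is the same in both texts.
  static int[] markedSpan(String example) {
    Matcher m = MARKER.matcher(example);
    if (!m.find()) {
      return null;
    }
    int start = m.start();
    return new int[] {start, start + m.group(1).length()};
  }
}
```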

@EgorNemchinov
Contributor

Thanks, will try!

@EgorNemchinov
Contributor

EgorNemchinov commented Mar 7, 2018

Even though there is a lot of noise (many of the RuleMatches found aren't caused by real mistakes but by properties of the rules' messages), there is some signal.
There are also a lot of repeated whitespaces, unpaired brackets and suggestions to replace simple quotes with smart ones.
What I did was run all rules on a rule message and exclude matches that were in the . I also need to exclude the ones in single quotation marks.

For example:

  1. The term 'Anglo-Saxon' is generally used to describe 'a member of any of the West Germanic tribes
     Message: Consider simply using of instead
  2. in the English speaking world
     Message: Did you mean the adjective English-speaking?

So, if we disable some rules and apply some conditions, the results might be reasonably useful. I can try that.
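One such condition, excluding matches inside single quotation marks, could look roughly like this. It is a sketch with hypothetical names: quotes are paired left to right, an unpaired quote is ignored, and in a real test the offsets would presumably come from `RuleMatch.getFromPos()` and `getToPos()`.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotedSpanFilter {

  // Collects [start, end) ranges of text enclosed in single quotes, pairing
  // quotes left to right. Apostrophes inside words are not distinguished
  // from quotes, so this is only a rough heuristic.
  static List<int[]> quotedRanges(String text) {
    List<int[]> ranges = new ArrayList<>();
    int open = -1;
    for (int i = 0; i < text.length(); i++) {
      if (text.charAt(i) == '\'') {
        if (open < 0) {
          open = i;
        } else {
          ranges.add(new int[] {open, i + 1});
          open = -1;
        }
      }
    }
    return ranges;
  }

  // True if the match [fromPos, toPos) lies completely inside a quoted range.
  static boolean insideQuotes(String text, int fromPos, int toPos) {
    for (int[] r : quotedRanges(text)) {
      if (fromPos >= r[0] && toPos <= r[1]) {
        return true;
      }
    }
    return false;
  }
}
```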

@danielnaber
Member

Sounds useful. This is probably not something that will run on every test run, but maybe every few months, or before a release. And someone will need to look at the results anyway.

@EgorNemchinov
Contributor

Yeah, I agree. So, how do you see it? Should I just write a test? Then how do I make sure it isn't run every time?

@danielnaber
Member

Write a test and use the @Ignore annotation.

@ales-blaze

May I work on this issue?

@danielnaber
Member

@ales-blaze Sure, feel free to give it a try.


No branches or pull requests

6 participants