[en] Enhance the disambiguator to allow the 'hiding' of text #9160

Open
MikeUnwalla opened this issue Aug 23, 2023 · 4 comments

Comments

@MikeUnwalla
Contributor

Some possible users of LT cannot use it because LT is for plain text only. We have some integrations, but they are very limited (https://dev.languagetool.org/software-that-supports-languagetool-as-a-plug-in-or-add-on).

To increase the potential user base (and to make my ASD-STE100 checker available to a wider audience), I want to ignore XML and HTML code and let LT analyse the text AS IF the code did not exist. I cannot do this with immunization (https://forum.languagetool.org/t/what-is-the-purpose-of-immunization-what-should-it-do-or-not-do/9204).

Disambiguator enhancement: Make something similar to 'Immunizing words from matching' (https://dev.languagetool.org/developing-a-disambiguator#immunizing-words-from-matching). But, let it fully ignore the text, as if the text did not exist. This would let users add rules like this to the disambiguator:

  <rule id="SIMPLE_IMMUNIZE_XML" name="Immunize XML code">
    <pattern>
      <marker>
        <token regexp="yes">&lt;</token>
        <token spacebefore="no" regexp="yes">[a-z]+</token>
        <token spacebefore="no" regexp="yes">&gt;</token>
      </marker>
    </pattern>
    <disambig action="hide"/>
  </rule>

Then, for example, with this sentence:
I am in a <em>trouble</em>.
the grammar rule IN_A_TROUBLE would give a warning for: a <em>trouble

Optional extras:

  • Show the disambiguation rule name in Tagger Result.
  • For testing, permit untouched and ambiguous examples in the rule.
  • Automatically ignore the spelling of 'hidden' text.
@danielnaber
Member

I understand, but a cleaner solution would be to use a pre-processing step that removes the markup (e.g. XML), or to use the API's data parameter to send information about what is text and what is markup.
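
For illustration only, the pre-processing option could look roughly like the Python sketch below. It is not part of LanguageTool: the function name `strip_markup` and the naive tag regex are assumptions made for this example, and real XML or HTML would need a proper parser. The sketch removes tags and keeps, for each character of the stripped text, its offset in the original, so that a match position reported on the stripped text can be traced back to the source.

```python
import re

# Naive tag matcher (assumption for this sketch): anything between < and >.
# It does not handle comments, CDATA, or ">" inside attribute values.
TAG_RE = re.compile(r"<[^>]+>")


def strip_markup(source):
    """Return (plain_text, offsets).

    offsets[i] is the index in `source` of the character plain_text[i].
    """
    plain_chars = []
    offsets = []
    pos = 0
    for match in TAG_RE.finditer(source):
        # Copy the text that precedes this tag, remembering where it came from.
        for i in range(pos, match.start()):
            plain_chars.append(source[i])
            offsets.append(i)
        pos = match.end()  # skip the tag itself
    # Copy whatever follows the last tag.
    for i in range(pos, len(source)):
        plain_chars.append(source[i])
        offsets.append(i)
    return "".join(plain_chars), offsets


source = "I am in a <em>trouble</em>."
plain, offsets = strip_markup(source)
print(plain)
# Position of "trouble" in the stripped text, mapped back to the original.
print(offsets[plain.index("trouble")])
```

With such a map, a correction found in the stripped text could in principle be applied to the original string, although edits that span a tag boundary would need extra care.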

@MikeUnwalla
Contributor Author

@danielnaber, thanks.

  1. If the tags are removed, the workflow is 1-way only. For example, I cannot use File>Open in the GUI, do the analysis, correct errors, and then save the text.
  2. I do not understand what you mean by "the API's data parameter". Is there a page on the LT website that explains?

Thanks.

@danielnaber
Member

@MikeUnwalla I was referring to the HTTP API documented at https://languagetool.org/http-api/#!/default/post_check - its data parameter can receive information about which parts of the input are text and which are markup (but that information needs to be generated by the caller of the API).
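
As a rough sketch of what "generated by the caller" could mean, the Python below splits a string into text and markup parts and serialises them as the JSON `annotation` list that the `data` parameter of the check endpoint accepts. The helper names (`build_annotation`, `build_request_fields`), the default language code, and the naive tag regex are assumptions for this example; the code only builds the form fields and does not send a request.

```python
import json
import re

# Naive tag matcher (assumption for this sketch): anything between < and >.
TAG_RE = re.compile(r"<[^>]+>")


def build_annotation(source):
    """Split `source` into a list of {"text": ...} and {"markup": ...} parts."""
    parts = []
    pos = 0
    for match in TAG_RE.finditer(source):
        if match.start() > pos:
            parts.append({"text": source[pos:match.start()]})
        parts.append({"markup": match.group()})
        pos = match.end()
    if pos < len(source):
        parts.append({"text": source[pos:]})
    return parts


def build_request_fields(source, language="en-US"):
    """Form fields for a POST to the check endpoint (hypothetical helper)."""
    data = {"annotation": build_annotation(source)}
    return {"language": language, "data": json.dumps(data)}


fields = build_request_fields("I am in a <em>trouble</em>.")
print(fields["data"])
```

Because the markup parts are sent along rather than removed, the checker can analyse the text as if the tags were absent while still reporting positions relative to the full input, which is what keeps the original document editable.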

@MikeUnwalla
Contributor Author

@danielnaber, thank you.
