[en] Enhance the disambiguator to allow the 'hiding' of text #9160

MikeUnwalla · 2023-08-23T08:07:35Z

Some potential users of LT cannot use it because LT checks plain text only. We have some integrations, but they are very limited (https://dev.languagetool.org/software-that-supports-languagetool-as-a-plug-in-or-add-on).

To increase the potential user base (and to make my ASD-STE100 checker available to a wider audience), I want to ignore XML and HTML code and let LT analyse the text AS IF the code did not exist. I cannot do this with immunization (https://forum.languagetool.org/t/what-is-the-purpose-of-immunization-what-should-it-do-or-not-do/9204).

Proposed disambiguator enhancement: add something similar to 'Immunizing words from matching' (https://dev.languagetool.org/developing-a-disambiguator#immunizing-words-from-matching), but make it ignore the matched text completely, as if the text did not exist. This would let users add rules like this one to the disambiguator:

  <rule id="SIMPLE_IMMUNIZE_XML" name="Immunize XML code">
    <pattern>
      <marker>
        <token regexp="yes">&lt;</token>
        <token spacebefore="no" regexp="yes">[a-z]+</token>
        <token spacebefore="no" regexp="yes">&gt;</token>
      </marker>
    </pattern>
    <disambig action="hide"/>
  </rule>

Then, for example, in this sentence:

  I am in a <em>trouble</em>.

the grammar rule IN_A_TROUBLE would give a warning for: a <em>trouble
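To make the proposed behaviour concrete, here is a rough Python sketch of what "hide" could mean, assuming a simplified tokenizer that splits "<em>" into "<", "em", ">" (as the rule above expects). `Token`, `hide_open_tags` and `find_phrase` are hypothetical names; LT's real tokenizer, disambiguator and pattern matcher are not modelled. Hidden tokens are skipped during matching, but the reported span still covers them in the original sentence.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    start: int            # character offset in the original sentence
    hidden: bool = False  # set by the hypothetical "hide" action


def tokenize(sentence):
    # Rough stand-in for LT's tokenizer: words, plus each other non-space
    # character as its own token, so "<em>" becomes "<", "em", ">".
    return [Token(m.group(), m.start())
            for m in re.finditer(r"\w+|[^\w\s]", sentence)]


def hide_open_tags(tokens):
    # Mirrors SIMPLE_IMMUNIZE_XML: "<", then [a-z]+ and ">" with no space before.
    for i in range(len(tokens) - 2):
        a, b, c = tokens[i:i + 3]
        if (a.text == "<" and re.fullmatch(r"[a-z]+", b.text) and c.text == ">"
                and b.start == a.start + len(a.text)
                and c.start == b.start + len(b.text)):
            a.hidden = b.hidden = c.hidden = True
    return tokens


def find_phrase(sentence, tokens, phrase):
    # Match the phrase against visible tokens only, then return the covered
    # span of the original sentence, including any hidden tokens inside it.
    visible = [t for t in tokens if not t.hidden]
    words = phrase.lower().split()
    for i in range(len(visible) - len(words) + 1):
        window = visible[i:i + len(words)]
        if [t.text.lower() for t in window] == words:
            end = window[-1].start + len(window[-1].text)
            return sentence[window[0].start:end]
    return None


s = "I am in a <em>trouble</em>."
print(find_phrase(s, hide_open_tags(tokenize(s)), "a trouble"))  # a <em>trouble
print(find_phrase(s, tokenize(s), "a trouble"))                   # None
```

Without hiding, the tag tokens sit between "a" and "trouble", so the phrase never matches. Note that "</em>" is not hidden, because "/" does not match `[a-z]+`, just as with the XML rule above.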

Optional extras:

- Show the disambiguation rule name in Tagger Result.
- For testing, permit untouched and ambiguous examples in the rule.
- Automatically ignore the spelling of 'hidden' text.


danielnaber · 2023-08-23T08:42:59Z

I understand, but a cleaner solution would be to use a pre-processing step that removes the markup (e.g. XML), or to use the API's data parameter to send information about which parts are text and which are markup.
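A pre-processing step does not have to be one-way if it records where each plain-text character came from. Below is a minimal Python sketch under that assumption: it strips tags with a naive regular expression (no handling of comments, CDATA or ">" inside attribute values) and keeps an offset map, so a span reported on the plain text can be mapped back to the original file. `strip_markup` and `to_original` are hypothetical helpers, not part of LT.

```python
import re

TAG = re.compile(r"<[^>]+>")  # naive tag matcher (assumption)


def strip_markup(doc):
    """Return (plain_text, offsets), where offsets[i] is the index in doc
    of plain_text[i]."""
    masked = set()
    for m in TAG.finditer(doc):
        masked.update(range(m.start(), m.end()))
    offsets = [i for i in range(len(doc)) if i not in masked]
    return "".join(doc[i] for i in offsets), offsets


def to_original(offsets, start, length):
    # Map a non-empty span [start, start + length) of the plain text back to
    # a span of the original document. Markup inside the span is included.
    return offsets[start], offsets[start + length - 1] + 1


doc = "I am in a <em>trouble</em>."
plain, offsets = strip_markup(doc)
start = plain.index("a trouble")  # pretend a checker reported this span
begin, end = to_original(offsets, start, len("a trouble"))
print(plain)           # I am in a trouble.
print(doc[begin:end])  # a <em>trouble
```

The mapped span can contain markup (here `<em>`), so whatever applies a suggestion to the original file still has to decide what to do with tags inside the span.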

MikeUnwalla · 2023-08-23T08:55:13Z

@danielnaber, thanks.

If the tags are removed, the workflow is one-way only. For example, I cannot use File > Open in the GUI, do the analysis, correct the errors, and then save the text.
I do not understand what you mean by "the API's data parameter". Is there a page on the LT website that explains it?

Thanks.

danielnaber · 2023-08-23T09:38:23Z

@MikeUnwalla I was referring to the HTTP API documented at https://languagetool.org/http-api/#!/default/post_check. It can receive information about which parts of the input are text and which are markup (but that information needs to be generated by the caller of the API).
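As an illustration of generating that information on the caller's side, here is a minimal Python sketch that builds the `data` value for simple HTML/XML. It assumes the JSON shape described in the API documentation (`{"annotation": [{"text": ...}, {"markup": ...}]}`, with an optional `"interpretAs"` for markup such as `<p>` that should act like whitespace). The tag regex, the `BLOCK_TAGS` set, the helper names and the use of the public endpoint URL are assumptions; the request is built but not sent.

```python
import json
import re
import urllib.parse
import urllib.request

TAG = re.compile(r"<[^>]+>")     # naive tag matcher (assumption)
BLOCK_TAGS = {"p", "div", "li"}  # tags treated as paragraph breaks (assumption)


def to_annotation(doc):
    """Split doc into the text/markup list used by the `data` parameter."""
    parts, pos = [], 0
    for m in TAG.finditer(doc):
        if m.start() > pos:
            parts.append({"text": doc[pos:m.start()]})
        item = {"markup": m.group()}
        name = re.match(r"</?([A-Za-z0-9]+)", m.group())
        if name and name.group(1).lower() in BLOCK_TAGS:
            item["interpretAs"] = "\n\n"  # keep paragraphs from running together
        parts.append(item)
        pos = m.end()
    if pos < len(doc):
        parts.append({"text": doc[pos:]})
    return {"annotation": parts}


def build_request(doc, language="en-US",
                  url="https://api.languagetool.org/v2/check"):
    # Form-encoded POST with `language` and `data`; replace url with your own server.
    body = urllib.parse.urlencode({
        "language": language,
        "data": json.dumps(to_annotation(doc)),
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"})


print(json.dumps(to_annotation("I am in a <em>trouble</em>.")))
```

Passing the result of `build_request` to `urllib.request.urlopen` would send it. A real integration would use a proper HTML/XML parser instead of the regular expression.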

MikeUnwalla · 2023-08-23T10:14:03Z

@danielnaber, thank you.

MikeUnwalla added enhancement English code/java labels Aug 23, 2023
