Sentence splitting with maths replacements for English texts #22

matze-dd · 2020-02-05T14:47:35Z

With tex2txt.py --lang en ..., the LaTeX input

We know that
\[
    f(n) = 0 \text{ for all } n.
\]
this completes the proof.

currently results in this plain text version:

We know that
  U for all V. 
this completes the proof.

It seems that the dot after 'V' is not recognised as a sentence boundary by LanguageTool (LT), since 'V.' might be the abbreviation of a first name. Consequently, LT will not complain about the lower-case 'this' starting a new sentence.

According to some experiments, the following settings for maths replacements are more appropriate in English texts.

parms.inline_math = ('B-B-B', 'C-C-C', 'D-D-D', 'E-E-E', 'F-F-F', 'G-G-G')
parms.display_math = ('U-U-U', 'V-V-V', 'W-W-W', 'X-X-X', 'Y-Y-Y', 'Z-Z-Z')

Now, LT's sensitivity seems to be almost as good as for German texts with the current replacement collections ('D1D', 'I1I', ...). Word repetitions caused by missing punctuation in equations, and missing white space around \text{...} parts, are detected as before.
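To make the effect of the new collection concrete, here is a minimal sketch of how the body of a displayed equation could be flattened with these placeholders. It is not the tex2txt.py implementation: the function name flatten_display_math and the splitting logic are assumptions made for illustration, and only the placeholder tuple is taken from the settings above.

```python
import itertools
import re

# Placeholder collection proposed above for displayed equations.
DISPLAY_MATH = ('U-U-U', 'V-V-V', 'W-W-W', 'X-X-X', 'Y-Y-Y', 'Z-Z-Z')


def flatten_display_math(body, placeholders=DISPLAY_MATH):
    """Hypothetical helper: turn the body of a displayed equation into
    plain text, using one placeholder per maths part and keeping the
    contents of text macros as well as trailing punctuation."""
    names = itertools.cycle(placeholders)
    out = []
    # With one capture group, re.split returns maths parts at even
    # indices and the contents of \text{...} at odd indices.
    parts = re.split(r'\\text\{([^{}]*)\}', body)
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(part)  # text part: keep verbatim
            continue
        part = part.strip()
        if not part:
            continue
        # Separate trailing punctuation so that it survives replacement.
        m = re.fullmatch(r'(.*?)([.,;:!?]*)', part, re.DOTALL)
        maths, punct = m.group(1).strip(), m.group(2)
        out.append((next(names) if maths else '') + punct)
    return ''.join(out)


print(flatten_display_math(r'f(n) = 0 \text{ for all } n.'))
```

For the first example of this issue, the sketch yields 'U-U-U for all V-V-V.', so the final token can no longer be mistaken for an initial followed by a dot.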

Still, there is at least one difference from the German version. In the following snippet, the missing dot after the equation is not detected in the English variant: LT does not complain about the capital 'This'.

We know that
\[
    f(n) = 0 \text{ for all } n
\]
This completes the proof.

But LT also won't generate a message for

This Is a pity.


matze-dd added the enhancement label Feb 5, 2020

matze-dd mentioned this issue Feb 5, 2020

Need better math replacements for English texts #14

Closed

matze-dd added a commit that referenced this issue Feb 6, 2020

See Issue #22

40179c3

matze-dd added a commit that referenced this issue Feb 8, 2020

Issue #22, also for German

d70f91a

matze-dd closed this as completed in 31a4940 Feb 8, 2020

matze-dd added a commit that referenced this issue Feb 8, 2020

See Issue #22; new options for shell.py

524d2ba
