This repository has been archived by the owner on Feb 5, 2021. It is now read-only.

Sentence splitting with maths replacements for English texts #22

Closed
matze-dd opened this issue Feb 5, 2020 · 0 comments
Labels
enhancement New feature or request

matze-dd commented Feb 5, 2020

With tex2txt.py --lang en ..., the LaTeX input

We know that
\[
    f(n) = 0 \text{ for all } n.
\]
this completes the proof.

currently results in this plain text version:

We know that
  U for all V. 
this completes the proof.

It seems that LanguageTool (LT) does not recognise the dot in 'V.' as the end of a sentence, since 'V.' could also be the initial of a first name. Consequently, LT does not complain about the lower-case 'this' starting a new sentence.
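
The suspected effect can be illustrated with a toy sentence splitter. This is only a sketch of the hypothesis, not LT's actual segmentation logic, which is considerably more elaborate; the function name naive_split and its single heuristic (a lone capital letter before a dot is taken to be an initial) are assumptions made for illustration.

```python
import re


def naive_split(text):
    """Toy splitter (hypothetical, NOT LanguageTool's algorithm).

    A dot followed by white space ends a sentence, unless the word
    before the dot is a single capital letter, which is treated as
    a possible initial of a first name.
    """
    sentences, start = [], 0
    for match in re.finditer(r'\.\s+', text):
        words = text[start:match.start()].split()
        if words and re.fullmatch(r'[A-Z]', words[-1]):
            # Looks like an initial such as 'V.': do not split here.
            continue
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


short = "We know that\n  U for all V. \nthis completes the proof."
longer = "We know that\n  U-U-U for all V-V-V. \nthis completes the proof."
print(len(naive_split(short)))   # one sentence: the dot after 'V' is skipped
print(len(naive_split(longer)))  # two sentences: 'V-V-V.' ends a sentence
```

Under this heuristic, a longer replacement token such as 'V-V-V' cannot be mistaken for an initial, so the lower-case 'this' ends up at the start of its own sentence, where a capitalisation check can see it.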

According to some experiments, the following settings for maths replacements are more appropriate in English texts.

parms.inline_math = ('B-B-B', 'C-C-C', 'D-D-D', 'E-E-E', 'F-F-F', 'G-G-G')
parms.display_math = ('U-U-U', 'V-V-V', 'W-W-W', 'X-X-X', 'Y-Y-Y', 'Z-Z-Z')
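
As a rough illustration of how such a collection ends up in the plain text, the following sketch replaces the maths parts of a displayed equation by successive placeholders while keeping \text{...} parts and trailing punctuation. It is a simplified stand-in, not the code of tex2txt.py: the helper replace_display_math is hypothetical, it only handles \[...\] blocks, and it assumes that \text{...} arguments contain no nested braces.

```python
import itertools
import re

# Placeholder collection proposed in this issue for displayed equations.
DISPLAY_MATH = ('U-U-U', 'V-V-V', 'W-W-W', 'X-X-X', 'Y-Y-Y', 'Z-Z-Z')


def replace_display_math(latex, placeholders=DISPLAY_MATH):
    """Hypothetical helper: replace displayed equations by placeholders."""
    names = itertools.cycle(placeholders)

    def convert_part(part):
        # The argument of \text{...} is ordinary text and is kept verbatim.
        text_arg = re.fullmatch(r'\\text\{(.*)\}', part, re.DOTALL)
        if text_arg:
            return text_arg.group(1)
        maths = part.strip()
        if not maths:
            return ''
        # Keep punctuation that closes the maths part.
        punct = ''
        if maths[-1] in '.,;:':
            punct = maths[-1]
            maths = maths[:-1].strip()
        return (next(names) if maths else '') + punct

    def convert_block(match):
        # Split the equation body into maths parts and \text{...} parts.
        parts = re.split(r'(\\text\{[^{}]*\})', match.group(1))
        return '  ' + ''.join(convert_part(p) for p in parts)

    return re.sub(r'\\\[(.*?)\\\]', convert_block, latex, flags=re.DOTALL)


sample = (
    "We know that\n"
    "\\[\n"
    "    f(n) = 0 \\text{ for all } n.\n"
    "\\]\n"
    "this completes the proof.\n"
)
print(replace_display_math(sample))
# Single-letter placeholders, assumed here to mimic the output quoted above.
print(replace_display_math(sample, ('U', 'V')))
```

With the proposed collection, the equation line becomes "  U-U-U for all V-V-V."; with the single-letter tuple ('U', 'V'), it becomes "  U for all V." as in the output quoted at the top of this issue.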

Now, LT's sensitivity seems to be almost as good as for German texts with the current replacement collections ('D1D', 'I1I', ...). Word repetitions due to missing punctuation in equations, as well as missing white space around \text{...} parts, are detected as before.

Still, there is at least one difference from the German version. In the following snippet, the missing dot is not detected in the English variant: LT does not complain about the capital 'This'.

We know that
\[
    f(n) = 0 \text{ for all } n
\]
This completes the proof.

However, LT does not generate a message for the following plain text either:

This Is a pity.
matze-dd added the enhancement label on Feb 5, 2020
matze-dd added a commit that referenced this issue Feb 6, 2020
matze-dd added a commit that referenced this issue Feb 8, 2020
matze-dd added a commit that referenced this issue Feb 8, 2020