AsciiDoc: Sentences joined by a double space in a para - sentence per line #464

Closed
git-pear opened this issue Jan 30, 2024 · 9 comments

@git-pear

Hello,

I have noticed that if an AsciiDoc text paragraph is styled as 'sentence per line', the resulting .po(t) file contains double spaces between the sentences of the paragraph.

If the paragraph in asciidoc is not styled as 'sentence per line' (all the para's sentences are on one line), the .po file is "normal", without double spaces.

Is this conversion of an AsciiDoc 'sentence per line' paragraph into a 'double spaced' .pot/.po entry intentional, and if so, what is the reason behind it?

Thank you very much for looking into this matter.

Josef Hruska

@git-pear
Author

This can be used as a sample asciidoc text:

= Test of a sentence per line para

== Para not styled as 'sentence per line'

Not sentence per line para. This para is written so that all its sentences are placed on one line only. After po4a processing, there is just a single space between the sentences of the para.

== Para styled as 'sentence per line'

Sentence per line para.
This para is written so that each of its sentences is placed on its own line.
After po4a processing, there are double spaces between the sentences of the para.

@jnavila
Collaborator

jnavila commented Feb 18, 2024

What I found is that this is mostly due to the fact that the line break is replaced with a space, and you can have a dangling space at the end of the line.

I can make po4a tidy the input string when it is not "wrap", but that logic will upset a lot of existing translations.
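
To illustrate the mechanism described above, here is a minimal Python sketch. It is not po4a's actual code (po4a is written in Perl), and the function name `join_lines_naively` is made up for this example. It only assumes what the comment says: each line break becomes a space, and a trailing space at the end of a line is kept.

```python
def join_lines_naively(paragraph: str) -> str:
    """Replace every line break with one space, keeping trailing spaces as-is."""
    return paragraph.replace("\n", " ")


# The first line ends with a dangling space; the second line does not.
sentence_per_line = "First sentence. \nSecond sentence.\nThird sentence."

joined = join_lines_naively(sentence_per_line)
print(repr(joined))
# 'First sentence.  Second sentence. Third sentence.'
```

Only the boundary after the line with the dangling space ends up with two spaces; the other boundary gets a single space.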

@git-pear
Author

> I can make po4a tidy the input string when it is not "wrap", but that logic will upset a lot of existing translations.

Thank you for looking into this. I don't think it is worth upsetting others over this.

@jnavila
Collaborator

jnavila commented Feb 18, 2024

Maybe I can add an option, so that the default behavior is retained but you can still get a cleaned up po file. Let's try this.

@git-pear
Author

Then thank you very much indeed. I had not thought of this 'configuration' possibility.

@mquinson
Owner

@jnavila I think that an option for that would complicate the use of the software for little gain, unless we find a simplification opportunity such as a "legacy mode" where we never do such fixes, and a "modern mode" where we do all of them (i.e., this one and future comparable ones). But a specific option for this specific bug that we cannot fix without upsetting users seems like a bad idea.

Maybe legacy needs to be a scalar related to the date instead of a boolean, so that people can choose the level of legacy they want in the future. That way nobody is forced either to embrace bugs older than their project by jumping into legacy mode, or to give up the bugs they are used to.

Still somewhat unsure here
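
The "scalar related to the date" idea could be sketched roughly as follows. Everything here is hypothetical and only meant to make the proposal concrete: the fix names, the placeholder dates, and the function `enabled_fixes` are invented for this example and do not exist in po4a.

```python
from datetime import date

# Hypothetical registry mapping each behavior fix to the date it was introduced.
# Names and dates are placeholders, not real po4a history.
FIX_INTRODUCED = {
    "older_fix": date(2020, 1, 1),
    "space_cleanup": date(2024, 1, 1),
}


def enabled_fixes(compat_date: date) -> set[str]:
    """Return the fixes introduced on or before the chosen compatibility date."""
    return {name for name, introduced in FIX_INTRODUCED.items()
            if introduced <= compat_date}


print(sorted(enabled_fixes(date(2022, 6, 1))))   # ['older_fix']
print(sorted(enabled_fixes(date(2025, 1, 1))))   # ['older_fix', 'space_cleanup']
```

With this design a project picks one date and gets every fix up to that point, so it cannot enable a later fix while skipping an earlier one.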

@git-pear
Author

> Still somewhat unsure here

OK, this is not urgent IMHO; take your time to think it all through.

I was puzzled by the double spaces mainly because they are, let's say, highlighted in the Weblate editor I use for translating documentation. Inexperienced translators may tend to carry the double spaces over into their translation(s), which is actually not necessary.

@jnavila
Collaborator

jnavila commented Feb 19, 2024

> I was puzzled by the double spaces mainly because they are, let's say, highlighted in the Weblate editor I use for translating documentation.

This also bugged me, and I feel that something needs to be done. While translating the git manpages, I spotted some places where the authors use two spaces after a final dot. This is totally useless with asciidoc, because the processor deduplicates them anyway. And Weblate, which is unaware of asciidoc, tends to be very picky about maintaining double spaces in translations. So this is obviously something I'd like to tackle.

> Maybe legacy needs to be a scalar related to the date instead of a boolean, so that people can choose the level of legacy they want in the future. That way nobody is forced either to embrace bugs older than their project by jumping into legacy mode, or to give up the bugs they are used to.

Of course, I don't want to upset already existing po-files; the default will be to keep all spaces. Each time I "fix" something in the management of po-files for asciidoc, this unfortunately comes with fuzzied entries in existing translations if you apply the fix. The scalar option would be a good idea, but then you have an indirection between a command-line option and the actual behavior being fixed. I don't think that this is a good idea because these changes are not additive. For instance, 'tablecells' can make sense for some files but you may not want to enable it on others.

I don't think there's going to be a lot more options to develop (I hope!).
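
As a rough illustration of the opt-in behavior discussed here (the default keeps all spaces, cleanup only on request), consider this Python sketch. The function `tidy_segment` and its `clean_spaces` parameter are hypothetical names chosen for the example; they are not po4a's real option or implementation.

```python
import re


def tidy_segment(text: str, clean_spaces: bool = False) -> str:
    """Collapse runs of spaces into one space, but only when explicitly requested."""
    if not clean_spaces:
        # Default: leave the segment untouched so existing translations keep matching.
        return text
    return re.sub(r" {2,}", " ", text)


segment = "One.  Two.   Three."
print(repr(tidy_segment(segment)))                     # 'One.  Two.   Three.'
print(repr(tidy_segment(segment, clean_spaces=True)))  # 'One. Two. Three.'
```

Because the flag defaults to off, existing po-files produce the same msgids as before, and only projects that turn it on get the cleaned-up strings (and the fuzzy entries that come with the change).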

jnavila added a commit that referenced this issue Feb 28, 2024
Extra spaces are removed in no-wrap segments, so that the output
conforms to English typography rules.

This fixes GitHub #464
mquinson pushed a commit that referenced this issue May 9, 2024
Extra spaces are removed in no-wrap segments, so that the output
conforms to English typography rules.

This fixes GitHub #464
@mquinson
Owner

mquinson commented May 9, 2024

IIUC, this is fixed by #481

Please do not hesitate to reopen if some issue remains.

@mquinson mquinson closed this as completed May 9, 2024