Fenced divs should be either "verbatim" (as now) or "translate" (new behavior) #381

Open
joelnitta opened this issue Aug 24, 2022 · 12 comments

Comments

@joelnitta

There seem to be some possibly related issues (#291, #357, #359), but I couldn't find anything describing exactly what I'm encountering, so I am filing a new one.

I am translating markdown with input like this (let's call this file test-long-line.md):

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor

Inline instructor notes

Note that the line with many colons ending with instructor is a pandoc fenced div; it needs to remain on one line and should not be translated.

I generate the PO file with po4a-updatepo -f text -m test-long-line.md -p test-long-line.po -o markdown --wrap-po newlines, then edit it to look as follows (call this test-long-line.po):

# SOME DESCRIPTIVE TITLE
# Copyright (C) YEAR Free Software Foundation, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2022-08-24 02:44+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#. type: Plain text
#: test-long-line.md:2
#, markdown-text
msgid ":::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor"
msgstr ""

#. type: Plain text
#: test-long-line.md:3
#, markdown-text
msgid "Inline instructor notes"
msgstr "インストラクター用メモ"

When I translate from the PO file, the instructor part gets put on a new line, even though I want to avoid this behavior.

Command:

po4a-translate -f text -m test-long-line.md -p test-long-line.po -l test-long-line.ja.md -o markdown -k 0 --width 1000 --wrap-po newlines

Output:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
instructor

インストラクター用メモ

I also tried with nobullets as suggested in #359, but that did not work.

Thanks!

po4a dev version (4cc0afd)
running in docker container joelnitta/po4a:latest

@mquinson
Owner

I didn't know that fenced code blocks can also be written with colons (:). So far, po4a only recognizes backquotes (`) and tildes (~). I think that the following patch solves your issue, but I'd like to have a full example to add to the test suite before I commit it to our git.

--- a/lib/Locale/Po4a/Text.pm
+++ b/lib/Locale/Po4a/Text.pm
@@ -714,7 +714,7 @@ sub parse_markdown {
         $self->pushline( $line . "\n" );
         $paragraph        = "";
         $end_of_paragraph = 1;
-    } elsif ( $line =~ /^([ ]{0,3})(([~`])\3{2,})(\s*)([^`]*)\s*$/ ) {
+    } elsif ( $line =~ /^([ ]{0,3})(([~`:])\3{2,})(\s*)([^`]*)\s*$/ ) {
         my $fence_space_before  = $1;
         my $fence               = $2;
         my $fencechar           = $3;
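
To illustrate the effect of this one-character change, here is a small Python sketch (not po4a code) that ports both regexes to Python's `re` module and checks them against a fence line like the one in the report. The names `OLD_FENCE` and `NEW_FENCE` are made up for this illustration, the colon run length is arbitrary, and it assumes Perl and Python agree on the semantics of this particular pattern.

```python
import re

# Hypothetical Python ports of the fence regex before and after the patch.
OLD_FENCE = re.compile(r"^([ ]{0,3})(([~`])\3{2,})(\s*)([^`]*)\s*$")
NEW_FENCE = re.compile(r"^([ ]{0,3})(([~`:])\3{2,})(\s*)([^`]*)\s*$")

# A long run of colons followed by a class name, as in the report.
line = ":" * 40 + " instructor"

# Before the patch the line is not seen as a fence, so it falls through
# to ordinary paragraph handling (and can be re-wrapped).
assert OLD_FENCE.match(line) is None

# After the patch the same line is recognized as a fence opener.
m = NEW_FENCE.match(line)
fence, fencechar, info = m.group(2), m.group(3), m.group(5)
print(len(fence), repr(fencechar), repr(info))
```

Backquote and tilde fences such as "```r" match both patterns, so the change only widens what counts as a fence.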

@mquinson
Owner

@joelnitta it would be really good if you could propose an extension to https://raw.githubusercontent.com/mquinson/po4a/master/t/fmt/txt-markdown/PandocFencedCodeBlocks.md testing "your" variant of fenced blocks, please. Just tell me what text chunk should be added, and I'll integrate it properly in our test suite.

@joelnitta
Author

Thanks @mquinson! I hadn't thought of this as a fenced code block, but rather as a markdown version of HTML divs (as described in the pandoc manual). But I suppose they are similar. The one thing that may differ is that pandoc fenced_divs can be nested, and I don't know if that applies to code blocks. So po4a would need to be able to account for that (again, my work-around was going to be to just not translate them, but if they were actually recognized and handled appropriately that would be even better).

I think borrowing from the pandoc manual should be fine for testing. Here are two examples.

First one is non-nested.

::::: {#special .sidebar}
Here is a paragraph.

And another.
:::::

Second one is nested.

::: Warning ::::::
This is a warning.

::: Danger
This is a warning within a warning.
:::
::::::::::::::::::
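
To make the nesting rule concrete, here is a rough Python sketch of how a parser could track fenced div depth. It is an illustration only, not how po4a does it. It assumes the pandoc convention that an opening fence is three or more colons followed by attributes (optionally trailed by more colons), and that a closing fence is three or more colons with nothing else. The names `OPEN_RE`, `CLOSE_RE` and `div_depths` are hypothetical.

```python
import re

# Closing fence: at least three colons and nothing else on the line.
CLOSE_RE = re.compile(r"^:{3,}\s*$")
# Opening fence: at least three colons, then attributes, then optional colons.
OPEN_RE = re.compile(r"^:{3,}\s*([^\s:].*?)\s*:*\s*$")


def div_depths(lines):
    """Label each line as (depth, kind, payload), kind in open/close/text."""
    depth = 0
    out = []
    for line in lines:
        if CLOSE_RE.match(line):
            depth = max(depth - 1, 0)  # tolerate a stray closing fence
            out.append((depth, "close", ""))
            continue
        m = OPEN_RE.match(line)
        if m:
            out.append((depth, "open", m.group(1)))
            depth += 1
        else:
            out.append((depth, "text", line))
    return out


nested = [
    "::: Warning ::::::",
    "This is a warning.",
    "",
    "::: Danger",
    "This is a warning within a warning.",
    ":::",
    "::::::::::::::::::",
]
for depth, kind, payload in div_depths(nested):
    print(depth, kind, payload)
```

Note that the closing fence does not have to be the same length as the opening one, so depth has to be counted rather than matched by fence length as with code blocks.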

@mquinson
Copy link
Owner

Ok, I think it's fixed now. The fact that it can be nested made the patch more complex than I thought.
Thanks for reporting.

@joelnitta
Author

Thanks @mquinson for your help with this.

Sorry to make this request after you have already closed the issue, but I hope you might consider some other ways to handle this situation.

The problem with this approach IMHO is that if there is a large amount of content within a fenced div, it all shows up as a single msgid. I think smaller msgids (generally one markdown paragraph at a time) are preferable. Also, this means that the translator may have to deal with more raw code (e.g., linebreaks (\n)) that would otherwise not show up in the PO file.

For my project I plan to crowdsource the translation part (i.e. the localization), so I want translators to be exposed to a minimum amount of code.

This is an example of what happens using the current approach.

Original text:

::::::::::::::::::::::::::::::::::::: challenge 

## Challenge 1: Can you do it?

What is the output of this command?

```r
paste("This", "new", "lesson", "looks", "good")
```

:::::::::::::::::::::::: solution 

## Output
 
```output
[1] "This new lesson looks good"
```

:::::::::::::::::::::::::::::::::

## Challenge 2: how do you nest solutions within challenge blocks?

:::::::::::::::::::::::: solution 

You can add a line with at least three colons and a `solution` tag.

:::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::

PO file (header excluded):

#. type: Fenced div block (challenge )
#: /409952fb95fbb825992593fca10961ea_1/test.Rmd:29
#, no-wrap
msgid ""
"\n"
"## Challenge 1: Can you do it?\n"
"\n"
"What is the output of this command?\n"
"\n"
"```r\n"
"paste(\"This\", \"new\", \"lesson\", \"looks\", \"good\")\n"
"```\n"
"\n"
":::::::::::::::::::::::: solution \n"
"\n"
"## Output\n"
" \n"
"```output\n"
"[1] \"This new lesson looks good\"\n"
"```\n"
"\n"
":::::::::::::::::::::::::::::::::\n"
"\n"
"## Challenge 2: how do you nest solutions within challenge blocks?\n"
"\n"
":::::::::::::::::::::::: solution \n"
"\n"
"You can add a line with at least three colons and a `solution` tag.\n"
"\n"
":::::::::::::::::::::::::::::::::\n"
"\n"
msgstr ""

For comparison, this is the PO file generated before the patch:

#. type: Plain text
#: /409952fb95fbb825992593fca10961ea_1/test.Rmd:30
#, markdown-text
msgid "::::::::::::::::::::::::::::::::::::: challenge"
msgstr ""

#. type: Title ##
#: /409952fb95fbb825992593fca10961ea_1/test.Rmd:31
#, markdown-text, no-wrap
msgid "Challenge 1: Can you do it?"
msgstr ""

#. type: Plain text
#: /409952fb95fbb825992593fca10961ea_1/test.Rmd:34
#, markdown-text
msgid "What is the output of this command?"
msgstr ""

#. type: Fenced code block (r)
#: /409952fb95fbb825992593fca10961ea_1/test.Rmd:35
#, no-wrap
msgid "paste(\"This\", \"new\", \"lesson\", \"looks\", \"good\")\n"
msgstr ""

#. type: Plain text
#: /409952fb95fbb825992593fca10961ea_1/test.Rmd:40
#: /409952fb95fbb825992593fca10961ea_1/test.Rmd:52
#, markdown-text
msgid ":::::::::::::::::::::::: solution"
msgstr ""

#. type: Title ##
#: /409952fb95fbb825992593fca10961ea_1/test.Rmd:41
#, markdown-text, no-wrap
msgid "Output"
msgstr ""

#. type: Fenced code block (output)
#: /409952fb95fbb825992593fca10961ea_1/test.Rmd:43
#, no-wrap
msgid "[1] \"This new lesson looks good\"\n"
msgstr ""

#. type: Plain text
#: /409952fb95fbb825992593fca10961ea_1/test.Rmd:48
#: /409952fb95fbb825992593fca10961ea_1/test.Rmd:56
#, markdown-text
msgid ":::::::::::::::::::::::::::::::::"
msgstr ""

I think having more msgid blocks will be significantly easier for translators.

@mquinson
Owner

Ok, then. Let's reopen this bug. What we need is an option to switch between fenced-div=verbatim (as I did) and fenced-div=translate (as you propose).

I still think that we need both, because the translate behavior may lead to some subtle difficulties when a nested div is inlined. In that case, the translators may want to change the location of the nested div within the enclosing sentence.
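
As a rough sketch of the difference between the two proposed modes, the Python below splits a document into units in either style. This is not po4a's implementation and neither option exists as written here: `extract_units` and its `mode` values are hypothetical, fenced code blocks are treated as plain paragraphs for simplicity, and the exact msgid text po4a produces may differ.

```python
import re

CLOSE_RE = re.compile(r"^:{3,}\s*$")       # closing fence: colons only
OPEN_RE = re.compile(r"^:{3,}\s*[^\s:]")   # opening fence: colons + attributes


def split_paragraphs(lines):
    """Group consecutive non-blank lines into paragraphs."""
    paras, cur = [], []
    for line in lines:
        if line.strip():
            cur.append(line)
        elif cur:
            paras.append("\n".join(cur))
            cur = []
    if cur:
        paras.append("\n".join(cur))
    return paras


def extract_units(text, mode):
    """Return (kind, text) units: 'msgid' is translated, 'keep' is copied."""
    units, buf, depth = [], [], 0

    def flush():
        units.extend(("msgid", p) for p in split_paragraphs(buf))
        buf.clear()

    for line in text.splitlines():
        is_close = bool(CLOSE_RE.match(line))
        is_open = not is_close and bool(OPEN_RE.match(line))
        if mode == "translate":
            # Fence lines are copied; everything between them is split
            # into one msgid per paragraph.
            if is_open or is_close:
                flush()
                units.append(("keep", line))
            else:
                buf.append(line)
        elif depth == 0 and is_open:
            flush()
            units.append(("keep", line))
            depth = 1
        elif depth > 0:
            # Verbatim: the whole body of the outermost div, nested fences
            # included, becomes a single msgid.
            depth += 1 if is_open else -1 if is_close else 0
            if depth == 0:
                units.append(("msgid", "\n".join(buf)))
                buf.clear()
                units.append(("keep", line))
            else:
                buf.append(line)
        else:
            buf.append(line)
    flush()
    return units


doc = """::: Warning ::::::
This is a warning.

::: Danger
This is a warning within a warning.
:::
::::::::::::::::::
"""
for mode in ("verbatim", "translate"):
    print(mode, [t for k, t in extract_units(doc, mode) if k == "msgid"])
```

In this sketch the verbatim mode yields one multi-line msgid that still contains the inner fence lines, while the translate mode yields one short msgid per paragraph and leaves every fence line out of the translatable text.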

@mquinson mquinson reopened this Aug 29, 2022
@mquinson mquinson changed the title Can't disable wrapping of translated output Markdown: fenced divs should be either "verbatim" (as now) or "translate" (new behavior) Aug 29, 2022
@mquinson mquinson changed the title Markdown: fenced divs should be either "verbatim" (as now) or "translate" (new behavior) Fenced divs should be either "verbatim" (as now) or "translate" (new behavior) Aug 29, 2022
@joelnitta
Author

Thanks!

A few ideas... in the latter case (fenced-div=translate), if the fenced div line will show up as a msgid, perhaps include a translator note that it does not need to be translated? Another option may be my original work-around of not including fenced divs in the PO file at all (possibly related to #77).

@joelnitta
Author

@mquinson just checking in... is there anything I can do to help with this? (without knowing perl... sorry...)

This would be a great feature to have, especially because of the heavy use of fenced divs by Quarto, which is rapidly gaining popularity as a cross-language authoring system.

@joelnitta
Author

Hi @mquinson unless I'm missing something obvious, I think this should be re-opened because it does not provide an option to choose treating fenced divs as either "verbatim" or "translate".

As mentioned above, the current implementation results in unnecessary markdown formatting (especially line breaks, \n) showing up in the PO file.

Thanks!

@mquinson
Owner

I forgot everything about this issue since then, sorry. Feel free to reopen it if it's appropriate, then.

@mquinson mquinson reopened this May 15, 2023
@joelnitta
Author

Thanks for the re-open. Please let me know if there's anything I can clarify.

@joelnitta
Author

Actually, I'll go ahead and clarify a bit now:

Ideal behavior would be if the parsed text in the PO file accounted for all markdown formatting between fenced divs (detection of type: Title ##, etc) as well as the divs themselves. But if that is too difficult, the option to ignore fenced divs as a work-around so that any markdown formatting between them gets properly detected would be OK too.
