east_asian_line_breaks Extension for Org-mode input #3703

wenxin-wang · 2017-05-28T10:53:37Z

The east_asian_line_breaks extension, which eliminates SoftBreak between two East Asian wide characters, currently works only for the markdown reader. Org-mode needs the same handling.

Here's a simple example:

* Example
你
好

I don't know Haskell, but from the code it seems to me that softBreakFilter is fairly generic and could be adopted by the Org reader.

The pandoc version I'm using is

pandoc 1.19.2.1
Compiled with pandoc-types 1.17.0.5, texmath 0.9.4, skylighting 0.1.1.5

Thank you for making pandoc!

jgm · 2017-05-28T11:39:50Z

Indeed, this would be simple to add. We could, with a bit of replumbing, even make the extension work for all the readers.

wenxin-wang · 2017-05-28T15:03:33Z

Here's a Python filter that does the same thing (at least I hope), which could serve as a temporary solution. Please correct me if there are any mistakes in the code.
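
For readers who want to see the idea in code, here is a minimal sketch of this kind of filter. It is not the filter referred to above, nor pandoc's softBreakFilter; it only assumes that pandoc's JSON AST represents inlines as objects with a "t" tag (for example "Str" with its text under "c", and "SoftBreak"), and it treats a character as wide when Unicode classifies it as East Asian Wide or Fullwidth. The function names are made up for illustration.

```python
import unicodedata


def is_wide(ch):
    """True if Unicode classifies ch as East Asian Wide (W) or Fullwidth (F)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def is_str(node):
    """True for a non-empty Str inline in the assumed JSON AST shape."""
    return (isinstance(node, dict) and node.get("t") == "Str"
            and isinstance(node.get("c"), str) and node["c"] != "")


def is_soft_break(node):
    return isinstance(node, dict) and node.get("t") == "SoftBreak"


def strip_wide_soft_breaks(node):
    """Recursively drop SoftBreak nodes that sit between two wide characters."""
    if isinstance(node, dict):
        return {key: strip_wide_soft_breaks(value) for key, value in node.items()}
    if isinstance(node, list):
        kept = []
        for i, item in enumerate(node):
            if (is_soft_break(item) and 0 < i < len(node) - 1
                    and is_str(node[i - 1]) and is_str(node[i + 1])
                    and is_wide(node[i - 1]["c"][-1])
                    and is_wide(node[i + 1]["c"][0])):
                continue  # omit the break entirely; no space is inserted
            kept.append(strip_wide_soft_breaks(item))
        return kept
    return node


# The inlines of the example paragraph: "你", a soft line break, "好".
paragraph = [{"t": "Str", "c": "你"}, {"t": "SoftBreak"}, {"t": "Str", "c": "好"}]
result = strip_wide_soft_breaks(paragraph)
print(result)  # the SoftBreak is gone, leaving the two Str inlines
```

To try it as a JSON filter, one would read the document with json.load(sys.stdin), pass it through strip_wide_soft_breaks, and write it back with json.dump to sys.stdout, then run pandoc with --filter. Soft breaks next to Latin text are left alone, since neither neighbour is wide.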

jgm · 2017-05-30T08:35:48Z

Here's a thought. Currently we do the transformation in the reader, so the AST has no SoftBreak elements between East Asian characters.

This means that you can't meaningfully do --wrap=preserve with the east_asian_line_breaks option.

What if we removed the east_asian_line_breaks extension and instead added a fourth option for --wrap: --wrap=auto|asian|none|preserve?

This would operate in the writer, not the reader, and would simply omit SoftBreak elements (rather than replacing them with spaces). (This could most conveniently be implemented by adding a filter right before the writer.)

I can think of one reason why one might want to keep this feature in the reader: certain constructions are sensitive to spaces (e.g. _ emphasis), and we might want the reader to act as if the line breaks between two East Asian characters are not spaces at all. However, the current implementation (which is a filter applied AFTER parsing) does not do this. If we wanted this, we'd need to integrate the feature more directly into the parsing.

jgm · 2017-05-30T08:36:59Z

I made commit 774075c as a first step.

jgm · 2017-06-27T09:53:55Z

OK, I now think my thinking above was confused. If you want to use --wrap=preserve, you can just leave off east_asian_line_breaks. The two don't need to be made compatible.

jgm · 2017-06-27T10:02:51Z

New thought 1: we might want to combine the "treat soft breaks between East Asian characters as if they were not there" behavior with either --wrap=preserve (preserving other soft breaks) or --wrap=auto. So maybe the thing to do is to add a new command line option, --asian-soft-breaks, which would simply cause soft breaks between East Asian characters to be ignored. This could be implemented with the current filter, applied in Text.Pandoc.App between the reader and the writer, so that it would affect all input formats. The east_asian_line_breaks option could then be removed.

New thought 2: or, we could keep east_asian_line_breaks and handle it intelligently in Text.Pandoc.App, as follows. If both the input and the output format have east_asian_line_breaks, or if neither does, do nothing. (Thus, soft breaks would be preserved, for example, on markdown -> markdown translation.) If the input has east_asian_line_breaks and the output doesn't, apply the filter to strip out SoftBreaks between the reader and the writer. If the output has east_asian_line_breaks but the input doesn't, do nothing.
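
The four cases in New thought 2 reduce to a single condition, which can be sketched as below. This is only an illustration of the decision rule as described, not code from Text.Pandoc.App; the function name and the boolean parameters are hypothetical.

```python
def should_strip_soft_breaks(input_has_ext, output_has_ext):
    """Decide whether to strip SoftBreaks between the reader and the writer.

    Each argument says whether east_asian_line_breaks is enabled on that
    side. Only "input has it, output does not" triggers the filter; the
    other three combinations leave the soft breaks untouched.
    """
    return input_has_ext and not output_has_ext


# Enumerate all four combinations of the extension on input and output.
for input_has_ext in (False, True):
    for output_has_ext in (False, True):
        action = ("strip" if should_strip_soft_breaks(input_has_ext, output_has_ext)
                  else "do nothing")
        print(f"input={input_has_ext!s:5} output={output_has_ext!s:5} -> {action}")
```

Written this way, the markdown -> markdown case with the extension on both sides falls under "do nothing", so the soft breaks survive the round trip.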

I like New thought 2 best, I think. It would be simple to implement and wouldn't require any new options. The only odd thing is that the extension would be applied in Text.Pandoc.App rather than in the reader itself, which would mean it couldn't be used easily when the readers/writers are called as libraries. This might cause some confusion.

jgm added this to the pandoc 2.0 milestone Jun 10, 2017

jgm closed this as completed in 69b2cb3 Jun 30, 2017

kaizhang16 mentioned this issue May 26, 2018

emphasize Ext_east_asian_line_breaks will not take effect when called as libraries #4674

Merged
