east_asian_line_breaks Extension for Org-mode input #3703

Closed
wenxin-wang opened this issue May 28, 2017 · 6 comments

@wenxin-wang

The east_asian_line_breaks extension, which removes SoftBreak elements between two East Asian wide characters, currently works only in the Markdown reader. The Org-mode reader needs the same handling.

Here's a simple example:

* Example
你
好

I don't know Haskell, but from the code it seems to me that softBreakFilter is fairly generic and could be reused by the Org reader.

The pandoc version I'm using is:

pandoc 1.19.2.1
Compiled with pandoc-types 1.17.0.5, texmath 0.9.4, skylighting 0.1.1.5

Thank you for making pandoc!

@jgm
Owner

jgm commented May 28, 2017 via email

@wenxin-wang
Author

wenxin-wang commented May 28, 2017

Here's a Python filter that does the same thing (at least I hope), which could serve as a temporary solution. Please correct me if there are any mistakes in the code.
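The filter itself is not reproduced in this thread. As a minimal sketch of what such a filter might look like (not the author's code), the following walks pandoc's JSON AST and drops each SoftBreak that sits between two Str elements whose neighbouring characters are East Asian wide. It assumes "wide" means Unicode East_Asian_Width W or F, as reported by Python's unicodedata module; that test may not match the exact character check in pandoc's Haskell softBreakFilter.

```python
#!/usr/bin/env python3
import json
import sys
import unicodedata


def is_east_asian_wide(ch):
    # Assumption: treat East_Asian_Width "W" (wide) and "F" (fullwidth) as wide.
    return unicodedata.east_asian_width(ch) in ("W", "F")


def strip_asian_soft_breaks(elements):
    """Drop each SoftBreak whose neighbours are Str elements ending/starting
    with East Asian wide characters; keep every other element unchanged."""
    result = []
    for i, elt in enumerate(elements):
        if elt.get("t") == "SoftBreak" and 0 < i < len(elements) - 1:
            prev, nxt = elements[i - 1], elements[i + 1]
            if (prev.get("t") == "Str" and nxt.get("t") == "Str"
                    and prev["c"] and nxt["c"]
                    and is_east_asian_wide(prev["c"][-1])
                    and is_east_asian_wide(nxt["c"][0])):
                continue  # omit the break entirely (no space inserted)
        result.append(elt)
    return result


def walk(node):
    """Recursively apply strip_asian_soft_breaks to every list of AST elements,
    so it works regardless of which reader produced the document."""
    if isinstance(node, list):
        node = [walk(x) for x in node]
        if node and all(isinstance(x, dict) and "t" in x for x in node):
            node = strip_asian_soft_breaks(node)
        return node
    if isinstance(node, dict):
        return {key: walk(value) for key, value in node.items()}
    return node


# pandoc's --filter passes the output format as argv[1]; only then read stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(walk(json.load(sys.stdin)), sys.stdout)
```

Saved under a hypothetical name such as strip_asian_breaks.py and made executable, it could be run as: pandoc -f org -t html --filter ./strip_asian_breaks.py input.org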

@jgm
Owner

jgm commented May 30, 2017

Here's a thought. Currently we do the transformation in the reader, so the AST has no SoftBreak elements between East Asian characters.

This means that you can't meaningfully combine --wrap=preserve with the east_asian_line_breaks extension: the breaks are already gone by the time the writer runs.

What if we removed the east_asian_line_breaks extension and instead added a fourth option for --wrap: --wrap=auto|asian|none|preserve?

This would operate in the writer, not the reader, and would simply omit SoftBreak elements (rather than replacing them with spaces). (This could most conveniently be implemented by adding a filter right before the writer.)
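To illustrate the writer-side idea, here is a hypothetical sketch in which the AST keeps every SoftBreak and only the rendering step decides what each one becomes. The asian mode is the proposed option from this comment; restricting it to breaks between two East Asian wide characters is an assumption, and the Str/Space/SoftBreak dictionaries only mimic pandoc's JSON representation. The auto mode is left out because it reflows text to a column width.

```python
import unicodedata


def is_east_asian_wide(ch):
    # Assumption: East_Asian_Width "W" or "F" counts as wide.
    return unicodedata.east_asian_width(ch) in ("W", "F")


def render_inlines(inlines, wrap):
    """Render a flat list of simplified pandoc-style inlines to plain text.

    wrap="none":     SoftBreak -> space
    wrap="preserve": SoftBreak -> newline
    wrap="asian":    (proposed, hypothetical) SoftBreak dropped between two
                     East Asian wide characters, otherwise rendered as a space
    """
    out = []
    for i, elt in enumerate(inlines):
        kind = elt["t"]
        if kind == "Str":
            out.append(elt["c"])
        elif kind == "Space":
            out.append(" ")
        elif kind == "SoftBreak":
            if wrap == "preserve":
                out.append("\n")
            elif wrap == "asian":
                prev = inlines[i - 1] if i > 0 else None
                nxt = inlines[i + 1] if i + 1 < len(inlines) else None
                between_wide = (
                    prev is not None and nxt is not None
                    and prev["t"] == "Str" and nxt["t"] == "Str"
                    and prev["c"] and nxt["c"]
                    and is_east_asian_wide(prev["c"][-1])
                    and is_east_asian_wide(nxt["c"][0])
                )
                out.append("" if between_wide else " ")
            else:
                out.append(" ")
        else:
            raise ValueError("unsupported inline: " + kind)
    return "".join(out)
```

Because the breaks are still in the AST, the same input can be rendered with line breaks intact under preserve or joined under asian, which is exactly what a reader-side transformation rules out.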

I can think of one reason why one might want to keep this feature in the reader: certain constructions are sensitive to spaces (e.g. emphasis with _), and we might want the reader to treat a line break between two East Asian characters as if it were not a space at all. However, the current implementation (which is a filter applied AFTER parsing) does not do this. If we wanted this, we'd need to integrate the feature more directly into the parser.

@jgm
Owner

jgm commented May 30, 2017

I made commit 774075c as a first step.

jgm added this to the pandoc 2.0 milestone on Jun 10, 2017
@jgm
Owner

jgm commented Jun 27, 2017

OK, I now think my reasoning above was confused. If you want to use --wrap=preserve, you can just leave off east_asian_line_breaks. The two don't need to be made compatible.

@jgm
Owner

jgm commented Jun 27, 2017

New thought 1: we might want to combine the "treat soft breaks between East Asian characters as if they were not there" behavior with either --wrap=preserve (preserving other soft breaks) or --wrap=auto. So maybe the thing to do is to add a new command-line option, --asian-soft-breaks, which would simply cause soft breaks between East Asian characters to be ignored. This could be implemented with the current filter, applied in Text.Pandoc.App between the reader and the writer, so that it would affect all input formats. The east_asian_line_breaks extension could then be removed.

New thought 2: or, we could keep east_asian_line_breaks and handle it intelligently in Text.Pandoc.App, as follows. If both the input and the output format have east_asian_line_breaks enabled, or if neither does, do nothing. (Thus, soft breaks would be preserved, for example, in a markdown -> markdown conversion.) If the input has east_asian_line_breaks and the output doesn't, apply the filter to strip out SoftBreaks between the reader and the writer. If the output has east_asian_line_breaks but the input doesn't, do nothing.

I like New thought 2 best, I think. It would be simple to implement and wouldn't require any new options. The only odd thing is that the extension would be applied in Text.Pandoc.App rather than in the reader itself, which would mean it couldn't be used easily when the readers/writers are called as libraries. This might cause some confusion.
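The decision rule in New thought 2 is small enough to write down directly. This Python sketch mirrors it and shows where the filter would sit between reader and writer; convert, read, write and strip are hypothetical stand-ins, not functions from Text.Pandoc.App or any pandoc API.

```python
EXT = "east_asian_line_breaks"


def needs_asian_break_filter(input_exts, output_exts):
    """New thought 2: strip only when the input format enables the extension
    and the output format does not. The both/neither cases (e.g.
    markdown -> markdown) and the output-only case leave the AST untouched."""
    return EXT in input_exts and EXT not in output_exts


def convert(source, read, write, strip, input_exts, output_exts):
    """Hypothetical App-level pipeline: reader -> optional filter -> writer.
    read, write and strip stand in for a reader, a writer and the existing
    soft-break filter."""
    doc = read(source)
    if needs_asian_break_filter(input_exts, output_exts):
        doc = strip(doc)
    return write(doc)
```

Because the check lives in this App-level step rather than in the reader, a library user who calls a reader and writer directly would have to apply the filter manually, which is the confusion mentioned above.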
