Handling of line breaks within RST inlines #4618

Open
danse opened this issue May 2, 2018 · 7 comments

Comments

Contributor

danse commented May 2, 2018

Inlines containing line breaks are converted to RST markup which seems to have different semantics depending on the position of the line breaks.

I assume that we can simply replace all newlines within inlines with spaces in the RST writer, but i could not find any mention of this case (newlines within inline markup) in the RST specification.

I tested using the docutils parser (via rst2html) and i found out that inlines containing:

  1. \n trigger a syntax error
  2. text\n get parsed as a paragraph without style
  3. \ntext trigger a syntax error
  4. text\ntext get parsed as a styled inline

i'm not sure about the correct methodology to follow here. should we use the code of the docutils parser as reference for the standard?

in my opinion it is sensible to replace all line breaks contained in inlines with spaces before writing RST. this way we will get rid of the syntax errors and handle the three strings consistently. this can easily be added to our inline transformation function that is already walking the inlines structure

Owner

jgm commented May 2, 2018 via email

Contributor Author

danse commented May 2, 2018

i added numbers to the list of strings in order to refer to them more easily. i also added examples below every string description. yes, the case you mention is number 4, a string where the newline is in the middle. as i wrote, that gets correctly parsed as a styled inline

Owner

jgm commented May 2, 2018

What I meant was to give full examples: did you have in mind this?

  1. *\n*
  2. *hi\n*
  3. *\nhi*
  4. *hi\nlo*

If so, then yes, in RST, you can't have whitespace after the beginning delimiter or before the end delimiter, which explains why 1-3 don't work but 4 does. (\n is no different from a space or tab in this respect.)
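
A minimal sketch of that rule applied to the four cases, assuming the content between the delimiters is a plain string. The function name `delimiters_are_valid` is hypothetical, and the check models only the whitespace rule described here, not the full set of inline recognition rules that docutils applies. Rejecting empty content is an additional assumption.

```python
def delimiters_are_valid(content: str) -> bool:
    """Return True if `content` may sit directly between inline delimiters.

    Models only the rule described above: no whitespace right after the
    opening delimiter and none right before the closing delimiter.
    Assumption: empty content is rejected as well.
    """
    if not content:
        return False
    return not content[0].isspace() and not content[-1].isspace()


# The four cases from the list above, without the surrounding asterisks.
cases = ["\n", "hi\n", "\nhi", "hi\nlo"]
results = {case: delimiters_are_valid(case) for case in cases}

for case, ok in results.items():
    print(repr("*" + case + "*"), "valid" if ok else "invalid")
```

Under this model only the fourth case passes, because its newline is surrounded by non-whitespace text on both sides.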

What would be examples of real pandoc conversions that give one of these bad results?

Contributor Author

danse commented May 3, 2018

sorry, i added the full examples but forgot to save the change in the comment. they were like yours.

we can treat line breaks like white space and remove them as part of stripLeadingTrailingSpace; that would be a solution to the syntax errors.
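
To make the idea concrete, here is a rough sketch in Python. It is not pandoc's code (pandoc is written in Haskell), and everything in it is a hypothetical model: the `Str`, `Space` and `LineBreak` classes only mimic the inline elements by name, and `strip_leading_trailing_whitespace` and `write_emph` are illustrative helpers, not the real stripLeadingTrailingSpace or the real RST writer.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Str:
    text: str


@dataclass(frozen=True)
class Space:
    pass


@dataclass(frozen=True)
class LineBreak:
    pass


Inline = Union[Str, Space, LineBreak]


def is_whitespace(inline: Inline) -> bool:
    # Treat a line break exactly like a space, as proposed above.
    return isinstance(inline, (Space, LineBreak))


def strip_leading_trailing_whitespace(inlines: List[Inline]) -> List[Inline]:
    """Drop whitespace inlines from both ends, keeping interior ones."""
    start, end = 0, len(inlines)
    while start < end and is_whitespace(inlines[start]):
        start += 1
    while end > start and is_whitespace(inlines[end - 1]):
        end -= 1
    return inlines[start:end]


def render(inline: Inline) -> str:
    if isinstance(inline, Str):
        return inline.text
    return "\n" if isinstance(inline, LineBreak) else " "


def write_emph(inlines: List[Inline]) -> str:
    """Emit *...* only when something is left after stripping."""
    stripped = strip_leading_trailing_whitespace(inlines)
    if not stripped:
        return ""
    return "*" + "".join(render(i) for i in stripped) + "*"


print(repr(write_emph([Str("hi"), LineBreak()])))
print(repr(write_emph([Str("hi"), LineBreak(), Str("lo")])))
```

In this sketch the stripped whitespace is simply dropped, so an inline that contains only a line break produces no output at all. Whether the writer should instead emit that whitespace outside the delimiters is a separate decision.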

about the examples of real conversions, i can't track this back to the original document unfortunately, i just remember that we had an inline containing only a line break, and it was coming from a DOCX document. i can't produce a similar document: when i try to break a styled line it turns into two styled paragraphs. if you prefer, we can close this issue until the moment when a corresponding source document is provided

Owner

jgm commented May 3, 2018 via email

Contributor Author

danse commented May 3, 2018

if i understand correctly, in the native format we want all whitespace to be encoded with the corresponding inline elements (LineBreak, Space, etcetera), so this is the case we want to handle.

about the idea of relocating breaks out of the inlines, what is the advantage in doing so, rather than handling them like all the other white space? handling this in a special way would add complexity to the code for a case which doesn't seem likely to happen often

Owner

jgm commented May 3, 2018 via email
