Handling of line breaks within RST inlines #4618
I'm not sure what you're referring to exactly when you say
"inlines containing line breaks."
rst2html.py seems to allow, for example,
*hi
there*
as emphasis. Can you give some specific examples?
I added numbers to the list of strings so they are easier to refer to, and I added an example below each description. Yes, the case you mention is number 4, the string where the line break is in the middle. As I wrote, that gets correctly parsed as a styled inline.
What I meant was full examples. Did you have this in mind?
If so, then yes: in RST you can't have whitespace after the beginning delimiter or before the end delimiter, which explains why 1-3 don't work but 4 does. What would be examples of real pandoc conversions that give one of these bad results?
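To make the delimiter rule concrete, here is a simplified check in Python. `edges_ok` is a hypothetical helper; it models only the whitespace condition stated in the comment above, not the full docutils inline markup recognition rules, so it says nothing about why some cases produce a syntax error rather than plain text.

```python
# Simplified model of one reStructuredText inline-markup rule: the text
# between the start and end delimiters must not begin or end with whitespace.
# The full docutils recognition rules have further conditions (e.g. on the
# characters around the delimiters); they are deliberately ignored here.


def edges_ok(content: str) -> bool:
    """True if `content` passes the simplified whitespace check for *...*."""
    return bool(content) and not content[0].isspace() and not content[-1].isspace()


# The four cases from the issue, as the text between the delimiters.
CASES = {1: "\n", 2: "text\n", 3: "\ntext", 4: "text\ntext"}

for number, content in CASES.items():
    print(number, repr("*" + content + "*"), edges_ok(content))
```

Run as-is, it reports `False` for cases 1-3 and `True` for case 4, matching the rst2html results described in the issue.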
Sorry, I added the full examples but forgot to save the change to the comment; they were like yours. We can treat line breaks like whitespace and remove them as part of … About the examples of real conversions: unfortunately I can't track this back to the original document. I just remember that we had an inline containing only a line break, and it came from a DOCX document. I can't produce a similar document: when I try to break a styled line, it turns into two styled paragraphs. If you prefer, we can close this issue until a corresponding source document is provided.
I'd definitely be interested in knowing more about where this pops up
naturally, since it might indicate a problem in a reader.
When you say "line break", do you mean a LineBreak inline, or a Str
with a newline?
Probably the most correct thing to do is relocate these outside the
inline container:
[Emph [LineBreak, Str "hi"]] ==> [LineBreak, Emph [Str "hi"]]
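A minimal sketch of the relocation idea, written in Python against dicts shaped like pandoc's JSON AST (the form filters see), not the Haskell code of the RST writer. The function name `relocate_edge_breaks` is hypothetical, and three choices are assumptions: only `Emph` and `Strong` are treated as containers, `Space` and `SoftBreak` are moved the same way as `LineBreak`, and a container left empty is dropped.

```python
# Sketch: move whitespace at the edges of an inline container outside it,
# e.g. [Emph [LineBreak, Str "hi"]] ==> [LineBreak, Emph [Str "hi"]].
# Inlines are dicts shaped like pandoc's JSON AST: {"t": tag, "c": contents}.
# Assumptions: only Emph and Strong are treated as containers, and Space and
# SoftBreak are relocated the same way as LineBreak.

CONTAINERS = {"Emph", "Strong"}
EDGE_WHITESPACE = {"Space", "SoftBreak", "LineBreak"}


def relocate_edge_breaks(inlines):
    """Return a new list with edge whitespace moved out of containers."""
    result = []
    for inline in inlines:
        if inline["t"] not in CONTAINERS:
            result.append(inline)
            continue
        # Handle nested containers first so their edge whitespace bubbles up.
        content = relocate_edge_breaks(inline["c"])
        leading = []
        while content and content[0]["t"] in EDGE_WHITESPACE:
            leading.append(content.pop(0))
        trailing = []
        while content and content[-1]["t"] in EDGE_WHITESPACE:
            trailing.insert(0, content.pop())
        result.extend(leading)
        # A container holding only whitespace (e.g. Emph [LineBreak]) is dropped.
        if content:
            result.append({"t": inline["t"], "c": content})
        result.extend(trailing)
    return result


doc = [{"t": "Emph", "c": [{"t": "LineBreak"}, {"t": "Str", "c": "hi"}]}]
print(relocate_edge_breaks(doc))
# [{'t': 'LineBreak'}, {'t': 'Emph', 'c': [{'t': 'Str', 'c': 'hi'}]}]
```

Compared with stripping, this keeps the line break in the output, and an inline containing only a line break (the DOCX case) degrades to a bare `LineBreak`.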
If I understand correctly, in the native format we want all whitespace to be encoded with the corresponding inline elements (`LineBreak`, `Space`, etcetera), so this is the case we want to handle. About the idea of relocating breaks out of the inlines: what is the advantage of doing so, rather than handling them like all the other whitespace? Handling this in a special way would add complexity to the code for a case which doesn't seem likely to happen often.
Francesco Occhipinti <notifications@github.com> writes:
> If I understand correctly, in the native format we want all whitespace to be encoded with the corresponding inline elements (`LineBreak`, `Space`, etcetera), so this is the case we want to handle.
> About the idea of relocating breaks out of the inlines: what is the advantage of doing so, rather than handling them like all the other whitespace? Handling this in a special way would add complexity to the code for a case which doesn't seem likely to happen often.
I'm okay with just stripping these out. But the other approach is one
we follow in some of the other writers/readers, if I recall correctly.
This way the space or line break doesn't just disappear.
Inlines containing line breaks are converted to RST markup whose meaning seems to depend on where the line breaks fall.
I assume we can simply replace all newlines within inlines in the RST writer, but I could not find any mention of this case (newlines within inline markup) in the RST specification.
I tested with the docutils parser (via rst2html) and found that inlines containing:
1. `\n` trigger a syntax error
2. `text\n` get parsed as a paragraph without style
3. `\ntext` trigger a syntax error
4. `text\ntext` get parsed as a styled inline
I'm not sure about the correct methodology to follow here. Should we use the code of the docutils parser as the reference for the standard?
In my opinion it is sensible to replace all line breaks contained in inlines with spaces before writing RST. This way we get rid of the syntax errors and handle the three failing strings (1-3) consistently. This can easily be added to our inline transformation function, which already walks the inlines structure.
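A rough sketch of that proposal, assuming inlines shaped like pandoc's JSON AST rather than the writer's actual Haskell types. The name `breaks_to_spaces` and the set of containers handled (`Emph`, `Strong`, `Strikeout`) are hypothetical choices for illustration.

```python
# Sketch: inside styled inlines, turn LineBreak and SoftBreak into Space
# before rendering RST. Breaks outside any container are left untouched.
# Inlines are dicts shaped like pandoc's JSON AST: {"t": tag, "c": contents}.

CONTAINERS = {"Emph", "Strong", "Strikeout"}  # assumption: not exhaustive
BREAKS = {"LineBreak", "SoftBreak"}


def breaks_to_spaces(inlines, inside=False):
    """Return a copy of `inlines` with breaks inside containers replaced by Space."""
    result = []
    for inline in inlines:
        if inline["t"] in CONTAINERS:
            # Recurse into the container; everything below it counts as inside.
            result.append({"t": inline["t"], "c": breaks_to_spaces(inline["c"], inside=True)})
        elif inside and inline["t"] in BREAKS:
            result.append({"t": "Space"})
        else:
            result.append(inline)
    return result


doc = [{"t": "Emph", "c": [{"t": "Str", "c": "text"}, {"t": "LineBreak"}, {"t": "Str", "c": "text"}]}]
print(breaks_to_spaces(doc))
# [{'t': 'Emph', 'c': [{'t': 'Str', 'c': 'text'}, {'t': 'Space'}, {'t': 'Str', 'c': 'text'}]}]
```

On its own this does not fix case 2: `Emph [Str "text", LineBreak]` becomes `Emph [Str "text", Space]`, and that trailing space would still sit right before the closing delimiter, so it would also need to be trimmed or moved outside the container.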