Inline markup incompatibilities #44
Comments
Not-so-related bug, but also the result of this series of tests: jgm/pandoc#4820 |
From my understanding of the Emacs Muse code, it allows line start, -, [, <, (, ', " and ` before the first "*" (or "=", or "_", which is underline and is not supported by Amusewiki). It then checks that the character after the corresponding closing "*" marker is not from the "w" class (letters and digits, basically). |
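A minimal Python sketch of the two checks described in the comment above. The function names `can_open` and `can_close` are hypothetical, whitespace is added to the allowed set as an assumption (the list in the comment does not mention it), and `str.isalnum()` only approximates the Emacs "w" syntax class. This is not the Emacs Muse code itself.

```python
# Characters listed in the comment as allowed right before the opening marker.
ALLOWED_BEFORE = set("-[<('\"`")


def can_open(text: str, i: int, marker: str = "*") -> bool:
    """True if text[i] is a marker that may open emphasis."""
    if text[i] != marker:
        return False
    if i == 0:  # line start is always allowed
        return True
    prev = text[i - 1]
    # Assumption: whitespace before the marker is accepted as well.
    return prev in ALLOWED_BEFORE or prev.isspace()


def can_close(text: str, j: int, marker: str = "*") -> bool:
    """True if text[j] is a marker that may close emphasis."""
    if text[j] != marker:
        return False
    if j == len(text) - 1:  # end of line is always allowed
        return True
    # Approximation of "not from the w class": not a letter or digit.
    return not text[j + 1].isalnum()
```

With these checks, `(*foo*)` and `*foo*,` qualify, while `a*foo*` and `*foo*bar` do not. Note that "{" is not in the allowed set, so `{*foo*}` is rejected at the opening marker.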
Here is another testcase:
(just fixed it in pandoc: jgm/pandoc@81131ef) |
Relevant tests are failing. Reference: #44
I'm willing to push this forward, but it's a low-priority task, as these are mostly corner cases. |
@melmothx I see you added the test for Amusewiki produces this completely wrong result:
In LaTeX:
|
That's simply the result of the autocorrection when there are open tags left. That's incorrect/random input anyway. Determining what's the correct output is tricky and/or arbitrary. Bottom line is: garbage in, garbage out. Just saying. |
I am also not sure whether we need bug-for-bug compatibility. |
The Emacs Muse "parser" is simply a set of regexp rewrite rules with priorities, so copying its behaviour with a proper parser is hard. Cases like Let's limit this issue to "what characters are allowed before and after *". |
@labdsf I've added a commit in a branch which fixes the case 2, which is clearly a Text::Amuse bug. Regarding the 4th case, the more I look at it, the less I think it's a bug in your and my code. For symmetry with |
In both left-to-right and right-to-left texts the comma is followed by a space, so there is no need for symmetry. Forgetting a space after a comma is a mistake. I also thought about symmetry at first, see my description of case 2. But having looked at the code of both Org-mode and Emacs Muse, I can tell there is intentional asymmetry in both of them.
Not only does it have a configuration variable, it also has asymmetric defaults that allow punctuation only at the end and only opening parentheses at the beginning. It is not directly related to Muse, though. I looked into Org-mode only because I could not quickly find the relevant code in Emacs Muse. Now that I have looked into Emacs Muse, I am also sure that the set of characters allowed before the opening *, as opposed to a simple "any non-word character", is not the result of some random typo. It still makes sense to intentionally break compatibility with Emacs Muse here and document it as an incompatibility. I do not like an arbitrary hardcoded set of characters allowed before *, which does not even include "{". Something based on character classes is definitely better. |
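To make the asymmetry concrete, here is a small Python sketch of a configurable border check with separate "pre" and "post" sets. The class `EmphasisBorders`, the function `borders_ok` and the character sets are all hypothetical; the sets only follow the description above (opening parenthesis before, punctuation after) and are not the actual values of `org-emphasis-regexp-components`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmphasisBorders:
    pre: str   # characters allowed immediately before the opening marker
    post: str  # characters allowed immediately after the closing marker


# Illustrative values only, NOT the real Org-mode defaults.
ASYMMETRIC = EmphasisBorders(pre=" \t(", post=" \t.,:;!?)")


def borders_ok(text: str, open_idx: int, close_idx: int,
               cfg: EmphasisBorders = ASYMMETRIC) -> bool:
    """Check the characters just outside an emphasis span.

    open_idx and close_idx are the positions of the opening and
    closing markers; line start and line end are always accepted.
    """
    before_ok = open_idx == 0 or text[open_idx - 1] in cfg.pre
    after_ok = close_idx == len(text) - 1 or text[close_idx + 1] in cfg.post
    return before_ok and after_ok
```

Under these assumed sets, `*foo*, bar` passes because a comma may follow the closing marker, while `foo,*bar*` fails because a comma may not precede the opening one.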
Another incompatibility is that Emacs Muse allows whitespace before closing
Emacs Muse and pandoc (mostly accidentally) interpret it as
Amusewiki interprets it as
It is another case where I agree with Amusewiki. The reason for the bug is probably that Text::Amuse replaces each |
@link2xt Could you check this commit with the draft of the formalization, and see if we can agree on this? |
It is mostly ok, not whitespace on the inner side and not "word" on the outer side. But I would replace "word" with "alphanumeric". "Word", as defined by "\w" regexp, includes underscore. Example:
Emacs Muse:
Amusewiki:
Pandoc:
Why allow underscore after, especially if it is not allowed before? |
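The difference between "word" and "alphanumeric" can be checked quickly in Python. This is only a sketch of the rule as stated in this comment (no whitespace on the inner side, no alphanumeric character on the outer side), handling a single `*` marker; the pattern names are hypothetical and this is not the Text::Amuse implementation.

```python
import re

# \w matches the underscore, so a "not a word character" test also rejects "_".
assert re.fullmatch(r"\w", "_")

# Outer side defined with \w: "_" next to a marker blocks emphasis.
EMPH_WORD = re.compile(r"(?<!\w)\*(?=\S)(.+?)(?<=\S)\*(?!\w)")

# Outer side defined as alphanumeric: [^\W_] is "\w minus underscore".
EMPH_ALNUM = re.compile(r"(?<![^\W_])\*(?=\S)(.+?)(?<=\S)\*(?![^\W_])")


def emphasized(pattern: re.Pattern, text: str):
    """Return the emphasized text of the first match, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None
```

For `*foo*_`, `emphasized(EMPH_WORD, ...)` finds nothing, while `emphasized(EMPH_ALNUM, ...)` returns `foo`. The same holds for `_*foo*`, so the alphanumeric variant treats an underscore symmetrically before and after.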
@link2xt Actually, in the branch we're talking about the outcome is
|
@link2xt So, the specification would be:
|
@melmothx |
@link2xt yes, of course, according to that specification, which is going to be added to the manual. |
@link2xt Changes (and some additional tests) are in. Are we done here? Can I release and update the doc? |
Looks good, thanks. |
Next version of pandoc will output lightweight markup when it can: jgm/pandoc@6ea6011 |
Excellent! |
…p elements See melmothx/text-amuse#44 for discussion on these rules
I want to make pandoc Muse writer generate lightweight inline markup instead of tags when possible. For that I need to understand when it is possible to use it.
I have tested various inline markup examples with Emacs Muse, Amusewiki, Pandoc Muse, Emacs Org and Pandoc Org. Org mode markup is very close to Muse markup, so I hope it can help with resolving some ambiguities. Here are the results:
I think the second case should not be parsed as emphasis. There is no reason for asymmetry with the first case.
The third case is obviously correct, because we may want to emphasize a word before a comma. Normally there is a space after the comma, but there is no reason to check for it in the parser and make the parser more complicated. All parsers are consistent here, by the way.
As for the fourth case, I tend to believe it is a bug in both Text::Amuse and the Pandoc Muse reader. In Org mode it obviously works as intended, because Emacs Org mode has a configuration variable, org-emphasis-regexp-components, which specifies different allowed pre and post characters. It makes sense to check for punctuation only at the end, because even in right-to-left text commas come after the word and are followed by a space. From the Emacs Muse code it is not so clear whether this is intended or just happened to work this way, but there is no evidence of a bug, so I think making it more compatible with Org mode and Emacs Muse at the same time is the right thing to do.