rst code blocks with language tag not converting correctly to gfm #5204

Closed
mMerlin opened this issue Jan 8, 2019 · 4 comments

mMerlin commented Jan 8, 2019

Fedora 27 4.18.19-100.fc27.x86_64
pandoc 2.5 «installed from tarball»
Compiled with pandoc-types 1.17.5.4, texmath 0.11.1.2, skylighting 0.7.4

Searching through existing issues, #721 and #4523 look to be related. Sorry about the length of this, but the variation details could be important.

Are there multiple flavours of rst, as there are for markdown? This is a simplified extract from an rst document from a bitbucket wiki page.

.. sourcecode:: python

    import thonny
    thonny.THONNY_USER_DIR = "H:\\home\\.thonny"

Passing that through pandoc -s --to=gfm --from=rst -o test.md test.rst gives test.md as

<div class="sourcecode">

python

import thonny thonny.THONNY\_USER\_DIR = "H:\\home\\.thonny"

</div>

I would expect

```python
import thonny
thonny.THONNY_USER_DIR = "H:\\home\\.thonny"
```

rst is not a format I have worked with, but doing some research, I could not find an rst structure that pandoc would convert to the expected markdown. Here is a bit of an extended case in sourcecode.rst

.. sourcecode:: python

    import sys

.. code:: python

    import sys

.. code-block:: python

    import sys

.. codeblock:: python

    import sys

and pandoc -s --to=gfm --from=rst -o sourcecode.md sourcecode.rst gives sourcecode.md as

<div class="sourcecode">

python

import sys

</div>

``` sourceCode python
import sys
```

``` sourceCode python
import sys
```

<div class="codeblock">

python

import sys

</div>

none of which looks like the expected markdown block. In fact, a round trip does not work either. Starting from block.md

```python
import sys
import os
```

``` python
import sys
import os
```

converting with pandoc -s --to=rst --from=gfm -o block.rst block.md gives block.rst as

.. code:: python

   import sys
   import os

.. code:: python

   import sys
   import os

and pandoc -s --to=gfm --from=rst -o block1.md block.rst gives block1.md as

``` sourceCode python
import sys
import os
```

``` sourceCode python
import sys
import os
```

Am I missing something obvious? rst is new to me, but gfm I have done a lot of work with.

Collaborator

mb21 commented Jan 8, 2019

You can also use -t native to see the intermediate representation.

But basically it seems to boil down to whether the RST reader should add the sourceCode class to the intermediate AST. I would say the answer is no (for both .. code:: and .. sourcecode::), because the markdown reader doesn't do it either.
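To check what each directive turns into before any writer touches it, the native dump can be scripted. This is a minimal sketch, assuming pandoc is on the PATH; the helper names `native_command` and `show_native` are made up for illustration, and it relies only on the `--from`/`--to` options already used in this thread plus the `native` output format mentioned above.

```python
import shutil
import subprocess


def native_command(path, source_format="rst"):
    """Build the pandoc command line that dumps the intermediate AST."""
    return ["pandoc", "--from=" + source_format, "--to=native", path]


def show_native(path, source_format="rst"):
    """Return pandoc's native AST for `path` as text, or None if pandoc is not installed."""
    if shutil.which("pandoc") is None:
        return None
    result = subprocess.run(
        native_command(path, source_format),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
        check=True,           # raise if pandoc exits with an error
    )
    return result.stdout
```

Calling `show_native("sourcecode.rst")` on the test file from the report should show, for each block, whether the reader attached a `sourceCode` class or fell back to a generic div.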

Author

mMerlin commented Jan 8, 2019

Plus .. codeblock:: and .. code-block::?

Or is it that the markdown (gfm) writer should be ignoring the class? That would fix 2 of the 4 sample cases as well. The other 2 are coming through as embedded html, so they have more problems.

Assuming they are valid rst of course.
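The "writer ignores the class" option could look roughly like this. It is only an illustrative sketch in Python, not pandoc's actual writer (which is Haskell); `gfm_fence` and its `ignored` parameter are hypothetical names.

```python
FENCE = "`" * 3  # three backticks, built this way so the example nests cleanly


def gfm_fence(classes, code, ignored=("sourceCode",)):
    """Render a fenced block, using the first non-ignored class as the language tag."""
    languages = [c for c in classes if c not in ignored]
    info = languages[0] if languages else ""
    return FENCE + info + "\n" + code + "\n" + FENCE


# Classes as they appear to arrive from the rst reader for `.. code:: python`
print(gfm_fence(["sourceCode", "python"], "import sys"))
```

With the class filtered out, the info string is just `python`, which is the form expected in the report. It would not help the two div cases, since those never reach the writer as code blocks.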

Owner

jgm commented Jan 8, 2019

Looks like sourcecode is an undocumented synonym for the code directive:
https://github.com/docutils-mirror/docutils/blob/e88c5fb08d5cdfa8b4ac1020dd6f7177778d5990/docutils/parsers/rst/languages/en.py
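As a rough picture of what that synonym handling amounts to, here is a minimal Python sketch. The alias table is an assumption modelled on the English directive names in the linked file, and `canonical_directive` is a hypothetical helper, not a docutils or pandoc function.

```python
# Assumed alias table: each spelling maps to the canonical "code" directive.
# "codeblock" is deliberately absent; it is assumed not to be a registered name.
CODE_DIRECTIVE_ALIASES = {
    "code": "code",
    "code-block": "code",
    "sourcecode": "code",
}


def canonical_directive(name):
    """Return the canonical directive for a spelling, or None if it is unknown."""
    return CODE_DIRECTIVE_ALIASES.get(name.strip().lower())


for spelling in ("sourcecode", "code", "code-block", "codeblock"):
    print(spelling, "->", canonical_directive(spelling))
```

Under that assumption, `.. codeblock::` is simply an unknown directive, which would be consistent with it coming through as a generic div in the output above.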

Owner

jgm commented Jan 8, 2019

As for sourceCode class:

% ack sourceCode src
src/Text/Pandoc/Readers/LaTeX.hs
1826:       (codeBlockWith ("",["sourceCode","literate","haskell"],[]) <$>

src/Text/Pandoc/Readers/Markdown.hs
751:  (return . B.codeBlockWith ("",["sourceCode","literate","haskell"],[]) <$>
753:    <|> (return . B.codeBlockWith ("",["sourceCode","haskell"],[]) <$>

src/Text/Pandoc/Readers/RST.hs
424:  return $ B.codeBlockWith ("", ["sourceCode", "literate", "haskell"], [])
998:          classes' = "sourceCode" : lang
1417:    "code" -> return $ B.codeWith (addClass "sourceCode" attr) contents

src/Text/Pandoc/Writers/RST.hs
291:                     c `notElem` ["sourceCode","literate","numberLines",

Note that skylighting adds the sourceCode class in HTML. I think we should modify the readers not to add it (and we could tell the RST writer not to ignore it).

jgm closed this as completed in 230e07d on Jan 8, 2019