Fix RST raw directive to allow file includes? #8584

Open
spollard opened this issue Jan 28, 2023 · 7 comments
Comments

@spollard

The reStructuredText raw directive accepts an option named :file: whose value is a filename. When it is given, the contents of that file are used as the body of the directive (https://docutils.sourceforge.io/docs/ref/rst/directives.html#raw-data-pass-through), but only if the output format named on the raw directive matches the format the document is actually being converted to.

At the moment, I don't think the RST parser's raw directive takes the output format into account (see https://github.com/jgm/pandoc/blob/main/src/Text/Pandoc/Readers/RST.hs#L665). I agree with this comment (#2716 (comment)) that readers shouldn't know the output format or behave differently depending on it, so I think this is fine.

I think the RST parser only handles raw content written directly in the body of the raw directive, not content read from a separate file. I don't think this would be too difficult to implement, but my Haskell is rusty, and I bet somebody else knows how to:

  1. detect whether the :file: option is present on the raw directive,
  2. read the contents of the file, and
  3. include the contents as the body of the raw directive.

My guess is that the solution might use insertIncludedFile or a stripped-down version of includeDirective (https://github.com/jgm/pandoc/blob/main/src/Text/Pandoc/Readers/RST.hs#L456). It looks like the :file: field can be detected in much the same way that includeDirective reads its start-line option: let startLine = lookup "start-line" fields >>= safeRead
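
The three steps described above can be sketched in a language-neutral way. The following is a minimal Python model of the logic only, not pandoc's Haskell code: the names resolve_raw_body and read_text_file are hypothetical, the fields dictionary stands in for the directive's parsed options, and rejecting a directive that has both inline content and :file: is an assumption modelled on how docutils treats that combination.

```python
from pathlib import Path
from typing import Dict


def read_text_file(path: str) -> str:
    """Step 2: read the contents of the referenced file."""
    return Path(path).read_text(encoding="utf-8")


def resolve_raw_body(fields: Dict[str, str], body: str) -> str:
    """Return the text that the raw directive should pass through.

    `fields` models the directive's options (hypothetical shape),
    `body` is the inline content written under the directive.
    """
    # Step 1: detect whether the :file: option is present.
    filename = fields.get("file")
    if filename is None:
        # No :file: option, so the inline body is used unchanged.
        return body
    if body.strip():
        # Assumption: treat inline content plus :file: as an error.
        raise ValueError("raw directive has both inline content and :file:")
    # Step 3: the file contents become the body of the directive.
    return read_text_file(filename.strip())
```

With the files from this report, resolve_raw_body({"file": "include_me.html"}, "") would return the contents of include_me.html, while a directive without :file: keeps its inline body.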

Steps to reproduce:

  1. Run rst2html including.rst and confirm that Included appears in the output (docutils must be installed)
  2. Run pandoc including.rst and note that Included is missing from the output

Files
include_me.html

<div>Included</div>

including.rst


Including
============

.. raw:: html
   :file: include_me.html

pandoc 2.7.3 on Ubuntu 20.04
Compiled with pandoc-types 1.17.5.4, texmath 0.11.2.2, skylighting 0.8.1
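
For convenience, the reproduction can be scripted. This is a minimal sketch that writes the two files shown above into a temporary directory and runs each converter only if it is found on PATH. The helper names write_repro_files and convert are hypothetical, and the script assumes that both tools write their result to standard output when given only an input file.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def write_repro_files(directory: Path) -> Path:
    """Create include_me.html and including.rst; return the .rst path."""
    (directory / "include_me.html").write_text(
        "<div>Included</div>\n", encoding="utf-8"
    )
    rst = directory / "including.rst"
    rst.write_text(
        "Including\n"
        "============\n"
        "\n"
        ".. raw:: html\n"
        "   :file: include_me.html\n",
        encoding="utf-8",
    )
    return rst


def convert(tool: str, rst: Path) -> Optional[str]:
    """Run `tool including.rst` next to the file; None if tool is missing."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        [tool, rst.name], cwd=rst.parent,
        capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rst_path = write_repro_files(Path(tmp))
        for tool in ("rst2html", "pandoc"):
            output = convert(tool, rst_path)
            if output is None:
                print(f"{tool}: not installed, skipped")
            else:
                print(f"{tool}: contains 'Included' -> {'Included' in output}")
```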

@spollard spollard added the bug label Jan 28, 2023
@jgm (Owner) commented Jan 28, 2023

You're using a version of pandoc that was released over 3 years ago. Please try with the latest before reporting a bug.

@spollard (Author) commented Jan 28, 2023

Good call. I just got the same result using version 3.0.1.

@tarleb (Collaborator) commented May 5, 2023

@spollard would you still be interested in coding a fix for this issue? I might be able to help out, e.g. by pairing (if time permits).

@spollard (Author)

I may have time to take a crack at it soon. Thanks for the reminder! I'll start by trying to implement what I described in the paragraph starting with "My guess...", unless you have a better idea.

@spollard (Author)

My Haskell-fu isn't the greatest at the moment, and I saw there were many steps to setting up the dev environment, so I just sketched out the possible solution. I didn't even check that it compiled, because I'm almost positive it won't.
The logic should all be there except for how to parse the file as a raw block. Would somebody else who has a functioning dev env and a functioning Haskell brain clean it up and try it out?

@frasertweedale (Contributor) commented Jul 26, 2023

This feature should be guarded by a configuration option and disabled by default. Parsing untrusted input, or input supplied by users or attackers, could expose confidential data (local files via the file option, or internal URLs via the url option). The exact impact depends on how the Writer handles the raw content: it ranges up to complete exposure if the Writer passes the content through unchanged, or if the format of the targeted file or URL matches the output format.

The existing csv-table directive support (function csvTableDirective) raises a similar concern, although there the targeted file or URL content has to parse as CSV.

@jgm (Owner) commented Jul 28, 2023

@frasertweedale - see item 2 under Security in the manual. We already support include in other contexts in RST. The solution, if you are processing untrusted data or using pandoc in a server environment, is to use the --sandbox feature which will reliably disallow any IO in readers.
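
The idea that a sandbox disallows IO in readers can be illustrated with a small model. The following Python sketch is not pandoc's implementation and does not show how --sandbox works internally. It only demonstrates the principle that the reader obtains file contents through a function that refuses every read when sandboxing is on. All names here (SandboxViolation, raw_directive_body, the sandbox flag) are hypothetical.

```python
from pathlib import Path
from typing import Dict


class SandboxViolation(Exception):
    """Raised when a reader attempts IO while sandboxed (hypothetical)."""


def read_from_disk(path: str) -> str:
    """Normal mode: the reader may read local files."""
    return Path(path).read_text(encoding="utf-8")


def refuse_io(path: str) -> str:
    """Sandboxed mode: every file read is rejected."""
    raise SandboxViolation(f"reading {path!r} is not allowed in the sandbox")


def raw_directive_body(fields: Dict[str, str], body: str, sandbox: bool) -> str:
    """Resolve a raw directive's content, honouring the sandbox flag."""
    filename = fields.get("file")
    if filename is None:
        # Inline content needs no IO, so it works in both modes.
        return body
    reader = refuse_io if sandbox else read_from_disk
    return reader(filename)
```

In this model a document that only uses inline raw content converts identically in both modes, whereas a :file: option on untrusted input fails loudly under the sandbox instead of leaking a local file.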
