idea: include files (and csv tables) #553

Open
anton-k opened this issue Jun 29, 2012 · 108 comments

@anton-k

anton-k commented Jun 29, 2012

As far as I understand, pandoc can combine several files
in only one way: you have to list them on the command line. There is
also a way to simulate include files with scripting; it is described
in pandoc's official guide.

Markdown is a tiny language, and we should keep it small. So here is an idea
for simulating LaTeX's \input command without extending Markdown syntax:
overload the image inclusion construct. If a file has an image extension,
then it's treated as an image, but if it's .txt, it can be treated as Markdown:

![Show me if there is no such a file](subfile.txt)

I came to this idea while thinking about long tables.
Imagine that someone is writing a research report. There are long
tables produced by an algorithm. The tables are saved in some
standard table format, for example CSV. The user can then write

![So it goes](table.csv)
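
To make the CSV case concrete, here is a minimal Python sketch of the conversion such an include would have to perform: turning CSV text into a Markdown pipe table. The function name `csv_to_pipe_table` is made up for illustration, the first row is assumed to be a header, and none of this is part of pandoc.

```python
import csv
import io


def csv_to_pipe_table(csv_text):
    """Convert CSV text into a Markdown pipe table (first row = header)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Escape pipes so they do not split a cell, and pad short rows so
    # every row has the same number of cells.
    padded = [
        [cell.replace("|", "\\|") for cell in row] + [""] * (width - len(row))
        for row in rows
    ]
    header, body = padded[0], padded[1:]
    lines = ["| " + " | ".join(header) + " |"]
    lines.append("|" + "|".join(["---"] * width) + "|")
    lines.extend("| " + " | ".join(row) + " |" for row in body)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(csv_to_pipe_table("name,value\nalpha,1\nbeta,2\n"))
```

A wrapper script could run this over table.csv and splice the result into the document before handing it to pandoc.
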
@daroczig

Not sure if this would fit Pandoc's goal of being a "universal document converter", but you can do this easily with a wrapper around Pandoc. This would of course require some technical skill, but you would gain much more than the features suggested above.

There are a bunch of programming tools in Pandoc Extras, among them pander, which I know (and develop). You could easily write a simple brew file that compiles a list of all images in a directory and creates a link for each; reading and printing CSV files would not be a problem either (e.g. by putting a simple read.table(foo) in a brew chunk between <%=...%> tags).

I hope you find this useful.

@michelk
Sponsor

michelk commented Jul 5, 2012

anton-k, I like the idea; I had something similar in mind when writing
a technical report recently. File-extension-dependent inclusion would be a nice
pandoc extension for Markdown.

Another idea related to that:

It would be great to have more general support for literate programming.

Currently I use the R-knitr-package
for mixing programming languages in technical reports; as an example see
https://github.com/yihui/knitr/blob/master/inst/examples/knitr-lang.Rmd.

Using pandoc directly with a file format, say lmd -- for literate markdown,
would facilitate the workflow considerably.

In that sense knitr works pretty well: you can include different languages,
e.g.

``` {r test-r, engine='R'}
set.seed(123)
rnorm(5)
```

Unfortunately, Haskell is currently not supported, e.g. like this:

``` {engine='ghc'}
[x^2|x <- [1..10], x > 3 ]
```

With that in place, writing tutorials for REPLs like ghci, irb, or R would be
more pleasant.

@jgm
Owner

jgm commented Nov 4, 2012

This could be done easily using the techniques described in the scripting documentation.

@michelk
Sponsor

michelk commented Nov 7, 2012

I opened a separate issue, #656, for it.

@jcangas

jcangas commented Jun 19, 2013

Hi. I'm also looking for this feature :)
I found Marked.app has a nice extension:

  <<[Code title](folder/filename)

The same syntax is also supported by the Leanpub system. I think some "include feature" is a must if you write long texts in Markdown.
I'm now using Marked.app with a custom Markdown processor configured for pandoc, so I can include files that include files and so on. Very useful if you are writing a little book with source code samples :). But it is a bit tedious to have to print to PDF from Marked.app. Having this feature in pandoc would allow for command-line automation :)

@thewatts

@jcangas -> it looks like ThoughtBot has done this before, based on looking at the raw markdown files from their Backbone on Rails book.

@jasonm was the person who worked on the project.

@jcangas

jcangas commented Jun 26, 2013

@thewatts, thanks for the clue. It is very easy to go the "do it yourself" way, of course: I have a bit of Ruby that does the magic. But I see value in having it as a standard feature with a standard syntax and no need for external tools...

@thewatts

Found what they use - they have a rakefile that will take and parse the <<[Code title](folder/filename) code, and then add it into the main file.

@dloureiro

There is gpp in the Pandoc Extras mentioned by @daroczig; it can be used to include files directly (gpp is a gcc-like preprocessor) and much more. It provides a syntax to preprocess files and execute commands, and file inclusion can be achieved through #include <include ..> or even \include directives, depending on the mode you select.
I'm currently working on a Python wrapper that uses gpp to preprocess special commands in a Markdown file before passing it to pandoc (things like file inclusion, code inclusion, color, underline, etc.).
I will soon put it on github and if people are interested in such a wrapper I will add some more info about it.

@jcangas

jcangas commented Jun 26, 2013

@thewatts I also have a rake file doing the same thing :). Well, mine is also recursive. I'm copying it here in case it helps others

# yields every line. Assume root_dir & file are Pathname objects
def merge_mdown_includes(root_dir, file, &block)
  file.each_line do |line|
    if line =~/(.*)<<\[(.*)\]$/
      incl_file = root_dir + $2
      yield $1 if block
      merge_mdown_includes(root_dir, incl_file, &block)
    else
      yield line if block
    end
  end
end

# hint on how to use the previous routine:
merge_mdown_includes(root_dir, file) do |line|
   output_file.puts line
end

@nichtich
Contributor

Instead of adding another preprocessing syntax on top of Pandoc Markdown I use the following syntax to include files:

`filename.md`{.include}

one could also extend this to:

~~~ {.include}
filename.md
~~~

This way the inclusion syntax can act on the abstract syntax tree (AST) of a Pandoc document - one can get the same result from HTML like this (HTML -> Markdown -> Markdown with inclusions -> Target format):

<code class="include">filename</code>

Here is a small hack in the form of a Perl script that I use for now.

while(<>) {
    if (/^`([^`]+)`\{\.include\}\s*$/) {
        if (-e $1 && open my $fh, '<', $1) {
            local $/;
            print <$fh>;
            close $fh;
        } else {
            print STDERR "failed to include file $1\n";
        }
    } else {
        print $_;
    }
}

The final implementation should work on the AST as well to allow inclusion inside other elements, for instance:

* `longlistitem.md`{.include}
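
As a rough illustration of what "working on the AST" could mean, the Python sketch below walks pandoc's JSON block list and replaces any paragraph that consists solely of an inline `Code` element with class `include`. The helper `load_blocks` is a hypothetical callback supplied by the caller; in practice it might run `pandoc -f markdown -t json` on the file and return the `"blocks"` field, but that part is an assumption and is left out here. Only top-level blocks and bullet list items are handled.

```python
def is_include(block):
    """True for a Para/Plain holding exactly one inline Code with class 'include'."""
    if block.get("t") not in ("Para", "Plain"):
        return False
    inlines = block["c"]
    if len(inlines) != 1 or inlines[0].get("t") != "Code":
        return False
    # A Code inline is [[identifier, classes, key-value pairs], text].
    (_ident, classes, _keyvals), _text = inlines[0]["c"]
    return "include" in classes


def expand_includes(blocks, load_blocks):
    """Return a new block list with include markers replaced by loaded blocks.

    load_blocks(path) is a caller-supplied (hypothetical) function that
    returns the pandoc JSON blocks for the file at `path`.
    """
    result = []
    for block in blocks:
        if is_include(block):
            path = block["c"][0]["c"][1]
            result.extend(load_blocks(path))
        elif block.get("t") == "BulletList":
            # Each list item is itself a list of blocks, so recurse into it.
            items = [expand_includes(item, load_blocks) for item in block["c"]]
            result.append({"t": "BulletList", "c": items})
        else:
            result.append(block)
    return result
```

Because the replacement happens on blocks rather than on lines of text, the list-item case works without any special handling of indentation.
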

@mdengler

mdengler commented Mar 2, 2014

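@nichtich Nice idea; converted to Python and combined with a Makefile:

# Makefile fragment (the recipe line must start with a tab)

%.pdf : %.md
    cat $^ | ./include.py | pandoc -o $@

#!/usr/bin/env python
# include.py: replace `file`{.include} markers with the named file's contents

import re
import sys

include = re.compile(r"`([^`]+)`\{\.include\}")


def read_file(match):
    """Return the contents of the file named by an include marker."""
    with open(match.group(1), encoding="utf-8") as input_file:
        return input_file.read()


for line in sys.stdin:
    # Pattern.sub takes (replacement, string); a function replacement keeps
    # backslashes in the file contents from being treated as escapes.
    sys.stdout.write(include.sub(read_file, line))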
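
The line-based scripts so far expand one level only. A recursive variant needs some guard against files that include each other; the Python sketch below is one way to do it, assuming the same `` `file`{.include} `` marker on a line of its own and resolving each path relative to the including file (that resolution rule is a choice made here, not something settled in this thread).

```python
import re
from pathlib import Path

INCLUDE = re.compile(r"^`([^`]+)`\{\.include\}\s*$")


def expand(path, seen=()):
    """Return the text of `path` with include lines expanded recursively."""
    path = Path(path).resolve()
    if path in seen:
        chain = " -> ".join(str(p) for p in (*seen, path))
        raise ValueError(f"include cycle: {chain}")
    out = []
    for line in path.read_text(encoding="utf-8").splitlines(keepends=True):
        match = INCLUDE.match(line)
        if match:
            # Resolve the included file relative to the including file.
            child = path.parent / match.group(1)
            out.append(expand(child, (*seen, path)))
        else:
            out.append(line)
    return "".join(out)
```

Calling `expand("main.md")` returns the fully expanded text, which can then be piped to pandoc as in the Makefile fragment.
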

@mpickering
Collaborator

See also this discussion on the mailing list.

@mb21
Collaborator

mb21 commented Jul 13, 2015

And here's my take on a Haskell filter that includes CSVs as tables: pandoc-placetable

@adius

adius commented Nov 9, 2015

File extension dependent overloading of the image inclusion is a great idea!
Would love to see it implemented!

@steindani

I've written a basic Pandoc filter in Haskell that can include referenced Markdown files recursively, meaning that nested includes are also included (although only 2 levels deep, for now). Take a look:

https://github.com/steindani/pandoc-include

To include one or multiple files use the following syntax:

```include
chapter1.md
chapter2.md
#dontinclude.md
```

@ickc
Contributor

ickc commented Nov 10, 2016

Hi, @mpickering, may I ask what the status of this is? Is there a branch with work in progress (so I can see whether there is anything I can help with)?

I think there are a few different categories of file extensions that can be included:

  1. file extensions associated with pandoc readers: this allows including multiple different sources in the Markdown source, e.g. ![](file.docx) would use the pandoc docx reader to read the file into the AST and include it at that position.
  2. RawInline: some might not want the pandoc readers to read the file, though. So e.g. ![](file.tex){RawInline="true"} and ![](file.html){RawInline="true"} would include the raw TeX and raw HTML at that position.
  3. CodeBlock: ![](file.md){CodeBlock="true"} and ![](file.py){CodeBlock="true"} would include the files as code blocks.
  4. csv: e.g. pandoc-placetable
  5. media: audio/videos files.
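
The categories above amount to a small dispatch on attributes and file extension. The Python sketch below shows one possible ordering of those checks. The attribute names `CodeBlock` and `RawInline` are taken from the list; the function name, the returned labels, and the extension sets are all illustrative assumptions and deliberately incomplete.

```python
from pathlib import PurePosixPath

# Illustrative, incomplete extension sets; not a list of what pandoc supports.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".svg"}
READER_EXTS = {".md", ".docx", ".html", ".tex", ".rst"}
TABLE_EXTS = {".csv"}
MEDIA_EXTS = {".mp3", ".mp4", ".ogg", ".webm"}


def classify_include(target, attrs=None):
    """Decide how an image-style include of `target` would be handled."""
    attrs = attrs or {}
    ext = PurePosixPath(target).suffix.lower()
    # Explicit attributes win over the extension-based default.
    if attrs.get("CodeBlock") == "true":
        return "codeblock"
    if attrs.get("RawInline") == "true":
        return "raw"
    if ext in IMAGE_EXTS:
        return "image"
    if ext in TABLE_EXTS:
        return "table"
    if ext in MEDIA_EXTS:
        return "media"
    if ext in READER_EXTS:
        return "reader"
    return "unknown"
```

Checking the attributes first means `![](file.md){CodeBlock="true"}` is shown verbatim even though `.md` has a reader.
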

@HaoZeke

HaoZeke commented Nov 9, 2017

Is this feature still under development? This would allow a complete replacement of most static site generators..

@tarleb
Collaborator

tarleb commented Nov 9, 2017

I don't think anybody is working on this. My personal opinion is that this is out of scope, as the increase in complexity seems not worth it.

A solution for CSV exists with pandoc-placetable. If one does not want to install additional binaries, pandoc 2 makes it easy to achieve most of what was suggested here via Lua filters. E.g., the filter below replaces a figure with the parsed Markdown content of the file it points to if the image has class "markdown". This is fully portable and doesn't require extra software other than pandoc.

function Para (elem)
  if #elem.content == 1 and elem.content[1].t == "Image" then
    local img = elem.content[1]
    if img.classes[1] == "markdown" then
      local f = io.open(img.src, 'r')
      local blocks = pandoc.read(f:read('*a')).blocks
      f:close()
      return blocks
    end
  end
end

@ickc
Contributor

ickc commented Nov 10, 2017

Is this feature still under development?

Do you mean including files or tables? Apparently two different (though related) issues are mentioned here.

I think the main reason this has been taking so long is not the difficulty or feasibility of including files, but the question of whether this should be part of pandoc, and how it should behave (e.g. recursively?).

E.g. @jgm has a pandoc-include example in the tutorial on writing pandoc filters, and it has been distributed as pandoc-include: Include other Markdown files. There's also a panflute filter that does this. So does it need to be done in pandoc itself?

This would allow a complete replacement of most static site generators..

Having a better template system is more important in this respect than having a native pandoc-include. I remember there's an issue about this. Try searching for it and see if you have any comments or suggestions there.

@HaoZeke

HaoZeke commented Nov 10, 2017

pandoc-include is built against pandoc 1.19, so the newer syntax is not parsed correctly,
e.g. Div classes via ::::{.class} ::::

Currently my workaround is to use paru-insert.rb but it's really rather slow, pushing my build times up by 10s just to include 3 partials..

@cagix
Contributor

cagix commented May 15, 2021

I'd rather stay in the (pandoc) markdown "language" and would argue not to use "alien" syntax like comments from other languages and also not to introduce even further syntax like <[](){} ...

From a user's perspective I'd like to use the same syntax used to include images, i.e. ![alt](path){attrs}, as this already is used to include things. Using empty markdown divs (::: {.include path} ... :::) could work, but that's quite boilerplate code ... Maybe a span could be an alternative to overloading the image links: [path]{.include attrs}?

@jgm
Owner

jgm commented May 15, 2021

OK, I have to retract my earlier comment about using {=markdown} raw blocks. That actually makes no sense at all. That format is for passing Markdown through to the output format (if it is compatible); what we want is entirely different: including some Markdown in the input stream to be parsed. So never mind all that, I guess I needed more sleep!

@alerque I'm not convinced of the need for inline-level includes. Maybe you could explain what this would be useful for?

@encodis YAML metadata is currently allowed anywhere in a pandoc Markdown document (though commonmark+yaml_metadata_block only supports it at the beginning). Metadata in included files would be incorporated in the same way that multiple metadata blocks in a single document currently are.

As for globs, I'd prefer not to support that.

@alerque and @cagix -- I think the "alienness" of the // syntax is part of what attracts me to it. Using image syntax, or something else that reads similarly, seems wrong to me because includes are fundamentally different from other syntactic elements in Markdown. Other syntactic elements denote elements in the AST (e.g., an ![](...) will get you an Image in the AST), but an include doesn't denote any particular AST element. It is, rather, a processing instruction that tells the parser to insert some more text into the stream of characters to be parsed. For that reason, I think that a syntax that stands apart from existing Markdown constructions is preferable.

As for the more particular objections raised to //:
Yes, many languages use // for comments. But many languages use # for comments, and Markdown uses it to mark headings. I don't think people tend to get confused about that, once they get used to it.
Spaces in filenames could be handled quite easily:

// ./My Documents/My File with spaces.md

That said, I'm not wedded to //. Something like #include could also work, or maybe %%. Requiring quotes around filenames with spaces is another option.
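
A sketch of how such a line could be recognized, assuming the `//` form were adopted (it is only a proposal in this thread, not pandoc syntax). The rule used here, that everything after the marker up to the end of the line is the path, is what makes unquoted spaces unproblematic; the names `INCLUDE_LINE` and `parse_include` are made up for the example.

```python
import re

# Hypothetical directive: two slashes at the start of the line, at least one
# space or tab, then the path running to the end of the line.
INCLUDE_LINE = re.compile(r"^//[ \t]+(?P<path>\S.*?)[ \t]*$")


def parse_include(line):
    """Return the included path if `line` is an include directive, else None."""
    match = INCLUDE_LINE.match(line.rstrip("\n"))
    return match.group("path") if match else None


if __name__ == "__main__":
    print(parse_include("// ./My Documents/My File with spaces.md"))
```

Requiring the marker at the start of the line and whitespace after it keeps URLs such as `http://...` and most running text from being read as includes.
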

@gabyx

gabyx commented May 15, 2021

... 2nd can have a problem on file ordering (even if alphabetical, different file system can sort them differently.)

Any decent glob library should provide a hard spec on how the resulting list is sorted, no matter the file system or OS, shouldn't it?
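
For what it's worth, not every glob library makes that promise: Python's standard `glob` module documents that results come back in arbitrary order, so a tool that wants reproducible output has to sort explicitly. A minimal sketch (the function name `included_files` is made up):

```python
import glob


def included_files(pattern):
    """Expand a glob pattern into a deterministically ordered list of paths."""
    # glob.glob gives no ordering guarantee, so sort the matches explicitly.
    # Plain string sorting compares code points, which is stable across
    # platforms but is not a "natural" or locale-aware order.
    return sorted(glob.glob(pattern))
```

With this, `chapters/*.md` always yields `chapters/01.md` before `chapters/02.md`, though `10.md` would still sort before `2.md`.
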

@jgm The automatic shifting of heading levels turns a level 2 heading in the include file under a level 3 heading in the original file into a level 5 heading.
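
The arithmetic is simply parent level plus included level (3 + 2 = 5). A minimal Python sketch of such a shift on ATX headings, assuming it is applied to the included text before parsing; it skips fenced code blocks, ignores Setext headings, and does not decide what should happen beyond level 6. The function name is hypothetical.

```python
import re

# One to six '#' characters at the start of a line, followed by whitespace.
ATX_HEADING = re.compile(r"^(#{1,6})(?=\s)")


def shift_headings(text, parent_level):
    """Demote every ATX heading in `text` by `parent_level` levels."""
    out = []
    in_fence = False
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith(("```", "~~~")):
            # Toggle on every fence line so '#' comments in code are untouched.
            in_fence = not in_fence
        match = None if in_fence else ATX_HEADING.match(line)
        if match:
            level = len(match.group(1)) + parent_level
            line = "#" * level + line[match.end():]
        out.append(line)
    return "".join(out)
```

For example, `shift_headings("## Sub\n", 3)` gives `"##### Sub\n"`, the level 5 heading described above.
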

Maybe inline includes are way more complicated than I imagine; however, imagine an include in a table cell. That would be really cool, but it's probably out of scope here.

Because modifying the input stream is a fundamentally different thing, we are talking about parser command syntax. What about Unicode emojis:

⚙️📥 text.md

😵

@cagix
Contributor

cagix commented May 16, 2021

Using image syntax, or something else that reads similarly, seems wrong to me because includes are fundamentally different from other syntactic elements in Markdown.

From a CS / compiler construction perspective, I wholeheartedly agree. However, from a user's perspective, it would be really helpful not to introduce further syntax for this and keep the language (Pandoc Markdown) lean.

Edit: A random thought: Since including a figure creates a new AST node, wouldn't it be conceivable to treat including Markdown files the same way? I.e. an include on a Markdown file first creates just a new AST node (analogous to the image), which could then be resolved in a second pass and replaced by the AST for the parsed Markdown file? This could work recursively if a file is in turn included in the included Markdown. You would just have to remember that you need another pass ...

@jgm
Owner

jgm commented May 16, 2021

Since including a figure creates a new AST node, wouldn't it be conceivable to treat including Markdown files the same way? I.e. an include on a Markdown file first creates just a new AST node (analogous to the image), which could then be resolved in a second pass and replaced by the AST for the parsed Markdown file? This could work recursively if a file is in turn included in the included Markdown. You would just have to remember that you need another pass ...

We already have filters that can do just this. But note the limitations: if we parse the included content separately, then footnotes and link references defined elsewhere in the document can't affect it. What we really need is just a way to include a file in the input stream.
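
The limitation can be shown with a toy model. The Python sketch below is not a Markdown parser; it only collects `[label]: url` definitions and `[text][label]` uses with two simplified regular expressions, and it borrows the `//` marker proposed earlier in this thread as a placeholder. All names and sample strings are invented for the illustration.

```python
import re

REF_DEF = re.compile(r"^\[([^\]]+)\]:\s+\S+", re.MULTILINE)
REF_USE = re.compile(r"\[[^\]]+\]\[([^\]]+)\]")


def unresolved_refs(text):
    """Labels used as [text][label] that have no [label]: url definition."""
    defined = {label.lower() for label in REF_DEF.findall(text)}
    used = {label.lower() for label in REF_USE.findall(text)}
    return used - defined


MAIN = "// chapter.md\n\n[guide]: https://example.com/guide\n"
CHAPTER = "As [the guide][guide] explains, includes are handy.\n"

# Parsed on its own, the chapter cannot see the definition in the main file.
alone = unresolved_refs(CHAPTER)

# Spliced into the character stream first, the same link resolves.
spliced = unresolved_refs(MAIN.replace("// chapter.md\n", CHAPTER))

if __name__ == "__main__":
    print(alone, spliced)
```

Here `alone` contains the unresolved label `guide`, while `spliced` is empty, which is the difference between a filter-style include and an include in the input stream.
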

@jgm
Owner

jgm commented May 16, 2021

From a CS / compiler construction perspective, I wholeheartedly agree. However, from a user's perspective, it would be really helpful not to introduce further syntax for this and keep the language (Pandoc Markdown) lean

I don't know. I think that from a user's perspective, it's helpful if different syntax is used for different things, rather than overloading several very different functions on one syntactic element (image syntax), or having two very slightly different syntactic elements (<[], ![]) that do VERY different things.

@ickc
Contributor

ickc commented May 16, 2021

I agree the include syntax should be alien and block-level. (Including something very short that should be inline doesn't seem like it would be a common pattern.)

I also agree that it shouldn't be parsed; include should just include things (more like LaTeX's \input command). So it is more of a preprocessor than an AST parser (like other existing include solutions). Anything more complicated is doable with a filter.

I guess we should discuss the primary use case for this to be fruitful. I think the primary use case is people writing longer documents who want to "modularize" them to make them more tractable. If this is true, then we can assume people are already typing, for example, the headers at the correct level and don't need to be able to specify how to change that.

If you keep it very general, there's going to be a performance penalty (say, parse the doc, walk a "filter", and repeat) in addition to a possibly more complicated syntax (link-like syntax vs. just {{ file }}).

[Also, we may need to think about how to "escape" the include. Say you settle on a certain block-level syntax (i.e. on its own line) that defines an include. What happens if, in the same document, I want to have a code block, perhaps illustrating the include syntax? How can we tell pandoc not to expand it there? (A code block won't work as it is not parsed yet.) Maybe the solution is very simple: we should just escape the sequence: {{ file }}.]

@jgm
Owner

jgm commented May 17, 2021

@ickc It's a processing instruction, not a preprocessor -- we're parsing the document as we go, so we can still tell the difference between the include directive in a code block and one outside of one. No worries there.
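
A toy version of that distinction in Python: the expander below tracks whether it is inside a fenced code block and only honors include lines outside of one. It assumes the `//` marker floated in this thread, handles only backtick and tilde fences, and takes a dictionary in place of the file system; none of this reflects how pandoc's reader is actually structured.

```python
import re

INCLUDE_LINE = re.compile(r"^//[ \t]+(\S.*?)[ \t]*$")
FENCE = re.compile(r"^(`{3,}|~{3,})")


def expand(text, files):
    """Expand `// path` lines in `text`, except inside fenced code blocks.

    `files` maps paths to their contents; it stands in for the file system.
    """
    out = []
    fence = None  # the marker that opened the current code block, if any
    for line in text.splitlines(keepends=True):
        marker = FENCE.match(line)
        if fence is None:
            if marker:
                fence = marker.group(1)
            else:
                include = INCLUDE_LINE.match(line.rstrip("\n"))
                if include:
                    # Included text goes through the same process.
                    out.append(expand(files[include.group(1)], files))
                    continue
        elif (marker and line.strip() == marker.group(1)
              and marker.group(1)[0] == fence[0]
              and len(marker.group(1)) >= len(fence)):
            # A bare fence of the same kind, at least as long, closes the block.
            fence = None
        out.append(line)
    return "".join(out)
```

A line-oriented preprocessor without the `fence` state would expand all three directives in a document that shows one of them inside a code block.
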

@ickc
Contributor

ickc commented May 17, 2021

Oh, that's interesting. But wouldn't people expect it to be included in the code block as well (i.e. include syntax inside a code block would still be interpreted as an include)? I think people who are used to include being a preprocessor (as in all existing implementations, AFAIK) might actually be surprised by it.

I think this is superior, and it may need to be promoted a bit in the documentation. (Just a few sentences and maybe 2 examples, one of them telling people what they should do if they want an include in a code block, using the other syntax above.)

@jgm
Owner

jgm commented May 17, 2021

For includes in code blocks, you'll need to use the different syntax, probably:

``` {include="foobar.hs"}
```

Of course, if you'd rather have a genuine preprocessor (the sort that doesn't pay attention to the Markdown syntax), you can always use m4 or another preprocessor in a pipe before pandoc.

@gabyx

gabyx commented May 17, 2021

If we introduce processing directives here, we should have a look at AsciiDoc and how it does this. We should agree on a uniform syntax together with its escaping rules. File transclusion might not be the only thing we will want in the future:

AsciiDoc has the following preprocessor macros (as they call them) relevant to this discussion:

  • include::path.adoc[leveloffset=offset,lines=ranges,tag(s)=name(s),indent=depth,opts=optional] Link
  • ifdef::[...] and endif::[...] Link
  • ifeval::[...] Link

The AsciiDoc include syntax also acts at the input-stream level and has no awareness of the document (while having some nice and, IMO, needed features such as leveloffset, which I suppose is the same as heading-offset).

The question is: can such parser directives be standardized? That would mean any decent parser could read them, act on them, or ignore them if not known. How do we mark parser directives, with what character(s)?

#<directive> <expr> { ...options... }
\<directive> <expr> { ...options... }
%<directive> <expr> { ...options... }
{{ <directive> <expr> {... options ...} }}

or, if we want something more alien, why not

  • ⚙️📥 "text.md" {...options...}
  • parser::include "text.md" {... options ...}

IMO #<directive> beats them all, with escaping like \#<directive>
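
To see whether the `#<directive>` shape with backslash escaping is easy to recognize, here is a small Python sketch. The exact grammar (a quoted argument, an optional brace group of space-separated `key=value` options) is an assumption made for this example; nothing of the sort has been specified for pandoc.

```python
import re

DIRECTIVE = re.compile(
    r'^#(?P<name>[A-Za-z]\w*)[ \t]+"(?P<arg>[^"]*)"'
    r'(?:[ \t]*\{(?P<opts>[^}]*)\})?[ \t]*$'
)


def parse_directive(line):
    """Parse a hypothetical `#name "arg" {key=value ...}` line.

    Returns (name, arg, options) or None. A leading backslash escapes the
    directive, so `\\#include ...` is ordinary text and yields None.
    """
    match = DIRECTIVE.match(line.rstrip("\n"))
    if not match:
        return None
    options = {}
    for pair in (match.group("opts") or "").split():
        key, _, value = pair.partition("=")
        options[key] = value
    return match.group("name"), match.group("arg"), options
```

Because the pattern demands a letter directly after `#`, an ordinary heading such as `# Heading` does not match, and `#include "text.md" {format=commonmark}` parses into its three parts.
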

@nichtich
Contributor

See also the Subtext syntax (a rough subset of Markdown), which uses & for transclusion links:

& example.csv
& https://example.com

This would include the file example.csv or the document at https://example.com.

@zaxtax

zaxtax commented May 22, 2021

I like the {{ foo.hs }} syntax as it resembles Hugo shortcodes and I'd love to have that syntax for my own pandoc filters.

@gabyx

gabyx commented May 23, 2021

Current summary: #553 (comment)

@phispi

phispi commented Oct 21, 2021

Another existing solution for including external files (without adding syntax) is codebraid (actually intended for including auto-generated content in Markdown):

```{.python .cb.run}
with open('chapter01.md') as fp:
    print(fp.read())
```

To convert it to any output format, use something like the following:

codebraid pandoc main.md --to markdown

CSV or Excel tables from external sources can be included as well:

```{.python .cb.run}
import pandas as pd
table = pd.read_csv('table.csv')
print(table.to_markdown())
```

@ickc
Contributor

ickc commented Oct 21, 2021

This is nothing new: it is basically a filter that executes code. (There are filters that already do this.) Note the huge security risk you're exposing yourself to, though; just know what you're signing up for, and only run it on documents you trust.

This is a digression from this thread, however, which asks for a native include syntax.

@phispi

phispi commented Oct 22, 2021

Thanks for emphasizing the security concern. I mentioned codebraid in this thread because (a) I hoped to support the discussion of whether such an "include" feature is needed at all by contributing to "what alternative solutions already exist" (in addition to the filters that were already mentioned), and (b) the URL of this discussion page is widespread on the internet (Stack Overflow, ...) in connection with the topic "include external files in pandoc markdown", and people come here searching for ways to do so. Still, I'm sorry if I cluttered the discussion with my posts - feel free to delete them. ;-)

@jgm
Owner

jgm commented Oct 22, 2021

I think it's fine to mention workarounds like this here, where people who search for this issue will find them.

@ickc
Contributor

ickc commented Oct 23, 2021

@phispi, nothing wrong with that; maybe I should put it another way:

Including via a filter is solved; there are many solutions doing just that already. And the example you gave, while it does the job,

  1. is more general than needed (so general that it raises security concerns for documents),
  2. involves non-Markdown-ish syntax (as basically you are writing code rather than declaring syntax).

It can be seen as the following 3 different levels:

  1. (very bad) a filter solution that allows arbitrary code execution (don't get me wrong, executable documents, a.k.a. literate programming, have their place; it is just overkill for this purpose)
  2. (OK) a typical filter that declares syntaxes/conventions and does the IO behind the scenes; a smaller security concern, depending on how well it is written, as it can probably "steal" arbitrary files that the user has permission to read. (This alone wouldn't be a problem unless you publish a rendered document mindlessly...?)
  3. (best) a native pandoc syntax and implementation (<- which is what this thread is about).
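
On the "steal arbitrary files" concern at level 2: an include filter can narrow it by refusing paths that resolve outside the document's directory. The Python sketch below shows one way to do that check; the function name and the policy itself are assumptions for illustration, not a description of any existing filter.

```python
from pathlib import Path


def resolve_include(root, requested):
    """Resolve `requested` against `root`, refusing paths that escape it."""
    root = Path(root).resolve()
    # resolve() collapses ".." and follows symlinks, so the comparison below
    # is between real locations. An absolute `requested` replaces `root`
    # in the join and is therefore rejected as well.
    target = (root / requested).resolve()
    if root not in target.parents:
        raise PermissionError(f"include outside document root: {requested}")
    return target
```

With this in place, `../../secret.txt` or `/etc/passwd` raises an error instead of ending up in the rendered document.
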

Edit: see these comments for the limitation of a filter approach and how a native approach can do better:

#553 (comment)
#553 (comment)
#553 (comment)

@gabyx

gabyx commented Oct 23, 2021

@ickc Thanks for the good summary. I am wondering if we could push point 3 towards an agreeable solution:

Summary: #553 (comment)

Conclusions so far:
#553 (comment)

  • an include syntax for structuring documents into different files,
  • which acts at the block level,
  • and which is a processing instruction, not a preprocessor one,
  • and which leaves room for additional processing instructions being added without inventing yet another alien syntax (#if),
  • e.g. #include "text.md" {format=commonmark}

Hope I did not miss any good objections mentioned so far…

@gabyx

gabyx commented Jan 19, 2022

Any progress on this?

@ickc
Contributor

ickc commented Aug 27, 2022

Just to mention one more syntax I see in the wild:

Obsidian uses ![[filename without extension]] syntax to transclude, a natural extension of the wiki link syntax [[filename without extension]] to link to a file. For the reason behind this, they documented it here:

@WovenTales

WovenTales commented Oct 15, 2022

From the point of view of someone who writes Markdown but isn't familiar with the Pandoc internals, the image-like syntaxes are by far my favorite (in Markdown itself I don't see any conceptual difference between instructing my renderer to include an image and it including the contents of a text file, but I recognize that most output formats are going to care about the difference). Whatever way it goes, the two considerations I find most important are:

  1. It's distinct enough to not conflict with normal writing (// path/file.md seems very easily prone to someone unknowingly having two slashes before an unrelated path-like string -- replicating a C-style comment, maybe -- and suddenly getting a whole lot of text they don't want in the output; the Subtext & is even worse).
  2. It has some graceful fallback for Markdown renderers/editors which don't speak the pandoc flavour (:[]()/<[]() is perfect since it just becomes a standard link, which might not be what the author was wanting but puts it only a single click away).

I also do see some benefit in making it an inlineable syntax, at least as far as I understand the AST from reading this thread. Specifically, I'm looking at having some file I wrote normally but want to include it into a blockquote: > <[note text](path/to/note.md). It might not be particularly common and I agree that including something into the middle of a sentence feels a bit weird, but the table cell/quotation cases seem logical enough to support. If that doesn't require "inline" as y'all are meaning it, though, that's perfectly fine as well.

@ickc
Contributor

ickc commented Oct 15, 2022

Your fallback point is a good one; some Markdown extensions have this property too, such as definition lists.

The last feature you mentioned seems hard to define. Logically it is like nesting an arbitrary data structure inside another, but the AST is not like that. E.g. should we understand headings as nesting, and (perhaps optionally) shift the heading levels to sit inside the outer heading? Or, e.g., in the case of <[](), should it be considered a code block of <[](), or should the content of the file become a code block? (The former makes more sense, but this feature can invite people to expect the latter behavior.)

@WovenTales

I figured it would get more complicated in the AST than it is conceptually. If it's not easily possible, that's perfectly fine and I'd not miss it too badly. I brought that up mostly just to lay my full wishlist on the table; it's really only the first two points that I'd personally consider critical. Still, I'm not involved in implementing anything about this, so I don't have as much of a say as y'all who are.
