Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting pandoc 2.11 #142

Closed
ickc opened this issue Jul 2, 2020 · 67 comments
Closed

Supporting pandoc 2.11 #142

ickc opened this issue Jul 2, 2020 · 67 comments

Comments

@ickc
Copy link
Collaborator

ickc commented Jul 2, 2020

There's a change in pandoc AST since 2.10. c.f. jgm/pandoc#1024, jgm/pandoc#6277 and https://github.com/jgm/pandoc/releases/tag/2.10

The new Underline is probably easy. The change of AST regarding table is huge (jgm/pandoc#1024) including support for span.

Edit: Links on new table AST

@sergiocorreia
Copy link
Owner

Thanks! This is indeed an important change.

First of all, I feel like I should just add support for id/attrib/classes to all elements, because we seem to be converging towards that.

For the tables, is there an actual summary of the changes of the new AST? That issue thread looks huge

@ickc
Copy link
Collaborator Author

ickc commented Jul 2, 2020

@ickc
Copy link
Collaborator Author

ickc commented Jul 2, 2020

I feel like I should just add support for id/attrib/classes to all elements, because we seem to be converging towards that.

IIRC @jgm is reluctant to that idea. I don't think it will really happen, although many might ask for it. The solution seems to be just wrap whichever element one want to have attributes by a div and do it there.

@jgm
Copy link

jgm commented Jul 3, 2020

If I were designing afresh, I'd have attributes on everything. But it would be quite a big change now.

@sergiocorreia
Copy link
Owner

sergiocorreia commented Jul 6, 2020

@ickc by any chance do you have on hand any input file I can use to test the recent changes? I was initially thinking about having a markdown file with a complex table, but I understand that although the AST changed, the readers/writers haven't (yet).

So, are you aware of a way to receive and feed data into pandoc? One option is to try using a filter and seeing if pandoc accepts it, but seems a bit fragile.

@ickc
Copy link
Collaborator Author

ickc commented Jul 8, 2020

I think may be the best way to obtain a test file would be starting from https://github.com/despresc/pandoc-types/blob/09cb4314010365abc4512c2363b83711c92ac18b/test/test-pandoc-types.hs. Here's the diff leading to this: https://github.com/jgm/pandoc-types/pull/66/files?file-filters%5B%5D=.hs#diff-01f4ffe52cf097ab9ff89eebce394c86. There's a JSON AST there that might be useful.

Other than the JSON, pandoc native AST could be a start. For short tables it is still manageable by hand.

@ickc
Copy link
Collaborator Author

ickc commented Jul 8, 2020

Add some comments related to chdemko/pandoc-numbering#18 (comment)

According to https://github.com/jgm/pandoc-types/pull/66/files#diff-8c56b05df299ade82ea6265ee0b71bd0, these are added:

, Caption(..)
                               , ShortCaption
                               , RowHeadColumns(..)
                               , Alignment(..)
                               , ColWidth(..)
                               , ColSpec
                               , Row(..)
                               , TableHead(..)
                               , TableBody(..)
                               , TableFoot(..)
                               , Cell(..)
                               , RowSpan(..)
                               , ColSpan(..)

and TableCell is removed.

Then later,

		 -- | A short caption, for use in, for instance, lists of figures.
 type ShortCaption = [Inline]

  -- | The caption of a table, with an optional short caption.
 data Caption = Caption (Maybe ShortCaption) [Block]

So the ShortCaption is an Inline and Caption is block element.

@dilawar
Copy link

dilawar commented Jul 19, 2020

Is there a beta version for this feature? I have pandoc-2.10 installed and if possible if would rather not downgrade.

@NMarkgraf
Copy link

As I can not see any commits toward a solution to this problem, is there any way we could help?

@aubertc
Copy link
Contributor

aubertc commented Jul 27, 2020

The reasons why this is more challenging than thought have been sketched by @sergiocorreia , one of panflute's author, at chdemko/pandoc-numbering#18 (comment) . If you find a good solution to the absence of test, I believe this could be extremely useful. You can also, probably, "hack" something, but there will be no guarantee that it will work on the long run (that is, when writers / readers will change too).

@NMarkgraf
Copy link

Okay, but so panflute is currently unusable ... I find that is also not a good idea...

@ickc
Copy link
Collaborator Author

ickc commented Jul 28, 2020

To everyone who complain, Bare in mind you need to maintain your whole dependency stack. If you depends on panflute and panflute depends on pandoc<=2.9 for now, don't upgrade your pandoc and then complain. All of these open source developments are out of everyone's free time and is uncompensated.

This is really nothing out of ordinary whenever software libraries are used. Remember this whenever you want to spam any repos.

@fralau
Copy link

fralau commented Sep 10, 2020

@sergiocorreia Clearly this is not your fault. The pandoc team thought they were doing something right by adding merged cells, but they did not anticipate the repercussions (IMHO this change was so significant it warranted a change of version of 3.0). Oh, well.

@ickc I did what you recommended (downgrade pandoc to 2.9). It was a bit of fuss with brew on MacOS, but that's doable (in case somebody wants indications, I kept a few notes). Plus I warned my colleagues about the issue.

I'd say we take this with philosophy. If there is something that the community can do to help, let us know.

@ickc
Copy link
Collaborator Author

ickc commented Sep 15, 2020

The pandoc team thought they were doing something right by adding merged cells, but they did not anticipate the repercussions (IMHO this change was so significant it warranted a change of version of 3.0)

In fact it is a much anticipated feature for years. Pandoc's versioning model is major.major.minor.critical. So 2.9 to 2.10 is a major change that does not guarantee backward compatibility.

Many people surprised by something like this for the first time. But whichever software stack one depends on has to be managed can should not be blindly upgraded especially for major change that may break backward compatibility. Homebrew is one of the package manager that is not good at managing multiple versions (i.e. older versions.) For better control, you can use something like conda, conda create -n panflute pandoc<2.10 panflute -c conda-forge will gives you the necessary versions. OS level package managers often has this difference with those managing software stacks. In this case homebrew/macports/pacman from Arch, etc. can help you manage OS level software stacks, but other package manager such as conda can help managing your development software stacks.

While users of pandoc filters might not think that using them are "developing", but as long as they are using pandoc filters, they are developing in the sense that you have to manage the software stacks more carefully. That's partly why the idea of a pandoc package manager (pandocpm) died. In the case of this new table in AST, even the native filter framework embedded in pandoc failed. But in general the embedded framework should be the safest bet because it is officially supported and supposed to be upgraded in sync with pandoc itself.

@ickc
Copy link
Collaborator Author

ickc commented Sep 15, 2020

By the way, the panflute package at conda-forge in principle can include dependence on pandoc: https://github.com/conda-forge/panflute-feedstock, maintained by @kiwi0fruit. With that it would be impossible to create a conflicted conda environment for panflute.

I know it is possible but don't know exactly how, @kiwi0fruit, any comments?

@sergiocorreia
Copy link
Owner

Hi Kolen, (and Laurent, Damien, etc.)

I agree with you that adding support for 2.10 (and now 2.10.1) is crucial. Due to family health issues, I am currently on leave from work and not in town, so even if this is important I have almost zero time to spend on updating.

One thing though: last time I looked my main problem was that I could not create or find an example of how the JSON is supposed to look, as no readers were currently supporting multirows and other features. Has this changed? If any of you can create a sample input that I can use that would be incredibly useful.

@ickc
Copy link
Collaborator Author

ickc commented Sep 15, 2020

@sergiocorreia, I'm sorry to hear that and take your time to heal with your family! I totally understand and lately I don't have much time on these open source projects either. It's certainly not your responsibility to come up with a fix.

I know that there's PRs in pandoc on some readers/writers on the new table feature such as HTML, LaTeX, etc. And the test cases can provide some JSON to work with.

Anyone are welcome to step up to provide this or even make a PR too.

@fralau
Copy link

fralau commented Sep 16, 2020

@sergiocorreia I would be willing to get samples, since that would also allow me to better understand what this evolution is about.

My only problem, right now, would be that I don't have access to a 2.10 platform on my Mac 🤷‍♂️ and I am not sure I wish to risk changing my setup (since brew is a pain for that, as far as pandoc is concerned). Anybody, any idea?

@tarleb
Copy link

tarleb commented Nov 6, 2020

@tarleb, is all the env var def documented somewhere?

You are right, this is only ever mentioned in the changelog, but not in any other documentation. I'll write a PR to add this to the JSON filter docs. (Edit: PR)

There is one other environment variable: PANDOC_READER_OPTIONS. This should contain the same information as the Lua global of the same name.

There is an open pandoc issue about exposing more info: jgm/pandoc#5221

@ickc
Copy link
Collaborator Author

ickc commented Nov 6, 2020

Thanks for the info, I just double checked and only PANDOC_READER_OPTIONS, PANDOC_VERSION are available to JSON filters, so that's a bit different comparing to that documented in the lua filter section. PANDOC_READER_OPTIONS is going to be useful for us.

(But the info from the other 4 env var available to lua filter is either already known in JSON filters, or irrelevant for our case. So nothing's lost here.)

Looking forward to PANDOC_WRITER_OPTIONS.

@ickc ickc changed the title Supporting pandoc 2.10 Supporting pandoc 2.11 Nov 7, 2020
@ickc
Copy link
Collaborator Author

ickc commented Nov 7, 2020

pandoc 2.10 uses pandoc-types 1.21

pandoc 2.11 uses pandoc-types 1.22

IIRC something from pandoc-discuss mentioned a slight API changes is made to fix some consistency issue in pandoc 2.11.

So in short probably we'd jump to supporting 2.11 instead.

@fralau
Copy link

fralau commented Nov 7, 2020

@sergiocorreia @ickc @tarleb Thanks a lot for this work for the community, this issue is getting really interesting ! Loads of things happened in this thread. After 50+ messages, perhaps someone could make a little summary of where we are all going with this issue, where we have come from, and where we are ?

@sergiocorreia
Copy link
Owner

@fralau

  • Goal is to add support 2.11 soon (within the next couple of days)
  • Going forward, we want to stay as close as possible to pandoc-latest , but avoid spending resources in keeping compat with older versions of pandoc. For that, people can just install an earlier version of panflute (so we will add a table indicating what version of panflute should they install if they have pandoc x.y)

Beyond that,

  1. We want to expose a bit more metadata (the env vars that pandoc exposes)
  2. We will probably need helper functions to make creating and modifying tables a bit easier, as 2.11 added quite a bit of complexity (still worth it though!). Right now, if you want to do something as simple as adding an extra column to a table, you would have to do a lot of changes to a lot of objects (you would need to touch every TableRow in TableHead, TableFoot, and all TableBody objects, plus adjust the ColSpec list)

@fralau
Copy link

fralau commented Nov 7, 2020

@sergiocorreia Thanks a lot: this sounds very good !

Indeed, we might have an interesting problem in our project to determine (on the fly?) which version of pandoc is being used, so as to determine which version of panflute to use. Or else, ban versions of pandoc < 2.11 🤔

@ickc
Copy link
Collaborator Author

ickc commented Nov 8, 2020

@fralau, regarding the lookup table between panflute and pandoc versions, for now it is just https://github.com/sergiocorreia/panflute/tree/stable#supported-pandoc-versions in the stable branch. It is an information for the end users and has to be acted upon manually.

Long term I'd like to have proper version control using conda. c.f. https://github.com/conda-forge/panflute-feedstock currently maintained by @kiwi0fruit. The pandoc dependence should be added there. And if done correctly people can do conda install panflute to have the matching version of compatible pandoc, panflute. (And say conda install pantable would have matching version of compatible pandoc, panflute, pantable and so on.)

For pip users, another option would be for panflute to bundle pandoc inside its wheel. I know that pypandoc used to do it in the past but has too much trouble that we wrote a download_pandoc function to just inform the endusers to run that function. However, it seems that they are building it from the source. Bundling the pre-compiled binaries in the wheels probably is much easier.

We probably should open another issue on this matter.

@ickc
Copy link
Collaborator Author

ickc commented Nov 8, 2020

Here are the files which we currently use internally to test table output. The planets table tests colspans and rowspans, nordics checks element attributes and table footer, and students contains multiple table bodies, each with its own subheader.

nordics.txt
planets.txt
students.txt

GitHub requires a .txt ending, but all files contain pandoc JSON.

@tarleb, do you have a version of these with pandoc api 1.22 instead? Thanks.

@gabyx
Copy link

gabyx commented Nov 9, 2020

So good when this is implemented! :-) 👍 @ickc Do you know how much time this needs?. Thanks a lot for your effort.

@ickc
Copy link
Collaborator Author

ickc commented Nov 9, 2020

@tarleb, do you have a version of these with pandoc api 1.22 instead? Thanks.

Nevermind, I converted them from pandoc 2.10.1 to 2.11.1.1:

nordics.txt
planets.txt
students.txt

Seems like only the JSON format has changed.

@sergiocorreia
Copy link
Owner

@tarleb, do you have a version of these with pandoc api 1.22 instead? Thanks.

Nevermind, I converted them from pandoc 2.10.1 to 2.11.1.1:

Is there a markdown source? In that way, we can generate the json each time we test; else the build might/will break next time pandoc_api changes even slightly.

@tarleb
Copy link

tarleb commented Nov 9, 2020

There are only .native sources: https://github.com/jgm/pandoc/tree/master/test/tables. Those can be converted to json with pandoc planets.native -o planets.json.

@sergiocorreia
Copy link
Owner

Thanks! Just added those files to the test cases!

sergiocorreia added a commit that referenced this issue Nov 9, 2020
- See discussion in #142 and comments from @ickc  and @tarleb
- Fixed bugs revealed by the new tests
@sergiocorreia
Copy link
Owner

By the way, the develop branch now supports the latest version of pandoc, will be adding that to master and releasing to pypi soon.

@ickc
Copy link
Collaborator Author

ickc commented Nov 10, 2020

the master branch now passes tests including the native files pointed by @tarleb. Feel free to test the master branch, e.g. by pip install git+https://github.com/sergiocorreia/panflute.git@master.

@sergiocorreia
Copy link
Owner

Closing it; code is now on pypi

Also, definitely let us know if you find any bugs or have any ideas for easier table constructors. For instance, most filter creators probably don't want to specify table foot, header of table body, colspec, etc., so a possibility is a SimpleTable() that then populates the full-on Table().

@gabyx
Copy link

gabyx commented Nov 10, 2020

@sergiocorreia , @ickc, @tarleb
Huge thanks for updating! Really appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests