Implement a Mallard Reader. #2700

@MathieuDuponchelle
Contributor

See http://projectmallard.org for more information. Mallard is
basically a simplified DocBook with the added notion of links
(http://projectmallard.org/1.0/mal_links).

The documentation maintainers of developer.gnome.org
might consider migrating to CommonMark, which is why I wrote this
reader. It is basically a copy-pasted and trimmed-down version
of the DocBook one.

See https://github.com/GNOME/gnome-devel-docs for a decent corpus
of mallard pages.

The coverage is not total for this reader, but it's already useful
as is and I'll certainly get back to it pretty soon.

I'm really not sure about the handling of the <links> node; I'd be interested in other ideas.

Thanks again for that awesome piece of software :)

@MathieuDuponchelle MathieuDuponchelle Implement a Mallard Reader.
See http://projectmallard.org for more information, mallard is
basically a dumbed down docbook with the added notion of
http://projectmallard.org/1.0/mal_links , which usefulness
is very debatable.

The documentation maintainers of developer.gnome.org
might consider migrating to CommonMark, that's why I wrote this
reader, which is basically a copy pasted and trimmed down version
of the docbook one.

See https://github.com/GNOME/gnome-devel-docs for a decent corpus
of mallard pages.

The coverage is not total for this reader, but it's already useful
as is and I'll certainly get back to it pretty soon.
8a9c232
@jgm
Owner
jgm commented Feb 6, 2016

I'd rather avoid code duplication. If mallard really is
a subset of docbook with a small addition (mal_links), then
I wonder whether it would make more sense to implement
it as a variant of the existing docbook reader?

That's how we handle the difference between html and html5,
or plain and markdown, for example.

@MathieuDuponchelle
Contributor

Well, the thing is that it isn't really a subset. For example, where DocBook has para / informalpara / formalpara, Mallard has p. If you look at the list of block elements here -> https://github.com/jgm/pandoc/pull/2700/files#diff-3765f376fcbd39161286f22a5375facfR129 you'll see that it's similar but not identical. There are also some tiny differences in how certain things are parsed, and these differences pile up and make factoring the code less obvious than it could be.

Also, please note that it's my second time writing Haskell, and everything still seems a bit mysterious to me (<$> oO).

I think what we should do is have you take a closer look at the differences between the two readers and decide what's worth sharing. I'll be happy to do the work, but I'm fairly sure the solutions I'd come up with on my own would not be the cleanest ones. I don't mind waiting, as there's obviously no risk of conflicts here. Up to you :)

@MathieuDuponchelle
Contributor

Also, do you hang out on any IRC channels? I've got a lot of silly questions to ask you about cmark; I'm digging into the code right now to find the "parsing extensions should go there" sign :)

@MathieuDuponchelle MathieuDuponchelle referenced this pull request in projectmallard/projectmallard.org Feb 6, 2016
Open

Mallard/Ducktype to xyz Convertor #29

@jgm
Owner
jgm commented Feb 7, 2016

+++ Mathieu Duponchelle [Feb 06 16 14:11 ]:

Also do you hang out on some irc channels ? I've got a lot of silly
questions to ask you about cmark, I'm digging into the code right now
to find the "parsing extensions should go there" sign :)

I don't, no. Email is the best way for me.

@jgm
Owner
jgm commented Feb 7, 2016
@MathieuDuponchelle
Contributor

Cool, thanks. However, I've given this a bit more thought, and an alternative approach could be to just merge this and review / improve it when there's time, since:

  • The code is completely self-contained
  • The reader is already useful in its current state

Note that I don't really need this upstream, as I only need it at "porting-time", but I wouldn't like it to just get forgotten.

I can't promise I'll stick around to do the factorization work, but it's quite likely that I will :)

Your call anyway !

@MathieuDuponchelle
Contributor

The code is completely self-contained

Correction: it's not, but the "code-path dependencies" it introduces are purely in the Reader -> Pandoc direction. I'm not sure how best to express that, but the net result is that removing or updating it will not require any changes elsewhere. It would even nearly be possible to revert the patch without conflicts at any point in Pandoc's future history (nearly, because the cabal file and the import of Mallard might conflict, but that's really a non-issue).

@jgm
Owner
jgm commented May 10, 2016

I've merged this into my mallard branch.
But it's not ready for master. Lots of elements aren't supported (e.g. tables), and there are no real tests (I added a stub).

@MathieuDuponchelle
Contributor

Heh, that's nice :) I'd pretty much forgotten about this request. Regarding tables, indeed: my target was CommonMark, which has no syntax for them at the moment. Have you thought about code sharing between this reader and the DocBook reader? Do you think it is practical?

@jgm
Owner
jgm commented May 10, 2016

The CommonMark writer will output raw HTML for tables (currently).

Code sharing: yes, there's too much duplication for my tastes. I think the way to do this well would be to make the mallard reader a "variant" of the docbook reader. That is, the DocBook module could provide a function readMallard that sets a field in DBState indicating that the mallard variant is to be used. Then for elements like p that exist in mallard but not DocBook, we could simply check that this field is set, and for minor differences in handling of other fields, we could also check this. I think it would be okay just to leave DocBook elements that have no Mallard counterparts as they are -- after all, it isn't our intent to validate the documents, just to read them, and a Mallard document shouldn't contain these elements in the first place.
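
A minimal sketch of the "variant" idea described above, not code from pandoc itself. It assumes a toy `Element` input type and a toy `Block` output type in place of the real XML and Pandoc types, and a hypothetical `ReaderState` record standing in for `DBState`. The names `Variant`, `stVariant`, `parseBlock`, and `readWith` are invented for illustration. Only `readMallard` is a name taken from the discussion, and its signature here is an assumption.

```haskell
import Data.Maybe (mapMaybe)

-- | Which dialect the shared reader is parsing (hypothetical type).
data Variant = DocBook | Mallard
  deriving (Eq, Show)

-- | Stand-in for the reader state; the real DBState carries other fields.
newtype ReaderState = ReaderState { stVariant :: Variant }

-- | Toy input: an element name together with its text content.
data Element = Element { elName :: String, elText :: String }

-- | Toy output type standing in for Pandoc's Block.
data Block = Para String | CodeBlock String
  deriving (Eq, Show)

-- | One shared block parser. DocBook names are always accepted;
-- Mallard-only names are accepted only when the variant flag is set.
parseBlock :: ReaderState -> Element -> Maybe Block
parseBlock st (Element name txt) =
  case name of
    "para"             -> Just (Para txt)
    "programlisting"   -> Just (CodeBlock txt)
    "p"    | isMallard -> Just (Para txt)
    "code" | isMallard -> Just (CodeBlock txt)
    _                  -> Nothing  -- unsupported elements are skipped
  where
    isMallard = stVariant st == Mallard

-- | Run the shared parser with the given variant.
readWith :: Variant -> [Element] -> [Block]
readWith v = mapMaybe (parseBlock (ReaderState v))

-- | The two entry points differ only in the flag they set.
readDocBook, readMallard :: [Element] -> [Block]
readDocBook = readWith DocBook
readMallard = readWith Mallard
```

In this sketch, `readMallard` accepts both `p` and `para`, which mirrors the point that DocBook-only elements can simply be left in place, while `readDocBook` skips `p` because the flag is not set.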
