Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TyXML #82

Open
kit-ty-kate opened this issue Nov 28, 2013 · 44 comments
Open

TyXML #82

kit-ty-kate opened this issue Nov 28, 2013 · 44 comments

Comments

@kit-ty-kate
Copy link

Is it possible to have an optional module depending on TyXML that have a function to transforms an Omd.t into a TyXML type (Html5_sigs.T.elt) ? It would facilitate the integration with projects that use ocsigen.

@pw374
Copy link
Contributor

pw374 commented Nov 28, 2013

I'll be glad if there exists a TyXML backend for OMD. If someone wants to implement it and asks for my help, I will be happy to help! :-)

@darioteixeira
Copy link

@jpdeplaix: if you don't mind a convoluted route, you could take a look at Lambdoc, which does support output to a Tyxml type and now also includes Markdown (via OMD) as one of the markups it supports.

@kit-ty-kate
Copy link
Author

@darioteixeira can you do at least an opam package for camlhighlight ?

@darioteixeira
Copy link

@jpdeplaix: I already have a Camlhighlight OPAM package in my local repo (and a Lambdoc one, for that matter). The only reason I haven't pushed it upstream yet is because Camlhighlight currently requires a manual step before it's ready for use (you must copy a file into /usr/share/source-highlight). I'm hoping to find a solution with the Source-highlight upstream that will make this step unnecessary.

There's another issue with Camlhighlight that I wish to see resolved before the next release: it actually outputs a Eliom_content.Html5.F.elt, because at the time mixing Html5.F.elt and Eliom_content.Html5.F.elt was a PITA. Perhaps newer versions of Tyxml/Eliom make this easier: I'll give it a try...

@darioteixeira
Copy link

@jpdeplaix: Mind you, there is also still a problem with the Markdown support in Lambdoc: because OMD does not provide location information, error messages cannot pinpoint the line number where an error occurred (see this issue).

@agarwal
Copy link

agarwal commented Mar 26, 2016

@jpdeplaix @darioteixeira Curious if either of you have an update on this. If there's no clear solution still, I might consider implementing something.

@darioteixeira
Copy link

@agarwal: In trunk, Lambdoc's support for Markdown (via OMD) is much better now, though there are still some issues to resolve before a 1.0 release (if you want to try this route note that Lambdoc's trunk depends on Tyxml's trunk). As for direct support for Tyxml in OMD, I don't know of any developments.

@agarwal
Copy link

agarwal commented Mar 28, 2016

@darioteixeira Thanks. I'll take a look.

shepard8 added a commit to shepard8/omd-tyxml that referenced this issue Aug 22, 2017
@shepard8
Copy link

shepard8 commented Aug 22, 2017

There is still work to be done, but I have a first "quick and dirty" solution to this issue (see above commit).

The more complex constructors in the Omd.element type are not or badly documented. It would help me if someone could explain to me what are

  • Paragraph and how to enable it in of_string;
  • Ulp, Olp (I read somewhere that paragraphs are implicit, but what does it really mean?)
  • NL
  • Ref
  • Img_ref
  • X

Besides implementing these correctly, I also need to provide the code inside a functor, so that any TyXML module can use it. I also plan to provide submodules D and F compatible with Eliom_content.Html.{D,F}.

Any comment on that would be welcome. Am I heading in the right direction?

@kit-ty-kate
Copy link
Author

Maybe @Drup can answer to some of these questions

@Drup
Copy link
Member

Drup commented Aug 22, 2017

I would be happy to answer tyxml questions, but I feel the implementation above does a decent job at trying to turn the loosely-typed omd AST into tyxml. The questions seems mostly omd-specific.

@shonfeder
Copy link
Collaborator

I've encountered a need for this as well, so thought I'd revive the old discussion.

@shepard8 are you still interested in pushing this through? If so, I'd be happy to see if I could help figure out how we should treat the listed constructors. (It not, I'd be willing to pick this issue up.)

@shepard8
Copy link

Hello!

I have no time for this at the moment, feel free to continue (or restart from scratch, I don't know your plans :-) ) the work on this issue.

@shonfeder
Copy link
Collaborator

Ok! Thanks @shepard8. I should be able to take a crack at this in the next week or two.

@nojb
Copy link
Contributor

nojb commented Jun 20, 2020

There is a simple HTML intermediate representation now in master:
https://github.com/ocaml/omd/blob/01614cc0227303224e3f39cb5e85d40a7774180d/src/html.mli#L4-L13

It should be trivial to transform to TyXML.

@nojb nojb closed this as completed Jun 20, 2020
@shonfeder
Copy link
Collaborator

Would it be useful to use this intermediate representation to write the optional module that would actually satisfy this issue? Or is the particular feature request here essentially being rejected, on the grounds that it is easy enough for users to write their own transformers?

@nojb
Copy link
Contributor

nojb commented Jun 20, 2020

A priori I think this is something that users can write on their side if they need to. But if there is a lot of demand for it we can revisit the issue.

@Drup
Copy link
Member

Drup commented Jun 21, 2020

Translation from markdown to HTML is not as trivial as people think, so it seems useful to provide it somewhere. Personally, I would suggest providing a separate package omd-tyxml, with an appropriate css that does just that, once and for all.

@shonfeder
Copy link
Collaborator

I tend to agree! I'd be happy to take on this work item, unless someone else is particularly keen on it.

@nojb Would you be up for having the package be part of this repo, or would you rather I implement it in a separate repository?

@nojb
Copy link
Contributor

nojb commented Jun 22, 2020

@nojb Would you be up for having the package be part of this repo, or would you rather I implement it in a separate repository?

I think it makes sense to have it in this repository, have a go if you feel like, just be aware that the code might still budge a bit before the release. Thanks!

@kit-ty-kate
Copy link
Author

Can this be reopened until such a subpackage is available?

@nojb nojb reopened this Jun 23, 2020
@shonfeder
Copy link
Collaborator

@nojb Sounds good. I should have time this weekend. :)

@shonfeder
Copy link
Collaborator

shonfeder commented Jun 28, 2020

I've started work on this, and opened a very rough WIP PR, just to explore some directions and get the lay of the land, in #211. As I've poked around, I've discovered that don't know what we gain in going through the intermediate HTML representation:

  • From the perspective of implementation, using the intermediate representation means we have to use stringly typed nodes rather than nodes tagged with enums, which means we lose exhaustiveness checking on almost all of the structure.
  • From a usage perspective, having to go through the intermediate representation means an extra round of tree transformations, which will surely impact performance negatively.

I'd like to get your take on this, @nojb, and see if I may be missing something that you had in mind which gives the intermediate representation an edge here.

@nojb
Copy link
Contributor

nojb commented Jun 29, 2020

I've started work on this, and opened a very rough WIP PR, just to explore some directions and get the lay of the land, in #211. As I've poked around, I've discovered that don't know what we gain in going through the intermediate HTML representation:

The HTML intermediate representation allows to nicely separates the HTML printing logic from that of generating the HTML. It is also used to share code with the plain text backend. Also, it allows to embed arbitrary content inside the generated HTML.

  • From the perspective of implementation, using the intermediate representation means we have to use stringly typed nodes rather than nodes tagged with enums, which means we lose exhaustiveness checking on almost all of the structure.

For the purposes of TyXML, the intermediate representation does not have to be used. There are pros and cons to using it.

If you do use it, you share the logic with the usual HTML backend which makes it easier to be sure that you are generating the same HTML in both cases. But the "stringy" representation of nodes in the intermediate representation means that one does not get a static guarantee that the TyXML backend covers all the cases arising from Markdown.

On the other hand, if you don't use the intermediate representation you get a static guarantee that you are covering every case arising from Markdown, but you will need to replicate some of the Markdown -> HTML logic for that backend in order to produce the same HTML as the "usual" backend.

Ideally, we would like the HTML produced by both backends (the "usual" one and the TyXML one) to always coincide when the "usual" backend generates valid HTML (note that Markdown can embed arbitrary content in so-called HTML blocks; you will need to decide how to handle these, as they may not be valid HTML).

  • From a usage perspective, having to go through the intermediate representation means an extra round of tree transformations, which will surely impact performance negatively.

I would be surprised if the performance impact was noticeable in practice.

@Drup
Copy link
Member

Drup commented Jun 29, 2020

What's the purpose of the non-tyxml HTML backend apart from "does not use tyxml" ?

@nojb
Copy link
Contributor

nojb commented Jun 29, 2020

What's the purpose of the non-tyxml HTML backend apart from "does not use tyxml" ?

Markdown can embed arbitrary content which is supposed to be quoted verbatim inside the generated HTML (see here). This would be tricky to represent using tyxml (as far as I understand).

@Drup
Copy link
Member

Drup commented Jun 29, 2020

This would be tricky to represent using tyxml (as far as I understand).

That's not particularly a problem no, it just mean using Html.Unsafe to build those parts (since the well-formedness guarantee clearly doesn't apply).

@nojb
Copy link
Contributor

nojb commented Jun 29, 2020

That's not particularly a problem no, it just mean using Html.Unsafe to build those parts (since the well-formedness guarantee clearly doesn't apply).

Thanks, I didn't know about the Unsafe module. I guess this means that the best would be to make a standalone tyxml backend (not dependent on the existing HTML representation). We could then evaluate it and (perhaps) decide to switch to it altogether.

@shonfeder
Copy link
Collaborator

After poking around a bit, I had the same thought as @Drup. If we are not opposed to the tyxml dependency in principle, I think it would appealing to replace the intermediate representation with a tyxml representation. This would also mean there's no need for a second optional package. I'd be happy to reorient in that direction.

@nojb
Copy link
Contributor

nojb commented Jun 29, 2020

After poking around a bit, I had the same thought as @Drup. If we are not opposed to the tyxml dependency in principle, I think it would appealing to replace the intermediate representation with a tyxml representation. This would also mean there's no need for a second optional package. I'd be happy to reorient in that direction.

I think this is a bit premature. Let's see an implementation of the backend first (as a separate package), let's see about testing, etc, and then let us consider whether we want to switch to the tyxml backend altogether.

@shonfeder
Copy link
Collaborator

Ah, OK. That makes sense. I'll point #211 in the direction of a function from Omd.doc -> Tyxml.doc so we can evaluate 👍

@shonfeder
Copy link
Collaborator

Hi, @Drup. I've just started looking at conversion of attributes. The encoding of Tyxml attributes is quite extensive, it I'm trying to figure out a reasonable way of going from string -> 'relevant Tyxml.Html.attrib where 'relevant is the proper subtype for the Html.elt currently being formed. I can construct custom functions for this purpose (writing a tree of functions to convert string attribute names to the appropriate polymorphic variant), but this seems like it will be fragile (not to mention tedious;)). Do you have any thoughts on what might be an elegant approach here? I think a possible escape hatch is just to construct everything from Html.Unsafe, but was hoping there might be some hidden Tyxml mechanism for going string -> _ attrib which I haven't been able to turn up in the documentation.

@Drup
Copy link
Member

Drup commented Jul 4, 2020

In which context do you have raw string to express attributes in markdown ? If that's only for inlined HTML, use the unsafe API (or parse the HTML properly, using lambdasoup, and build the tyxml from that, I can show you the right piece of code to do that later).

@shonfeder
Copy link
Collaborator

Thanks @Drup. I am using the unsafe API for inlined html. The only attributes I know we'll have to deal with immediately are things like titles and alt text of images and source code annotations in code blocks. But I can actually deal with this handful of cases manually.

But I had my mind cast towards a more general solution, so that we might sometime support nice features like fenced code blocks and annotated spans from Pandoc Markdown. However, that's really out of scope for this issue, so I probably ought to stick with the just pattern matching out for the handful of attributes we need to support at the moment.

Thanks for the tip on lambdasoup! I was thinking about adding a safe variant as a followup, that ensures the inlined HTMl is valid insteda of using the unsafe API, and I think you've pointed the right way for that.

@shonfeder shonfeder mentioned this issue Jul 5, 2020
@Drup
Copy link
Member

Drup commented Jul 10, 2020

But I had my mind cast towards a more general solution, so that we might sometime support nice features like fenced code blocks and annotated spans from Pandoc Markdown. However, that's really out of scope for this issue, so I probably ought to stick with the just pattern matching out for the handful of attributes we need to support at the moment.

I agree. In my experience: do only exactly what you need and not more. Trying to design things that are too generic mesh poorly with very typed DSLs like Tyxml, and are very often completely unnecessary.

Thanks for the tip on lambdasoup! I was thinking about adding a safe variant as a followup, that ensures the inlined HTMl is valid insteda of using the unsafe API, and I think you've pointed the right way for that.

Yes, using lambdasoup is exactly the right way to go here. I've considered adding that feature directly in tyxml, but didn't want to add a dependency. Ideally, there should be a page in the documentation that demonstrates the small piece of code necessary to do it.

@shonfeder
Copy link
Collaborator

So the Tyxml backend is ready (for some time), but we need to decide whether we'd like to use this as the html representation for Omd or whether we just want it as an optional backend (probably suppllied by an omd-tyxml package?).

Pros and cons of using Tyxml for all our HTML needs:

Pros:

  • Reduce code duplication (the to_txml is isomorphic to the current, bespoke html conversion)
  • The conformance tests are fully solved on the tyxml_branch (whereas there are some excluded edge cases in master)
  • Built-in interop with anything in the ecosystem that uses tyxml.
  • Give strong static guarantees on the html we're constructing

Cons:

  • Adds eternal deps to the core package (currently Omd depends on nothing but ocaml)
  • Tyxml is a more complex to work with than the bespoke html we're currently using.

Any thoughts?

cc @sonologico

@jfrolich
Copy link
Contributor

I think tyxml is pretty terrible to work with, it uses the typesystem to prevent you nesting certain elements, but you need a lot of code duplication and type coercion just to please tyxml. The errors are also terrible. And that's all to make sure you are nesting elements correctly (which is not a problem in most modern browsers). Seems like all the wrong trade-offs. So I am all for keeping the current simpler solution.

@Drup
Copy link
Member

Drup commented Apr 12, 2021

I think tyxml is pretty terrible to work with, it uses the typesystem to prevent you nesting certain elements, but you need a lot of code duplication and type coercion just to please tyxml.

I suspect you never used it for actually write webpages ? :)
The duplication in this PR is pretty specific to automatic renderers for structured formats like markdown. Casual use of tyxml doesn't exactly look like that.

In any case, I personally think it should be a separate package. The fact that the tyxml version is better spec compliant than the original one should tell you something though ...

@SquidDev
Copy link
Contributor

SquidDev commented Apr 12, 2021

The conformance tests are fully solved on the tyxml_branch (whereas there are some excluded edge cases in master)

Are we sure about this? I just checked out the branch and reran the tests (dune build @gen --auto-promote; dune runtest) and several tests are still failing. I'm fairly sure most of the compliance issues are due to parsing rather than emitting markdown.

@jfrolich
Copy link
Contributor

I think tyxml is pretty terrible to work with, it uses the typesystem to prevent you nesting certain elements, but you need a lot of code duplication and type coercion just to please tyxml.

I suspect you never used it for actually write webpages ? :)
The duplication in this PR is pretty specific to automatic renderers for structured formats like markdown. Casual use of tyxml doesn't exactly look like that.

In any case, I personally think it should be a separate package. The fact that the tyxml version is better spec compliant than the original one should tell you something though ...

I used it, and I think the fighting with the compiler isn't worth the compliance with the structural spec compliance. Code duplication happens when you cannot generalize a component because for instance elements within an anchor tag accept a separate subset of the dom so you need two code paths (at least that was what I was experiencing when I remember correctly). Once I wrote a simpler HTML API it was pure bliss and I could remove so much code and refactor some components that needed to be awkward because of tyxml.

@Drup
Copy link
Member

Drup commented Apr 12, 2021

I used it, and I think the fighting with the compiler isn't worth the compliance with the structural spec compliance. Code duplication happens when you cannot generalize a component because for instance elements within an anchor tag accept a separate subset of the dom so you need two code paths

It's getting out of topic, but that should basically never be the case unlike you have complex recursive data. I suppose you don't care anymore, but I would very much like to see the code (probably more on the tyxml bug tracker).

@jfrolich
Copy link
Contributor

@Drup Let's take it offline. Thanks for your responses!

@sonologico
Copy link
Collaborator

sonologico commented Apr 12, 2021

The html representation that is included in the main package is small enough that I don't see much of an issue with maintaining that there and having omd-tyxml as a separate package.

I don't see a significant downside for it if the cost is this small, so I think that the benefit of not having to pull in tyxml for users that wouldn't use it outweighs it.

@shonfeder
Copy link
Collaborator

Cool. I'm convinced its best to make a separate package. Thanks, all!

@Drup Thanks for the heads about the test failures. I must have forgotten to regenerate the tests. I'll look into this (and try to prevent this kind of oversight in the future).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.