github publish: some markup conversion failures #850

Open
sknebel opened this issue Nov 20, 2018 · 5 comments

@sknebel
Contributor

commented Nov 20, 2018

I just used a fairly complex GH issue (that I had written on GH) to test POSSE-ing to a test repo:
https://www.svenknebel.de/posts/2018/11/8/ to sknebel/random-test-repo#1

The HTML in my post is a cleaned-up version of GitHub's HTML (with mentions of users and other issues removed to cut down the noise).

The following things in the created issue were unexpected:

  • the nested lists didn't convert correctly
  • < > were added around bare links
  • in the "output format" section, a space was added before the italicized "not"
  • the JSON code example was cut off. I had forgotten to escape a < there. The browser still displayed the code that followed, probably because <------ clearly isn't a valid HTML tag, but it is understandable that bridgy (or maybe even mf2py?) failed there. (I have since edited the post to use &lt;.)

EDIT: feel free to move this issue to granary or ask me to split it up or ... - happy to help you as much as I can.
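
For reference, the unescaped `<` problem in the last bullet can be avoided by escaping code before embedding it in HTML. This is a minimal sketch using Python's standard `html.escape`; the helper name `code_to_html` and the `<pre><code>` wrapper are assumptions for illustration, not how my site or bridgy actually generate markup.

```python
import html


def code_to_html(code: str) -> str:
    """Wrap a code sample in <pre><code>, escaping &, < and > first.

    Hypothetical helper for illustration only.
    """
    # quote=False leaves " and ' untouched, which is fine inside element content.
    escaped = html.escape(code, quote=False)
    return f"<pre><code>{escaped}</code></pre>"


sample = '{"a": 1} <------ note'
print(code_to_html(sample))
```

With the `<` escaped, an HTML parser no longer has to guess whether `<------` starts a tag.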

@snarfed

Owner

commented Nov 21, 2018

hey, thanks for the report! yeah, converting HTML to markdown will often be imperfect, and mostly at the mercy of html2text, but i can take a look!

@sknebel

Contributor Author

commented Nov 21, 2018

Just tested: lxml and mf2py both handle the <----------- correctly (and escape the < in the HTML output of the e-content), so it is surprising that the code doesn't make it through.

snarfed added a commit to snarfed/granary that referenced this issue Nov 24, 2018

@snarfed

Owner

commented Nov 24, 2018

fixed the < and > around linked URLs.
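
One way such a fix could look, as a rough sketch: post-process the generated Markdown and strip the angle brackets from autolinks. The function name `strip_autolink_brackets` and the regex are assumptions for illustration; the referenced granary commit may well do this differently.

```python
import re

# Matches Markdown autolinks such as <https://example.com/page>.
# Only http(s) URLs are handled; anything else in angle brackets is left alone.
AUTOLINK = re.compile(r"<(https?://[^\s<>]+)>")


def strip_autolink_brackets(markdown: str) -> str:
    """Replace <URL> with the bare URL (hypothetical helper)."""
    return AUTOLINK.sub(r"\1", markdown)


print(strip_autolink_brackets("see <https://example.com/a> for details"))
```

GitHub linkifies bare URLs on its own, so the brackets are not needed there.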

@snarfed

Owner

commented Nov 24, 2018

the nested lists and space before italicized not are afaict bugs in html2text. i may narrow them down and file issues; we'll see.

@sknebel

Contributor Author

commented Nov 24, 2018

It seems the nested list output from html2text is accepted by the original Markdown implementation and those based on it (the Markdown documentation doesn't appear to describe nested lists at all). CommonMark, on which GitHub's Markdown support is based, specifies nested lists explicitly, in a way that requires deeper indentation. Its specification has a section on this history: https://spec.commonmark.org/0.28/#motivation
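
To make the indentation rule concrete, here is a simplified sketch of the CommonMark requirement as I understand it: a nested item must be indented at least as far as the column where the parent item's content starts. The helper names are hypothetical, and the check ignores edge cases such as indented code blocks and markers followed by five or more spaces.

```python
import re

# Leading spaces, a bullet or ordered-list marker, then the spaces after it.
LIST_MARKER = re.compile(r"^( *)([-*+]|\d{1,9}[.)])( +)")


def content_column(line: str):
    """Column where a list item's content starts, or None if not a list item."""
    match = LIST_MARKER.match(line)
    return len(match.group(0)) if match else None


def nests_under(parent: str, child: str) -> bool:
    """True if `child` is indented far enough to nest under `parent`."""
    column = content_column(parent)
    indent = len(child) - len(child.lstrip(" "))
    return column is not None and indent >= column


print(nests_under("1. parent", "  * child"))   # two spaces under a "1. " marker
print(nests_under("1. parent", "   * child"))  # three spaces
```

Under this rule, two spaces of indentation are enough below a `- ` bullet but not below a `1. ` marker, which is the kind of mismatch that turns an intended nested list into a flat one.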
