docbook writer->reader loses link href #8437

Closed
arcnmx opened this issue Nov 12, 2022 · 1 comment

arcnmx commented Nov 12, 2022

Explain the problem.

Using some example input:

[![alt](https://img.shields.io/badge/License-Apache%202.0-blue.svg "title")](http://www.apache.org/licenses/LICENSE-2.0)

... the docbook writer gives us:

<para>
  <link xlink:href="http://www.apache.org/licenses/LICENSE-2.0"><inlinemediaobject>
    <imageobject>
      <objectinfo>
        <title>
          title
        </title>
      </objectinfo>
      <imagedata fileref="https://img.shields.io/badge/License-Apache%202.0-blue.svg" />
    </imageobject>
  </inlinemediaobject></link>
</para>

(we already lost the "alt" text btw 😞)

If we feed this back into pandoc's docbook reader, we end up with a lot less than we started with:

[![](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](#)

So the reader has lost:

  • url href (apparently replacing it with #)
  • the title
  • the alt text (well the writer actually lost this first)
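To double-check the list above, here is a minimal sketch that compares the input with the round-tripped output field by field. The regex only handles this one `[![alt](src "title")](href)` shape, and `parse_badge` / `lost_fields` are names made up for illustration, not anything from pandoc.

```python
import re

# Matches a Markdown link wrapping a single image:
#   [![alt](src "title")](href)
# This is a deliberately narrow pattern, not a general Markdown parser.
BADGE = re.compile(
    r'\[!\[(?P<alt>[^\]]*)\]'                          # [![alt]
    r'\((?P<src>\S+?)(?:\s+"(?P<title>[^"]*)")?\)\]'   # (src "title")]
    r'\((?P<href>[^)]*)\)'                             # (href)
)

def parse_badge(markdown):
    """Split a linked image into alt, src, title and href (empty if absent)."""
    match = BADGE.fullmatch(markdown.strip())
    if match is None:
        raise ValueError("not a linked image")
    return {key: (value or "") for key, value in match.groupdict().items()}

def lost_fields(before, after):
    """Names of the fields whose value changed between the two snippets."""
    a, b = parse_badge(before), parse_badge(after)
    return sorted(key for key in a if a[key] != b[key])

original = ('[![alt](https://img.shields.io/badge/License-Apache%202.0-blue.svg "title")]'
            '(http://www.apache.org/licenses/LICENSE-2.0)')
round_tripped = '[![](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](#)'

print(lost_fields(original, round_tripped))
```

Only the image source survives unchanged; `alt`, `href` and `title` all differ.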

An alternate universe

Let's see what happens if we enlist the help of the asciidoc writer instead:

http://www.apache.org/licenses/LICENSE-2.0[image:https://img.shields.io/badge/License-Apache%202.0-blue.svg[alt&#44;title="title"]]

Curious that it escapes commas as `&#44;`? My local pandoc 2.17.1.1 does not do this, so I'm not sure whether this is just a try pandoc bug or an actual pandoc bug. Either way the escaped form doesn't work, so fix the comma by hand before continuing:

asciidoctor -b docbook - <<EOF
http://www.apache.org/licenses/LICENSE-2.0[image:https://img.shields.io/badge/License-Apache%202.0-blue.svg[alt,title="title"]]
EOF

now we get new and exciting output:

<?xml version="1.0" encoding="UTF-8"?>
<?asciidoc-toc?>
<?asciidoc-numbered?>
<article xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
<info>
<title>Untitled</title>
<date>1980-01-01</date>
</info>
<simpara><link xl:href="http://www.apache.org/licenses/LICENSE-2.0"><inlinemediaobject>
<imageobject>
<imagedata fileref="https://img.shields.io/badge/License-Apache%202.0-blue.svg"/>
</imageobject>
<textobject><phrase>alt</phrase></textobject>
</inlinemediaobject></link></simpara>
</article>

Which pandoc obviously likes a lot better:

[![alt](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0)

That's almost entirely where we started! I think the summary here is roughly:

  • the writer produces fragments rather than full documents by default, while the reader can't deal with fragments properly
    (it doesn't know what an xlink:href is without xmlns telling it first, presumably)
  • the reader can't read image titles that the writer produces
  • the reader can read alt text but the writer can't produce it
  • the asciidoc writer gained a bug somewhere between 2.17.1.1 and 2.19.2 that escapes that comma as `&#44;`?
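The first bullet is a guess about namespaces, so here is a small sketch of the general XML behaviour behind it, using Python's standard `xml.etree.ElementTree`. This is not pandoc's reader (which is written in Haskell and may behave differently); it only shows that a namespace-aware parser rejects a fragment whose `xlink:` prefix was never declared, while the same fragment with an `xmlns:xlink` declaration parses fine. The shortened `fragment` string is an assumption standing in for the writer output above.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Shortened stand-in for the writer's fragment: the xlink prefix is used
# but never declared, because there is no enclosing <article> element.
fragment = ('<para><link xlink:href="http://www.apache.org/licenses/LICENSE-2.0">'
            'text</link></para>')

# Same fragment with the namespace declared on the outermost element.
declared = fragment.replace("<para>", f'<para xmlns:xlink="{XLINK}">', 1)

def parses(xml_text):
    """Return True if ElementTree accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(parses(fragment))   # the undeclared prefix is rejected
print(parses(declared))   # the declared prefix is accepted

# With the declaration in place the attribute is stored under its full name.
link = ET.fromstring(declared).find("link")
print(link.get(f"{{{XLINK}}}href"))
```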

I gather that feature parity between a format's reader and writer isn't necessarily a given, but one would hope that basic inputs containing only some links and inline images wouldn't be too lossy right out of the gate - and that's not really the story here :<

Pandoc version?

https://pandoc.org/try/ (pandoc version 2.19.2), because nixpkgs doesn't seem to care about keeping it up to date

arcnmx added the bug label Nov 12, 2022

tarleb commented Nov 29, 2022

Thanks for this thorough report. Related issues are #8070 and #3177.

jgm self-assigned this Nov 29, 2022
jgm added a commit that referenced this issue Nov 29, 2022
even in a fragment.  (We now just look for an `href` attribute without
worrying about the namespace.)

See #8437.
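For readers following along, the idea in that commit message (look for an `href` attribute without worrying about the namespace) can be sketched roughly as below. This is an illustration in Python, not the actual change to pandoc's reader; `local_name` and `find_href` are hypothetical helpers, and the attribute dictionaries are assumed inputs.

```python
def local_name(attr_name):
    """Drop a '{uri}' or 'prefix:' qualifier from an attribute name."""
    if attr_name.startswith("{") and "}" in attr_name:
        return attr_name.split("}", 1)[1]
    return attr_name.rsplit(":", 1)[-1]

def find_href(attributes):
    """Return the first attribute whose local name is 'href', else None."""
    for name, value in attributes.items():
        if local_name(name) == "href":
            return value
    return None

url = "http://www.apache.org/licenses/LICENSE-2.0"

# The spellings seen in this thread, plus the expanded form some parsers use.
print(find_href({"xlink:href": url}))                           # pandoc fragment
print(find_href({"xl:href": url}))                              # asciidoctor output
print(find_href({"{http://www.w3.org/1999/xlink}href": url}))   # expanded name
print(find_href({"fileref": "badge.svg"}))                      # no href at all
```

The trade-off is that any attribute named `href` is accepted whatever its namespace, which is more lenient than strict XLink handling.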
jgm added a commit that referenced this issue Nov 29, 2022
...with entities when they're in Str elements.  If a link
contains an image, it may have attributes, and the commas
there should not be converted.

See #8437, #8070.
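A toy sketch of the distinction that commit message draws: commas inside text get the entity, while the commas that separate image attributes are left alone. This is one reading of the message, not pandoc's asciidoc writer; `escape_str`, `image_macro` and `linked_image` are hypothetical names, and title quoting is kept deliberately naive.

```python
def escape_str(text):
    """Replace commas in plain text with the entity seen in the report."""
    return text.replace(",", "&#44;")

def image_macro(src, alt, title=None):
    """Build an image macro; only the alt text is escaped."""
    fields = [escape_str(alt)]
    if title is not None:
        fields.append(f'title="{title}"')
    # The separator between attributes is a literal comma, never an entity.
    return f"image:{src}[{','.join(fields)}]"

def linked_image(href, src, alt, title=None):
    """Wrap the image macro in a link to href."""
    return f"{href}[{image_macro(src, alt, title)}]"

print(linked_image(
    "http://www.apache.org/licenses/LICENSE-2.0",
    "https://img.shields.io/badge/License-Apache%202.0-blue.svg",
    "alt",
    "title",
))
```

With the inputs from the report this reproduces the hand-fixed line that was fed to asciidoctor above.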
jgm closed this as completed in 513bdef Nov 29, 2022