HTML reader with auto_identifiers produces duplicate attributes #8383

Open
bpj opened this issue Oct 17, 2022 · 5 comments

bpj commented Oct 17, 2022

(See also pandoc-discuss)

When converting to MediaWiki, headings are prepended with a span
<span id="HEADING-ID"></span>, evidently to provide an anchor for
pandoc's original/automatic id so that internal links will still work. However,
if there is a <section> element [1], I frequently end up with an identical id
attribute on both the div resulting from the section element and the
automatically inserted span element, which seems like a bug to me:

<section id="head-2" class="level2">
<h2>Head 2</h2>
<section id="head-3" class="level3">
<h3>Head 3</h3>
<ol type="1">
<li><p>Li 1</p></li>
<li><p>Li 2</p>
<p>Li 2 para 2</p></li>
<li><p>Li 3</p></li>
</ol>
<p>Text</p>
<ol type="i">
<li>Li i</li>
<li>Li ii</li>
</ol>
</section>
</section>

MediaWiki output:

<div id="head-2" class="section level2">

<span id="head-2"></span>
== Head 2 ==

<div id="head-3" class="section level3">

<span id="head-3"></span>
=== Head 3 ===

<ol style="list-style-type: decimal;">
<li><p>Li 1</p></li>
<li><p>Li 2</p>
<p>Li 2 para 2</p></li>
<li><p>Li 3</p></li></ol>

Text

<ol style="list-style-type: lower-roman;">
<li>Li i</li>
<li>Li ii</li></ol>


</div>

</div>

It seems that the only workaround currently is to go through the MediaWiki
source after conversion and manually remove any offending spans.
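
The manual cleanup described above could be scripted. The following is a minimal sketch, not part of pandoc: it assumes each anchor span sits on its own line, exactly as in the output shown, and drops a span only when its id matches the innermost enclosing div. The function name `strip_duplicate_anchor_spans` is hypothetical.

```python
import re

# Matches either a closing </div> or an opening <div ...> (attributes captured).
DIV_TAG = re.compile(r'</div\s*>|<div\b([^>]*)>')
# Extracts the id attribute from a tag's attribute string.
ID_ATTR = re.compile(r'(?:^|\s)id="([^"]*)"')
# An empty anchor span alone on a line, as emitted before MediaWiki headings.
ANCHOR_SPAN = re.compile(r'^\s*<span id="([^"]*)"></span>\s*$')


def strip_duplicate_anchor_spans(mediawiki: str) -> str:
    """Drop anchor spans whose id repeats the id of the innermost open div."""
    open_div_ids = []  # stack of ids (None for divs without an id)
    kept = []
    for line in mediawiki.splitlines():
        span = ANCHOR_SPAN.match(line)
        if span and open_div_ids and open_div_ids[-1] == span.group(1):
            continue  # duplicate of the enclosing div's id: skip this line
        # Update the div stack from any div tags on this line, in order.
        for tag in DIV_TAG.finditer(line):
            if tag.group(0).startswith("</"):
                if open_div_ids:
                    open_div_ids.pop()
            else:
                found = ID_ATTR.search(tag.group(1))
                open_div_ids.append(found.group(1) if found else None)
        kept.append(line)
    trailing = "\n" if mediawiki.endswith("\n") else ""
    return "\n".join(kept) + trailing
```

Spans whose id differs from the enclosing div are left alone, so anchors that carry a distinct id keep working.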

Footnotes

  1. In my case, inserted by pandoc in an earlier run into the HTML that is
    now used as the source.

@bpj bpj added the bug label Oct 17, 2022
@jgm jgm closed this as completed in 1e7b57e Oct 17, 2022
@jgm jgm reopened this Oct 18, 2022

jgm (Owner) commented Oct 18, 2022

Re-opening so we can add the section divs again, and just fix the id issue.

jgm commented Oct 18, 2022

I reverted the change. Actually, it appears that the real issue is in the HTML reader. Using pandoc -f html -t native on the snippet above, we get:

[ Div
    ( "head-2" , [ "section" , "level2" ] , [] )
    [ Header
        2 ( "head-2" , [] , [] ) [ Str "Head" , Space , Str "2" ]
    , Div
        ( "head-3" , [ "section" , "level3" ] , [] )
        [ Header
            3 ( "head-3" , [] , [] ) [ Str "Head" , Space , Str "3" ]
...

So the problem here is that the HTML reader is adding auto-identifiers to the Header elements that duplicate the identifiers already present in the Divs. That is what needs fixing.

This also points to a workaround for your case: use -f html-auto_identifiers -t mediawiki.
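
The duplication visible in the native output above can also be checked mechanically. Here is a rough sketch that walks pandoc's JSON AST (the same structure as the native form, serialized as JSON) and reports identifiers used more than once. The helper names are hypothetical, and for brevity the walk only descends into Div elements, not into block quotes or list items.

```python
from collections import Counter


def iter_block_ids(blocks):
    """Yield (element type, identifier) for each Div and Header with an id.

    Only recurses into Div children; other containers are ignored here.
    """
    for block in blocks:
        kind = block.get("t")
        if kind == "Div":
            # Div content is [attr, blocks]; attr is [id, classes, key-values].
            (ident, _classes, _kv), children = block["c"]
            if ident:
                yield ("Div", ident)
            yield from iter_block_ids(children)
        elif kind == "Header":
            # Header content is [level, attr, inlines].
            _level, (ident, _classes, _kv), _inlines = block["c"]
            if ident:
                yield ("Header", ident)


def duplicate_ids(doc):
    """Return the sorted identifiers that occur on more than one element."""
    counts = Counter(ident for _kind, ident in iter_block_ids(doc["blocks"]))
    return sorted(ident for ident, n in counts.items() if n > 1)
```

The input document can be produced with `pandoc -f html -t json` and loaded with `json.load`; for the snippet in this issue the result would be expected to list both `head-2` and `head-3`.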

@jgm jgm changed the title from "Duplicate id attributes with section divs in MediaWiki output" to "HTML reader with auto_identifiers produces duplicate attributes" Oct 18, 2022

jgm commented Oct 18, 2022

With changes to the HTML reader, I now get:

<div id="head-2" class="section level2">

<span id="head-2-1"></span>
== Head 2 ==

<div id="head-3" class="section level3">

<span id="head-3-1"></span>
=== Head 3 ===

<ol style="list-style-type: decimal;">
<li><p>Li 1</p></li>
<li><p>Li 2</p>
<p>Li 2 para 2</p></li>
<li><p>Li 3</p></li></ol>

Text

<ol style="list-style-type: lower-roman;">
<li>Li i</li>
<li>Li ii</li></ol>


</div>

</div>
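
In the output above, the heading anchors are disambiguated with a numeric suffix (`head-2-1`, `head-3-1`). A minimal sketch of that kind of scheme follows, assuming the section ids are registered as used before the headings are processed. This is an illustration only, not pandoc's actual implementation, and `unique_identifier` is a hypothetical name.

```python
def unique_identifier(base, used):
    """Return base if it is unused, else base-1, base-2, ...

    The chosen identifier is added to `used` so later calls avoid it.
    """
    candidate = base
    n = 0
    while candidate in used:
        n += 1
        candidate = f"{base}-{n}"
    used.add(candidate)
    return candidate


# Ids already taken by the section divs.
used_ids = {"head-2", "head-3"}
heading_anchor = unique_identifier("head-2", used_ids)
```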

@jgm jgm closed this as completed in e5fbddd Oct 18, 2022
@jgm jgm reopened this Oct 18, 2022

jgm commented Oct 18, 2022

OK, those changes led to some other problems, e.g. with the LaTeX writer, so reverting for now.

jgm commented Oct 18, 2022

What we really need is a pass through the AST at the end of the HTML writer that finds section Divs immediately containing a Header with the same id, and removes the id on the Header.
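
A pass of the kind described above might look roughly like this when written against pandoc's JSON AST. It is a sketch under stated assumptions, not the fix that went into pandoc: a "section Div" is taken to be a Div carrying the class `section`, "immediately containing" is read as "direct child", and the function name is hypothetical.

```python
def strip_duplicate_header_ids(blocks):
    """Clear the id of any Header that is a direct child of a section Div
    with the same id. Mutates `blocks` in place and returns it.
    """
    for block in blocks:
        if block.get("t") != "Div":
            continue
        # Div content is [attr, blocks]; attr is [id, classes, key-values].
        (div_id, classes, _kv), children = block["c"]
        if div_id and "section" in classes:
            for child in children:
                # Header content is [level, attr, inlines]; attr[0] is the id.
                if child.get("t") == "Header" and child["c"][1][0] == div_id:
                    child["c"][1][0] = ""
        # Nested sections are handled by recursing into the Div's children.
        strip_duplicate_header_ids(children)
    return blocks
```

Keeping the id on the Div rather than the Header means links to the section still resolve, while the heading no longer produces a second element with the same id.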
