epub noterefs across files not properly converted #5531

alibou99 · 2019-05-29T13:18:45Z

PS C:\files\dev\Pandoc> pandoc --version
pandoc.exe 2.7.2
Compiled with pandoc-types 1.17.5.4, texmath 0.11.2.2, skylighting 0.7.7

Issue:
When I convert EPUB files to md, docx, or html, some of the text is missing. It happens when there is a note call (noteref).
Here is an example:

<p class="txt_courant_justif"><span epub:type="pagebreak" id="page_36" title="36"/>
Text1
<a class="apnb" epub:type="noteref" id="ap_ntb-002-1" href="p1chap2.xhtml#ntb-002">1</a>. 
Text2
<a class="apnb" epub:type="noteref" id="ap_ntb-003-1" href="p1chap2.xhtml#ntb-003">2</a>
Text3
</p>
<p class="txt_courant_justif">
Text4
</p>

In this example, Text2 and Text3 are missing from the output.


jgm · 2019-05-29T13:42:35Z

Simpler way to reproduce this:

% pandoc -f html+epub_html_exts -t native
<p class="txt_courant_justif"><span epub:type="pagebreak" id="page_36" title="36"/>
Text1
<a class="apnb" epub:type="noteref" id="ap_ntb-002-1" href="p1chap2.xhtml#ntb-002">1</a>. 
Text2
<a class="apnb" epub:type="noteref" id="ap_ntb-003-1" href="p1chap2.xhtml#ntb-003">2</a>
Text3
</p>
<p class="txt_courant_justif">
Text4
</p>
^D
[Para [Span ("page_36",[],[("type","pagebreak"),("title","36")]) [],SoftBreak,Str "Text1",SoftBreak,Str ""]
,Para [Str "Text4"]]

alibou99 · 2019-05-29T13:54:02Z

I'm new to Git, so excuse me if I don't understand everything, but what does this mean?

alibou99 · 2019-05-29T14:03:03Z

My source document is an EPUB 3.0 file, not an HTML file.

jgm · 2019-05-29T14:52:57Z

This gives a way to reproduce the underlying issue in a simpler way, without actually producing an epub (because the epub reader uses the html reader plus a special extension under the hood). It's really a "note to self" for me to diagnose this.

alibou99 · 2019-05-29T14:59:00Z

Thank you very much. I just tested the conversion with Calibre: no problem, I get the whole text. However, with Calibre the notes are not recognized as notes.

jgm · 2019-05-29T15:15:27Z

Yes, pandoc is stumbling on notes that refer to another file, such as href="p1chap2.xhtml#ntb-002".
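
To make the distinction concrete, here is a minimal Python sketch (an illustration, not pandoc's own code) that splits a noteref `href` into its file part and fragment and reports whether it points outside the current file. The helper names `split_noteref` and `is_cross_file` are hypothetical, and so is the `current_file` argument: the thread does not say which file contains the paragraph above.

```python
from urllib.parse import urldefrag


def split_noteref(href):
    """Split a noteref href into (file part, fragment id)."""
    target, fragment = urldefrag(href)
    return target, fragment


def is_cross_file(href, current_file):
    """Return True when the href names a file other than current_file."""
    target, _ = split_noteref(href)
    # An empty file part ("#fn1") always points into the current file.
    return target != "" and target != current_file
```

Note that an `href` with an explicit file name can still point into the file that contains it, which is why the check compares against `current_file` instead of only testing for a non-empty file part.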

jgm · 2019-05-29T15:17:21Z

With the commit I just pushed, we now get:

[Para [Span ("page_36",[],[("type","pagebreak"),("title","36")]) [],SoftBreak,Str "Text1",SoftBreak,Link ("ap_ntb-002-1",["apnb"],[]) [Str "1"] ("p1chap2.xhtml#ntb-002",""),Str ".",SoftBreak,Str "Text2",SoftBreak,Link ("ap_ntb-003-1",["apnb"],[]) [Str "2"] ("p1chap2.xhtml#ntb-003",""),SoftBreak,Str "Text3"]
,Para [Str "Text4"]]

which is an improvement. The missing text is no longer missing. However, the noterefs are being parsed as links rather than proper noterefs, so there is still work to do.
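
To see which noterefs were left as plain links after a conversion, one option is to walk pandoc's JSON output (`pandoc -t json`) and list every Link whose target carries a fragment. The sketch below is only an illustration: `iter_links` is a hypothetical helper, and `sample` is a hand-written document that mirrors part of the native output above instead of real pandoc output.

```python
def iter_links(node):
    """Yield (identifier, url) for every Link node in a pandoc JSON AST."""
    if isinstance(node, dict):
        if node.get("t") == "Link":
            attr, _inlines, target = node["c"]
            yield attr[0], target[0]
        for value in node.values():
            yield from iter_links(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_links(item)


# Hand-written sample in pandoc's JSON shape (assumption: not generated by pandoc).
sample = {
    "pandoc-api-version": [1, 17, 5, 4],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "Text1"},
            {"t": "SoftBreak"},
            {"t": "Link", "c": [["ap_ntb-002-1", ["apnb"], []],
                                [{"t": "Str", "c": "1"}],
                                ["p1chap2.xhtml#ntb-002", ""]]},
            {"t": "Str", "c": "."},
            {"t": "SoftBreak"},
            {"t": "Str", "c": "Text2"},
            {"t": "SoftBreak"},
            {"t": "Link", "c": [["ap_ntb-003-1", ["apnb"], []],
                                [{"t": "Str", "c": "2"}],
                                ["p1chap2.xhtml#ntb-003", ""]]},
        ]},
    ],
}

# Links with a fragment are candidates for noterefs that were not resolved.
fragment_links = [(ident, url) for ident, url in iter_links(sample) if "#" in url]
```

This is a heuristic: the `epub:type="noteref"` attribute does not appear on the Link nodes shown above, so a fragment in the target is the only signal the sketch relies on.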

alibou99 · 2019-05-29T15:23:03Z

I work for a non-profit organization; we prepare books for digital braille so that they can be used by blind readers. For this, our pivot format is docx or RTF. For the moment Pandoc at least does the job, but because of this missing-text problem I am reviewing the whole procedure with a view to switching to another tool. I hope that we will find a quick solution.

jgm · 2019-05-29T15:24:56Z

By tonight there should be a nightly available in pandoc-nightlies; this will at least solve the missing text problem.

alibou99 · 2019-05-29T15:25:54Z

Very good news. How can I get this corrected version as quickly as possible?

alibou99 · 2019-05-29T15:28:24Z

I installed pandoc via Chocolatey.

alibou99 · 2019-05-29T15:36:20Z

This is my first post on Git. Is there a specific command to update Pandoc on my computer and take advantage of the fix?
Thank you.

jgm · 2019-05-29T15:54:36Z

Here's a binary of the latest Windows build: https://ci.appveyor.com/project/jgm/pandoc/build/job/gy92q5at64l3e68q/artifacts

alibou99 · 2019-05-30T00:22:58Z

Thank you very much, it works very well and I'm no longer missing text.
Now I am looking at the footnote problem. Here are two examples. The first works very well: the EPUB to docx conversion produces a Word document that recognizes the footnotes. The second does not.
Example 1 (works):

<p class="nonindentb">Text1<a epub:type="noteref" class="noteref" id="fn-1" href="#fn1">1</a> Text2</p>

<div epub:type="footnote" id="fn1">
<p class="noindent0"><a class="link" href="#fn-1"><span style="color: #000000;">1</span></a>. Text...</p>
</div>

Example 2 (does not work):

<p class="txt_courant_justif"><span epub:type="pagebreak" id="page_36" title="36"/>
Text1
<a class="apnb" epub:type="noteref" id="ap_ntb-003-1" href="p1chap2.xhtml#ntb-003">2</a>
Text2
</p>
<p class="txt_courant_justif">
Text4
</p>

<section class="defnotes" epub:type="footnotes">
<!--note--><aside class="ntb" epub:type="footnote" id="ntb-003">
<p class="txt_justif"><a href="p1chap2.xhtml#ap_ntb-003-1">2</a>. Text...</p></aside>
<!--note--></section></section>

I greatly appreciate your help

jgm · 2019-05-30T04:56:33Z

Yes, the problem is that pandoc currently will only pick up footnotes that are defined in the same file. In your second example the note is in a different file.
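
One way to picture a fix is to index footnotes across every content file first and then resolve each noteref against that index. The Python sketch below illustrates the idea under stated assumptions and is not how pandoc is implemented: `FootnoteCollector`, `collect_footnotes`, and `resolve_noteref` are hypothetical names, the file name `p1chap1.xhtml` is invented for the example, the input is assumed to be well-formed XHTML, and all files are assumed to sit in the same directory so that the file part of an `href` can be compared directly with a file name.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class FootnoteCollector(HTMLParser):
    """Collect the text of every element marked epub:type="footnote"."""

    def __init__(self):
        super().__init__()
        self.notes = {}        # footnote id -> raw text
        self._current = None   # id of the footnote being read, if any
        self._depth = 0        # open-tag depth inside that footnote

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is None:
            if attrs.get("epub:type") == "footnote" and "id" in attrs:
                self._current = attrs["id"]
                self._depth = 1
                self.notes[self._current] = ""
        else:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.notes[self._current] += data


def collect_footnotes(files):
    """Map (file name, footnote id) -> text, given file name -> XHTML source."""
    index = {}
    for name, source in files.items():
        parser = FootnoteCollector()
        parser.feed(source)
        parser.close()
        for note_id, text in parser.notes.items():
            index[(name, note_id)] = " ".join(text.split())
    return index


def resolve_noteref(href, current_file, index):
    """Look up a noteref target; an empty file part means the current file."""
    target, fragment = urldefrag(href)
    return index.get((target or current_file, fragment))


# Two-file sample; "p1chap1.xhtml" is a made-up name for the referring file.
files = {
    "p1chap1.xhtml": '<p>Text1 <a epub:type="noteref" '
                     'href="p1chap2.xhtml#ntb-003">2</a> Text2</p>',
    "p1chap2.xhtml": '<aside class="ntb" epub:type="footnote" id="ntb-003">'
                     '<p class="txt_justif"><a href="p1chap1.xhtml#ap_ntb-003-1">2</a>'
                     '. Text...</p></aside>',
}
index = collect_footnotes(files)
note = resolve_noteref("p1chap2.xhtml#ntb-003", "p1chap1.xhtml", index)
```

Keying the index on the pair (file name, id) lets the same lookup serve both same-file references such as `#fn1` and references that name another file.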

jgm added format:EPUB format:HTML reader labels May 29, 2019

jgm added this to the 2.7.3 milestone May 29, 2019

jgm changed the title ~~issue converting from epub document~~ epub noterefs across files not properly converted May 29, 2019

jgm removed this from the 2.7.3 milestone May 29, 2019

jgm mentioned this issue Jun 10, 2019

Missing Text #5562

Closed

jgm mentioned this issue Feb 2, 2022

Converting Epub with noteref to any format drops <a ... epub:type="noteref"> in result #7884

Closed
