Instead of checking the styleId directly, the DOCX reader should look up the style by its id and check that style's <w:name> element to see if it is "caption", as pointed out by @jgm.
Previously discussed at #9515.
I have my Microsoft Word in German and my document in English. I created a table using Word's built-in interface and added a caption using Word's built-in dialog window. Because the document is in English, Word automatically set the caption label to "Table". The resulting minimal working example is mwe-using-german-word.docx.

When I run

    pandoc --from docx --to html mwe-using-german-word.docx

the output is

    <p>Lorem ipsum</p>
    <p>Table 1 Example</p>
    <table>
    <colgroup>
    <col style="width: 50%" />
    <col style="width: 50%" />
    </colgroup>
    <thead>
    <tr class="header">
    <th>A</th>
    <th>B</th>
    </tr>
    </thead>
    <tbody>
    <tr class="odd">
    <td>C</td>
    <td>D</td>
    </tr>
    </tbody>
    </table>

instead of

    <p>Lorem ipsum</p>
    <table>
    <caption><p>Example</p></caption>
    <colgroup>
    <col style="width: 50%" />
    <col style="width: 50%" />
    </colgroup>
    <thead>
    <tr class="header">
    <th>A</th>
    <th>B</th>
    </tr>
    </thead>
    <tbody>
    <tr class="odd">
    <td>1</td>
    <td>2</td>
    </tr>
    </tbody>
    </table>

which is what the same command (pandoc --from docx --to html) produces with mwe-using-english-word.docx as input.
XML of non-English Document

The caption is

    <w:p w14:paraId="1FADD07B" w14:textId="3660CC9A" w:rsidR="00917377" w:rsidRDefault="00917377" w:rsidP="00917377">
      <w:pPr>
        <w:pStyle w:val="Beschriftung"/>
        <w:keepNext/>
      </w:pPr>
      <w:r>
        <w:t xml:space="preserve">Table </w:t>
      </w:r>
      <w:r>
        <w:fldChar w:fldCharType="begin"/>
      </w:r>
      <w:r>
        <w:instrText xml:space="preserve"> SEQ Table \* ARABIC </w:instrText>
      </w:r>
      <w:r>
        <w:fldChar w:fldCharType="separate"/>
      </w:r>
      <w:r>
        <w:rPr>
          <w:noProof/>
        </w:rPr>
        <w:t>1</w:t>
      </w:r>
      <w:r>
        <w:fldChar w:fldCharType="end"/>
      </w:r>
      <w:r>
        <w:t xml:space="preserve"> </w:t>
      </w:r>
      <w:proofErr w:type="spellStart"/>
      <w:r>
        <w:t>Example</w:t>
      </w:r>
      <w:proofErr w:type="spellEnd"/>
    </w:p>

XML of English Document

    <w:p w14:paraId="5DE3A68F" w14:textId="153D5F3C" w:rsidR="000E6255" w:rsidRDefault="000E6255" w:rsidP="000E6255">
      <w:pPr>
        <w:pStyle w:val="Caption"/>
        <w:keepNext/>
      </w:pPr>
      <w:r>
        <w:t xml:space="preserve">Table </w:t>
      </w:r>
      <w:fldSimple w:instr=" SEQ Table \* ARABIC ">
        <w:r>
          <w:rPr>
            <w:noProof/>
          </w:rPr>
          <w:t>1</w:t>
        </w:r>
      </w:fldSimple>
      <w:r>
        <w:t xml:space="preserve"> Example</w:t>
      </w:r>
    </w:p>
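The two paragraphs differ in their w:pStyle value ("Beschriftung" vs. "Caption"), which is why a check on the styleId alone misses the German one. Below is a minimal Python sketch of the lookup suggested above: resolve the styleId through styles.xml and compare the style's <w:name> instead. It is not pandoc's implementation (pandoc is written in Haskell). The STYLES_XML excerpt, the "Standard" style entry, the function names, and the case-insensitive comparison are all assumptions made for illustration; the attached files were not inspected.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical excerpt of word/styles.xml (assumption, not copied from the
# attached documents): the localized styleId maps to the built-in name.
STYLES_XML = f"""
<w:styles xmlns:w="{W}">
  <w:style w:type="paragraph" w:styleId="Beschriftung">
    <w:name w:val="caption"/>
  </w:style>
  <w:style w:type="paragraph" w:styleId="Standard">
    <w:name w:val="Normal"/>
  </w:style>
</w:styles>
"""


def style_names(styles_xml: str) -> dict:
    """Map each w:styleId to the w:val of its <w:name> child."""
    root = ET.fromstring(styles_xml)
    names = {}
    for style in root.findall("w:style", NS):
        style_id = style.get(f"{{{W}}}styleId")
        name = style.find("w:name", NS)
        if style_id is not None and name is not None:
            names[style_id] = name.get(f"{{{W}}}val", "")
    return names


def is_caption(paragraph_xml: str, names: dict) -> bool:
    """True if the paragraph's style resolves to the name "caption"."""
    para = ET.fromstring(paragraph_xml)
    p_style = para.find("w:pPr/w:pStyle", NS)
    if p_style is None:
        return False
    style_id = p_style.get(f"{{{W}}}val")
    # Case-insensitive comparison is an assumption of this sketch.
    return names.get(style_id, "").lower() == "caption"


# Reduced versions of the paragraphs from the issue (runs omitted).
GERMAN_PARA = (
    f'<w:p xmlns:w="{W}"><w:pPr>'
    '<w:pStyle w:val="Beschriftung"/><w:keepNext/>'
    "</w:pPr></w:p>"
)
PLAIN_PARA = f'<w:p xmlns:w="{W}"><w:pPr><w:pStyle w:val="Standard"/></w:pPr></w:p>'

NAMES = style_names(STYLES_XML)
GERMAN_IS_CAPTION = is_caption(GERMAN_PARA, NAMES)
PLAIN_IS_CAPTION = is_caption(PLAIN_PARA, NAMES)
```

With this lookup the paragraph styled "Beschriftung" is recognized as a caption even though its styleId is localized, while a paragraph with an unrelated style is not.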
6f87c9e
Docx reader: ensure that table captions are counted.
87b07c6
Normally these occur outside the table element itself, but they should still be parsed as captions in this case. Closes #9518.