pandoc doesn't preserve order of Xml elements in settings.xml #9264

Closed
edwintorok opened this issue Dec 17, 2023 · 7 comments

@edwintorok
Contributor

edwintorok commented Dec 17, 2023

Explain the problem.

Running docx-validator on the default reference doc, every part now validates, including settings.xml:

$ pandoc --print-default-data-file=reference.docx >|reference.docx
$ ./validate reference.docx
./tmp/document-pretty.xml validates
DOCUMENT
No entities in internal subset
No entities in external subset
./tmp/styles-pretty.xml validates
./tmp/settings-pretty.xml validates

However, a document newly created from an empty input file does not:

$ touch test.md
$ pandoc test.md -o test.docx --reference-doc reference.docx
$ ./validate test.docx
./tmp/settings-pretty.xml:11: element zoom: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}zoom': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}doNotIncludeSubdocsInStats, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}doNotAutoCompressPictures, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}forceUpgrade, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}captions, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}readModeInkLockDown, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}smartTagType, {http://schemas.openxmlformats.org/schemaLibrary/2006/main}schemaLibrary, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}shapeDefaults, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}doNotEmbedSmartTags, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}decimalSymbol ).
./tmp/settings-pretty.xml fails to validate

It looks like the settings got reordered and the 'zoom' tag is now in the wrong place:

<w:settings xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:sl="http://schemas.openxmlformats.org/schemaLibrary/2006/main" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w10="urn:schemas-microsoft-com:office:word">
  <w:stylePaneFormatFilter w:val="0004"/>
  <w:footnotePr>
    <w:footnote w:id="-1"/>
    <w:footnote w:id="0"/>
  </w:footnotePr>
  <w:rsids>
  </w:rsids>
  <w:clrSchemeMapping w:accent1="accent1" w:accent2="accent2" w:accent3="accent3" w:accent4="accent4" w:accent5="accent5" w:accent6="accent6" w:bg1="light1" w:bg2="light2" w:followedHyperlink="followedHyperlink" w:hyperlink="hyperlink" w:t1="dark1" w:t2="dark2"/>
  <w:zoom w:percent="100"/>

This is probably due to this code in Writer/Docx.hs:

settingsEntry <- copyChildren refArchive distArchive settingsPath epochtime settingsList

The required order of elements in settings.xml is defined in wml.xsd.
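To make the failure concrete, here is a minimal Python sketch (not pandoc code) that reports the first child of `w:settings` that appears later than the order list allows. `SETTINGS_ORDER` is an abbreviated subset I typed in for illustration, and `first_misplaced` is a made-up helper name; the authoritative sequence is the one in wml.xsd.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Abbreviated subset of the w:settings child order (assumption: consult
# wml.xsd for the complete, authoritative sequence).
SETTINGS_ORDER = ["zoom", "stylePaneFormatFilter", "footnotePr",
                  "rsids", "clrSchemeMapping"]


def first_misplaced(settings_xml):
    """Return the local name of the first listed child that follows an
    element ranked later in SETTINGS_ORDER, or None if the order is fine."""
    rank = {name: i for i, name in enumerate(SETTINGS_ORDER)}
    highest = -1
    for child in ET.fromstring(settings_xml):
        if not child.tag.startswith("{" + W + "}"):
            continue  # not in the WordprocessingML namespace
        local = child.tag.split("}", 1)[1]
        if local not in rank:
            continue  # outside the abbreviated list, ignored here
        if rank[local] < highest:
            return local
        highest = rank[local]
    return None


# Shortened version of the settings.xml excerpt from this report.
SAMPLE = f"""<w:settings xmlns:w="{W}">
  <w:stylePaneFormatFilter w:val="0004"/>
  <w:clrSchemeMapping w:bg1="light1"/>
  <w:zoom w:percent="100"/>
</w:settings>"""

print(first_misplaced(SAMPLE))
```

On the shortened sample this flags `zoom`, the same element the validator complains about.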

Pandoc version?

I've built it from latest main:

$ git describe --always
5875de3f8
$ pandoc --version
pandoc 3.1.11
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /var/home/edwin/.local/share/pandoc
Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
@jgm
Owner

jgm commented Dec 17, 2023

One sensible approach might be to put all the elements that can go in settings.xml, in order, in the default reference.docx. Then we could simply update the ones that are found in the user's reference.docx, leaving the order unchanged. But to do this I'd have to know what default values to give all these settings.

@jgm
Owner

jgm commented Dec 17, 2023

I guess it's just as easy to embed the ordered list of element names in the Haskell code...
[EDIT:] There is such a list already, settingsList; it's just not complete or correctly ordered!
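The ordered-list idea can be sketched roughly as follows. This is Python for illustration only, not pandoc's Haskell and not necessarily how `settingsList` is used; `reorder` and `ORDER` are hypothetical names, and `ORDER` is an abbreviated subset of the schema sequence rather than the full list.

```python
def reorder(children, ordered_names):
    """Stable-sort (name, attributes) pairs by their position in
    ordered_names. Names missing from the list are placed last and keep
    their relative input order."""
    rank = {name: i for i, name in enumerate(ordered_names)}
    return sorted(children, key=lambda c: rank.get(c[0], len(ordered_names)))


# Abbreviated subset of the schema order (assumption; see wml.xsd).
ORDER = ["zoom", "stylePaneFormatFilter", "footnotePr",
         "rsids", "clrSchemeMapping"]

# Elements in the order they appear in the failing settings.xml above.
emitted = [
    ("stylePaneFormatFilter", {"val": "0004"}),
    ("footnotePr", {}),
    ("rsids", {}),
    ("clrSchemeMapping", {}),
    ("zoom", {"percent": "100"}),
]

print([name for name, _ in reorder(emitted, ORDER)])
```

Because the sort is stable, elements that are already in a valid relative order stay that way, and only the misplaced `zoom` moves to the front.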

@edwintorok
Contributor Author

I wrote a script that attempts to fix up the docx; with it, a small test document now validates against the .xsd, though not yet with the .NET tool.
The list of tags for settings is here, if it helps. I'll try to create issues or send PRs for the other things I found when I find some time.

jgm closed this as completed in c9bf4da on Dec 18, 2023
@jgm
Owner

jgm commented Dec 18, 2023

I think I have it working now, but further testing is always welcome!

@jgm
Owner

jgm commented Dec 18, 2023

I ran docx-validator on all the golden tests in test/docx/golden.
These were the failures:

lists.docx

./tmp/document-pretty.xml:129: element pStyle: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pStyle': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}suppressLineNumbers, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}pBdr, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}shd, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tabs, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}suppressAutoHyphens, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}kinsoku, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}wordWrap, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}overflowPunct, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}topLinePunct, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}autoSpaceDE ).

lists_div_bullets.docx

./tmp/document-pretty.xml:33: element pStyle: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pStyle': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}suppressLineNumbers, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}pBdr, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}shd, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tabs, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}suppressAutoHyphens, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}kinsoku, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}wordWrap, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}overflowPunct, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}topLinePunct, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}autoSpaceDE ).
./tmp/document-pretty.xml:45: element pStyle: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pStyle': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}suppressLineNumbers, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}pBdr, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}shd, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tabs, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}suppressAutoHyphens, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}kinsoku, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}wordWrap, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}overflowPunct, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}topLinePunct, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}autoSpaceDE ).

lists_multiple_initial.docx

./tmp/document-pretty.xml:10: element pStyle: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pStyle': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}suppressLineNumbers, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}pBdr, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}shd, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tabs, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}suppressAutoHyphens, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}kinsoku, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}wordWrap, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}overflowPunct, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}topLinePunct, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}autoSpaceDE ).
./tmp/document-pretty.xml:41: element pStyle: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pStyle': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}suppressLineNumbers, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}pBdr, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}shd, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tabs, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}suppressAutoHyphens, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}kinsoku, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}wordWrap, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}overflowPunct, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}topLinePunct, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}autoSpaceDE ).

table_one_row.docx

./tmp/document-pretty.xml:9: element jc: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}jc': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblCaption, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblDescription, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblPrChange ).

tables-default-widths.docx

./tmp/document-pretty.xml:18: element jc: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}jc': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblCaption, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblDescription, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblPrChange ).
./tmp/document-pretty.xml:236: element jc: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}jc': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblCaption, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblDescription, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblPrChange ).
./tmp/document-pretty.xml:301: element jc: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}jc': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblCaption, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblDescription, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblPrChange ).

tables.docx

./tmp/document-pretty.xml:18: element jc: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}jc': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblCaption, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblDescription, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblPrChange ).
./tmp/document-pretty.xml:237: element jc: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}jc': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblCaption, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblDescription, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblPrChange ).
./tmp/document-pretty.xml:303: element jc: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}jc': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblCaption, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblDescription, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblPrChange ).
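Every failure listed above names either `pStyle` (the list tests) or `jc` (the table tests). For longer logs, a small script along these lines could tally which element each error points at. It is only a sketch: the regular expression is inferred from the lines pasted in this thread, `count_offending_elements` is a hypothetical helper, and the sample log abbreviates the namespace as `{...}`.

```python
import re
from collections import Counter

# Matches lines such as
# "./tmp/document-pretty.xml:129: element pStyle: Schemas validity error : ..."
ERROR_RE = re.compile(
    r"^(?P<file>\S+):(?P<line>\d+): element (?P<element>\w+): "
    r"Schemas validity error"
)


def count_offending_elements(log_text):
    """Count validity errors per offending element name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_RE.match(line)
        if match:
            counts[match.group("element")] += 1
    return counts


# Abbreviated sample in the same shape as the validator output above.
LOG = """\
./tmp/document-pretty.xml:129: element pStyle: Schemas validity error : Element '{...}pStyle': This element is not expected.
./tmp/document-pretty.xml:9: element jc: Schemas validity error : Element '{...}jc': This element is not expected.
./tmp/document-pretty.xml:18: element jc: Schemas validity error : Element '{...}jc': This element is not expected.
./tmp/settings-pretty.xml validates
"""

for element, count in count_offending_elements(LOG).most_common():
    print(element, count)
```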

@jgm
Owner

jgm commented Feb 29, 2024

@edwintorok Was endnotePr left out of the list for a reason?
EDIT: I see now that endnotePr and footnotePr are both problematic, because they may depend on the endnotes.xml or footnotes.xml in the reference docx, which isn't copied over.
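One way to picture the dependency, as a rough Python sketch rather than anything pandoc does: collect the ids listed under `w:footnotePr` in settings.xml and check that footnotes.xml defines a `w:footnote` with each id. I am assuming that this id match is the dependency in question; `missing_footnote_ids` and both sample documents are made up for illustration, and the same check would apply to `endnotePr` and endnotes.xml.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
ID = f"{{{W}}}id"  # fully qualified name of the w:id attribute


def missing_footnote_ids(settings_xml, footnotes_xml):
    """Ids listed under w:footnotePr in settings.xml that have no
    matching w:footnote element in footnotes.xml."""
    settings = ET.fromstring(settings_xml)
    footnotes = ET.fromstring(footnotes_xml)
    wanted = [fn.get(ID)
              for fn in settings.findall("w:footnotePr/w:footnote", NS)]
    present = {fn.get(ID) for fn in footnotes.findall("w:footnote", NS)}
    return [i for i in wanted if i not in present]


# Hypothetical inputs: settings.xml refers to ids -1 and 0, but the
# footnotes.xml that ends up in the output only defines -1.
SETTINGS = f"""<w:settings xmlns:w="{W}">
  <w:footnotePr>
    <w:footnote w:id="-1"/>
    <w:footnote w:id="0"/>
  </w:footnotePr>
</w:settings>"""

FOOTNOTES = f"""<w:footnotes xmlns:w="{W}">
  <w:footnote w:id="-1" w:type="separator"/>
</w:footnotes>"""

print(missing_footnote_ids(SETTINGS, FOOTNOTES))
```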

@edwintorok
Contributor Author

I don't see endnotePr anywhere in pandoc currently, and, as you say, tag order isn't the only thing missing in order to support it.
The OOXML validator in the CI should be pretty good at picking up missing references, though, once you start using it.
