Make reference.docx pass validation #9263
```
./tmp/styles-pretty.xml:30: element qFormat: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}qFormat': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPr, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblPr, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}trPr, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tcPr, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblStylePr ).
```

According to `wml.xsd` it must come before `pPr`:

```
<xsd:complexType name="CT_Style">
  <xsd:sequence>
    [...]
    <xsd:element name="qFormat" type="CT_OnOff" minOccurs="0"/>
    [...]
    <xsd:element name="pPr" type="CT_PPrGeneral" minOccurs="0" maxOccurs="1"/>
```

Signed-off-by: Edwin Török <edwin@etorok.net>
```
./tmp/styles-pretty.xml:111: element spacing: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}spacing': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}textDirection, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}textAlignment, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}textboxTightWrap, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}outlineLvl, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}divId, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}cnfStyle, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}pPrChange ).
```

According to `wml.xsd`, `spacing` must be placed before `jc`:

```
<xsd:sequence>
  <xsd:element name="spacing" type="CT_Spacing" minOccurs="0"/>
  [...]
  <xsd:element name="jc" type="CT_Jc" minOccurs="0"/>
```

Signed-off-by: Edwin Török <edwin@etorok.net>
There was an extra `>` which showed up as "character content" in the XML:

```
/tmp/styles-pretty.xml:113: element rPr: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPr': Character content other than whitespace is not allowed because the content type is 'element-only'.
```

Signed-off-by: Edwin Török <edwin@etorok.net>
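A stray `>` like this is still well-formed XML, so a plain parser will not reject it; it only shows up as text where the schema allows elements only. A minimal sketch of how one might look for such leftovers with Python's standard library, assuming a hypothetical helper `find_stray_text` and a made-up colour value in the sample (neither is part of this PR):

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_stray_text(xml_text, element_only=("rPr", "pPr")):
    """Return (local_name, stray_text) pairs for the listed elements
    that contain non-whitespace character content."""
    hits = []
    root = ET.fromstring(xml_text)
    for el in root.iter():
        local = el.tag.rsplit("}", 1)[-1]
        if local not in element_only:
            continue
        # Text directly inside the element, before its first child.
        chunks = [el.text or ""]
        # Text after each child (its "tail") also belongs to this element.
        chunks += [child.tail or "" for child in el]
        stray = "".join(chunks).strip()
        if stray:
            hits.append((local, stray))
    return hits


# Hypothetical fragment reproducing the `>>` typo after a w:color element.
sample = (
    f'<w:style xmlns:w="{W}"><w:rPr>'
    '<w:color w:val="4F81BD"/>>'
    '</w:rPr></w:style>'
)
print(find_stray_text(sample))
```

The list of element-only names is deliberately short and passed as a parameter; a real check would take it from the schema rather than hard-coding it.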
According to `wml.xsd` the order must be:

```
<xsd:sequence>
  <xsd:element name="tcBorders" type="CT_TcBorders" minOccurs="0" maxOccurs="1"/>
  [...]
  <xsd:element name="vAlign" type="CT_VerticalJc" minOccurs="0"/>
```

Signed-off-by: Edwin Török <edwin@etorok.net>
There were 2 `pStyle` for `Abstract`:

```
./tmp/document-pretty.xml:47: element pStyle: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pStyle': This element is not expected. Expected is one of ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}keepNext, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}keepLines, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}pageBreakBefore, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}framePr, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}widowControl, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}numPr, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}suppressLineNumbers, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}pBdr, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}shd, {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tabs ).
```

Signed-off-by: Edwin Török <edwin@etorok.net>
```
./tmp/document-pretty.xml:260: element tblW: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}tblW', attribute '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}w': '0.0' is not a valid value of the union type '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}ST_MeasurementOrPercent'.
```

See http://officeopenxml.com/WPtableWidth.php: the standard versions disagree on whether a `%` is required when type=`pct`, but the default is 0 when the entry is omitted, so just delete this entry.

Signed-off-by: Edwin Török <edwin@etorok.net>
Error from OOXMLValidator:

```
{
  "Description": "The required attribute 'val' is missing.",
  "Path": {
    "NamespacesDefinitions": [
      "xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
    ],
    "Namespaces": { },
    "XPath": "/w:document[1]/w:body[1]/w:tbl[1]/w:tr[1]/w:trPr[1]/w:cnfStyle[1]",
    "PartUri": "/word/document.xml"
  },
  "Id": "Sch_MissRequiredAttribute",
  "ErrorType": "Schema"
},
```

This is a bitmask where the first bit means 'first row', which is set as an attribute already.

Signed-off-by: Edwin Török <edwin@etorok.net>
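To make the bitmask concrete, here is a small sketch of how such a `val` string could be assembled. It assumes the value is a string of twelve `0`/`1` digits whose leading positions are the row, column and band flags in the order listed below; only the first position ('first row') is confirmed by this commit, and the four trailing corner-cell positions are simply left at `0` because their order is not modelled here. The helper name `cnf_val` is hypothetical.

```python
# Assumed order of the leading flag positions; only "firstRow" is confirmed
# by the commit message above.
CNF_FLAGS = [
    "firstRow", "lastRow", "firstColumn", "lastColumn",
    "oddVBand", "evenVBand", "oddHBand", "evenHBand",
]
CNF_WIDTH = 12  # assumed total number of digits in the value


def cnf_val(**flags):
    """Build a cnfStyle val string with one digit per flag position."""
    unknown = set(flags) - set(CNF_FLAGS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    bits = "".join("1" if flags.get(name) else "0" for name in CNF_FLAGS)
    # Pad the positions this sketch does not model (corner cells) with zeros.
    return bits.ljust(CNF_WIDTH, "0")


print(cnf_val(firstRow=True))
```

Under these assumptions a header row that only sets the first-row flag gets a leading `1` followed by eleven zeros.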
```
{
  "Description": "The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:doNotTrackMoves'.",
  "Path": {
    "NamespacesDefinitions": [
      "xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
    ],
    "Namespaces": { },
    "XPath": "/w:settings[1]",
    "PartUri": "/word/settings.xml"
  },
  "Id": "Sch_UnexpectedElementContentExpectingComplex",
  "ErrorType": "Schema"
}
```

According to `wml.xsd` the order is:

```
<xsd:complexType name="CT_Settings">
  <xsd:sequence>
    <xsd:element name="doNotTrackMoves" type="CT_OnOff" minOccurs="0"/>
    [...]
    <xsd:element name="footnotePr" type="CT_FtnDocProps" minOccurs="0"/>
```

Signed-off-by: Edwin Török <edwin@etorok.net>
From OOXMLValidator:

```
{
  "Description": "The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:b'.",
  "Path": {
    "NamespacesDefinitions": [
      "xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
    ],
    "Namespaces": { },
    "XPath": "/w:styles[1]/w:style[9]/w:rPr[1]",
    "PartUri": "/word/styles.xml"
  },
  "Id": "Sch_UnexpectedElementContentExpectingComplex",
  "ErrorType": "Schema"
},
```

Signed-off-by: Edwin Török <edwin@etorok.net>
```
{
  "Description": "The element has unexpected child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:bCs'.",
  "Path": {
    "NamespacesDefinitions": [
      "xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
    ],
    "Namespaces": { },
    "XPath": "/w:styles[1]/w:style[15]/w:rPr[1]",
    "PartUri": "/word/styles.xml"
  },
  "Id": "Sch_UnexpectedElementContentExpectingComplex",
  "ErrorType": "Schema"
},
```

Signed-off-by: Edwin Török <edwin@etorok.net>
Using `make test TESTARGS=--accept`

Signed-off-by: Edwin Török <edwin@etorok.net>
Many thanks! This is great. I had no idea these elements had to go in a certain order.
Thanks, I've opened a separate issue, #9264: the reference doc validates now, but an empty doc created by pandoc does not. I guess the XML order is lost, luckily only in
I've used 2 docx validators:
They spotted some genuine errors (an extra `>` after a close tag, I assume a typo?), but there are also a lot of annoying errors about "Element is not expected". Took me a while to figure out that they complain about the order of XML tags!

Both the XSD and RELAX NG schemas for ISO/IEC 29500 use `xs:sequence` in a lot of places, which demands a particular ordering for the XML tags. I don't know whether any application actually cares about this order, but it is better to fix them, otherwise the real validation errors are difficult to see due to all the noise.

With the changes in this PR the reference.docx now validates with both of the above validators, and at least LibreOffice and Google Docs can still open the .docx files.
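As an illustration of what an `xs:sequence` constraint means in practice, the following is a rough sketch of an order check for the children of a `w:style` element. The partial order is taken only from the `CT_Style` excerpt and the "Expected is one of" list quoted in the first commit message; the real schema lists many more children, and the helper name `out_of_order` is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Partial child order for w:style, reconstructed from the qFormat commit
# message; elements not listed here are ignored by the check.
STYLE_ORDER = ["qFormat", "pPr", "rPr", "tblPr", "trPr", "tcPr", "tblStylePr"]


def out_of_order(parent, order):
    """Return local names of children that come after a sibling
    which the sequence says should follow them."""
    rank = {name: i for i, name in enumerate(order)}
    problems = []
    highest = -1
    for child in parent:
        local = child.tag.rsplit("}", 1)[-1]
        if local not in rank:
            continue  # not covered by the partial order
        if rank[local] < highest:
            problems.append(local)
        else:
            highest = rank[local]
    return problems


# Hypothetical style with qFormat placed after pPr, as in the first error.
sample = ET.fromstring(
    f'<w:style xmlns:w="{W}" w:styleId="Title">'
    "<w:pPr/><w:qFormat/><w:rPr/>"
    "</w:style>"
)
print(out_of_order(sample, STYLE_ORDER))
```

This only reports misplaced children; it does not replace a schema validator, which also checks cardinality, attributes and content types.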
Bugs fixed:
- missing `val` attribute on `cnfStyle`
- an extra `>` after a `w:color` (i.e. `>>`)

The actual output of `pandoc` doesn't always validate, and I haven't looked at validating anything other than docx, but let's start by fixing the reference docx in this PR.

The changes here are one commit per error fixed, with one test-file regeneration commit at the end. If you prefer, I can squash them into a single commit, or rebase and regenerate at each step (to retain bisectability post merge).