docx-validator comments and fonts #9269

Closed
edwintorok opened this issue Dec 18, 2023 · 15 comments

@edwintorok
Contributor

Explain the problem.

These files fail validation with a modified docx-validator script that looks at more XML files:

pandoc/test/docx/golden/comments.docx
pandoc/test/docx/golden/custom_style_reference.docx
pandoc/test/docx/golden/track_changes_scrubbed_metadata.docx

pandoc/test/docx/golden/comments.docx.out:./tmp/comments-pretty.xml:11: element annotationRef: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}annotationRef': This element is not expected. Expected is ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPrChange ).
pandoc/test/docx/golden/comments.docx.out:./tmp/comments-pretty.xml:27: element annotationRef: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}annotationRef': This element is not expected. Expected is ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPrChange ).
pandoc/test/docx/golden/comments.docx.out:./tmp/comments-pretty.xml:43: element annotationRef: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}annotationRef': This element is not expected. Expected is ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPrChange ).
pandoc/test/docx/golden/comments.docx.out:./tmp/comments-pretty.xml:65: element annotationRef: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}annotationRef': This element is not expected. Expected is ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPrChange ).
pandoc/test/docx/golden/comments.docx.out:./tmp/comments-pretty.xml:81: element annotationRef: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}annotationRef': This element is not expected. Expected is ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPrChange ).
pandoc/test/docx/golden/custom_style_reference.docx.out:./tmp/fontTable-pretty.xml:2: element fonts: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}fonts', attribute '{http://schemas.openxmlformats.org/markup-compatibility/2006}Ignorable': The attribute '{http://schemas.openxmlformats.org/markup-compatibility/2006}Ignorable' is not allowed.
pandoc/test/docx/golden/track_changes_scrubbed_metadata.docx.out:./tmp/comments-pretty.xml:11: element annotationRef: Schemas validity error : Element '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}annotationRef': This element is not expected. Expected is ( {http://schemas.openxmlformats.org/wordprocessingml/2006/main}rPrChange ).
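With more than a handful of files, it can help to tally log lines like the ones above per golden file and XML part. A minimal sketch, assuming the `<name>.out:<part>:<line>: ...` layout shown above; `tally_schema_errors` is a hypothetical name, not part of pandoc or docx-validator.

```shell
#!/bin/sh
# Hypothetical helper: read validation log lines on stdin and print how many
# "Schemas validity error" lines each (log file, XML part) pair produced.
# Fields 1 and 2 of the colon-separated line are the log name and the part.
tally_schema_errors() {
  grep 'Schemas validity error' |
    awk -F: '{ count[$1 ":" $2]++ }
             END { for (k in count) print count[k], k }' |
    sort -k2
}
```

Fed the seven lines above, it should report 5 errors for comments-pretty.xml in comments.docx and 1 each for the other two files.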

Pandoc version?

Latest main.

@edwintorok
Contributor Author

This is after #9270, #9266, and #9265.

@jgm
Owner

jgm commented Dec 18, 2023

Ah, I saw how you modified the validator script.
I've added a modified version of your modified script and now use it in the Makefile target. Its exit code is the total number of validation errors (currently 41).

@edwintorok
Contributor Author

edwintorok commented Dec 18, 2023

I've opened an upstream PR with an improved version of my script (it unpacks all files in word/ so I don't need an explicit list of files to validate): devoidfury/docx-validator#1

@jgm
Owner

jgm commented Dec 18, 2023

Oh, I already modified your script in parallel -- I will have to look at your changes to integrate them into mine.

@jgm
Owner

jgm commented Dec 18, 2023

OK, I've updated my version of the script (tools/validate-docx.sh) so that it looks at all xml files in the container, as yours does.

Mine has a few other enhancements: it can take multiple files on the command line, and it checks them all, outputting a list of non-valid files at the end, and returning the number of errors as exit status.
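The behavior described above (several files per run, a list of non-valid files at the end, the error count as exit status) could look roughly like the sketch below. It is not the actual tools/validate-docx.sh: the function names `check_all` and `count_docx_errors` and the `SCHEMA` variable are hypothetical, and it assumes `unzip` and `xmllint` are installed. One design point worth keeping in mind: a process exit status only carries its low 8 bits, so a raw count of 256 errors would come back as 0 and look like success; the sketch therefore clamps the total to 255.

```shell
#!/bin/sh
# Sketch only: not the real tools/validate-docx.sh.

# Run a per-file checker on every argument, print the files that had errors,
# and return the total error count as the exit status.
# $1 names a command that prints the number of errors for one file.
check_all() {
  checker=$1
  shift
  total=0
  invalid=""
  for file in "$@"; do
    n=$("$checker" "$file")
    if [ "$n" -gt 0 ]; then
      invalid="$invalid $file"
      total=$((total + n))
    fi
  done
  if [ -n "$invalid" ]; then
    echo "Non-valid files:$invalid"
  fi
  # Exit statuses keep only the low 8 bits, so clamp instead of wrapping.
  if [ "$total" -gt 255 ]; then
    total=255
  fi
  return "$total"
}

# One possible checker: validate every word/*.xml part of a .docx against
# wml.xsd. SCHEMA is an assumed path to a schema whose imports resolve.
# Only "validity error" lines are counted; parser or schema-load failures are not.
count_docx_errors() {
  tmpdir=$(mktemp -d) || { echo 1; return; }
  unzip -q -o -j "$1" "word/*.xml" -d "$tmpdir"
  errors=0
  for part in "$tmpdir"/*.xml; do
    [ -e "$part" ] || continue   # the glob matched nothing
    n=$(xmllint --noout --schema "$SCHEMA" "$part" 2>&1 | grep -c 'validity error')
    errors=$((errors + n))
  done
  rm -rf "$tmpdir"
  echo "$errors"
}
```

For example, after setting `SCHEMA=schemas/ISO-IEC29500-4_2016/wml.xsd` (a path inside a docx-validator checkout), running `check_all count_docx_errors test/docx/golden/*.docx` prints the non-valid files and leaves the clamped total in `$?`.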

@jgm
Owner

jgm commented Dec 18, 2023

So now we're up to 185 errors!

@edwintorok
Contributor Author

"No matching global declaration available for the validation root." means the schema has no declaration for that file's root element, so there is actually no rule for validating that XML. That is why I only extracted the files in the word/ subdirectory.

@edwintorok
Contributor Author

diff --git a/tools/validate-docx.sh b/tools/validate-docx.sh
index 09ed1a2f5..740706554 100644
--- a/tools/validate-docx.sh
+++ b/tools/validate-docx.sh
@@ -9,7 +9,7 @@ for file in "$@"; do
   file_errors=0
   echo "*** Checking $file"
   rm -rf "$tmpdir"
-  unzip -q -o -j "$file" -d "$tmpdir"
+  unzip -q -o -j "$file" "word/*.xml" -d "$tmpdir"
   for i in "$tmpdir"/*.xml; do
     xmllint --format "${i}" > "${i}.pretty.xml"
   done

Try this, and then that number is down to 6!

@jgm
Owner

jgm commented Dec 18, 2023

Oh, I missed that! Great.

@jgm
Owner

jgm commented Dec 18, 2023

I also added word/_rels/*.xml.
But it's still 6.

@jgm
Owner

jgm commented Dec 18, 2023

Never mind: it turns out that the files in _rels are named .xml.rels, and when I enabled validation of them, I got the "no matching global declaration" error. So I'll remove that change.
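To see which entries a `word/*.xml` pattern picks up without unpacking anything, the matching can be mimicked with a shell `case` pattern. This is a hypothetical helper (`selected_for_validation`), assuming unzip's default wildcard behavior, where `*` also matches `/`. It shows why `word/_rels/document.xml.rels` is never selected by a `*.xml` pattern, and why parts outside word/, such as `[Content_Types].xml`, are skipped.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a .docx entry name would be picked up by
# the pattern "word/*.xml". As with unzip's default wildcard matching, "*"
# in a case pattern also matches "/", so subdirectories of word/ count too.
selected_for_validation() {
  case $1 in
    word/*.xml) return 0 ;;
    *) return 1 ;;
  esac
}
```

Entry names for a real file can be listed with `unzip -Z -1 file.docx`. For example, `for n in word/document.xml word/_rels/document.xml.rels; do selected_for_validation "$n" && echo "$n"; done` prints only `word/document.xml`.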

edwintorok added a commit to edwintorok/pandoc that referenced this issue Dec 18, 2023
annotationRef is not valid for `w:rPr`, only for `w:r` according to
wml.xsd.

See jgm#9269

Signed-off-by: Edwin Török <edwin@etorok.net>
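A quick way to guard against this regression is to look for `w:annotationRef` nested inside `w:rPr` in the pretty-printed comments part. The helper below is a hypothetical sketch, not part of pandoc's test suite; it assumes input formatted one tag per line (as `xmllint --format` produces) and the conventional `w:` prefix, so it is a heuristic rather than a schema check.

```shell
#!/bin/sh
# Hypothetical regression check: report any w:annotationRef that appears
# inside a w:rPr block. Assumes one tag per line and the "w:" prefix.
# Exits with status 1 if any misplaced element is found, 0 otherwise.
find_misplaced_annotation_refs() {
  awk '
    FNR == 1 { depth = 0 }                  # reset at the start of each file
    /<w:rPr[ >]/ { depth++ }                # opening <w:rPr> (not <w:rPr/>)
    index($0, "</w:rPr>") { depth-- }
    depth > 0 && index($0, "<w:annotationRef") {
      print FILENAME ":" FNR ": " $0
      bad++
    }
    END { exit (bad > 0) }
  ' "$@"
}
```

For example, `find_misplaced_annotation_refs ./tmp/comments-pretty.xml` prints each offending line with its line number.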
@edwintorok
Contributor Author

The one about 'Ignorable' seems to be a false positive. Handling mc:Ignorable is the job of the markup-compatibility processor, and it is not normally part of the XSD (see https://learn.microsoft.com/en-us/dotnet/desktop/wpf/advanced/mc-ignorable-attribute?view=netframeworkdesktop-4.8). It looks like wml.xsd was modified to accept it for document.xml but not for the other parts; this diff to wml.xsd makes it accept it for the others too:

diff --git a/schemas/ISO-IEC29500-4_2016/wml.xsd b/schemas/ISO-IEC29500-4_2016/wml.xsd
index 6fa33d9..8b456f4 100644
--- a/schemas/ISO-IEC29500-4_2016/wml.xsd
+++ b/schemas/ISO-IEC29500-4_2016/wml.xsd
@@ -1816,6 +1816,7 @@
     <xsd:attribute name="hAnsiTheme" type="ST_Theme"/>
     <xsd:attribute name="eastAsiaTheme" type="ST_Theme"/>
     <xsd:attribute name="cstheme" type="ST_Theme"/>
+
   </xsd:complexType>
   <xsd:group name="EG_RPrBase">
     <xsd:choice>
@@ -3044,6 +3045,7 @@
       <xsd:element name="targetScreenSz" type="CT_TargetScreenSz" minOccurs="0"/>
       <xsd:element name="saveSmartTagsAsXml" type="CT_OnOff" minOccurs="0"/>
     </xsd:sequence>
+        <xsd:attribute ref="mc:Ignorable" use="optional" />
   </xsd:complexType>
   <xsd:simpleType name="ST_FrameScrollbar">
     <xsd:restriction base="xsd:string">
@@ -3194,6 +3196,7 @@
       <xsd:element name="num" type="CT_Num" minOccurs="0" maxOccurs="unbounded"/>
       <xsd:element name="numIdMacAtCleanup" type="CT_DecimalNumber" minOccurs="0"/>
     </xsd:sequence>
+        <xsd:attribute ref="mc:Ignorable" use="optional" />
   </xsd:complexType>
   <xsd:simpleType name="ST_TblStyleOverrideType">
     <xsd:restriction base="xsd:string">
@@ -3285,6 +3288,7 @@
       <xsd:element name="latentStyles" type="CT_LatentStyles" minOccurs="0" maxOccurs="1"/>
       <xsd:element name="style" type="CT_Style" minOccurs="0" maxOccurs="unbounded"/>
     </xsd:sequence>
+        <xsd:attribute ref="mc:Ignorable" use="optional" />

And the comments are fixed by the linked PR.
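An alternative to patching wml.xsd, sketched here under stated assumptions, is to strip the attribute from a part before validating it, since processing mc:Ignorable belongs to a markup-compatibility preprocessor rather than to the schema. The function name `strip_mc_ignorable` is hypothetical; it assumes the `mc` prefix and double-quoted attribute values, and it is not a full MC preprocessor: elements from the namespaces listed as ignorable stay in place and may still fail validation.

```shell
#!/bin/sh
# Hypothetical pre-validation filter: print an XML part with every
# mc:Ignorable="..." attribute removed, leaving everything else untouched.
# Assumes the "mc" prefix and double-quoted values on the same line.
strip_mc_ignorable() {
  sed -e 's/[[:space:]]mc:Ignorable="[^"]*"//g' "$1"
}
```

For example, `strip_mc_ignorable "$tmpdir/fontTable.xml" > stripped.xml` followed by `xmllint --noout --schema "$SCHEMA" stripped.xml` would check the font table without the attribute.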

jgm pushed a commit that referenced this issue Dec 18, 2023
annotationRef is not valid for `w:rPr`, only for `w:r` according to
wml.xsd.

See #9269

Signed-off-by: Edwin Török <edwin@etorok.net>
@jgm
Owner

jgm commented Dec 18, 2023

It would be nice if we could get `make validate-docx-golden-tests` to return a clean exit code. However, when I tried patching with the diff above, I still got one failure relating to Ignorable.

edwintorok added a commit to edwintorok/pandoc that referenced this issue Dec 19, 2023
`make validate-docx-golden-tests` now passes

Fixes jgm#9269

Signed-off-by: Edwin Török <edwin@etorok.net>
@edwintorok
Contributor Author

I pushed a workaround in the linked PR; the diff I pasted was probably truncated.

Many thanks for your help in reviewing, fixing, and merging these validation PRs.

@devoidfury

I upstreamed your changes to the validator, thanks! I also noted the issue with rels; I believe it is due to the definition that is commented out on line 2 of wml-2010.xsd, if anyone wants to pull on that thread.
