Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make XMLInputFactory configurable. #77

Closed
tan9 opened this issue Sep 5, 2023 · 5 comments · Fixed by #78
Closed

Make XMLInputFactory configurable. #77

tan9 opened this issue Sep 5, 2023 · 5 comments · Fixed by #78

Comments

@tan9
Copy link
Contributor

tan9 commented Sep 5, 2023

We require the XMLInputFactory configuration capability as described in #39.

Context

Our application is deployed on JBoss EAP, which incorporates Woodstox as its StAX implementation. While attempting to parse .ods files with a Dtd declared in metadata.xml (which was generated by using JasperReports) like the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE manifest:manifest PUBLIC "-//OpenOffice.org//DTD Manifest 1.0//EN" "Manifest.dtd">
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
    ...

We encountered the following exception:

com.github.miachm.sods.NotAnOdsException: (previously java.io.FileNotFoundException) /Manifest.dtd (No such file or directory)
 at [row,col {unknown-source}]: [2,92]

	at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:634)
	at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:504)
	at com.ctc.wstx.sr.ValidatingStreamReader.findDtdExtSubset(ValidatingStreamReader.java:464)
	at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:326)
	at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3466)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2089)
	at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1180)
	at com.github.miachm.sods.XmlReaderInstanceEventImpl.nextElement(XmlReaderInstanceEventImpl.java:46)
	at com.github.miachm.sods.OdsReader.processContent(OdsReader.java:86)
      ...

Please note that the exception above occurred due to this StAX implementation trying to validate the XML against the missed DTD resource, and this issue has been encountered during the process of reading and parsing the .ods file.

Workaround

I can verify that when we follow the provided instructions to configure Woodstox not to perform validation, the .ods file can be successfully read:

--- a/src/com/github/miachm/sods/XmlReaderEventImpl.java	(revision d36c400d804f7eb4f7103addeb7533d6faa5042c)
+++ b/src/com/github/miachm/sods/XmlReaderEventImpl.java	(date 1693920428753)
@@ -13,6 +13,7 @@
     @Override
     public XmlReaderInstanceEventImpl load(InputStream in) throws IOException {
         try {
+            inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
             reader = inputFactory.createXMLStreamReader(in);
             // Skip start of document
             try {

Discussion

Exploring ways to extend XMLInputFactory customization:

  1. One approach is to adopt the suggestion outlined in XMLParser problem when using poi-on-android and SODS at the same time #39 (comment)
https://github.com/miachm/SODS/issues/39#issuecomment-1039036961
  1. Another strategy involves utilizing the ServiceLoader mechanism as demonstrated below:
        // Try to load XmlReader or use default XmlReaderEventImpl
        ServiceLoader<XmlReader> loader = ServiceLoader.load(XmlReader.class);
        Iterator<XmlReader> iterator = loader.iterator();
        if (iterator.hasNext()) {
            reader = iterator.next();
        } else {
            reader = new XmlReaderEventImpl();
        }
  1. Or we can simply apply the mentioned patch directly to set XMLInputFactory.SUPPORT_DTD to true. This adjustment will ensure that JasperReports-generated ODS files function correctly on JBoss EAP without affecting any existing usages.

  2. Are there any alternative approaches that should be considered?

@github-actions
Copy link

github-actions bot commented Sep 5, 2023

Thanks for your report! Please ensure you have provided enough info in order to recreate the issue, including the problematic ODS File.

@miachm
Copy link
Owner

miachm commented Sep 10, 2023

Looks like the 3rd option is the best one. That's assuming this flag doesn't affect the existing usage. I guess not, otherwise we should be able to see it at unit tests.

Nevertheless, you said the flag should be set to true. But previously, in your workaround, you set the flag to false... Should it not be false?

@tan9
Copy link
Contributor Author

tan9 commented Sep 13, 2023

Looks like the 3rd option is the best one. That's assuming this flag doesn't affect the existing usage. I guess not, otherwise we should be able to see it at unit tests.

I've submitted a pull request to address this change: #78

Nevertheless, you said the flag should be set to true. But previously, in your workaround, you set the flag to false... Should it not be false?

A correction is needed: to disable validation, set it to false. Additionally, this prevents SODS from being vulnerable to an XXE attack.

@tan9
Copy link
Contributor Author

tan9 commented Sep 13, 2023

However, #78 causes #80.

@tan9
Copy link
Contributor Author

tan9 commented Sep 13, 2023

@miachm I've rebased the PR onto the master branch.

miachm added a commit that referenced this issue Sep 13, 2023
Fixes #77 Disable DTD support on XMLInputFactory.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants