Parsing of XML content with input fields > 512kb fails #551

Closed
jodue opened this Issue Feb 6, 2017 · 5 comments

Comments

Projects
None yet
2 participants
@jodue
Contributor

jodue commented Feb 6, 2017

I have test xml data with a binary resource which is > 512kb. When HAPI-FHIR tries to parse this an exception is thrown because the utilized XML woodstox library has 512kb as a default max size for a single attribute.

This issue has already been fixed some time ago but now this happens again because there is a check if the WstxInputFactory is used that does not work correctly any more:

XMLUtil.java line 1618:
if (inputFactory instanceof com.ctc.wstx.stax.WstxInputFactory)
This does evaluate to false as inputFactory is wrapped "__redirected.__XMLInputFactory@614ab92" but the real factory object is indeed "actual: WstxInputFactory"

jamesagnew added a commit that referenced this issue Feb 6, 2017

@jamesagnew

This comment has been minimized.

Show comment
Hide comment
@jamesagnew

jamesagnew Feb 6, 2017

Owner

Hi @jodue , would you be able to share more information about your setup?

This is working fine on my end (and I've committed a unit test just to be sure). I'm wondering if you have a different version of Woodstox on your classpath or something like that?

Owner

jamesagnew commented Feb 6, 2017

Hi @jodue , would you be able to share more information about your setup?

This is working fine on my end (and I've committed a unit test just to be sure). I'm wondering if you have a different version of Woodstox on your classpath or something like that?

@jodue

This comment has been minimized.

Show comment
Hide comment
@jodue

jodue Feb 7, 2017

Contributor

Hi James,

thanks for looking into this! Woodstox is added as a maven dependency by HAPI-FHIR, woodstox-core-asl-4.4.1.jar so this should be correct i guess.

We are running HAPI_FHIR in an JBoss EAP 7 which seems to somehow wrap this factory class which is why the call to XMLInputFactory.newInstance(); returns this "__redirected.__XMLInputFactory". As far as i could find out this should affect all JBoss versions and not only EAP 7.

An easy solution to this would be to remove the instanceof check as adding the two properties to other implementations would not really have any downside.

Since the HAPI-FHIR build currently seems to be broken i am unable to fix/test this but if you wish i could look into this when the build is running again!

Thank you!
Johannes

Contributor

jodue commented Feb 7, 2017

Hi James,

thanks for looking into this! Woodstox is added as a maven dependency by HAPI-FHIR, woodstox-core-asl-4.4.1.jar so this should be correct i guess.

We are running HAPI_FHIR in an JBoss EAP 7 which seems to somehow wrap this factory class which is why the call to XMLInputFactory.newInstance(); returns this "__redirected.__XMLInputFactory". As far as i could find out this should affect all JBoss versions and not only EAP 7.

An easy solution to this would be to remove the instanceof check as adding the two properties to other implementations would not really have any downside.

Since the HAPI-FHIR build currently seems to be broken i am unable to fix/test this but if you wish i could look into this when the build is running again!

Thank you!
Johannes

@jamesagnew

This comment has been minimized.

Show comment
Hide comment
@jamesagnew

jamesagnew Feb 7, 2017

Owner

Hi Johannes,

Unfortunately we can't set those properties blindly.. we used to do exactly that but this caused issues on Glassfish, where a non-woodstox StAX provider is used by default (SJSXP).. on that platform having unrecognized properties caused the parser to refuse to work.

I wonder if there is any way other than the instanceof test that could be used to detect that the parser is woodstox. It might be worth asking on one of the JBoss list I suppose, I'm not really sure why they are using a repackaged woodstox in the first place.

One option might be something like:

boolean isWoodstox = inputFactory.getProperty("org.codehaus.stax2.implVersion") != null;

You'd want to wrap that in a try-catch, but that could work.

Ps, build is passing again!

Owner

jamesagnew commented Feb 7, 2017

Hi Johannes,

Unfortunately we can't set those properties blindly.. we used to do exactly that but this caused issues on Glassfish, where a non-woodstox StAX provider is used by default (SJSXP).. on that platform having unrecognized properties caused the parser to refuse to work.

I wonder if there is any way other than the instanceof test that could be used to detect that the parser is woodstox. It might be worth asking on one of the JBoss list I suppose, I'm not really sure why they are using a repackaged woodstox in the first place.

One option might be something like:

boolean isWoodstox = inputFactory.getProperty("org.codehaus.stax2.implVersion") != null;

You'd want to wrap that in a try-catch, but that could work.

Ps, build is passing again!

@jodue

This comment has been minimized.

Show comment
Hide comment
@jodue

jodue Feb 7, 2017

Contributor

Hi James,

that is unfortunate. One would think that it would just ignore unknown properties ...

I now built HAPI-FHIR locally with the proposed fix (see attached diff) and it works as expected. I can also send a pull request if you like!

Thank you!
woodstox.txt

Contributor

jodue commented Feb 7, 2017

Hi James,

that is unfortunate. One would think that it would just ignore unknown properties ...

I now built HAPI-FHIR locally with the proposed fix (see attached diff) and it works as expected. I can also send a pull request if you like!

Thank you!
woodstox.txt

jamesagnew added a commit that referenced this issue Mar 17, 2017

Merge pull request #571 from jodue/master
Pull request for issue #551 to fix parsing of input fields > 512kb with JBoss
@jamesagnew

This comment has been minimized.

Show comment
Hide comment
@jamesagnew

jamesagnew Mar 17, 2017

Owner

PR merged in.

Owner

jamesagnew commented Mar 17, 2017

PR merged in.

@jamesagnew jamesagnew closed this Mar 17, 2017

jamesagnew added a commit that referenced this issue Mar 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment