NumberFormatException when extracting text from docx file #148
Comments
Please put the docx somewhere I can look at it.
Sorry, I forgot it. You can download the file here: https://drive.google.com/file/d/0B6qA3QZEFwTKaXdlNE9PRGJhRVU/view?usp=sharing
@plutext Any updates? I had a very similar issue with the latest version of docx4j:
Please post your docx at http://ndoc.it. Which version of docx4j? Generally, such issues are handled by the code at https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/jaxb/mc-preprocessor.xslt#L89
Another example attached. In this case, it's triggered by the decimal value of
According to the schema, w:space should be of type Stack trace follows.
Should be fixed by bc652c5. Will be in a new release this week. Anybody else who encounters a similar issue but on some other attribute, please open your own issue, clearly showing what XML structure is at issue.
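For anyone stuck on an older release, one possible stopgap is to normalise the offending attribute before the XML is unmarshalled. The sketch below is only an illustration of that idea using JDK classes: it is not the code from bc652c5 or from mc-preprocessor.xslt. The class name `SpaceAttributeNormalizer`, its methods, and the sample `w:cols` element are assumptions made up for the example, and how the cleaned XML would be handed back to docx4j is left open.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.math.RoundingMode;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
public class SpaceAttributeNormalizer {

    // WordprocessingML main namespace (the usual "w" prefix).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Rewrites decimal-valued w:space attributes (e.g. "720.0") as integers.
    // Returns the number of attributes that were changed.
    static int normalize(Document doc) {
        int changed = 0;
        NodeList all = doc.getElementsByTagNameNS("*", "*");
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            Attr space = el.getAttributeNodeNS(W_NS, "space");
            if (space == null) {
                continue;
            }
            String raw = space.getValue().trim();
            // Only touch values that carry a fractional part.
            if (raw.matches("-?\\d+\\.\\d+")) {
                String fixed = new BigDecimal(raw)
                        .setScale(0, RoundingMode.HALF_UP)
                        .toPlainString();
                space.setValue(fixed);
                changed++;
            }
        }
        return changed;
    }

    // Parses an XML string into a namespace-aware DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up fragment for illustration only.
        String xml = "<w:cols xmlns:w=\"" + W_NS + "\" w:space=\"720.0\"/>";
        Document doc = parse(xml);
        System.out.println("attributes changed: " + normalize(doc));
        System.out.println("w:space is now: "
                + doc.getDocumentElement().getAttributeNS(W_NS, "space"));
    }
}
```

Rounding half up is an arbitrary choice here; truncating would work just as well for values like "720.0" where the fractional part is zero.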
I'm extracting text from a docx file using TextUtils.extractText(Object o, Writer w). For a certain document (generated with an older version of Google Docs) I get this exception:
2015-06-21 05:55:14,999 ERROR openpackaging.parts.JaxbXmlPartXPathAware - For input string: "9360.0" [DefaultQuartzScheduler_Worker-10] {}
java.lang.NumberFormatException: For input string: "9360.0"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:492)
    at java.math.BigInteger.<init>(BigInteger.java:338)
    at java.math.BigInteger.<init>(BigInteger.java:476)
    at com.sun.xml.internal.bind.DatatypeConverterImpl._parseInteger(DatatypeConverterImpl.java:72)
    at com.sun.xml.internal.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$21.parse(RuntimeBuiltinLeafInfoImpl.java:766)
    at com.sun.xml.internal.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$21.parse(RuntimeBuiltinLeafInfoImpl.java:764)
    at com.sun.xml.internal.bind.v2.runtime.reflect.TransducedAccessor$CompositeTransducedAccessorImpl.parse(TransducedAccessor.java:230)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StructureLoader.startElement(StructureLoader.java:194)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:486)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:465)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:60)
    at com.sun.xml.internal.bind.v2.runtime.unmarshaller.SAXConnector.startElement(SAXConnector.java:135)
    at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:229)
    at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:266)
    at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:235)
    at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:266)
    at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:235)
    at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:266)
    at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:235)
    at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:266)
    at com.sun.xml.internal.bind.unmarshaller.DOMScanner.visit(DOMScanner.java:235)
    at com.sun.xml.internal.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:112)
    at com.sun.xml.internal.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:95)
    at com.sun.xml.internal.bind.unmarshaller.DOMScanner.scan(DOMScanner.java:88)
    at com.sun.xml.internal.bind.v2.runtime.BinderImpl.associativeUnmarshal(BinderImpl.java:146)
    at com.sun.xml.internal.bind.v2.runtime.BinderImpl.unmarshal(BinderImpl.java:117)
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unwrapUsually(JaxbXmlPartXPathAware.java:283)
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:333)
    at org.docx4j.openpackaging.parts.JaxbXmlPart.getContents(JaxbXmlPart.java:147)
Is there a way to prevent this exception?
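The top frames of the trace show the value "9360.0" reaching a java.math.BigInteger constructor, which only accepts an optional sign followed by digits. That part can be reproduced with the JDK alone, independent of docx4j and JAXB. The snippet below is a minimal sketch of it; the class `DecimalAttributeDemo` and the helper `toInteger` are hypothetical names used for illustration, and going through BigDecimal is just one way to tolerate such a value, not something docx4j is known to do.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

// Hypothetical demo class, not part of docx4j.
public class DecimalAttributeDemo {

    // Returns true if BigInteger can parse the string directly.
    static boolean parsesAsBigInteger(String raw) {
        try {
            new BigInteger(raw);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Tolerant conversion: parse as a decimal, then round to a whole number.
    static BigInteger toInteger(String raw) {
        return new BigDecimal(raw.trim())
                .setScale(0, RoundingMode.HALF_UP)
                .toBigIntegerExact();
    }

    public static void main(String[] args) {
        String raw = "9360.0"; // the value from the log message above
        System.out.println("parses as BigInteger: " + parsesAsBigInteger(raw));
        System.out.println("via BigDecimal: " + toInteger(raw));
    }
}
```

So the exception itself is expected JDK behaviour for a decimal string; the underlying problem is that the document stores a decimal where an integer is expected.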