Spaces in type names lead to NumberFormatException (decimal to long) #326

Open
marhop opened this issue Aug 2, 2022 · 7 comments

@marhop

marhop commented Aug 2, 2022

Hi,

Description: I tried to load a SIARD 2.1 file (created with SiardFromDb 2.1.120 (SIARD Suite) from a MySQL 5.5.5-10.1.37-MariaDB-0+deb9u1 DBMS) with DBPTK Desktop 2.6.0 and got stuck when preparing for browse. In ~/.dbvtk/log/dbvtk.log the following error occurred:

2022-08-02 08:33:20,898 [http-nio-auto-1-exec-4] WARN  c.d.c.s.i.DatabaseRowsSolrManager - Could not insert a document batch in collectiondbv-database-dc688e07-cfc9-4673-9bab-d9d12a198035. Last response (if any): null
org.apache.solr.common.SolrException: ERROR: [doc=570] Error adding field 'col18_l'='237869.25' msg=For input string: "237869.25"
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:224)
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:100)
	at org.apache.solr.update.AddUpdateCommand.lambda$null$0(AddUpdateCommand.java:261)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)
	at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:295)
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:207)
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:162)
	at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:301)
	at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:200)
	at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1464)
	at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:967)
	at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:342)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:294)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
	at org.apache.solr.update.processor.RunUpdateProcessorFactory$RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:73)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.NestedUpdateProcessorFactory$NestedUpdateProcessor.processAdd(NestedUpdateProcessorFactory.java:79)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:263)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:502)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:343)
	at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:343)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:229)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:481)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:75)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
	at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:92)
	at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:344)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:292)
	at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:338)
	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:245)
	at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303)
	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
	at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:196)
	at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:131)
	at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:122)
	at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:82)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:227)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
	at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:177)
	at com.databasepreservation.common.server.index.DatabaseRowsSolrManager.insertDocument(DatabaseRowsSolrManager.java:387)
	at com.databasepreservation.common.server.index.DatabaseRowsSolrManager.addRow(DatabaseRowsSolrManager.java:160)
	at com.databasepreservation.modules.viewer.DbvtkExportModule.handleDataRow(DbvtkExportModule.java:134)
	at com.databasepreservation.model.modules.filters.IdentityFilter.handleDataRow(IdentityFilter.java:88)
	at com.databasepreservation.model.modules.filters.ObservableFilter.handleDataRow(ObservableFilter.java:123)
	at com.databasepreservation.modules.siard.in.content.SIARD2ContentImportStrategy.endElement(SIARD2ContentImportStrategy.java:374)
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
	at org.apache.xerces.impl.xs.XMLSchemaValidator.endElement(Unknown Source)
	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at com.databasepreservation.modules.siard.in.content.SIARD2ContentImportStrategy.importContent(SIARD2ContentImportStrategy.java:184)
	at com.databasepreservation.modules.siard.in.input.SIARDImportDefault.migrateDatabaseTo(SIARDImportDefault.java:64)
	at com.databasepreservation.DatabaseMigration.migrate(DatabaseMigration.java:123)
	at com.databasepreservation.common.server.controller.SIARDController.convertSIARDtoSolr(SIARDController.java:706)
	at com.databasepreservation.common.server.controller.SIARDController.loadFromLocal(SIARDController.java:669)
	at com.databasepreservation.common.api.v1.CollectionResource.createCollection(CollectionResource.java:197)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
	at org.glassfish.jersey.servlet.ServletContainer.serviceImpl(ServletContainer.java:386)
	at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:561)
	at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:502)
	at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:439)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
	at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
	at org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter.doFilterInternal(WebMvcMetricsFilter.java:96)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
	at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:197)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:135)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:360)
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:399)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:890)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1743)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
	at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NumberFormatException: For input string: "237869.25"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Long.parseLong(Long.java:589)
	at java.lang.Long.parseLong(Long.java:631)
	at org.apache.solr.schema.LongPointField.createField(LongPointField.java:154)
	at org.apache.solr.schema.PointField.createFields(PointField.java:251)
	at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:65)
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:179)
	... 140 common frames omitted

The relevant column definition in header/metadata.xml:

<column>
    <name>...</name>
    <type>DECIMAL(13, 2)</type>
    <typeOriginal>decimal</typeOriginal>
    <description>...</description>
</column>

The corresponding column definition in content/schema0/table5/table5.xsd:

<xs:element minOccurs="0" name="c19" type="xs:decimal"/>

An example entry in content/schema0/table5/table5.xml:

<c19>237869.25</c19>

Apparently dbptk tries to turn a floating point decimal into a long. Is that intentional?
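
For illustration, the `Caused by` frames above show `Long.parseLong` being handed the string `237869.25`. The following is a minimal standalone sketch of that failure, assuming nothing about DBPTK or Solr internals beyond what the trace shows; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical demo class; not part of DBPTK or Solr.
class DecimalAsLongDemo {

    // Returns true if Long.parseLong accepts the string, false if it
    // throws NumberFormatException (as in the stack trace above).
    static boolean parsesAsLong(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "237869.25"; // the value from the log above

        // Long.parseLong only accepts an optional sign followed by digits.
        System.out.println(parsesAsLong(value)); // prints false

        // BigDecimal parses the same string and keeps the two decimal places.
        System.out.println(new BigDecimal(value).scale()); // prints 2
    }
}
```

So the value itself is a valid `xs:decimal`; the failure only says that the target field was treated as a long.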

Steps required to reproduce the bug:

  1. Create SIARD file as described above.
  2. Load into DBPTK Desktop.
  3. Try browsing.

Attach the dbptk-app.log.txt file below. → Could not find that file on my system, sorry ...

Cheers,
Martin

@luis100
Member

luis100 commented Aug 2, 2022

Hello, the error refers to col18 ('col18_l'='237869.25'), but your report of the datatype seems to refer to c19. Could you check the data again to ensure your report is correct?

@luis100 luis100 transferred this issue from keeps/dbptk-developer Aug 2, 2022
@marhop
Author

marhop commented Aug 2, 2022

Ah yeah, I thought that was just dbptk counting from zero. :-) Will check again.

@luis100
Member

luis100 commented Aug 2, 2022

This may be a column renumbering issue (expected when loading from SIARD Suite into DBPTK Desktop), fixed in #316, to be released in 2.6.1.

@luis100
Member

luis100 commented Aug 2, 2022

We do start with 0, and SIARD starts with 1. So your report might be correct. But this has not happened before. Please check if this report is correct. Also, if you could mock an example that would reproduce the issue it would help immensely.
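
To make the offset concrete, here is a minimal sketch, assuming SIARD column tags have the form `cN` with N starting at 1 and the viewer numbers columns from 0. The helper name is hypothetical and not taken from the DBPTK code base.

```java
// Hypothetical helper; the names are illustrative, not DBPTK's API.
class ColumnIndexDemo {

    // Converts a 1-based SIARD column tag such as "c19" to a 0-based index.
    static int zeroBasedIndex(String siardTag) {
        if (!siardTag.matches("c[1-9][0-9]*")) {
            throw new IllegalArgumentException("Not a SIARD column tag: " + siardTag);
        }
        return Integer.parseInt(siardTag.substring(1)) - 1;
    }

    public static void main(String[] args) {
        // c19 in the SIARD file corresponds to col18 in the log message.
        System.out.println(zeroBasedIndex("c19")); // prints 18
    }
}
```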

@marhop
Author

marhop commented Aug 2, 2022

Double-checked right now, there seems indeed to be an offset of 1 between the column numbers in the SIARD file and in the log ...

I'll try to put together a minimal example.

@marhop
Author

marhop commented Aug 2, 2022

Well now, that was tricky.

I created a minimal example, and because I did not have the SIARD Suite at hand I just created it with DBPTK Desktop 2.6.0 - no problems at all, I could browse it perfectly fine (see attached file a.siard.zip, added the zip extension so GitHub would let me upload it).

So I compared my example file to the file created by the SIARD Suite (the one that raised the error above). Where DBPTK puts this into header/metadata.xml

<type>DECIMAL(13,2)</type>
<typeOriginal>DECIMAL(13,2)</typeOriginal>

the SIARD Suite writes this:

<type>DECIMAL(13, 2)</type>
<typeOriginal>decimal</typeOriginal>

But contrary to what I thought first, it's not the obvious difference in the typeOriginal element that leads to problems - it's the space in (13, 2)! When I changed the entry to

<type>DECIMAL(13, 2)</type>
<typeOriginal>DECIMAL(13, 2)</typeOriginal>

that is, just added a space char in the type name (see attached file b.siard.zip), browsing the SIARD file raised the NumberFormatException.

I can't judge how lenient the type name parsing should be, but since the SIARD Suite wrote those spaces at least at one point in history (or still writes them, I haven't checked), maybe you could make your parser a little more flexible to increase compatibility?
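
As a rough idea of what "a little more flexible" could mean, here is a minimal sketch of a whitespace-tolerant match for `DECIMAL` type names. It is an assumption-laden illustration only: the class, method, and return convention are hypothetical, and DBPTK's real SIARD type parser may look quite different.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not DBPTK's actual parser.
class LenientTypeParser {

    // Accepts DECIMAL, DECIMAL(p) and DECIMAL(p,s), ignoring case and
    // allowing whitespace around the parentheses, digits and comma.
    private static final Pattern DECIMAL = Pattern.compile(
        "\\s*DECIMAL\\s*(?:\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\))?\\s*",
        Pattern.CASE_INSENSITIVE);

    // Returns {precision, scale}, or null if the name is not a DECIMAL type.
    // A missing precision is reported as -1, a missing scale as 0.
    static int[] parseDecimal(String typeName) {
        Matcher m = DECIMAL.matcher(typeName);
        if (!m.matches()) {
            return null;
        }
        int precision = m.group(1) == null ? -1 : Integer.parseInt(m.group(1));
        int scale = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        return new int[] {precision, scale};
    }

    public static void main(String[] args) {
        // Both spellings from the two SIARD files give the same result.
        System.out.println(Arrays.toString(parseDecimal("DECIMAL(13,2)")));  // prints [13, 2]
        System.out.println(Arrays.toString(parseDecimal("DECIMAL(13, 2)"))); // prints [13, 2]
    }
}
```

With a match like this, the space after the comma no longer changes which type the column is mapped to.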

PS: Again, there was an offset between the column number in the SIARD file (c1) and the one reported in the ~/.dbvtk/log/dbvtk.log file (col0_l).

@marhop marhop changed the title from "NumberFormatException when converting from decimal to long" to "Spaces in type names lead to NumberFormatException (decimal to long)" Aug 2, 2022
@luis100
Member

luis100 commented Aug 2, 2022

Parsing should be lenient, validation should be strict. This would then be marked as an enhancement, and maybe transferred back to dbptk-developer, as that is the part of the logic that parses SIARD into the intermediate data model.
