You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jun 20, 2023. It is now read-only.
I am having trouble indexing some of our documents. They are sometimes over 20Mb.
I receive the following exception.
[2014-12-08 10:48:43,375][WARN ][org.apache.pdfbox.util.PDFStreamEngine] java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:364)
at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:106)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:143)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.Tika.parseToString(Tika.java:421)
at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:389)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:648)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:501)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:542)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:491)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:397)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:442)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:157)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:535)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:434)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Can this be due to not enough memory allocated to Tika ?
How can I allocate more memory to ES so that Tika can benefit from it too ?
Thanks for any hints or help
The text was updated successfully, but these errors were encountered:
I wonder if this has been fixed in more recent version of Tika. We are going to release the plugin with Tika 1.7 (was 1.5). If you still use it, could you test with the new version and reopen the issue if you still get NPE?
BTW are you using an old version of the plugin? Because we now catch a Throwable so you should not see that kind of log.
Closing. Feel free to reopen if needed and/or add more details.
Hi Guys,
I am having trouble indexing some of our documents. They are sometimes over 20Mb.
I receive the following exception.
[2014-12-08 10:48:43,375][WARN ][org.apache.pdfbox.util.PDFStreamEngine] java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:364)
at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:106)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:143)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.Tika.parseToString(Tika.java:421)
at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:389)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:648)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:501)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:542)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:491)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:397)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:442)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:157)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:535)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:434)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Can this be due to not enough memory allocated to Tika ?
How can I allocate more memory to ES so that Tika can benefit from it too ?
Thanks for any hints or help
The text was updated successfully, but these errors were encountered: