java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [content] not present as part of path [attachment.content] with large pdf files #393
Just for the record: I am running NC 14 (checked with $ php occ status).
Have you run some tests before the first index?
Yes I did: $ php occ fulltextsearch:test (Testing your current setup: ...)
Can you reset, test, and re-index?
I also did some tests on my side, and Elasticsearch returns an error 'Request Entity Too Large'. I would say that PDF files bigger than ~70 MB will not be indexed by Elasticsearch. Would you send me the PDF that crashes your index?
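(Not stated in the thread, but one plausible explanation for the ~70 MB cutoff: the ingest-attachment pipeline receives the file base64-encoded inside a JSON request body, and Elasticsearch rejects bodies larger than http.max_content_length, which defaults to 100 MB. A rough sketch of that arithmetic, with the 4/3 base64 inflation factor as the only assumption:)

```python
# Why a ~70 MB PDF can trip Elasticsearch's 100 MB request-size limit:
# base64 encoding inflates the payload by a factor of 4/3 (plus some
# JSON overhead, ignored here).

def base64_payload_mb(file_mb: float) -> float:
    """Approximate size of the base64-encoded request body, in MB."""
    return file_mb * 4 / 3

for size in (50, 70, 80, 170):
    body = base64_payload_mb(size)
    verdict = "fits" if body < 100 else "rejected (Request Entity Too Large)"
    print(f"{size} MB file -> ~{body:.0f} MB request body: {verdict}")
```

So files around 75 MB and up would cross the default limit, which lines up roughly with the ~70 MB boundary reported here.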
@tomthecat using '@'s is how you contact people on GitHub, please be a little careful with them.
@tomthecat I haven't received any email; if you host the file, can you send me the link to maxence@nextcloud.com?
daita: Did you receive my mail?
Yup, if we're talking about a 170 MB PDF? :-)
What do you think: is there a chance to make fulltextsearch skip these files rather than abort?
should be fixed in 1.0.2 |
Updated to 1.0.2. Seems a bit better now, but I still get errors on very large PDF files and also on a large PPT.
The error I have is a BadRequest400Exception from Elasticsearch, which is typical for PDF files bigger than 70 MB. Have you changed anything in the configuration of your ES?
Nope. I followed the instructions given here: https://fribeiro.org/tech/2018/02/07/nextcloud-full-text-elasticsearch/ and here: https://decatec.de/home-server/volltextsuche-in-nextcloud-mit-ocr/ (for Tesseract OCR) without any additional tweaking.
I get the same error with a PDF of ~3 MB but around 130 pages. I can also send you the PDF if needed.
I have the same error. Is it possible to configure fulltextsearch to skip these PDFs? We have a lot of PDFs bigger than 70 MB...
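(The skip-instead-of-abort behavior people are asking for could look roughly like this. This is a hypothetical sketch, not fulltextsearch's actual code: index_document() is a stand-in for whatever call pushes a file to Elasticsearch, and the 70 MB threshold is the empirical limit from this thread.)

```python
# Hypothetical workaround sketch: skip files over a size threshold so one
# oversized PDF cannot abort the whole indexing run.
import os

def index_document(path: str) -> None:
    """Stand-in for pushing one file to the search backend (hypothetical)."""

def index_tree(root: str, max_bytes: int = 70 * 1024 * 1024) -> list[str]:
    """Index every file under root; return the oversized paths that were skipped."""
    skipped = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > max_bytes:
                skipped.append(path)  # record it and continue instead of aborting
                continue
            index_document(path)
    return skipped
```

Returning the skipped list (rather than raising) lets the caller report which files were left out of the index.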
php occ fulltextsearch:index
stops indexing at PDF files with Exception: Elasticsearch\Common\Exceptions\ServerErrorResponseException
Message: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [content] not present as part of path [attachment.content]
I deleted the first PDF where indexing stopped and started the indexing command again. fulltextsearch stalled again on another PDF file, and again after I deleted that one too.
Common pattern: all the PDF files were larger than 70 MB.
Elasticsearch is running with 8 GB of RAM:
Active: active (running) since Mon 2018-10-08 16:14:13 CEST; 1h 10min ago
Docs: http://www.elastic.co
Main PID: 504 (java)
CGroup: /system.slice/elasticsearch.service
|-504 /bin/java -Xms8g -Xmx8g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -Xss1m -Djava...
`-807 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller
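(An aside not stated in this thread: the -Xms8g/-Xmx8g flags above size the JVM heap, but heap size does not govern the request-body limit that seems to cause the ~70 MB cutoff. If the culprit is indeed http.max_content_length, which defaults to 100mb, a possible workaround would be raising it in elasticsearch.yml and restarting the node. This is an assumption to verify, not a confirmed fix, and the 500mb value is an arbitrary example:)

```yaml
# elasticsearch.yml — speculative workaround: raise the HTTP body limit so
# base64-encoded large PDFs fit within a single request.
http.max_content_length: 500mb
```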
Latest apps installed (1.01) and configured.
Any hints on this? I would love to use fulltextsearch on my files...