This repository has been archived by the owner on Jun 28, 2022. It is now read-only.

The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING" #65

Open
HappyCoderMan opened this issue Sep 13, 2016 · 6 comments

@HappyCoderMan

I am trying to index Wikipedia from a local bz2 dump into a local Elasticsearch instance. It ran correctly for a long time, but then failed with this exception:
The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING"

This is what I ran:
java -jar stream2es wiki --target http://localhost:9200/testwiki --log debug create index http://localhost:9200/testwiki --source /home/testuser/enwiki-latest-pages-articles.xml.bz2


@drewr drewr added the bug label Sep 13, 2016
@sharonadar

+1

@HappyCoderMan
Author

I have looked for a workaround but have not had any success yet. I tried setting -DentityExpansionLimit on the command line with values of 2147480000 and 0. Both resulted in the same 50,000,000 limit error.

Example:
java -DentityExpansionLimit=2147480000 -jar stream2es-test.jar ...

@aholstenson

Instead of entityExpansionLimit, try jdk.xml.totalEntitySizeLimit (works for me using Java 8), or just totalEntitySizeLimit if that doesn't work. The problem is that secure processing is enabled by default, and it caps the accumulated size of all entities at 50,000,000. entityExpansionLimit controls the number of entity expansions instead, and you shouldn't need to adjust that when parsing a Wikipedia XML dump.
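For anyone who embeds the parser in their own Java code rather than passing -D flags, the same property can be set programmatically. The following is a minimal sketch, assuming the XML parser in use is the JDK's built-in JAXP implementation and that the property is set before any XML factory is created. The class name `EntityLimitConfig` and the method `applyDefault` are hypothetical names used only for illustration; they are not part of stream2es.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper, not part of stream2es or the JDK.
final class EntityLimitConfig {
    // System property named in the comment above.
    public static final String TOTAL_ENTITY_SIZE_LIMIT = "jdk.xml.totalEntitySizeLimit";

    /**
     * Sets the limit only when it was not already supplied (for example via -D),
     * and returns the value that is now in effect.
     */
    public static String applyDefault(String value) {
        if (System.getProperty(TOTAL_ENTITY_SIZE_LIMIT) == null) {
            System.setProperty(TOTAL_ENTITY_SIZE_LIMIT, value);
        }
        return System.getProperty(TOTAL_ENTITY_SIZE_LIMIT);
    }

    public static void main(String[] args) {
        // Assumption: set the property before creating any XML factory,
        // so the factory sees it when it initializes.
        String effective = applyDefault("0");
        XMLInputFactory factory = XMLInputFactory.newInstance();
        System.out.println(TOTAL_ENTITY_SIZE_LIMIT + "=" + effective
                + " (factory: " + factory.getClass().getName() + ")");
    }
}
```

Checking for an existing value first means a -D flag given on the command line still takes precedence over the in-code default.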

@HappyCoderMan
Author

Thank you very much for that suggestion. It appears to have worked. My Wikipedia index ran to 2.5X more documents than it did previously. (My run ran out of disk space and didn't complete, but that should be unrelated to this issue.)

@ourdark

ourdark commented Feb 6, 2017

nohup java -DentityExpansionLimit=2147480000 -DtotalEntitySizeLimit=2147480000 -Djdk.xml.totalEntitySizeLimit=2147480000 -Xmx2g -jar stream2es wiki --target http://es2:9200/en-wiki --source /mirror/enwiki-latest-pages-articles.xml.bz2 --log debug &
https://jira.atlassian.com/browse/JRA-62752?workflowName=JIRA+Bug+Workflow+w+Kanban+v6+-+Restricted&stepId=1

@slvher

slvher commented Feb 23, 2018

Thank you! @ourdark

When processing a huge XML file, you can also set these properties to 0 or a negative value such as -1, which indicates no limit, e.g. -DentityExpansionLimit=0 -DtotalEntitySizeLimit=0 -Djdk.xml.totalEntitySizeLimit=0

Reference: https://docs.oracle.com/javase/tutorial/jaxp/limits/limits.html
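To make the semantics discussed in this thread concrete, here is a toy model of an accumulated-size limit in which a value of 0 or less means "no limit". It is only an illustration of the behavior described above, not the JDK's actual implementation; the class `EntitySizeBudget` and its methods are hypothetical.

```java
// Toy model only: mimics "accumulated size exceeded limit" semantics.
// This is NOT the JDK's parser code.
final class EntitySizeBudget {
    private final long limit;   // <= 0 is treated as "no limit" in this model
    private long accumulated;   // running total of entity sizes seen so far

    public EntitySizeBudget(long limit) {
        this.limit = limit;
    }

    /** Adds one entity's size and fails once the running total exceeds the limit. */
    public void add(long entitySize) {
        accumulated += entitySize;
        if (limit > 0 && accumulated > limit) {
            throw new IllegalStateException(
                "accumulated size " + accumulated + " exceeded limit " + limit);
        }
    }

    public long accumulated() {
        return accumulated;
    }

    public static void main(String[] args) {
        EntitySizeBudget capped = new EntitySizeBudget(50_000_000L);
        capped.add(50_000_000L);            // exactly at the limit: still accepted
        try {
            capped.add(1);                  // one more pushes the total over the limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        EntitySizeBudget unlimited = new EntitySizeBudget(0);
        unlimited.add(50_000_001L);         // no limit, so this is accepted
        System.out.println("unlimited total: " + unlimited.accumulated());
    }
}
```

The model shows why a long-running import can succeed for hours and then fail: the total only grows, so the limit trips at whatever point the dump's cumulative entity content crosses it.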
