
IndexStatuses OOM for very large collections (i.e. Tweets2013) #17

Closed
isoboroff opened this issue Apr 17, 2013 · 4 comments

Comments

@isoboroff
Collaborator

```
2013-04-17 07:16:46,041 [main] INFO IndexStatuses - 276300000 statuses indexed
2013-04-17 07:17:10,442 [main] INFO IndexStatuses - 276400000 statuses indexed
2013-04-17 07:17:30,239 [main] INFO IndexStatuses - Total of 276485008 statuses added
2013-04-17 07:17:30,239 [main] INFO IndexStatuses - Merging segments...
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot flush
    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2908)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2901)
    at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1645)
    at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1621)
    at cc.twittertools.search.indexing.IndexStatuses.main(IndexStatuses.java:145)
```

After this error, the destination directory is empty, so we have to start from scratch.
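The thread does not say whether IndexStatuses commits anything before the final merge. If it only commits at the very end, a failure during `forceMerge` leaves no durable commit behind, which would be consistent with the empty directory. One common mitigation is to commit every N documents so a late crash loses at most one batch. The sketch below is a hypothetical, dependency-free helper (the names `Checkpointing`, `DocSink`, `Committer` and the batch size are assumptions, not twitter-tools code); its callbacks are meant to be bound to a Lucene 4.x-style `IndexWriter`'s `addDocument` and `commit` methods.

```java
import java.io.IOException;
import java.util.Iterator;

/**
 * Hypothetical commit-every-N helper (not part of twitter-tools). Assuming a
 * Lucene 4.x-style IndexWriter, the callbacks could be bound as
 *   indexWithCheckpoints(docs, writer::addDocument, writer::commit, 1_000_000)
 */
final class Checkpointing {

    /** Adds one document; may throw IOException like IndexWriter.addDocument. */
    @FunctionalInterface
    interface DocSink<T> {
        void add(T doc) throws IOException;
    }

    /** Makes all documents added so far durable, like IndexWriter.commit. */
    @FunctionalInterface
    interface Committer {
        void commit() throws IOException;
    }

    private Checkpointing() {}

    /**
     * Feeds every document to {@code sink}, calling {@code committer} after each
     * {@code commitEvery} documents and once more at the end.
     *
     * @return the number of documents added
     */
    static <T> long indexWithCheckpoints(Iterator<T> docs, DocSink<T> sink,
                                         Committer committer, long commitEvery) throws IOException {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        long count = 0;
        while (docs.hasNext()) {
            sink.add(docs.next());
            count++;
            if (count % commitEvery == 0) {
                committer.commit(); // durable checkpoint: survives a later crash
            }
        }
        committer.commit(); // final commit covers the trailing partial batch
        return count;
    }
}
```

With checkpoints in place, a crash during the final merge would leave the last committed index on disk rather than nothing, at the cost of some extra I/O per commit.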

Solution 1: increase the JVM heap settings in etc/run.sh.
Solution 2: make IndexStatuses handle memory better so it avoids the OOM in the first place?

@isoboroff
Collaborator Author

The crash is at line 156 in IndexStatuses.java. I'm not sure why we are left with an empty directory.

@isoboroff
Collaborator Author

Testing with -Xmx8g in run.sh.
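To confirm that the `-Xmx8g` flag actually reaches the JVM that run.sh launches, the program can log its own heap ceiling at startup. `Runtime.maxMemory()`, `totalMemory()` and `freeMemory()` are standard JDK calls; the `HeapCheck` class and the 7 GiB threshold are hypothetical, chosen only for illustration.

```java
/**
 * Hypothetical startup heap report (not part of twitter-tools): shows the
 * effective heap ceiling so a long run can confirm the -Xmx setting took effect.
 */
final class HeapCheck {
    private HeapCheck() {}

    /** Maximum heap the JVM will try to use; Long.MAX_VALUE if there is no limit. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently in use: memory allocated to the heap minus its free part. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** True if the configured heap ceiling is at least {@code minBytes}. */
    static boolean hasAtLeast(long minBytes) {
        return maxHeapBytes() >= minBytes;
    }

    /**
     * maxMemory() can report somewhat less than the -Xmx value on some JVMs,
     * so an "8 GB" check compares against a slightly lower bound (7 GiB here).
     */
    static boolean looksLikeEightGig() {
        return hasAtLeast(7L * 1024 * 1024 * 1024);
    }

    /** One-line summary suitable for a log message. */
    static String describe() {
        long mb = 1024L * 1024L;
        return String.format("max heap %d MB, used %d MB", maxHeapBytes() / mb, usedHeapBytes() / mb);
    }
}
```

Logging `HeapCheck.describe()` at startup, and refusing to start when `looksLikeEightGig()` is false, would catch a misconfigured run.sh in seconds instead of after many hours of indexing.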

@stewhdcs

Was that successful? Might a custom MergePolicy for the IndexWriter be required?
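One direction a tuned MergePolicy could take is to cap how large any single merged segment may get. Lucene's `TieredMergePolicy` exposes such a cap via `setMaxMergedSegmentMB`, and `IndexWriter.forceMerge(int maxNumSegments)` accepts a target segment count rather than requiring a single segment. The toy planner below only illustrates the size-cap idea: it greedily groups adjacent segments so no group exceeds the cap. It is NOT Lucene's actual selection algorithm, the `MergePlanner` name is hypothetical, and whether such a cap would have avoided this particular OOM is not established in the thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy merge planner (illustration only, not Lucene's TieredMergePolicy):
 * groups adjacent segments so that no merged group exceeds maxMergedBytes.
 */
final class MergePlanner {
    private MergePlanner() {}

    static List<List<Long>> planMerges(long[] segmentBytes, long maxMergedBytes) {
        if (maxMergedBytes <= 0) {
            throw new IllegalArgumentException("maxMergedBytes must be positive");
        }
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : segmentBytes) {
            // Close the current group if adding this segment would exceed the cap.
            if (!current.isEmpty() && currentBytes + size > maxMergedBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size); // a segment larger than the cap ends up alone in its group
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

For example, segments of sizes 4, 4, 4, 4 and 10 with a cap of 8 are planned as three merges, [4, 4], [4, 4] and [10], instead of one merge of everything.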

@isoboroff
Collaborator Author

Running with -Xmx8g was successful. The final index is 48GB for 276M statuses, and the run took about 19-20 hours on my old Mac Pro (4 processors, 32GB RAM). It would be nice if IndexStatuses were a little more robust under memory pressure, but that might be hard to achieve. At any rate, I'm going to close this issue and open a wishlist issue for memory handling in IndexStatuses.
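As a rough sanity check on the reported figures, the helper below computes index bytes per status and average throughput. It assumes "48GB" means 48 × 10^9 bytes (48 GiB gives about 186 bytes per status instead) and that the 19-20 hours covers the whole run, merge included; the `IndexStats` class is hypothetical.

```java
/** Hypothetical back-of-envelope helpers for index size and indexing throughput. */
final class IndexStats {
    private IndexStats() {}

    /** Average on-disk index bytes per indexed document. */
    static double bytesPerDoc(double indexBytes, long docs) {
        return indexBytes / docs;
    }

    /** Average documents processed per second over the whole run. */
    static double docsPerSecond(long docs, double hours) {
        return docs / (hours * 3600.0);
    }
}
```

With 276,485,008 statuses, `bytesPerDoc(48e9, 276_485_008L)` is about 173.6 bytes, and `docsPerSecond` gives about 3,840 statuses per second for 20 hours and about 4,042 for 19 hours.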
