OutOfMemoryException Building Index - Need to make the enumeration more lazy #31

Closed
roend83 opened this issue Nov 21, 2015 · 3 comments

Comments


roend83 commented Nov 21, 2015

I've set up an index using a SimpleDataIndexer that is trying to index data from a database table with around 2 million rows. I'm using Umbraco's database object to query the data, which appears to use a DataReader to read a row at a time. In my SimpleDataService, I'm looping through the objects returned from Umbraco and yielding a new SimpleDataSet for each one. Am I doing something wrong, or is indexing this much data just not supported?

@Shazwazza
Owner

Hi, yeah, it should support that, but the problem could be how the SimpleDataIndexer queues up its data for indexing. The way it currently works is that it looks up all of the data that needs to be indexed and serializes it to the Examine format (which is held in memory), and then the Examine indexing thread consumes the queue. Chances are that memory is filling up with millions of serialized items waiting to be indexed.

But it should be smarter and perform its data lookups in batches (let's say of 1000) and ensure each batch only starts enumerating at the time of indexing instead of queuing up in memory. This is possible now with a bit of work, but Examine should be managing this by default OOTB.
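
As a rough illustration of the batching idea above (a language-neutral sketch in Python, not Examine's actual implementation), a lazy batcher pulls rows from the source only when the consumer asks for the next batch. The name `lazy_batches` and the default batch size of 1000 are assumptions made for this example.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def lazy_batches(rows: Iterable[T], batch_size: int = 1000) -> Iterator[List[T]]:
    """Yield lists of at most batch_size rows (hypothetical helper).

    Nothing is read from `rows` until the consumer requests a batch,
    so at most one batch is held in memory by this function at a time.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    source = iter(rows)
    while True:
        # islice advances the underlying iterator by up to batch_size items.
        batch = list(islice(source, batch_size))
        if not batch:
            return
        yield batch
```

An indexing loop would then iterate over `lazy_batches(rows)` and index each batch before asking for the next, so the source is enumerated at indexing time rather than up front.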

I started on making this happen a while ago and must have got sidetracked. I'll update this issue name so I don't forget about this for the next release!

In the meantime, I think you might have to do your index rebuilding in batches with timeouts between them so as not to fill up memory with serialized versions of your data. Once your index is built, though, it's strongly advised not to rebuild it ever (unless there are data inconsistencies) and to just keep it up to date with your source data.

@Shazwazza Shazwazza changed the title from "OutOfMemoryException Building Index" to "OutOfMemoryException Building Index - Need to make the enumeration more lazy" Nov 23, 2015
@Shazwazza Shazwazza added this to the 0.1.69 milestone Nov 23, 2015
Author

roend83 commented Nov 26, 2015

In case anyone else is running into this, I've found that if I page the query into 15,000 results at a time, I can get the index to rebuild consistently for my data set.
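
For anyone wanting to try the same workaround, here is a minimal sketch of paging a query so that only one page of rows is fetched at a time. It is written in Python against `sqlite3` purely for illustration and is not the code used in this thread; the `items` table, its `id` and `body` columns, and the `paged_rows` helper are all hypothetical, and it assumes `id` is a positive, unique integer key.

```python
import sqlite3
from typing import Iterator, List, Tuple


def paged_rows(
    conn: sqlite3.Connection, page_size: int = 15000
) -> Iterator[List[Tuple[int, str]]]:
    """Yield pages of (id, body) rows from a hypothetical `items` table.

    Uses keyset paging: each query resumes after the last id seen,
    so no page has to skip over rows that were already read.
    """
    last_id = 0  # assumes ids start above 0
    while True:
        page = conn.execute(
            "SELECT id, body FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not page:
            return
        yield page
        last_id = page[-1][0]
```

Each page can be handed to the indexer and released before the next query runs, which keeps the number of rows held in memory bounded by the page size.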

@Shazwazza
Owner

I think this issue is what I've fixed here: #41

1c497dc#diff-a4636988629a819865ddf6634cf261f1R78
