I've set up an index using a SimpleDataIndexer that is trying to index data from a database table with around 2 million rows. I'm using Umbraco's database object to query the data, which appears to use a DataReader to read one row at a time. In my SimpleDataService I'm looping through the objects returned from Umbraco and yielding a new SimpleDataSet for each one. Am I doing something wrong, or is indexing this much data just not supported?
Hi, yeah it should support that, but the problem could be how the SimpleDataIndexer queues up its data for indexing. The way it currently works is that it looks up all of the data that needs to be indexed and serializes it to the Examine format (which is in memory), and then the Examine indexing thread consumes the queue. Chances are the memory is being filled up with millions of serialized items waiting to be indexed.
But it should be smarter and perform its data lookups in batches (say, 1,000 at a time), and ensure each batch only starts enumerating at indexing time instead of being queued up in memory. This is possible now with a bit of work, but Examine should be managing this by default OOTB.
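A rough sketch of that batched, lazy lookup in C#. This is not Examine's actual implementation: `FetchBatch` is a hypothetical data-access helper standing in for a paged query through Umbraco's database object, and the row type is a plain dictionary rather than Examine's real `SimpleDataSet`. Because the outer method uses `yield return`, each batch is only fetched when the consumer actually enumerates that far, so at most one batch of serialized rows is held in memory by this method at a time:

```csharp
using System.Collections.Generic;

public class BatchedDataService
{
    private const int BatchSize = 1000;

    // Hypothetical data-access helper: returns up to 'size' rows starting
    // at 'offset'. In a real SimpleDataService this would be a paged SQL
    // query executed via Umbraco's database object.
    private IEnumerable<Dictionary<string, string>> FetchBatch(int offset, int size)
    {
        // ... paged DB query goes here ...
        yield break;
    }

    // Lazily yields rows one batch at a time; the next batch is not
    // fetched until the indexer has consumed the previous one.
    public IEnumerable<Dictionary<string, string>> GetAllData()
    {
        var offset = 0;
        while (true)
        {
            var any = false;
            foreach (var row in FetchBatch(offset, BatchSize))
            {
                any = true;
                yield return row;
            }
            if (!any) yield break; // empty batch => no more rows
            offset += BatchSize;
        }
    }
}
```

Note the caveat from the discussion above: this laziness only helps if the consumer (the indexing thread) doesn't buffer the whole enumerable up front before indexing starts.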
I started making this happen a while ago and must have gotten sidetracked. I'll update this issue's name so I don't forget about it for the next release!
In the meantime, I think you might have to do your index rebuilding in batches with timeouts so as not to fill up the memory with serialized versions of your data. Once your index is built, though, it's strongly advised to never actually rebuild it (unless there are data inconsistencies) and to just keep it up to date with your source data.
Shazwazza changed the title from "OutOfMemoryException Building Index" to "OutOfMemoryException Building Index - Need to make the enumeration more lazy" on Nov 23, 2015
In case anyone else is running into this, I've found that if I page the query into 15,000 results at a time then I can get the index to consistently rebuild for my data set.
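For reference, the paging approach above can be expressed as a small reusable iterator. This is a sketch, not code from the thread: `fetchPage` is a hypothetical callback that in practice would wrap a paged query such as PetaPoco's `db.Page<T>(page, pageSize, sql).Items` on Umbraco's database object. The iterator stops when it sees a short page:

```csharp
using System;
using System.Collections.Generic;

public static class PagedIndexing
{
    // Lazily enumerates a paged data source 'pageSize' rows at a time.
    // 'fetchPage' takes a 1-based page number and returns that page's rows;
    // a page shorter than 'pageSize' is treated as the last page.
    public static IEnumerable<T> EnumeratePages<T>(
        Func<long, IReadOnlyList<T>> fetchPage, int pageSize)
    {
        long page = 1;
        while (true)
        {
            var items = fetchPage(page);
            foreach (var item in items)
                yield return item;
            if (items.Count < pageSize)
                yield break; // short page => no more data
            page++;
        }
    }
}
```

A SimpleDataService could then yield its SimpleDataSet objects from inside this loop with a page size of 15,000, so only one page's worth of rows is materialized per database round trip.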