Very high memory usage on bulk insert of documents #20

Closed
GoogleCodeExporter opened this issue Jun 9, 2015 · 5 comments

@GoogleCodeExporter

When adding a lot of documents (100,000+) to Solr in one go, in this case a
CLI script used for an initial import, the memory usage gets very high
(over 500 MB).

I traced the memory usage to the _sendRawPost function in Service.php.
For every request a new stream context is created, using about as much
memory as the data in the request. With a big data set this starts to add
up very quickly.
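
For illustration, a minimal sketch of the pattern being described (not the library's exact code; the function name, headers, and option values are assumed):

```php
<?php
// Sketch of the per-request pattern: every call builds a fresh stream context,
// and each context keeps its own copy of the POST body, which cannot be freed
// afterwards, so repeated calls accumulate memory.
function sendRawPost($url, $rawPost, $timeout = 3600)
{
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'header'  => 'Content-Type: text/xml; charset=UTF-8',
            'content' => $rawPost,   // another copy of the document data
            'timeout' => $timeout,
        ),
    ));

    return file_get_contents($url, false, $context);
}
```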

I managed to solve the issue by reusing the same context for each request
and modifying the options to suit the next request. This way the memory
usage remained very stable, even for a 600,000-document run.
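
A sketch of that reuse approach, assuming `stream_context_set_option()` is used to overwrite the body on a single long-lived context:

```php
<?php
// Sketch of the fix: allocate one context up front and overwrite its options
// for each request, so only a single context resource ever exists.
class RawPoster
{
    private $postContext;

    public function __construct()
    {
        $this->postContext = stream_context_create();
    }

    public function sendRawPost($url, $rawPost, $timeout = 3600)
    {
        // Mutate the existing context instead of building a new one per request
        stream_context_set_option($this->postContext, array(
            'http' => array(
                'method'  => 'POST',
                'header'  => 'Content-Type: text/xml; charset=UTF-8',
                'content' => $rawPost,
                'timeout' => $timeout,
            ),
        ));

        return file_get_contents($url, false, $this->postContext);
    }
}
```

With this shape, the memory held by successive request bodies no longer accumulates across the run.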

Original issue reported on code.google.com by raspberr...@gmail.com on 14 Oct 2009 at 12:34

@GoogleCodeExporter

I was able to quickly verify what you report. It seems very unfortunate that
there is no way to free the memory from the stream context resource. Using a
new context seemed cleaner code-wise, but it's obviously not acceptable.
I'll move to reusing a single context.

Original comment by donovan....@gmail.com on 19 Oct 2009 at 4:29

  • Changed state: Started

@GoogleCodeExporter

Moved to reusing a GET context and a POST context, instead of creating a new
one for each request, in r21.

Original comment by donovan....@gmail.com on 9 Nov 2009 at 10:09

  • Changed state: Fixed

@GoogleCodeExporter

Further fix in r22 (the wrong stream context function was used to set options).

Original comment by donovan....@gmail.com on 9 Nov 2009 at 10:52
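
Presumably the distinction is this one (an illustration, not the actual r22 change): `stream_context_set_option()` mutates an existing context, whereas calling `stream_context_create()` again would allocate a new resource and reintroduce the per-request growth.

```php
<?php
$context = stream_context_create();
$rawPost = '<add><doc>...</doc></add>';

// Updates options on the context the client already holds
stream_context_set_option($context, 'http', 'content', $rawPost);

// Would allocate a brand-new context resource on every request
// $context = stream_context_create(array('http' => array('content' => $rawPost)));
```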

@GoogleCodeExporter

I am having a rather large performance issue which I think is related to this.
I am using the newest code, but I think it is still leaking. I am trying to
index between 6 and 10 million documents, and even with a PHP memory limit of
4 GB, I get to maybe 1 million before it eats up the memory. I have tried doing
this in chunks of 100,000, 10,000, and 1,000, and it all just dies; the problem
seems to be around this function.

Thoughts? Better approaches?

Original comment by ave...@gmail.com on 30 Aug 2010 at 3:51

@GoogleCodeExporter

Are you using the SVN version of the code? It now reuses a context. If you are,
and are still seeing memory climb, then I'd check whether you're holding onto
documents somewhere. If that still doesn't work, you could try breaking the
work into several processes.

Original comment by donovan....@gmail.com on 30 Aug 2010 at 4:05
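
As an illustration of the batching advice above, a sketch that drops every reference to a batch once it has been sent (the `Apache_Solr_Service` / `Apache_Solr_Document` usage is assumed from the client's public API; the sample rows stand in for a real data source):

```php
<?php
require_once 'Apache/Solr/Service.php';

$solr = new Apache_Solr_Service('localhost', 8983, '/solr/');

$rows = array(
    array('id' => 1, 'title' => 'First document'),
    array('id' => 2, 'title' => 'Second document'),
    // ... millions more rows in a real import
);

$batchSize = 1000;
$batch = array();

foreach ($rows as $row) {
    $doc = new Apache_Solr_Document();
    $doc->id    = $row['id'];
    $doc->title = $row['title'];
    $batch[] = $doc;

    if (count($batch) >= $batchSize) {
        $solr->addDocuments($batch);
        $solr->commit();
        $batch = array();   // drop references so the sent documents can be garbage collected
    }
}

if (!empty($batch)) {
    $solr->addDocuments($batch);
    $solr->commit();
}
```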
