forked from julien-duponchelle/scrapy-elasticsearch
Bulk indexing instead of single item indexing #20
Comments
@jenkin it would be great if you could code review or do some testing.
Yep, thanks! Maybe it would be useful to add a setting for the buffer length, so that items are indexed to ES in batches of that size. In my application I have dozens of items per run, but with thousands of them a performance problem can arise. I added two line notes to the code... :)
Now available at https://pypi.python.org/pypi/ScrapyElasticSearch/0.7
In a pipeline, every item is indexed separately, with one request to ES per item. In addition, in some cases you want to break the pipeline concept and apply a global transformation to items before indexing (e.g. by overloading the open_spider and close_spider methods in a pipeline class).
Using the ES bulk API, items can be temporarily added to a buffer (with a length controlled by a setting) and then indexed in batches rather than one at a time.
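A minimal sketch of the buffering idea, assuming a pluggable `send_bulk` callable (e.g. a thin wrapper around the Elasticsearch bulk API); the class name, the `send_bulk` parameter, and the default buffer length are illustrative, not the actual ScrapyElasticSearch implementation:

```python
# Hypothetical buffered bulk-indexing pipeline sketch. Names here
# (BulkElasticSearchPipeline, send_bulk) are illustrative and not
# part of the ScrapyElasticSearch API.

class BulkElasticSearchPipeline:
    """Buffer items and flush them to ES in bulk instead of per item."""

    def __init__(self, send_bulk, buffer_length=500):
        # send_bulk: callable taking a list of item dicts, e.g. a thin
        # wrapper around the Elasticsearch bulk endpoint.
        self.send_bulk = send_bulk
        self.buffer_length = buffer_length
        self.buffer = []

    def process_item(self, item, spider=None):
        # Accumulate items; flush only when the buffer is full.
        self.buffer.append(dict(item))
        if len(self.buffer) >= self.buffer_length:
            self.flush()
        return item

    def flush(self):
        # Send the whole buffer in one bulk request, then clear it.
        if self.buffer:
            self.send_bulk(self.buffer)
            self.buffer = []

    def close_spider(self, spider=None):
        # Flush the remainder so the last partial buffer is not lost.
        self.flush()
```

The key detail is flushing in `close_spider`, so that items left in a partially filled buffer at the end of the crawl are still indexed.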