Bulk indexing instead of single item indexing #20

Closed
jenkin opened this issue Apr 13, 2016 · 4 comments

Comments

@jenkin

jenkin commented Apr 13, 2016

When working in a pipeline, every item is indexed separately, so there is one request to ES per item. In addition, in some cases you want to break the pipeline concept and apply a global transformation to all items before indexing (i.e. by overriding the open_spider and close_spider methods in a pipeline class).
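
A minimal sketch of the second case, assuming Scrapy's duck-typed pipeline hooks (`open_spider`, `process_item`, `close_spider`). Items are only buffered while the spider runs, a transformation that needs the whole set is applied in `close_spider`, and everything is then indexed in one go. The `send_bulk` callable and the `score` field are hypothetical placeholders for the real Elasticsearch call and the item data; in a real project Scrapy instantiates the pipeline itself (for example through a `from_crawler` classmethod), which this sketch leaves out.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


class GlobalTransformPipeline:
    """Collects every item during the crawl and indexes them together at the end.

    `send_bulk` is a hypothetical stand-in for the real Elasticsearch bulk call;
    it receives the full list of (transformed) items at once.
    """

    def __init__(self, send_bulk: Callable[[List[Item]], None]) -> None:
        self.send_bulk = send_bulk
        self.items: List[Item] = []

    def open_spider(self, spider: Any) -> None:
        # Start each crawl with an empty buffer.
        self.items = []

    def process_item(self, item: Item, spider: Any) -> Item:
        # Only buffer here: no request to ES is made per item.
        self.items.append(dict(item))
        return item

    def close_spider(self, spider: Any) -> None:
        # Global transformation that needs every item (hypothetical example):
        # scale the "score" field so the largest value becomes 1.0.
        max_score = max((i.get("score", 0) for i in self.items), default=0)
        if max_score > 0:
            for i in self.items:
                i["score"] = i.get("score", 0) / max_score
        if self.items:
            self.send_bulk(self.items)
        self.items = []
```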

Using the ES bulk API, you can temporarily add items to a buffer (whose length is controlled by a setting) and then index them in batches instead of sending one request per item.
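
For reference, the ES `_bulk` endpoint takes a newline-delimited JSON body in which every document is preceded by an action line such as `{"index": {...}}`, and the body must end with a newline. The sketch below serializes a buffer of items into that format; the index name `scrapy` and the `url` id field used in the example are assumptions, not settings from this project.

```python
import json
from typing import Any, Dict, Iterable, Optional


def build_bulk_body(items: Iterable[Dict[str, Any]], index: str,
                    id_field: Optional[str] = None) -> str:
    """Serialize buffered items into an Elasticsearch _bulk request body.

    Each document becomes two lines: an action/metadata line followed by
    the document source. Returns "" for an empty buffer.
    """
    lines = []
    for item in items:
        meta: Dict[str, Any] = {"_index": index}
        if id_field is not None and id_field in item:
            # A stable id makes re-indexing overwrite instead of duplicate.
            meta["_id"] = str(item[id_field])
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(item))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""
```

With the official Python client, `elasticsearch.helpers.bulk` builds this body from a list of action dicts, so a pipeline would not normally serialize it by hand; the sketch only shows what a single bulk request carries.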

@jayzeng
Owner

jayzeng commented Apr 18, 2016

@jenkin it would be great if you could do a code review or some testing.

@jenkin
Author

jenkin commented Apr 19, 2016

Yep, thanks! It might be useful to add a setting for the buffer length, so that the buffered items are sent to ES every time the buffer fills up. In my application I currently have dozens of items per run, but with thousands of them performance could become a problem. I added two line notes to the code... :)
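
One way to sketch the buffer-length idea, assuming a hypothetical setting such as `ELASTICSEARCH_BUFFER_LENGTH` (in Scrapy it could be read with `crawler.settings.getint(...)` inside `from_crawler`). Items accumulate in a list, a batch is sent whenever the list reaches the configured length, and `close_spider` flushes the remainder. `send_bulk` is again a hypothetical stand-in for the Elasticsearch bulk request, and the default of 500 is an arbitrary assumption.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


class BufferedIndexPipeline:
    """Buffers items and sends them in batches of `buffer_length`.

    `send_bulk` is a hypothetical stand-in for the Elasticsearch bulk call.
    """

    def __init__(self, send_bulk: Callable[[List[Item]], None],
                 buffer_length: int = 500) -> None:
        if buffer_length < 1:
            raise ValueError("buffer_length must be at least 1")
        self.send_bulk = send_bulk
        self.buffer_length = buffer_length
        self.buffer: List[Item] = []

    def process_item(self, item: Item, spider: Any) -> Item:
        self.buffer.append(dict(item))
        # Send a batch as soon as the buffer is full.
        if len(self.buffer) >= self.buffer_length:
            self.flush()
        return item

    def close_spider(self, spider: Any) -> None:
        # Index whatever is left over (fewer than buffer_length items).
        self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.send_bulk(self.buffer)
            self.buffer = []
```

With `buffer_length=2` and five items, this sends batches of 2, 2 and 1. The flush in `close_spider` is what keeps the last partial batch from being silently dropped.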

@jayzeng
Owner

jayzeng commented Apr 30, 2016

jayzeng closed this as completed Apr 30, 2016

2 participants