Unzip the file after downloading #3

Closed
honzakral opened this issue Dec 9, 2015 · 4 comments
Labels
bug (Something's wrong), :Metrics (How metrics are stored, calculated or aggregated)
Milestone

Comments

@honzakral
Contributor

The bz2 decompression is eating some CPU on the benchmark machine, so we are not stressing Elasticsearch as much as possible. The observed difference is small, but it is there.

@danielmitterdorfer added the bug and :Metrics labels on Dec 10, 2015
@mikemccand
Contributor

++

I also see Python chewing ~100% CPU, which is no good :) When I run on an already uncompressed docs source, it's more like 15% CPU.

@honzakral
Contributor Author

Btw, if we don't want to spend the disk space, one option would be to spawn an external process for the unzipping (bunzip2 --stdout) and read from the pipe.

This can be done via either the multiprocessing package or subprocess.
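
For illustration, the subprocess variant might look roughly like the sketch below. It assumes a `bunzip2` binary is on the `PATH` and that the docs file holds one UTF-8 document per line; the function name `iter_docs` is hypothetical and not taken from the project.

```python
import subprocess


def iter_docs(bz2_path):
    """Yield one document per line, decompressed by an external bunzip2 process.

    Hypothetical helper: decompression runs in a separate OS process, so it
    does not occupy the Python interpreter that drives the benchmark.
    """
    with subprocess.Popen(
        ["bunzip2", "--stdout", bz2_path],
        stdout=subprocess.PIPE,
    ) as proc:
        # Read the decompressed stream line by line from the pipe.
        for raw_line in proc.stdout:
            yield raw_line.decode("utf-8").rstrip("\n")
    # Leaving the "with" block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"bunzip2 exited with status {proc.returncode}")
```

Note that the decompression still consumes CPU on the same machine; it only moves the work out of the Python process.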

@danielmitterdorfer
Member

I am really wary of generating any additional load while running the benchmark, because it could skew the results, so I'd rather spend the additional disk space. Tbh, I'd also rather physically separate the client from the cluster under test, but it will take some time until we get there (#25). So I'm +1 on unzipping beforehand.
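
A minimal sketch of unzipping beforehand, assuming the downloaded file ends in `.bz2` and the uncompressed copy may live next to it. The helper name `decompress_once` is hypothetical, not the project's actual code.

```python
import bz2
import os
import shutil


def decompress_once(bz2_path):
    """Decompress bz2_path next to itself, unless already done; return the plain path."""
    if not bz2_path.endswith(".bz2"):
        raise ValueError(f"expected a .bz2 file, got {bz2_path!r}")
    plain_path = bz2_path[: -len(".bz2")]
    if not os.path.exists(plain_path):
        # Write to a temporary name first so an interrupted run
        # never leaves a truncated file under the final name.
        tmp_path = plain_path + ".tmp"
        with bz2.open(bz2_path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.replace(tmp_path, plain_path)
    return plain_path
```

Run once after the download, this pays the decompression cost before the measurement starts, at the price of the extra disk space.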

@danielmitterdorfer added this to the 0.0.3 milestone on Dec 11, 2015
@mikemccand
Contributor

Maybe a good "goldilocks" solution would be to use .gz instead of .bz2: it will be a bit larger, but CPU cost for decompression is much lower.
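
One rough way to check this trade-off on a given machine is to measure compressed size and decompression CPU time for both formats, as in the sketch below. The sample payload is synthetic and highly repetitive, which is an assumption made for brevity; real benchmark documents will compress differently, so the numbers are only indicative.

```python
import bz2
import gzip
import time


def decompress_cpu_seconds(decompress, payload):
    """Return (CPU seconds spent, decompressed bytes) for one decompress call."""
    start = time.process_time()
    data = decompress(payload)
    return time.process_time() - start, data


# Synthetic stand-in for a docs file: one small JSON document per line.
sample = b'{"name": "example", "value": 12345}\n' * 50_000

for name, module in (("bz2", bz2), ("gzip", gzip)):
    compressed = module.compress(sample)
    cpu, restored = decompress_cpu_seconds(module.decompress, compressed)
    assert restored == sample  # the round trip must be lossless
    print(f"{name}: {len(compressed)} bytes compressed, {cpu:.3f}s CPU to decompress")
```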

@danielmitterdorfer changed the milestone from 0.0.3 to 0.1.0 on Jan 19, 2016
@danielmitterdorfer self-assigned this on Jan 21, 2016
@danielmitterdorfer removed their assignment on Jan 21, 2016