Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

download boltdb files parallelly during reads #2483

Merged

Conversation

sandeepsukhani
Copy link
Contributor

What this PR does / why we need it:
When downloading files initially for reads we download them 1 at a time which is quite slow since we are now creating 96 per ingester per day. This PR changes the code to download up to 50 files at a time.

Checklist

  • Tests updated

@codecov-commenter
Copy link

codecov-commenter commented Aug 10, 2020

Codecov Report

Merging #2483 into master will increase coverage by 0.06%.
The diff coverage is 72.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2483      +/-   ##
==========================================
+ Coverage   62.91%   62.98%   +0.06%     
==========================================
  Files         162      162              
  Lines       13998    14035      +37     
==========================================
+ Hits         8807     8840      +33     
- Misses       4502     4505       +3     
- Partials      689      690       +1     
Impacted Files Coverage Δ
pkg/storage/stores/shipper/downloads/table.go 66.66% <72.50%> (+1.66%) ⬆️
pkg/logql/evaluator.go 92.47% <0.00%> (-0.41%) ⬇️
pkg/promtail/targets/file/filetarget.go 69.64% <0.00%> (+1.78%) ⬆️
pkg/promtail/targets/file/tailer.go 78.40% <0.00%> (+4.54%) ⬆️

if err != nil {
return err
}

folderPath, err := t.folderPathForTable(true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be moved before the t.doParallelDownload and you could pass the folder path into that function to avoid having to call it here and also inside that function, WDYT?


queue := make(chan chunk.StorageObject)
n := util.Min(len(objects), downloadParallelism)
incomingErrors := make(chan error, n)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this should not be a buffered channel, honestly I'm not sure it will really make much difference because of how fast the loop should be that reads from it.

By having it buffered one of the worker threads could receive an error and then immediately start downloading the next item from the queue.

If we make it unbuffered then the check for success becomes synchronous over all the worker threads and any errors encountered would result in immediate cancel before workers started another download.

I think it makes more sense to me to remove the buffering on this channel.

Copy link
Collaborator

@slim-bean slim-bean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@slim-bean slim-bean merged commit c5bc416 into grafana:master Aug 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants