
cache: Recheck before downloading #2019

Merged
merged 2 commits into master from recheck-cache on Oct 5, 2018

Conversation

fd0 (Member) commented Oct 4, 2018

What is the purpose of this change? What does it change?

Recheck whether another goroutine has already downloaded the file to the cache before attempting the download ourselves. A user reported that restic printed errors like this:

Load(<data/61db3a5d4c>, 0, 0) returned error, retrying after 689.290829ms: Create: open /tmp/restic-check-cache-551233426/a14636d2d23f0d08914fe85b74cf8a7828f73b5d6e6c465b6795086221200faf/data/61/61db3a5d4c79ca52bd1ddbe0f833480d9768b6af792affdb582d927764a11af8: file exists

This means that a goroutine was attempting to download a file, but by
the time it got to the actual download, the file was already there. We
now recheck if the file has been downloaded in the meantime.
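
For context, the file exists error in the log is the normal behavior of an exclusive create. A minimal, self-contained Go sketch of that failure mode (the /tmp path here is hypothetical, not restic's actual cache layout):

package main

import (
    "fmt"
    "os"
)

func main() {
    path := "/tmp/restic-cache-demo"

    // First writer wins the race: the exclusive create succeeds.
    f, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0600)
    if err != nil {
        panic(err)
    }
    f.Close()

    // Second writer loses: with O_EXCL the create fails because the file
    // is already there, printing "open /tmp/restic-cache-demo: file exists".
    _, err = os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0600)
    fmt.Println(err)

    os.Remove(path)
}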

codecov-io commented Oct 4, 2018

Codecov Report

Merging #2019 into master will decrease coverage by 0.1%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2019      +/-   ##
==========================================
- Coverage   50.16%   50.05%   -0.11%     
==========================================
  Files         169      169              
  Lines       13573    13574       +1     
==========================================
- Hits         6809     6795      -14     
- Misses       5740     5760      +20     
+ Partials     1024     1019       -5
Impacted Files                   Coverage  Δ
internal/cache/backend.go        61.46%    <100%> (-1.5%)  ⬇️
internal/checker/checker.go      64.54%    <0%>   (-3.87%) ⬇️
internal/archiver/blob_saver.go  95.06%    <0%>   (-2.47%) ⬇️
internal/backend/s3/s3.go        62.4%     <0%>   (+2.25%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b1e1b71...5971650.

ifedorenko (Contributor) commented:

Are you sure the "other" goroutine successfully downloaded the file? From what I can tell, the problem can be explained by a failure on the other goroutine. Here is an execution schedule that returns success even though the download failed:

  • gr1: start download
  • gr2: block on gr1 download
  • gr1: detect and return error
  • gr2: unblock and return success

Or do I misunderstand how the caching works here?
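
To make this schedule concrete, here is a minimal runnable sketch, under the assumption that the in-progress entry was a bare chan struct{} that signals completion but carries no error (a simplification, not the actual cache code):

package main

import (
    "errors"
    "fmt"
)

// download plays the role of gr1: it detects an error, but closing the
// bare channel only tells waiters "done", not "succeeded".
func download(finish chan struct{}) error {
    defer close(finish)
    return errors.New("download failed") // gr1: detect and return error
}

func main() {
    finish := make(chan struct{})
    go download(finish) // gr1: start download

    <-finish // gr2: block on gr1's download, then unblock...
    fmt.Println("gr2: returns success, never having seen gr1's error")
}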

ifedorenko (Contributor) commented:

Never mind, a failed download is a separate problem.

fd0 (Member, Author) commented Oct 4, 2018

Here's a bit more log:

cache/backend.go:174        cache.(*Backend).Load       300           performing readahead for <data/61db3a5d4c>
cache/file.go:110           cache.(*Cache).Save         300           Save to cache: <data/61db3a5d4c>
cache/file.go:89            cache.(*Cache).SaveWriter   300           Save to cache: <data/61db3a5d4c>
[...]
cache/file.go:44            cache.(*Cache).Load         300           Load from cache: <data/61db3a5d4c>
cache/backend.go:174        cache.(*Backend).Load       286           performing readahead for <data/61db3a5d4c>
cache/file.go:110           cache.(*Cache).Save         286           Save to cache: <data/61db3a5d4c>
cache/file.go:89            cache.(*Cache).SaveWriter   286           Save to cache: <data/61db3a5d4c>
Load(<data/61db3a5d4c>, 0, 0) returned error, retrying after 689.290829ms: Create: open /tmp/restic-check-cache-551233426/a14636d2d23f0d08914fe85b74cf8a7828f73b5d6e6c465b6795086221200faf/data/61/61db3a5d4c79ca52bd1ddbe0f833480d9768b6af792affdb582d927764a11af8: file exists

The third column is the goroutine number. What I reconstructed was (a sketch of this pre-fix flow follows the list):

  • Goroutine 300 got a request to read the file 61db3a5d4c; it is not cached yet but should be, so it starts downloading the file
  • Goroutine 286 also got a request to read the same file; it is not cached yet either, and at this point the goroutine is executing something around line 172 here:
    if offset != 0 || length != 0 {
  • Goroutine 300 finishes the download, removes the entry from the inProgress map, and serves the request from the cache; the file exists now. The debug message indicating this is Load from cache: <data/61db3a5d4c>
  • Goroutine 286 continues executing; it had already detected that the file is not yet cached, so it attempts to download it (performing readahead [...])
    • cacheFile() does not find an entry in the inProgress map (the other goroutine removed it because it was done), so it starts downloading
    • then the file in the cache is opened with O_EXCL
    • the file already exists (created by the other goroutine), so that fails with file exists
  • The generic retry code kicks in and tries to load the file again; now the file is cached and all is well
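
The sequence above corresponds roughly to the following pre-fix sketch. All names here (Handle, Backend, downloadAndStore) are stand-ins inferred from the description, not the exact restic code:

package cache

import (
    "context"
    "sync"
)

// Hypothetical stand-ins for the types discussed above.
type Handle struct{ Type, Name string }

type Backend struct {
    mu         sync.Mutex
    inProgress map[Handle]chan struct{}
}

// downloadAndStore stands in for the real download plus the O_EXCL cache
// file creation.
func (b *Backend) downloadAndStore(ctx context.Context, h Handle) error {
    // ... fetch from the backend, create the cache file with O_EXCL ...
    return nil
}

// Pre-fix cacheFile: the caller has already checked the cache and found
// the file missing before calling this.
func (b *Backend) cacheFile(ctx context.Context, h Handle) error {
    finish := make(chan struct{})

    b.mu.Lock()
    other, alreadyDownloading := b.inProgress[h]
    if !alreadyDownloading {
        b.inProgress[h] = finish
    }
    b.mu.Unlock()

    if alreadyDownloading {
        <-other // goroutine 286 would wait here, had it arrived in time
        return nil
    }

    // Race window: goroutine 300 may have finished the download and deleted
    // its inProgress entry between the caller's cache check and this point.
    // Goroutine 286 then finds no entry, starts a duplicate download, and
    // the exclusive create fails with "file exists".
    err := b.downloadAndStore(ctx, h)

    b.mu.Lock()
    delete(b.inProgress, h) // done: remove the entry
    b.mu.Unlock()
    close(finish) // wake any waiters

    return err
}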

So now we check again in cacheFile() whether the file has been cached in the meantime. Alternatively, we could just leave the closed channel in inProgress after downloading, but that felt messy to me. It's still not perfect, but the probability that this happens again is much lower.

Thank you for asking, btw: I spotted a small bug and will push a new commit. In cacheFile, we're leaving an open channel in the inProgress map; that should be cleaned up before returning.
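
Combining the two commits, a hedged sketch of the fixed flow (continuing the hypothetical types from the previous sketch; cached() is an assumed helper that tests the local cache):

// cached stands in for a test whether h is already in the local cache.
func (b *Backend) cached(h Handle) bool {
    // ... stat the cache file ...
    return false
}

// Fixed cacheFile: recheck the cache after claiming the inProgress slot,
// and clean up the channel on every return path.
func (b *Backend) cacheFile(ctx context.Context, h Handle) error {
    finish := make(chan struct{})

    b.mu.Lock()
    other, alreadyDownloading := b.inProgress[h]
    if !alreadyDownloading {
        b.inProgress[h] = finish
    }
    b.mu.Unlock()

    if alreadyDownloading {
        <-other // another goroutine is downloading; wait for it
        return nil
    }

    defer func() {
        // Second commit: always close the channel and remove it from the
        // map, so no open channel is left behind in inProgress.
        close(finish)
        b.mu.Lock()
        delete(b.inProgress, h)
        b.mu.Unlock()
    }()

    // First commit: recheck, because another goroutine may have finished
    // the download (and removed its entry) while we were getting here.
    if b.cached(h) {
        return nil
    }

    return b.downloadAndStore(ctx, h)
}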

fd0 merged commit 5971650 into master Oct 5, 2018
fd0 added a commit that referenced this pull request Oct 5, 2018: cache: Recheck before downloading
fd0 deleted the recheck-cache branch October 5, 2018 10:26