Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(general): Increase restore progress granularity #3655

Merged
merged 13 commits into from
May 10, 2024

Conversation

e-sumin
Copy link
Contributor

@e-sumin e-sumin commented Feb 18, 2024

When restoring huge file(s), the progress reporting is done in a bit weird way:

kopia_test % kopia snapshot restore ka2084d263182164b6cf3456668e6b6da /Users/eugen.sumin/kopia_test/2
Restoring to local filesystem (/Users/eugen.sumin/kopia_test/2) with parallelism=8...
Processed 6 (5.4 GB) of 5 (5.4 GB) 1.6 MB/s (100.0%) remaining 0s.
Processed 6 (5.4 GB) of 5 (5.4 GB) 1.6 MB/s (100.0%) remaining 0s.
Processed 6 (5.4 GB) of 5 (5.4 GB) 1.6 MB/s (100.0%) remaining 0s.
Processed 6 (5.4 GB) of 5 (5.4 GB) 1.5 MB/s (100.0%) remaining 0s.
Processed 6 (5.4 GB) of 5 (5.4 GB) 1.5 MB/s (100.0%) remaining 0s.
Processed 6 (5.4 GB) of 5 (5.4 GB) 1.5 MB/s (100.0%) remaining 0s.
Restored 5 files, 1 directories and 0 symbolic links (5.4 GB).

In fact, the amount of restored data is dumped when particular file completely restored.

This PR contains the least invasive change, which allows us to see progress update while file is downloaded from object storage.

Restoring to local filesystem (/Users/eugen.sumin/kopia_test/55) with parallelism=8...
Processed 2 (3.1 MB) of 5 (1.8 GB).
Processed 4 (459.6 MB) of 5 (1.8 GB) 270.3 MB/s (25.2%) remaining 4s.
Processed 4 (468.7 MB) of 5 (1.8 GB) 269 MB/s (25.7%) remaining 4s.
Processed 4 (741.6 MB) of 5 (1.8 GB) 269 MB/s (40.6%) remaining 3s.
Processed 4 (1.1 GB) of 5 (1.8 GB) 280 MB/s (57.6%) remaining 2s.
Processed 5 (1.4 GB) of 5 (1.8 GB) 291.1 MB/s (75.2%) remaining 1s.
Processed 5 (1.4 GB) of 5 (1.8 GB) 289.8 MB/s (75.6%) remaining 1s.
Processed 5 (1.6 GB) of 5 (1.8 GB) 270.2 MB/s (85.3%) remaining 0s.
Processed 5 (1.7 GB) of 5 (1.8 GB) 256.3 MB/s (95.0%) remaining 0s.
Processed 6 (1.8 GB) of 5 (1.8 GB) 251 MB/s (100.0%) remaining 0s.
Processed 6 (1.8 GB) of 5 (1.8 GB) 251 MB/s (100.0%) remaining 0s.
Restored 5 files, 1 directories and 0 symbolic links (1.8 GB).

@e-sumin
Copy link
Contributor Author

e-sumin commented Feb 18, 2024

@KastenMike @julio-lopez could you please review ?

@@ -35,6 +35,10 @@ func VerifyObject(ctx context.Context, cr contentReader, oid ID) ([]content.ID,
return tracker.contentIDs(), nil
}

// FileReadingProgressCallback is a callback intended to be used during file copying
// to report amount of data sent to destination.
type FileReadingProgressCallback func(chunkSize int64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to change object reader at all, can't we just emit the progress as we're reading the bytes during restore?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not seem used anywhere, the rest looks good

Copy link

codecov bot commented Feb 19, 2024

Codecov Report

Attention: Patch coverage is 83.20000% with 21 lines in your changes are missing coverage. Please review.

Project coverage is 77.18%. Comparing base (cb455c6) to head (3b49d9a).
Report is 133 commits behind head on master.

Files Patch % Lines
cli/cli_progress.go 77.58% 8 Missing and 5 partials ⚠️
snapshot/restore/local_fs_output.go 63.15% 7 Missing ⚠️
snapshot/restore/restore.go 95.23% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3655      +/-   ##
==========================================
+ Coverage   75.86%   77.18%   +1.31%     
==========================================
  Files         470      479       +9     
  Lines       37301    28756    -8545     
==========================================
- Hits        28299    22194    -6105     
+ Misses       7071     4660    -2411     
+ Partials     1931     1902      -29     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -403,6 +404,14 @@ func (o *FilesystemOutput) copyFileContent(ctx context.Context, targetPath strin
}
defer r.Close() //nolint:errcheck

type withProgressTracking interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's easier to wrap "r" so that it reports bytes written in the callback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the idea, I've changed the code, is it what you meant ?

@e-sumin
Copy link
Contributor Author

e-sumin commented Mar 4, 2024

@jkowalski do you think the changes could be merged ? Or something else has to be done before ?

@e-sumin
Copy link
Contributor Author

e-sumin commented Mar 11, 2024

@jkowalski do you think the changes could be merged ? Or something else has to be done before ?

Don't want to be annoying, just a friendly reminder :)

@denisvmedia
Copy link
Collaborator

@jkowalski @redgoat650 @julio-lopez could you please check this PR?

}

return write(targetPath, r, f.Size(), o.copier)
return write(targetPath, wr, f.Size(), o.copier)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@e-sumin Question about the progress reporting using the reader tracking mechanism:

What is the best way to evaluate progress, as in the definition of done on processing X bytes ? When we have read X bytes from the reader does that mean we are done writing X bytes to the target ? In other words, would it be possible to get in a scenario where we have read 100% of the data from repository and are still stuck writing the final block due to a network issue (NFS) or similar ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, but this way of progress update was chosen because it is less intrusive. Also, when reading, usually, the block is very small in comparison with total amount of data. So, we, anyway, can get 100% earlier than last block will be read. So, I'd not consider this as critical flaw.

bytesWritten := int64(0)
progressCallback := func(chunkSize int64) {
bytesWritten += chunkSize
c.stats.RestoredTotalFileSize.Add(chunkSize)
Copy link
Collaborator

@Shrekster Shrekster Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; this callback is hit each time we have read chunkSize from the repository, not necessarily "bytes written". Is that correct ? This is w.r.t. to the my other comment as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. This progress calculation way treats bytes read as bytes written.

@Shrekster
Copy link
Collaborator

Shrekster commented Apr 4, 2024

@e-sumin and I discussed this over a call and he explained why the reader based approach is correct / simpler for now. I also agree with his comments above.

@e-sumin is looking into adding unit-test coverage with a custom test-only callback plumbing to make it non-flaky.

@e-sumin
Copy link
Contributor Author

e-sumin commented May 6, 2024

I did some refactoring of snapshot restore:

  • progress reporting interval is taken from command line, as it was done for upload progress reporting
    • currently upload progress instance is keeping progress flag and interval and restore progress just utilizes its value (this was originally discussed with @Shrekster as less invasive thing)
    • cli progress refactoring should be done (but probably it would be better to have it as a separate PR). We need to extract flag and interval and throttling mechanism to reused part and create progress instance depending on operation (upload or restore)
      • this also will allow to write separate tests for progress reporting format and progress reporting intervals
  • probably it would be better to pass restore progress interface to copier and increase counters separately when needed instead of keeping another copy of stats in copier and passing its values to progress reporter.

Some of changes I've done was done to be able to have an unit test which verifies that progress is reported properly.

@e-sumin e-sumin requested a review from Shrekster May 6, 2024 10:08
@e-sumin
Copy link
Contributor Author

e-sumin commented May 9, 2024

I've fixed the test and merged master into my branch. The tests should pass now.

@Shrekster Shrekster merged commit 2b92388 into kopia:master May 10, 2024
23 checks passed

// Progress is invoked by copier to report status of snapshot restoration.
type Progress interface {
SetCounters(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too late to comment here, but I believe having an args struct would be better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants