Only upload digests missing from CAS #5713

Merged

merged 1 commit into pantsbuild:master from dotordogh:dotordogh/remoting/missing-blobs Apr 23, 2018

@dotordogh (Contributor) commented Apr 17, 2018

Problem

Currently we upload all the digests, even though they may already be in the CAS. This produces a lot of unnecessary network requests, which we would like to limit. Second part of #5505.

Solution

With this change, if more than two digests are being uploaded, or the digests are large, we call the CAS's FindMissingBlobs endpoint and filter the upload request down to the digests that are actually missing. Otherwise, we upload everything directly.
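
A rough sketch of that decision, for illustration only: the Digest type, the thresholds, and the function names below are stand-ins, not the merged pants code.

use std::collections::HashSet;

// Illustrative digest: (fingerprint, size in bytes). The real type lives in
// pants' hashing code.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Digest(u64, usize);

// The heuristic this PR adds: for a couple of small blobs, one upload
// round-trip is cheaper than a FindMissingBlobs round-trip plus an upload.
// These thresholds are guesses, not the values merged in the PR.
fn upload_is_faster_than_checking_whether_to_upload(digests: &HashSet<Digest>) -> bool {
  digests.len() < 3 && digests.iter().all(|d| d.1 < 1024 * 1024)
}

// `find_missing_blobs` stands in for a call to the CAS endpoint of the same name.
fn digests_to_upload<F>(digests: &HashSet<Digest>, find_missing_blobs: F) -> HashSet<Digest>
where
  F: Fn(&HashSet<Digest>) -> HashSet<Digest>,
{
  if upload_is_faster_than_checking_whether_to_upload(digests) {
    digests.clone()
  } else {
    find_missing_blobs(digests)
  }
}

fn main() {
  let digests: HashSet<Digest> = vec![Digest(0xab, 10), Digest(0xcd, 20)].into_iter().collect();
  // Two small digests: skip the FindMissingBlobs round-trip and upload both.
  assert_eq!(digests_to_upload(&digests, |_| HashSet::new()).len(), 2);
}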

Result

Optimized CAS upload requests!

@dotordogh dotordogh requested review from illicitonion and ity Apr 17, 2018

@dotordogh dotordogh added the remoting label Apr 17, 2018

@dotordogh dotordogh changed the title from [Remoting] Only upload digests missing from CAS to Only upload digests missing from CAS Apr 17, 2018

@illicitonion (Contributor) left a comment

Looks great! Just a few clean-up comments for readability :)

@@ -279,13 +276,36 @@ impl Store {
        expanded_digests
      })
      .and_then(move |digests| {
        if digests.len() < 3 {

@illicitonion (Contributor) commented Apr 17, 2018

Maybe extract the length/size check into a function called something like upload_is_faster_than_checking_whether_to_upload so that this code reads:

if upload_is_faster_than_checking_whether_to_upload(&digests) {
  return Ok(digests.keys().map(|key| key.clone()).collect::<HashSet<Digest>>());
}

in a nice self-documenting way?

(And give that method a little comment along the lines of // These values are guesses, feel free to tweak them)

Digest(extra_big_file_fingerprint(), extra_big_file_bytes().len())
}

pub fn extra_big_file_bytes() -> Bytes {

@illicitonion (Contributor) commented Apr 17, 2018

Rather than re-implementing the file reading, how about just:

pub fn extra_big_file_bytes() -> Bytes {
  let mut bytes = big_file_bytes();
  bytes.extend(&big_file_bytes());
  bytes
}

(Part of me just wants to add more plays to all_the_henries so that it's over 1MB, though :P I'd wager making it all_the_histories would do the job...)

.expect("Error uploading file");

assert_eq!(cas.write_message_sizes.lock().unwrap().len(), 1);
assert_eq!(

@illicitonion (Contributor) commented Apr 17, 2018

I'd be tempted to extract a method so you can call assert_contains_exactly(cas, vec![fingerprint()]), which reads a bit easier than the several assert_eq! calls on both size and contents

@dotordogh (Contributor) commented Apr 17, 2018

How would you feel about holding off on this until the test data changes merge? Right now we check both that the digests exist in the CAS and that they contain the bytes we expect, so the assert function would probably need to take a HashMap of fingerprint to bytes. The new test data would simplify that, but I could definitely implement it with a HashMap now if you'd like 😄

@illicitonion (Contributor) commented Apr 18, 2018

Happy to hold off :)
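
For illustration, a minimal stand-in for the helper being discussed, taking the fingerprint-to-bytes map dotordogh describes. StubCAS and Fingerprint here are hypothetical sketches, not the real test fixtures.

use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-ins; the real StubCAS and Fingerprint types live in the
// pants test support code.
type Fingerprint = String;

struct StubCAS {
  blobs: Mutex<HashMap<Fingerprint, Vec<u8>>>,
}

// One call asserts both how many blobs the CAS holds and the exact bytes
// stored under each fingerprint.
fn assert_contains_exactly(cas: &StubCAS, expected: HashMap<Fingerprint, Vec<u8>>) {
  let blobs = cas.blobs.lock().unwrap();
  assert_eq!(blobs.len(), expected.len());
  for (fingerprint, bytes) in &expected {
    assert_eq!(blobs.get(fingerprint), Some(bytes));
  }
}

fn main() {
  let cas = StubCAS { blobs: Mutex::new(HashMap::new()) };
  cas.blobs.lock().unwrap().insert("abc123".to_string(), vec![1, 2, 3]);
  let mut expected = HashMap::new();
  expected.insert("abc123".to_string(), vec![1, 2, 3]);
  assert_contains_exactly(&cas, expected);
}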

@illicitonion (Contributor) left a comment

Just one tiny thing to go :)

@@ -279,13 +276,30 @@ impl Store {
        expanded_digests
      })
      .and_then(move |digests| {
        if Store::upload_is_faster_than_checking_whether_to_upload(digests.clone()) {

@illicitonion (Contributor) commented Apr 18, 2018

This clone is going to copy the contents of the HashMap, which shouldn't be necessary. If you just pass a reference to digests here, and have upload_is_faster_than_checking_whether_to_upload take a reference to a HashMap, I think that should Just Work without the copying :)

@dotordogh (Contributor) commented Apr 18, 2018

Ahh, I tried this initially, but I did it wrong and confused myself. It makes sense now, thank you!
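
A minimal sketch of the borrow-instead-of-clone pattern being suggested; the names and types here are illustrative, not the Store code.

use std::collections::HashMap;

// Taking &HashMap borrows the map; none of its contents are copied.
fn upload_is_faster_than_checking_whether_to_upload(digests: &HashMap<String, usize>) -> bool {
  digests.len() < 3
}

fn main() {
  let mut digests = HashMap::new();
  digests.insert("fingerprint-1".to_string(), 10);
  // Pass &digests rather than digests.clone(); the map stays usable afterwards.
  if upload_is_faster_than_checking_whether_to_upload(&digests) {
    println!("uploading all {} digests directly", digests.len());
  }
}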

return Ok((
  digests
    .keys()
    .map(|key| key.clone())

@stuhood (Member) commented Apr 18, 2018

Iterator::map(|x| x.clone()) can generally be replaced with Iterator::cloned(): so digests.keys().cloned().collect().

@dotordogh (Contributor) commented Apr 18, 2018

This is cool! Thank you!
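
For reference, a tiny self-contained example of the equivalence stuhood is pointing out:

fn main() {
  let words = vec!["henry".to_string(), "history".to_string()];
  // These produce identical results; cloned() is the idiomatic spelling.
  let via_map: Vec<String> = words.iter().map(|w| w.clone()).collect();
  let via_cloned: Vec<String> = words.iter().cloned().collect();
  assert_eq!(via_map, via_cloned);
}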

future::join_all(
-  digests
+  fitered_digests

@stuhood (Member) commented Apr 18, 2018

filtered*

let remote = remote.clone();
.map(move |digest| {
let entry_type = digest_entry_types.get(&digest).unwrap();
let remote = remote2.clone();

@stuhood (Member) commented Apr 18, 2018

It's not clear that this clone is necessary... it doesn't look like remote2 is used anywhere else in this closure, so I think you can just use remote2 directly in load_bytes_with?

@dotordogh (Contributor) commented Apr 18, 2018

I tried this, but it didn't work. I may be totally off base, but the way I understood it is that you can't implicitly move the same captured variable more than once, since the closure may run repeatedly... hence the clone...
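
A minimal example of that constraint; call_twice below is a stand-in for the future combinator that may invoke the closure repeatedly.

// A closure that may run more than once cannot move a captured value out on
// each call; cloning per call is the usual workaround.
fn call_twice<F: FnMut()>(mut f: F) {
  f();
  f();
}

fn main() {
  let remote = String::from("remote-store-handle");
  call_twice(move || {
    // `let r = remote;` would not compile here: it moves `remote` out of the
    // closure's capture, but the closure runs twice. Cloning works.
    let r = remote.clone();
    println!("using {}", r);
  });
}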


@dotordogh dotordogh force-pushed the dotordogh:dotordogh/remoting/missing-blobs branch from 3c599bb to cc28daa Apr 20, 2018

@illicitonion illicitonion merged commit fdb81f1 into pantsbuild:master Apr 23, 2018

1 check failed: continuous-integration/travis-ci/pr (The Travis CI build failed)

@dotordogh dotordogh deleted the dotordogh:dotordogh/remoting/missing-blobs branch May 22, 2018
