
`fs_util directory save` has `--json` option which outputs stats #6318

Closed
illicitonion opened this Issue Aug 8, 2018 · 5 comments


illicitonion commented Aug 8, 2018

Right now, `fs_util directory save`, when successful, always outputs a single line: `<digest> <digest-size-bytes>` (e.g. `e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 0`).

It would be useful to track the number of {files, bytes} which were {ingested, uploaded}. I'm imagining output like:

```
$ fs_util --local-store-path="${HOME}/.cache/pants/lmdb_store" directory save --root=. '**'
{
  "digest_hash": "0e5d2b4f970e7933878490c852132073ac2baac53822f9f08467783132f88d4c",
  "digest_size_bytes": 79,
  "ingested_file_count": 10,
  "ingested_file_bytes": 10719,
  "uploaded_file_count": 3,
  "uploaded_file_bytes": 254
}
```

As a first cut, let's only support this when we're uploading to a remote.

To do this, I suggest we:

  1. Modify `Store::ensure_remote_has_recursive` to return a Future of a tuple/struct/whatever containing the four new values ({ingested, uploaded} file {count, bytes}), instead of a `()`.
  2. In `fs_util`, use `serde_json` to serialize a struct containing these values, as well as the digest info, into a string (a rough sketch follows this list).
  3. Wire up a new flag in `fs_util` to optionally output this.
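
For concreteness, a hedged sketch of steps 1 and 2, using the values from the imagined output above. The `UploadSummary` and `SaveOutput` names are hypothetical, not the actual pants API; `serde`'s `flatten` attribute is assumed here to produce the flat output shape:

```rust
use serde::Serialize;

// Hypothetical: the four counters that Store::ensure_remote_has_recursive
// would accumulate while ingesting and uploading files.
#[derive(Clone, Copy, Debug, Default, Serialize)]
pub struct UploadSummary {
    pub ingested_file_count: usize,
    pub ingested_file_bytes: usize,
    pub uploaded_file_count: usize,
    pub uploaded_file_bytes: usize,
}

// Hypothetical: the struct fs_util would serialize when the new flag is set.
// #[serde(flatten)] inlines the summary fields next to the digest fields,
// matching the flat output shape imagined above.
#[derive(Serialize)]
struct SaveOutput {
    digest_hash: String,
    digest_size_bytes: usize,
    #[serde(flatten)]
    summary: UploadSummary,
}

fn main() {
    let output = SaveOutput {
        digest_hash: "0e5d2b4f970e7933878490c852132073ac2baac53822f9f08467783132f88d4c".to_owned(),
        digest_size_bytes: 79,
        summary: UploadSummary {
            ingested_file_count: 10,
            ingested_file_bytes: 10719,
            uploaded_file_count: 3,
            uploaded_file_bytes: 254,
        },
    };
    println!("{}", serde_json::to_string_pretty(&output).unwrap());
}
```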

illicitonion added this to To do in Remoting via automation Aug 8, 2018

blorente commented Aug 22, 2018


illicitonion commented Aug 22, 2018

Took a look from awful plane wifi; it looks great. I left a few minor comments, but please send out a PR and let's get it merged! :) Thanks!


blorente commented Aug 23, 2018

I am working on Step 2, and I have a question about which approach to take. There are at least two possible approaches:

  • Make `Digest`, `Fingerprint`, etc. serializable, so that we would only need
    something like `SummaryWithDigest { digest, summary }`. However, this would
    produce output similar to:

    ```
    {
      digest: { fingerprint: ..., size: ... },
      summary: {
        ingested_file_count: ...
      }
    }
    ```

    which may not be desirable. This could be mitigated by inserting the digest
    into the summary, yielding something like:

    ```
    {
      digest: { fingerprint: ..., size: ... },
      ingested_file_count: ...
    }
    ```

    However, `ensure_remote_has_recursive` currently takes a vector of digests
    (even though we only pass one), so this will not work in general.

  • Destructure the digest and summary into a flat, purpose-built struct:

    ```rust
    #[derive(Serialize)]
    struct SummaryWithDigest {
      operation_fingerprint: [u8; 32],
      operation_size: usize,
      ingested_files: usize,
      ingested_bytes: usize,
      uploaded_files: usize,
      uploaded_bytes: usize,
    }
    ```

The advantages of this are that only fs/fs_util knows about serialization, and we (almost) do not have to modify the rest of the code (except for making `FINGERPRINT_SIZE` public). The disadvantage is that it is more verbose.

I'm leaning towards number 2 for now, since it fits the spec. (For comparison, a sketch of how option 1 could look follows below.)
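
For comparison, a minimal sketch of how approach 1 could look, assuming a serializable stand-in for the real `Digest` type (which wraps a raw `Fingerprint` byte array and does not derive `Serialize` out of the box):

```rust
use serde::Serialize;

// Hypothetical stand-in for the real Digest type; the real type would need
// its own Serialize impl (or a serializable mirror) for this approach.
#[derive(Serialize)]
struct Digest {
    fingerprint: String, // hex-encoded here; the real type stores raw bytes
    size: usize,
}

// Hypothetical counters matching the fields discussed above.
#[derive(Serialize)]
struct UploadSummary {
    ingested_file_count: usize,
    ingested_file_bytes: usize,
    uploaded_file_count: usize,
    uploaded_file_bytes: usize,
}

// Approach 1: nest both, producing the { digest, summary } shape shown above.
#[derive(Serialize)]
struct SummaryWithDigest {
    digest: Digest,
    summary: UploadSummary,
}
```

This nested shape is the one the merged PR ends up emitting, as the JSON in the PR description below shows.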


illicitonion commented Aug 23, 2018

`SummaryWithDigest { digest, summary }` sounds great to me :)


blorente commented Aug 23, 2018

Okay, thanks!

illicitonion added a commit that referenced this issue Aug 29, 2018

Add json upload summary to `fs_util` (#6318) (#6389)
### Problem

`fs_util` displayed only minimal output from the operation performed, in the form of `<fingerprint> <size>` pairs.
Closes #6318 

### Solution

Implement richer output, which can then be consumed by other tools.

### Result

An optional flag that generates a JSON report of the files ingested and uploaded, as well as their sizes.

This execution:
```bash
$ fs_util --server-address localhost:40351 --local-store-path="${HOME}/.cache/pants/lmdb_store" directory save --root=. --output-mode=json  -- '**'
```

Produces the following (the files were already in the remote, which is why it reports 0 uploads):

```json
{
  "digest": {
    "fingerprint": "f9e7b01bdf87e9d8e5d50f5e451bd173e792f980f331a5073b8bd9de126f45ea",
    "size": 162
  },
  "summary": {
    "ingested_file_count": 4,
    "ingested_file_bytes": 17520,
    "uploaded_file_count": 0,
    "uploaded_file_bytes": 0
  }
}
```
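
Since the report is meant to be consumed by other tools, here is a hedged sketch of how a downstream tool might parse it (the struct names are hypothetical; the field names are taken from the JSON above):

```rust
use serde::Deserialize;

// Hypothetical structs mirroring the JSON report emitted by
// `fs_util directory save --output-mode=json`.
#[derive(Deserialize)]
struct Digest {
    fingerprint: String,
    size: usize,
}

#[derive(Deserialize)]
struct Summary {
    ingested_file_count: usize,
    ingested_file_bytes: usize,
    uploaded_file_count: usize,
    uploaded_file_bytes: usize,
}

#[derive(Deserialize)]
struct Report {
    digest: Digest,
    summary: Summary,
}

fn main() {
    // In practice this would be the captured stdout of fs_util.
    let raw = r#"{
      "digest": {
        "fingerprint": "f9e7b01bdf87e9d8e5d50f5e451bd173e792f980f331a5073b8bd9de126f45ea",
        "size": 162
      },
      "summary": {
        "ingested_file_count": 4,
        "ingested_file_bytes": 17520,
        "uploaded_file_count": 0,
        "uploaded_file_bytes": 0
      }
    }"#;
    let report: Report = serde_json::from_str(raw).expect("invalid JSON report");
    println!(
        "digest {} ({} bytes): ingested {} files / {} bytes, uploaded {} files / {} bytes",
        report.digest.fingerprint,
        report.digest.size,
        report.summary.ingested_file_count,
        report.summary.ingested_file_bytes,
        report.summary.uploaded_file_count,
        report.summary.uploaded_file_bytes,
    );
}
```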
In addition to that, this PR includes a stub implementation of a binary that can serve as a local server, taking advantage of the test utility `mock::StubCAS`. It is by no means finished, but it serves as an easy testing ground for this new functionality:

```bash
$ cd src/rust/engine/testutil/local_cas
$ ../../../../../build-support/bin/native/cargo.sh run
Started CAS at address: localhost:<random port>
```