A utility to aggregate s3 access logs. #5777

benjyw merged 3 commits into pantsbuild:master from benjyw:s3_log_aggregator May 20, 2018



benjyw commented May 3, 2018

Helps us track which binaries our S3 bandwidth costs are being spent on.

Currently produces:

936.9GB 3452 bin/clang/linux/x86_64/6.0.0/clang.tar.gz
785.9GB 3401 bin/gcc/linux/x86_64/7.3.0/gcc.tar.gz
537.2GB 6987 bin/go/linux/x86_64/1.7.3/go.tar.gz
292.0GB 6891 bin/protobuf/linux/x86_64/3.4.1/protoc
195.9GB 6981 bin/cmake/linux/x86_64/3.9.5/cmake.tar.gz
187.8GB 5029 bin/thrift/linux/x86_64/0.9.2/thrift
183.1GB 5553 bin/watchman/linux/x86_64/4.9.0-pants1/watchman
123.8GB 3322 bin/binutils/linux/x86_64/2.30/binutils.tar.gz
113.9GB 1359 bin/go/linux/x86_64/1.8.3/go.tar.gz
59.8GB 3454 bin/protoc/linux/x86_64/2.4.1/protoc
42.2GB 551 bin/go/mac/10.10/1.7.3/go.tar.gz
28.8GB 634 bin/thrift/linux/x86_64/0.10.0/thrift
19.8GB 1520 bin/node/linux/x86_64/v6.9.1/node.tar.gz
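
For readers who want a feel for how a report like the one above could be computed, here is a minimal sketch. It is not the code from this PR. It assumes the standard space-delimited S3 server access log layout (bucket owner, bucket, bracketed time, remote IP, requester, request ID, operation, key, quoted request URI, HTTP status, error code, bytes sent, ...), and the names `LOG_LINE`, `aggregate`, and `format_row` are hypothetical. Treating 1GB as 10^9 bytes is also an assumption.

```python
import re
from collections import defaultdict

# Assumed S3 access log layout; only the leading fields are matched.
LOG_LINE = re.compile(
    r'^\S+ \S+ \[[^\]]*\] \S+ \S+ \S+ '          # owner, bucket, time, ip, requester, request id
    r'(?P<operation>\S+) (?P<key>\S+) "[^"]*" '  # operation, key, quoted request URI
    r'(?P<status>\S+) \S+ (?P<bytes_sent>\S+)'   # status, error code, bytes sent
)


def aggregate(lines):
    """Return (total_bytes, request_count, key) tuples, largest total first."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip unparseable lines and records with no bytes-sent value ("-").
        if match is None or match.group("bytes_sent") == "-":
            continue
        entry = totals[match.group("key")]
        entry[0] += int(match.group("bytes_sent"))
        entry[1] += 1
    return sorted(((b, c, k) for k, (b, c) in totals.items()), reverse=True)


def format_row(total_bytes, count, key):
    """Render one row in the 'size count path' shape (assumes 1GB = 10**9 bytes)."""
    return "{:.1f}GB {} {}".format(total_bytes / 1e9, count, key)
```

Feeding the lines of the downloaded log files to `aggregate` and printing each tuple through `format_row` yields rows in the same shape as the listing above. This sketch counts every record with a numeric bytes-sent field; filtering by operation or status would be a natural refinement.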

@benjyw benjyw force-pushed the benjyw:s3_log_aggregator branch from 029a3b9 to 15a8879 May 4, 2018


kwlzn approved these changes May 4, 2018

@stuhood stuhood force-pushed the pantsbuild:master branch from b6bb42d to 9e2fdb5 May 11, 2018

stuhood added a commit that referenced this pull request May 14, 2018

Allow alternate binaries download urls generation and convert GoDistribution and LLVM subsystems to use it (#5780)

### Problem

`BinaryTool` is a great recent development which makes using binaries downloaded lazily from a specified place much more declarative and much more extensible. However, it's still only able to download from either our S3 hosting, or a mirror.

The previous structure requires the urls provided to the global option `--binaries-baseurls` to point to an exact mirror of the hierarchy we provide in our S3 hosting, but that can change at any time. It's not incredibly difficult to write a script to mirror our hosting into an internal network, but in general there's no reason the layout of binaries in `~/.cache/pants/bin/` needs to determine where those binaries are downloaded from.

Our bandwidth costs in S3 have recently increased due to the introduction of clang and gcc in #5490. *See #5777 and #5779 for further context on S3 hosting.*  There are reliable binary downloads for some of these tools, which we would be remiss not to use if we can do it in a structured way.

### Solution

- Introduce a `urls=` argument to multiple methods of `BinaryUtil` for `BinaryTool`s that don't download from our S3.
- Add support for extracting (not creating) `.tar.xz` archives by adding the `xz` BinaryTool (see pantsbuild/binaries#66) and integrating it into BinaryTool's `archive_type` selection mechanism.
- Use the above to download the `go` and `llvm` binaries from their official download URLs.
  - Also, rename the `Clang` subsystem to `LLVM`, as the binary download we now use (currently for Ubuntu 16.04) also contains many other LLVM tools, e.g. `lld`.

### Result

URLs for binary downloads can now be generated in a structured way for external downloads, with the `--force-baseurls` option as an escape hatch. Some binaries now default to the public download URLs published by the software's maintainers, thanks to the introduction of the `xz` binary tool. Two of the three largest bandwidth users among our provided binaries (LLVM and Go) have been switched to the download URLs provided by each project's maintainers. gcc still needs to be switched over, which will happen in a separate PR.
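
The selection logic described above can be illustrated with a small sketch. This is not the Pants implementation: `candidate_urls` and its parameters are hypothetical names, and the behavior shown (external URLs win unless the escape hatch is set) is an assumption drawn only from the description in this commit message.

```python
def candidate_urls(baseurls, relpath, external_urls=None, force_baseurls=False):
    """Return the URLs to try for one binary, in order.

    baseurls:       mirrors that replicate the hosted layout (hypothetical input)
    relpath:        path of the binary within that layout
    external_urls:  URLs published by the tool's own maintainers, if any
    force_baseurls: escape hatch that ignores external_urls
    """
    if external_urls and not force_baseurls:
        # The download location no longer has to mirror the cache layout.
        return list(external_urls)
    return ["{}/{}".format(base.rstrip("/"), relpath) for base in baseurls]
```

With this shape, a tool that supplies no external URLs keeps downloading from the configured base URLs, so existing mirrors continue to work unchanged.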

@benjyw benjyw merged commit 4839d2a into pantsbuild:master May 20, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@benjyw benjyw deleted the benjyw:s3_log_aggregator branch May 20, 2018
