Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce verbosity of warnings when remote caching has errors #11740

Merged
merged 3 commits into from Mar 19, 2021

Conversation

Eric-Arellano
Copy link
Contributor

Closes #11494. We still always log errors, but only at the warn level for the 1st instance of a particular error, and then every 5 times after that. Otherwise, we use the debug level.

For example:

❯ ./pants --remote-cache-read --remote-cache-write --remote-store-address='grpcs://fake.com' lint src/python/pants/util/strutil.py --no-process-execution-use-local-cache
18:35:59.93 [WARN] Failed to write to remote cache (1 occurrences so far): Error from server in response to find_missing_blobs_request: Status { code: Internal, message: "Unexpected compression flag: 60, while receiving response with status: 404 Not Found" }
18:36:00.72 [WARN] Failed to write to remote cache (5 occurrences so far): Error from server in response to find_missing_blobs_request: Status { code: Internal, message: "Unexpected compression flag: 60, while receiving response with status: 404 Not Found" }
18:36:01.10 [INFO] Completed: Building docformatter.pex with 1 requirement: docformatter>=1.3.1,<1.4
18:36:01.20 [INFO] Completed: lint - Docformatter succeeded.
18:36:01.24 [WARN] Failed to write to remote cache (10 occurrences so far): Error from server in response to find_missing_blobs_request: Status { code: Internal, message: "Unexpected compression flag: 60, while receiving response with status: 404 Not Found" }

This intentionally does not add an option to control this behavior, instead trying to reach a sensible default that balances the desire for logs to know when things aren't working with not wanting to crowd the console. Pants arguably has options fatigue, and adding an option here would add lots of complexity (including wiring over FFI).

Note that for simplicity of the implementation, this does not log at warn level the final count of errors, as we do not know what the final call to run() will be. Users can use -ldebug if they care about this, especially with --log-levels-by-target.

[ci skip-build-wheels]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@Eric-Arellano
Copy link
Contributor Author

and then every 5 times after that.

I chose 5 because I get 5 read errors and 18 write errors when using an invalid address and running lint with no local cache. Increments of 5 seemed reasonable, but it is somewhat arbitrary.

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Base automatically changed from master to main March 19, 2021 19:20
@Eric-Arellano Eric-Arellano merged commit 7cf3cfa into pantsbuild:main Mar 19, 2021
@Eric-Arellano Eric-Arellano deleted the remoting-logging branch March 19, 2021 21:45
@benjyw benjyw mentioned this pull request Mar 20, 2021
Eric-Arellano added a commit that referenced this pull request Apr 8, 2021
…tty (#11859)

#11740 added new aggregation of remote cache logs. It was a fine start, but did not work properly for several error messages because we were dumping the headers, which included unique values like timestamps or the digest value:

> 23:16:17.28 [WARN] Failed to write to remote cache (1 occurrences so far): Error from server in response to find_missing_blobs_request: Status { code: PermissionDenied, message: "insufficient permissions", metadata: MetadataMap { headers: {"server": "awselb/2.0", "date": "Mon, 05 Apr 2021 23:16:17 GMT", "content-type": "application/grpc", "content-length": "0"} } }

Further, these headers were extremely noisy and most users don't care about them. They're now short:

> 23:16:17.28 [WARN] Failed to write to remote cache (1 occurrences so far): PermissionDenied:  "insufficient permissions"

Likewise, for most use cases, it was still too noisy to dump every increment. Our average use case should only log the very first time so that people are aware of the problem, but don't get flooded. In CI, users can set to log more frequently, which is now implemented via exponential backoff based on the exponent of 2.

[ci skip-build-wheels]
Eric-Arellano added a commit to Eric-Arellano/pants that referenced this pull request Apr 8, 2021
…tty (pantsbuild#11859)

pantsbuild#11740 added new aggregation of remote cache logs. It was a fine start, but did not work properly for several error messages because we were dumping the headers, which included unique values like timestamps or the digest value:

> 23:16:17.28 [WARN] Failed to write to remote cache (1 occurrences so far): Error from server in response to find_missing_blobs_request: Status { code: PermissionDenied, message: "insufficient permissions", metadata: MetadataMap { headers: {"server": "awselb/2.0", "date": "Mon, 05 Apr 2021 23:16:17 GMT", "content-type": "application/grpc", "content-length": "0"} } }

Further, these headers were extremely noisy and most users don't care about them. They're now short:

> 23:16:17.28 [WARN] Failed to write to remote cache (1 occurrences so far): PermissionDenied:  "insufficient permissions"

Likewise, for most use cases, it was still too noisy to dump every increment. Our average use case should only log the very first time so that people are aware of the problem, but don't get flooded. In CI, users can set to log more frequently, which is now implemented via exponential backoff based on the exponent of 2.

[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

squash multiple warnings about remote cache errors
4 participants