-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A59: gRPC Audit Logging #346
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a great start!
A59-audit-logging.md
Outdated
|RPC Method |Method the RPC hit. |"/pkg.service/foo" | | ||
|Principal |Identity the RPC. Currently only available in certificate-based TLS authentication.|"spiffe://foo/user1" | | ||
|Timestamp |Time when the RPC happens. |1649376211 | | ||
|Denying Policy Name|Policy name that denied the RPC. Empty if RPC is allowed. |"policy_name" | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it possible that we also have a particular policy name response for allowing a request, if we're using an RBAC filter with an ALLOW action?
Also, aren't there possible configurations where we'll wind up generating multiple log entries for each RPC, one for each RBAC filter? Don't we need some way to know that those duplicate messages all refer to the same RPC? In particular, note that there is no restriction that prevents two RBAC filters from using the same policy name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, given the scenario you used as as example, I realized it's still necessary to add some more explanation about the ON_DENY
and ON_ALLOW
. Basically, I think it'd be quite intuitive to consider ON_DENY
condition met when a RBAC rejects the RPC, even if the action of it is ALLOW
. Likewise, I think we should also consider ON_ALLOW
condition met when a RBAC lets the RPC through, even if the action could be DENY
. If the overloading of DENY
and ALLOW
for action and decision here could cause confusion, we can think about using other words like ON_AUTHORIZED
or so. WDYT?
About logging the policy name, I changed it to log the allowing policy name as well for now. But I think some discussion related to this is still needed:
- Like you said, one caveat is when there could be multiple log entries. I believe we discussed it before where we assumed this was not what users would generally want. In the gRPC authz policy section, I specifically made this assumption and added more paragraphs to explain the configuration. PTAL.
- In xDS, particularly where users can have arbitrary numbers of RBAC filters, they could audit the same RPC multiple times if they really want. Let me know if I am wrong but I did't find gRPC generating any sort of request UUID on the server side to identify a RPC. So if users really want to do this, they would need to think about how to dedup the audits if needed. I guess if feasible, we could include some RPC metadata which users could configure out a way with.
- Now let's assume users only want to audit once even with multiple RBACs, so they would only configure
ON_ALLOW
at the last RBAC. The question is how meaningful it would be to log just the policy name of the last RBAC? I don't think we should simply accumulate policy names here for audit on allow.
I don't know where users would reasonably want to use the same policy name in multiple active RBAC filters, and also want to audit log multiple times. I'd prefer we consider this as an anti-pattern and I don't see much we could do to resolve the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I don't think it will be a common config, but it is a possible config, so we need to consider the implications when that config is used.
- Correct, there is no UUID for each RPC. And I agree that that's an issue for the multiple-logging case. I suppose we could try to generate some sort of UUID, although I'm not crazy about it.
- I don't fully follow the case you're describing here. In general, I think that if there is a particular policy name, it seems right to log it. But I think there may be cases where they would not be a specific policy name to log: for example, if an RBAC policy has action
DENY
but has audit conditionON_ALLOW
, then we will log a message if none of the policies match.
I'd like to get @ejona86's thoughts on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 3, what I meant was if there are two RBACs present, an authorized request would need to be allowed by both of them. Now if users only want one log entry on allow, which is the assumption we have in SDK, the audit condition ON_ALLOW
will only be added to the second RBAC policy. It will not be accurate to only log the second RBAC policy name in this case though, since the first RBAC technically contributed to allowing the request as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During our offline discussion, we agreed that it was a reasonable expectation to audit at most once per RPC and the gRPC authorization API (SDK) makes this assumption for the users. We also concluded that while the possibility of configuring multiple audit logs per RPC is acknowledged, we would not consider providing any mechanism to de-duplicate those log entries in this gRFC.
As suggested, I also added more paragraphs elaborating on this in the xDS API changes
section + Audit Condition in the HTTP Connection Manager
section in Rationale
. PTAL. @markdroth @ejona86
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the policy name is useful in the DENY case, but questionable in the ALLOW case. If you were outputting policy names for ALLOW, you would want the policy names of all ALLOW rules that had matched, which doesn't seem worth the complication of supporting. If there is only 1 ALLOW rule, then the name doesn't need to be logged since it must always be that rule's policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid confusion, the DENY and ALLOW cases in your comment above refer to the RBAC actions, not decisions.
According to the definition in RBAC, "A match occurs when at least one policy matches the request. The policies are evaluated in lexicographic order of the policy name.", so the behavior is actually deterministic regarding which policy (aka rule in SDK) gets matched. So I think it's reasonable to just log that policy name.
Thanks, @markdroth, for the review! I revised the doc and replied to the comments. PTAL. |
Signed-off-by: Luwei Ge <lwge@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly editorial comments, but we do need to define the stdout audit behavior.
Looks good.
Signed-off-by: Luwei Ge <lwge@google.com>
Signed-off-by: Luwei Ge <lwge@google.com>
Signed-off-by: Luwei Ge <lwge@google.com>
Signed-off-by: Luwei Ge <lwge@google.com>
Signed-off-by: Luwei Ge <lwge@google.com>
A59-audit-logging.md
Outdated
|
||
_What is certain for now is we need to support third-party audit logger | ||
implementations registered by users._ | ||
The timestamp, as the number of seconds from the Unix Epoch, is generated from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that seconds since the Unix epoch is the right time format to use here? The output is JSON, so it's presumably meant to be a bit more user-friendly. Maybe we should format this as a human-readable time string? Or is that just going to open up problems due to (e.g.) locale-specific formatting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When timestamp was still a field in the audit context, unix seconds was given as an example. So I assumed it was preferred but I don't have any opinion at all.
We could change to a different format, e.g., RFC3339, which seems to be recommended generally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @erm-g who's authoring the stdout logger in Go. We need to ensure the output is consistent across languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markdroth I don't have a strong opinion, but I think in most use cases the output will be fed to some aggregation pipeline for further analysis. So Unix epoch format is an ok default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked with @ejona86 about using RFC3339 seconds and that seems fine. Aggregation pipelines should be able to process either format.
I have changed the example here.
We decided to not populate `policy_name` with the HTTP filter name in xDS case. So removing it from `GenerateServiceConfig`. This will be consistent across languages. The gRFC [PR](grpc/proposal#346) has been updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only editorial notes.
@markdroth, were you happy with this gRFC? To my knowledge it has been implemented in C and Go, so apparently only editorial issues remain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed the latest reviews till now. Fixed. Let me know if it's good to merge. Thanks! @ejona86 @markdroth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I have just a couple of minor wording changes that I'd like to see before merging. Thanks!
A59-audit-logging.md
Outdated
|
||
As is also documented in all the languages below, the logging function is | ||
executed by gRPC synchronously during the authorization process. Therefore, | ||
its implementation should not block the RPC. When needed, it should invoke any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change "should not block" to "must not block". This is not optional; it's a hard requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
A59-audit-logging.md
Outdated
As is also documented in all the languages below, the logging function is | ||
executed by gRPC synchronously during the authorization process. Therefore, | ||
its implementation should not block the RPC. When needed, it should invoke any | ||
long running operations asynchronously so that the function itself returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest changing the wording of this sentence to make its meaning a bit clearer:
If a logger implementation needs to trigger any long-running work, it must do so asynchronously so that the function itself returns promptly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion taken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @markdroth!
A59-audit-logging.md
Outdated
|
||
As is also documented in all the languages below, the logging function is | ||
executed by gRPC synchronously during the authorization process. Therefore, | ||
its implementation should not block the RPC. When needed, it should invoke any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
A59-audit-logging.md
Outdated
As is also documented in all the languages below, the logging function is | ||
executed by gRPC synchronously during the authorization process. Therefore, | ||
its implementation should not block the RPC. When needed, it should invoke any | ||
long running operations asynchronously so that the function itself returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion taken.
Still lacking the language-specific API sections. But I'd like to open this up so that folks commenting on envoyproxy/envoy#26001 can have more context.