Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A59: gRPC Audit Logging #346

Merged
merged 33 commits into from
Oct 2, 2023
Merged

A59: gRPC Audit Logging #346

merged 33 commits into from
Oct 2, 2023

Conversation

rockspore
Copy link
Contributor

Still lacking the language-specific API sections. But I'd like to open this up so that folks commenting on envoyproxy/envoy#26001 can have more context.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great start!

A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
|RPC Method |Method the RPC hit. |"/pkg.service/foo" |
|Principal |Identity the RPC. Currently only available in certificate-based TLS authentication.|"spiffe://foo/user1" |
|Timestamp |Time when the RPC happens. |1649376211 |
|Denying Policy Name|Policy name that denied the RPC. Empty if RPC is allowed. |"policy_name" |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it possible that we also have a particular policy name response for allowing a request, if we're using an RBAC filter with an ALLOW action?

Also, aren't there possible configurations where we'll wind up generating multiple log entries for each RPC, one for each RBAC filter? Don't we need some way to know that those duplicate messages all refer to the same RPC? In particular, note that there is no restriction that prevents two RBAC filters from using the same policy name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, given the scenario you used as as example, I realized it's still necessary to add some more explanation about the ON_DENY and ON_ALLOW. Basically, I think it'd be quite intuitive to consider ON_DENY condition met when a RBAC rejects the RPC, even if the action of it is ALLOW. Likewise, I think we should also consider ON_ALLOW condition met when a RBAC lets the RPC through, even if the action could be DENY. If the overloading of DENY and ALLOW for action and decision here could cause confusion, we can think about using other words like ON_AUTHORIZED or so. WDYT?

About logging the policy name, I changed it to log the allowing policy name as well for now. But I think some discussion related to this is still needed:

  1. Like you said, one caveat is when there could be multiple log entries. I believe we discussed it before where we assumed this was not what users would generally want. In the gRPC authz policy section, I specifically made this assumption and added more paragraphs to explain the configuration. PTAL.
  2. In xDS, particularly where users can have arbitrary numbers of RBAC filters, they could audit the same RPC multiple times if they really want. Let me know if I am wrong but I did't find gRPC generating any sort of request UUID on the server side to identify a RPC. So if users really want to do this, they would need to think about how to dedup the audits if needed. I guess if feasible, we could include some RPC metadata which users could configure out a way with.
  3. Now let's assume users only want to audit once even with multiple RBACs, so they would only configure ON_ALLOW at the last RBAC. The question is how meaningful it would be to log just the policy name of the last RBAC? I don't think we should simply accumulate policy names here for audit on allow.

I don't know where users would reasonably want to use the same policy name in multiple active RBAC filters, and also want to audit log multiple times. I'd prefer we consider this as an anti-pattern and I don't see much we could do to resolve the confusion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I don't think it will be a common config, but it is a possible config, so we need to consider the implications when that config is used.
  2. Correct, there is no UUID for each RPC. And I agree that that's an issue for the multiple-logging case. I suppose we could try to generate some sort of UUID, although I'm not crazy about it.
  3. I don't fully follow the case you're describing here. In general, I think that if there is a particular policy name, it seems right to log it. But I think there may be cases where they would not be a specific policy name to log: for example, if an RBAC policy has action DENY but has audit condition ON_ALLOW, then we will log a message if none of the policies match.

I'd like to get @ejona86's thoughts on this.

Copy link
Contributor Author

@rockspore rockspore Mar 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 3, what I meant was if there are two RBACs present, an authorized request would need to be allowed by both of them. Now if users only want one log entry on allow, which is the assumption we have in SDK, the audit condition ON_ALLOW will only be added to the second RBAC policy. It will not be accurate to only log the second RBAC policy name in this case though, since the first RBAC technically contributed to allowing the request as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During our offline discussion, we agreed that it was a reasonable expectation to audit at most once per RPC and the gRPC authorization API (SDK) makes this assumption for the users. We also concluded that while the possibility of configuring multiple audit logs per RPC is acknowledged, we would not consider providing any mechanism to de-duplicate those log entries in this gRFC.

As suggested, I also added more paragraphs elaborating on this in the xDS API changes section + Audit Condition in the HTTP Connection Manager section in Rationale. PTAL. @markdroth @ejona86

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the policy name is useful in the DENY case, but questionable in the ALLOW case. If you were outputting policy names for ALLOW, you would want the policy names of all ALLOW rules that had matched, which doesn't seem worth the complication of supporting. If there is only 1 ALLOW rule, then the name doesn't need to be logged since it must always be that rule's policy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid confusion, the DENY and ALLOW cases in your comment above refer to the RBAC actions, not decisions.

According to the definition in RBAC, "A match occurs when at least one policy matches the request. The policies are evaluated in lexicographic order of the policy name.", so the behavior is actually deterministic regarding which policy (aka rule in SDK) gets matched. So I think it's reasonable to just log that policy name.

A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
@rockspore
Copy link
Contributor Author

Thanks, @markdroth, for the review! I revised the doc and replied to the comments. PTAL.

A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly editorial comments, but we do need to define the stdout audit behavior.

Looks good.

A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
Signed-off-by: Luwei Ge <lwge@google.com>
Signed-off-by: Luwei Ge <lwge@google.com>
Signed-off-by: Luwei Ge <lwge@google.com>
Signed-off-by: Luwei Ge <lwge@google.com>
Signed-off-by: Luwei Ge <lwge@google.com>
Signed-off-by: Luwei Ge <lwge@google.com>
Signed-off-by: Luwei Ge <lwge@google.com>

_What is certain for now is we need to support third-party audit logger
implementations registered by users._
The timestamp, as the number of seconds from the Unix Epoch, is generated from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that seconds since the Unix epoch is the right time format to use here? The output is JSON, so it's presumably meant to be a bit more user-friendly. Maybe we should format this as a human-readable time string? Or is that just going to open up problems due to (e.g.) locale-specific formatting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When timestamp was still a field in the audit context, unix seconds was given as an example. So I assumed it was preferred but I don't have any opinion at all.

We could change to a different format, e.g., RFC3339, which seems to be recommended generally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @erm-g who's authoring the stdout logger in Go. We need to ensure the output is consistent across languages.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markdroth I don't have a strong opinion, but I think in most use cases the output will be fed to some aggregation pipeline for further analysis. So Unix epoch format is an ok default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked with @ejona86 about using RFC3339 seconds and that seems fine. Aggregation pipelines should be able to process either format.

I have changed the example here.

rockspore added a commit to grpc/grpc that referenced this pull request Aug 8, 2023
We decided to not populate `policy_name` with the HTTP filter name in
xDS case. So removing it from `GenerateServiceConfig`. This will be
consistent across languages. The gRFC
[PR](grpc/proposal#346) has been updated.
Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only editorial notes.

@markdroth, were you happy with this gRFC? To my knowledge it has been implemented in C and Go, so apparently only editorial issues remain?

A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
Copy link
Contributor Author

@rockspore rockspore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed the latest reviews till now. Fixed. Let me know if it's good to merge. Thanks! @ejona86 @markdroth

A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
A59-audit-logging.md Outdated Show resolved Hide resolved
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I have just a couple of minor wording changes that I'd like to see before merging. Thanks!


As is also documented in all the languages below, the logging function is
executed by gRPC synchronously during the authorization process. Therefore,
its implementation should not block the RPC. When needed, it should invoke any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change "should not block" to "must not block". This is not optional; it's a hard requirement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

As is also documented in all the languages below, the logging function is
executed by gRPC synchronously during the authorization process. Therefore,
its implementation should not block the RPC. When needed, it should invoke any
long running operations asynchronously so that the function itself returns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest changing the wording of this sentence to make its meaning a bit clearer:

If a logger implementation needs to trigger any long-running work, it must do so asynchronously so that the function itself returns promptly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion taken.

Copy link
Contributor Author

@rockspore rockspore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @markdroth!


As is also documented in all the languages below, the logging function is
executed by gRPC synchronously during the authorization process. Therefore,
its implementation should not block the RPC. When needed, it should invoke any
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

As is also documented in all the languages below, the logging function is
executed by gRPC synchronously during the authorization process. Therefore,
its implementation should not block the RPC. When needed, it should invoke any
long running operations asynchronously so that the function itself returns
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion taken.

@rockspore rockspore merged commit 257a7aa into grpc:master Oct 2, 2023
1 check passed
@rockspore rockspore deleted the audit-logging branch October 2, 2023 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants