New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concurrent Session Control #4138
Conversation
7c6c98c
to
53e1ab3
Compare
Thanks for the awesome PR Description. Old PR FeedbackMy initial gut reaction, is that we're leaking the implementation detail of semaphores to customers. It feels like a new concept, and I'm proposing for tctl we stick to sessions vs sems? # lists active sessions
$ tctl sem ls
+ $ tctl sessions ls
# Cancel vs rm? Aren't these the same?
$ tctl sem cancel
+ $ tctl session deny
# Removes / kills session
$ tctl sem rm
+ $ tctl session rm Also for |
You are absolutely right that we are leaking the semaphore concept... I'm hesitant to make a Even if we wanted to, I don't see how we could make the session construct used by the audit log match up with the nist control because the audit log's session construct doesn't have a 1-to-1 mapping to an authenticated user (due to the "session joining" feature). We could create a
Do you think that is a better UX? Definitely a little more self-explanatory.
Each semaphore contains some number of leases; if Effectively, If we went with the
The |
I edited contents of @fspmarshall based on our call yesterday, just some tweaks to the output of |
e0319e9
to
652da5f
Compare
140a163
to
ea85056
Compare
Updated Regular:
Verbose:
Json:
|
As a relative newbie to Go, I like this implementation and think it seems pretty clean. I'll leave the more hardcore reviewing to backend engineers. |
2e0f393
to
0745ea4
Compare
} | ||
|
||
// SemaphoreLeaseRef identifies an existent lease. | ||
message SemaphoreLeaseRef { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's because each SemaphoreSpecV3
can maintain multiple leases?
lib/events/api.go
Outdated
|
||
// SessionControlLimit fires when a user's attempt to create an authenticated | ||
// session has been rejected due to exceeding `max_concurrent_sessions`. | ||
SessionControlLimitEvent = "sessctl.limit" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be named something else. @benarent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this named to something else?
68a8293
to
0f17b60
Compare
dbbd51c
to
1d45431
Compare
Updated PR DescriptionRBACA role's kind: role
metadata:
name: some-role
# ...
spec:
options:
max_connections: 2
max_sessions: 2
# ...
version: v3
|
1d45431
to
1cae367
Compare
@fspmarshall does it have to be |
@klizhentas I started off using the more general terminology, but was concerned that it might cause confusion since not all |
I would still remove ssh part, precisely because we don't promise to preserve the relationship, but we would actually limit the amount of connections and sessions across all ssh/k8s/app protocols - that's AC requirement anyways. |
1cae367
to
b67a028
Compare
Updated description to reflect removal of |
b1da6e8
to
f4791be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I didn't re-review lib/services/semaphore.go
, hit my mental capacity after ~2hrs of reviewing.
Regarding max_connections
vs max_sessions
: it looks like these separate controls are required by FIPS, right?
I suspect this will be confusing for some customers, vs a single global max_sessions
-like limit.
We should at least make the docs very clear about how these relate @benarent
option (gogoproto.goproto_stringer) = false; | ||
option (gogoproto.stringer) = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these options set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These options are set for all top-level Resource
types. We implement custom String()
methods that display compact summaries of the important state of the resource.
@@ -304,6 +304,32 @@ func (s *SrvSuite) TestAgentForwardPermission(c *C) { | |||
c.Assert(strings.Contains(string(output), "SSH_AUTH_SOCK"), Equals, false) | |||
} | |||
|
|||
// TestMaxSesssions makes sure that MaxSessions RBAC rules prevent | |||
// too many concurrent sessions. | |||
func (s *SrvSuite) TestMaxSessions(c *C) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also check that the limit doesn't affect new connections
f4791be
to
d12cc68
Compare
Adds support for Concurrent Session Control and a new semaphore API. Roles now support two new configuration options, `max_ssh_connections` and `max_ssh_sessions` which correspond to the total number of authenticated ssh connections per cluster, and the number of ssh sessions within a connection respectively. Attempting to exceed these limits generate variants of the `session.rejected` audit event and cause the connection/session to be rejected.
d12cc68
to
eb01d6d
Compare
retest this please |
Support for limiting concurrent sessions (#2938), implemented via new
semaphore
API.EDIT: See updated PR description here.
Old PR Description
TODOs
Primary:
SemaphoreLock
abstraction.max_concurrent_sessions
.sessctl
lease expiry a cluster-level config option.Secondary:
SemaphoreLock
release on failed renewal (useful for backwards compatibility in the event we end up using the semaphore system as the basis for Session Termination in the future).github.com/gravitational/oxy/forward
to support cancellation).Overview
The
role
resource has been updated to support a newmax_concurrent_sessions
field:When a node encounters a new inbound SSH connection from a user who has a
max_concurrent_sessions
limit defined in one or more if their roles, it calculates the minimum of the defined limits as the maximum for that user. The node will then attempt to acquire a lease for the corresponding semaphore (sessctl/<username>
) by invoking the auth server's semaphore API. If the number of existent leases is already greater than or equal to the derivedmax_concurrent_sessions
value, the ssh connection is rejected.When using
tsh
, the rejection looks like this:When using
openssh
, the rejection looks like this:If a node does successfully acquire a semaphore lease, then the ssh connection continues normally, and the lease is periodically renewed in the background.
Semaphore leases have an expiry, and the node must successfully renew the lease before this expiry. Nodes begin attempting to renew a lease when it is halfway to its expiry point. If a node fails to successfully renew before the expiry time, the node will terminate the ssh connection.
Existant leases can be viewed like so:(see this comment for updated syntax)Or, more verbosely:
The semaphore API allows
lease_id
andholder
to be arbitrary strings (thoughlease_id
must be unique). When a node acquires a semaphore lease for the purposes of limiting concurrent SSH connections,lease_id
will be a random UUID, andholder
will the UUID of the node in question.A pair of additional hiddenThese two functions have been merged into a singletctl
commands,tctl sem cancel
andtctl sem rm
, allow removal of individual leases and entire semaphores respectively. These commands effectively constitute a form of asynchronous session termination, but do not prevent new sessions from being created.tctl sessctl rm
subcommand. It is still hidden to avoid confusion since removing individual sessions may be needed in some debugging scenarios, but is not a substitute for proper session termination.Gotchas/Clarifications
ControlMaster
are in play).max_concurrent_sessions
value as seen by that node.AlreadyExists
error). This is an intentional design choice since some "session" concepts that we may want to track in the future won't exist on a unique machine (kubernetes API, AAP, etc...).