-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Need better authorization for OxQL queries #5298
Comments
We talked about this a bit more in chat. We went over the query inspection and filter injection ideas, and @bnaecker suggested a middle ground between fully dynamic queries and on-off endpoints that seems quite practical: a silo-scoped endpoint that provides a predefined set of queries that developer users are allowed to make, and which implements custom authorization logic for each. For example, the request body type could be an enum (union) of types that look like this. {
"metric_name": "instance_cpu",
"params": {
"project": "my-proj",
"instance": "my-inst"
}
} When a request comes in, we know it is a valid metric because it parses as a request body at all, and then would know for that metric that we need to authorize access to the instance by a) checking that the instance is in that project, and b) authorizing the user's access to that project. It would use OxQL under the hood and therefore be easy to implement, and it would be easy to add new queries without cluttering up the API with a ton of endpoints, but it would save us from having to figure out how to solve authz in a general way, and it would let us learn a lot about the problem before we try to do that. |
I talked last week with @davepacheco about the general strategy for timeseries authz. I wanted to capture some notes here for the record, though eventually this should make its way to an RFD. We both agree that it's hard, at least to provide a view of the timeseries data that's completely consistent with the other API objects. Specifically, the goal allow a user to read timeseries data that pertains to objects that they can also read via the API is challenging, because we don't currently maintain a list of all the objects a user can read (nor is it clear we could). Instead, we maintain the roles for each user, from which we derive the actions a user can take on any given resource, assuming we have the resource by ID already. There's also the interesting question of historical data here. Timeseries data are very much going to stick around, even after the object from which it derived is deleted. For example, instance vCPU data is still accessible even after the instance is gone. Dave thought this might be a bug in the API -- we could instead show the instance, including the fact that it's been deleted, whereas today I think we return a 404 for it. But in any case, we'd like to make these two views consistent in this dimension as well: whatever objects you can learn about through a timeseries query, it would also be nice to be able to learn about those through the API. This is a hard problem, one which we both would prefer to punt on. In the interim, there is a potentially-useful API change that could take us pretty far. Right now, there's a concept of the "authorization scope" in a timeseries definition. E.g., "fleet" scope refers to hardware telemetry that should be accessible to an operator; "project" scope refers to things that any project reader should be able to see. We could reflect this scope in the API, by providing a separate timeseries query endpoint for each of those different scopes. So, if you wanted to query the This would let us avoid the problem of analyzing or modifying the query itself to solve the entirely general problem, but still let us expose the querying endpoints in a useful way. We could gather more data on the queries that we actually see, and use that to further direct efforts around finer-grained authz. |
Great idea, I'm all for it. |
A prototype query language called OxQL is implemented in #5273. That PR implements the most rudimentary of authorization checks, requiring read permissions on the fleet to call its new endpoints. This issue tracks adding more robust and sensible authorization checks instead. There are a lot of pieces to this, and here a few notes.
First, Nexus may want to authorize access to a timeseries as a whole -- for example, reading physical temperature sensors may require elevated permissions. Nexus has access to the timeseries available, but would need additional metadata about the required level of permissions for each of them. That could be in static data, or part of a layer of indirection between the public timeseries and those stored in ClickHouse itself. There are other reasons to want such a layer, such as enhancing stability, and also legitimate arguments for punting (urgency around diagnosing customer issues).
Another piece of this is restricting the data within a timeseries that's visible to a customer. For example, users should not be able to see vCPU usage data for instances that they cannot otherwise access. Nexus could inspect the filters supplied in the queries, such as on things like
project_id
, and ensure the caller can read them. It may also want to inject its own filters into the query, to ensure that we never even read data out of ClickHouse that the user isn't authorized for.These are just a few ideas, and there are likely many more. We'll want to flesh this issue out, and / or start an RFD, as we delve into it.
The text was updated successfully, but these errors were encountered: