Need better authorization for OxQL queries #5298

bnaecker · 2024-03-21T03:19:18Z

A prototype query language called OxQL is implemented in #5273. That PR implements the most rudimentary of authorization checks, requiring read permissions on the fleet to call its new endpoints. This issue tracks adding more robust and sensible authorization checks instead. There are a lot of pieces to this, and here a few notes.

First, Nexus may want to authorize access to a timeseries as a whole -- for example, reading physical temperature sensors may require elevated permissions. Nexus has access to the timeseries available, but would need additional metadata about the required level of permissions for each of them. That could be in static data, or part of a layer of indirection between the public timeseries and those stored in ClickHouse itself. There are other reasons to want such a layer, such as enhancing stability, and also legitimate arguments for punting (urgency around diagnosing customer issues).

Another piece of this is restricting the data within a timeseries that's visible to a customer. For example, users should not be able to see vCPU usage data for instances that they cannot otherwise access. Nexus could inspect the filters supplied in the queries, such as on things like project_id, and ensure the caller can read them. It may also want to inject its own filters into the query, to ensure that we never even read data out of ClickHouse that the user isn't authorized for.

These are just a few ideas, and there are likely many more. We'll want to flesh this issue out, and / or start an RFD, as we delve into it.

The text was updated successfully, but these errors were encountered:

david-crespo · 2024-05-13T19:57:52Z

We talked about this a bit more in chat. We went over the query inspection and filter injection ideas, and @bnaecker suggested a middle ground between fully dynamic queries and on-off endpoints that seems quite practical: a silo-scoped endpoint that provides a predefined set of queries that developer users are allowed to make, and which implements custom authorization logic for each. For example, the request body type could be an enum (union) of types that look like this.

{ 
  "metric_name": "instance_cpu",
  "params": { 
    "project": "my-proj",
    "instance": "my-inst"
  } 
}

When a request comes in, we know it is a valid metric because it parses as a request body at all, and then would know for that metric that we need to authorize access to the instance by a) checking that the instance is in that project, and b) authorizing the user's access to that project.

It would use OxQL under the hood and therefore be easy to implement, and it would be easy to add new queries without cluttering up the API with a ton of endpoints, but it would save us from having to figure out how to solve authz in a general way, and it would let us learn a lot about the problem before we try to do that.

bnaecker · 2024-08-12T17:47:44Z

I talked last week with @davepacheco about the general strategy for timeseries authz. I wanted to capture some notes here for the record, though eventually this should make its way to an RFD. We both agree that it's hard, at least to provide a view of the timeseries data that's completely consistent with the other API objects. Specifically, the goal allow a user to read timeseries data that pertains to objects that they can also read via the API is challenging, because we don't currently maintain a list of all the objects a user can read (nor is it clear we could). Instead, we maintain the roles for each user, from which we derive the actions a user can take on any given resource, assuming we have the resource by ID already.

There's also the interesting question of historical data here. Timeseries data are very much going to stick around, even after the object from which it derived is deleted. For example, instance vCPU data is still accessible even after the instance is gone. Dave thought this might be a bug in the API -- we could instead show the instance, including the fact that it's been deleted, whereas today I think we return a 404 for it. But in any case, we'd like to make these two views consistent in this dimension as well: whatever objects you can learn about through a timeseries query, it would also be nice to be able to learn about those through the API.

This is a hard problem, one which we both would prefer to punt on. In the interim, there is a potentially-useful API change that could take us pretty far. Right now, there's a concept of the "authorization scope" in a timeseries definition. E.g., "fleet" scope refers to hardware telemetry that should be accessible to an operator; "project" scope refers to things that any project reader should be able to see.

We could reflect this scope in the API, by providing a separate timeseries query endpoint for each of those different scopes. So, if you wanted to query the sled_data_link:bytes_sent timeseries, which has fleet scope, then one would hit the /v1/timeseries/query/fleet endpoint with the contained query. For the project-scoped data, one might do /v1/timeseries/query/project/{project_id} (or the ID might be in the request body or the query itself). In that case, we have a single project ID in the API request, so we can do authz as usual. If a user wants to do this for many project, then that would require many queries.

This would let us avoid the problem of analyzing or modifying the query itself to solve the entirely general problem, but still let us expose the querying endpoints in a useful way. We could gather more data on the queries that we actually see, and use that to further direct efforts around finer-grained authz.

david-crespo · 2024-08-13T16:17:45Z

Great idea, I'm all for it.

bnaecker added api Related to the API. Metrics labels Mar 21, 2024

bnaecker changed the title ~~Need a way to authorize OxQL queries~~ Need better authorization for OxQL queries Mar 21, 2024

bnaecker mentioned this issue Mar 21, 2024

Add basic Oximeter query language #5273

Merged

david-crespo mentioned this issue May 14, 2024

Instance CPU metrics oxidecomputer/console#2129

Open

david-crespo mentioned this issue Jul 25, 2024

Make instance CPU metrics accessible to non-fleet users #6155

Open

wfchandler mentioned this issue Sep 18, 2024

oneOf subtypes not linked correctly to parent type oxidecomputer/oxide.go#233

Closed

2 tasks

bnaecker self-assigned this Sep 19, 2024

david-crespo mentioned this issue Oct 15, 2024

Project-scoped OxQL endpoint #6873

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Need better authorization for OxQL queries #5298

Need better authorization for OxQL queries #5298

bnaecker commented Mar 21, 2024

david-crespo commented May 13, 2024 •

edited

Loading

bnaecker commented Aug 12, 2024

david-crespo commented Aug 13, 2024

Need better authorization for OxQL queries #5298

Need better authorization for OxQL queries #5298

Comments

bnaecker commented Mar 21, 2024

david-crespo commented May 13, 2024 • edited Loading

bnaecker commented Aug 12, 2024

david-crespo commented Aug 13, 2024

david-crespo commented May 13, 2024 •

edited

Loading