-
Notifications
You must be signed in to change notification settings - Fork 62
v1 instance APIs as per RFD-322 #1957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| let project_id = nexus | ||
| .project_lookup_id( | ||
| &opctx, | ||
| &organization_name, | ||
| &project_name, | ||
| &new_instance_params, | ||
| params::ProjectSelector::ProjectAndOrg { | ||
| project_name: project_name.clone().into(), | ||
| organization_name: organization_name.clone().into(), | ||
| }, | ||
| ) | ||
| .await?; | ||
| let instance = nexus | ||
| .project_create_instance(&opctx, &project_id, &new_instance_params) | ||
| .await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the implicit goals of RFD-322 is to reduce our overall reliance on names in Nexus. Effectively we only want names to be utilities to simplify lookups for humans but to not flow deeper down than API level. With this new selector lookup approach, we resolve names to relevant ids as the first step. Internal nexus apis should therefore only take IDs.
| .instance_lookup_id( | ||
| &opctx, | ||
| query.selector.to_instance_selector(path.instance.into()), | ||
| ) | ||
| .await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though these top level instance endpoints actually take in the project selector via query params we transform it into an instance selector for more uniform matching. This does mean that the instance_lookup_id method isn't exactly aware of where it's being called from and is thus can only really return a generic error message. We should probably have a custom error type for this method such that we can do a better job from the callsite here of giving the user accurate error feedback (E.g. you're missing a query param).
| #[serde(flatten)] | ||
| selector: params::ProjectSelector, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the flatten here causes there to be duplicates in the output because the selector is an untagged enum. Need to figure out how to fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is a serde issue: serde-rs/serde#2200
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I believe I was incorrect. I think this is likely a schemars issue. I was able to work around it in d0a19c6 by using #[schemars(skip)] for the duplicate names, but we should dig into where this is failing and make the relevant PR.
This comment was marked as resolved.
This comment was marked as resolved.
ahl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still need to look at the meat of this PR, but wanted to post this so I didn't forget
| project_name: Name, | ||
| organization_name: Name, | ||
| }, | ||
| None {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this None need to be built into the enum because of how deserialization works? Otherwise it seems like a natural spot for Option or Result. Alternatively, I feel like in the None case the endpoint should automatically 400 at the parsing step, which would mean you don't need the None in the enum itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So an untagged enum tries to match all of the options and errors out if it can't find a match. I think wrapping it in an option might be viable. I didn't take that tack because we always need to match against the enum anyway and having a None option (that essentially means there were not matches in the untagged enum) makes the handling easier.
If a route has a path parameter and that path parameter is not an ID then the selector having a matching arm is required. Given that two types are involved here the optional isn't really valid. In this case we can actually document None as a part of the API where it's understood at what point you shouldn't include a selector.
| VerifyEndpoint { | ||
| url: "/by-id/instances/{id}", | ||
| visibility: Visibility::Protected, | ||
| unprivileged_access: UnprivilegedAccess::None, | ||
| allowed_methods: vec![ | ||
| AllowedMethod::Get, | ||
| ], | ||
| }, | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the endpoints are now collapsed this mechanism of having two VerifyEndpoint definitions won't work. Essentially any endpoint that is verified here will be consumed from the spec so if you try to verify the same endpoint twice you'll end up with a situation where it says the spec doesn't contain the other instance of the endpoint (because the previous verify consumed it).
After the API is converted this suite may need to be revisited.
| assert_eq!(expected_uncovered_endpoints, uncovered_endpoints); | ||
| let mut unexpected_uncovered_endpoints = "These endpoints were expected to be covered by the unauthorized_coverage test but were not:\n".to_string(); | ||
| let mut has_uncovered_endpoints = false; | ||
| for endpoint in uncovered_endpoints.lines() { | ||
| if !expected_uncovered_endpoints.contains(endpoint) { | ||
| unexpected_uncovered_endpoints | ||
| .push_str(&format!("\t{}\n", endpoint)); | ||
| has_uncovered_endpoints = true; | ||
| } | ||
| } | ||
| assert_eq!( | ||
| has_uncovered_endpoints, false, | ||
| "{}\nMake sure you've added a test for this endpoint in unauthorized.rs.", | ||
| unexpected_uncovered_endpoints | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been on the other end of debugging this test one too many times. If you click through to the comment it's helpful sort of in explaining what's happening, but the test output most of the time is entirely illegible. This just updates the test output to tell you a list of things you need to write tests for instead of an inscrutable mismatch of file contents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been bitten by this before too, THANK YOU for updating to a more clear version
|
|
||
| Deprecated API endpoints to be removed at the end of the V1 migration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I updated the test not to be an exact file match we can add extra lines of text in here.
| pub static ref DEMO_INSTANCE_NICS_URL: String = | ||
| format!("{}/network-interfaces", *DEMO_INSTANCE_URL); | ||
| format!("/organizations/{}/projects/{}/instances/{}/network-interfaces", *DEMO_ORG_NAME, *DEMO_PROJECT_NAME, *DEMO_INSTANCE_NAME); | ||
| pub static ref DEMO_INSTANCE_EXTERNAL_IPS_URL: String = | ||
| format!("{}/external-ips", *DEMO_INSTANCE_URL); | ||
| pub static ref DEMO_INSTANCE_SERIAL_URL: String = | ||
| format!("{}/serial-console", *DEMO_INSTANCE_URL); | ||
| pub static ref DEMO_INSTANCE_SERIAL_STREAM_URL: String = | ||
| format!("{}/serial-console/stream", *DEMO_INSTANCE_URL); | ||
| format!("/organizations/{}/projects/{}/instances/{}/external-ips", *DEMO_ORG_NAME, *DEMO_PROJECT_NAME, *DEMO_INSTANCE_NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new URL structure doesn't lend itself well to the same composition as before. You have parts of the hierarchy that come after whatever we're appending to the structure. Instead of taking that tack I'm planning to just flatten this whole tree of variables. I think ultimately it'll make it clearer as to what's actually being defined.
|
Unless there's strong feedback my plan is to merge this PR by EOD. I will, of course, address any issues that come up after merge. There's some urgency around getting this API work done so that clients and be updated (and new client features worked on) before our next customer demo. |
smklein
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this new structure really readable, and the separation of "lookup" vs "authenticated usage" is great.
There are a couple areas where clearly we need to follow-up, but I understand why we're not doing them in this PR:
- Marking the non-v1 endpoints as deprecated (presumably waiting on oxidecomputer/dropshot#503)
- Deleting the non-v1 endpoints
- Getting v1 endpoints up for all APIs, not just instances
Can we get up issues (or at least a larger, tracking issue) to make sure we're covering all these things? I know it's probably obvious in your head right now, but I wanna make sure we're recording this just in case the last 10% takes us some time to complete.
Again, great job with this PR, I'm excited to not need to puzzle over the codebase to figure out whether I should or should not have an "authz" object or a name.
| impl InstanceSelector { | ||
| pub fn new( | ||
| instance: NameOrId, | ||
| project_selector: &Option<ProjectSelector>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If new immediately clones the elements of ProjectSelector, it is probably worth passing ownership and letting the caller call clone if they need to.
david-crespo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heroic
Absolutely, I'll create that now. Edit: #2028 |
|
I'm going to merge this PR and move forward for now. I would still like post merge feedback from @davepacheco. Even though I'm merging, we can still revisit decisions made here if necessary. |
davepacheco
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this on! Sorry this took a little bit but it sounded like the feedback was still wanted.
I found a bunch of things (noted above) that I think would be resolved if the foo_lookup() functions returned lookup::Foo instead of authz::Foo:
- The lookup functions themselves could directly use the parent lookup function (i.e.,
instance_lookup()could callproject_lookup()) rather than duplicating the logic. (I'm worried this logic duplication (and boilerplate) grows with the depth of the tree. It'll also be more than it looks like here once we handle the extra error cases. ) - The caller of the lookup function (or, more likely, subsequent code) can decide which authz check to do. This preserves the previous coupling between lookup and authz check that I hope makes it harder to get the authz check wrong (or forget one). It would also avoid doing multiple redundant authz checks in every path. (There are still some code paths with extra checks (just like before), but with the current pattern, I think every path would have redundant checks.)
- The caller of the lookup function (or subsequent code) can decide whether they want the database part (
fetch()) or not (lookup()). Otherwise, are many code paths that do multiple lookups just to re-fetch the database part that was just fetched and thrown away.
I'm curious what you think.
Thanks again for taking this on. Long live ids!
| const MAX_KEYS_PER_INSTANCE: u32 = 8; | ||
|
|
||
| impl super::Nexus { | ||
| pub async fn instance_lookup( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the thought that we'll eventually create a foo_lookup() for every type of object? And that it would includes a match block with all possible combinations of the query params? It seems like there's a lot that can go wrong there. But it feels like this is the kind of problem we solved (or tried to solve?) with LookupPath and I wonder if we can clean this up by passing around lookup::Instance or the like (discussed a little here, which I see you considered). What if we had the http_entrypoints functions use LookupPath directly and pass the lookup::Foo into app-level functions? I think that would make them more composeable, so that instance_lookup() could use project_lookup() (which would use organization_lookup()) rather than duplicating that logic. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I like that idea a lot. I'll explore it a little more. There've been some iterations on this that have slowly started to feel better. It went from names, to ids, to authz objects. Maybe the lookups are the last step to really make it compositional. Thanks for the suggestion.
| ) -> LookupResult<authz::Instance> { | ||
| match instance_selector { | ||
| params::InstanceSelector { instance: NameOrId::Id(id), .. } => { | ||
| // TODO: 400 if project or organization are present |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, I think these cases are valuable to fix early because they can help catch otherwise nasty client bugs, and also fixing them later seems likely a breaking change.
| .project_name(project_name) | ||
| .lookup_for(authz::Action::CreateChild) | ||
| .await?; | ||
| opctx.authorize(authz::Action::CreateChild, authz_project).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gather we've basically decoupled the lookup (which is now done by the caller) from the authz check. It makes sense, just a little sad because coupling those was intentional to make it harder to mess up the authz part. (And I see now why you were talking about wanting to express the authz checks that were done in the type.)
Actually, I think if you have the foo_lookup() functions return lookup::Foo, that will address this problem too because the top-level caller doesn't have to decide which authz check to do.
As this is now, I think we'll also wind up doing 2x the authz checks now and we may wind up wanting to explore caching those. As I think about it, what we probably want to cache are the roles and not the result of the authz check itself. That's because when you call foo_lookup(), you're only checking for read (because the caller doesn't know what authz checks will be needed by other functions it's going to call later), but later functions may wind up needing other permissions, so the authz checks won't look exactly the same.
| &self, | ||
| opctx: &OpContext, | ||
| instance_id: &Uuid, | ||
| authz_instance: &authz::Instance, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this pattern of function makes a lot less sense after this change. (That's not a bad thing.) We presumably just did a lookup of the Instance, dropped the database part, and now we're doing a whole other lookup just to fetch the database part again. It seems like wherever we're using foo_fetch() now we should just have it use LookupPath()...fetch() instead of what it's currently doing, which must be LookupPath()...lookup(). This might be tricky if we're using the new Nexus::foo_lookup() pattern, but I think that's another reason to have those functions return lookup::Foo instead of authz::Foo (so the caller can decide if they want to use lookup() or fetch()).
| .organization_name(organization_name) | ||
| .project_name(project_name) | ||
| .instance_name(instance_name) | ||
| .instance_id(authz_instance.id()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here as above -- we already did a lookup in the caller. It'd be nice to just use that here instead of doing another lookup. (You could even wind up with a different result here than the caller got.)
| #[derive(Clone, Debug, Deserialize, Serialize)] | ||
| pub struct Params { | ||
| pub serialized_authn: authn::saga::Serialized, | ||
| pub organization_name: Name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 (I love dropping names from saga stuff)
| ); | ||
| } | ||
|
|
||
| #[nexus_test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this isn't as exhaustive a test as what we had for the old APIs. Is there a plan for converting them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, absolutely.
| std::fs::read_to_string("tests/output/uncovered-authz-endpoints.txt") | ||
| .expect("failed to load file of allowed uncovered endpoints"); | ||
| assert_eq!(expected_uncovered_endpoints, uncovered_endpoints); | ||
| let mut unexpected_uncovered_endpoints = "These endpoints were expected to be covered by the unauthorized_coverage test but were not:\n".to_string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we're no longer checking whether there are things in the allowlist file that are covered. That seems bad? It feels like it undermines the point of an allowlist. (why wouldn't we just put every endpoint into the allowlist?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bending my brain a little. So, this would happen when you've set something in the allowlist that you expect to not be covered by tests but the tests cover it anyway. So the uncovered_endpoints would be smaller than the expected_uncovered_endpoints meaning you might need to remove something from the allowlist.
That's fair. What I was trying to solve here is there failure experience isn't the best. I've hit up against this test several times and it always takes me a while to figure out what went wrong and what I need to do. Ultimately my goal here is just to make it more explicit as to what needs to be done. I'll see if I can work backwards and test both ways without making it messier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good. Yeah, I'm all for making this easier. We could also use expectorate here like we do in other places. I think the suggestion not to was based on not making it seem like the right thing for people to add things to this list.
| use std::{net::IpAddr, str::FromStr}; | ||
| use uuid::Uuid; | ||
|
|
||
| #[derive(Deserialize, JsonSchema)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At one point I think we had talked about describing more precisely the set of things that was allowed, so that it would be impossible to express for example an InstanceSelector that had an instance name and no project part. Did we decide not to do that because it can't be expressed using query parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a previous iteration of this I defined the shape of the selector as an enum of only valid combinations. There were a few challenges with this (like a bug in schemars that caused duplicate schema output). There was added implementation complexity there that ultimately still yielded the same schema output. I had floated some ideas in the RFD about how to make the schema more expressive in terms of typing, but generating that output is a highly non-trivial endeavor that would eat up more time than I have to give to this initiative.
| logout (post "/logout") | ||
|
|
||
| Deprecated API endpoints to be removed at the end of the V1 migration | ||
| instance_delete (delete "/organizations/{organization_name}/projects/{project_name}/instances/{instance_name}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't we covering these any more?
… `authz::*` (#2042) This PR is mostly in response to @davepacheco's feedback on #1957. Special thanks to @smklein for helping work through some rust lifetime issues here. Previously the `instance_lookup` and `project_lookup` functions returned a `authz::Instance` or `authz::Project` respectively that was authorized with read permissions. The challenge here is that this meant that in several places we required duplicate lookups to be able to resolve the instance with the right permissions. To avoid that we're passing around the lookup step itself instead.
This PR establishes the new pattern for how the v1 of oxide's external API will be shaped. The details here are driven by the determinations of RFD-322.
To recap RFD-322, we want our API to be more ID focused but to also support names. Through the resolution of that RFD we've decided to go with a flatter API that uses query parameters to support names if needed.
The notion of being ID first is something we'd like to flow through the rest of Nexus too. Where possible, IDs should always be favored over names.
Overview
/v1/path which we agreed to introduce in RFD-299.http_entrypointsfile. New<resource>_lookupmethods are introduced that resolve to the authz identifier of a given resource and that's passed around instead.ProjectSelectorandInstanceSelect) which is a data structure that stores all possible representations of parameters required to lookup a resource.Example
Old way of fetching instance
New ways of fetching instance
Old way of creating an instance
New way of creating an instance
Note that query parameters are always used to scope down an API call regardless of its method.