Skip to content

Conversation

@just-be-dev
Copy link
Contributor

@just-be-dev just-be-dev commented Nov 17, 2022

This PR establishes the new pattern for how the v1 of oxide's external API will be shaped. The details here are driven by the determinations of RFD-322.

To recap RFD-322, we want our API to be more ID focused but to also support names. Through the resolution of that RFD we've decided to go with a flatter API that uses query parameters to support names if needed.

The notion of being ID first is something we'd like to flow through the rest of Nexus too. Where possible, IDs should always be favored over names.

Overview

  • Instead of breaking the old API the new, flattened endpoints are being added under the /v1/ path which we agreed to introduce in RFD-299.
  • Establishes the practice that names don't flow beyond the http_entrypoints file. New <resource>_lookup methods are introduced that resolve to the authz identifier of a given resource and that's passed around instead.
  • Introduces a notion of selectors (i.e. ProjectSelector and InstanceSelect) which is a data structure that stores all possible representations of parameters required to lookup a resource.

Example

Old way of fetching instance

GET /organizations/<org_name>/projects/<project_name>/instances/<instance_name>
GET /by-id/instances/<instance_id>

New ways of fetching instance

GET /v1/instances/<instance_id>
GET /v1/instances/<instance_name>?project=<project_id>
GET /v1/instances/<instance_name>?project=<project_name>&organization=<org_id>
GET /v1/instances/<instance_name>?project=<project_name>&organization=<org_name>

Old way of creating an instance

POST /organizations/<org_name>/projects/<project_name>/instances/<instance_name> { ...body }

New way of creating an instance

POST /v1/instances?project=<project_id> { ...body }
POST /v1/instances?project=<project_name>&organization=<org_id> { ...body }
POST /v1/instances?project=<project_name>&organization=<org_name> { ...body }

Note that query parameters are always used to scope down an API call regardless of its method.

@just-be-dev just-be-dev changed the title Test impl of new endpoint format PoC implementing RFD-322 Nov 17, 2022
Comment on lines 2098 to 2109
let project_id = nexus
.project_lookup_id(
&opctx,
&organization_name,
&project_name,
&new_instance_params,
params::ProjectSelector::ProjectAndOrg {
project_name: project_name.clone().into(),
organization_name: organization_name.clone().into(),
},
)
.await?;
let instance = nexus
.project_create_instance(&opctx, &project_id, &new_instance_params)
.await?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the implicit goals of RFD-322 is to reduce our overall reliance on names in Nexus. Effectively we only want names to be utilities to simplify lookups for humans but to not flow deeper down than API level. With this new selector lookup approach, we resolve names to relevant ids as the first step. Internal nexus apis should therefore only take IDs.

Comment on lines 2150 to 2154
.instance_lookup_id(
&opctx,
query.selector.to_instance_selector(path.instance.into()),
)
.await?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though these top level instance endpoints actually take in the project selector via query params we transform it into an instance selector for more uniform matching. This does mean that the instance_lookup_id method isn't exactly aware of where it's being called from and is thus can only really return a generic error message. We should probably have a custom error type for this method such that we can do a better job from the callsite here of giving the user accurate error feedback (E.g. you're missing a query param).

Comment on lines 2129 to 2130
#[serde(flatten)]
selector: params::ProjectSelector,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the flatten here causes there to be duplicates in the output because the selector is an untagged enum. Need to figure out how to fix that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is a serde issue: serde-rs/serde#2200

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I believe I was incorrect. I think this is likely a schemars issue. I was able to work around it in d0a19c6 by using #[schemars(skip)] for the duplicate names, but we should dig into where this is failing and make the relevant PR.

@just-be-dev

This comment was marked as resolved.

Copy link
Contributor

@ahl ahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still need to look at the meat of this PR, but wanted to post this so I didn't forget

project_name: Name,
organization_name: Name,
},
None {},
Copy link
Contributor

@david-crespo david-crespo Nov 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this None need to be built into the enum because of how deserialization works? Otherwise it seems like a natural spot for Option or Result. Alternatively, I feel like in the None case the endpoint should automatically 400 at the parsing step, which would mean you don't need the None in the enum itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So an untagged enum tries to match all of the options and errors out if it can't find a match. I think wrapping it in an option might be viable. I didn't take that tack because we always need to match against the enum anyway and having a None option (that essentially means there were not matches in the untagged enum) makes the handling easier.

If a route has a path parameter and that path parameter is not an ID then the selector having a matching arm is required. Given that two types are involved here the optional isn't really valid. In this case we can actually document None as a part of the API where it's understood at what point you shouldn't include a selector.

@just-be-dev just-be-dev marked this pull request as ready for review December 6, 2022 21:05
Comment on lines -1283 to -1291
VerifyEndpoint {
url: "/by-id/instances/{id}",
visibility: Visibility::Protected,
unprivileged_access: UnprivilegedAccess::None,
allowed_methods: vec![
AllowedMethod::Get,
],
},

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the endpoints are now collapsed this mechanism of having two VerifyEndpoint definitions won't work. Essentially any endpoint that is verified here will be consumed from the spec so if you try to verify the same endpoint twice you'll end up with a situation where it says the spec doesn't contain the other instance of the endpoint (because the previous verify consumed it).

After the API is converted this suite may need to be revisited.

Comment on lines -141 to +154
assert_eq!(expected_uncovered_endpoints, uncovered_endpoints);
let mut unexpected_uncovered_endpoints = "These endpoints were expected to be covered by the unauthorized_coverage test but were not:\n".to_string();
let mut has_uncovered_endpoints = false;
for endpoint in uncovered_endpoints.lines() {
if !expected_uncovered_endpoints.contains(endpoint) {
unexpected_uncovered_endpoints
.push_str(&format!("\t{}\n", endpoint));
has_uncovered_endpoints = true;
}
}
assert_eq!(
has_uncovered_endpoints, false,
"{}\nMake sure you've added a test for this endpoint in unauthorized.rs.",
unexpected_uncovered_endpoints
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been on the other end of debugging this test one too many times. If you click through to the comment it's helpful sort of in explaining what's happening, but the test output most of the time is entirely illegible. This just updates the test output to tell you a list of things you need to write tests for instead of an inscrutable mismatch of file contents.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been bitten by this before too, THANK YOU for updating to a more clear version

Comment on lines +10 to +11

Deprecated API endpoints to be removed at the end of the V1 migration
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I updated the test not to be an exact file match we can add extra lines of text in here.

Comment on lines 249 to +252
pub static ref DEMO_INSTANCE_NICS_URL: String =
format!("{}/network-interfaces", *DEMO_INSTANCE_URL);
format!("/organizations/{}/projects/{}/instances/{}/network-interfaces", *DEMO_ORG_NAME, *DEMO_PROJECT_NAME, *DEMO_INSTANCE_NAME);
pub static ref DEMO_INSTANCE_EXTERNAL_IPS_URL: String =
format!("{}/external-ips", *DEMO_INSTANCE_URL);
pub static ref DEMO_INSTANCE_SERIAL_URL: String =
format!("{}/serial-console", *DEMO_INSTANCE_URL);
pub static ref DEMO_INSTANCE_SERIAL_STREAM_URL: String =
format!("{}/serial-console/stream", *DEMO_INSTANCE_URL);
format!("/organizations/{}/projects/{}/instances/{}/external-ips", *DEMO_ORG_NAME, *DEMO_PROJECT_NAME, *DEMO_INSTANCE_NAME);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new URL structure doesn't lend itself well to the same composition as before. You have parts of the hierarchy that come after whatever we're appending to the structure. Instead of taking that tack I'm planning to just flatten this whole tree of variables. I think ultimately it'll make it clearer as to what's actually being defined.

@just-be-dev just-be-dev removed the request for review from davepacheco December 7, 2022 18:48
@just-be-dev
Copy link
Contributor Author

Unless there's strong feedback my plan is to merge this PR by EOD. I will, of course, address any issues that come up after merge. There's some urgency around getting this API work done so that clients and be updated (and new client features worked on) before our next customer demo.

@just-be-dev just-be-dev changed the title PoC implementing RFD-322 v1 instance APIs as per RFD-322 Dec 7, 2022
Copy link
Collaborator

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this new structure really readable, and the separation of "lookup" vs "authenticated usage" is great.

There are a couple areas where clearly we need to follow-up, but I understand why we're not doing them in this PR:

  • Marking the non-v1 endpoints as deprecated (presumably waiting on oxidecomputer/dropshot#503)
  • Deleting the non-v1 endpoints
  • Getting v1 endpoints up for all APIs, not just instances

Can we get up issues (or at least a larger, tracking issue) to make sure we're covering all these things? I know it's probably obvious in your head right now, but I wanna make sure we're recording this just in case the last 10% takes us some time to complete.

Again, great job with this PR, I'm excited to not need to puzzle over the codebase to figure out whether I should or should not have an "authz" object or a name.

impl InstanceSelector {
pub fn new(
instance: NameOrId,
project_selector: &Option<ProjectSelector>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If new immediately clones the elements of ProjectSelector, it is probably worth passing ownership and letting the caller call clone if they need to.

Copy link
Contributor

@david-crespo david-crespo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heroic

@just-be-dev
Copy link
Contributor Author

just-be-dev commented Dec 8, 2022

Can we get up issues (or at least a larger, tracking issue) to make sure we're covering all these things? I know it's probably obvious in your head right now, but I wanna make sure we're recording this just in case the last 10% takes us some time to complete.

Absolutely, I'll create that now.

Edit: #2028

@just-be-dev
Copy link
Contributor Author

I'm going to merge this PR and move forward for now. I would still like post merge feedback from @davepacheco. Even though I'm merging, we can still revisit decisions made here if necessary.

@just-be-dev just-be-dev merged commit cb6a75d into main Dec 8, 2022
@just-be-dev just-be-dev deleted the rfd-322-poc branch December 8, 2022 15:02
Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! Sorry this took a little bit but it sounded like the feedback was still wanted.

I found a bunch of things (noted above) that I think would be resolved if the foo_lookup() functions returned lookup::Foo instead of authz::Foo:

  • The lookup functions themselves could directly use the parent lookup function (i.e., instance_lookup() could call project_lookup()) rather than duplicating the logic. (I'm worried this logic duplication (and boilerplate) grows with the depth of the tree. It'll also be more than it looks like here once we handle the extra error cases. )
  • The caller of the lookup function (or, more likely, subsequent code) can decide which authz check to do. This preserves the previous coupling between lookup and authz check that I hope makes it harder to get the authz check wrong (or forget one). It would also avoid doing multiple redundant authz checks in every path. (There are still some code paths with extra checks (just like before), but with the current pattern, I think every path would have redundant checks.)
  • The caller of the lookup function (or subsequent code) can decide whether they want the database part (fetch()) or not (lookup()). Otherwise, are many code paths that do multiple lookups just to re-fetch the database part that was just fetched and thrown away.

I'm curious what you think.

Thanks again for taking this on. Long live ids!

const MAX_KEYS_PER_INSTANCE: u32 = 8;

impl super::Nexus {
pub async fn instance_lookup(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the thought that we'll eventually create a foo_lookup() for every type of object? And that it would includes a match block with all possible combinations of the query params? It seems like there's a lot that can go wrong there. But it feels like this is the kind of problem we solved (or tried to solve?) with LookupPath and I wonder if we can clean this up by passing around lookup::Instance or the like (discussed a little here, which I see you considered). What if we had the http_entrypoints functions use LookupPath directly and pass the lookup::Foo into app-level functions? I think that would make them more composeable, so that instance_lookup() could use project_lookup() (which would use organization_lookup()) rather than duplicating that logic. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I like that idea a lot. I'll explore it a little more. There've been some iterations on this that have slowly started to feel better. It went from names, to ids, to authz objects. Maybe the lookups are the last step to really make it compositional. Thanks for the suggestion.

) -> LookupResult<authz::Instance> {
match instance_selector {
params::InstanceSelector { instance: NameOrId::Id(id), .. } => {
// TODO: 400 if project or organization are present
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I think these cases are valuable to fix early because they can help catch otherwise nasty client bugs, and also fixing them later seems likely a breaking change.

.project_name(project_name)
.lookup_for(authz::Action::CreateChild)
.await?;
opctx.authorize(authz::Action::CreateChild, authz_project).await?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gather we've basically decoupled the lookup (which is now done by the caller) from the authz check. It makes sense, just a little sad because coupling those was intentional to make it harder to mess up the authz part. (And I see now why you were talking about wanting to express the authz checks that were done in the type.)

Actually, I think if you have the foo_lookup() functions return lookup::Foo, that will address this problem too because the top-level caller doesn't have to decide which authz check to do.

As this is now, I think we'll also wind up doing 2x the authz checks now and we may wind up wanting to explore caching those. As I think about it, what we probably want to cache are the roles and not the result of the authz check itself. That's because when you call foo_lookup(), you're only checking for read (because the caller doesn't know what authz checks will be needed by other functions it's going to call later), but later functions may wind up needing other permissions, so the authz checks won't look exactly the same.

&self,
opctx: &OpContext,
instance_id: &Uuid,
authz_instance: &authz::Instance,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this pattern of function makes a lot less sense after this change. (That's not a bad thing.) We presumably just did a lookup of the Instance, dropped the database part, and now we're doing a whole other lookup just to fetch the database part again. It seems like wherever we're using foo_fetch() now we should just have it use LookupPath()...fetch() instead of what it's currently doing, which must be LookupPath()...lookup(). This might be tricky if we're using the new Nexus::foo_lookup() pattern, but I think that's another reason to have those functions return lookup::Foo instead of authz::Foo (so the caller can decide if they want to use lookup() or fetch()).

.organization_name(organization_name)
.project_name(project_name)
.instance_name(instance_name)
.instance_id(authz_instance.id())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here as above -- we already did a lookup in the caller. It'd be nice to just use that here instead of doing another lookup. (You could even wind up with a different result here than the caller got.)

#[derive(Clone, Debug, Deserialize, Serialize)]
pub struct Params {
pub serialized_authn: authn::saga::Serialized,
pub organization_name: Name,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 (I love dropping names from saga stuff)

);
}

#[nexus_test]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this isn't as exhaustive a test as what we had for the old APIs. Is there a plan for converting them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely.

std::fs::read_to_string("tests/output/uncovered-authz-endpoints.txt")
.expect("failed to load file of allowed uncovered endpoints");
assert_eq!(expected_uncovered_endpoints, uncovered_endpoints);
let mut unexpected_uncovered_endpoints = "These endpoints were expected to be covered by the unauthorized_coverage test but were not:\n".to_string();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we're no longer checking whether there are things in the allowlist file that are covered. That seems bad? It feels like it undermines the point of an allowlist. (why wouldn't we just put every endpoint into the allowlist?)

Copy link
Contributor Author

@just-be-dev just-be-dev Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bending my brain a little. So, this would happen when you've set something in the allowlist that you expect to not be covered by tests but the tests cover it anyway. So the uncovered_endpoints would be smaller than the expected_uncovered_endpoints meaning you might need to remove something from the allowlist.

That's fair. What I was trying to solve here is there failure experience isn't the best. I've hit up against this test several times and it always takes me a while to figure out what went wrong and what I need to do. Ultimately my goal here is just to make it more explicit as to what needs to be done. I'll see if I can work backwards and test both ways without making it messier.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. Yeah, I'm all for making this easier. We could also use expectorate here like we do in other places. I think the suggestion not to was based on not making it seem like the right thing for people to add things to this list.

use std::{net::IpAddr, str::FromStr};
use uuid::Uuid;

#[derive(Deserialize, JsonSchema)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At one point I think we had talked about describing more precisely the set of things that was allowed, so that it would be impossible to express for example an InstanceSelector that had an instance name and no project part. Did we decide not to do that because it can't be expressed using query parameters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a previous iteration of this I defined the shape of the selector as an enum of only valid combinations. There were a few challenges with this (like a bug in schemars that caused duplicate schema output). There was added implementation complexity there that ultimately still yielded the same schema output. I had floated some ideas in the RFD about how to make the schema more expressive in terms of typing, but generating that output is a highly non-trivial endeavor that would eat up more time than I have to give to this initiative.

logout (post "/logout")

Deprecated API endpoints to be removed at the end of the V1 migration
instance_delete (delete "/organizations/{organization_name}/projects/{project_name}/instances/{instance_name}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't we covering these any more?

just-be-dev added a commit that referenced this pull request Dec 12, 2022
… `authz::*` (#2042)

This PR is mostly in response to @davepacheco's feedback on #1957.
Special thanks to @smklein for helping work through some rust lifetime
issues here.

Previously the `instance_lookup` and `project_lookup` functions returned
a `authz::Instance` or `authz::Project` respectively that was authorized
with read permissions. The challenge here is that this meant that in
several places we required duplicate lookups to be able to resolve the
instance with the right permissions. To avoid that we're passing around
the lookup step itself instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api Related to the API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants