-
Notifications
You must be signed in to change notification settings - Fork 63
VPC external endpoints #178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9542772 to
02058e6
Compare
4f9ea44 to
c85d470
Compare
c85d470 to
8f7ddce
Compare
078b93d to
293a11a
Compare
8602307 to
343fe84
Compare
f243585 to
21384bd
Compare
21384bd to
6d482e5
Compare
0e1c1fc to
a3b5dc7
Compare
smklein
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, this looks really solid - thanks for putting this together so quickly, and appreciate the robust test case!
| #[derive(Clone, Debug)] | ||
| pub struct VPC { | ||
| /** common identifying metadata */ | ||
| pub identity: IdentityMetadata, | ||
| /** id for the project containing this VPC */ | ||
| pub project_id: Uuid, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, what's the point of re-defining the VPC object within internal::nexus?
It's used as part of the interface in omicron-nexus/src/http_entrypoints_external.rs, and not "http_entrypoints_internal.rs" - from what I can tell, that means it genuinely is used as part of the external API, but not the internal API.
I'm pretty sure if we replace the usage of internal::nexus::VPC with external::VPC below, nothing breaks - WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(if we need a separate type for storage in the database, that could be a valid reason to create a new struct somewhere, but I figured "api/internal" would be for "types passed via HTTP interfaces between internal components" - which mostly means "in the requests/responses of http_entrypoints_internal.rs")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I misunderstood internal. I was thinking of it as "internal to the API" as opposed to "used by internal API endpoints". Right now they're the same anyway. If we had properties on the model we didn't want to include in the JSON, where would that live? Is that still a case for the *View concept?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I misunderstood internal. I was thinking of it as "internal to the API" as opposed to "used by internal API endpoints". Right now they're the same anyway.
ah, actually, they'll end up having some distinct properties. "used by internal API endpoints" may have versioning implications if the things communicating have different update schedules. As an example, we upgrade a Sled, but old Nexus is still running somewhere else. There are upgrade considerations with which we'd need to be cautious - the two components would need to have a compatible format.
If we had properties on the model we didn't want to include in the JSON, where would that live?
If it's just an in-memory structure which isn't passed between components, I'd say "make a struct somewhere in Nexus".
As mentioned earlier though, we could reference the external::VPC object everywhere in this PR - what additional properties are you trying to define?
Is that still a case for the
*Viewconcept?
I don't think so - "*view" was just a synonym for "external".
| ) -> ListResult<api::internal::nexus::VPC> { | ||
| let client = self.pool.acquire().await?; | ||
| sql_fetch_page_by::< | ||
| LookupByUniqueNameInProject, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're looking up VPCs by name here, but the name isn't explicitly a unique identifier for VPCs, is it?
omicron/omicron-common/src/sql/dbinit.sql
Lines 252 to 262 in da2071a
| CREATE TABLE omicron.public.VPC ( | |
| /* Identity metadata */ | |
| id UUID PRIMARY KEY, | |
| name STRING(63) NOT NULL, | |
| description STRING(512) NOT NULL, | |
| time_created TIMESTAMPTZ NOT NULL, | |
| time_modified TIMESTAMPTZ NOT NULL, | |
| /* Indicates that the object has been deleted */ | |
| time_deleted TIMESTAMPTZ, | |
| project_id UUID NOT NULL | |
| ); |
The primary key is the UUID - shouldn't we be looking up VPCs by ID, not name?
I get that the implementation of project_create_vpc should be preventing the names from overlapping, but it seems like it would be nice to rely on the structure of the DB for the uniqueness, rather than an implicit constraint.
Caveat: It's totally possible I'm misinterpreting how sql_fetch_page_by is working here...
omicron/omicron-nexus/src/db/schema.rs
Lines 378 to 395 in da2071a
| /** | |
| * Implementation of [`LookupKey`] for looking up objects within a project by | |
| * the project_id and the object's name | |
| */ | |
| pub struct LookupByUniqueNameInProject; | |
| impl<'a, R: ResourceTable> LookupKey<'a, R> for LookupByUniqueNameInProject { | |
| type ScopeKey = (&'a Uuid,); | |
| const SCOPE_KEY_COLUMN_NAMES: &'static [&'static str] = &["project_id"]; | |
| type ItemKey = Name; | |
| const ITEM_KEY_COLUMN_NAME: &'static str = "name"; | |
| fn where_select_error( | |
| _scope_key: Self::ScopeKey, | |
| item_key: &Self::ItemKey, | |
| ) -> Error { | |
| Error::not_found_by_name(R::RESOURCE_TYPE, item_key) | |
| } | |
| } |
But wouldn't we want a LookupByUniqueIDInProject that uses ItemKey = ID?
(FYI @davepacheco - I see LookupByUniqueNameInProject is also used for project_list_instances; I think my same question applies there too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up: Is this relying on the unique index in dbinit.sql?
omicron/omicron-common/src/sql/dbinit.sql
Lines 265 to 268 in da2071a
| CREATE UNIQUE INDEX ON omicron.public.VPC ( | |
| name | |
| ) WHERE | |
| time_deleted IS NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good news, I already wrote LookupByUniqueIDInProject and got rid of it in 67eccbc89, so we can easily put it back. I don't think the DB is constraining uniqueness. To be honest I also am not sure where the constraint is coming from (though the constraint is there, the test attempting to create a second VPC with the same name proves it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I saw the test, but I'm still trying to wrap my head around this - the unique_value parameter passed to sql_insert_unique_idempotent is only used in the error message!
omicron/omicron-nexus/src/db/sql_operations.rs
Lines 624 to 656 in da2071a
| pub async fn sql_insert_unique_idempotent<'a, R>( | |
| client: &'a tokio_postgres::Client, | |
| values: &'a SqlValueSet, | |
| unique_value: &'a str, | |
| ignore_conflicts_on: &'static str, | |
| ) -> Result<(), Error> | |
| where | |
| R: ResourceTable, | |
| { | |
| let mut sql = SqlString::new(); | |
| let param_names = values | |
| .values() | |
| .iter() | |
| .map(|value| sql.next_param(*value)) | |
| .collect::<Vec<String>>(); | |
| let column_names = values.names().iter().cloned().collect::<Vec<&str>>(); | |
| sql.push_str( | |
| format!( | |
| "INSERT INTO {} ({}) VALUES ({}) ON CONFLICT ({}) DO NOTHING", | |
| R::TABLE_NAME, | |
| column_names.join(", "), | |
| param_names.join(", "), | |
| ignore_conflicts_on, | |
| ) | |
| .as_str(), | |
| ); | |
| sql_execute(client, sql.sql_fragment(), sql.sql_params()) | |
| .await | |
| .map_err(|e| sql_error_on_create(R::RESOURCE_TYPE, unique_value, e)) | |
| .map(|_| ()) | |
| } |
So I'm still kinda uncertain what provides the "names must be unique" property, if not the unique DB index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having some conversations in chat, but figured I'd post the TL;DR here too:
If the CREATE UNIQUE INDEX statement on omicron.public.VPC is what's actually enforcing this uniqueness, wouldn't that imply that names are globally unique? (example: two different subnets both want to create a VPC named "Foo" - one of 'em would get a collision).
Seems the approach taken by instances is creating a unique index by the tuple (project_id, name). Seems like we'll need to update dbinit.sql's indices for VPC objects to guarantee similar scoping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed in the docs that the index is the official way to enforce uniqueness for a pair of columns:
Indexes can also be used to enforce uniqueness of a column's value, or the uniqueness of the combined values of more than one column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think y'all figured most of this out already! I'll try to explain a little more why things are the way they are.
In terms of desired behavior, RFD 4 says this about the "name" property of something that has identity metadata:
Names of resources must be unique under the scope of the parent resource. For example, Instance names are required to be unique to a project and project names are required to be unique to an Organization.
RFD 21 explains that VPCs are scoped to projects. So yes, their name should be unique within the project, and not necessarily globally unique.
In terms of implementing this: yes, we use CockroachDB unique indexes to guarantee uniqueness. That's the natural place to do it, since the database has such a facility and the behavior needs to be strongly consistent.
As @smklein points out, the unique index for Instances is on the (project_id, name) tuple, not just name. You want something similar for VPCs here. (Since we're organizing everything around Projects, nearly everything should probably look like this. Projects themselves should be scoped to an Organization, and Organization names I believe will wind up being globally unique, but we haven't implemented Organizations yet so Project names are currently globally unique.)
There's a close relationship between the fields in the unique indexes on a table, the Lookup* structs supported by the table, and the ScopeKey/ItemKey in those lookups.
@smklein wrote:
Yeah, I saw the test, but I'm still trying to wrap my head around this - the unique_value parameter passed to sql_insert_unique_idempotent is only used in the error message!
Yeah, so when the block comment on sql_insert_unique_idempotent talks about "conflict", it's referring not to some vague notion of some constraint being violated (that the function itself might be expected to check) but a specific condition that databases like CockroachDB and PostgreSQL call "conflict" and handle with an ON CONFLICT clause on the INSERT statement. In retrospect maybe that's not obvious if you haven't dug into the implementation of the function or dealt with UPSERTs before. The comment currently says:
/// Idempotently insert a database record. This is intended for cases where the
/// record may conflict with an existing record with the same id, which should
/// be ignored, or with an existing record with the same name, which should
/// produce an [`Error::ObjectAlreadyExists`] error.
It might be more precise to say:
///
/// Idempotently insert a database record. More precisely, this is intended
/// for cases where:
///
/// * the caller wants to insert a database row for some particular object
/// * the object is identified by an immutable, unique id
/// * the caller wants the operation to succeed if a row already exists for this
/// object (i.e., a row exists with the same unique id, which is by definition
/// a row for the same object)
/// * the caller wants to fail if inserting the record would violate some other
/// constraint, usually a name conflict (which is necessarily a record
/// for a _different_ object, since if it were the same object, it would have
/// the same id, and the operation would succeed without inserting anything)
///
/// That might sound incredibly specific or even contrived, but it's the common
/// case for sagas that want to create database records for API resources. For
/// example, when provisioning an Instance, the provision saga will generate a
/// unique id and use this function to insert an initial Instance object into
/// the database. Since the id is generated in the saga, the only way to have a
/// conflict on the object's id is if the saga action is being rerun. Since
/// actions are supposed to be idempotent, the correct behavior is to succeed
/// despite the conflict. On the other hand, if the new Instance's name
/// conflicts with another one in the same project, that will raise a different
/// constraint violation that we do want to package up and bubble up to the
/// caller. This condition means somebody tried to create an Instance with the
/// same name as one that's already running, so we package this up as an
/// [`Error::ObjectAlreadyExists`] error.
///
This is a long way of explaining why the unique value is not used in the function aside from the error message. (Why bother putting it in the error message? The ObjectAlreadyExists error requires you to say what the conflicting value was, and that's because I've found specific messages like that to be a huge improvement in user experience.)
@smklein wrote:
But wouldn't we want a LookupByUniqueIDInProject that uses ItemKey = ID?
(FYI @davepacheco - I see LookupByUniqueNameInProject is also used for project_list_instances; I think my same question applies there too)
I think you got this answered but I'm not sure? I wasn't quite sure why you expected to use a lookup by id. Is it just because that's the primary key in the database? Anyway, I expect API operations to do lookups by name rather than id. (RFD 4 is a little confusing in that it uses identifiers like "projectId" in all the URLs. But it also says "the endpoints that reference an individual resource work on either the uuid or the name." If I remember right, we only implement name lookups in Nexus for most resources today.) Given this, for any resource that's scoped to a project, I'd expect us to use LookupByUniqueNameInProject and not LookupByUniqueIdInProject.
The general ideas here are:
- The uuid of an object is immutable and should uniquely identify it across all time, all types of objects, and all Oxide systems. (Like many systems, we assume this by the astronomical probability of UUIDv4 collision.)
- The name of an object is mutable and uniquely identifies the object within its parent scope and while it is still live. We will (do already?) use soft deletes for many objects, and there may be multiple objects with the same name in the same scope as long as no more than one is alive. This is why the unique indexes include a
WHERE time_deleted IS NULLclause. - From the user's perspective, since they can only see live objects, both identifiers are equally good. I expect most people to want to use "name" most of the time. Software that wants to be robust against other clients renaming things that they're working on might prefer to use ids instead.
- Internally, we usually want to resolve the name to an id when we receive a request and do all of our internal work using these ids. Otherwise, we can wind up with nasty concurrency issues if somebody renames a resource while we're working on it. This can even lead to security vulnerabilities, if we were to do an access check of some resource and then somebody renames it and we wind up operating on a different resource than the one we checked.
I'd been assuming all this mostly based on past experience, but assuming it makes sense, we should probably document it.
Does all that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That all makes sense. Most of it we were able to infer from the code, but it is helpful to see it spelled out. Especially:
Internally, we usually want to resolve the name to an id when we receive a request and do all of our internal work using these ids.
I was mostly able to get there by copying the patterns I saw used for instances.
omicron-nexus/tests/test_vpcs.rs
Outdated
| // let apictx = &cptestctx.server.apictx; | ||
| // let nexus = &apictx.nexus; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Dunno if this is WIP, but dead code here
| // let nexus = &apictx.nexus; | ||
|
|
||
| /* Create a project that we'll use for testing. */ | ||
| let project_name = "springfield-squidport"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A+ naming choice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copied from dap in test_instances 😉
| name | ||
| CREATE UNIQUE INDEX ON omicron.public.Vpc ( | ||
| name, | ||
| project_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite right. It needs to be (project_id, name) rather than (name, project_id). This is critical to be able to use this index to list the VPCs in a Project. (Think of it this way: we want the database to be able to quickly find the first the VPC in the Project. If the index is sorted by project_id first, that's easy. Once it finds that record, the next N values in the index will be the N VPCs in that project, sorted by name -- perfect. If the index is sorted by (name, project_id), this doesn't work, and there's no efficient way to find the first VPC (or any other VPC) in a Project.)
Edit: I really wish there were some way to automatically test this. This is probably working in your change because CockroachDB is doing a table scan rather than using this index. I kind of want CockroachDB to refuse to do queries that always necessitate table scans. I'm not really sure how best to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Didn't realize the index works like a nested dictionary.
| name | ||
| /* Subnet and network interface names are unique per VPC, not project */ | ||
| CREATE UNIQUE INDEX ON omicron.public.VpcSubnet ( | ||
| name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, this needs to be (vpc_id, name), not (name, vpc_id).
| -- TODO: add project_id to index | ||
| CREATE UNIQUE INDEX ON omicron.public.NetworkInterface ( | ||
| name | ||
| name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
Add
VPCViewmodel and endpoints to list, create, get, and delete VPCs. No use of sagas. No conditions on deletion yet. The VPC model doesn't have anything on it except ID metadata and a project ID, so there's no logic to test beyond existence and non-existence, uniqueness of name, etc. There's also nothing to update, so I left out that endpoint.