Updates default VPC Subnet behavior #677

bnaecker · 2022-02-10T21:47:12Z

This improves the API and database internals around VPC Subnets,
specifically how we create the default VPC Subnet for a VPC, and how
IP address ranges are specified by clients or generated internally.

It also adds a query that checks for overlapping address ranges, and
fails the insert of a new VPC Subnet if an existing subnet in the same
VPC has overlapping IP address ranges, either IPv4 or IPv6.

bnaecker · 2022-02-10T22:05:13Z

Some more details on this change. There's been some confusion (mostly in my own head!) about how the IP address ranges for VPC Subnets are specified. RFD 21 originally described both V4 and V6 ranges as optional, but did not get into what the behavior should be if neither is specified. We resolved this originally by allowing optional values in the request to create a VPC, but we always unwrapped the V4 block in any case.

The requirements for the MVP have shifted since RFD 21 was written, and we're now requiring that all subnets have both IPv4 and IPv6 address ranges. Clients must specify the V4 range when creating a new VPC Subnet, since there's no reasonable default. However, we allow V6 to remain unspecified in the request, and in that case generate a random unique local address (ULA) range from the VPC's IPv6 prefix. Note that the one exception is the default VPC Subnet, created when the default VPC for a project is created. We use a default of 172.30.0.0/22 for the IPv4 range in that case. This avoids conflict with most other well-known ranges, home networking setups and VPNs, and other cloud providers. It's also small-ish, but not so small as to be useless.

I've updated RFD 21 to reflect these changes, and this change is mostly to implement those new expectations.

The bulk of the LOC, however, is for handling overlapping IP address ranges. We previously checked for name conflicts, like any other resource, but there was nothing ensuring that two VPC Subnets actually had distinct IP ranges. This change introduces a new query to enforce that. The query operates by taking the "candidate" row, including the requested IP ranges, and possibly filtering it out by checking the vpc_subnet table for existing VPC Subnets with overlapping IP ranges.
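To make "overlapping" concrete, here is a minimal Rust sketch of the check for two IPv4 CIDR blocks. It is not the code in this PR, which performs the check in SQL with the `&&` operator on INET values; the `Ipv4Block` type and its methods are hypothetical names introduced only for illustration.

```rust
use std::net::Ipv4Addr;

/// Hypothetical type for illustration: an IPv4 CIDR block.
#[derive(Clone, Copy, Debug)]
pub struct Ipv4Block {
    pub addr: Ipv4Addr,
    /// Prefix length, assumed to be in 0..=32.
    pub prefix: u8,
}

impl Ipv4Block {
    /// Netmask with the top `prefix` bits set.
    fn mask(prefix: u8) -> u32 {
        if prefix == 0 {
            0
        } else {
            u32::MAX << (32 - u32::from(prefix))
        }
    }

    /// Two CIDR blocks overlap exactly when one contains the other, which
    /// means they agree on the network bits of the shorter prefix.
    pub fn overlaps(&self, other: &Ipv4Block) -> bool {
        let mask = Self::mask(self.prefix.min(other.prefix));
        (u32::from(self.addr) & mask) == (u32::from(other.addr) & mask)
    }
}
```

Under this sketch, 172.30.0.0/22 overlaps 172.30.2.0/24 (the /24 sits inside the /22), while 172.30.0.0/22 and 172.29.0.0/22 are disjoint.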

I've verified that this is a reasonable query, in that it uses the vpc_subnet_vpc_id_name_key index to search for records with a given VPC ID. The full explain analyze output of the CTE searching the vpc_subnet table is here:

root@127.0.0.1:32221/omicron> EXPLAIN ANALYZE WITH candidate(id, name, description, time_created, time_modified, time_deleted, vpc_id, ipv4_block, ipv6_block) AS
(VALUES (gen_random_uuid(), 'name', 'a description', current_timestamp(), current_timestamp(), NULL::TIMESTAMPTZ, '1f40f171-e2b9-4c84-935d-87114b0c7f74'::UUID, '172.29.0.0/22'::INET, 'fdeb:8d6f:2ac5::/64'::INET))
SELECT * FROM candidate
WHERE NOT EXISTS (
SELECT ipv4_block, ipv6_block FROM vpc_subnet
WHERE vpc_id = '1f40f171-e2b9-4c84-935d-87114b0c7f74' AND
time_deleted IS NULL AND
((ipv4_block && candidate.ipv4_block) OR (ipv6_block && candidate.ipv6_block)));
                                                                                                                   info
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  planning time: 778µs
  execution time: 2ms
  distribution: local
  vectorized: true
  rows read from KV: 4 (427 B)
  cumulative time spent in KV: 955µs
  maximum memory usage: 60 KiB
  network usage: 0 B (0 messages)

  • root
  │
  ├── • cross join (anti)
  │   │ nodes: n1
  │   │ actual row count: 0
  │   │ estimated row count: 1
  │   │ pred: (ipv4_block && column8) OR (ipv6_block && column9)
  │   │
  │   ├── • scan buffer
  │   │     nodes: n1
  │   │     actual row count: 1
  │   │     estimated row count: 1
  │   │     label: buffer 1 (candidate)
  │   │
  │   └── • index join
  │       │ nodes: n1
  │       │ actual row count: 2
  │       │ KV rows read: 2
  │       │ KV bytes read: 282 B
  │       │ estimated row count: 1
  │       │ table: vpc_subnet@primary
  │       │
  │       └── • scan
  │             nodes: n1
  │             actual row count: 2
  │             KV rows read: 2
  │             KV bytes read: 145 B
  │             estimated row count: 1 (100% of the table; stats collected 31 minutes ago)
  │             table: vpc_subnet@vpc_subnet_vpc_id_name_key (partial index)
  │             spans: [/'1f40f171-e2b9-4c84-935d-87114b0c7f74' - /'1f40f171-e2b9-4c84-935d-87114b0c7f74']
  │
  └── • subquery
      │ id: @S1
      │ original sql: VALUES (gen_random_uuid(), 'name', 'a description', current_timestamp(), current_timestamp(), NULL::TIMESTAMPTZ, '1f40f171-e2b9-4c84-935d-87114b0c7f74'::UUID, '172.29.0.0/22'::INET, 'fdeb:8d6f:2ac5::/64'::INET)
      │ exec mode: all rows
      │
      └── • buffer
          │ nodes: n1
          │ actual row count: 1
          │ label: buffer 1 (candidate)
          │
          └── • values
                size: 9 columns, 1 row
(52 rows)

Time: 4ms total (execution 4ms / network 0ms)

The scan section shows that it's using the vpc_subnet@vpc_subnet_vpc_id_name_key index, and searching only specific spans. The table had only a single row here, so I'm not 100% sure it will scale, but this has been our bar for performance in the past.

This commit also generates different error messages for the client, depending on the type of "conflict". This shows the output of creating a new VPC Subnet with some invalid data. In the first two requests, the IPv4 or IPv6 subnet range overlaps, respectively. In the third, the ranges are fine, but the name conflicts.

bnaecker@shale : ~/omicron $ curl --request POST http://127.0.0.1:12220/organizations/my-org/projects/my-project3/vpcs/default/subnets -H "oxide-authn-spoof: 001de000-05e4-4000-8000-000000004007" -d '{"name": "default", "description": "howdy do", "ipv4_block": "172.30.0.0/22", "ipv6_block": "fd50:c734:ddaf::/64"}'
{
  "request_id": "69f7b60e-8e55-4cd4-bc3e-949b441bf15d",
  "error_code": "InvalidRequest",
  "message": "IP address ranges must not overlap for subnets within a VPC"
}
bnaecker@shale : ~/omicron $ curl --request POST http://127.0.0.1:12220/organizations/my-org/projects/my-project3/vpcs/default/subnets -H "oxide-authn-spoof: 001de000-05e4-4000-8000-000000004007" -d '{"name": "default", "description": "howdy do", "ipv4_block": "172.30.0.0/22", "ipv6_block": "fd50:c734:ddae::/64"}'
{
  "request_id": "61d0ebba-c08f-4c01-97de-291e0ef1252d",
  "error_code": "InvalidRequest",
  "message": "IP address ranges must not overlap for subnets within a VPC"
}
bnaecker@shale : ~/omicron $ curl --request POST http://127.0.0.1:12220/organizations/my-org/projects/my-project3/vpcs/default/subnets -H "oxide-authn-spoof: 001de000-05e4-4000-8000-000000004007" -d '{"name": "default", "description": "howdy do", "ipv4_block": "172.31.0.0/22", "ipv6_block": "fd50:c734:ddae::/64"}'
{
  "request_id": "011bff6e-2e08-48cb-9b90-596f016fa9bd",
  "error_code": "ObjectAlreadyExists",
  "message": "already exists: vpc-subnet \"default\""
}

Unfortunately, there's no way with the current code to catch both of these, or to catch the name conflict first. That's because the CTE to filter the candidate is run first. So if the name conflicts, but so do the IP ranges, there's no row and the constraint violation on the name is not hit. I don't think this is a huge drawback, but it might be a bit surprising.
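The ordering described above can be modeled with a small Rust sketch. This is only an illustration of the two checks and their order, not the query itself; the `Subnet` struct, the `CreateError` enum, and the use of inclusive integer ranges in place of real IP blocks are all assumptions made for brevity.

```rust
/// Hypothetical model of a subnet: its name and an inclusive address
/// range standing in for its IP block.
pub struct Subnet {
    pub name: &'static str,
    pub range: (u32, u32),
}

#[derive(Debug, PartialEq)]
pub enum CreateError {
    /// Stands in for the "InvalidRequest" response shown above.
    OverlappingRange,
    /// Stands in for the "ObjectAlreadyExists" response shown above.
    NameExists,
}

pub fn check_candidate(existing: &[Subnet], candidate: &Subnet) -> Result<(), CreateError> {
    // Step 1: the filtering CTE. An overlap removes the candidate row, so
    // the insert never reaches the name constraint.
    let overlaps = |a: (u32, u32), b: (u32, u32)| a.0 <= b.1 && b.0 <= a.1;
    if existing.iter().any(|s| overlaps(s.range, candidate.range)) {
        return Err(CreateError::OverlappingRange);
    }
    // Step 2: the unique-name constraint, reached only when a row survives.
    if existing.iter().any(|s| s.name == candidate.name) {
        return Err(CreateError::NameExists);
    }
    Ok(())
}
```

A candidate that conflicts on both name and range is reported as an overlap in this model, which matches the behavior described above.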

One other annoyance arises in the case where the IPv6 range is not specified. We generate a default, random ULA from the prefix of the VPC itself. This is a /64 allocated from the /48 prefix of the VPC. It's possible that the auto-generated range conflicts with one in the database. The probability for that is n_vpcs * 1/2**16. That's not high, but far higher than some other things we use randomness to avoid, like UUID collisions. And that probability assumes the user doesn't allocate anything larger than a /64. There's nothing stopping them from creating a /49, for example, though I'd probably recommend they do something else :) That will cause immediate conflict.
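As a rough sketch of how a /64 can be carved out of a /48 prefix, the following Rust function fills the fourth 16-bit segment of the address and zeroes the host bits. The function name and the example prefix used in the note below are made up for illustration, and the caller is assumed to supply the 16 random bits; the PR's actual implementation may differ.

```rust
use std::net::Ipv6Addr;

/// Builds the network address of a /64 inside a /48 VPC prefix.
/// `subnet_bits` stands in for 16 randomly generated bits.
pub fn subnet_in_vpc_prefix(vpc_prefix: Ipv6Addr, subnet_bits: u16) -> Ipv6Addr {
    let mut segments = vpc_prefix.segments();
    // Segments 0..3 hold the 48-bit prefix; segment 3 selects the /64.
    segments[3] = subnet_bits;
    // Clear the lower 64 bits so the result is a network address.
    for segment in &mut segments[4..] {
        *segment = 0;
    }
    Ipv6Addr::from(segments)
}
```

For example, with a prefix of fd00:1122:3344:: and subnet bits 0x1234, this yields fd00:1122:3344:1234::, to be used as a /64. Since there are only 2**16 possible values for the fourth segment, collisions with existing /64 subnets are plausible, as noted above.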

In this case, we return a 503, the intent being that another attempt is another dice roll, which will reduce the collision probability to n_vpcs * 1/2**32. That's very low indeed. We could do something smarter about retries, but I opted to defer that.

bnaecker · 2022-02-10T22:25:33Z

Well, that's sad. The error here is apparently a platform difference in how chrono renders nanosecond timestamps :/

davepacheco

Thanks for doing this change. It sounds like it'll close up a bunch of important edge cases.

This is not a complete review but I wanted to get this feedback sooner rather than later.

nexus/src/db/datastore.rs

nexus/src/db/model.rs

nexus/src/db/subnet_allocation.rs

common/src/api/external/mod.rs

- Clarifies ULA checks, and adds methods for explicitly checking VPC prefix validity and subnet validity for a given VPC.
- Simplifies random subnet generation using big-endian representation.
- Adds more tests for random subnet generation, including a fixed seed to verify random bits.
- Adds a test verifying that the query used to filter conflicting VPC Subnets does not induce a full table scan.
- Adds a SubnetError type which can be returned from an attempt to insert a unique VPC Subnet. This has a variant that indicates there was a conflict in either the IPv4 or IPv6 address ranges, supporting retries and better errors for the client.
- Adds a retry loop around the code that generates a random IPv6 subnet for a VPC Subnet, if users don't supply one in the request. This creates a random subnet based on the VPC's IPv6 prefix, and tries to insert it using the filtering query. If this fails some small number of times, currently 3 total, then a 503 is returned to the client. We also log a warning in this case.
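
The retry loop in the last item might look roughly like the Rust sketch below. It is a simplified model, not the code from this commit: `InsertError` stands in for the SubnetError type mentioned above, its variant names are assumptions, and the generator and insert steps are passed in as closures so the example is self-contained.

```rust
/// Hypothetical stand-in for the error returned by the filtering insert.
#[derive(Debug, PartialEq)]
pub enum InsertError {
    /// The candidate range overlaps an existing subnet; worth retrying.
    Overlap,
    /// All attempts were used up; the caller would map this to a 503.
    Unavailable,
    /// Any other failure; returned immediately without retrying.
    Other(String),
}

/// Total number of attempts, matching the "3 total" noted above.
pub const MAX_ATTEMPTS: usize = 3;

pub fn insert_with_retries<G, I>(mut generate: G, mut try_insert: I) -> Result<u16, InsertError>
where
    G: FnMut() -> u16,
    I: FnMut(u16) -> Result<(), InsertError>,
{
    for _ in 0..MAX_ATTEMPTS {
        // Each attempt draws a fresh candidate subnet.
        let candidate = generate();
        match try_insert(candidate) {
            Ok(()) => return Ok(candidate),
            Err(InsertError::Overlap) => continue,
            Err(other) => return Err(other),
        }
    }
    // A real implementation would also log a warning here.
    Err(InsertError::Unavailable)
}
```

Only the overlap case is retried in this sketch; any other error is surfaced right away, on the assumption that a new random candidate would not help.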

davepacheco · 2022-02-18T16:59:06Z

nexus/src/db/explain.rs

 {
    fn walk_ast(&self, mut out: AstPass<Pg>) -> QueryResult<()> {
-        out.push_sql("EXPLAIN (");
+        out.push_sql("EXPLAIN ");


Out of curiosity, why this change? I typically do use parentheses around stuff like this for both clarity and in case it gets composed into something else that would change the interpretation (though that latter case doesn't seem likely here).

bnaecker · 2022-02-18

Good question. The test I added here fails with a syntax error if parentheses are used. I agree that extra parentheses seem like they should be harmless, but the documentation and examples for EXPLAIN don't seem to use them anywhere around the expression to be explained. It seems that parentheses are used to define the kind of explanation, e.g., what types are inferred or how the optimizer assigns costs.

bnaecker requested review from davepacheco and smklein February 10, 2022 21:47

bnaecker force-pushed the vpc-subnet-default-behavior branch from d1d7946 to 0299cbf February 10, 2022 23:25

Timestamp formatting for tests across platforms

0299cbf

davepacheco reviewed Feb 12, 2022

smklein reviewed Feb 14, 2022

bnaecker mentioned this pull request Feb 15, 2022

Want deterministic VPC Subnet allocation strategy #685

Open

bnaecker requested review from davepacheco and smklein February 16, 2022 00:50

davepacheco approved these changes Feb 18, 2022

smklein approved these changes Feb 18, 2022

bnaecker merged commit 8663a79 into main Feb 18, 2022

bnaecker deleted the vpc-subnet-default-behavior branch February 18, 2022 20:26

bnaecker mentioned this pull request Mar 7, 2022

Subnet allocation errors should distinguish between IPv4 and IPv6 overlap #722

Closed
