@bnaecker
Collaborator

This improves the API and database internals around VPC Subnets,
specifically how we create the default VPC Subnet for a VPC, and how
IP address ranges are specified by clients or generated internally.

It also adds a query that checks for overlapping address ranges, and
fails the insert of a new VPC Subnet if an existing subnet in the same
VPC has overlapping IP address ranges, either IPv4 or IPv6.

@bnaecker
Collaborator Author

bnaecker commented Feb 10, 2022

Some more details on this change. There's been some confusion (mostly in my own head!) about how the IP address ranges for VPC Subnets are specified. RFD 21 originally described both V4 and V6 ranges as optional, but did not get into what the behavior should be if neither is specified. We resolved this originally by allowing optional values in the request to create a VPC, but we always unwrapped the V4 block in any case.

The requirements for the MVP have shifted since RFD 21 was written, and we're now requiring that all subnets have both IPv4 and IPv6 address ranges. Clients must specify the V4 range when creating a new VPC Subnet, since there's no reasonable default. However, we allow V6 to remain unspecified in the request, and in that case generate a random unique-local address from the VPC's IPv6 prefix. Note that the one exception is the default VPC Subnet, created when the default VPC for a project is created. We use a default of 172.30.0.0/22 for the IPv4 range in that case. This avoids conflict with most other well-known ranges, home networking setups, VPNs, and other cloud providers. It's also small-ish, but not so small as to be useless.

I've updated RFD 21 to reflect these changes, and this change is mostly to implement those new expectations.

The bulk of the LOC, however, is for handling overlapping IP address ranges. We previously checked for name conflicts, like any other resource, but nothing ensured that two VPC Subnets actually had distinct IP ranges. This introduces a new query to handle this. The query operates by taking the "candidate" row, including the requested IP ranges, and possibly filtering it out by checking the vpc_subnet table for existing VPC Subnets with overlapping IP ranges.
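
To make the overlap condition concrete, here is a minimal Rust sketch of what "overlapping" means for two CIDR blocks. It is only an illustration under stated assumptions, not the code in this PR: the real check is done in the database with the `&&` operator on INET values, and the function names below are hypothetical. Two blocks overlap exactly when they agree on all bits covered by the shorter of the two prefixes.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

// Hypothetical helpers for illustration; prefix lengths are assumed to be
// valid (<= 32 for IPv4, <= 128 for IPv6).

/// Mask keeping the top `prefix` bits of a 32-bit address.
fn mask_v4(prefix: u8) -> u32 {
    if prefix == 0 { 0 } else { u32::MAX << (32 - u32::from(prefix)) }
}

/// Mask keeping the top `prefix` bits of a 128-bit address.
fn mask_v6(prefix: u8) -> u128 {
    if prefix == 0 { 0 } else { u128::MAX << (128 - u32::from(prefix)) }
}

/// Two IPv4 blocks overlap when one contains the other, i.e. when they
/// agree on every bit covered by the shorter prefix.
pub fn ipv4_blocks_overlap(a: (Ipv4Addr, u8), b: (Ipv4Addr, u8)) -> bool {
    let mask = mask_v4(a.1.min(b.1));
    (u32::from(a.0) & mask) == (u32::from(b.0) & mask)
}

/// Same rule for IPv6 blocks, using 128-bit arithmetic.
pub fn ipv6_blocks_overlap(a: (Ipv6Addr, u8), b: (Ipv6Addr, u8)) -> bool {
    let mask = mask_v6(a.1.min(b.1));
    (u128::from(a.0) & mask) == (u128::from(b.0) & mask)
}
```

Under this model a candidate is rejected if either its IPv4 or its IPv6 block overlaps an existing, non-deleted subnet in the same VPC, which mirrors the OR in the query's predicate.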

I've verified that this is a reasonable query, in that it uses an index to search for records with a given VPC ID. The full EXPLAIN ANALYZE output of the CTE searching the vpc_subnet table is here:

root@127.0.0.1:32221/omicron> EXPLAIN ANALYZE WITH candidate(id, name, description, time_created, time_modified, time_deleted, vpc_id, ipv4_block, ipv6_block) AS
  (VALUES (gen_random_uuid(), 'name', 'a description', current_timestamp(), current_timestamp(), NULL::TIMESTAMPTZ, '1f40f171-e2b9-4c84-935d-87114b0c7f74'::UUID, '172.29.0.0/22'::INET, 'fdeb:8d6f:2ac5::/64'::INET))
SELECT * FROM candidate
WHERE NOT EXISTS (
  SELECT ipv4_block, ipv6_block FROM vpc_subnet
  WHERE vpc_id = '1f40f171-e2b9-4c84-935d-87114b0c7f74' AND
        time_deleted IS NULL AND
        ((ipv4_block && candidate.ipv4_block) OR (ipv6_block && candidate.ipv6_block)));
                                                                                                                   info
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  planning time: 778µs
  execution time: 2ms
  distribution: local
  vectorized: true
  rows read from KV: 4 (427 B)
  cumulative time spent in KV: 955µs
  maximum memory usage: 60 KiB
  network usage: 0 B (0 messages)

  • root
  │
  ├── • cross join (anti)
  │   │ nodes: n1
  │   │ actual row count: 0
  │   │ estimated row count: 1
  │   │ pred: (ipv4_block && column8) OR (ipv6_block && column9)
  │   │
  │   ├── • scan buffer
  │   │     nodes: n1
  │   │     actual row count: 1
  │   │     estimated row count: 1
  │   │     label: buffer 1 (candidate)
  │   │
  │   └── • index join
  │       │ nodes: n1
  │       │ actual row count: 2
  │       │ KV rows read: 2
  │       │ KV bytes read: 282 B
  │       │ estimated row count: 1
  │       │ table: vpc_subnet@primary
  │       │
  │       └── • scan
  │             nodes: n1
  │             actual row count: 2
  │             KV rows read: 2
  │             KV bytes read: 145 B
  │             estimated row count: 1 (100% of the table; stats collected 31 minutes ago)
  │             table: vpc_subnet@vpc_subnet_vpc_id_name_key (partial index)
  │             spans: [/'1f40f171-e2b9-4c84-935d-87114b0c7f74' - /'1f40f171-e2b9-4c84-935d-87114b0c7f74']
  │
  └── • subquery
      │ id: @S1
      │ original sql: VALUES (gen_random_uuid(), 'name', 'a description', current_timestamp(), current_timestamp(), NULL::TIMESTAMPTZ, '1f40f171-e2b9-4c84-935d-87114b0c7f74'::UUID, '172.29.0.0/22'::INET, 'fdeb:8d6f:2ac5::/64'::INET)
      │ exec mode: all rows
      │
      └── • buffer
          │ nodes: n1
          │ actual row count: 1
          │ label: buffer 1 (candidate)
          │
          └── • values
                size: 9 columns, 1 row
(52 rows)

Time: 4ms total (execution 4ms / network 0ms)

There's the line in the scan section showing that it's using the vpc_subnet@vpc_subnet_vpc_id_name_key index, and searching only specific spans. The table had only a single row here, so I'm not 100% sure it will scale, but this has been our bar for performance in the past.

This commit also generates different error messages for the client, depending on the type of "conflict". This shows the output of creating a new VPC Subnet with some invalid data. In the first two requests, the IPv4 or IPv6 subnet range overlaps, respectively. In the third, the ranges are fine, but the name conflicts.

bnaecker@shale : ~/omicron $ curl --request POST http://127.0.0.1:12220/organizations/my-org/projects/my-project3/vpcs/default/subnets -H "oxide-authn-spoof: 001de000-05e4-4000-8000-000000004007" -d '{"name": "default", "description": "howdy do", "ipv4_block": "172.30.0.0/22", "ipv6_block": "fd50:c734:ddaf::/64"}'
{
  "request_id": "69f7b60e-8e55-4cd4-bc3e-949b441bf15d",
  "error_code": "InvalidRequest",
  "message": "IP address ranges must not overlap for subnets within a VPC"
}
bnaecker@shale : ~/omicron $ curl --request POST http://127.0.0.1:12220/organizations/my-org/projects/my-project3/vpcs/default/subnets -H "oxide-authn-spoof: 001de000-05e4-4000-8000-000000004007" -d '{"name": "default", "description": "howdy do", "ipv4_block": "172.30.0.0/22", "ipv6_block": "fd50:c734:ddae::/64"}'
{
  "request_id": "61d0ebba-c08f-4c01-97de-291e0ef1252d",
  "error_code": "InvalidRequest",
  "message": "IP address ranges must not overlap for subnets within a VPC"
}
bnaecker@shale : ~/omicron $ curl --request POST http://127.0.0.1:12220/organizations/my-org/projects/my-project3/vpcs/default/subnets -H "oxide-authn-spoof: 001de000-05e4-4000-8000-000000004007" -d '{"name": "default", "description": "howdy do", "ipv4_block": "172.31.0.0/22", "ipv6_block": "fd50:c734:ddae::/64"}'
{
  "request_id": "011bff6e-2e08-48cb-9b90-596f016fa9bd",
  "error_code": "ObjectAlreadyExists",
  "message": "already exists: vpc-subnet \"default\""
}

Unfortunately, there's no way with the current code to catch both of these, or to catch the name conflict first. That's because the CTE that filters the candidate runs first. So if the name conflicts, but so do the IP ranges, there's no row to insert and the constraint violation on the name is never hit. I don't think this is a huge drawback, but it might be a bit surprising.
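
The ordering described here can be modeled with a small in-memory sketch in Rust. This is an assumption-laden illustration, not the datastore code: the `Subnet` struct, `InsertError` enum, and `insert_subnet` function are hypothetical, only IPv4 is modeled, and a `Vec` stands in for the vpc_subnet table. The point is just that the overlap filter runs before the name constraint can fire.

```rust
use std::net::Ipv4Addr;

#[derive(Debug, PartialEq)]
pub enum InsertError {
    /// Corresponds to the "InvalidRequest" responses shown above.
    OverlappingIpRange,
    /// Corresponds to the "ObjectAlreadyExists" response shown above.
    NameConflict,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Subnet {
    pub name: String,
    pub ipv4_block: (Ipv4Addr, u8),
}

/// Blocks overlap when they agree on the bits of the shorter prefix.
fn overlaps(a: (Ipv4Addr, u8), b: (Ipv4Addr, u8)) -> bool {
    let prefix = u32::from(a.1.min(b.1));
    let mask = if prefix == 0 { 0 } else { u32::MAX << (32 - prefix) };
    (u32::from(a.0) & mask) == (u32::from(b.0) & mask)
}

pub fn insert_subnet(table: &mut Vec<Subnet>, candidate: Subnet) -> Result<(), InsertError> {
    // Step 1: the filter. If any existing subnet overlaps, the candidate
    // row is dropped and the insert never reaches the name constraint.
    if table.iter().any(|s| overlaps(s.ipv4_block, candidate.ipv4_block)) {
        return Err(InsertError::OverlappingIpRange);
    }
    // Step 2: the unique-name constraint, only checked for surviving rows.
    if table.iter().any(|s| s.name == candidate.name) {
        return Err(InsertError::NameConflict);
    }
    table.push(candidate);
    Ok(())
}
```

With an existing "default" subnet on 172.30.0.0/22, a candidate that reuses both the name and the range reports only the range overlap, matching the first two curl responses.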

One other annoyance arises in the case where the IPv6 range is not specified. We generate a default, random ULA from the prefix of the VPC itself. This is a /64 allocated from the /48 prefix of the VPC. It's possible that the auto-generated range conflicts with one in the database. The probability for that is n_vpcs * 1/2**16. That's not high, but far higher than some other things we use randomness to avoid, like UUID collisions. And that probability assumes the user doesn't allocate anything larger than a /64. There's nothing stopping them from creating a /49, for example, though I'd probably recommend they do something else :) That will cause immediate conflict.
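
As a rough sketch of the /64-from-/48 allocation described above, the following Rust fills bits 48 through 63 of the VPC prefix with 16 subnet bits. The function names are hypothetical and the random source is left to the caller (the standard library has none), so this only shows the bit layout, not the generator used in the PR. The ULA check follows RFC 4193, which reserves fc00::/7.

```rust
use std::net::Ipv6Addr;

/// Number of distinct /64 subnets inside one /48 prefix (2^16).
pub const SUBNETS_PER_VPC: u32 = 1 << 16;

/// Build the network address of a /64 inside a /48 VPC prefix.
/// `subnet_bits` is assumed to come from a random source chosen by the caller.
pub fn subnet_from_vpc_prefix(vpc_prefix: Ipv6Addr, subnet_bits: u16) -> Ipv6Addr {
    let mut segments = vpc_prefix.segments();
    // Segments 0..3 are the /48 prefix; segment 3 holds bits 48..64.
    segments[3] = subnet_bits;
    // Clear the host portion so the result is a network address.
    segments[4..].iter_mut().for_each(|s| *s = 0);
    Ipv6Addr::from(segments)
}

/// True if the address lies in the unique-local range fc00::/7.
pub fn is_unique_local(addr: Ipv6Addr) -> bool {
    (addr.segments()[0] & 0xfe00) == 0xfc00
}
```

Because only 16 bits vary, there are just 65536 possible /64 blocks per VPC, which is why collisions are far more likely here than with UUIDs.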

In this case, we return a 503, the intent being that another attempt is another dice roll, which will reduce the collision probability to n_vpcs * 1/2**32. That's very low indeed. We could do something smarter about retries, but I opted to defer that.

@bnaecker
Collaborator Author

Well, that's sad. The error here is apparently a platform difference in how chrono renders nanosecond timestamps :/

@bnaecker bnaecker force-pushed the vpc-subnet-default-behavior branch from d1d7946 to 0299cbf February 10, 2022 23:25
Collaborator

@davepacheco davepacheco left a comment


Thanks for doing this change. It sounds like it'll close up a bunch of important edge cases.

This is not a complete review but I wanted to get this feedback sooner rather than later.

- Clarifies ULA checks, and adds methods for explicitly checking VPC
  prefix validity and subnet validity for a given VPC.
- Simplifies random subnet generation using big-endian representation
- Adds more tests to random subnet generation, including fixed seed to
  verify random bits
- Adds test verifying that the query used to filter conflicting VPC
  Subnets does not induce a full table scan
- Adds a SubnetError type which can be returned from an attempt to
  insert a unique VPC Subnet. This has a variant that indicates there
  was a conflict in either the IPv4 or v6 address ranges, supporting
  retries and better errors for the client.
- Adds retry loop around the code that generates a random IPv6 subnet
  for a VPC Subnet, if users don't supply one in the request. This
  creates a random subnet based on the VPC's IPv6 prefix, and tries to
  insert it using the filtering query. If this fails some small number
  of times, currently 3 total, then a 503 is returned to the client. We
  also log a warning in this case.
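
The retry loop from the last bullet might look roughly like the Rust sketch below. It is a simplified model under assumptions: this `SubnetError` is a stand-in with hypothetical variants, the candidate is reduced to its 16 random subnet bits, and the generator and insert are passed in as closures. Only the limit of 3 total attempts comes from the commit message.

```rust
#[derive(Debug, PartialEq)]
pub enum SubnetError {
    /// The candidate's range overlapped an existing subnet; worth retrying.
    OverlappingIpRange,
    /// Any other failure; not retried.
    Other(String),
}

/// Total attempts before giving up, per the commit message.
pub const MAX_ATTEMPTS: usize = 3;

/// Generate a candidate and try to insert it, retrying only on overlap.
/// If every attempt overlaps, the last overlap error is returned; the
/// caller would map that to a 503 and log a warning.
pub fn insert_with_retries<G, I>(mut generate: G, mut try_insert: I) -> Result<u16, SubnetError>
where
    G: FnMut() -> u16,
    I: FnMut(u16) -> Result<(), SubnetError>,
{
    for _ in 0..MAX_ATTEMPTS {
        let candidate = generate();
        match try_insert(candidate) {
            Ok(()) => return Ok(candidate),
            // A fresh random draw may avoid the conflict, so try again.
            Err(SubnetError::OverlappingIpRange) => continue,
            // Other errors are returned to the caller immediately.
            Err(e) => return Err(e),
        }
    }
    Err(SubnetError::OverlappingIpRange)
}
```

Keeping the overlap case as its own error variant is what lets the loop distinguish "roll again" from failures that a retry cannot fix.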
@bnaecker

This comment was marked as resolved.

 {
     fn walk_ast(&self, mut out: AstPass<Pg>) -> QueryResult<()> {
-        out.push_sql("EXPLAIN (");
+        out.push_sql("EXPLAIN ");
Collaborator


Out of curiosity, why this change? I typically do use parentheses around stuff like this for both clarity and in case it gets composed into something else that would change the interpretation (though that latter case doesn't seem likely here).

Collaborator Author

@bnaecker bnaecker Feb 18, 2022


Good question. The test I added here fails with a syntax error if parentheses are used. I agree that extra parentheses seem like they should be harmless, but the documentation and examples for EXPLAIN don't seem to use them anywhere around the expression to be explained. It seems that parentheses are used to define the kind of explanation, e.g., what types are inferred or how the optimizer assigns costs.

@bnaecker bnaecker merged commit 8663a79 into main Feb 18, 2022
@bnaecker bnaecker deleted the vpc-subnet-default-behavior branch February 18, 2022 20:26