Decouple bandwidth reservations from oversubscription#238
Conversation
|
Firetiger deploy monitoring skipped This PR didn't match the auto-monitor filter configured on your GitHub connection:
Reason: PR modifies resource reservation logic and CloudFormation templates, but does not appear to change API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) that would trigger kernel API monitoring. To monitor this PR anyway, reply with |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue. You can view the agent here.
Reviewed by Cursor Bugbot for commit e603ced. Configure here.
| install -d -m 755 /etc/systemd/system/hypeman.service.d | ||
| cat >/etc/systemd/system/hypeman.service.d/disk-io-capacity.conf <<EOF | ||
| [Service] | ||
| Environment="CAPACITY__DISK_IO=${DataVolumeThroughput}MB/s" |
There was a problem hiding this comment.
MiB vs MB unit mismatch in disk I/O capacity
Low Severity
The DataVolumeThroughput parameter is documented as "throughput in MiB/s" (matching AWS gp3 units), but the generated environment variable appends MB/s — a decimal megabyte unit. The parseDiskIOLimit function and the c2h5oh/datasize library treat MB as 1,000,000 bytes, while AWS's MiB equals 1,048,576 bytes. This causes CAPACITY__DISK_IO to be ~4.9% lower than the actual provisioned throughput, so per-VM defaults and admission limits are consistently underestimated.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit e603ced. Configure here.
There was a problem hiding this comment.
Bugbot Autofix determined this is a false positive.
This is not a real capacity mismatch because the datasize parser used by parseDiskIOLimit interprets MB as 1024^2 bytes (MiB), so ${DataVolumeThroughput}MB/s maps to the same throughput value provisioned in MiB/s.
You can send follow-ups to the cloud agent here.
e603ced to
2aa19a3
Compare
sjmiller609
left a comment
There was a problem hiding this comment.
makes sense, thanks


Summary
CAPACITY__DISK_IOonly whenDataVolumeThroughputis providedWhy
Default per-VM bandwidth reservations previously used the oversubscribed effective capacity as their input. That meant raising the oversubscription ratio increased the aggregate limit and each default VM request by the same factor, so the ratio did not let the host pack more default VMs when that resource was the binding constraint.
Illustrative example for disk I/O on a 32-vCPU host with 1GB/s disk I/O and 1-vCPU VMs:
Network had the same shape. This change makes the default per-VM reservation proportional to raw capacity, while admission still uses
capacity * oversubscription_ratio.Validation
go test ./lib/resourcesgo test ./...indeploy/aws/cloudformationgit diff --checkA full repo-wide
go test ./...was not run successfully from this checkout because some test packages require embedded local binaries that are not present in the repository checkout.