```rust
/// NOTE: Unfortunately this doesn't automatically catch changes to
/// the API and update itself. We must be vigilant to increment this
/// number when modifying the API.
```
As my comment above API_VERSION warns, it is easy to forget to increment the number when changing the API. I recently hit an instance-creation failure in Omicron after upgrading xde from the currently pinned commit, b9980158540d15d44cfc5d17fc0a5d1848c5e1ae, to the latest commit, 12450552bdea4b3f97d309bfed5ad65dc2e2a775. The sled-agent log shows the failure:
```json
{"msg":"request completed","v":0,"name":"SledAgent","level":30,"time":"2022-06-23T19:32:58.149139933Z","hostname":"kalm","pid":2419,"uri":"/instances/d9356749-e1af-4856-94e4-fd1b827fe5dc","method":"PUT","req_id":"f56b57b4-db98-4975-8585-fac2dc0374c5","remote_addr":"[fd00:1122:3344:101::3]:56544","local_addr":"[fd00:1122:3344:101::1]:12345","component":"dropshot (SledAgent)","error_message_external":"Internal Server Error","error_message_internal":"Error managing instances: Instance error: Failure interacting with the OPTE ioctl(2) interface: command CreateXde failed: DeserCmdReq(\"Hit the end of buffer, expected more data\")","response_code":"500"}
```
This points to the CreateXde ioctl. Sure enough, among the seven commits made between those two revisions, we find our culprit:
```
Author: Ryan Zezeski <ryan@oxide.computer>
Date: Fri May 20 12:55:33 2022 -0600

    SNAT should predicate on router target (#153)
```
Specifically, this change updates the CreateXdeReq structure:
2047842#diff-69375b575b303e0752185db891c91e1476979415c39d98a9db7f438e7dfbd676
Because I did not bump API_VERSION alongside this change, the older client does not detect the mismatch up front. Instead, it sends a serialized request that is too short for the new structure, and the kernel's handler fails with OpteError::DeserCmdReq during the copy_in_req() call.
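A minimal sketch of this failure mode, assuming a simple back-to-back little-endian encoding rather than the serializer opte actually uses. The names ReqOld, ReqNew, extra, take_u32, and check_version are hypothetical, and the field layout is illustrative, not the real CreateXdeReq. It shows why an unchanged version number lets the old client through, and how the kernel side then runs off the end of the buffer with the same message seen in the sled-agent log.

```rust
// Hypothetical stand-ins for an ioctl request before and after an API change.
// Field names are illustrative only; they are NOT the real CreateXdeReq fields.
#[derive(Debug)]
struct ReqOld {
    id: u32,
}

#[derive(Debug, PartialEq)]
struct ReqNew {
    id: u32,
    extra: u32, // field added by the (hypothetical) API change
}

#[derive(Debug, PartialEq)]
enum OpteErr {
    VersionMismatch { client: u64, kernel: u64 },
    DeserCmdReq(String),
}

// Old client: serialize fields back to back, little-endian (assumed encoding).
fn serialize_old(req: &ReqOld) -> Vec<u8> {
    req.id.to_le_bytes().to_vec()
}

// Read a u32 from the front of `buf`, failing if fewer than 4 bytes remain.
fn take_u32(buf: &mut &[u8]) -> Result<u32, OpteErr> {
    let s: &[u8] = *buf;
    if s.len() < 4 {
        return Err(OpteErr::DeserCmdReq(
            "Hit the end of buffer, expected more data".to_string(),
        ));
    }
    let (head, rest) = s.split_at(4);
    *buf = rest;
    Ok(u32::from_le_bytes(head.try_into().unwrap()))
}

// New kernel handler: expects the longer, post-change layout.
fn deserialize_new(mut buf: &[u8]) -> Result<ReqNew, OpteErr> {
    let id = take_u32(&mut buf)?;
    let extra = take_u32(&mut buf)?;
    Ok(ReqNew { id, extra })
}

// The up-front check that a version bump enables (hypothetical helper).
fn check_version(client: u64, kernel: u64) -> Result<(), OpteErr> {
    if client == kernel {
        Ok(())
    } else {
        Err(OpteErr::VersionMismatch { client, kernel })
    }
}

fn main() {
    let wire = serialize_old(&ReqOld { id: 7 });

    // The bump was forgotten, so both sides report the same version...
    assert!(check_version(1, 1).is_ok());
    // ...and decoding the too-short buffer fails inside the kernel handler.
    // prints: Err(DeserCmdReq("Hit the end of buffer, expected more data"))
    println!("{:?}", deserialize_new(&wire));

    // Had the version been bumped, the client would have failed up front instead.
    // prints: Err(VersionMismatch { client: 1, kernel: 2 })
    println!("{:?}", check_version(1, 2));
}
```

The version check cannot inspect the layout itself; it only helps if the number actually changes, which is exactly the step that was missed here.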
Three things should happen:
- Update API_VERSION immediately, in its own commit, to hopefully spare others this pain.
- Update sled-agent to work with the new API.
- Find a way to automate the API_VERSION update mechanism, or perhaps replace it with something better.
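One possible direction for the third item, sketched under loud assumptions: wrap each API type in a hypothetical api_type! macro that records the type's source text via stringify!, hash all recorded sources into a fingerprint (FNV-1a here, chosen only because it fits in a few lines of std-only Rust), and keep a recorded (API_VERSION, fingerprint) pair in the repo. A test then fails whenever the types change but API_VERSION does not. None of these names (api_type!, CreateXdeReqSketch, check_api_record) exist in opte today.

```rust
// Hypothetical macro: emits the item unchanged and also records its source text.
macro_rules! api_type {
    ($src:ident => $item:item) => {
        $item
        #[allow(dead_code)]
        const $src: &str = stringify!($item);
    };
}

// Hypothetical API type; fields are illustrative, not the real CreateXdeReq.
api_type!(CREATE_XDE_REQ_SRC =>
    #[allow(dead_code)]
    pub struct CreateXdeReqSketch { pub id: u32, pub extra: u32 }
);

// Every type wrapped by api_type! must be listed here to be covered.
const API_SOURCES: &[&str] = &[CREATE_XDE_REQ_SRC];

// Hypothetical current version number.
const API_VERSION: u64 = 2;

// FNV-1a (64-bit) over all recorded sources, with a 0 byte between items.
fn api_fingerprint(sources: &[&str]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for src in sources {
        for b in src.bytes().chain(std::iter::once(0u8)) {
            h ^= u64::from(b);
            h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
        }
    }
    h
}

// Compare the current (version, fingerprint) against the recorded pair.
fn check_api_record(api_version: u64, current_fp: u64, recorded: (u64, u64)) -> Result<(), String> {
    let (rec_version, rec_fp) = recorded;
    if api_version == rec_version && current_fp == rec_fp {
        Ok(())
    } else if api_version == rec_version {
        Err(format!("API types changed but API_VERSION is still {api_version}: bump API_VERSION"))
    } else {
        Err(format!("API_VERSION is now {api_version}: record the new fingerprint {current_fp:#018x}"))
    }
}

fn main() {
    let fp = api_fingerprint(API_SOURCES);
    // Simulate a stale record: the types changed since the pair was written.
    match check_api_record(API_VERSION, fp, (API_VERSION, fp ^ 1)) {
        Ok(()) => println!("API record is up to date"),
        Err(msg) => println!("{msg}"),
    }
}
```

In a real crate the check would live in a unit test with the recorded pair as constants. Two caveats: it only covers types wrapped in the macro, and stringify! output is not guaranteed to be stable across compiler versions, so a toolchain upgrade could produce a false positive.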