x/build/cmd/scaleway: our Scaleway arm machines are misbehaving #32229
According to farmer, a bunch of Scaleway machines are connected with duplicate hostnames, and 12 are missing
And in the kubectl logs for the scaleway service (that runs cmd/scaleway to keep things healthy):
Somebody should investigate & fix.
I've logged in to the Scaleway UI and investigated.
There were some duplicate instances, which is what was causing:
The original instances were 2 years old, and the new ones were created on various days, within the last month or so.
That shouldn't happen in
Of the 18 missing machines, some are started and I was able to ssh into them successfully. However, for some reason, they're not running the buildlet in a docker container.
Compare the output from a healthy instance:
Compared to a missing one:
Why that is needs more investigation. Perhaps a good thing to try is to just re-create those instances and see if that solves the problem.
There are also some instances in the Scaleway UI that are perpetually in the "stopping" state and never actually finish stopping. I'm going to open a Scaleway ticket about that.
I got rid of all the duplicate instances last time, but there are two duplicates again today:
I checked the Scaleway UI, and the duplicate
That shouldn't have happened because there's an original
After some debugging, I've found that the code isn't doing pagination when listing servers, and the default page size was just 50, which wasn't enough to list all servers (we have around 70: 50 expected + some duplicates + some that are stuck shutting down). Going to fix that first and then see what more needs to be done here.
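For illustration, here's a minimal sketch of what paginated listing could look like. This is not the actual cmd/scaleway code or the Scaleway client API: `Server`, `pageFunc`, `listAllServers`, and `fakeFetcher` are hypothetical names, and the sketch assumes the API returns a short (or empty) page once the results run out.

```go
package main

import "fmt"

// Server is a hypothetical, minimal stand-in for a server record.
type Server struct {
	Name string
}

// pageFunc is a hypothetical page fetcher: it returns the servers on the
// given 1-indexed page, with at most perPage entries per page.
type pageFunc func(page, perPage int) ([]Server, error)

// listAllServers requests pages until it sees one with fewer than perPage
// entries, so the result isn't silently capped at a single page.
func listAllServers(fetch pageFunc, perPage int) ([]Server, error) {
	if perPage <= 0 {
		return nil, fmt.Errorf("perPage must be positive, got %d", perPage)
	}
	var all []Server
	for page := 1; ; page++ {
		batch, err := fetch(page, perPage)
		if err != nil {
			return nil, fmt.Errorf("listing page %d: %w", page, err)
		}
		all = append(all, batch...)
		if len(batch) < perPage {
			return all, nil // short or empty page: nothing left to fetch
		}
	}
}

// fakeFetcher simulates a paginated API holding n servers (illustration only).
func fakeFetcher(n int) pageFunc {
	return func(page, perPage int) ([]Server, error) {
		start := (page - 1) * perPage
		if start >= n {
			return nil, nil
		}
		end := start + perPage
		if end > n {
			end = n
		}
		out := make([]Server, 0, end-start)
		for i := start; i < end; i++ {
			out = append(out, Server{Name: fmt.Sprintf("server-%d", i)})
		}
		return out, nil
	}
}
```

With around 70 servers and a page size of 50, a single call to the fetcher sees only the first 50, while `listAllServers(fakeFetcher(70), 50)` returns all 70.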
Due to the aforementioned pagination issue,
I cleaned up all the duplicate instances, and also removed some stale instances that weren't connecting successfully.
So we went from this state (or an even worse version thereof):
There are now exactly 51 instances on Scaleway (the 50 prod ones and 1 prep one), and all 50 prod instances are connected.
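To catch duplicates earlier next time, a check along these lines might help. It's only a sketch: `duplicateNames` is a hypothetical helper, not something in cmd/scaleway, and it assumes we already have the complete (paginated) list of server names.

```go
package main

import "sort"

// duplicateNames returns, in sorted order, every name that appears more
// than once in names. The input is assumed to be the full server list.
func duplicateNames(names []string) []string {
	counts := make(map[string]int)
	for _, n := range names {
		counts[n]++
	}
	var dups []string
	for n, c := range counts {
		if c > 1 {
			dups = append(dups, n)
		}
	}
	sort.Strings(dups) // map iteration order is random; sort for stable output
	return dups
}
```

Running this over a truncated list would miss duplicates that fall on later pages, so it's only useful once the listing is complete.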