MM-14999: Installation API #15
Conversation
This changelist introduces the CLI API, REST API, store updates and supervisor additions to manage installations and cluster installations. Created installations attempt to schedule a cluster installation on the first available cluster. While the API remains async as previously changed, I've ripped out the in-process parallelism in an attempt to simplify the data model surrounding locking. This will need to be revisited when we scale the provisioning server horizontally.
That's a lot of code but I think I internalized what's happening pretty well. Looks good for the most part, just a few comments
	return
}

logger.Debugf("Transitioned installation from %s to %s", oldState, newState)
I wonder if this would be good to log at INFO level rather than DEBUG or would that be too spammy?
I personally found this a little spammy, and tried to make sure the INFO-level logs capture all the detail needed to understand the meaningful changes here. I propose we run with debug enabled in production anyway, but filter it out downstream.
That's a lot of code! :) Nice functionality boost here.
So, you are right that some in-flight PRs in operator will be very handy for long-term integration functionality with CRs, but we have a good way to stub it out right now (which is exactly what you did).
Left a few comments, but it is looking good!
var installationCreateCmd = &cobra.Command{
	Use:   "create",
	Short: "Create an installation.",
I am not advocating for a change at this time, but I am worried about our double (triple?) usage of the word installation. Maybe I am just missing something that would help me build up the innate meaning of cluster installation vs. installation, but I am finding it to be a bit of a struggle.
We may want to see if some other verbiage works better in the future.
I agree that installation as a concept is likely overkill given our current plans to make them 1:1 with a cluster installation. My hope was that by forcing the concept of a multi-cluster installation on the domain, we'd avoid painting ourselves into a naming corner that would be even more confusing in the future. Very happy to iterate on this, though!
var installationUpgradeCmd = &cobra.Command{
	Use:   "upgrade",
	Short: "Upgrade (or downgrade) the version of Mattermost.",
Do we actually want to support downgrades? I don't have enough experience to know the pros and cons of this decision.
5/5 that we should support downgrades: otherwise there is no way to roll back an installation gone awry. The current plan is to let the operator decide how far back to allow such a downgrade -- e.g. we may not support downgrading across major versions.
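A guard along those lines could be sketched as follows; allowVersionChange and the bare-bones major parsing are hypothetical names for illustration (a real implementation would likely lean on a proper semver library):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// major extracts the leading major component of a version like "5.10.0"
// or "v5.10.0", returning -1 if it cannot be parsed.
func major(version string) int {
	m, err := strconv.Atoi(strings.SplitN(strings.TrimPrefix(version, "v"), ".", 2)[0])
	if err != nil {
		return -1
	}
	return m
}

// allowVersionChange permits any upgrade, but only allows downgrades
// within the same major version.
func allowVersionChange(current, target string) bool {
	cur, tgt := major(current), major(target)
	if cur < 0 || tgt < 0 {
		return false
	}
	return tgt >= cur
}

func main() {
	fmt.Println(allowVersionChange("5.12.0", "5.10.2")) // true: same-major downgrade
	fmt.Println(allowVersionChange("5.1.0", "4.9.4"))   // false: crosses a major boundary
	fmt.Println(allowVersionChange("5.1.0", "6.0.0"))   // true: upgrades always allowed
}
```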
	deletingClusterInstallations++
}

logger.Debugf(
Good place for WithFields() in my opinion.
2/5, but I'm not sure fields are fungible for sprintf-statements in all cases. Certainly we'd want structured logging for pretty much all common verbiage (e.g. the cluster and installation we do today), but are we actually looking to "parse" cluster_installation_deleted_count, etc.?
In my head the fields would be no different than what you had: deleting, failed, etc. If you just wanted these messages, you would narrow the search down with a second criterion, such as something else in the log message.
We can always implement this later if we feel like it!
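To make the trade-off concrete, here is a sketch of both shapes of the same log line. The project's logger is logrus, but the snippet below uses the standard library's log/slog so it stays dependency-free; the field names and counts are illustrative:

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// sweepLine renders a field-style record for a hypothetical cluster
// installation sweep, stripping the timestamp so the output is deterministic.
func sweepLine(deleting, failed int) string {
	var buf bytes.Buffer
	handler := slog.NewTextHandler(&buf, &slog.HandlerOptions{
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if a.Key == slog.TimeKey {
				return slog.Attr{} // an empty key discards the attribute
			}
			return a
		},
	})
	slog.New(handler).Info("cluster installation sweep", "deleting", deleting, "failed", failed)
	return buf.String()
}

func main() {
	// sprintf-style: counts are baked into the message and must be parsed back out.
	fmt.Printf("cluster installations: %d deleting, %d failed\n", 2, 1)

	// field-style: counts are separate key/value pairs a log pipeline can
	// filter on without parsing the message text.
	fmt.Print(sweepLine(2, 1))
}
```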
Co-Authored-By: Gabe Jackson <gabe@coffeepowered.co>
Thanks for the careful reviews and feedback, @jwilander & @gabrieljackson! A few comments above, plus some improvements based on the requested feedback :)
LGTM 👍
This changelist introduces the CLI API, REST API, store updates and supervisor additions to manage installations and cluster installations. Created installations attempt to schedule a cluster installation on the first available cluster. No attempt is currently made to wait for the cluster installation to actually stabilize, but I anticipate integrating that with pending changes from @gabrieljackson.

While the API remains async as previously changed, I've ripped out the in-process parallelism in an attempt to simplify the data model surrounding locking. This will need to be revisited when we scale the provisioning server horizontally, but for now I haven't been able to come up with a straightforward strategy that avoids lock contention. Very open to suggestions!
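With a single supervisor process the locking can stay simple; the sticking point for horizontal scaling is that the claim itself must become atomic in the store. A toy sketch of the claim semantics (the type and names are hypothetical; a store-backed version would use a conditional UPDATE on the installation row rather than an in-memory map):

```go
package main

import (
	"fmt"
	"sync"
)

// lockStore mimics the row-level advisory lock a store-backed lock would
// provide: a claim succeeds only if no other owner currently holds the row.
type lockStore struct {
	mu     sync.Mutex
	owners map[string]string // installation ID -> lock owner
}

// TryLock atomically claims the installation for owner, failing if it is held.
func (s *lockStore) TryLock(installationID, owner string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, held := s.owners[installationID]; held {
		return false
	}
	s.owners[installationID] = owner
	return true
}

// Unlock releases the claim, but only for the owner that holds it.
func (s *lockStore) Unlock(installationID, owner string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.owners[installationID] == owner {
		delete(s.owners, installationID)
	}
}

func main() {
	store := &lockStore{owners: map[string]string{}}
	fmt.Println(store.TryLock("inst-1", "supervisor-a")) // true: first claim wins
	fmt.Println(store.TryLock("inst-1", "supervisor-b")) // false: already held
	store.Unlock("inst-1", "supervisor-a")
	fmt.Println(store.TryLock("inst-1", "supervisor-b")) // true: released, reclaimable
}
```

Contention then reduces to losers of the claim skipping the installation and retrying on the next supervisor pass, rather than blocking.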