-
Notifications
You must be signed in to change notification settings - Fork 22
AppServers Cleanup v2 - Only keep active, fallback and RC servers #170
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoviaque I had a look at the code and left some comments. Nothing major, the approach looks good in general. Did not get around to testing this yet; do you think it would make sense to deploy the changes to stage? That might speed things up, especially w/r/t being able to provision multiple app servers at once. (I don't have a working instance manager configuration on my local machine at the moment; the last time I had to spin up instances locally was before Sven finished the load balancer implementation.)
""" | ||
Check if delta between `date` and `reference_date` is greater than or equal to `expected_days_since`. | ||
Check if `later_date` is at least `expected_days_passed` after `earlier_date`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoviaque Typo: expected_days_passed
should be expected_days_since
. That change might make the sentence a little less readable, so perhaps change the entire thing to read:
Check if at least `expected_days_since` have passed between `earlier_date` and `later_date`.
@@ -162,6 +162,7 @@ class AppServer(ValidateModelMixin, TimeStampedModel): | |||
server = models.OneToOneField(OpenStackServer, on_delete=models.CASCADE, related_name='+') | |||
# The Instance that owns this. InstanceReference has related_name accessors like 'openedxappserver_set' | |||
owner = models.ForeignKey(InstanceReference, on_delete=models.CASCADE, related_name='%(class)s_set') | |||
last_activated = models.DateTimeField(null=True, blank=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoviaque Nit: Maybe add a comment above this line that mentions the purpose of this field?
|
||
# Keep a running appserver as fallback for `days` after activation, to allow reverts | ||
if active_appserver and appserver.created < active_appserver.last_activated: | ||
if not sufficient_time_passed(active_appserver.last_activated, timezone.now(), days) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoviaque Is there a specific reason for calling timezone.now()
in three different places in this method? If not, maybe it would be worth calculating a single reference_date
at the beginning (before starting to loop over app servers), and using that to check if sufficient_time_passed
. If nothing else, using one specific date per run of the terminate_obsolete_appservers
method could be helpful for debugging purposes.
correctly identifies and terminates app servers created more than `days` before now, except: | ||
- the active appserver | ||
- a release candidate server (the most recent running appserver) | ||
- a fallback appserver for `days` after activation (the most recent running appserver created before) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoviaque "... created before the currently-active app server was activated)", perhaps?
]) | ||
|
||
def test_terminate_obsolete_appservers_no_active(self): | ||
# Terminate app servers after `days` have passed - the fallback appserver should be terminated | ||
# and the other appservers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoviaque "and the other appservers": Might be worth being a bit more specific here?
(rc_appserver_failed, AppServerStatus.ConfigurationFailed, ServerStatus.Pending), | ||
]) | ||
|
||
# Terminated app servers again, much later - this time only the active appserver and the most recent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoviaque Typo: "Terminated" should be "Terminate"
|
||
pip install python-novaclient | ||
pip install python-openstackclient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: ) yw!
@antoviaque A quick question regarding the test instructions, just to clarify:
"After two days" -- are you referring to the (fake) date at which I'd be running the |
@itsjeyd Thanks for the review! I'll go through your comments to address them, but I wanted to answer your questions quickly:
Yup, we could definitely deploy on stage - I'll try to do this after addressing your current comments if I have time. Or feel free to do it directly if you want to test this before I get around to do it.
Yes, exactly. |
5e2527e
to
d704071
Compare
f7825b0
to
79c3399
Compare
The previous command failed for me with: ```error: unrecognized arguments: --pub_key``` and the documentation now recommends the `openstack` wrapper, which is useful to have in the venv.
Ensure `AppServer.last_activated` is set for existing AppServers at the time of the migration.
79c3399
to
c924013
Compare
|
||
# Keep a running appserver as fallback for `days` after activation, to allow reverts | ||
if active_appserver and appserver.created < active_appserver.last_activated: | ||
if not sufficient_time_passed(active_appserver.last_activated, now, days) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoviaque If there is a negative delta between the first and the second date passed to sufficient_time_passed
(i.e., if the first date is more recent than the second one), the function will return False
. If that situation ever came up (highly unlikely), not sufficient_time_passed(...)
would evaluate to True
and we'd end up keeping an additional fallback_appserver
.
Just wanted to mention this; no need to change anything. The "dangerous" code that terminates VMs below will only run if sufficient_time_passed
returns True
, so a negative delta between the first and second date would not cause VMs to get terminated.
@antoviaque Thanks for the updates, and for creating the data migration! I ended up bringing my local installation of OC IM up to date yesterday, and completed a first round of testing this morning (including verifying that the data migration correctly updates The code is looking good, and I didn't find any issues during the first round of testing. But I'll still need to verify that we're getting desired behavior when there is no active app server. I'll use stage to do that, and let you know when I'm done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I tested this both locally and on stage (instance); works as advertised in PR description.
- I read through the code
-
I checked for accessibility issuesN/A -
Includes documentationThe README does not include documentation about Huey tasks. It might be worth adding that at some point, but I'm fine with considering it out-of-scope here.
@itsjeyd Thank you for the prompt review and testing! Merging and deploying this now. |
Terminate app servers that were created more than
days
before now, except:days
after activating an appserver, to allow reverts (we keep the most recent running appserver created before the current activation)See OC-2041 . Follow-up to #143
Test Instructions
Create an instance that is not associated with a PR:
Run
make shell
.Create instance via:
Spawn four app servers for the instance, faking their age:
Activate
active_appserver
:Check the state of the appservers. You can use the GUI or do it directly in the shell via something like:
This should produce the following output:
Run the
clean_up
task:Verify that only the oldest appserver has been shut down. This should produce the following output:
Re-run the
cleanup_task
from different dates to check that the behavior correctly matches the description at the top of this PR. Especially:days
have passedTo run the
cleanup_task
from different dates, use (make sure you useHUEY_ALWAYS_EAGER=true
):At D+3, you should get:
Ensure AppServers which aren't in the 'running' state always get deprovisioned after two days.
Double-check "OVH - OpenStack - Horizon - OpenCraft IM - Dev" account to see if correct set of app servers left.
Reviewers