Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

locksmithd will not reboot outside of reboot windows, if no semaphore was acquired before #10

Merged
merged 2 commits into from
Aug 3, 2021

Conversation

aniruddha2000
Copy link
Contributor

Locksmithd will reboot only in the reboot window

Get the period and check if we are inside the reboot window or not. If not inside the reboot window then sleep until next period and if inside the window then reboot.

How to use

Start flatcar instance with the provided ignition and execute the commands mention here.

Get the period and check if we are inside the reboot window
or not. If not inside the reboot window then sleep until next
period and if inside the window then reboot.
Copy link
Contributor

@tormath1 tormath1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good ! By reading the code / trying on a Flatcar instance, I think we could modify a bit the logic: we currently try to get a semaphore, if we fail we sleep time.Sleep(interval) and if we succeed we sleep until we land into the window reboot period - but we actually hold the semaphore during this time which could prevent other instances to reboot.

We could just modify a bit the logic here to:

  1. Try to get a semaphore
  2. If it fails but we are in the window reboot period: time.Sleep(interval) (current behavior)
  3. If if fails but we are not in the window reboot period: getPeriodicAndSleep()
  4. If it succeed: reboot

We would just need to create a boolean method something like isInRebootWindow(): (:warning: pseudo implementation - not tested)

	for {
		err := lck.Lock()
		if err != nil && err != lock.ErrExist {
                        if isInRebootWindow() {
			    dlog.Warningf("Failed to acquire lock: %v. Retrying in %v.", err, interval)
			    interval = expBackoff(interval)
                            // we are still inside the reboot window, we can just sleep into the interval  
			    time.Sleep(interval)
                        } else {
                            getPeriodicAndSleep()
                            // we just left the sleep period, back into the window reboot period - we just need to reset the `interval` value.
                            interval = initialInterval
                        }
			continue
		}


		r.rebootAndSleep()
		return

Let me know in this PR or on Matrix is something is unclear. You did a great work :)

EDIT: Actually the DurationToStart returns a value <= 0 if we are inside the period. So instead of creating a isInRebootWindow function, we could just call the getPeriodicAndSleep.

locksmithctl/daemon.go Outdated Show resolved Hide resolved
locksmithctl/daemon.go Outdated Show resolved Hide resolved
@aniruddha2000
Copy link
Contributor Author

aniruddha2000 commented Jul 20, 2021

Overall it looks good ! By reading the code / trying on a Flatcar instance, I think we could modify a bit the logic: we currently try to get a semaphore, if we fail we sleep time.Sleep(interval) and if we succeed we sleep until we land into the window reboot period - but we actually hold the semaphore during this time which could prevent other instances to reboot.

We could just modify a bit the logic here to:

  1. Try to get a semaphore
  2. If it fails but we are in the window reboot period: time.Sleep(interval) (current behavior)
  3. If if fails but we are not in the window reboot period: getPeriodicAndSleep()
  4. If it succeed: reboot

We would just need to create a boolean method something like isInRebootWindow(): ( pseudo implementation - not tested)

	for {
		err := lck.Lock()
		if err != nil && err != lock.ErrExist {
                        if isInRebootWindow() {
			    dlog.Warningf("Failed to acquire lock: %v. Retrying in %v.", err, interval)
			    interval = expBackoff(interval)
                            // we are still inside the reboot window, we can just sleep into the interval  
			    time.Sleep(interval)
                        } else {
                            getPeriodicAndSleep()
                            // we just left the sleep period, back into the window reboot period - we just need to reset the `interval` value.
                            interval = initialInterval
                        }
			continue
		}


		r.rebootAndSleep()
		return

Let me know in this PR or on Matrix is something is unclear. You did a great work :)

EDIT: Actually the DurationToStart returns a value <= 0 if we are inside the period. So instead of creating a isInRebootWindow function, we could just call the getPeriodicAndSleep.

So I can change the code inside getPeriodicAndSleep something like this for calling it instead of isInRebootWindow -

if sleeptime > 0 {
    dlog.Infof("Waiting for %s to reboot.", sleeptime)
    time.Sleep(sleeptime)
    return false, nil // nil for the startw & lengthw error return
}
return true, nil

Or just simple returning the value of sleeptime would be more convenient and meaningful?
But calling getPeriodicAndSleep multiple times will also print the log messages that it contains right?

locksmithctl/daemon.go Outdated Show resolved Hide resolved
locksmithctl/daemon.go Outdated Show resolved Hide resolved
locksmithctl/daemon.go Outdated Show resolved Hide resolved
@tormath1
Copy link
Contributor

tormath1 commented Jul 21, 2021

Or just simple returning the value of sleeptime would be more convenient and meaningful?

If you call the current getPeriodicAndSleep inside the reboot window, it will do nothing since sleeptime will be <= 0 so we don't need to add further logic on this, we just need to move the getPeriodicAndSleep to the right place.

But calling getPeriodicAndSleep multiple times will also print the log messages that it contains right?

That's right, it will log each time 🤔, it can be confusing. I think we could try to merge the two if period != nil and log the info only if sleeptime > 0:

-       if period != nil {
-               dlog.Infof("Reboot window start is %q and length is %q", startw, lengthw)
-               next := period.Next(time.Now())
-               dlog.Infof("Next window begins at %s and ends at %s", next.Start, next.End)
-       } else {
-               dlog.Info("No configured reboot window")
-       }
-
        if period != nil {
                now := time.Now()
                sleeptime := period.DurationToStart(now)
                if sleeptime > 0 {
-                       dlog.Infof("Waiting for %s to reboot.", sleeptime)
+                       dlog.Infof("Reboot window start is %q and length is %q", startw, lengthw)
+                       next := period.Next(now)
+                       dlog.Infof("Next window begins at %s and ends at %s", next.Start, next.End)
+                       dlog.Infof("Waiting for %s to try rebooting.", sleeptime)
                        time.Sleep(sleeptime)
                }
+       } else {
+               dlog.Info("No configured reboot window")
        }
+

@aniruddha2000
Copy link
Contributor Author

aniruddha2000 commented Jul 21, 2021

If you call the current getPeriodicAndSleep inside the reboot window, it will do nothing since sleeptime will be <= 0 so we don't need to add further logic on this, we just need to move the getPeriodicAndSleep to the right place.

right :)

We could just modify a bit the logic here to:

  1. Try to get a semaphore
  2. If it fails but we are in the window reboot period: time.Sleep(interval) (current behavior)
  3. If if fails but we are not in the window reboot period: getPeriodicAndSleep()
  4. If it succeed: reboot

We would just need to create a boolean method something like isInRebootWindow(): ( pseudo implementation - not tested)

Now for point 3 I am still unclear about below this. Because our code is still performing like if it succeed it will call the getPeriodicAndSleep.

EDIT: Actually the DurationToStart returns a value <= 0 if we are inside the period. So instead of creating a isInRebootWindow function, we could just call the getPeriodicAndSleep.

For that creating a isInRebootWindow function will be more meaningful right? :)
Hence creating a isInRebootWindow i guess will also duplicate some code from the getPeriodicAndSleep, like getting the periodic ad check if we are in the reboot window or not by calling the DurationToStart.

locksmithctl/daemon.go Outdated Show resolved Hide resolved
locksmithctl/daemon.go Outdated Show resolved Hide resolved
locksmithctl/daemon.go Outdated Show resolved Hide resolved
locksmithctl/daemon.go Outdated Show resolved Hide resolved
locksmithctl/daemon.go Outdated Show resolved Hide resolved
@tormath1
Copy link
Contributor

tormath1 commented Aug 2, 2021

Awesome ! Thanks @aniruddha2000 ; everything looks correct now. Before a final review, would you mind rearrange a bit the commits ?
Logic has changed a bit between the first and the last commit - so we can squash a bit the commits. In example, everything related to the waitForRebootWindow could be in one commit with a body describing why we created such a function.

waitForRebootWindow get the environment variable REBOOT_WINDOW_START and
REBOOT_WINDOW_LENGTH and check if everything is ok or not. If not then
return error. If ok then get the next periodic by the ParsePeriodic
function.

Then check if we are inside the reboot window or not. If we are inside
the reboot window then it will sleep until the next window and return a
true. If outside the window then it just return the false.

In the lockAndReboot function we check if it has sleep or not and handle the
error.
@aniruddha2000
Copy link
Contributor Author

Hi @tormath1 I squashed last 12 commits because it all related to the waitForRebootWindow and kept this commit apart because we didn't implemented any function that time.

Copy link
Contributor

@tormath1 tormath1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks we are good ! Feel free to merge when you want - thanks again for your great work !

@tormath1 tormath1 merged commit c96ae6e into flatcar:flatcar-master Aug 3, 2021
@tormath1
Copy link
Contributor

tormath1 commented Aug 3, 2021

(I merged because I don't think you have enough permissions to do it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants