Skip to content
This repository has been archived by the owner on Nov 9, 2022. It is now read-only.

Timestamp-verified restart #703

Closed
kford-oconnors opened this issue Nov 18, 2016 · 9 comments
Closed

Timestamp-verified restart #703

kford-oconnors opened this issue Nov 18, 2016 · 9 comments

Comments

@kford-oconnors
Copy link
Contributor

A restart function that verifies the restart was successful. Convenient for build scripts. Based on this example.

  def verified_restart
    old_timestamp = go(%Q{http://#{@properties["ml.server"]}:8001/admin/v1/timestamp}, "get").body
    restart
    retry_count = 0
    retry_max = 5
    retry_interval= 10
    new_timestamp = old_timestamp
    while retry_count < retry_max do
      begin
        new_timestamp = go(%Q{http://#{@properties["ml.server"]}:8001/admin/v1/timestamp}, "get").body
      rescue
        logger.debug 'Retry attempt ' + retry_count.to_s + ' failed'
      end
      if new_timestamp != old_timestamp
        logger.info 'Verfied restart.'
        break
      end
      logger.debug "Verifying restart..."
      sleep retry_interval
      retry_count += 1
    end
    if new_timestamp == old_timestamp
      logger.warn "Could not verify restart"
    end
  end
end
@RobertSzkutak
Copy link
Contributor

I am tempted to rename your function to restart and rename our restart to something else.

What do you think, @dmcassel and @grtjn ?

@dmcassel
Copy link
Collaborator

dmcassel commented Dec 5, 2016

I like the idea of this functionality being the default restart behavior.

@RobertSzkutak RobertSzkutak added this to the April 2017 milestone Jan 31, 2017
@divino divino mentioned this issue Apr 25, 2017
dmcassel added a commit that referenced this issue Apr 25, 2017
@dmcassel
Copy link
Collaborator

This is partially implemented in dev. The verified restart will work for the target host, but doesn't yet check that a cluster has fully restarted.

@RobertSzkutak RobertSzkutak modified the milestones: July 2017, April 2017 May 5, 2017
@grtjn
Copy link
Contributor

grtjn commented May 28, 2017

We might need to check what restart actually does. It takes some params, so it might look for timestamps of bootstrap-host, but not actually restarting bootstrap-host.

Also wondering if it would make sense to check all hosts participating in a cluster for having completed restart before continuing. Would be more straight-forward if the (original) restart function would return a list of hosts that will restart, and then the verify routine would simply check them all (one after the other perhaps)..

@divino
Copy link
Contributor

divino commented May 28, 2017 via email

@grtjn
Copy link
Contributor

grtjn commented May 29, 2017

Roxy supports restarting a specific group (e.g. subset of all hosts) or entire cluster (e.g. all hosts in cluster). You'd need to distinguish which were involved in restart first. But I'd be interested in seeing what you did. Can you send that offline?

@divino
Copy link
Contributor

divino commented May 29, 2017 via email

@grtjn
Copy link
Contributor

grtjn commented May 29, 2017

Thanks for the code. I have been thinking about this, and thought it might be more efficient and future-proof to use the manage rest api for the restart too. Those return timestamps of all involved hosts, skipping the need to look them up first.

grtjn added a commit to grtjn/roxy that referenced this issue May 29, 2017
@grtjn grtjn modified the milestones: May 2017, July 2017 May 29, 2017
@grtjn grtjn self-assigned this May 29, 2017
grtjn added a commit to grtjn/roxy that referenced this issue Jun 1, 2017
grtjn added a commit to grtjn/roxy that referenced this issue Jun 1, 2017
grtjn added a commit to grtjn/roxy that referenced this issue Jun 7, 2017
grtjn added a commit to grtjn/roxy that referenced this issue Jun 7, 2017
grtjn added a commit to grtjn/roxy that referenced this issue Jun 7, 2017
grtjn added a commit to grtjn/roxy that referenced this issue Jun 7, 2017
grtjn added a commit to grtjn/roxy that referenced this issue Jun 7, 2017
grtjn added a commit to grtjn/roxy that referenced this issue Jun 7, 2017
RobertSzkutak added a commit that referenced this issue Jun 7, 2017
Fixed #703: restart using rest api, iterate hosts to verify
@grtjn
Copy link
Contributor

grtjn commented Jun 8, 2017

Fixed in dev

@grtjn grtjn closed this as completed Jun 8, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants