
Self update worker container on deployed remote devices #41

Closed
johnmay1 opened this issue Feb 10, 2019 · 11 comments

Comments

@johnmay1

Expected/Wanted Behavior

Allow the worker container to update itself to a newer version when deployed on remote devices

Actual Behavior

There is currently no way to update the worker container on remote devices

@johnmay1 johnmay1 changed the title Update worker container deployed remotely Update worker container deployed remote devices Feb 10, 2019
@naorlivne
Member

Hi @johnmay1, I'm not sure what you mean by this, so just to make sure: is this a feature request for some sort of auto-update of the worker containers to the latest version, an issue you're having with updating the containers managed by the workers, or something else entirely?

@johnmay1
Author

Hi @naorlivne, yes, this is a feature request to have some sort of auto-update of the worker containers to the latest version.

@naorlivne
Member

naorlivne commented Feb 11, 2019

This should be possible for patch & minor versions that don't introduce breaking changes to the way workers communicate with the manager.

I'm thinking of something like a cron script that checks whether a container tagged with a higher version than the currently running worker exists and, if so, kills the worker container and starts a new one with that higher version in its place. This will likely need to be another component that runs on the worker host rather than something that runs inside the worker (because part of the update process involves killing the worker container), but I don't see any reason why this optional cron script couldn't be a Nebula-managed app in its own right, so it could be updated like a normal Nebula app & in turn update the Nebula worker container. The end result is a system which updates itself fully.

This script will need some way to ensure it doesn't update to anything which introduces potentially breaking changes to the system without the user's express permission, so I'm thinking it should have a config\envvar option named something like "ALLOW_MAJOR_VERSION_UPDATES" that defaults to False; only if set to True will it update major versions as well.
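The "ALLOW_MAJOR_VERSION_UPDATES" gate could look roughly like this (a minimal sketch assuming plain "major.minor.patch" version strings; the function name is hypothetical, not Nebula code):

```python
import os

def should_update(running: str, candidate: str) -> bool:
    """Decide whether the updater may move from `running` to `candidate`.

    Versions are assumed to be plain "major.minor.patch" strings.
    Major-version jumps are only allowed when ALLOW_MAJOR_VERSION_UPDATES
    is explicitly set to "true" (off by default, as suggested above).
    """
    allow_major = os.environ.get(
        "ALLOW_MAJOR_VERSION_UPDATES", "false"
    ).lower() == "true"
    cur = tuple(int(p) for p in running.split("."))
    new = tuple(int(p) for p in candidate.split("."))
    if new <= cur:
        return False  # nothing newer to update to
    if new[0] > cur[0] and not allow_major:
        return False  # breaking change needs express permission
    return True
```

A cron-driven updater would call this with the running worker's version and the highest tag found in the registry, and only kill/replace the worker when it returns True.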

Needless to say, this update service will be an entirely optional add-on rather than a core part of Nebula, as some people value stability over updates & don't mind being a few versions behind.

Those are my initial thoughts on the subject; if anyone has another idea, feel free to speak up.

This also gave me the idea for #42, which might be used to simplify this service's cron usage.

@naorlivne
Member

naorlivne commented Feb 11, 2019

Another idea is to have the worker container detect that a new version is out (this option will be off by default and will need to be turned on via an "AUTO_UPDATE" parameter in the worker configuration), using the same "is a newer version out" logic as in my comment above, and if that's true, follow this update flow:

  • Download the image of the newer version
  • Spawn 2 new containers
    • The first container sleeps X seconds, kills the original worker container & starts a new worker with the same configuration options as the original (this container is spawned with all the configuration values passed to it from the original), then exits
    • The 2nd container sleeps Y seconds (where Y > X with a long enough safety span), kills the new worker and restarts the original worker should it still exist when Y passes, then exits (providing failback in case of issues with the upgrade)
  • The new worker container, after starting up successfully, removes the 2nd container.
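The two-helper flow above could be sketched by planning the command each helper container would run (a rough illustration only; the container names "worker"/"worker-old" and the exact docker invocations are assumptions, not an agreed design):

```python
def plan_update(image: str, x: int, y: int):
    """Return the shell command each helper container would run.

    `image` is the newer worker image; `x`/`y` are the sleep spans from
    the flow above, where Y must leave a safety margin after X.
    """
    assert y > x, "failback helper must fire only after the updater"
    # Helper 1: wait, retire the original worker, start the new version.
    updater = (
        f"sleep {x} && docker rename worker worker-old && "
        f"docker stop worker-old && docker run -d --name worker {image}"
    )
    # Helper 2: if it is still alive when Y passes (i.e. the new worker
    # never came up to remove it), roll back to the original worker.
    failback = (
        f"sleep {y} && docker rm -f worker && "
        "docker rename worker-old worker && docker start worker"
    )
    return updater, failback
```

A healthy new worker would remove helper 2 before its sleep elapses, so the failback commands only ever run when the upgrade failed.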

@johnmay1
Author

> Another idea is to have the worker container detect a new version is out […] The new worker container after starting up successfully removes the 2nd container.

I like the above suggestion. I'd also like to add: report the currently running version of the worker to the manager and make it available through the API.

@naorlivne
Member

> I like the above suggestion. Would also like to add, report the currently running version of worker to Manager and make it available through API.

Currently the workers only query the manager to get their state. The manager has a memoized cache which speeds things up considerably (and as a direct result allows a single manager to take care of considerably more workers than it could without the cache, thus lowering costs). Unfortunately, that cache also means we can't have the workers report their state to the manager; they can only pull data from it, and each worker has its own internal logic to match its state to the one it pulled.

It's possible to have something like Kafka take care of data ingestion from the workers into a central backend DB, which could then be queried from the managers and presented to the end user\admin. As much as I would like to have that option (you're not the first to ask for something similar, which tells me this is a needed feature), Nebula is unfortunately still just a pet project of mine with very few other contributors, so I don't really have the time to add an entirely new optional "workers status reporting" component to the mix. If you feel like assisting in that regard I will gladly agree with your suggestion, but otherwise I think we'll stick to just updating the workers as a first priority and worry about reporting their current version in a later ticket.

Even the auto update is something that I honestly doubt will happen in the next few months, as it will rely on #42, which is itself a major update that will take a rather large chunk of time to get production ready.

TL;DR:
Let's start with just the auto update for this ticket. Please open a separate ticket about getting data from the workers in a centralized fashion from the manager (including the current worker version) and we'll worry about it there... that part is too big to handle in the same ticket as this request.

@johnmay1
Author

Yes, agree with you @naorlivne. We can do the reporting stuff later.

@naorlivne
Member

naorlivne commented Feb 12, 2019

https://github.com/v2tec/watchtower or https://github.com/pyouroboros/ouroboros seems like it would help simplify this task. Currently leaning more towards ouroboros, as watchtower seems to no longer be maintained... the original thought is to have a container of it start/restart if the "AUTO_UPDATE" param is set to True, configured to only update the worker, at the end of the worker boot process.

@naorlivne
Member

naorlivne commented Mar 4, 2019

Going to go with https://github.com/pyouroboros/ouroboros (guide at https://github.com/pyouroboros/ouroboros/wiki/Usage#core); will likely need to set the following flags on it:

It seems to me that providing a template\script\guide to have that managed as a cron with the assistance of #42 should be sufficient, rather than having it built in a custom way inside the worker.

@naorlivne naorlivne changed the title Update worker container deployed remote devices Self update worker container on deployed remote devices Mar 4, 2019
@naorlivne naorlivne removed the design label Apr 18, 2019
@naorlivne
Member

naorlivne commented Apr 21, 2019

I'll create a full guide as part of the documentation when I have more time, but for now you can simply have a cron_job (released in 2.5.0) with the following config:

{
  "env_vars": {"RUN_ONCE": "true", "MONITOR": "worker", "CLEANUP": "true"},
  "docker_image": "pyouroboros/ouroboros",
  "running": true,
  "volumes": ["/var/run/docker.sock:/var/run/docker.sock"],
  "networks": ["nebula", "bridge"],
  "devices": [],
  "privileged": false,
  "schedule": "0 * * * *"
}

Then each device_group which has this cron_job defined as part of it will have the worker auto-updated to the latest version, with the following 3 caveats:

  1. The worker container is named "worker"; if not, you will need to change the "MONITOR" envvar to match your worker container name.
  2. You're using the "latest" or "arm64v8" tags of the workers.
  3. The version is checked every hour with the schedule above; this may be too often or not often enough depending on your use case, so you might want to change the cron schedule to match your needs.
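As an example of caveat 1, if your worker container were named "nebula-worker" (a purely hypothetical name), only the "MONITOR" value in the cron_job's env_vars would need to change:

```json
{
  "env_vars": {"RUN_ONCE": "true", "MONITOR": "nebula-worker", "CLEANUP": "true"}
}
```

The rest of the config (image, volumes, schedule) stays as shown above.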

