
Self update worker container on deployed remote devices #41

Closed
johnmay1 opened this issue Feb 10, 2019 · 11 comments

Comments

@johnmay1

Expected/Wanted Behavior

Allow the worker container to update itself to a newer version when deployed on remote devices

Actual Behavior

There is currently no way to update the worker container on remote devices

@johnmay1 johnmay1 changed the title Update worker container deployed remotely Update worker container deployed remote devices Feb 10, 2019
@naorlivne
Member

Hi @johnmay1, I'm not sure what you mean by this, so just to make sure: is this a feature request for some sort of auto-update of the worker containers to the latest version, an issue you're having with updating the containers managed by the workers, or something else entirely?

@johnmay1
Author

Hi @naorlivne, yes, this is a feature request to have some sort of auto-update of the worker containers to the latest version.

@naorlivne
Member

naorlivne commented Feb 11, 2019

This should be possible for patch & minor versions that don't introduce breaking changes to the way workers communicate with the manager.

I'm thinking of something like a cron script that checks whether a container tagged with a higher version than the currently running worker exists and, if so, kills the worker container and starts a new one with that higher version in its place. This will likely need to be another component that runs on the worker host rather than something that runs inside the worker (because part of the update process involves killing the worker container), but I don't see any reason why this optional cron script couldn't be a Nebula-managed app in its own right, so it could be updated like a normal Nebula app & in turn update the Nebula worker container. The end result is a system which updates itself fully.

This script will need some way to ensure it doesn't update to anything which introduces potentially breaking changes to the system without the user's express permission, so I'm thinking it should have a config\envvar option named something like "ALLOW_MAJOR_VERSION_UPDATES" that defaults to False; only if set to True will it update major versions as well.
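The "ALLOW_MAJOR_VERSION_UPDATES" gate could look roughly like this (a minimal sketch assuming plain "major.minor.patch" version strings; the function name is hypothetical, not Nebula code):

```python
import os

def should_update(running: str, candidate: str) -> bool:
    """Decide whether the updater may move from `running` to `candidate`.

    Versions are assumed to be plain "major.minor.patch" strings.
    Major-version jumps are only allowed when ALLOW_MAJOR_VERSION_UPDATES
    is explicitly set to "true" (off by default, as suggested above).
    """
    allow_major = os.environ.get(
        "ALLOW_MAJOR_VERSION_UPDATES", "false"
    ).lower() == "true"
    cur = tuple(int(p) for p in running.split("."))
    new = tuple(int(p) for p in candidate.split("."))
    if new <= cur:
        return False  # nothing newer to update to
    if new[0] > cur[0] and not allow_major:
        return False  # breaking change needs express permission
    return True
```

A cron-driven updater would call this with the running worker's version and the highest tag found in the registry, and only kill/replace the worker when it returns True.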

Needless to say, this update service will be an entirely optional add-on rather than a core part of Nebula, as some people value stability over updates & don't mind being a few versions behind.

Those are my initial thoughts on the subject; if anyone has another idea, feel free to speak up.

This also gave me the idea for #42, which might be used to simplify this service's cron usage.

@naorlivne
Member

naorlivne commented Feb 11, 2019

Another idea is to have the worker container detect that a new version is out (this option will be off by default and will need to be turned on via an "AUTO_UPDATE" parameter in the worker configuration), using the same "is a newer version out" logic as in my comment above, and if that's true, follow this update flow:

  • Download the image of the newer version
  • Spawn 2 new containers
    • The first container sleeps X seconds, kills the original worker container & starts a new worker with the same configuration options as the original (this container is spawned with all the configuration values passed to it from the original), then exits
    • The 2nd container sleeps Y seconds (where Y > X with a long enough safety span), kills the new worker and restarts the original worker should it still exist when Y passes, then exits (providing failback in case of issues with the upgrade)
  • The new worker container, after starting up successfully, removes the 2nd container.
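The two-helper flow above could be sketched by planning the command each helper container would run (a rough illustration only; the container names "worker"/"worker-old" and the exact docker invocations are assumptions, not an agreed design):

```python
def plan_update(image: str, x: int, y: int):
    """Return the shell command each helper container would run.

    `image` is the newer worker image; `x`/`y` are the sleep spans from
    the flow above, where Y must leave a safety margin after X.
    """
    assert y > x, "failback helper must fire only after the updater"
    # Helper 1: wait, retire the original worker, start the new version.
    updater = (
        f"sleep {x} && docker rename worker worker-old && "
        f"docker stop worker-old && docker run -d --name worker {image}"
    )
    # Helper 2: if it is still alive when Y passes (i.e. the new worker
    # never came up to remove it), roll back to the original worker.
    failback = (
        f"sleep {y} && docker rm -f worker && "
        "docker rename worker-old worker && docker start worker"
    )
    return updater, failback
```

A healthy new worker would remove helper 2 before its sleep elapses, so the failback commands only ever run when the upgrade failed.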

@johnmay1
Author

> Another idea is to have the worker container detect a new version is out […] The new worker container after starting up successfully removes the 2nd container.

I like the above suggestion. I'd also like to add: report the currently running version of the worker to the manager and make it available through the API.

@naorlivne
Member

> I like the above suggestion. Would also like to add, report the currently running version of worker to Manager and make it available through API.

Currently the workers only query the manager to get their state. The manager has a memoized cache which speeds things up considerably (and as a direct result allows a single manager to take care of considerably more workers than it could without the cache, thus lowering costs). Unfortunately, that cache also means we can't have the workers report their state to the manager; they can only pull data from it, and each worker has its own internal logic to match its state to the one it pulled.

It's possible to have something like Kafka take care of data ingestion from the workers into a central backend DB, which could then be queried from the managers and presented to the end user\admin. As much as I would like to have that option (you're not the first to ask for something similar, which tells me this is a needed feature), Nebula is unfortunately still just a pet project of mine with very few other contributors, so I don't really have the time to add an entirely new optional "workers status reporting" component to the mix. If you feel like assisting in that regard I will gladly agree with your suggestion, but otherwise I think we'll stick to just updating the workers as a first priority and worry about reporting their current version in a later ticket.

Even the auto update is something that I honestly doubt will happen in the next few months, as it will rely on #42, which is itself a major update that will take a rather large chunk of time to get production ready.

TL;DR:
Let's start with just the auto update for this ticket. Please open a separate ticket about getting data from the workers in a centralized fashion from the manager (including the current worker version) and we'll worry about it there... that part is too big to handle in the same ticket as this request.

@johnmay1
Author

Yes, agree with you @naorlivne. We can do the reporting stuff later.

@naorlivne
Member

naorlivne commented Feb 12, 2019

https://github.com/v2tec/watchtower or https://github.com/pyouroboros/ouroboros seems like it would help simplify this task. Currently leaning more towards ouroboros, as watchtower seems to no longer be maintained... the original thought is to have a container of it start/restart if the "AUTO_UPDATE" param is set to True, configured to only update the worker, at the end of the worker boot process.

@naorlivne
Member

naorlivne commented Mar 4, 2019

Going to go with https://github.com/pyouroboros/ouroboros (guide at https://github.com/pyouroboros/ouroboros/wiki/Usage#core); will likely need to set the following flags on it:

It seems to me that providing a template\script\guide to have that managed as a cron with the assistance of #42 should be sufficient, rather than having it built in a custom way inside the worker.

@naorlivne naorlivne changed the title Update worker container deployed remote devices Self update worker container on deployed remote devices Mar 4, 2019
@naorlivne naorlivne removed the design label Apr 18, 2019
@naorlivne
Member

naorlivne commented Apr 21, 2019

I'll create a full guide as part of the documentation when I have more time, but for now you can simply have a cron_job (released in 2.5.0) with the following config:

{
  "env_vars": {"RUN_ONCE": "true", "MONITOR": "worker", "CLEANUP": "true"},
  "docker_image": "pyouroboros/ouroboros",
  "running": true,
  "volumes": ["/var/run/docker.sock:/var/run/docker.sock"],
  "networks": ["nebula", "bridge"],
  "devices": [],
  "privileged": false,
  "schedule": "0 * * * *"
}

Then each device_group which has this cron_job defined as part of it will have the worker auto-updated to the latest version, with the following 3 caveats:

  1. The worker container is named "worker"; if not, you will need to change the "MONITOR" envvar to match your worker container name.
  2. You're using the "latest" or "arm64v8" tags of the workers.
  3. The version is checked every hour with the schedule above; this may be too often or not often enough depending on your use case, so you might want to change the cron schedule to match your needs.
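As an example of caveat 1, if your worker container were named "nebula-worker" (a purely hypothetical name), only the "MONITOR" value in the cron_job's env_vars would need to change:

```json
{
  "env_vars": {"RUN_ONCE": "true", "MONITOR": "nebula-worker", "CLEANUP": "true"}
}
```

The rest of the config (image, volumes, schedule) stays as shown above.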

