
worker approach: generic or specialized #8

Closed

automactic opened this issue May 22, 2017 · 8 comments

@automactic (Member) commented May 22, 2017

Regarding how workers should be designed, there are two approaches: generic and specialized. We need to decide which one to move forward with.

Note: "task" in this issue refers to things like running mwoffliner and maintenance work.

Generic

The dispatcher sends the name of a script to the worker, and the worker downloads and executes it. Alternatively, the dispatcher sends the content of the script directly to the worker. In both cases, the worker trusts that the script it receives is legitimate and executes it.
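
For illustration, a minimal sketch of what such a generic worker might do (the dispatcher address and the `/next-script` endpoint are invented for this example):

```python
import subprocess
import tempfile

import requests

DISPATCHER_URL = "https://dispatcher.example.org"  # hypothetical address

def run_next_script() -> None:
    # fetch the next script; the worker has no way to verify that the
    # content was not modified in transit (the con noted below)
    response = requests.get(f"{DISPATCHER_URL}/next-script", timeout=30)
    response.raise_for_status()

    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(response.text)
        script_path = f.name

    # execute whatever was received, trusting that it is legitimate
    subprocess.run(["bash", script_path], check=True)
```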

Pros:

  • convenience: no need to configure a new worker for a new task type

Cons:

  • security risk: the script / command could be modified during transmission
  • security risk: the host's docker socket is exposed to the worker

Specialized

Every worker is specialized: one type of worker can only handle ZIM file generation, another type can only handle maintenance tasks for the dispatcher. Only parameters / settings are transferred from the dispatcher to the worker; the worker then does the necessary work. In the case of mwoffliner, that means making sure a redis server is running, running the mwoffliner command with the given parameters, and uploading the resulting file.
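
A rough sketch of what such a specialized mwoffliner worker could look like using the docker Python SDK (the image name, container name, and paths are assumptions, and the upload step is omitted):

```python
import docker

client = docker.from_env()

def generate_zim(offliner_args: list) -> None:
    """Run one mwoffliner task given only its parameters."""
    # make sure a redis container is available for mwoffliner
    try:
        client.containers.get("zimfarm-redis")
    except docker.errors.NotFound:
        client.containers.run("redis", name="zimfarm-redis", detach=True)

    # run mwoffliner with the supplied parameters only; the worker never
    # executes an arbitrary script received from the dispatcher
    client.containers.run(
        "openzim/mwoffliner",          # image name is an assumption
        ["mwoffliner"] + offliner_args,
        links={"zimfarm-redis": "redis"},
        volumes={"/srv/zim-output": {"bind": "/output", "mode": "rw"}},
        remove=True,
    )

    # uploading the resulting file would follow here (omitted)
```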

Pros:

  • much less security risk:
    • truly containerized, no socket sharing
    • the worker can only run permitted commands
    • commands run in exec mode, not in a shell (see the exec-vs-shell sketch below)
  • more detailed progress reporting: upload progress, etc.

Cons:

  • inconvenience: a new worker type has to be created for each new task type
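
To make the exec-vs-shell point above concrete, a small Python illustration using `echo` as a stand-in command:

```python
import subprocess

user_value = "foo; whoami"  # a hostile-looking parameter value

# shell mode: the whole string is interpreted by a shell, so the text
# after ';' runs as a second command
subprocess.run(f"echo {user_value}", shell=True)

# exec mode: the value is passed as one literal argument and is never
# interpreted by a shell, so nothing extra runs
subprocess.run(["echo", user_value])
```
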
@automactic (Member, Author) commented Jun 14, 2017

@kelson42's proposal:

  • the worker serves as a controller and has access to the host's docker socket
  • the worker launches other containers to do specific tasks when needed
  • a task is a list of commands (bash scripts) executable on a specific docker image/container (a possible shape is sketched below)
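
Under that proposal, a task definition might look something like this (the field names and values are made up for illustration):

```python
# hypothetical shape of a task: a docker image plus an ordered list of
# commands to run inside a container of that image
task = {
    "image": "openzim/mwoffliner",  # image name is an assumption
    "commands": [
        "mwoffliner --mwUrl=https://en.wikipedia.org/ --adminEmail=admin@example.org",
        "ls /output",
    ],
}
```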

@kelson42 (Contributor) commented Jun 14, 2017

Here are comments on the cons of the generic approach:

  • security risk: the script / command could be modified during transmission: I do not know if this comes through the web API or from the queue, but each of them should have (1) an authentication system and (2) an encrypted channel. If that works, the message cannot be changed (a message-signing sketch follows this list).

    • this is theoretically true, but you never know how it could go wrong before it happens. If there is a more secure way, we should always consider the more secure way.
    • also, I would imagine you need sudo to run docker commands, which opens up more security loopholes (I am by no means a security expert, but this article specifically says in bold font that "only trusted users should be allowed to control your Docker daemon")
  • security risk: docker socket of host is exposed to worker: yes, the docker users / zimfarm worker master need to agree on this, but this is not unusual at all and is a standard docker use case.

    • I am sorry, but I would not trust anyone to access my docker socket and run docker commands, particularly if the command is sent over the internet
  • task unit not broken down:

    • if one zim file fails to be generated within the whole script, the whole script has to start over
    • hard to tell whether a specific zim file was generated successfully or not
    • hard to tell the specific status of a specific zim file (i.e., you never know whether a zim file is currently pending, generating, or uploading)
    • makes stdout and stderr hard to read, since they become one big blob containing the stdout and stderr of all commands
    • makes the whole point of parallel processing moot (if all we do is run several big scripts a few times per month)
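
As a sketch of the message-signing idea from the first point above, assuming a shared secret provisioned out of band:

```python
import hashlib
import hmac
import json

SHARED_SECRET = b"provisioned-out-of-band"  # placeholder value

def sign(task: dict) -> str:
    # serialize deterministically so sender and receiver hash the same bytes
    payload = json.dumps(task, sort_keys=True).encode()
    return hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()

def verify(task: dict, signature: str) -> bool:
    # constant-time comparison, so the check itself leaks no timing info
    return hmac.compare_digest(sign(task), signature)
```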

@kelson42 (Contributor) commented Jun 16, 2017

Docker should not run as root. We can discuss at length whether it is standard or not to have the docker socket writable by a specific container; the only thing I can say is that Portainer, which is a minimal UI on top of Docker, has a dedicated option for this, so it cannot be that unusual. All your usability points are not relevant to me (1 is not a problem; 2 I already answered, see exit code; 3 the log is enough; 4 the problem is not specific to that solution; 5 I do not want to execute big scripts).

If you disagree with that technical solution, you can choose another approach, but I definitely need an easy solution for executing X various scripts to make ZIM files. The requirements are:

  • Run all currently existing scripts
  • Be able to create new ones easily (<1 hour)
  • New scripts should be executable immediately afterward (assuming nodes/workers are free)

Otherwise I will have to continue running the scripts manually, as that will be the only practicable way to create new ZIM files.

@kelson42 (Contributor) commented
BTW, part of our disagreement might be solved by creating/starting Docker containers inside a Docker container. That would avoid RW access to the top-level Docker daemon. As far as I know, this kind of thing also works well.

@automactic (Member, Author) commented Jun 16, 2017

  • Docker needs to run as root (source)

Running containers (and applications) with Docker implies running the Docker daemon. This daemon currently requires root privileges, and you should therefore be aware of some important details.

  • What is Portainer, "a minimal UI on top of Docker"? Sorry, I don't understand what you mean.
  • Usability should concern you the most, since you will be the one mostly interacting with the system. I am trying to create a system where you can create tasks once and forget about them, not one where you manually upload several scripts every month and manually parse through a blob of outputs.

How about this? I will implement both approaches, and you are free to try either of them. To be specific:

  • the worker will be built on the official docker image (BTW, do you think there will be any problem if the host docker version and the worker docker version are not the same?)
  • the worker will have two types of tasks:
    • allow the user to execute any script (a question here: on the website, do you want to upload a script, paste the content of a script into a text view, or both?)
    • allow the user to run an mwoffliner container with parameters, then fetch and upload the generated file

@kelson42 (Contributor) commented
After discussion we have decided that:

  • scripts would be fully data-driven
  • scripts would be executed in dedicated docker containers managed by the worker.

@automactic (Member, Author) commented Sep 1, 2017

Another alternative for the worker design:

a single docker compose file, containing:

  • services:
    • worker
    • redis
    • mwoffliner
    • other offliners
  • volumes:
    • zim_output

Benefits:

  • mwoffliner and other offliners do not need to keep restarting
  • easier run commands for end users, since socket mapping and volume mapping are taken care of in the compose file

Problem to solve:

  • a way for mwoffliner and other offliners to communicate with the worker
    • we could use docker exec inside the worker
    • it would be better if they could communicate over the network, since that removes the need to map the docker socket (see the sketch below)
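
As a rough sketch of that network channel (the worker hostname, port, and endpoint are invented for illustration):

```python
import requests

WORKER_URL = "http://worker:8000"  # 'worker' resolves on the compose network

def report_progress(task_id: str, stage: str, percent: float) -> None:
    # an offliner container posts its status to the worker over the
    # compose-internal network; no docker socket mapping is required
    requests.post(
        f"{WORKER_URL}/tasks/{task_id}/progress",
        json={"stage": stage, "percent": percent},
        timeout=5,
    )
```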

Downside:

  • cannot dynamically scale the number of offliners

@kelson42 (Contributor) commented
I think we can close this discussion now. I'm happy with the way it has been done.
