
worker approach: generic or specialized #8

Closed

automactic opened this issue May 22, 2017 · 8 comments

@automactic (Member) commented May 22, 2017

Regarding how workers should be designed, there are two approaches: generic and specialized. We need to decide which one to move forward with.

Note: "task" in this issue refers to things like running mwoffliner and maintenance work.

Generic

The dispatcher sends the name of a script to the worker, and the worker downloads and executes it. Alternatively, the dispatcher sends the content of the script directly to the worker. In both cases, the worker trusts that the script it receives is legitimate and executes it.
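
For illustration, a minimal sketch of what such a generic worker might do (the dispatcher address and the `/next-script` endpoint are invented for this example):

```python
import subprocess
import tempfile

import requests

DISPATCHER_URL = "https://dispatcher.example.org"  # hypothetical address

def run_next_script() -> None:
    # fetch the next script; the worker has no way to verify that the
    # content was not modified in transit (the con noted below)
    response = requests.get(f"{DISPATCHER_URL}/next-script", timeout=30)
    response.raise_for_status()

    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(response.text)
        script_path = f.name

    # execute whatever was received, trusting that it is legitimate
    subprocess.run(["bash", script_path], check=True)
```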

Pros:

  • convenience: no need to configure a new worker for a new task type

Cons:

  • security risk: the script / command could be modified during transmission
  • security risk: the host's docker socket is exposed to the worker

Specialized

Every worker is specialized: one type of worker can only handle ZIM file generation, another type can only handle maintenance tasks for the dispatcher. Only parameters / settings are transferred from the dispatcher to the worker; the worker then does the necessary work. In the case of mwoffliner, that means making sure a redis server is running, running the mwoffliner command with the given parameters, and uploading the resulting file.
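
A rough sketch of what such a specialized mwoffliner worker could look like using the docker Python SDK (the image name, container name, and paths are assumptions, and the upload step is omitted):

```python
import docker

client = docker.from_env()

def generate_zim(offliner_args: list) -> None:
    """Run one mwoffliner task given only its parameters."""
    # make sure a redis container is available for mwoffliner
    try:
        client.containers.get("zimfarm-redis")
    except docker.errors.NotFound:
        client.containers.run("redis", name="zimfarm-redis", detach=True)

    # run mwoffliner with the supplied parameters only; the worker never
    # executes an arbitrary script received from the dispatcher
    client.containers.run(
        "openzim/mwoffliner",          # image name is an assumption
        ["mwoffliner"] + offliner_args,
        links={"zimfarm-redis": "redis"},
        volumes={"/srv/zim-output": {"bind": "/output", "mode": "rw"}},
        remove=True,
    )

    # uploading the resulting file would follow here (omitted)
```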

Pros:

  • much less security risk:
    • truly containerized, no socket sharing
    • the worker can only run permitted commands
    • commands run in exec mode, not in a shell (see the exec-vs-shell sketch below)
  • more detailed progress reporting: upload progress, etc.

Cons:

  • inconvenience: a new worker type has to be created for each new task type
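
To make the exec-vs-shell point above concrete, a small Python illustration using `echo` as a stand-in command:

```python
import subprocess

user_value = "foo; whoami"  # a hostile-looking parameter value

# shell mode: the whole string is interpreted by a shell, so the text
# after ';' runs as a second command
subprocess.run(f"echo {user_value}", shell=True)

# exec mode: the value is passed as one literal argument and is never
# interpreted by a shell, so nothing extra runs
subprocess.run(["echo", user_value])
```
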
@automactic (Member, Author) commented Jun 14, 2017

@kelson42's proposal:

  • the worker serves as a controller and has access to the host's docker socket
  • the worker launches other containers to do specific tasks when needed
  • a task is a list of commands (bash scripts) executable on a specific docker image/container (a possible shape is sketched below)
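
Under that proposal, a task definition might look something like this (the field names and values are made up for illustration):

```python
# hypothetical shape of a task: a docker image plus an ordered list of
# commands to run inside a container of that image
task = {
    "image": "openzim/mwoffliner",  # image name is an assumption
    "commands": [
        "mwoffliner --mwUrl=https://en.wikipedia.org/ --adminEmail=admin@example.org",
        "ls /output",
    ],
}
```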

@kelson42 (Contributor) commented Jun 14, 2017

Here are comments on the cons of the generic approach:

  • security risk: the script / command could be modified during transmission: I do not know if this comes through the web API or from the queue, but each of them should have (1) an authentication system and (2) an encrypted channel. If that works, the message cannot be changed (a message-signing sketch follows this list).

    • this is theoretically true, but you never know how it could go wrong before it happens. If there is a more secure way, we should always consider the more secure way.
    • also, I would imagine you need sudo to run docker commands, which opens up more security loopholes (I am by no means a security expert, but this article specifically says in bold font that "only trusted users should be allowed to control your Docker daemon")
  • security risk: docker socket of host is exposed to worker: yes, the docker users / zimfarm worker master need to agree on this, but this is not unusual at all and is a standard docker use case.

    • I am sorry, but I would not trust anyone to access my docker socket and run docker commands, particularly if the command is sent over the internet
  • task unit not broken down:

    • if one zim file fails to be generated within the whole script, the whole script has to start over
    • hard to tell whether a specific zim file was generated successfully or not
    • hard to tell the specific status of a specific zim file (i.e., you never know whether a zim file is currently pending, generating, or uploading)
    • makes stdout and stderr hard to read, since they become one big blob containing the stdout and stderr of all commands
    • makes the whole point of parallel processing moot (if all we do is run several big scripts a few times per month)
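
As a sketch of the message-signing idea from the first point above, assuming a shared secret provisioned out of band:

```python
import hashlib
import hmac
import json

SHARED_SECRET = b"provisioned-out-of-band"  # placeholder value

def sign(task: dict) -> str:
    # serialize deterministically so sender and receiver hash the same bytes
    payload = json.dumps(task, sort_keys=True).encode()
    return hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()

def verify(task: dict, signature: str) -> bool:
    # constant-time comparison, so the check itself leaks no timing info
    return hmac.compare_digest(sign(task), signature)
```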

@kelson42 (Contributor) commented Jun 16, 2017

Docker should not run as root. We can discuss at length whether it is standard or not to have the docker socket writable by a specific container; the only thing I can say is that Portainer, which is a minimal UI on top of Docker, has a dedicated option for this, so it cannot be that unusual. All your usability points are not relevant to me (1 is not a problem; 2 I already answered, see exit code; 3 the log is enough; 4 the problem is not specific to that solution; 5 I do not want to execute big scripts).

If you disagree with that technical solution, you can choose another approach, but I definitely need an easy solution for executing X various scripts to make ZIM files. The requirements are:

  • Run all currently existing scripts
  • Be able to create new ones easily (<1 hour)
  • New scripts should be executable immediately afterward (assuming nodes/workers are free)

Otherwise I will have to continue running the scripts manually, as that will be the only practicable way to create new ZIM files.

@kelson42 (Contributor) commented
BTW, part of our disagreement might be solved by creating/starting Docker containers inside a Docker container. That would avoid RW access to the top-level Docker daemon. As far as I know, this kind of thing also works well.

@automactic (Member, Author) commented Jun 16, 2017

  • Docker needs to run as root (source)

Running containers (and applications) with Docker implies running the Docker daemon. This daemon currently requires root privileges, and you should therefore be aware of some important details.

  • What is Portainer, "a minimal UI on top of Docker"? Sorry, I don't understand what you mean.
  • Usability should concern you the most, since you will be the one mostly interacting with the system. I am trying to create a system where you can create tasks once and forget about them, not one where you manually upload several scripts every month and manually parse through a blob of outputs.

How about this? I will implement both approaches, and you are free to try either of them. To be specific:

  • the worker will be built on the official docker image (BTW, do you think there will be any problem if the host docker version and the worker docker version are not the same?)
  • the worker will have two types of tasks:
    • allow the user to execute any script (a question here: on the website, do you want to upload a script, paste the content of a script into a text view, or both?)
    • allow the user to run an mwoffliner container with parameters, then fetch and upload the generated file

@kelson42 (Contributor) commented
After discussion we have decided that:

  • scripts would be fully data-driven
  • scripts would be executed in dedicated docker containers managed by the worker.

@automactic (Member, Author) commented Sep 1, 2017

Another alternative for the worker design:

a single docker compose file, containing:

  • services:
    • worker
    • redis
    • mwoffliner
    • other offliners
  • volumes:
    • zim_output

Benefits:

  • mwoffliner and other offliners do not need to keep restarting
  • easier run commands for end users, since socket mapping and volume mapping are taken care of in the compose file

Problem to solve:

  • a way for mwoffliner and other offliners to communicate with the worker
    • we could use docker exec inside the worker
    • it would be better if they could communicate over the network, since that removes the need to map the docker socket (see the sketch below)
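
As a rough sketch of that network channel (the worker hostname, port, and endpoint are invented for illustration):

```python
import requests

WORKER_URL = "http://worker:8000"  # 'worker' resolves on the compose network

def report_progress(task_id: str, stage: str, percent: float) -> None:
    # an offliner container posts its status to the worker over the
    # compose-internal network; no docker socket mapping is required
    requests.post(
        f"{WORKER_URL}/tasks/{task_id}/progress",
        json={"stage": stage, "percent": percent},
        timeout=5,
    )
```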

Downside:

  • cannot dynamically scale the number of offliners

@kelson42 (Contributor) commented
I think we can close this discussion now. I'm happy with the way it has been done.
