Proposal: every celery task should execute mwoffliner directly, instead of running a shell script #6

Closed
automactic opened this issue May 11, 2017 · 2 comments


automactic commented May 11, 2017

Our current worker implementation runs a shell script (examples), which seems to execute mwoffliner multiple times and generate multiple zim files. As a result, a single celery task can generate multiple zim files.

I think it would be better to run mwoffliner once and generate one zim file per celery task. Reasons:

  1. security: if we allow users to enqueue celery tasks that execute arbitrary commands, we expose workers to shell injection, accidental file deletion, etc. It's better to have the dispatcher set the parameters passed to mwoffliner and have the worker assemble the command programmatically.
  2. distributed system performance: by breaking big tasks into smaller units, more workers can potentially participate at the same time, thereby speeding up the overall process.
  3. management:
     • the stdout & stderr of a task contain messages about the generation of a single zim file only
     • every celery task needs to upload exactly one zim file, so it is easier to figure out the upload progress and ETA
  4. error recovery: in case of error, if every celery task produces multiple zim files, it's
     • hard to figure out which generated zim files have errors and which don't
     • impossible for another worker to pick up the process without unnecessarily regenerating zim files that have no errors
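
To illustrate point 1, here is a rough sketch of how a worker could assemble the mwoffliner command from dispatcher-supplied parameters instead of running an arbitrary script. This is not the project's actual implementation: the allow-list `ALLOWED_OPTIONS`, the helper names `build_command` and `run_one_zim`, and the specific option names are assumptions made for illustration, and the Celery task decorator is left out so the snippet stays self-contained.

```python
import subprocess

# Hypothetical allow-list of option names the dispatcher may set.
# The names are illustrative assumptions, not a confirmed mwoffliner option set.
ALLOWED_OPTIONS = {"mwUrl", "adminEmail", "outputDirectory"}


def build_command(params, executable="mwoffliner"):
    """Turn a dict of dispatcher-supplied parameters into an argv list."""
    argv = [executable]
    for name in sorted(params):
        if name not in ALLOWED_OPTIONS:
            # Reject anything the worker does not explicitly know about.
            raise ValueError(f"option not allowed: {name}")
        argv.append(f"--{name}={params[name]}")
    return argv


def run_one_zim(params):
    """Body of a single task: one mwoffliner run, one zim file."""
    argv = build_command(params)
    # A list argv with the default shell=False is never parsed by a shell,
    # so characters such as ';' or '&&' in a value stay inside that argument.
    return subprocess.run(argv, capture_output=True, text=True, check=False)
```

Because each task maps to exactly one invocation, the captured stdout and stderr belong to a single zim file, which also covers points 3 and 4: a failed task identifies exactly which file needs to be regenerated.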
@kelson42

@automactic I agree with this.

@automactic

Further discussion continues in #8.
