add a command that runs a list of commands in parallel and aggregates results #32

Open
timodonnell opened this issue Jan 20, 2017 · 4 comments

@timodonnell
Member

Input should be a csv file with columns:

  • name
  • bash command to run
  • output files to save

Run it with something like:

./kubeface-run-commands commands.csv --out-dir /path/to/results

Results should be written to the directory given by --out-dir, with one subdirectory per command (named after its name column) holding that command's output files.
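To make the proposal concrete, here is a minimal local sketch of that behavior in Python. It is not kubeface code and does not dispatch anything to a cluster: it only illustrates the CSV-in, directories-out contract described above. The function names `run_one` and `run_commands`, the `max_workers` parameter, the column headers `name`, `command`, and `output_files`, and the `;` separator for multiple output files are all assumptions made for illustration.

```python
import csv
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_one(row, out_dir):
    """Run one CSV row in a scratch directory and collect its output files."""
    dest = Path(out_dir) / row["name"]
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as work_dir:
        # shell=True runs the command line through the default shell
        # (/bin/sh on POSIX), so redirections like `> result.txt` work.
        proc = subprocess.run(row["command"], shell=True, cwd=work_dir)
        # Assumption: multiple output files are separated by ';'.
        for rel in filter(None, row["output_files"].split(";")):
            src = Path(work_dir) / rel
            if src.exists():
                shutil.copy(src, dest / src.name)
    return row["name"], proc.returncode


def run_commands(csv_path, out_dir, max_workers=4):
    """Run every row of the CSV in parallel; return {name: exit code}."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda row: run_one(row, out_dir), rows))
```

Threads are enough here because each worker just waits on a subprocess. A real implementation would presumably replace `subprocess.run` with whatever remote execution mechanism the project uses.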

Consider

  • Is there an existing standard interface for this kind of thing that should be supported? Perhaps GNU parallel?
  • Other output formats that may be more convenient for common cases, like a flat directory of files if there is only one output file to collect per command
@rohanpai rohanpai self-assigned this Jun 9, 2017
@rohanpai
Contributor

rohanpai commented Jun 9, 2017

What is an example of the commands.csv @timodonnell ?

@timodonnell
Member Author

Thinking of something like this:

name,command,input_files,output_files
run1,wc -l $1 > result.txt,/path/to/some/text/file.txt,result.txt
run2,wc -l $1 > result.txt,/path/to/another/text/file.txt,result.txt

Can discuss in person

@timodonnell
Member Author

The result from running the above would be a directory with run1/result.txt and run2/result.txt
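As a rough illustration of how the four-column example could be interpreted, the Python sketch below parses it and shows, for each row, an argv that binds the input file to `$1` and the path where the output file would be collected. The helper names `build_argv` and `expected_outputs`, the `results` directory name, and the `;` separator for multiple files are hypothetical. Treating `input_files` as the positional parameters of the command is an assumption based on the `$1` in the example.

```python
import csv
import io
import subprocess
from pathlib import Path

EXAMPLE_CSV = """\
name,command,input_files,output_files
run1,wc -l $1 > result.txt,/path/to/some/text/file.txt,result.txt
run2,wc -l $1 > result.txt,/path/to/another/text/file.txt,result.txt
"""


def build_argv(row):
    """Return an argv that runs the command with input files as $1, $2, ..."""
    # Assumption: several input files would be separated by ';'.
    inputs = [p for p in row["input_files"].split(";") if p]
    # With `sh -c SCRIPT NAME ARGS...`, NAME becomes $0 and ARGS become $1, $2, ...
    return ["sh", "-c", row["command"], row["name"]] + inputs


def expected_outputs(row, out_dir):
    """Return the paths where this row's output files should end up."""
    outputs = [p for p in row["output_files"].split(";") if p]
    return [Path(out_dir) / row["name"] / Path(p).name for p in outputs]


rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
for row in rows:
    print(build_argv(row), [str(p) for p in expected_outputs(row, "results")])
```

Each argv can be passed to `subprocess.run(argv, cwd=some_scratch_dir)`. Passing the input path as a separate argument, rather than pasting it into the command string, avoids quoting problems with paths that contain spaces.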

@armish
Member

armish commented Jun 9, 2017

GNU parallel can do that for you:
https://www.gnu.org/software/parallel/parallel_tutorial.html#Remote-execution

I wouldn't try to implement something from scratch: I have been trying out different options to do this in a nice and tidy way, and I can't stress enough how many edge cases a ground-up solution has to deal with.

Assuming that you don't want to turn this into an advanced task scheduler project, I would just wrap parallel and call it a day ;)
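One way such a wrapper might look is sketched below in Python: it only builds a GNU parallel command line for the four-column CSV discussed earlier in the thread and leaves running it to the caller. The function name `parallel_argv` and its `jobs` parameter are hypothetical. The sketch relies on parallel's `--header :`, `--colsep`, `--jobs`, and `--arg-file` options and has not been run against a real workload, so treat it as a starting point rather than a tested wrapper.

```python
import shlex
import subprocess


def parallel_argv(csv_path, out_dir, jobs=4):
    """Build a GNU parallel command line for a name,command,input_files,output_files CSV."""
    run_dir = shlex.quote(out_dir) + "/{name}"
    # {name}, {command} and {input_files} refer to CSV columns via --header :.
    # parallel shell-quotes each substituted value, so {command} reaches
    # `bash -c` as a single script argument and {input_files} becomes $1.
    template = (
        f"mkdir -p {run_dir} && cd {run_dir} && "
        "bash -c {command} {name} {input_files}"
    )
    return [
        "parallel",
        "--header", ":",       # first CSV line gives the column names
        "--colsep", ",",       # naive split: commands must not contain commas
        "--jobs", str(jobs),
        "--arg-file", csv_path,
        template,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) once GNU parallel is installed.
    print(shlex.join(parallel_argv("commands.csv", "/path/to/results")))
```

Because each job runs inside its own `out_dir/name` directory, the output files land in place and the `output_files` column goes unused in this variant. Input paths would need to be absolute, as in the example CSV, since the working directory changes before the command runs.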
