
Parallel tests? #1

Closed
hollowdoor opened this issue Aug 6, 2017 · 7 comments

Comments

@hollowdoor

What do you think about parallel tests? Possibly as an option in the CLI. With the tap-merge module it would be possible to merge multiple TAP outputs too.

@laat

laat commented Aug 7, 2017

This project has about 10 lines of significant code.

To implement parallelism we would need:

  • test-scheduler or queue
  • stdout handling
  • concurrent error handling
  • a shared require cache?

A tap-merge feature would bind run-tests to one kind of test output, which is unfortunate because it becomes a less generic test runner.

I think this should be done in another package, because of the complexity of the feature. However, if you have a simple solution, a PR is welcome :)

Great idea though! :)

@laat laat closed this as completed Aug 7, 2017
@hollowdoor

test-scheduler or queue

That would come with the I/O streams and the array of scripts, just like in your current module here. At worst, processor overload might force the creation of a queue, but requiring modules already burdens memory when there are many tests.

a shared require cache?

I don't think a shared cache is required. All that's needed is stream I/O processing. One point of parallel tests is they don't share memory so they have more stability.

A tap-merge feature would bind run-tests to one kind of test output, which is unfortunate because it becomes a less generic test runner.

Not if it's an option. TAP would be processed at the child process level, and not the main process level.

Your other points are fair though. I was actually getting ready to make my own module that does what I'm asking for plus what your module does. I wanted to see if someone else had done something similar, and maybe use that instead of creating my own.

The main reason I posted this issue was so I could do a PR if you were ok with the idea. Though if you have a narrow plan for this repo the kind of feature I'm suggesting would be quite a large change.

What do you think about a compromise? Since my idea is kind of largish, I can make a separate non-global module with the feature. Then I can do a PR on run-tests that uses that other module when a --parallel option is passed. This way there are fewer changes to your module, and it will be easier to back the idea out for a major release if you want. And I can just transfer my module to some other project, or fork this one, if you do. I'll consider doing a PR if this is the way you want to go.

If push comes to shove I might just create a run-tests-parallel CLI module if one doesn't exist, which is about the same amount of work as a PR to run-tests would be. So it's no problem either way for me.

@hollowdoor

For a parallel runner this is what I have so far.

```js
const spawn = require('child_process').spawn;
const tapMerge = require('tap-merge');
const mergeStream = require('merge-stream');

module.exports = function rtp(scripts, {
    argv = [],
    tap = false,
    commandOpts = {},
} = {}){

    let stdout = mergeStream(), stderr = mergeStream();

    scripts.forEach(script=>{
        let c = spawn('node', [script].concat(argv), Object.assign(
            //Copy instead of mutating the caller's options.
            {},
            commandOpts,
            //Force piped stdio so the streams can be merged.
            { stdio: 'pipe' }
        ));

        stdout.add(c.stdout);
        stderr.add(c.stderr);
    });

    if(tap){
        stdout = stdout.pipe(tapMerge());
    }

    stdout.pipe(process.stdout);
    stderr.pipe(process.stderr);

    return {
        stdout, stderr
    };
};
```

@laat

laat commented Aug 7, 2017

That would come with the I/O streams and the array of scripts, just like in your current module here. At worst, processor overload might force the creation of a queue, but requiring modules already burdens memory when there are many tests.

I think a task queue or scheduler is strictly needed. In the wild, off the top of my head, I'm sure jest, mocha.parallel, ava and even lerna (for running tests in monorepos) have concurrency flags or defaults limiting parallel scripts to the number of CPU cores. The reason for this is that spawning a node process for every test file at the same time scales very poorly, both in CPU usage and, more critically, memory usage.

I don't think a shared cache is required.

Perhaps not strictly needed, but it could be a nice optimisation. When using typescript/babel/coffee, reading and transforming each file to valid JS is time consuming. Every process has to do this, but it could be sped up by retrieving transformed files from a shared cache. I think jest does this.

stdout.add(c.stdout);

Simply combining output from multiple child processes quickly becomes difficult to reason about when the test output lines in stdout/stderr are not ordered or labeled; merging the streams produces hard-to-read output. In other projects I have solved this by prefixing each line written to stdout/stderr with a unique name, or by delaying writing to the final output stream until the child has exited so that all of its lines can be written in one go.

TAP YAML blocks also require an uninterrupted write of the whole block to stdout to produce valid TAP 13.

let c = spawn(......)

When the parent script gets killed by SIGTERM or SIGINT, the process should exit cleanly by killing its child processes before dying.
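A minimal version of that cleanup might look like this. The `track`/`killAll` names are hypothetical; the idea is just to keep a live set of children and forward fatal signals to them before the parent exits:

```js
// Track spawned children so they can be killed before the parent dies.
const children = new Set();

function track(child) {
    children.add(child);
    // Stop tracking a child once it has exited on its own.
    child.on('exit', () => children.delete(child));
    return child;
}

function killAll(signal = 'SIGTERM') {
    for (const child of children) child.kill(signal);
}

// Forward fatal signals to every child, then exit the parent.
for (const signal of ['SIGINT', 'SIGTERM']) {
    process.on(signal, () => {
        killAll(signal);
        process.exit(1);
    });
}
module.exports = { track, killAll, children };
```

Each `spawn` call would be wrapped as `track(spawn('node', [script]))`.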

To me this feature gets pretty complex quite quickly. I like the simplicity of the ~10 lines of code that probably will never have to change.

The world needs this feature, but I think a separate package is the way to go.

@hollowdoor

(for running tests in monorepos) have concurrency flags or defaults limiting parallel scripts to the number of CPU cores.

Yeah that's what I meant by "processor overload". A queue would be trivial.

When using typescript/babel/coffee reading and transforming each file to valid JS is time consuming.

I didn't really want to add a transpiler. My idea is for plain node scripts only. But that's an interesting concept (like ava).

Simply combining output from multiple child processes quickly becomes difficult to reason about when the test output lines in stdout/stderr are not ordered or labeled.

You can console.log('whatever label'). It's all the same stream. Also printing the file name would not be hard. That's actually a good idea. The output from running the example I showed you here is ordered, but I think that might be an accident. A better option would be https://www.npmjs.com/package/multistream

TAP YAML blocks also require an uninterrupted write of the whole block to stdout to produce valid TAP 13.

Good point. I'll have to check whether the YAML blocks are coming out whole. Though I'm betting they are, unless merge-stream is splitting them. I'll have to do the same with multistream if I switch to using that.
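One way to guarantee whole-block writes is to buffer each child's output and flush it in a single write when that child's stream ends, so multi-line YAML diagnostics can never interleave with another child's lines. A sketch, with a hypothetical `flushOnEnd` helper:

```js
// Buffer everything a source stream emits and write it to the sink in
// one uninterrupted write when the source ends. Keeps TAP YAML blocks
// from being split across another child's output.
function flushOnEnd(source, sink) {
    const chunks = [];
    source.on('data', chunk => chunks.push(chunk));
    source.on('end', () => sink.write(Buffer.concat(chunks)));
}
module.exports = flushOnEnd;
```

The trade-off is latency: nothing from a child appears until that child finishes.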

When the parent script gets killed by SIGTERM or SIGINT, the process should exit cleanly by killing its child processes before dying.

Sure. Don't see why not. We don't want hidden processes lying around.

I like the simplicity of the ~10 lines of code that probably will never have to change.

Me too. But I think that's just the way you want to do this specific project. I've looked at ava source, and that thing's a monster. Any large project will multiply the time you'll spend maintaining it.

Still, this exchange was very valuable. Thank you for taking some of your time to give feedback on my idea.

When I'm done making my thing I'll put a link in my README to your project here as an alternative.

The world needs this feature, but I think a separate package is the way to go.

The world? Well we'll see. :)

@hollowdoor

I added everything you suggested except the shared cache. The code size is now 148 lines (including readability stuff like blank lines and newlines). That's about 14 times the size you said you preferred here.

That is quite a difference. You're definitely right about it being much more complex. From what I can tell there's no way to make it simpler. It will work for my needs though.

Thanks again for your time.

@laat

laat commented Aug 9, 2017

Glad I could help :)
