Dedicated benchmark server for running Jenkins Job #55

Open
AndreasMadsen opened this Issue Jan 4, 2017 · 9 comments


@AndreasMadsen
Member

Running benchmarks from the benchmark/ directory without external interference is quite difficult and time consuming. There have already been a few issues where we got false positives because of external interference, e.g. nodejs/node#10205.

The ideal solution is to just have a Jenkins job for running benchmarks. The Jenkins script has already been developed – nodejs/benchmarking#58.

The Benchmarking WG already has a dedicated server; however, it is used for the monitoring benchmarks. In theory it could also be used for running benchmarks from benchmark/, but it is unknown how much time those take, and the monitoring benchmarks need to be executed daily, which cannot be guaranteed if benchmarks from benchmark/ run on the same server.

The simple solution is thus to just have a dedicated server for benchmark/.

It is unclear who will be responsible for this server, as the Benchmarking WG mostly focuses on monitoring performance on a daily basis – https://benchmarking.nodejs.org.

/cc @nodejs/benchmarking, @mscdex, @gareth-ellis, @mhdawson

Original issue: nodejs/benchmarking#58

@gibfahn
Member
gibfahn commented Jan 4, 2017
@addaleax
Member
addaleax commented Jan 4, 2017

Also, probably @nodejs/build too

@jbergstroem
Member

Afaik, that machine is very idle. As long as we can guarantee scheduling I'm all for using it more. @mhdawson probably knows most about it.

@AndreasMadsen
Member

As long as we can guarantee scheduling I'm all for using it more.

Unfortunately we can't.

@jbergstroem
Member

@AndreasMadsen if everything is run from Jenkins, it shouldn't be a problem. We could additionally check for a lock file on the server in a job (if the other jobs are controlled by cron)?
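
A minimal sketch of that lock-file idea, as a Node.js wrapper a job could run before starting; the lock path and the PID handling are placeholders, not an existing convention on the benchmark machine:

```js
// Hypothetical lock-file guard a Jenkins (or cron) job could run before starting.
const fs = require('fs');

const LOCK_FILE = '/tmp/node-benchmark.lock'; // placeholder path

try {
  // 'wx' fails with EEXIST if the file already exists, so the daily monitoring
  // run and a benchmark/ run cannot both hold the lock at the same time.
  const fd = fs.openSync(LOCK_FILE, 'wx');
  fs.writeSync(fd, String(process.pid));
  fs.closeSync(fd);
} catch (err) {
  if (err.code === 'EEXIST') {
    console.error('another benchmark job is running, aborting');
    process.exit(1);
  }
  throw err;
}

// ... run the benchmark job here, then release the lock:
// fs.unlinkSync(LOCK_FILE);
```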

@AndreasMadsen
Member

@jbergstroem Please read the discussion in nodejs/benchmarking#58. I know nothing about Jenkins, but as I understand it we can't guarantee that a benchmark job runs for less than 24h. We can do some approximate checks and time estimations before we start the job, but they will not guarantee anything and will only work for default parameters.

@mhdawson
Contributor
mhdawson commented Jan 4, 2017 edited

The concern I've expressed is that, in the current form, a single benchmark job could run for a very long time.

I may be interpreting the numbers wrong, but if I multiply 60 * 985 s I get roughly 16 hours for just the http group, and that is only a small subset of the overall suite. This is from:
https://gist.github.com/AndreasMadsen/9f9739514116488c27c06ebaff813af1

@AndreasMadsen am I using your summary numbers correctly? I see 985.64 for http and you mentioned needing 60 runs. But 16 hours seems even longer than I would expect to be needed for a reasonable result.

If I extended that calculation to include the rest of the categories, the job would run for days if not several weeks, which does not seem reasonable.
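
For reference, the arithmetic behind the 16-hour figure (just restating the numbers above, nothing new):

```js
// 985.64 s is the mean per-run time for the http group from the gist above,
// 60 is the number of runs mentioned.
const totalSeconds = 985.64 * 60;  // 59138.4 s
console.log(totalSeconds / 3600);  // ≈ 16.4 hours for the http group alone
```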

@AndreasMadsen
Member

@mhdawson The calculation is correct. But it is very rare (I actually can't imagine why we would) that we run the entire benchmark suite or even an entire category. Doing that is arguably bad science, since it means you aren't testing a specific hypothesis, you are just testing everything. Statistically, testing everything (or even just a category) is also difficult because of type 1 errors; to avoid those you would need more statistical confidence, which may require even more repetitions.

When I optimize something it typically involves:

  1. I have a performance issue.
  2. I create a fairly complex benchmark that highlights the issue (like a hello world http server).
  3. I profile it, using a high-level profiler.
  4. I find the bad code path.
  5. I create (or have in this case) a simple benchmark that highlights just the hot code path.
  6. I profile it, using a detailed profiler.
  7. I improve the code (I hope).
  8. I run the simple benchmark and statistically test that I improved the performance.
  9. Repeat steps 6-8 until some improvement is achieved.
  10. I run the complex benchmark and statistically test that I improved the performance.

During these steps I only use two benchmarks: the complex benchmark and the simple one.

This would "only" take 6-7h, which arguably may still be a long time, but that is also why we need the Jenkins job.

@mhdawson
Contributor
mhdawson commented Jan 4, 2017

Ok so running just a subset makes sense to me. We might be able to do the following:

  1. look to allow at most 12 hours a day for these kinds of runs.
  2. control the launch of the jobs through a node app, bot or whatever.
  3. have the app/bot queue up the runs, only running them during the 12 hours allotted.
  4. have the app/bot kill the job if it is still running at the end of the 12 hours and then run a job to clean up the machine (kill all node processes etc. - this would be key).

This would require somebody to create the app/bot and integrate it with Jenkins, and it might require a bit more effort on the part of those wanting to do perf runs, but it might be a balance we can make workable with the existing h/w.
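
A rough sketch of point 4 (the kill-and-clean-up part); the window length, job command and kill strategy here are placeholders, not an existing Jenkins integration:

```js
// Hypothetical deadline wrapper for a queued benchmark run.
const { spawn } = require('child_process');

const WINDOW_MS = 12 * 60 * 60 * 1000; // at most 12 hours a day for these runs

function runWithDeadline(command, args) {
  // detached: the job gets its own process group, so the cleanup step can kill
  // the whole group (the runner and any node processes it spawned).
  const job = spawn(command, args, { stdio: 'inherit', detached: true });

  const timer = setTimeout(() => {
    try {
      process.kill(-job.pid, 'SIGKILL'); // negative pid = whole process group
    } catch (err) { /* job already exited */ }
  }, WINDOW_MS);

  job.on('exit', (code, signal) => {
    clearTimeout(timer);
    console.log(`benchmark job finished (code=${code}, signal=${signal})`);
  });
}

// Example: one queued compare run (paths are placeholders).
runWithDeadline('node', ['benchmark/compare.js',
                         '--old', './node-old', '--new', './node-new', 'http']);
```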

It would be easier to have another dedicated machine, but those are more costly than the VMs we use for the rest of the jobs, and we are pretty much at our SoftLayer spending limit, which is where we got the first one from.
