[test] Make CUDA kernel latency tests generic #796

vkarak · 2019-05-22T11:17:12Z

The idea of making a test generic is to make no assumptions about what self.current_system is. In practice, this means the following:

If you check against self.current_system, self.current_partition or self.current_environ, make sure you handle the case of an unknown system or environment.
If the test should also run on unknown systems, make sure to include * in self.valid_systems and self.valid_prog_environs.
If it is a performance test, make sure that a * entry exists in the references dictionary. Its value should be (0, None, None, 'unit').
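
The * fallback for references can be pictured with a small plain-Python sketch. It does not use ReFrame itself: the helpers resolve_reference and within_reference, the partition name 'known:gpu' and its numbers are hypothetical and only illustrate the lookup order. The sketch also assumes that the thresholds in the tuple are fractional deviations from the target value and that None means "no bound".

```python
# Plain-Python sketch of the '*' fallback; these helpers are NOT ReFrame API.

references = {
    # Hypothetical entry for a known partition (values are made up).
    'known:gpu': {'latency': (6.0, None, 0.10, 'us')},
    # Catch-all entry for unknown systems: no bounds, so any value passes.
    '*': {'latency': (0.0, None, None, 'us')},
}


def resolve_reference(references, partition, variable):
    """Return the reference tuple for `partition`, falling back to '*'."""
    system = partition.split(':')[0]
    for key in (partition, system, '*'):
        if key in references and variable in references[key]:
            return references[key][variable]
    raise KeyError(f'no reference for {variable!r} on {partition!r}')


def within_reference(value, reference):
    """Check `value` against (target, lower, upper, unit).

    Assumption: thresholds are fractional deviations from the target
    (lower is negative, upper is positive); None disables that bound.
    """
    target, lower, upper, _unit = reference
    if lower is not None and value < target * (1 + lower):
        return False
    if upper is not None and value > target * (1 + upper):
        return False
    return True


# An unknown partition resolves to the '*' entry and is never rejected.
ref = resolve_reference(references, 'ault:intelv100', 'latency')
print(ref, within_reference(8.67954, ref))
```

With an entry like this, a performance check on an unknown system only records the measured value instead of failing on a missing reference.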

Addresses UES-339

jgphpc · 2019-05-22T12:23:15Z

Modifying the config to have cuda/10.1 instead of cuda will fix UES-339.

It would help me test this PR.

jgphpc · 2019-05-22T12:37:21Z

I get on ault:

Found 4 gpu(s)
[gpu 0] Kernel launch latency: 8.67954 us
[gpu 1] Kernel launch latency: 9.11042 us
[gpu 2] Kernel launch latency: 8.99908 us
[gpu 3] Kernel launch latency: 8.63197 us

  * Failing phase: sanity
  * Reason: sanity error: 8 != 0

where 'latency': (0.0, None, None, 'us')

Should sanity pass or not?

vkarak · 2019-05-22T12:40:03Z

@jgphpc There are two things here:

There is a bug in my implementation: I should have set self.num_gpus_per_node = 1.
This wouldn't prevent the test from failing on an unknown system, though, but that is not a problem. You can't have a truly generic test without some sort of hardware detection. The important thing is that the test is loaded correctly and runs through on an unknown system. Then you inspect any failures.
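
A rough sketch of the second point, in plain Python rather than ReFrame code. It assumes the sanity check compares the number of latency lines in the output with num_tasks times the GPUs per node; that is a guess at the mechanism, not the actual check. The GPUS_PER_NODE table and the helper names are hypothetical; only the ault value comes from the output quoted above.

```python
import re

# Hypothetical per-system GPU counts; only the ault value is taken from
# the "Found 4 gpu(s)" output quoted in this thread.
GPUS_PER_NODE = {'ault': 4}

OUTPUT = """Found 4 gpu(s)
[gpu 0] Kernel launch latency: 8.67954 us
[gpu 1] Kernel launch latency: 9.11042 us
[gpu 2] Kernel launch latency: 8.99908 us
[gpu 3] Kernel launch latency: 8.63197 us
"""


def gpus_per_node(system_name):
    """Return the known GPU count, or 1 as the default for unknown systems."""
    return GPUS_PER_NODE.get(system_name, 1)


def count_latency_lines(stdout):
    """Count '[gpu N] Kernel launch latency: X us' lines in the output."""
    pattern = r'^\[gpu \d+\] Kernel launch latency: \S+ us$'
    return len(re.findall(pattern, stdout, flags=re.MULTILINE))


def sanity_ok(stdout, system_name, num_tasks):
    """Assumed sanity rule: one latency line per GPU per task."""
    expected = num_tasks * gpus_per_node(system_name)
    return count_latency_lines(stdout) == expected


# The test runs to completion either way; whether sanity passes depends on
# whether the GPU count is known for the system.
print(sanity_ok(OUTPUT, 'ault', 1))
print(sanity_ok(OUTPUT, 'some-unknown-system', 1))
```

Under these assumptions, the default of 1 lets the test load and run on an unknown 4-GPU node, and the resulting mismatch is the kind of failure you then inspect by hand.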

teojgo · 2019-05-22T12:42:14Z

@jgphpc Also, it is a flexible test, so it will run on all idle nodes unless instructed otherwise with --flex-alloc-tasks.

jgphpc · 2019-05-22T12:56:13Z

Adding self.num_gpus_per_node = 4 makes the sanity test pass, thanks @teojgo

./reframe.py --flex-alloc-tasks=1 -C config/cscs.py --system ault:intelv100 ...

vkarak · 2019-05-22T12:57:38Z

@jgphpc Yes, but you don't know that when running on an unknown system. The test will most probably fail, but at least it is ready to run. That's the point.

vkarak · 2019-05-23T08:27:38Z

> Modifying the config to have cuda/10.1 instead of cuda will fix UES-339.
>
> It would help me test this PR.

@jgphpc Can you do a separate PR for this? Or open an issue?

pep8speaks · 2019-05-23T08:31:17Z

Hello @vkarak, Thank you for updating!

In the file cscs-checks/microbenchmarks/kernel_latency/kernel_latency.py:

Line 74:1: W293 blank line contains whitespace
Line 88:1: W293 blank line contains whitespace

Do see the ReFrame Coding Style Guide

vkarak · 2019-05-23T08:31:29Z

@jenkins-cscs retry daint kesch

vkarak · 2019-05-23T08:55:39Z

@jenkins-cscs retry none

Vasileios Karakasis added 4 commits May 17, 2019 14:51

Make CUDA kernel latency test generic

7fb2dfd

Fix formatting

3e711ad

Fine tune CUDA kernel latency check

11cc048

Fix system check for Kesch

2c8f1d5

vkarak added enhancement regression test labels May 22, 2019

vkarak added this to the ReFrame sprint 2019w20 milestone May 22, 2019

vkarak requested review from jgphpc, teojgo and victorusu May 22, 2019 11:17

vkarak self-assigned this May 22, 2019

jgphpc approved these changes May 22, 2019

View reviewed changes

teojgo approved these changes May 22, 2019

View reviewed changes

Set num_gpus_per_node for unknown systems

9c2211e

Merge branch 'master' into check/kernel-latency-generic

3eac79f

vkarak merged commit 08dd452 into reframe-hpc:master May 23, 2019

vkarak deleted the check/kernel-latency-generic branch May 23, 2019 08:57
