@vkarak
Contributor

@vkarak vkarak commented May 22, 2019

The idea of making a test generic is to make no assumptions about what self.current_system is. This essentially means the following:

  1. If you check against self.current_system, self.current_partition, or self.current_environ, make sure you handle the case of an unknown system or environment.
  2. To let the test run on an unknown system, make sure to include * in self.valid_systems and self.valid_prog_environs.
  3. If it is a performance test, make sure that a * entry exists inside the references dictionary. Its value should be (0, None, None, 'unit').
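
As a rough illustration of point 3, here is a minimal sketch of how a * entry can act as a fallback when the current system has no dedicated reference. This is not ReFrame's actual lookup code: the helper resolve_reference, the 'system:partition' key layout, and the daint:gpu numbers are all assumptions made up for the example.

```python
# Hypothetical helper, for illustration only; not ReFrame's implementation.
def resolve_reference(references, system, partition, variable):
    """Return the reference tuple for `variable`, trying the most
    specific key first and falling back to the '*' entry."""
    for key in (f'{system}:{partition}', system, '*'):
        entry = references.get(key)
        if entry is not None and variable in entry:
            return entry[variable]
    raise KeyError(f'no reference for {variable!r} on {system}:{partition}')


references = {
    # Made-up numbers for a known system, for illustration only.
    'daint:gpu': {'latency': (6.6, None, 0.10, 'us')},
    # Generic fallback, as described in point 3 above.
    '*': {'latency': (0, None, None, 'us')},
}

known = resolve_reference(references, 'daint', 'gpu', 'latency')
unknown = resolve_reference(references, 'ault', 'intelv100', 'latency')
```

With the * entry present, an unknown system resolves to (0, None, None, 'us') instead of raising an error, so the test can still load and report its measurement.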

Addresses UES-339

@jgphpc
Contributor

jgphpc commented May 22, 2019

Modifying config to have cuda/10.1 instead of cuda will fix UES-339

It would help me test this PR.

@jgphpc
Contributor

jgphpc commented May 22, 2019

I get on ault:

Found 4 gpu(s)
[gpu 0] Kernel launch latency: 8.67954 us
[gpu 1] Kernel launch latency: 9.11042 us
[gpu 2] Kernel launch latency: 8.99908 us
[gpu 3] Kernel launch latency: 8.63197 us
  * Failing phase: sanity
  * Reason: sanity error: 8 != 0

where 'latency': (0.0, None, None, 'us')

Should sanity pass or not?

@vkarak
Contributor Author

vkarak commented May 22, 2019

@jgphpc There are two things here:

  1. There is a bug in my implementation, because I should have set self.num_gpus_per_node = 1.
  2. Even so, this wouldn't prevent the test from failing on an unknown system, but that is not a problem. You can't have a truly generic test without some sort of hardware detection. The important thing is that the test is loaded correctly and runs through on an unknown system. Then you inspect possible failures.
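
To make the failure mode concrete, here is a minimal sketch of the kind of check involved. It assumes the sanity check compares the number of latency lines in the output against num_tasks * num_gpus_per_node; the function names and the regex are hypothetical and not taken from the actual test.

```python
import re

# Sample output from the run on ault quoted earlier in this thread.
SAMPLE_STDOUT = """\
Found 4 gpu(s)
[gpu 0] Kernel launch latency: 8.67954 us
[gpu 1] Kernel launch latency: 9.11042 us
[gpu 2] Kernel launch latency: 8.99908 us
[gpu 3] Kernel launch latency: 8.63197 us
"""


def count_latency_lines(stdout):
    """Count the per-GPU latency lines printed by the benchmark."""
    pattern = r'\[gpu \d+\] Kernel launch latency: \S+ us'
    return len(re.findall(pattern, stdout))


def check_sanity(stdout, num_tasks, num_gpus_per_node):
    """Hypothetical sanity check: one latency line per GPU per task."""
    found = count_latency_lines(stdout)
    expected = num_tasks * num_gpus_per_node
    if found != expected:
        raise AssertionError(f'sanity error: {found} != {expected}')
    return True
```

Under these assumptions, a generic default of num_gpus_per_node = 1 still fails on a node with four GPUs, and only a value matching the real hardware passes. That is why some form of hardware detection would be needed for the check to pass everywhere.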

@teojgo
Contributor

teojgo commented May 22, 2019

@jgphpc Also, it is a flexible test, so it will run on all idle nodes unless instructed otherwise with --flex-alloc-tasks.

@jgphpc
Contributor

jgphpc commented May 22, 2019

Adding self.num_gpus_per_node = 4 makes the sanity test pass, thanks @teojgo

./reframe.py --flex-alloc-tasks=1 -C config/cscs.py --system ault:intelv100 ...

@vkarak
Contributor Author

vkarak commented May 22, 2019

@jgphpc Yes, but you don't know that when running on an unknown system. Most probably the test will fail. At least it's ready to run. That's the point.

@vkarak
Contributor Author

vkarak commented May 23, 2019

> Modifying config to have cuda/10.1 instead of cuda will fix UES-339
>
> It would help me test this PR.

@jgphpc Can you do a separate PR for this? Or open an issue?

@pep8speaks

Hello @vkarak, Thank you for updating!

Line 74:1: W293 blank line contains whitespace
Line 88:1: W293 blank line contains whitespace

Do see the ReFrame Coding Style Guide

@vkarak
Contributor Author

vkarak commented May 23, 2019

@jenkins-cscs retry daint kesch

@vkarak
Contributor Author

vkarak commented May 23, 2019

@jenkins-cscs retry none

@vkarak vkarak merged commit 08dd452 into reframe-hpc:master May 23, 2019
@vkarak vkarak deleted the check/kernel-latency-generic branch May 23, 2019 08:57

4 participants