Merged
cscs-checks/mch/automatic_arrays.py (65 additions, 0 deletions)
@@ -0,0 +1,65 @@
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class AutomaticArraysCheck(rfm.RegressionTest):
def __init__(self, **kwargs):
super().__init__()
self.valid_systems = ['daint:gpu', 'dom:gpu', 'kesch:cn']
Contributor:

@ajocksch The test fails consistently on Dom and will (perhaps) do so on the updated Daint as well. This needs investigation. The problem is that the performance checking is hardcoded inside the test itself, so ReFrame's performance checking has practically no effect (apart from logging) and, besides, we cannot adjust the performance values for other systems. For this reason, I don't think this test is really portable. I see two possible solutions here:

  1. Make the test more portable by separating the sanity checking from the performance checking. The sanity check should make sure that no validation errors occur (see the source code), and the performance numbers should be adapted for each system (see the sketch below).
  2. Make this test available only on Kesch.
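
A minimal sketch of what option 1 could look like inside `__init__`; the `VALIDATION FAILED` pattern is an assumption about what the Fortran source prints on a failed check, not taken from the actual source, while the timing extraction is what the test already does:

```python
import reframe.utility.sanity as sn

# Sanity: the program ran and reported no validation errors.  The
# 'VALIDATION FAILED' pattern is a placeholder for whatever the Fortran
# source actually prints when a check fails.
self.sanity_patterns = sn.all([
    sn.assert_found(r'Result: ', self.stdout),
    sn.assert_not_found(r'VALIDATION\s+FAILED', self.stdout)
])

# Performance: extract the timing and let ReFrame compare it against the
# per-system reference set in setup(), ignoring the hardcoded PASS/FAIL.
self.perf_patterns = {
    'perf': sn.extractsingle(r'Timing:\s+(?P<perf>\S+)',
                             self.stdout, 'perf', float)
}
```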

Contributor Author:

I think @victorusu made the point in the stand-up meeting: this check should behave differently for the different compiler versions, and we should check for negative results where they are expected. Therefore the `*` wildcard does not work; one needs to specify all programming environments separately.
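
For example, something along these lines; the versioned names below are only placeholders, the real ones depend on what is configured on each system:

```python
# Sketch: spell out each programming environment instead of relying on the
# '*' wildcard, so that per-environment behaviour (including expected
# failures) can be encoded.  The versioned names are placeholders.
self.valid_prog_environs = ['PrgEnv-cray', 'PrgEnv-cray_17.06',
                            'PrgEnv-pgi', 'PrgEnv-pgi_17.10',
                            'PrgEnv-gnu']
```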

Contributor:

@ajocksch @victorusu Guys, I understand that we need to have different performance values for the different programming environments, but the test itself is not portable. It assumes a single performance value (obtained, perhaps, on a single system only) and prints a PASS/FAIL based on that. If we want to do proper sanity and performance checking, we should ignore the PASS/FAIL printed by the test and let ReFrame do the performance checking based on the per-system references. If we go in the direction of putting this test in production for Daint/Dom, we should make it more robust and fix the performance values accordingly. If not (which is also my proposal, since we want this test in ASAP), we should only allow it to run on Kesch.
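
If we go for the Kesch-only option, the change would be minimal; roughly the following sketch, reusing the values already in this diff:

```python
# Restrict the check to Kesch, the only system where the hardcoded
# performance value is known to be meaningful.
self.valid_systems = ['kesch:cn']
self.modules = ['craype-accel-nvidia35']
self._pgi_flags = '-O2 -ta=tesla,cc35,cuda8.0'
self._cray_variables = {'MV2_USE_CUDA': '1'}
```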

self.valid_prog_environs = ['PrgEnv-cray*', 'PrgEnv-pgi*',
'PrgEnv-gnu']
if self.current_system.name in ['daint', 'dom']:
self.modules = ['craype-accel-nvidia60']
self._pgi_flags = '-acc -ta=tesla:cc60 -Mnorpath'
self._cray_variables = {}
elif self.current_system.name in ['kesch']:
self.modules = ['craype-accel-nvidia35']
self._pgi_flags = '-O2 -ta=tesla,cc35,cuda8.0'
self._cray_variables = {'MV2_USE_CUDA': '1'}

self.num_tasks = 1
self.num_gpus_per_node = 1
self.num_tasks_per_node = 1
self.sourcepath = 'automatic_arrays.f90'
self.sanity_patterns = sn.assert_found(r'Result: ', self.stdout)
self.perf_patterns = {
'perf': sn.extractsingle(r'Timing:\s+(?P<perf>\S+)',
self.stdout, 'perf', float)
}

self.arrays_reference = {
'PrgEnv-cray': {
'daint:gpu': {'perf': (5.7E-05, None, 0.15)},
'dom:gpu': {'perf': (5.8E-05, None, 0.15)},
'kesch:cn': {'perf': (2.9E-04, None, 0.15)},
},
'PrgEnv-gnu': {
'daint:gpu': {'perf': (7.0E-03, None, 0.15)},
'dom:gpu': {'perf': (7.3E-03, None, 0.15)},
'kesch:cn': {'perf': (6.5E-03, None, 0.15)},
},
'PrgEnv-pgi': {
'daint:gpu': {'perf': (6.4E-05, None, 0.15)},
'dom:gpu': {'perf': (6.3E-05, None, 0.15)},
'kesch:cn': {'perf': (1.4E-04, None, 0.15)},
}
}

self.maintainers = ['AJ', 'VK']
Contributor:

I think you should tag this test as production, too.

Contributor Author:

I can do it. However, a few checks fail, so we will have "red" in the CI.

Contributor:

No problem with that. We know that the programming environments are not working properly on Kesch. I will merge it as soon as the rest of the systems are "green".

Contributor Author:

ok

self.tags = {'production'}

def setup(self, partition, environ, **job_opts):
if environ.name.startswith('PrgEnv-cray'):
environ.fflags = '-O2 -hacc -hnoomp'
key = 'PrgEnv-cray'
self.variables = self._cray_variables
elif environ.name.startswith('PrgEnv-pgi'):
environ.fflags = self._pgi_flags
key = 'PrgEnv-pgi'
elif environ.name.startswith('PrgEnv-gnu'):
environ.fflags = '-O2'
key = 'PrgEnv-gnu'

self.reference = self.arrays_reference[key]
super().setup(partition, environ, **job_opts)