
Conversation

@ajocksch
Contributor

@ajocksch ajocksch commented Jun 1, 2018

Closes #254

@ajocksch ajocksch self-assigned this Jun 1, 2018
@ajocksch ajocksch requested a review from vkarak June 1, 2018 14:39
@vkarak vkarak added this to the ReFrame sprint 2018w20 milestone Jun 4, 2018
class AutomaticArraysCheck(RegressionTest):
    def __init__(self, **kwargs):
        super().__init__('automatic_arrays_check',
                         os.path.dirname(__file__), **kwargs)
Contributor

Can you use the new syntax for regression tests? This boilerplate code won't be needed any more, and neither will the _get_checks() function.
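
For illustration, a minimal sketch of what the boilerplate-free version could look like (the @rfm.simple_test decorator and the argument-less super().__init__() follow the then-new syntax; the body shown is only an assumption, with the source file name taken from the CI log later in this thread):

import reframe as rfm

@rfm.simple_test
class AutomaticArraysCheck(rfm.RegressionTest):
    def __init__(self):
        super().__init__()
        # Illustrative placeholders only
        self.valid_systems = ['kesch:cn']
        self.valid_prog_environs = ['PrgEnv-cray']
        self.sourcepath = 'automatic_arrays.f90'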

Contributor

@ajocksch You can check the tutorial examples to see the actual syntax.

Contributor Author

done

self.stdout, 'perf', float)
}

self.aarrays_reference = {
Contributor

Do you mean arrays_reference?

Contributor Author

done

self.maintainers = ['AJ', 'VK']

def setup(self, partition, environ, **job_opts):
    if 'PrgEnv-cray' in environ.name:
Contributor

Why are you using in here instead of ==?

Contributor Author

since it might be PrgEnv-cray/xxxyyy

Contributor

OK, I got that from your other PR. I think, though, it's better to check that environ.name starts with PrgEnv-cray, because what you have now also allows xxx-PrgEnv-cray.
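
i.e. something like:

# 'PrgEnv-cray' in environ.name would also match names like 'xxx-PrgEnv-cray';
# a prefix check is stricter:
if environ.name.startswith('PrgEnv-cray'):
    ...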

Contributor Author

done

environ.fflags = '-O2'

super().setup(partition, environ, **job_opts)
self.reference = self.aarrays_reference[self.current_environ.name]
Contributor

Since you are already using environ.name, you'd better move this before super().setup(...) and use environ.name here as well, for symmetry.
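
A sketch of the suggested ordering (the flag value is just the one from the diff context and its placement is illustrative; the dictionary follows the renamed arrays_reference):

def setup(self, partition, environ, **job_opts):
    if environ.name.startswith('PrgEnv-cray'):
        environ.fflags = '-O2'
    # Pick the reference before calling super().setup(), using environ.name for symmetry
    self.reference = self.arrays_reference[environ.name]
    super().setup(partition, environ, **job_opts)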

Contributor Author

done

}
}

self.maintainers = ['AJ', 'VK']
Contributor

I think you should tag this test as production, too.
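
For example (assuming the usual set-valued tags attribute):

# Add 'production' on top of whatever tags the test already defines
self.tags |= {'production'}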

Contributor Author

I can do it. However, a few checks fail and we will have "red" in the CI.

Contributor

No problem with that. We know that the programming environments are not working properly on Kesch. I will merge it as soon as the rest of the systems are "green".

Contributor Author

ok

@vkarak vkarak changed the title WIP: automatic arrays in compiler check Automatic arrays in compiler check Jun 4, 2018
@vkarak
Contributor

vkarak commented Jun 10, 2018

@jenkins-cscs retry dom

class AutomaticArraysCheck(rfm.RegressionTest):
    def __init__(self, **kwargs):
        super().__init__()
        self.valid_systems = ['daint:gpu', 'dom:gpu', 'kesch:cn']
Contributor

@ajocksch The test fails constantly on Dom and (perhaps) will do so on the updated Daint. This needs investigation. The problem is that the performance checking is hardcoded inside this test, so performance checking in ReFrame has practically no effect (apart from logging) and, besides, we cannot adjust the performance values for other systems. For this reason, I don't think this test is really portable. I see two possible solutions here:

  1. Make the test more portable by separating the sanity checking from the performance checking (a rough sketch follows below). The sanity check should make sure that no validation errors occur (see the source code). The performance numbers should be adapted for each system.
  2. Make this test available only on Kesch.
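
For illustration only, a rough sketch of option 1 with sanity and performance checking split (the regex patterns and the 'perf' tag are assumptions, not the actual test code; only the general shape matters):

import reframe.utility.sanity as sn

# Sanity: only verify that the program reported a result without validation errors
self.sanity_patterns = sn.assert_found(r'Result\s*:', self.stdout)
# Performance: extract the timing and let ReFrame compare it against self.reference
self.perf_patterns = {
    'perf': sn.extractsingle(r'Timing\s*:\s*(?P<perf>\S+)', self.stdout, 'perf', float)
}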

Contributor Author

I think @victorusu made the point in the stand-up meeting: this check should behave differently for the different versions of the compilers, and we should check for negative results where they are expected. As a result, the * wildcard does not work; one needs to specify all the programming environments separately.
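
i.e. something along these lines, listing the Kesch environments from the failure summary later in this thread explicitly instead of a PrgEnv-* wildcard:

self.valid_prog_environs = ['PrgEnv-cray_aj', 'PrgEnv-cray_aj_b',
                            'PrgEnv-pgi_16', 'PrgEnv-pgi_17',
                            'PrgEnv-pgi_18', 'PrgEnv-pgi_18_aj']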

Contributor

@ajocksch @victorusu Guys, I understand that we need to have different performance values for the different programming environments, but the test itself is not portable. It assumes a single performance value (obtained perhaps on a single system only) and prints a PASS/FAIL based on that. If we want to do proper sanity and performance checking, we should ignore the PASS/FAIL printed by the test and let ReFrame do the performance checking based on the reference we put per system. If we go in the direction of putting this test in production for Daint/Dom, we should make it more robust and fix the performance values accordingly. If not, which is also my proposal, since we want this test in ASAP, we should only allow it to run on Kesch.
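
A rough sketch of per-system references, letting ReFrame do the checking (the Kesch numbers are taken from the failure summary later in this thread; the Daint/Dom entries are placeholders to be measured):

self.reference = {
    'kesch:cn':  {'perf': (0.00014, None, 0.15)},
    'dom:gpu':   {'perf': (0.0, None, None)},   # placeholder until measured
    'daint:gpu': {'perf': (0.0, None, None)},   # placeholder until measured
}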

@codecov-io

codecov-io commented Jun 14, 2018

Codecov Report

Merging #311 into master will increase coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #311      +/-   ##
==========================================
+ Coverage    91.3%   91.32%   +0.02%     
==========================================
  Files          68       68              
  Lines        8107     8107              
==========================================
+ Hits         7402     7404       +2     
+ Misses        705      703       -2
Impacted Files Coverage Δ
reframe/core/config.py 84.54% <0%> (+1.81%) ⬆️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 527907f...4d140e8.

@vkarak
Contributor

vkarak commented Jun 14, 2018

@ajocksch The test fails due to KeyError in the arrays_reference. Have a look here:

https://jenkins.cscs.ch/blue/organizations/jenkins/ReframeCI/detail/ReframeCI/588/pipeline

@ajocksch
Contributor Author

The * in PrgEnv is the problem, since the dictionary keys are only the PrgEnv names without the *.

One solution: run the check for one PrgEnv only.

Another solution: extend the dictionaries, or somehow allow the * in the lookups.

@vkarak
Contributor

vkarak commented Jun 14, 2018

You can also do the following:

if environ.name.startswith('PrgEnv-pgi'):
    key = 'PrgEnv-pgi'
else:
    key = environ.name

self.reference = self.arrays_reference[key]

@ajocksch
Contributor Author

PrgEnv-pgi_16, PrgEnv-pgi_17 and PrgEnv-pgi_18 fail as expected.

PrgEnv-cray_aj* also fails, since the mvapich* libraries set the -I and -L paths only for mpif90 and not for ftn; this needs to be discussed.

@vkarak
Contributor

vkarak commented Jun 18, 2018

@jenkins-cscs retry all

@vkarak
Contributor

vkarak commented Jun 18, 2018

@ajocksch This is the output I am getting:

SUMMARY OF FAILURES
------------------------------------------------------------------------------
FAILURE INFO for AutomaticArraysCheck
  * System partition: kesch:cn
  * Environment: PrgEnv-pgi_16
  * Stage directory: /users/karakasv/Devel/reframe/stage/cn/AutomaticArraysCheck/PrgEnv-pgi_16
  * Job type: batch job (id=None)
  * Maintainers: ['AJ', 'VK']
  * Failing phase: compile
  * Reason: OS error: [Errno 2] No such file or directory: 'mpif90': 'mpif90'
------------------------------------------------------------------------------
FAILURE INFO for AutomaticArraysCheck
  * System partition: kesch:cn
  * Environment: PrgEnv-pgi_17
  * Stage directory: /users/karakasv/Devel/reframe/stage/cn/AutomaticArraysCheck/PrgEnv-pgi_17
  * Job type: batch job (id=None)
  * Maintainers: ['AJ', 'VK']
  * Failing phase: compile
  * Reason: caught framework exception: Command '['mpif90', '-O2', '-ta=tesla,cc35,cuda8.0', '-I/users/karakasv/Devel/reframe/stage/cn/AutomaticArraysCheck/PrgEnv-pgi_17', '/users/karakasv/Devel/reframe/stage/cn/AutomaticArraysCheck/PrgEnv-pgi_17/automatic_arrays.f90', '-o', '/users/karakasv/Devel/reframe/stage/cn/AutomaticArraysCheck/PrgEnv-pgi_17/./AutomaticArraysCheck']' failed with exit code 127:
=== STDOUT ===
=== STDERR ===
/appsmnt/escha/UES/RH7.3_experimental/pgi/18.4/linux86-64/2018/mpi/openmpi-2.1.2/bin/.bin/mpif90: error while loading shared libraries: libpgm.so: cannot open shared object file: No such file or directory

------------------------------------------------------------------------------
FAILURE INFO for AutomaticArraysCheck
  * System partition: kesch:cn
  * Environment: PrgEnv-pgi_18
  * Stage directory: /users/karakasv/Devel/reframe/stage/cn/AutomaticArraysCheck/PrgEnv-pgi_18
  * Job type: batch job (id=817530)
  * Maintainers: ['AJ', 'VK']
  * Failing phase: sanity
  * Reason: sanity error: pattern `Result: ' not found in `/users/karakasv/Devel/reframe/stage/cn/AutomaticArraysCheck/PrgEnv-pgi_18/AutomaticArraysCheck.out'
------------------------------------------------------------------------------
FAILURE INFO for AutomaticArraysCheck
  * System partition: kesch:cn
  * Environment: PrgEnv-pgi_18_aj
  * Stage directory: /users/karakasv/Devel/reframe/stage/cn/AutomaticArraysCheck/PrgEnv-pgi_18_aj
  * Job type: batch job (id=817528)
  * Maintainers: ['AJ', 'VK']
  * Failing phase: performance
  * Reason: sanity error: 0.0001628 is beyond reference value 0.00014 (l=-inf, u=0.00016099999999999998)
------------------------------------------------------------------------------
FAILURE INFO for AutomaticArraysCheck
  * System partition: kesch:cn
  * Environment: PrgEnv-cray_aj
  * Stage directory: /users/karakasv/Devel/reframe/stage/cn/AutomaticArraysCheck/PrgEnv-cray_aj
  * Job type: batch job (id=817483)
  * Maintainers: ['AJ', 'VK']
  * Failing phase: sanity
  * Reason: sanity error: pattern `Result: ' not found in `/users/karakasv/Devel/reframe/stage/cn/AutomaticArraysCheck/PrgEnv-cray_aj/AutomaticArraysCheck.out'
------------------------------------------------------------------------------
FAILURE INFO for AutomaticArraysCheck
  * System partition: kesch:cn
  * Environment: PrgEnv-cray_aj_b
  * Stage directory: /users/karakasv/Devel/reframe/stage/cn/AutomaticArraysCheck/PrgEnv-cray_aj_b
  * Job type: batch job (id=817438)
  * Maintainers: ['AJ', 'VK']
  * Failing phase: sanity
  * Reason: sanity error: pattern `Result: ' not found in `/users/karakasv/Devel/reframe/stage/cn/AutomaticArraysCheck/PrgEnv-cray_aj_b/AutomaticArraysCheck.out'
------------------------------------------------------------------------------

You should also not rely on the CI, because it only runs a test if you have changed the test's Python file. In this case you haven't, which is why it does not run it. You should try it manually.

@ajocksch
Contributor Author

ajocksch commented Jun 19, 2018

@lxavier it is necessary to set the variable MV2_USE_CUDA for the Cray compiler and for mvapich compiled for gcc, although no GPU-direct is used; otherwise the code hangs at the first OpenACC directives.
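
In the test this could be expressed roughly as follows (assuming the framework's environment-variable dictionary; the value '1' is an assumption, not taken from the actual fix):

# Exported into the job environment by ReFrame before the run
self.variables = {'MV2_USE_CUDA': '1'}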

@ajocksch
Contributor Author

@lxavier the performance of the check is not 100% reproducible; there might be a problem with the dynamic adaptation of the clock frequencies of the nodes

@vkarak
Contributor

vkarak commented Jun 19, 2018

@jenkins-cscs retry daint

@lxavier
Contributor

lxavier commented Jun 20, 2018

@lxavier it is necessary to set the variable MV2_USE_CUDA for the cray compiler and ..

Interesting. I think when we run cosmo on CPU we set MV2_USE_CUDA=0, but we may use a different mvapich for the CPU. Anyway, all this will have to go into the cosmo module files once Hannes' work is completed. Let's leave it like this for now.

@lxavier
Contributor

lxavier commented Jun 20, 2018

@lxavier the performance of the check is not 100% reproducible

We try to make it long enough so that this should not be an issue. We can increase the threshold; we mainly want to detect if the timing goes completely off. In addition, we wanted to add a graph to http://jenkins-mch.cscs.ch/view/POMPA/job/cosmo5_performance_benchmark/ so that we can monitor the time, so it is OK if it fluctuates a bit.
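
For example, widening the upper bound of the reference tuple (thresholds are relative, as in the failure summary above, where u = 0.00014 * 1.15; the new value is only a suggestion):

self.reference = {
    # (reference value, lower threshold, upper threshold)
    'kesch:cn': {'perf': (0.00014, None, 0.25)},   # e.g. allow +25% instead of +15%
}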

@vkarak vkarak merged commit bdf2090 into master Jun 20, 2018
@vkarak vkarak deleted the checks/mch_automatic_arrays branch June 20, 2018 08:02