Automatic arrays in compiler check #311
Conversation
cscs-checks/mch/automatic_arrays.py (Outdated)

```python
class AutomaticArraysCheck(RegressionTest):
    def __init__(self, **kwargs):
        super().__init__('automatic_arrays_check',
                         os.path.dirname(__file__), **kwargs)
```
Can you use the new syntax for regression tests? This boilerplate code won't be needed any more, and neither will the _get_checks() function.
@ajocksch You can check the tutorial examples to see the actual syntax.
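For illustration only, a minimal sketch of what the boilerplate-free version might look like, assuming the decorator-based registration shown in the ReFrame tutorial (the decorator choice is an assumption; the systems list is taken from elsewhere in this PR):

```python
# Sketch, not the actual check: with the decorator-based syntax the test
# registers itself, so neither the name/prefix constructor arguments nor a
# module-level _get_checks() function are needed.
import reframe as rfm


@rfm.simple_test
class AutomaticArraysCheck(rfm.RegressionTest):
    def __init__(self):
        super().__init__()
        self.valid_systems = ['daint:gpu', 'dom:gpu', 'kesch:cn']
```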
done
cscs-checks/mch/automatic_arrays.py (Outdated)

```python
                self.stdout, 'perf', float)
        }

        self.aarrays_reference = {
```
Do you mean arrays_reference?
done
cscs-checks/mch/automatic_arrays.py
Outdated
| self.maintainers = ['AJ', 'VK'] | ||
|
|
||
| def setup(self, partition, environ, **job_opts): | ||
| if 'PrgEnv-cray' in environ.name: |
Why are you using `in` here instead of `==`?
Since it might be PrgEnv-cray/xxxyyy.
Ok, I got that from your other PR. I think, though, that it is better to check that environ.name starts with PrgEnv-cray, because what you have now also allows xxx-PrgEnv-cray.
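A one-line sketch of the suggested condition (the body of the branch is a placeholder):

```python
# startswith() accepts 'PrgEnv-cray' as well as versioned names such as
# 'PrgEnv-cray/x.y.z', but rejects names that merely contain the substring,
# e.g. 'xxx-PrgEnv-cray'.
if environ.name.startswith('PrgEnv-cray'):
    pass  # set the Cray-specific compilation flags here
```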
done
cscs-checks/mch/automatic_arrays.py
Outdated
| environ.fflags = '-O2' | ||
|
|
||
| super().setup(partition, environ, **job_opts) | ||
| self.reference = self.aarrays_reference[self.current_environ.name] |
Since you are already using environ.name, you'd better move this before super().setup(...) and use environ.name here as well, for symmetry.
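A sketch of the suggested ordering (attribute name taken from this check after the rename to arrays_reference):

```python
def setup(self, partition, environ, **job_opts):
    # Use environ.name consistently and pick the reference before calling
    # the parent setup().
    self.reference = self.arrays_reference[environ.name]
    super().setup(partition, environ, **job_opts)
```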
done
```python
            }
        }

        self.maintainers = ['AJ', 'VK']
```
I think you should tag this test as production, too.
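For reference, tagging is a one-liner in the test's constructor (sketch):

```python
# Add the 'production' tag so the test is selected by runs that filter on
# it, e.g. with reframe's -t/--tag option.
self.tags |= {'production'}
```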
I can do it. However, a few checks fail and we will have "red" in the CI.
No problem with that. We know that the programming environments are not working properly on Kesch. I will merge it as soon as the rest of the systems are "green".
ok
@jenkins-cscs retry dom
```python
class AutomaticArraysCheck(rfm.RegressionTest):
    def __init__(self, **kwargs):
        super().__init__()
        self.valid_systems = ['daint:gpu', 'dom:gpu', 'kesch:cn']
```
@ajocksch The test fails consistently on Dom and will (perhaps) do so on the updated Daint as well. This needs investigation. The problem is that the performance checking is hardcoded inside this test, so performance checking in ReFrame has practically no effect (apart from logging) and, besides, we cannot adjust the performance values for other systems. For this reason, I don't think this test is really portable. I see two possible solutions here:
- Make the test more portable by separating the sanity checking from the performance checking. The sanity check should make sure that no validation errors occur (see the source code), and the performance numbers should be adapted for each system (a sketch follows this list).
- Make the test available only for Kesch.
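A rough sketch of the first option, assuming `sn` is `reframe.utility.sanity`; the output patterns and the reference numbers below are placeholders, not the benchmark's real output:

```python
# Sanity: pass/fail is decided by the absence of validation errors in the
# benchmark output, not by the hardcoded threshold inside the benchmark.
self.sanity_patterns = sn.assert_not_found(r'validation\s+failed', self.stdout)

# Performance: extract the timing and let ReFrame compare it against
# per-system references.
self.perf_patterns = {
    'perf': sn.extractsingle(r'Elapsed time:\s+(?P<perf>\S+)', self.stdout,
                             'perf', float)
}
self.reference = {
    'kesch:cn':  {'perf': (10.0, None, 0.15)},  # placeholder numbers
    'daint:gpu': {'perf': (5.0, None, 0.15)},   # placeholder numbers
    'dom:gpu':   {'perf': (5.0, None, 0.15)},   # placeholder numbers
}
```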
I think @victorusu made the point in the stand-up meeting: this check should behave differently for the different compiler versions, and we should check for negative results where they are expected. Thus * does not work; one needs to specify all programming environments separately.
@ajocksch @victorusu Guys, I understand that we need different performance values for the different programming environments, but the test itself is not portable. It assumes a single performance value (obtained perhaps on a single system only) and prints a PASS/FAIL based on that. If we want to do proper sanity and performance checking, we should ignore the PASS/FAIL printed by the test and let ReFrame do the performance checking based on the references we set per system. If we go in the direction of putting this test in production for Daint/Dom, we should make it more robust and fix the performance values accordingly. If not, which is also my proposal since we want this test in ASAP, we should only allow it to run on Kesch.
Codecov Report

```
@@            Coverage Diff             @@
##           master     #311      +/-   ##
==========================================
+ Coverage    91.3%   91.32%   +0.02%
==========================================
  Files          68       68
  Lines        8107     8107
==========================================
+ Hits         7402     7404       +2
+ Misses        705      703       -2
```

Continue to review the full report at Codecov.
@ajocksch The test fails; see https://jenkins.cscs.ch/blue/organizations/jenkins/ReframeCI/detail/ReframeCI/588/pipeline
The * in the PrgEnv names is the problem, since the dictionary keys are only the plain PrgEnv names without the *. One solution: run the check for one PrgEnv only. Another solution: extend the dictionaries, or somehow allow the * in the lookup.
You can also do the following:

```python
if environ.name.startswith('PrgEnv-pgi'):
    key = 'PrgEnv-pgi'
else:
    key = environ.name

self.reference = self.arrays_reference[key]
```
PrgEnv-pgi_16, PrgEnv-pgi_17 and PrgEnv-pgi_18 fail as expected. PrgEnv-cray_aj* also fails, since the mvapich* libraries set the -I and -L paths only for mpif90 and not for ftn; this needs to be discussed.
@jenkins-cscs retry all
@ajocksch This is the output I am getting: You should also not rely on the CI, because it only runs a test if you have changed the test's Python file; in this case you haven't, which is why it does not run it. You should try it manually.
@lxavier It is necessary to set the variable MV2_USE_CUDA for the Cray compiler and for mvapich compiled for gcc, although no GPU-direct is used; otherwise the code hangs in the first OpenACC directives.
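If this needs to live inside the check itself, one way (sketch; the exact value would have to be confirmed) is via the test's environment variables:

```python
# Hypothetical placement in the test's __init__: export MV2_USE_CUDA in the
# generated job script so the MVAPICH2-based environments do not hang in the
# first OpenACC directives.
self.variables = {'MV2_USE_CUDA': '1'}
```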
@lxavier The performance of the check is not 100% reproducible; there might be a problem with the dynamic adaptation of the clock frequencies of the nodes.
@jenkins-cscs retry daint
Interesting. I think when we run cosmo on CPU we set MV2_USE_CUDA=0, but we may use a different mvapich for CPU. Anyway, all of this will have to go into the cosmo module files once Hannes is done. Let's leave it like this for now.
We try to make it long enough so that this should not be an issue. We can increase the threshold; we want to detect if the timing goes completely off. In addition, we wanted to have a graph at http://jenkins-mch.cscs.ch/view/POMPA/job/cosmo5_performance_benchmark/ so that we can monitor the timing, so it is ok if it fluctuates a bit.
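For context, widening the acceptance window only means adjusting the threshold fields of the reference tuple (sketch with placeholder numbers):

```python
# (expected_value, lower_threshold, upper_threshold): the thresholds are
# relative deviations, so this accepts results within -25%/+25% of 10.0.
self.reference = {
    'kesch:cn': {'perf': (10.0, -0.25, 0.25)}  # placeholder numbers
}
```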
Closes #254