Unexpected behavior with co_reduce #172
Comments
Thanks for reporting this. I've found the reason for this problem and I'm producing a patch (for OpenCoarrays). What I don't completely understand is why this problem hasn't surfaced earlier. |
Presumably it hasn't appeared sooner because our current co_reduce unit test doesn't use the optional result_image argument. I'll add tests that include the result_image argument. |
Unfortunately result_image is not the problem. The patch that solves this bug breaks the co_reduce test already part of the OpenCoarrays test suite. |
Wow. Thanks for catching this. So if the patch breaks the current co_reduce test, does that mean the current co_reduce test is incorrect? |
The compiler does something strange on the co_reduce test included in the test suite. I'm trying to understand why... |
I created a new branch called bugfix-coreduce that includes the patch. I think this is the right way to do it but I don't have enough time to investigate further. The problem with the co_reduce test case is probably due to a compiler bug. |
Thanks for making coarrays a reality. Really appreciate the hard work. (Email reply to Alessandro Fanfarillo's message of Sat, Apr 16, 2016, 6:52 PM.) |
P.S. I hit the bug about two months ago, but was convinced the problem … |
@afanfa, any updates? |
No updates. The issue is related to a compiler bug. |
@afanfa, in that case, I assume it's for someone else to work on under contract. If so, we'll try to contract out the work to someone. If I'm incorrect and it's something you have time to work on, let us know. No pressure. |
I guess so. Basically, the master branch has an implementation of co_reduce compatible with the code produced by the compiler for the co_reduce test currently included in the test suite. For the code proposed by @floquet (simpler than the one in the test suite), the master branch implementation produces wrong results. I looked into this bug a long time ago; I would recommend checking it again before getting someone to work on this. |
@afanfa what needs to be done to resolve this? Is there any way I can help? |
I did not see the issue before, but I now get it with gcc-6 and gcc-trunk as well. |
@vehre Do you mean you see this behavior with the current unit tests, or do you have some other code that causes this behavior? Was it working for a given commit in the past? If so I can git bisect to find where it broke... It looks like 14536e1 may fix this bug but @afanfa wanted to look into the issue some more.... Fixing this should definitely be prioritized, though. CC: @rouson @afanfa |
I see the testcase co_reduce_test failing now with gcc-6 and gcc-trunk on master of OpenCoarrays. That wasn't so before. |
@vehre: OK, I'll investigate and bisect assuming I can reproduce...
|
MPICH or OpenMPI? |
I have both on my system, but used MPICH to see the failure. |
[Issue #172](#172) causes co_reduce to return wrong results if the binary operator has arguments declared with the `value` attribute rather than `intent(in)`. Switching to use `intent(in)` is a valid workaround. The binary operator is required to be a pure function with two arguments. As @LadaF points out, F2008 says:

> C1276 The specification-part of a pure function subprogram shall specify that all its nonpointer dummy data objects have the `INTENT(IN)` or the `VALUE` attribute.

Original coverage diff from adding this test was at: https://codecov.io/gh/sourceryinstitute/opencoarrays/compare/882c371d4d7e84364eb7adfba4b4f8d840e3f398...1a0e3b7edb6867d9e9371cbd9ed1d8f5b3dd2010/changes

If this commit produces a different change in coverage then that likely indicates where there is a bug in the library. If the library coverage remains the same, then the bug is probably in the compiler. See discussion at [#172](#172)
But configure it so that it passes and doesn't exhibit regression
The test I'm talking about isn't on the master branch yet. You can see it in PR #310 here: 2d685c2#diff-174a6f4266c23ced33d9405eca67795c

An alternate version that behaves correctly is here, also in PR #310: d7773ce#diff-c95b08ee3613c980a239d87e9edb4001

This variation in whether the binary operator's arguments are …

The coverage is combined from all tests, so it is reporting the union of all lines hit when executing all tests (that get run) in src/tests from the OS X and Linux VMs on Travis-CI.

I'm not sure what your point about allreduce is...

I have rebased the branch and thus PR #310, so not all the commits have the same hashes they once had, but the old coverage data should still be accessible. |
I have forked another issue #317 to track the noexecstack issue and keep it separate from what is causing the intent(in)/value thing. Pseudo-code signature of
and for
See the difference? In the intent(in) case the arguments are references (memory addresses, i.e. pass by reference); in the value case they are the values themselves (pass by value). So the issue here is that the library does not respect the flags passed to the collective function in … |
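To make the by-reference versus by-value distinction concrete, here is a minimal C sketch of a reduction helper that checks a flag before calling the user operator. This is not the OpenCoarrays code: the flag `OPR_ARGS_BY_VALUE`, the `reduce_opr` struct, and the function names are all hypothetical, and the sketch handles only `int` data in a local loop rather than going through MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag for this sketch: set when the Fortran operator
   declares its dummy arguments with the VALUE attribute. */
#define OPR_ARGS_BY_VALUE (1 << 0)

/* The two shapes a compiled binary operator on integers can take. */
typedef int (*opr_by_ref_t)(const int *a, const int *b); /* intent(in) */
typedef int (*opr_by_val_t)(int a, int b);               /* value      */

/* Operator plus the flag that says which calling convention it uses. */
typedef struct {
    int flags;
    union {
        opr_by_ref_t by_ref;
        opr_by_val_t by_val;
    } fn;
} reduce_opr;

/* Call the operator the way it was compiled: pass the values themselves
   when the flag is set, otherwise pass their addresses. */
static int apply_opr(const reduce_opr *op, int a, int b)
{
    if (op->flags & OPR_ARGS_BY_VALUE)
        return op->fn.by_val(a, b);
    return op->fn.by_ref(&a, &b);
}

/* Fold the operator over a local array (a stand-in for the collective). */
int local_reduce(const reduce_opr *op, const int *data, size_t n)
{
    assert(op != NULL && data != NULL && n > 0);
    int acc = data[0];
    for (size_t i = 1; i < n; ++i)
        acc = apply_opr(op, acc, data[i]);
    return acc;
}

/* Example operators: the same sum written in both conventions. */
int sum_by_ref(const int *a, const int *b) { return *a + *b; }
int sum_by_val(int a, int b) { return a + b; }
```

With the flag honored, wrapping `sum_by_ref` (flags `0`) or `sum_by_val` (flags `OPR_ARGS_BY_VALUE`) in a `reduce_opr` gives the same result from `local_reduce`. Ignoring the flag and always passing addresses would hand pointers to an operator expecting integers, which is consistent with the large, random-looking numbers reported in this issue.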
Changes for resolving issues and release automation
- Fixes #79
- Fixes #297
- Add regression test for #243
- Add regression test for #172 (currently set to pass when test fails)
- Migrates to use of a `.VERSION` file to make parsing easier for scripts and allow extra comments
- Adds auto-upload of release assets upon tagging with git, but requires that the tag is PGP signed (`git tag -s <tag> [tree-ish]`)
  - This should compute the SHA 256 checksum and create a detached signature of the cryptographic SHA 256 checksum with an encrypted GPG subkey I uploaded to the repo/travis. 🔮 🎩 🐇
Add support for reduce-functions with value parameters. Add support for char-arrays as arguments to reduce-functions. Rename co_reduce_1 to internal_co_reduce for a better name. Create mpi-datatype to transport the char-array length. Fixes #172.
For additional details see:
- #317
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71729#c1
- #172 (comment)

#317 is triggered on Fedora 25

Fixes #317
In the regressions/open folder
The co_reduce collective appears to be posting random, large numbers to `result_image` using the reference example in the gfortran documentation. A diagnostic version of the code and results is included on Stack Overflow: Fortran coarray anomaly with co_reduce.

Sample code showing compilation and execution: co_reduce.tar.gz