-
Notifications
You must be signed in to change notification settings - Fork 11.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Flang][OpenMP][Driver][Test] Fix the omp-driver-offload-test commands #66926
Conversation
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-flang-driver ChangesSince the changes to -### to correctly return error codes in D156363, some additions of the not lit keyword and perhaps some changes to how fopenmp-targets is handled; the test omp-driver-offload.f90 has become a little susceptible to what appears to be sporadic failures. This was because if certain environment variables are set for AMD devices some of the commands succeed happily and the not check causes the test to fail. Whereas, if they were not set the tests would fail and the not test would succeed, allowing the test to pass provided you were not targeting AMD hardware while running the test. When the environment variables for AMD hardware are set the compiler could find the correct offload architecture to compile for, allowing the command to succeed. However, when not set it'd fail as it can't detect the offload architecture. The solution to this is to specify an AMD architecture via --offload-arch so it always passes, the architecture doesn't really matter, I've chosen to select gfx90a. Another command was just slightly incorrect, although it emitted the expected output, it'd return an error code as it was trying to emit multiple outputs to one file (had to remove the -o). Full diff: https://github.com/llvm/llvm-project/pull/66926.diff 1 Files Affected:
diff --git a/flang/test/Driver/omp-driver-offload.f90 b/flang/test/Driver/omp-driver-offload.f90
index 74ed0f9006e81c3..bfdc3f6f4d4726b 100644
--- a/flang/test/Driver/omp-driver-offload.f90
+++ b/flang/test/Driver/omp-driver-offload.f90
@@ -1,6 +1,10 @@
! Test that flang-new OpenMP and OpenMP offload related
! commands forward or expand to the appropriate commands
-! for flang-new -fc1 as expected.
+! for flang-new -fc1 as expected. Assumes a gfx90a, aarch64,
+! and sm_70 architecture, but doesn't require one to be
+! installed or compiled for, just testing the appropriate
+! generation of jobs are created with the correct
+! corresponding arguments.
! Test regular -fopenmp with no offload
! RUN: %flang -### -fopenmp %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP %s
@@ -33,7 +37,7 @@
! OFFLOAD-HOST-NOT: "-triple" "nvptx64-nvidia-cuda"
! OFFLOAD-HOST-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
-! RUN: not %flang -S -### %s -o %t 2>&1 \
+! RUN: %flang -S -### %s 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a --offload-arch=sm_70 --offload-device-only \
! RUN: --target=aarch64-unknown-linux-gnu \
! RUN: | FileCheck %s --check-prefix=OFFLOAD-DEVICE
@@ -44,7 +48,7 @@
! OFFLOAD-DEVICE-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu"
! Test regular -fopenmp with offload for basic fopenmp-is-target-device flag addition and correct fopenmp
-! RUN: not %flang -### -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP-IS-TARGET-DEVICE %s
+! RUN: %flang -### -fopenmp --offload-arch=gfx90a -fopenmp-targets=amdgcn-amd-amdhsa %s 2>&1 | FileCheck --check-prefixes=CHECK-OPENMP-IS-TARGET-DEVICE %s
! CHECK-OPENMP-IS-TARGET-DEVICE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" {{.*}}.f90"
! Testing appropriate flags are gnerated and appropriately assigned by the driver when offloading
@@ -58,44 +62,51 @@
! OPENMP-OFFLOAD-ARGS-NEXT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" {{.*}} "-fopenmp" {{.*}} "-fembed-offload-object={{.*}}.out" {{.*}}.bc"
! Test -fopenmp with offload for RTL Flag Options
-! RUN: not %flang -### %s -o %t 2>&1 \
-! RUN: -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=gfx90a \
+! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-assume-threads-oversubscription \
! RUN: | FileCheck %s --check-prefixes=CHECK-THREADS-OVS
! CHECK-THREADS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-threads-oversubscription" {{.*}}.f90"
-! RUN: not %flang -### %s -o %t 2>&1 \
-! RUN: -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=gfx90a \
+! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-assume-teams-oversubscription \
! RUN: | FileCheck %s --check-prefixes=CHECK-TEAMS-OVS
! CHECK-TEAMS-OVS: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-teams-oversubscription" {{.*}}.f90"
-! RUN: not %flang -### %s -o %t 2>&1 \
-! RUN: -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=gfx90a \
+! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-assume-no-nested-parallelism \
! RUN: | FileCheck %s --check-prefixes=CHECK-NEST-PAR
! CHECK-NEST-PAR: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-nested-parallelism" {{.*}}.f90"
-! RUN: not %flang -### %s -o %t 2>&1 \
-! RUN: -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=gfx90a \
+! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-assume-no-thread-state \
! RUN: | FileCheck %s --check-prefixes=CHECK-THREAD-STATE
! CHECK-THREAD-STATE: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-assume-no-thread-state" {{.*}}.f90"
-! RUN: not %flang -### %s -o %t 2>&1 \
-! RUN: -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=gfx90a \
+! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-target-debug \
! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
! CHECK-TARGET-DEBUG: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug" {{.*}}.f90"
-! RUN: not %flang -### %s -o %t 2>&1 \
-! RUN: -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=gfx90a \
+! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-target-debug \
! RUN: | FileCheck %s --check-prefixes=CHECK-TARGET-DEBUG
! CHECK-TARGET-DEBUG-EQ: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" {{.*}} "-fopenmp-is-target-device" "-fopenmp-target-debug=111" {{.*}}.f90"
-! RUN: not %flang -S -### %s -o %t 2>&1 \
-! RUN: -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
+! RUN: %flang -S -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=gfx90a \
+! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-target-debug -fopenmp-assume-threads-oversubscription \
! RUN: -fopenmp-assume-teams-oversubscription -fopenmp-assume-no-nested-parallelism \
! RUN: -fopenmp-assume-no-thread-state \
@@ -104,8 +115,9 @@
! CHECK-RTL-ALL: "-fopenmp-assume-threads-oversubscription" "-fopenmp-assume-no-thread-state" "-fopenmp-assume-no-nested-parallelism"
! CHECK-RTL-ALL: {{.*}}.f90"
-! RUN: not %flang -### %s -o %t 2>&1 \
-! RUN: -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
+! RUN: %flang -### %s -o %t 2>&1 \
+! RUN: -fopenmp --offload-arch=gfx90a \
+! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
! RUN: -fopenmp-version=45 \
! RUN: | FileCheck %s --check-prefixes=CHECK-OPENMP-VERSION
! CHECK-OPENMP-VERSION: "{{[^"]*}}flang-new" "-fc1" {{.*}} "-fopenmp" "-fopenmp-version=45" {{.*}}.f90"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sending this! TBH, it's hard to follow without knowing/understanding the semantics of the relevant options. Looking at:
! RUN: %flang -### %s -o %t 2>&1 \
! RUN: -fopenmp --offload-arch=gfx90a \
! RUN: -fopenmp-targets=amdgcn-amd-amdhsa \
what is significant about --offload-arch=gfx90a
that makes the driver not fail? IIUC, that's the main thing that's changing here.
@@ -33,7 +37,7 @@ | |||
! OFFLOAD-HOST-NOT: "-triple" "nvptx64-nvidia-cuda" | |||
! OFFLOAD-HOST-NOT: "{{[^"]*}}flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" | |||
|
|||
! RUN: not %flang -S -### %s -o %t 2>&1 \ | |||
! RUN: %flang -S -### %s 2>&1 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move to a dedicate patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do, but this is still sort of the same set of changes trying to fix the test back to a state of normalcy, just slightly differently to the rest. In this case it's failing because the command expects to output multiple inputs rather than a single so the -o segment of the command breaks it and the not is then required. Although, we can leave it as an incorrect command and it'd still be able to check the correct results and yield positive! This change mostly just allows us to remove the not from the front allowing all not's to be removed from the file, making understanding of the test a little easier.
So do please let me know if you still wish it in a seperate PR and I'll remove it and move it to a new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So do please let me know if you still wish it in a seperate PR and I'll remove it and move it to a new one.
Yes, please. The rationale for this particular change is straightforward and unrelated to the other changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, I'll do that soon!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New PR with the moved test is here: #67335 I'll update this patch shortly to exclude the change in the other PR.
No problem! Thank you for reviewing it. And yes it is, sorry about that, it's mostly just specifying the GPU triple and specific hardware architecture.
I believe the amdgcn-amd-amdhsa command is more of a: "we're compiling for this generic amd gpu hardware triple,", so we don't know the specific GPU architecture just yet. Whereas defining --offload-arch=gfx90a is defining the architecture we're specifically targeting. Without the --offload-arch=gfx90a command it appears that with the right combination of AMD libraries in tow in your environment the GPU on your system can be identified (which on AMD systems like mine is a gfx of some variety) and the command will succeed. But the test has been trained in a past commit to expect failure for the return code, which is correct in most cases, unless you have the correct libraries in the environment and an AMD GPU (or other AMD offload arch) as it turns out. So it creates a scenario where we'll pass most of the time, but occasionally it'll fail when the stars align, even though the test is finding the correct information. So specifying the offload-arch just means the information is always available regardless of environment, so no chance failures. Maybe I'm incorrect, this is just my understanding from diagnosing the test failures @jhuber6 can likely correct me if I'm wrong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM :)
I don't want to come across as picky, but changing
IIUC, this test does not depend on any env variables and it should succeed regardless of the env variables? If it does depend on env variables then that should be documented. Is it? Similar note on various libraries being present or not. As per PR title:
This test works for me fine and from my perspective it is being "refined" rather than "fixed". Perhaps there's an issue that affects some folks and this change resolves that, but from your description it sounds like a "downstream" issue? Are there any public buildbots affected by this? "fix" hints to me that there's a bug and in such cases I feel that I should understand what the issue is before I approve a patch. In this case I don't (*), but am happy to rely on your expertise and the expertise of other reviewers. The whole discussion seems to assume a good deal of background on OpenMP offloading for AMD GPUs, which I don't have. (*) With the exception of the |
Not picky at all, always happy to have questions/attempt to clarify things! The
The test itself doesn't depend on any environment variables (or at least I didn't write the initial version with that knowledge/intent), but the success or failure of the commands as they are currently written depends on having rocm's library folder in your path. Defining the architecture and defining the tests around this should avoid this behavior as we're no longer dependent on the rocm library for picking up the native architecture. Perhaps I'm incorrect though it's a part of the driver I'm unfamiliar with @jhuber6 can probably fact check me/correct me in this case.
I'm happy with whatever classification we want to give it a refine or fix! I chose fix as the commands are failing currently, which wasn't the intention of them when they were originally added. I don't think it's a downstream issue, I believe it's an issue for anyone developing for AMD GPU in conjunction with a Flang build I think (this test fails in this manner on an unmodified llvm project), the community just happens to be quite small at the moment as it's in development (and it's been noted by @jsjodin and @TIFitis to fail arbitrarily as well, I just happened to stumble across what I believe is the reason why while fixing an unrelated patch).
There are currently no public buildbots that run Flang+AMDGPU+OpenMP tests at the same time as far as I am aware, and this particular test is tied to Flang, so it's unfortunately fallen under the radar for a little while I think . the closest is this buildbot which tests AMDGPU + OpenMP with a focus on the runtime, but it doesn't have a flang build: https://lab.llvm.org/buildbot/#/builders/193/builds/39058/steps/6/logs/stdio (feels like a bit of a gap as we're beginning to add Flang+OpenMP runtime tests and I don't know who to discuss with about perhaps adding a Flang build for this to get test coverage for these, but might not be worth the extra build time at the moment).
That's understandable. Perhaps it's not necessarily a bug, just a not so future proof test originally written by me I believe so the test is in need of an adjustment. So perhaps fix is a bit strong, I don't mind what wording we use though, so feel free to suggest any changes to the commit message and title you wish! |
…ands to be more consistent Since the changes to -### to correctly return error codes in D156363, some additions of the not lit keyword and perhaps some changes to how fopenmp-targets is handled the test omp-driver-offload.f90 has become a little susceptible to what appears to be sporadic failures (they're not). This was because if the rocm libraries from AMD devices are findable in a users environment some of the commands succeed happily and the not check causes the test to fail. Whereas, if they were not set the tests would fail and the not test would succeed, allowing the test to pass provided you were not targetting AMD hardware while running the test. When the environment variables for AMD hardware are set the compiler could find the correct offload architecture to compile for, allowing the command to succeed. However, when not set it'd fail as it can't detect the offload architecture. The solution to this is to specify an AMD architecture via --offload-arch so it always passes, the architecture doesn't really matter, I've chosen to select gfx90a.
ba0a569
to
41a170e
Compare
Thank you for this comprehensive explanation - the additional context helps a lot. Also thank your taking the initiative to improve this - that's greatly appreciated! LGTM |
No problem, always happy to try and explain, I know I'm not great at it sometimes, so I appreciate your patience. Thank you very much for the review and your time! |
llvm#66926) Since the changes to -### to correctly return error codes in D156363, some additions of the not lit keyword and perhaps some changes to how fopenmp-targets is handled; the test omp-driver-offload.f90 has become a little susceptible to what appears to be sporadic failures. This was because if certain environment variables are set for AMD devices some of the commands succeed happily and the not check causes the test to fail. Whereas, if they were not set the tests would fail and the not test would succeed, allowing the test to pass provided you were not targeting AMD hardware while running the test. When the environment variables for AMD hardware are set the compiler could find the correct offload architecture to compile for, allowing the command to succeed. However, when not set it'd fail as it can't detect the offload architecture. The solution to this is to specify an AMD architecture via --offload-arch so it always passes, the architecture doesn't really matter, I've chosen to select gfx90a.
Since the changes to -### to correctly return error codes in D156363, some additions of the not lit keyword and perhaps some changes to how fopenmp-targets is handled; the test omp-driver-offload.f90 has become a little susceptible to what appears to be sporadic failures.
This was because if certain environment variables are set for AMD devices some of the commands succeed happily and the not check causes the test to fail. Whereas, if they were not set the tests would fail and the not test would succeed, allowing the test to pass provided you were not targeting AMD hardware while running the test.
When the environment variables for AMD hardware are set the compiler could find the correct offload architecture to compile for, allowing the command to succeed. However, when not set it'd fail as it can't detect the offload architecture. The solution to this is to specify an AMD architecture via --offload-arch so it always passes, the architecture doesn't really matter, I've chosen to select gfx90a.
Another command was just slightly incorrect, although it emitted the expected output, it'd return an error code as it was trying to emit multiple outputs to one file (had to remove the -o).