jjhursey
Member

 * PGI was throwing the following error.
```
NVC++-S-0103-Illegal operand types for comparison operator (osc_rdma_frag.h: 75)
NVC++/power Linux 20.11-0: compilation completed with severe errors
```
 * It must not have liked the inline declaration of the NULL pointer.
   - So replace with a variable, as we do in other places in the code base.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
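
For readers outside the code base, here is a minimal sketch of the kind of change described above. It assumes the "inline declaration" was a C99 compound literal passed as the expected value of an atomic compare-and-swap; that is a guess at the pattern, not a copy of `osc_rdma_frag.h`. The names `slot` and `try_claim` are hypothetical, and C11 `<stdatomic.h>` stands in for the Open MPI atomics wrappers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared "current fragment" slot.
 * This is NOT the Open MPI data structure, only an illustration. */
static _Atomic intptr_t slot = 0;

/* The form assumed to upset the compiler passes the expected value as an
 * inline compound literal:
 *
 *     atomic_compare_exchange_strong(&slot, &(intptr_t){0}, value);
 *
 * The workaround names the expected value instead, so the comparison
 * operates on an ordinary local variable. */
static int try_claim(intptr_t value)
{
    intptr_t expected = 0; /* the "NULL" we expect to find in the slot */

    /* Returns nonzero only if slot was still 0 and now holds value. */
    return atomic_compare_exchange_strong(&slot, &expected, value);
}
```

With this sketch, the first `try_claim(42)` succeeds and stores 42. A later `try_claim(7)` fails because the slot is no longer zero. Both spellings are valid C, so the named variable should change nothing except which compilers accept the code.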
@jjhursey jjhursey added the bug label Dec 30, 2020
@jjhursey jjhursey requested a review from hjelmn December 30, 2020 17:40
@jjhursey
Member Author

I'd like someone familiar with the osc/rdma component to take a look to make sure this is good to go. I flagged Nathan, but if there is someone else that feels comfortable reviewing please do so.

Note that IBM CI did not pick this up; I found it while working on fixing PGI builds in MTT. With this patch we compile cleanly. I'm not 100% sure why CI didn't pick it up, but they are different build environments, so that's probably contributing.

@hjelmn
Member

hjelmn commented Dec 30, 2020

PGI is dead wrong here (what a surprise), but the workaround is fine.

@hjelmn
Member

hjelmn commented Dec 30, 2020

Maybe we should open a bug with PGI. Their quality control is not the best, and the only way to ensure it ever gets fixed is to bug them about it :). Though maybe since they are part of NVIDIA they are getting better.

@jjhursey
Member Author

@artemry-mlnx it looks like the Mellanox CI is stuck. There are items in the queue waiting since Sunday, and the last run was Dec. 23 - nothing is running right now. Can you take a look?

@jjhursey
Member Author

@hjelmn I agree. I'll try to work up a small reproducer tomorrow and file a bug with NVIDIA/PGI. That'll at least make them aware of it. Thanks for the review.

@artemry-nv

> @artemry-mlnx it looks like the Mellanox CI is stuck. There are items in the queue waiting since Sunday, and the last run was Dec. 23 - nothing is running right now. Can you take a look?

There was an issue with the CI host; it is fixed now.

@jjhursey
Member Author

@artemry-mlnx It looks like there is a setup issue on the Mellanox CI side. It's throwing this error before starting the build:

++ /usr/bin/modulecmd sh load hpcx-gcc-stack
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'hpcx-gcc-stack'
+ eval
+ '[' no = yes ']'
/__w/1/jenkins_scripts/jenkins/ompi/ompi_test.sh: line 395: HPCX_UCX_DIR: unbound variable

@jjhursey
Member Author

jjhursey commented Dec 31, 2020

FYI: I moved our CI boxes to RHEL 8.2 from CentOS 7, and that seems to trigger the same issue there as it does in MTT (MTT runs on RHEL 8.1). Just retesting to make double sure.

bot:ibm:pgi:retest

@jjhursey
Member Author

jjhursey commented Jan 4, 2021

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jjhursey jjhursey merged commit 3f3ec63 into open-mpi:master Jan 4, 2021
@jjhursey jjhursey deleted the fix-pgi-rdma branch January 4, 2021 14:27