
@wenduwan
Contributor

Running the latest PRRTE and PMIx submodules through CI.

Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
@wenduwan
Contributor Author

I'm also seeing AWS internal CI failures.

@wenduwan
Contributor Author

CI failure

...
--> Running example: hello_c
+ timeout -s SIGSEGV 4m mpirun --get-stack-traces --timeout 180 --hostfile /home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12342/hostfile -np 2 --bind-to none ./examples/hello_c
.ci/community-jenkins/pr-builder.sh: line 278:  2255 Segmentation fault      ${1} ${2}
+ ret=139
+ test 139 -ne 0
+ echo 'Example failed: 139'
Example failed: 139
+ echo 'Command was: timeout -s SIGSEGV 4m mpirun --get-stack-traces --timeout 180 --hostfile /home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12342/hostfile -np 2 --bind-to none  ./examples/hello_c'
Command was: timeout -s SIGSEGV 4m mpirun --get-stack-traces --timeout 180 --hostfile /home/ec2-user/workspace/pen-mpi.pull_request-v2_PR-12342/hostfile -np 2 --bind-to none  ./examples/hello_c
+ exit 139
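For reference, here is a minimal bash sketch for decoding the `ret=139` above. It is not part of the CI scripts, and the helper name `decode_status` is hypothetical. It assumes the usual shell convention that a child killed by signal N is reported as 128 + N. Here 139 - 128 = 11, which is SIGSEGV on Linux. The coreutils `timeout` documentation says an expired timeout normally exits with 124 (unless `--preserve-status` is given). A status of 139 therefore suggests that the process received SIGSEGV before the 4-minute limit, rather than from `timeout` itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode a shell exit status such as ret=139 from pr-builder.sh.
# Assumes the shell convention that "killed by signal N" is reported as 128 + N.
decode_status() {
  local status=$1
  if (( status > 128 )); then
    local sig=$(( status - 128 ))
    # The bash builtin "kill -l N" prints the signal name without the SIG prefix (e.g. SEGV).
    echo "status ${status}: killed by signal ${sig} (SIG$(kill -l "${sig}"))"
  else
    echo "status ${status}: exited normally with code ${status}"
  fi
}

decode_status 139
```

On Linux this prints `status 139: killed by signal 11 (SIGSEGV)`. Because signal numbering is platform-specific, the helper asks `kill -l` for the name instead of hard-coding it.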

@wenduwan
Contributor Author

mpi4py failure

/home/runner/work/_temp/beacf9d7-d716-4623-b178-4d8637661150.sh: line 1: 145044 Segmentation fault      (core dumped) mpiexec -n 1 python test/main.py -v

@rhc54
Contributor

rhc54 commented Feb 15, 2024

The mpi4py error tells me nothing, as I have no idea what is going on in that test. I'd be very suspicious of a segfault in "hello": if that were true, then we would be failing in a lot of places.

@rhc54
Contributor

rhc54 commented Feb 15, 2024

FWIW: MPI "hello" runs fine in my tests. There is a problem in the RAS allocation code right now, and I'll have to take a look at it. PRRTE master is undergoing some change to handle scheduler integration, and so there will be some churn there (none of which is going into the release branch). I wouldn't advise updating until we calm it down.

@wenduwan wenduwan closed this Feb 16, 2024
@wenduwan wenduwan deleted the main_bump_submodules branch February 16, 2024 15:44

2 participants