Conversation

@afriedle-intel
Contributor

This new MTL runs over PSM2 for Omni Path. PSM2 is a descendant of PSM
with changes to support more ranks and some MPI-3 features such as mprobe.

PSM2 supports only Omni Path networks; PSM supports only True Scale.
Accordingly, the existing PSM MTL will continue to be maintained for True
Scale, while the PSM2 MTL is developed and maintained for Omni Path.

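Not part of this PR's diff, but for context: a minimal sketch of the MPI-3 matched-probe (mprobe) feature mentioned above, using only standard MPI-3 calls. The ranks, tag, and buffer sizes are purely illustrative.

```c
/* Illustrative only: MPI-3 matched probe/receive, one of the features
 * the PSM2 MTL is expected to support. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int payload[4] = {1, 2, 3, 4};
        MPI_Send(payload, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* MPI_Mprobe returns a message handle, so the later MPI_Mrecv is
         * guaranteed to receive exactly the probed message, even if other
         * threads are receiving on the same communicator. */
        MPI_Message msg;
        MPI_Status status;
        MPI_Mprobe(0, 0, MPI_COMM_WORLD, &msg, &status);

        int count;
        MPI_Get_count(&status, MPI_INT, &count);
        int *buf = malloc(count * sizeof(int));
        MPI_Mrecv(buf, count, MPI_INT, &msg, &status);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}
```

At run time the new MTL would typically be requested with something like `mpirun --mca pml cm --mca mtl psm2 ...` on an Omni Path system (exact MCA parameters depend on the Open MPI version and build).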
@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/job/gh-ompi-master-pr/639/

Build Log
last 50 lines

[...truncated 48512 lines...]
640      1.538933
704      1.564410
768      1.579106
832      1.580462
896      1.613407
960      1.618928
1024     1.635517
+ '[' -n '' ']'
+ for exe in latency_th bw_th message_rate_th
+ exe_path=/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/thread_tests/thread-tests-1.1/latency_th
+ PATH=/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/bin:/hpc/local/bin::/usr/local/bin:/bin:/usr/bin:/usr/sbin:/hpc/local/bin:/hpc/local/bin/:/hpc/local/bin/:/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/ibutils/bin
+ LD_LIBRARY_PATH=/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/lib:
+ mpi_runner 2 /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/thread_tests/thread-tests-1.1/latency_th 8
+ local np=2
+ local exe_path=/var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/thread_tests/thread-tests-1.1/latency_th
+ local exe_args=8
+ local 'common_mca=-bind-to core'
+ local 'mca=-bind-to core'
+ '[' no == yes ']'
+ '[' yes == yes ']'
+ timeout -s SIGKILL 10m mpirun -np 2 -bind-to core -mca pml ob1 -mca btl self,sm /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/ompi_install1/thread_tests/thread-tests-1.1/latency_th 8
Size (bytes)     Time (us)
0    12.584484
16   94.967711
32   117.122708
48   28.904879
64   272.144444
80   83.756944
96   171.889636
112      255.150686
128      75.062764
144      218.371365
160      403.257619
176      208.569274
[jenkins01:12471] *** An error occurred in MPI_Recv
[jenkins01:12471] *** reported by process [140735849824257,0]
[jenkins01:12471] *** on communicator MPI_COMM_WORLD
[jenkins01:12471] *** MPI_ERR_INTERN: internal error
[jenkins01:12471] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[jenkins01:12471] ***    and potentially your MPI job)
Build step 'Execute shell' marked build as failure
[htmlpublisher] Archiving HTML reports...
[htmlpublisher] Archiving at BUILD level /var/lib/jenkins/jobs/gh-ompi-master-pr/workspace/cov_build to /var/lib/jenkins/jobs/gh-ompi-master-pr/builds/639/htmlreports/Coverity_Report
Setting commit status on GitHub for https://github.com/open-mpi/ompi/commit/f2aa01c684542af94da4b5005ce9ed0590964b4e
[BFA] Scanning build for known causes...
[BFA] No failure causes found
[BFA] Done. 0s
Setting status of 2c9be59b3726d27c21312ca12c485f83ccc67f58 to FAILURE with url http://bgate.mellanox.com:8888/jenkins/job/gh-ompi-master-pr/639/ and message: 'Build finished.'
Using conext: Mellanox

@rhc54 rhc54 added this to the Future milestone Jun 22, 2015
@rhc54
Contributor

rhc54 commented Jun 22, 2015

@miked-mellanox We keep seeing this failure recently - is it bogus?

@mike-dubman
Member

The failure seems unrelated to your commit, but it is real.

It looks like ob1 and sm have issues with threading support.

@hjelmn - could you please take a look?

@hppritcha
Member

I"m not sure for the 1.10.x that we are claiming support for mpi thread multiple for ob1.

@hjelmn
Member

hjelmn commented Jun 22, 2015

ob1 is thread safe but not all of the btls are. I know vader is thread safe but have no idea if sm is. It might be close to time to put btl/sm out to pasture.
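As a side note on the threading question, here is a minimal sketch (not from the thread-tests suite) of requesting MPI_THREAD_MULTIPLE and checking what the library actually granted, which is the precondition the latency_th run above relies on.

```c
/* Minimal sketch: request full threading support and verify what was provided. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE) {
        /* The pml/btl combination in use cannot handle concurrent MPI calls
         * from multiple threads, so a threaded latency test is not meaningful. */
        fprintf(stderr, "MPI_THREAD_MULTIPLE not provided (got %d)\n", provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Finalize();
    return 0;
}
```

If btl/sm's thread safety is in doubt, one option for the CI run is to select a transport believed to be thread safe, e.g. something like `mpirun --mca pml ob1 --mca btl self,vader ...` instead of `self,sm`.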

@mike-dubman
Member

cool, thanks
will remove this test

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/job/gh-ompi-master-pr/641/

@mellanox-github

Refer to this link for build results (access rights to CI server needed):
http://bgate.mellanox.com/job/gh-ompi-master-pr/642/

@afriedle-intel
Contributor Author

Is it OK for me to hit the merge button, or should one of the core developers do it? Anything else that I need to address?

@rhc54
Contributor

rhc54 commented Jun 22, 2015

Go for it 😀


afriedle-intel added a commit that referenced this pull request Jun 22, 2015
@afriedle-intel afriedle-intel merged commit a5cfbdd into open-mpi:master Jun 22, 2015
@afriedle-intel afriedle-intel deleted the afriedle-psm2-mtl branch June 22, 2015 20:00
jsquyres added a commit to jsquyres/ompi that referenced this pull request Sep 19, 2016
HCOLL: Enable alltoall interface