Description
On my clusters, nodes have 96 cores split across 2 sockets (2 * 48).
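The condition that triggers the failure can be stated as a small check: a job breaks as soon as its CPU list covers more than one socket. The sketch below assumes CPUs are numbered contiguously per socket (CPUs 0-47 on socket 0, 48-95 on socket 1). That numbering is an assumption; on Linux the real mapping of each CPU can be read from `/sys/devices/system/cpu/cpuN/topology/physical_package_id`, and some machines interleave CPU ids between sockets instead. The helper names `socket_of` and `spans_multiple_sockets` are hypothetical and not part of pytest_parallel.

```python
from typing import List

# Assumption: contiguous numbering, i.e. socket s owns CPUs
# s*CORES_PER_SOCKET .. (s+1)*CORES_PER_SOCKET - 1.
CORES_PER_SOCKET = 48


def socket_of(cpu: int, cores_per_socket: int = CORES_PER_SOCKET) -> int:
    """Return the socket index that owns `cpu` under contiguous numbering."""
    return cpu // cores_per_socket


def spans_multiple_sockets(cpu_list: List[int],
                           cores_per_socket: int = CORES_PER_SOCKET) -> bool:
    """True if the CPUs in `cpu_list` do not all belong to the same socket."""
    return len({socket_of(c, cores_per_socket) for c in cpu_list}) > 1


if __name__ == "__main__":
    print(spans_multiple_sockets([45, 46]))  # both on socket 0
    print(spans_multiple_sockets([47, 48]))  # socket 0 and socket 1
```

Under this assumption, `--cpu-list 45,46` stays on one package while `--cpu-list 47,48` straddles the boundary, which matches the job that fails below.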
The shell runner is unable to launch a batch of tests with Open MPI if a single job is scheduled on CPUs that do not share the same socket.
More precisely, I ran the test suite with n_workers = 95.
We can see in pytest_static_sched_1.sh that one job is scheduled for execution on CPUs 47 and 48:
[some tests before]
(mpiexec --cpu-list 45,46 -np 2 python3 -u -m pytest -s --_worker --_scheduler_ip_address=XXX --_scheduler_port=XXX --_session_folder=tmpb3y7c9gv -p pytest_parallel.plugin -rfEs -s --durations=10 --scheduler=shell --n-workers=95 --_test_idx=71 {TESTPATH} > .pytest_parallel/tmpb3y7c9gv/{OUTPATH} 2>&1 ; python3 -m pytest_parallel.send_report --_scheduler_ip_address=XXX --_scheduler_port=XXX --_session_folder=tmpb3y7c9gv --_test_idx=71 --_test_name=.pytest_parallel/tmpb3y7c9gv/{TESTNAME} & \
(mpiexec --cpu-list 47,48 -np 2 python3 -u -m pytest -s --_worker --_scheduler_ip_address=XXX --_scheduler_port=XXX --_session_folder=tmpb3y7c9gv -p pytest_parallel.plugin -rfEs -s --durations=10 --scheduler=shell --n-workers=95 --_test_idx=72 {TESTPATH} > .pytest_parallel/tmpb3y7c9gv/{OUTPATH} 2>&1 ; python3 -m pytest_parallel.send_report --_scheduler_ip_address=XXX --_scheduler_port=XXX --_session_folder=tmpb3y7c9gv --_test_idx=72 --_test_name=.pytest_parallel/tmpb3y7c9gv/{TESTNAME} & \
[some tests after]
All the steps crash with the message
INTERNALERROR> AssertionError: FATAL ERROR in pytest_parallel early processing
and the log file of the related test contains this Open MPI error:
--------------------------------------------------------------------------
Your job failed to map because the resulting process placement
would cause the process to be bound to CPUs in more than one
package:
Mapping policy: PE-LIST:NOOVERSUBSCRIBE
Binding policy: CORE:IF-SUPPORTED
PE-LIST: 47,48
This configuration almost always results in a loss of performance
that can significantly impact applications. Either alter the
mapping, binding, and/or PE-LIST policies so that each process
can fit into a single package, or consider using an alternative
mapper that can handle this configuration (e.g., the rankfile mapper).
--------------------------------------------------------------------------
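One way to avoid this error on the scheduler side is to never hand a job a CPU list that crosses a package boundary. The sketch below is a hypothetical first-fit placement, not pytest_parallel's actual static scheduler: it assumes contiguous CPU numbering per socket and, when no single socket has enough free cores left, defers the job to a later batch instead of splitting it. Jobs larger than one socket are always deferred here; those would need a different strategy, such as the rankfile mapper that the Open MPI message suggests.

```python
from typing import List, Optional, Tuple


def place_jobs(
    job_sizes: List[int],
    n_sockets: int = 2,
    cores_per_socket: int = 48,
) -> Tuple[List[Optional[List[int]]], List[int]]:
    """First-fit placement that never splits a job across sockets.

    Assumes socket s owns CPUs s*cores_per_socket .. (s+1)*cores_per_socket - 1.
    Returns (placements, deferred): placements[i] is the CPU list for job i,
    or None if it could not be placed in this batch; deferred holds the
    indices of those jobs so they can be run in a later batch.
    """
    next_free = [s * cores_per_socket for s in range(n_sockets)]
    end = [(s + 1) * cores_per_socket for s in range(n_sockets)]
    placements: List[Optional[List[int]]] = []
    deferred: List[int] = []
    for i, n_procs in enumerate(job_sizes):
        for s in range(n_sockets):
            if end[s] - next_free[s] >= n_procs:
                # Enough free cores on socket s: take a contiguous block there.
                start = next_free[s]
                placements.append(list(range(start, start + n_procs)))
                next_free[s] += n_procs
                break
        else:
            # No socket can hold the whole job: defer rather than straddle.
            placements.append(None)
            deferred.append(i)
    return placements, deferred


if __name__ == "__main__":
    # A 2-process job arriving when socket 0 has a single core left is moved
    # to socket 1 instead of receiving CPUs 47,48.
    placements, deferred = place_jobs([47, 2, 1])
    print(placements[1], placements[2], deferred)
```

With these inputs the 2-process job receives CPUs 48,49 and the later 1-process job fills CPU 47, so no `--cpu-list` crosses the socket boundary.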