Tool attach failure #3243

@lee218llnl

When using the latest Open MPI commit (d782542) and trying to attach to an MPI job via LaunchMON (https://github.com/LLNL/LaunchMON), I get the following error:

--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 60 slots
that were requested by the application:
  /nfs/tmp2/lee218/prefix/stat-travis/bin/STATD

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------

I am on a system that allocates nodes via SLURM, and this is on 2 nodes with 64 tasks. The application was launched with mpirun/orterun with 4 MPI processes, which might explain why it is trying to fill the nodes with the remaining 64-4=60 slots. However, it should only try to launch 1 STATD daemon process per node. Let me know if there are more diagnostics that I can gather to help diagnose this. Note that I am able to attach TotalView to a similarly launched MPI job, so LaunchMON must be doing something different from TotalView for process acquisition/daemon launch.

Possibly related: if I instead launch the MPI application using all 64 tasks that were allocated, my attempt to attach a tool results in:

--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

Let me know if this should be submitted as a separate issue. I can provide additional diagnostics for this too if need be.
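
To make the slot arithmetic in the two cases above concrete, here is a minimal sketch. It only illustrates my guess about what is happening; it is not Open MPI's actual mapping logic, and the names `remaining_slots` and `daemons_wanted` are hypothetical. It assumes the 64 tasks are the total across both nodes.

```python
# Illustrative arithmetic only; NOT Open MPI's real slot accounting.
TOTAL_SLOTS = 64  # tasks in the SLURM allocation (from this report)
NODES = 2         # nodes in the allocation (from this report)


def remaining_slots(app_procs, total_slots=TOTAL_SLOTS):
    """Slots left over once the MPI application has been placed."""
    return total_slots - app_procs


def daemons_wanted(nodes=NODES, per_node=1):
    """What a tool attach should need: one STATD daemon per node."""
    return nodes * per_node


# The two launches described above: 4 MPI processes and all 64 tasks.
for app_procs in (4, 64):
    print(
        f"app procs={app_procs}: "
        f"remaining slots={remaining_slots(app_procs)}, "
        f"daemons wanted={daemons_wanted()}"
    )
```

With 4 application processes the remainder is 60, which matches the number of slots in the first error. With 64 the remainder is 0, which would be consistent with the "already filled" message. In both cases only 2 daemons should be needed.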
