Understanding error with stat v3.0.1 #16

Closed
pramodk opened this issue Feb 6, 2020 · 4 comments

Comments

@pramodk

pramodk commented Feb 6, 2020

On our existing cluster we have v3.0.1, and I have used it for more than a year without any issues. Recently I have been seeing the error below:

[Screenshot: 2020-02-06 at 17:09:22]

kumbhar@r1i4n33:/gpfs/bbp.cscs.ch/project/proj95/kumbhar/05-02-2020$ stat-gui
Traceback (most recent call last):
  File "/gpfs/bbp.cscs.ch/apps/tools/install/linux-rhel7-x86_64/gcc-4.8.5/stat-3.0.1-db6xhh/lib/python2.7/site-packages/STATview.py", line 3402, in set_dotcode
    if self.tabs[page].widget.set_dotcode(dotcode, filename, self.options["truncate"], self.options["max node name"]):
  File "/gpfs/bbp.cscs.ch/apps/tools/install/linux-rhel7-x86_64/gcc-4.8.5/stat-3.0.1-db6xhh/lib/python2.7/site-packages/STATview.py", line 2675, in set_dotcode
    xdot.DotWidget.set_dotcode(self, dotcode2, filename)
  File "/gpfs/bbp.cscs.ch/apps/tools/install/linux-rhel7-x86_64/gcc-4.8.5/stat-3.0.1-db6xhh/lib/python2.7/site-packages/xdot.py", line 1541, in set_dotcode
    xdotcode = self.run_filter(dotcode)
  File "/gpfs/bbp.cscs.ch/apps/tools/install/linux-rhel7-x86_64/gcc-4.8.5/stat-3.0.1-db6xhh/lib/python2.7/site-packages/xdot.py", line 1523, in run_filter
    universal_newlines=True
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
^Csrun: interrupt (one more within 1 sec to abort)
srun: step:92482.3 tasks 0-5: running
<Feb 06 17:09:25> <SigHandler> (INFO): A signal (2) received. Starting cleanup...
Terminated
<Feb 06 17:09:25> <STAT_BackEnd.C:1279> r1i4n33: STAT returned error type STAT_MRNET_ERROR: stream::recv() failure -1
<Feb 06 17:09:25> <STAT_BackEnd.C:1279> r1i6n23: STAT returned error type STAT_MRNET_ERROR: stream::recv() failure -1
<Feb 06 17:09:25> <STAT_BackEnd.C:1279> r1i6n24: STAT returned error type STAT_MRNET_ERROR: stream::recv() failure -1
<Feb 06 17:09:25> <STAT_BackEnd.C:1279> r1i6n26: STAT returned error type STAT_MRNET_ERROR: stream::recv() failure -1
<Feb 06 17:09:25> <STAT_BackEnd.C:1279> r1i6n25: STAT returned error type STAT_MRNET_ERROR: stream::recv() failure -1
<Feb 06 17:09:25> <STAT_BackEnd.C:1279> r1i4n34: STAT returned error type STAT_MRNET_ERROR: stream::recv() failure -1
kumbhar@r1i4n33:/gpfs/bbp.cscs.ch/project/proj95/kumbhar/05-02-2020$ srun: error: r1i6n26: task 5: Exited with exit code 2
srun: Terminating job step 92482.3
srun: error: r1i6n25: task 4: Exited with exit code 2
srun: error: r1i6n23: task 2: Exited with exit code 2
srun: error: r1i6n24: task 3: Exited with exit code 2
srun: error: r1i4n34: task 1: Exited with exit code 2
srun: error: r1i4n33: task 0: Exited with exit code 2
<Feb 06 17:09:25> <STATD.C:214> r1i4n34: STAT returned error type STAT_MRNET_ERROR: Failure in STAT BE main loop
<Feb 06 17:09:25> <STATD.C:214> r1i6n24: STAT returned error type STAT_MRNET_ERROR: Failure in STAT BE main loop
<Feb 06 17:09:25> <STATD.C:214> r1i4n33: STAT returned error type STAT_MRNET_ERROR: Failure in STAT BE main loop
<Feb 06 17:09:25> <STATD.C:214> r1i6n25: STAT returned error type STAT_MRNET_ERROR: Failure in STAT BE main loop
<Feb 06 17:09:25> <STATD.C:214> r1i6n23: STAT returned error type STAT_MRNET_ERROR: Failure in STAT BE main loop
<Feb 06 17:09:25> <STATD.C:214> r1i6n26: STAT returned error type STAT_MRNET_ERROR: Failure in STAT BE main loop
<Feb 06 17:09:25> <SigHandler> (INFO): Aborting...
---------------------------

The specified directory has:

kumbhar@r1i4n33:/gpfs/bbp.cscs.ch/project/proj95/kumbhar/05-02-2020$ ls stat_results/a.out.0000/
00_a.out.0000.2D.dot  a.out.0000.fulltop  a.out.0000.perf  a.out.0000.ptab  a.out.0000.top

The application is a simple hello-world MPI program.

Could you suggest what the issue might be? It would be a great help!

@lee218llnl
Collaborator

Would you be able to attach the .dot file?

@pramodk
Author

pramodk commented Feb 6, 2020

Sure! stat_results.zip

Let me know if I should try something. (By the way, I am also on the Spack Slack in case a quick follow-up is needed.)

@lee218llnl
Collaborator

I was able to open the file with stat-view stat_results/a.out.0000/00_a.out.0000.2D.dot

Given the errors about subprocess, I'm wondering if STAT is having trouble finding the Graphviz dot executable. You may want to run stat-gui under strace -f to debug further. Have there been system updates since you installed STAT? What happens if you run which dot?
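
For anyone who prefers to do the `which dot` check from Python, here is a minimal sketch. It assumes Python 3.3 or later for `shutil.which` (the STAT install in the traceback uses Python 2.7, where this function is not available), and `find_executable` is a hypothetical helper name, not something provided by STAT or xdot.

```python
import os
import shutil


def find_executable(name, path=None):
    """Return the full path of `name` if it is on the search path, else None.

    Hypothetical helper for illustration; it is not part of STAT or xdot.
    When `path` is None, the PATH environment variable is searched.
    """
    return shutil.which(name, path=path)


if __name__ == "__main__":
    dot_path = find_executable("dot")
    if dot_path is None:
        # Listing the searched directories makes a missing entry easy to spot.
        print("dot not found; directories searched:")
        for directory in os.environ.get("PATH", "").split(os.pathsep):
            print("  " + directory)
    else:
        print("dot found at " + dot_path)
```

If this reports that dot is not found, the `OSError: [Errno 2]` raised from `subprocess.py` in the traceback is the expected result, since the child process cannot be started.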

@pramodk
Author

pramodk commented Feb 6, 2020

Wow! You were absolutely right! I added graphviz to $PATH and:

[Screenshot: 2020-02-06 at 22:12:56]

Thank you very much for the quick response! I really appreciate it!

A slightly more explicit error message would have been useful.
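
To illustrate what a more explicit message could look like, here is a rough sketch of wrapping the `subprocess.Popen` call so that a missing executable is reported by name. This is not the actual code in xdot.py or STATview.py: the `run_filter` below is a hypothetical standalone function written for Python 3, and the wording of the message is only a suggestion.

```python
import errno
import subprocess


def run_filter(command, input_text):
    """Pipe input_text through an external command and return its stdout.

    Hypothetical sketch; the real run_filter in xdot.py may differ.
    """
    try:
        process = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError as exc:
        # ENOENT here most likely means the executable is not on $PATH.
        if exc.errno == errno.ENOENT:
            raise RuntimeError(
                "Could not run %r: executable not found. "
                "Is Graphviz installed and on $PATH?" % command[0]
            )
        raise
    output, _ = process.communicate(input_text)
    return output
```

A call such as `run_filter(["dot", "-Txdot"], dotcode)` would then fail with a message that names `dot` instead of the bare `No such file or directory`. The `-Txdot` argument is an assumption about how the viewer invokes Graphviz.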
