MDSplus dispatcher opens lots of files #2731
I think you need to add ...

I am happy to meet this afternoon to discuss...
According to Mitchell's post on another issue, Atlas has been running ...
There's likely more to this than open files, as enabling core dumps against the ... I haven't looked at this too closely, but for the sake of notes I'll just quickly post the ...

EDIT: Note that, as of the time of this edit, the following ...

A current ...
Systemd contents:

mds-8002: ...

mds-8003: ...
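The actual unit files did not survive in this thread. For illustration only, a minimal sketch of what a systemd unit for an mdsip action server of this kind might look like; the port, paths, and option values below are assumptions, not the atlas configuration:

```ini
# Hypothetical example only -- not the actual atlas unit file.
[Unit]
Description=MDSplus action server on port 8002
After=network.target

[Service]
# -p port, -m multi-connection server mode, -h access-control hosts file
ExecStart=/usr/local/mdsplus/bin/mdsip -p 8002 -m -h /etc/mdsip.hosts
Restart=always
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
```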
OS (atlas.gat.com): ...
MDSplus Version: ...
To explain ...

Please note that the error mentioned occurs after ~1,000 action nodes are triggered. This happens whether it is 1k actions in a single shot or spread across multiple shots.
Thanks for all the info.
Hi @sflanagan -- Josh has succeeded in reproducing the issue with the current alpha on Ubuntu. We are now debugging the problem and will post additional updates here later this afternoon.
Took some time to configure my dev environment, but I am now also able to reproduce the bug. I will now attach the debugger and figure out how the code is leaving dangling sockets.
Progress has been made on debugging the issue, but there is still more to do to locate the root cause of the dangling sockets. I have provided my initial findings to Josh and Stephen for review and suggestions.
@mwinkel-dev Glad you were able to reproduce. Did the workaround proposed by @joshStillerman work?
Hi @smithsp and @sflanagan -- I will test Josh's workaround right now. I'm sure it will work. I have been manually killing the ... process. Current investigation shows that the dangling sockets gradually grow until the process that launches the actions (such as ...) hits its open-file limit.
Hi @smithsp and @sflanagan -- I have tested Josh's recommended workaround and it is A-OK. Here are the details of the experiment . . .
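The experiment details did not survive in this thread. One way to observe the dangling sockets during such an experiment, assuming the action server is an mdsip process (the process pattern and port below are assumptions):

```sh
# Count open TCP sockets held by the action-server process.
# Substitute the actual mdsip/dispatcher invocation on your system.
PID=$(pgrep -f 'mdsip -p 8002' | head -n1)
lsof -p "$PID" | grep -c TCP
# Repeat after dispatching a batch of actions; a steadily growing
# count indicates leaked "to" sockets.
```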
In short, Josh's workaround works fine on my Ubuntu 20 development system. Let us know if you have any problems with this workaround on GA's RHEL8 server.
Hi @smithsp and @sflanagan -- Root cause of the dangling sockets in ... has been identified. It will take us a few days to fix the issue (i.e., iterate on the design, manual testing, and peer review). While investigating this issue, I have also been using the systemd workaround (so I have a clean slate for each experiment). The workaround is working fine on my development system. Let me know if you have any problems with the systemd workaround on GA's server.

Addendum: ...
Hi @sflanagan -- How many "actions" are issued when running a single DIII-D shot? How many "action servers" are involved? And how many shots per day? I would like that information so that I can design a test case that puts many times more load on the "action server" than the DIII-D load does.
Assuming, as an upper bound, that everything is enabled and ops is having a long and/or productive run day...

Actions per shot: ~300
Action servers: 1
Shots per day: ~50
I redesigned my ... Where the ... It seems "fine" as a temporary workaround, but it does leave me open to skipping/missing a shot if the dispatcher service is restarting while the DAQ system is trying to start a new shot cycle. Trying to reduce that risk is next up on my agenda (if I can).
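A sketch of one way to implement a periodic-restart workaround of the kind described, using a systemd timer; the unit names and the restart interval are assumptions:

```ini
# dispatcher-restart.timer -- hypothetical unit name and interval.
[Unit]
Description=Periodically restart the MDSplus dispatcher to clear leaked sockets

[Timer]
OnCalendar=hourly
Unit=dispatcher-restart.service

[Install]
WantedBy=timers.target

# --- dispatcher-restart.service (separate file) ---
# Oneshot that bounces the dispatcher unit; substitute the real unit name.
# [Service]
# Type=oneshot
# ExecStart=/usr/bin/systemctl restart mds-8002.service
```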
Hi @sflanagan -- Thanks for the information about the number of actions in a DIII-D shot. And for the news regarding the ...
Now have a prototype fix. Next step is to do some manual stress testing, followed by peer review. Will take a few more days before there is a final fix that will be merged to alpha and cherry-picked to the GA branch.
Hi @smithsp and @sflanagan,

Summary
This problem was introduced with PR #2288 on 19-May-2021. The bug appears to only affect the ...

Details
The issue is that the "connection ID table" in the ... The two workarounds clean up the dangling "to" sockets. Adding an "exit" statement at the end of a ...

The proposed fix does not alter any logic regarding sockets and connections. All that it does is allow the "receiver" thread to access the "connection ID table" so that the "to" socket can be closed. Preliminary testing confirms that the fix works and that ...

Low Risk?
... And because the problem appears to be limited to the ...

Note also that this bug has been present in the ...

Clarification
...
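Because the details above were truncated, here is a schematic sketch of the mechanism being described: a receiver thread that looks up an action's entry in a shared "connection ID table" and closes its "to" socket under a lock. All names and structures below are assumptions for illustration, not the actual MDSplus code:

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical connection-ID table entry; the real MDSplus structures differ. */
typedef struct {
    int id;
    int to_socket;   /* socket used to send the action result back */
    int in_use;
} conn_entry_t;

#define MAX_CONN 1024
static conn_entry_t conn_table[MAX_CONN];
static pthread_mutex_t conn_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Close and release the "to" socket for a finished action.
 * Holding the lock keeps the dispatch and receiver threads from
 * racing on the same table slot (the unprotected critical section
 * discussed later in this thread). */
void close_to_socket(int conn_id)
{
    pthread_mutex_lock(&conn_table_lock);
    for (int i = 0; i < MAX_CONN; i++) {
        if (conn_table[i].in_use && conn_table[i].id == conn_id) {
            close(conn_table[i].to_socket);
            conn_table[i].in_use = 0;
            break;
        }
    }
    pthread_mutex_unlock(&conn_table_lock);
}
```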
Hi @sflanagan -- Questions about your site's single "action server" . . .
...

Answers to the above questions will enable us to set up a cross-version testing environment, if needed.
A stress test of 1,200 actions (400 actions per shot, 3 shots) reveals that there is an unprotected critical section that can cause a segfault. The segfault also occurs in the ...

Adding a delay of 0.01 seconds to each action eliminates the segfault, but causes the proposed fix to leave ~10 dangling sockets (i.e., it closes the other 1,190 "to" sockets). Additional investigation is required to determine if the dangling sockets are caused by a second unprotected critical section, and if so, whether it was introduced with the proposed fix. Increasing the delay to 0.1 seconds allows the proposed fix to work fine -- no dangling sockets.
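For reference, a stress loop of the kind described might look like the following, where dispatch_action is a hypothetical helper standing in for whatever mechanism actually dispatches an action:

```sh
# Hypothetical stress loop: 400 actions per shot, 3 shots, with the
# per-action delay (0.01 s here) discussed above.
for shot in 1 2 3; do
    for action in $(seq 1 400); do
        ./dispatch_action "$shot" "$action" &   # hypothetical helper
        sleep 0.01
    done
done
wait
```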
I slowly remember the struggle we had sorting out the existing problem at the time while keeping backward compatibility. The issue was that the old protocol required a reconnection on an offered port to send the result of the action. The original goals that I remember were to support: ...
Hi @zack-vii -- Thanks for the history of the change. Much appreciated! I will expand my experiments to see if my proposed fix for the leaking sockets breaks any of the features you listed above.
When configured ..., the actions are failing because their associated "job callback" functions are misbehaving. Now investigating whether the "job callback" problem is a multi-threading issue.
Hi @smithsp and @sflanagan, I am now compiling and testing the proposed fix (PR #2740) that was created by @zack-vii. Results of the test will be posted in that PR and summarized in this issue. Note that @zack-vii was the author of PRs #2288 and #2289 that involved the ...
PR #2740 by @zack-vii is a server-side fix and thus more elegant than the client-side one (i.e., ...). However, there is an edge case in ... The complete fix of this issue will likely consist of two or three PRs.
@mwinkel-dev @zack-vii |
Note that the provided workaround of letting the system restart the service is not robust. We look forward to a fix as soon as possible.
Hi @smithsp -- Thank you for the update. I will put that news on the agenda for Tuesday's meeting of the MDSplus software team. |
Hi @smithsp and @sflanagan, Although the temporary workarounds are not an adequate solution for GA, it would nonetheless be helpful for us to learn what aspect of the workaround is failing for GA. (That information might help us create additional test cases for the real fix.) On 1-Apr-2024, @sflanagan posted that the temporary fix does mean that a shot could be missed (which of course is a serious problem). However, are there now additional problems with the temporary workarounds? Also, which of these workarounds has GA tried? ...

where "action_server" is whatever the name is of the ...
Hi @smithsp and @sflanagan, It is my understanding that the GA branch should be based on ...
@mwinkel-dev The version is in #2731 (comment) (repeated here): ...
Hi @smithsp, Apparently, I need to drink more coffee and wake up. Which is to say that I'm amused to learn from your post that the version information was there all along, had I just scrolled up to the top of this GitHub issue. Regardless, thanks for confirming the version again. The reason I was double-checking was to make sure we create the GA branch from the correct commit. Today, we will be cherry-picking PR #2735 and PR #2740 into the branch. Here are the tasks: ...
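The task list itself was lost, but the branch-and-cherry-pick workflow described would look roughly like this; the branch name, base commit, and merge SHAs are placeholders:

```sh
# Create the GA branch from the commit currently deployed on atlas
# (placeholder values), then cherry-pick the two fixes.
git checkout -b ga_atlas <commit-on-atlas>
git cherry-pick <sha-of-PR-2735-merge>
git cherry-pick <sha-of-PR-2740-merge>
git push origin ga_atlas
```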
If all goes according to plan, the above tasks will all be completed this afternoon or evening. When the release is ready, GA will have two options: ...
Trial run of GA branch / build is underway. Now have RPMs on a Rocky8 system for testing on Friday. Some minor build issues still to be addressed (major-minor-release labelling scheme for the branch, and so forth). |
Hi @smithsp and @sflanagan, What "tagging" scheme would you like us to use for the GA branch (i.e., values for "branch-major-minor-release")? GA release zero is identical to ... What should the next release of the GA branch be called? (This will be the release with the cherry-picks of PR #2735 and PR #2740.)
I propose the next release be labeled with ...
Hi @smithsp and @sflanagan, Thanks for the answer. We will start with ...
Just dispatched 2,000 actions on Rocky Linux 8.9 without leaking any sockets. Details of the trial branch build and test . . .
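The build and test details were lost; one simple way to verify "no leaked sockets" after such a run is to compare the process's file-descriptor count before and after (DISPATCHER_PID is a placeholder):

```sh
# Snapshot the dispatcher's fd count before and after dispatching the
# 2,000 actions; the two numbers should match if nothing leaked.
before=$(ls /proc/"$DISPATCHER_PID"/fd | wc -l)
# ... dispatch the 2,000 actions ...
after=$(ls /proc/"$DISPATCHER_PID"/fd | wc -l)
echo "fds before=$before after=$after"
```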
Many of the preceding posts by @mwinkel-dev were initial conjectures and thus incorrect. Now that the investigation has been completed, here is a summary of the root cause.

It is expensive to create connections, and thus the architecture of ... reuses the dispatch connection. In normal usage, ...

And this is how the software changed over the years . . .

Release ...

PR #2288 refactored the software and inadvertently omitted reuse of the dispatch connection. It created a new connection for each action dispatched, and thus leaked sockets.

The proposed fix created by @mwinkel-dev incorrectly assumed that the architecture was indeed supposed to create a new connection per action, and also incorrectly assumed that there was a receiver thread created for each action. (Maintenance developers rely on comments in the source code to guide them; when those comments are missing, it is easy to make bad assumptions.) This proposed fix did solve the leaking-socket issue, but had other problems (such as deadlock).

PR #2740 corrects the omission of PR #2288 and reuses the dispatch connection, thus eliminating the leaking sockets. Although some code cleanup was done on the "action service", the primary fix is actually in the client-side code.
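To illustrate the architectural point above, here is a minimal sketch contrasting the two patterns. All function and variable names are hypothetical stand-ins, not the MDSplus source:

```c
#include <stdio.h>

/* Stubs standing in for the real networking calls (assumptions). */
static int next_fd = 3;
static int connect_to_server(const char *server)
{
    printf("new connection to %s (fd %d)\n", server, next_fd);
    return next_fd++;
}
static void send_action(int conn, const char *action)
{
    printf("fd %d <- %s\n", conn, action);
}

/* Leaky pattern (what PR #2288 inadvertently introduced): a brand-new
 * connection per dispatched action, never closed or reused. */
void dispatch_action_leaky(const char *server, const char *action)
{
    int conn = connect_to_server(server);
    send_action(conn, action);
}

/* Intended pattern (restored by PR #2740): create the dispatch
 * connection once, then reuse it for every subsequent action. */
static int dispatch_conn = -1;
void dispatch_action(const char *server, const char *action)
{
    if (dispatch_conn < 0)
        dispatch_conn = connect_to_server(server);
    send_action(dispatch_conn, action);
}

int main(void)
{
    for (int i = 0; i < 3; i++)
        dispatch_action("atlas", "some_action");  /* one connection total */
    return 0;
}
```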
@mwinkel-dev ...
Hi @smithsp, Thanks for the decision. Therefore, we will include PR #2746 only if two conditions are met: ...
If PR #2746 is not included in the forthcoming tarball (aka "ga_atlas_1.0.0"), my recommendation is that GA add it to the list of things to be included in an eventual subsequent release ("ga_atlas_1.0.1"). Regarding the tarball, I will post an update on Wednesday 24-Apr-2024 with an updated schedule. (I was out of the office today, so must check with my colleagues to find out the present status of the GA branch and tarball.)
Hi @smithsp, With guidance from @WhoBrokeTheBuild, the official GA branch is now available. It is named ... and it contains the cherry-picks of PR #2735 and PR #2740. (PR #2746 was not included because it is still in the review process.) The history of the ...
The ... Stephen informed me that the Jenkins build server is not configured to build the ... I am also creating a tarball of the RHEL8 RPMs so that I can install the ...
Timeline of fix (in calendar days, not work days) and lessons @mwinkel-dev learned . . .
Step 2 was the result of undocumented source code and wrong assumptions about the software architecture. To prevent this from occurring again, PR #2746 adds comments to help future maintenance programmers. Lesson learned = seek input from others sooner.

Six days elapsed between Steps 4 and 6. The fix was merged at Step 4 and was thus available to the customer, but not in the form the customer requested. Lesson learned = better coordination saves time.
The ...
Hi @smithsp -- Thanks for the update. Am glad to read that the ...
Affiliation
GA/DIII-D
Version(s) Affected
Our production server, atlas, currently suffers this bug. @sflanagan will provide the version at some point.
Platform
RHEL8
Describe the bug
The MDSplus dispatcher leaves files open, eventually filling up the open-file limit. Somehow the increased limit is not honored by the OS or the dispatcher, even after attempting to raise it in various ways in various configuration files.
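For reference, the usual places to raise and verify the limit; whether these apply here depends on how the dispatcher is launched (e.g., /etc/security/limits.conf is ignored for systemd services):

```sh
# If the dispatcher runs as a systemd service, the limit must be set in
# the unit file itself:
#   [Service]
#   LimitNOFILE=65536

# Verify what the running process actually got (placeholder PID):
prlimit --pid "$DISPATCHER_PID" --nofile
grep 'open files' /proc/"$DISPATCHER_PID"/limits
```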
To Reproduce
Our production server, atlas, currently suffers this bug. Details to be provided at some point by @sflanagan.
Expected behavior
I expect that the dispatcher will close its own files as it goes, without accumulating open files.
Additional context
Any fix provided needs to be on a branch starting from the version that is currently on atlas.