We are seeing an intermittent issue in which both the Qmaster and the execd on some hosts crash at random.
The logs repeatedly show the following fatal message right before the daemon stops:
XXX|C|!!!!!!!!!! got nullptr element for Jb_owner !!!!!!!!!!
Observed Behavior
• The message appears randomly on different hosts (Qmaster or execd).
• Once the message is printed, the daemon immediately exits.
• Restarting the service temporarily resolves the issue, but the crash eventually recurs.