[ML] Processes that fail to connect to the JVM within a reasonable time should exit #1504

@droberts195

Debugging the problems related to #1503 and elastic/elasticsearch#62823 has highlighted that if an ML autodetect, normalize or data_frame_analyzer process fails to connect to the JVM (for example because the JVM gives up trying to open its named pipes) then that process will hang indefinitely. The C++ process blocks waiting for something to connect to the other end of its named pipes, and nothing other than such a connection wakes it up.

CThread has functionality for waking up blocking calls that we can use to avoid this situation. Before attempting to connect its named pipes, a process should start a thread that interrupts the wait for a connection to the other end of the pipe after a certain time, so that the process can exit instead of hanging. The question is what timeout to use:

  1. Ideally this would be based on the xpack.ml.process_connect_timeout setting. That would have to be passed to each C++ process in a command line argument.
  2. The alternative would be a hardcoded timeout higher than any reasonable value for xpack.ml.process_connect_timeout, say 10 minutes.
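
As a rough illustration of the watchdog idea, here is a minimal sketch using only the C++ standard library. It does not use CThread, whose API is not shown here. The class name `ConnectWatchdog` and the `onTimeout` callback are hypothetical. In a real process the callback would be whatever wakes the blocked pipe connection, and the timeout would come from either of the two options above.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper, not part of ml-cpp: runs onTimeout on a background
// thread unless cancel() is called before the timeout elapses.
class ConnectWatchdog {
public:
    ConnectWatchdog(std::chrono::milliseconds timeout, std::function<void()> onTimeout)
        : m_Thread([this, timeout, onTimeout = std::move(onTimeout)] {
              std::unique_lock<std::mutex> lock(m_Mutex);
              // wait_for returns false if the timeout elapsed while
              // m_Cancelled was still false.
              if (!m_Cv.wait_for(lock, timeout, [this] { return m_Cancelled; })) {
                  m_Fired = true;
                  lock.unlock();
                  onTimeout(); // assumption: this unblocks the connect call
              }
          }) {}

    // Cancels a pending timeout (if any) and waits for the thread to finish.
    ~ConnectWatchdog() {
        cancel();
        if (m_Thread.joinable()) {
            m_Thread.join();
        }
    }

    ConnectWatchdog(const ConnectWatchdog&) = delete;
    ConnectWatchdog& operator=(const ConnectWatchdog&) = delete;

    // Call once the pipes have connected successfully.
    void cancel() {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            m_Cancelled = true;
        }
        m_Cv.notify_all();
    }

    // True once the timeout has elapsed without cancellation.
    bool fired() const { return m_Fired.load(); }

private:
    std::mutex m_Mutex;
    std::condition_variable m_Cv;
    bool m_Cancelled{false};
    std::atomic<bool> m_Fired{false};
    // Declared last so the members above are initialised before the thread starts.
    std::thread m_Thread;
};
```

The intended usage would be to construct the watchdog just before the blocking connect, call `cancel()` as soon as the connection succeeds, and treat `fired()` returning true as the signal to log an error and exit. Note that this sketch only models the timing. Actually interrupting a thread blocked in a named pipe open is platform specific, which is the part CThread's existing functionality would have to cover.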
