Diagnostics about Elasticsearch client sockets #134362

Closed
rudolf opened this issue Jun 14, 2022 · 8 comments
Labels
Feature:elasticsearch, performance, Supportability, Team:Core

rudolf commented Jun 14, 2022

The Elasticsearch-js client is currently configured with maxSockets: Infinity, which means connections aren't being reused, so every outgoing request has to establish a new TCP connection + TLS handshake. We know we need to reduce this value, but it's really hard to choose an appropriate number.

In order to tune this value we need to expose the number of actual sockets used by the http agent that the Elasticsearch-js client is using. That way we could log a warning when Kibana hits the limit. If, at the limit, the event loop is still healthy, performance could be improved by increasing the number of sockets.

In addition to a warning when we're at the limit, it would be useful to expose the number of sockets as part of our monitoring data as well as through the ops.metrics logger.
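
For illustration, here is a minimal sketch (not existing Kibana code) of how socket usage could be read from the Node.js http.Agent behind the client, using the agent's standard sockets / freeSockets / requests maps; the logger shape and the saturation check are placeholders:

```ts
import http from 'http';

// Count entries across the agent's per-origin buckets (origin -> array).
function countByOrigin(bucket: { readonly [origin: string]: unknown[] | undefined }): number {
  return Object.values(bucket).reduce((sum, list) => sum + (list?.length ?? 0), 0);
}

export function getAgentSocketStats(agent: http.Agent) {
  return {
    active: countByOrigin(agent.sockets),    // sockets currently serving a request
    idle: countByOrigin(agent.freeSockets),  // keep-alive sockets waiting for reuse
    queued: countByOrigin(agent.requests),   // requests waiting for a free socket
  };
}

// Hypothetical usage: warn when the agent is saturated.
export function warnIfSaturated(agent: http.Agent, log: { warn: (msg: string) => void }) {
  const { active, idle, queued } = getAgentSocketStats(agent);
  // Requests only queue up once the agent has hit its socket limits,
  // so a non-empty queue is a reasonable proxy for "we are at maxSockets".
  if (queued > 0) {
    log.warn(`ES client agent saturated: ${active} active / ${idle} idle sockets, ${queued} queued requests`);
  }
}
```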

Context:
#112756 (comment)

rudolf added the Team:Core and performance labels Jun 14, 2022
@elasticmachine

Pinging @elastic/kibana-core (Team:Core)

rudolf added the Supportability label Jun 15, 2022
@gsoldevila

After experimenting a bit with the elasticsearch-js client and with Node's HTTP Agents, the concepts are much clearer in my head now. Just to make sure we are all on the same page:

ATM connections will be reused if they are idle*.

The maxSockets parameter determines how many concurrent sockets the agent can have open per origin.

The maxTotalSockets is a more global limit that applies to all origins of the connections managed by the agent.
Note that agents have one pool of sockets for each origin.

When calling Node's http.request(), the corresponding agent first checks the pool matching the desired origin to see if there are any idle sockets available. If there aren't any, it will check maxSockets and maxTotalSockets to see if it can open a new one. Finally, idle connections are closed / removed from the pool after the keepAliveMsecs timeout.

Let's say we have an Agent that connects to ES nodes A, B and C, and we define maxSockets: 5; maxTotalSockets: 10. In that scenario, we will have at most 5 concurrent connections to each origin (A, B and C), but not more than 10 in total.


*socket still open thanks to the keepAlive, but no request or response travelling through.
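
A minimal plain-Node sketch of the behaviour described above (not the elasticsearch-js client itself); host names and limits are illustrative:

```ts
import http from 'http';

const agent = new http.Agent({
  keepAlive: true,       // keep idle sockets open so they can be reused
  keepAliveMsecs: 1000,  // initial delay for TCP keep-alive probes on kept-alive sockets
  maxSockets: 5,         // at most 5 concurrent sockets per origin (host:port)
  maxTotalSockets: 10,   // at most 10 concurrent sockets across all origins
});

// With ES nodes A, B and C, the agent keeps one socket pool per origin,
// each capped at 5, while the total across the three never exceeds 10.
for (const host of ['es-node-a', 'es-node-b', 'es-node-c']) {
  http.get({ host, port: 9200, path: '/_cluster/health', agent }, (res) => {
    res.resume(); // drain the response so the socket can return to the free pool
  });
}
```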

gsoldevila commented Aug 31, 2022

Currently, we are creating multiple elasticsearch-js Client instances through Kibana's ClusterClient class.

Core's elasticsearch service contract exposes a createClient(...) method that accepts a type parameter and creates a new ClusterClient() instance behind the scenes.
Each ClusterClient creates 2 elasticsearch-js Client instances (root vs scoped, which use different credentials).

Each elasticsearch-js Client instance creates a connection pool with one connection per ES node, and each connection uses its own independent Agent instance.

Thus, with the current implementation, each agent is targeting only a single origin, and it will manage a single pool of sockets:

  • We are not exploiting the agent's "multi origin" capabilities.
  • We can't benefit from the maxTotalSockets parameter.

As a result, in our deployments we have a bunch of elasticsearch-js Client instances, each one using multiple agents (one per ES node). This makes it quite difficult to monitor (let alone limit) the number of open connections.
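
As a rough illustration of what aggregated monitoring could look like, here is a hypothetical sketch that assumes Core keeps a reference to every Agent it hands to a client (e.g. via a custom agent factory); none of these helpers exist today:

```ts
import http from 'http';

const trackedAgents = new Set<http.Agent>();

// Hypothetical factory: if Core created every agent through something like
// this, it could enumerate them later regardless of which Client owns them.
export function createTrackedAgent(options?: http.AgentOptions): http.Agent {
  const agent = new http.Agent({ keepAlive: true, ...options });
  trackedAgents.add(agent);
  return agent;
}

// Aggregate open sockets across all known agents (and thus across all Client
// instances and all ES nodes), e.g. for ops.metrics or monitoring documents.
export function totalOpenSockets(): { active: number; idle: number } {
  let active = 0;
  let idle = 0;
  for (const agent of trackedAgents) {
    for (const sockets of Object.values(agent.sockets)) active += sockets?.length ?? 0;
    for (const sockets of Object.values(agent.freeSockets)) idle += sockets?.length ?? 0;
  }
  return { active, idle };
}
```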


gsoldevila commented Aug 31, 2022

With that in mind, we can consider multiple initiatives:

  • Monitoring and limiting the number of open connections (this issue).
    • We must add some telemetry in order to find out where we stand.
    • We might need to define an appropriate limit for the number of open connections, in order to protect the event loop and maximise performance.
  • Reducing the number of ES Client instances to minimise memory consumption and improve performance: Reduce the number of Http Agent instances #139809.

@pgayvallet

Reducing the number of ES Client instances

Would having all the instantiated clients share a common parent to reuse the same ConnectionPool work here?

I guess not, given the ConnectionPool option passed to the client constructor is a class, and not an instance, right?

@gsoldevila

@pgayvallet yes, that works. When calling child() we inherit the connection pool.
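
For reference, a minimal sketch of what pool sharing via child() looks like with the elasticsearch-js API; the node URL, credentials and headers are placeholders:

```ts
import { Client } from '@elastic/elasticsearch';

// 'Root' client: owns the connection pool (and therefore the underlying sockets).
const rootClient = new Client({
  node: 'https://localhost:9200',
  auth: { username: 'kibana_system', password: 'changeme' },
});

// Child client: shares the parent's connection pool, so no additional sockets
// are opened; only per-request options such as headers differ.
const scopedClient = rootClient.child({
  headers: { authorization: 'Bearer <user-token>' },
});
```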

@pgayvallet

Hum, good to know. So in theory, we could have a way for all ES client instances to use the same connection pool (so connection, and therefore agent, if I followed correctly) by having all the clients created within ClusterClient inherit from the same 'root' client?

We would need to check that all the options we're using (in parseClientOptions and configureClient) work properly / can be used when calling parent.child() instead of creating a new instance.


gsoldevila commented Aug 31, 2022

Wait, not so fast. It works in the sense that it allows sharing the Agent + pools across the instances.
But it does not cover all of Kibana's cases, because some es-js Clients are created with different, incompatible configurations, which makes it impossible to have a unique Agent instance.

Plus there is the problem that the ClusterClient has a few methods (update(), empty(), ...) that impact the underlying pool and connections, and having this pool shared across instances might cause problems if/when these methods are called. Now that I think about it, instances created using child() are exposed to that problem ATM.
