@mhl-b commented Nov 14, 2025

A high-level overview of networking for the distributed architecture guide. ES-7886

@elasticsearchmachine added the needs:triage and v9.3.0 labels Nov 14, 2025
@mhl-b added the Team:Distributed Coordination, >non-issue, and :Distributed Coordination/Distributed labels and removed the needs:triage label Nov 14, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@mhl-b requested review from DaveCTurner and ywangd November 14, 2025 18:25
Contributor

@DaveCTurner left a comment

I left a few factual nits, nothing major. It needs some editorial polishing too - can we use any of our fancy new AI tools to help with that?

Comment on lines 26 to 27
be HTTP spec compliant, Elastic is not a webserver. We support GET requests with
payload (some old proxies might drop content), requests cannot be cached by
Contributor

Maybe mention that in these cases we also support the same API with a different verb, normally POST, for clients who cannot send a GET-with-body.


HTTP transport provides two options for content processing: aggregate fully and
incremental. Aggregated content is a preferable choice for a small messages that
cannot be parsed incrementally (like JSON). But aggregation has drawbacks, it
Contributor

cannot be parsed incrementally (like JSON)

Mmm sorta depends what you mean by "incrementally" but I'd say that the SAX-style parsing we do is in fact working incrementally. It doesn't really make sense to start parsing before we've received the whole body, but that's different from saying "cannot".

Comment on lines 78 to 79
The job of the REST handler is to parse and validate HTTP request and construct
typed version of request, often Transport request (see Transport section below).
Contributor

Might be worth pointing out here that (if security is enabled then) authentication happens before the REST handler but authorization happens after (when entering the transport layer).


`Transport` is an umbrella term for a node-to-node communication. It's a
TCP-based custom binary protocol. Every node in a cluster is a client and server
at the same time. Node-to-node communication never uses HTTP transport.
Contributor

never

except for reindex-from-remote :)

connections for different purposes: ping, node-state, bulks, etc. Pool structure
is defined in `ConnectionProfile` class.

ES has resilience to disconnects, frequent reconnects, but in general we assume
Contributor

resilience

I don't think we are particularly resilient to frequent reconnects. At least, not without being more specific. ES never behaves incorrectly (e.g. loses data) in the face of network outages but it may become unavailable unless the network is stable.


Another area of different networking clients is snapshotting. ES supports
snapshotting to remote repositories. These repositories usually come with their
own SDK and networking stack. For example AWS SDK comes with tomcat or netty,
Contributor

Although the Apache client we use may have originally come from Tomcat, it's not called that any more and this'll confuse folks that don't know the history. Let's just call it the Apache client.

Contributor Author

Apache indeed, no idea why I wrote Tomcat.

be forked to another thread pool. But forking comes with overhead, doing forking
on every tiny request is a wasted CPU work. As a rule of thumb: don't fork
simple requests that can be served from memory and do not require heavy
computations (seconds), otherwise fork.
Contributor

A pause of "seconds" is still pretty disastrous

Suggested change
computations (seconds), otherwise fork.
computations (milliseconds), otherwise fork.
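
A minimal sketch of the rule of thumb quoted above, assuming a request can be classified as cheap ahead of time. The `handle` function, the `cheap` flag, and the pool name are hypothetical and only illustrate the trade-off between serving inline and forking; they are not Elasticsearch APIs.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical worker pool standing in for "another thread pool".
worker_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker")


def handle(request: dict) -> Future:
    """Run cheap requests on the calling thread and fork everything else."""

    def work() -> str:
        return f"{request['name']} ran on {threading.current_thread().name}"

    if request["cheap"]:
        done: Future = Future()
        done.set_result(work())      # served inline: no hand-off cost
        return done
    return worker_pool.submit(work)  # forked: the calling thread stays free for other connections


inline = handle({"name": "ping", "cheap": True}).result()
forked = handle({"name": "search", "cheap": False}).result()
print(inline)
print(forked)
worker_pool.shutdown()
```

The inline path avoids the hand-off entirely, which is why forking every tiny request wastes CPU, while the forked path keeps the calling thread available.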

One of the performance edges of netty is controlled memory allocation. Netty
manages byte buffer pools and reuse them heavily. This performance gain comes
with a cost. And cost is reference counting on developer's shoulders. Netty
reads socket bytes into pooled byte-buffers and passes them to application. Then
Contributor

Might be worth saying a bit more about how this looks in a heap dump - there's a pool of 1MiB byte[] objects which Netty slices into 16KiB pages, not all of which may be in use, so you have to account for this when investigating memory usage.
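
To make that heap-dump accounting concrete, here is a small sketch of the arithmetic, assuming the 1MiB chunk and 16KiB page sizes mentioned in this comment. The function name and the sample counts are hypothetical.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB byte[] per pooled chunk (size taken from the comment)
PAGE_SIZE = 16 * 1024     # 16 KiB page (size taken from the comment)
PAGES_PER_CHUNK = CHUNK_SIZE // PAGE_SIZE


def pool_usage(chunks: int, pages_in_use: int) -> dict:
    """Split the heap retained by pooled chunks into in-use and idle bytes."""
    total_pages = chunks * PAGES_PER_CHUNK
    if not 0 <= pages_in_use <= total_pages:
        raise ValueError("pages_in_use must be between 0 and the total page count")
    retained = chunks * CHUNK_SIZE
    in_use = pages_in_use * PAGE_SIZE
    return {
        "retained_bytes": retained,
        "in_use_bytes": in_use,
        "idle_bytes": retained - in_use,
    }


# Hypothetical heap dump: 8 chunks are visible, but only 100 pages are handed out.
usage = pool_usage(chunks=8, pages_in_use=100)
print(PAGES_PER_CHUNK, usage)
```

The point of the split is that the retained size reported for the byte[] objects overstates what requests are actually holding at that moment.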

One of the performance edges of netty is controlled memory allocation. Netty
manages byte buffer pools and reuse them heavily. This performance gain comes
with a cost. And cost is reference counting on developer's shoulders. Netty
reads socket bytes into pooled byte-buffers and passes them to application. Then
Contributor

Also Netty technically reads the bytes into a direct buffer and then CopyBytesSocketChannel copies them into the pooled buffer.

Another area of different networking clients is snapshotting. ES supports
snapshotting to remote repositories. These repositories usually come with their
own SDK and networking stack. For example AWS SDK comes with tomcat or netty,
Azure with netty-based project-reactor, GCP uses default java HTTP
Contributor

Netty needs a capital N (here and in several other places)

Member

@ywangd left a comment

I have some minor comments. Will leave the approval to David. Thanks!

Comment on lines 35 to 36
Security is not enabled by default, meaning no TLS and authentication. These
features are available in the x-pack/security module.
Member

I would not say "Security is not enabled by default" but instead just say "security features are achieved with separate x-pack modules". We had dedicated effort to ensure security is on by default and basic security including authc/authz/TLS is in the free tier. It is rather disheartening to say the opposite.

Comment on lines 106 to 113
Once node discovers cluster it will open a pool of connections to every other
node in a cluster, and every other node will open a pool of connections
back. That means a connection between nodes A and B is A->B pool + B->A pool. A
node sends requests only on connections it opens (as a client).

A default connection pool is around 13 connections, a pool has sub-pools of
connections for different purposes: ping, node-state, bulks, etc. Pool structure
is defined in `ConnectionProfile` class.
Member

I wonder whether we should align the wording here with the code, where Connection is a higher-level abstraction that encapsulates a pool of Channels. Similarly, ConnectionProfile is at a higher level while a transport profile is for each channel. The most prominent Connection implementation is NodeChannels (note the plural form). It took a while to get my head around them when I first read the code.

Contributor Author

Agree with aligning terminology. "Connection is a pool of Channels" might need another explanation that Channel is a non-blocking TCP connection based on Java NIO definition.
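
A rough sketch of the "Connection is a pool of Channels" wording discussed here. The class below is a toy model, not the Elasticsearch `Connection` or `NodeChannels` implementation, and the per-purpose counts are assumptions chosen only so that they add up to the "around 13" quoted from the doc.

```python
from dataclasses import dataclass, field
from itertools import cycle

# Assumed per-purpose channel counts; illustrative values that sum to 13.
ASSUMED_PROFILE = {"recovery": 2, "bulk": 3, "reg": 6, "state": 1, "ping": 1}


@dataclass
class Connection:
    """One node-to-node Connection: a pool of Channels grouped by purpose."""

    target: str
    profile: dict
    _round_robin: dict = field(init=False, repr=False)

    def __post_init__(self):
        # Each purpose gets its own sub-pool of named channels, used round-robin.
        self._round_robin = {
            purpose: cycle([f"{self.target}/{purpose}-{i}" for i in range(count)])
            for purpose, count in self.profile.items()
        }

    @property
    def channel_count(self) -> int:
        return sum(self.profile.values())

    def channel_for(self, purpose: str) -> str:
        """Pick the next channel from the sub-pool reserved for this purpose."""
        return next(self._round_robin[purpose])


a_to_b = Connection(target="node-B", profile=ASSUMED_PROFILE)
print(a_to_b.channel_count)
print([a_to_b.channel_for("bulk") for _ in range(4)])
```

In this model node A would hold one such object per peer, and node B would hold its own object pointing back at A, matching the A→B plus B→A description.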


## Snapshots

Another area of different networking clients is snapshotting. ES supports
Member

There are other features such as SAML/JWT metadata reloading, watcher http action, reindex and probably ML related features such as inference that also use HTTP clients. I wonder whether we should mention them here. Or do you see snapshots being the most relevant one for the team?

Contributor Author

It would be nice to mention all networking interactions and high level overview. To display big picture of ins and outs. I will add ones you mentioned, but will leave details for next round of docs improvement.

Contributor Author

mhl-b commented Nov 18, 2025

@DaveCTurner, @ywangd

I addressed all comments in the new revision, b691644.

@mhl-b requested review from DaveCTurner and ywangd November 18, 2025 23:40
Contributor

github-actions bot commented Nov 18, 2025

🔍 Preview links for changed docs

@github-actions
Contributor

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

Contributor

@DaveCTurner left a comment

Looks good, just a few nits.

Comment on lines 18 to 19
## HTTP Transport
The HTTP Transport Server (simply HTTP Transport) is a single entry point for
Contributor

We typically use the word "transport" to mean the node-to-node protocol, to distinguish it from the HTTP network interface. So yes although "HTTP Transport" is kinda correct, it is confusing in this context. Could we avoid using "transport" here?

by middle boxes.

There is no connection limit, but a limit on payload size exists. The default
maximum payload is 100MB after compression. It's a very large number and almost
Contributor

Maybe mention the relevant setting here?

chunks. It's more complicated for application code, but provides better control
over memory usage.

Incremental bulk indexing includes a back-pressure feature.When memory pressure
Contributor

Suggested change
Incremental bulk indexing includes a back-pressure feature.When memory pressure
Incremental bulk indexing includes a back-pressure feature. When memory pressure

chunks. It's more complicated for application code, but provides better control
over memory usage.

Incremental bulk indexing includes a back-pressure feature.When memory pressure
Contributor

Could we link to the org.elasticsearch.index.IndexingPressure here?

Comment on lines 52 to 53
ES supports multiple `Content-Types` for the payload, collectively referred to
internally as `XContentType`: CBOR, JSON, SMILE, YAML, and their versioned types.
Contributor

We can also handle other content-types, it's just that these are the best-supported ones.


HTTP routing is based on a combination of Method and URI. For example,
`RestCreateIndexAction` handler uses `("PUT", "/{index}")`, where curly braces
indicate path variables. RestBulkAction specifies a list of routes
Contributor

Suggested change
indicate path variables. RestBulkAction specifies a list of routes
indicate path variables. `RestBulkAction` specifies a list of routes
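
A minimal sketch of Method-plus-URI routing with curly-brace path variables, as described in the quoted text. Only `("PUT", "/{index}")` and the two handler names come from the source; the bulk routes, the routing table, and the helper functions are hypothetical and do not reflect how Elasticsearch implements its REST dispatch.

```python
import re
from typing import Optional


def compile_route(template: str) -> re.Pattern:
    """Turn a template such as "/{index}" into a regex with one named group per path variable."""
    parts = []
    for segment in template.strip("/").split("/"):
        if segment.startswith("{") and segment.endswith("}"):
            parts.append(f"(?P<{segment[1:-1]}>[^/]+)")
        else:
            parts.append(re.escape(segment))
    return re.compile("^/" + "/".join(parts) + "$")


# Hypothetical routing table: (method, template, handler name).
ROUTES = [
    ("PUT", "/{index}", "RestCreateIndexAction"),
    ("POST", "/_bulk", "RestBulkAction"),
    ("POST", "/{index}/_bulk", "RestBulkAction"),
]
COMPILED = [(method, compile_route(template), handler) for method, template, handler in ROUTES]


def resolve(method: str, path: str) -> Optional[tuple]:
    """Return (handler, path variables) for the first route matching both method and path."""
    for route_method, pattern, handler in COMPILED:
        match = pattern.match(path)
        if route_method == method and match:
            return handler, match.groupdict()
    return None


print(resolve("PUT", "/my-index"))
print(resolve("POST", "/my-index/_bulk"))
print(resolve("GET", "/my-index"))
```

A handler that lists several routes, as `RestBulkAction` does, simply appears more than once in the table.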

servers and clients, handling potentially hundreds or thousands of connections.

Event-loop threads serve many connections each, it's critical to not block
threads for a long time.Fork any blocking operation or heavy computation to
Contributor

Suggested change
threads for a long time.Fork any blocking operation or heavy computation to
threads for a long time. Fork any blocking operation or heavy computation to

Contributor

@burqen left a comment

I left some suggestions for improvement as I found them while reading.

In the GeneralArchitectureGuide there is a section about # REST and Transport Layers and there is a bit of overlap between that section and what you add here. I think it would be a good idea to combine them into one thing but I also realize that it would be quite a bit of additional work to do so. I leave it as a suggestion to be dealt with either now or in the Glorious Future™.


HTTP transport provides two options for content processing: aggregate fully and
incremental. Aggregated content is a preferable choice for small messages that
do not fit for incremental parsing (e.g., JSON). Aggregation has drawbacks, it
Contributor

Suggested change
do not fit for incremental parsing (e.g., JSON). Aggregation has drawbacks, it
are not fit for incremental parsing (e.g., JSON). Aggregation has drawbacks, it

maximum payload is 100MB after compression. It's a very large number and almost
never a good target that the client should approach.

Security features, including basic security (authc/authz/TLS) in the free tier,
Contributor

Suggested change
Security features, including basic security (authc/authz/TLS) in the free tier,
Security features, including basic security (authentication/authorization/Transport Layer Security) in the free tier,

For someone that is not used to working with networking and security (me) it's not obvious what authc/authz/TLS means without googling. If target audience is newcomers and open source community I think it's worth spelling those out.

Comment on lines 43 to 45
Large delimited content, such as bulk indexing, which is processed in byte
chunks. It's more complicated for application code, but provides better control
over memory usage.
Contributor

Suggested change
Large delimited content, such as bulk indexing, which is processed in byte
chunks. It's more complicated for application code, but provides better control
over memory usage.
Large delimited content, such as bulk indexing, which is processed in byte
chunks, provides better control over memory usage but is more complicated
for application code.


`Netty4HttpServerTransport` is a single implementation of
`AbstractHttpServerTransport` from the `transport-netty4`
module. Security module injects SSL and headers validator.
Contributor

Suggested change
module. Security module injects SSL and headers validator.
module. Security module injects SSL (Secure Sockets Layer)
and headers validator.

Not obvious abbreviation for me.

Contributor

Let's call it TLS rather than SSL - SSL has been deprecated for over a decade at this point

Contributor Author

mhl-b commented Nov 19, 2025

Thanks for the feedback!

In the GeneralArchitectureGuide there is a section about # REST and Transport Layers and there is a bit of overlap between that section and what you add here.

I think overlap is ok, as long as it's consistent. It also overlaps with public docs https://www.elastic.co/docs/reference/elasticsearch/configuration-reference/networking-settings. I will add this to the doc too.

For someone that is not used to working with networking and security

Sometimes I wonder if my life would be better had I never learned computer networks.

Contributor Author

mhl-b commented Nov 19, 2025

@DaveCTurner, @burqen. Addressed feedback in 08991aa

@mhl-b requested a review from DaveCTurner November 19, 2025 20:52
@mhl-b requested a review from burqen November 19, 2025 20:52
Member

@ywangd left a comment

LGTM

I have only optional editorial level comments.

## HTTP Server

The HTTP Server is a single entry point for all external clients (excluding
cross-cluster communication). Management,ingestion, search, and all other
Member

Nit

Suggested change
cross-cluster communication). Management,ingestion, search, and all other
cross-cluster communication). Management, ingestion, search, and all other

This mechanism protects against unbounded memory usage and `OutOfMemory`
errors (OOMs).

ES supports multiple `Content-Types` for the payload. These are
Member

Nit

Suggested change
ES supports multiple `Content-Types` for the payload. These are
ES supports multiple `Content-Type`s for the payload. These are

?

Comment on lines 59 to 60
implementations of `MediaType` interface. A common ones internally called
`XContentType`: CBOR, JSON, SMILE, YAML, and their versioned types. X-pack
Member

Nit

Suggested change
implementations of `MediaType` interface. A common ones internally called
`XContentType`: CBOR, JSON, SMILE, YAML, and their versioned types. X-pack
implementations of `MediaType` interface. A common implementation is called
`XContentType`, including CBOR, JSON, SMILE, YAML, and their versioned types. X-pack

implementations of `MediaType` interface. A common ones internally called
`XContentType`: CBOR, JSON, SMILE, YAML, and their versioned types. X-pack
extensions includes PLAIN_TEXT, CSV, etc. Classes that implement
`ToXContent...` can be serialized and sent over HTTP.
Member

Nit

Suggested change
`ToXContent...` can be serialized and sent over HTTP.
`ToXContent` and friends can be serialized and sent over HTTP.


`Netty4HttpServerTransport` is a single implementation of
`AbstractHttpServerTransport` from the `transport-netty4`
module. Security module injects TLS and headers validator.
Member

Suggested change
module. Security module injects TLS and headers validator.
module. The `x-pack/security` module injects TLS and headers validator.

Comment on lines 120 to 122
`Connections` between any two nodes `(A→B and B→A)`. A node sends requests only
on the `Connection` it opens (acting as a client). The default pool is around 13
`Channels`, divided into sub-pools for different purposes (e.g., ping,
Member

Not sure if there is a convention here. I personally prefer `Connection`s over `Connections`, since a literal string seems to indicate the corresponding Java concept, e.g. a class.

Comment on lines 217 to 218
The compiler does not help detect these issues. They require careful testing
using Netty's LeakDetector with a Paranoid level.
Member

I think it's enabled by default for tests? If so, maybe worth mentioning.
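
To illustrate the class of bug the quoted lines refer to, here is a toy reference-counted buffer with a naive leak check. The `retain`/`release` names mirror Netty's reference-counting methods, but everything else is a hypothetical model: Netty's real leak detector is more involved, and this sketch only tracks buffers whose count never reached zero.

```python
class PooledBuffer:
    """Toy reference-counted buffer, not Netty's implementation."""

    live = set()  # buffers whose count has not yet dropped to zero

    def __init__(self, name: str):
        self.name = name
        self.ref_count = 1  # the creator owns the first reference
        PooledBuffer.live.add(self)

    def retain(self) -> "PooledBuffer":
        if self.ref_count == 0:
            raise RuntimeError(f"{self.name}: retain after release")
        self.ref_count += 1
        return self

    def release(self) -> bool:
        if self.ref_count == 0:
            raise RuntimeError(f"{self.name}: released too many times")
        self.ref_count -= 1
        if self.ref_count == 0:
            PooledBuffer.live.discard(self)  # a real pool would reclaim the memory here
            return True
        return False


def leaked_buffers() -> list:
    """Names of buffers that were never fully released."""
    return sorted(buf.name for buf in PooledBuffer.live)


ok = PooledBuffer("request-1")
ok.retain()   # handed to a second owner
ok.release()  # second owner is done
ok.release()  # first owner is done, count reaches zero

leaky = PooledBuffer("request-2")
leaky.retain()
leaky.release()  # one release is missing, so this buffer never returns to the pool

print(leaked_buffers())
```

Both mistakes, a missing release and an extra release, are ordinary method calls as far as a compiler is concerned, which is why they surface only at runtime.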

Contributor

burqen commented Nov 20, 2025

Things are looking good, approving 😄

Contributor Author

mhl-b commented Nov 22, 2025

@DaveCTurner, are you ok with current version?

Labels

:Distributed Coordination/Distributed, >non-issue, Team:Distributed Coordination, v9.3.0

5 participants