Add networking guide docs #138119
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
DaveCTurner
left a comment
I left a few factual nits, nothing major. It needs some editorial polishing too - can we use any of our fancy new AI tools to help with that?
| be HTTP spec compliant, Elastic is not a webserver. We support GET requests with | ||
| payload (some old proxies might drop content), requests cannot be cached by |
Maybe mention that in these cases we also support the same API with a different verb, normally POST, for clients who cannot send a GET-with-body.
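To illustrate the point about offering the same API under two verbs, here is a minimal sketch using the JDK's `java.net.http` request builder. The host, port, index name and query are hypothetical, and the requests are only built, not sent; whether a given client or proxy actually transmits a GET body is exactly the concern raised above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

final class SearchRequestSketch {
    // Hypothetical local endpoint and index; adjust for a real cluster.
    static final URI SEARCH_URI = URI.create("http://localhost:9200/my-index/_search");
    // Illustrative query body, identical for both verbs.
    static final String QUERY = "{\"query\":{\"match_all\":{}}}";

    // Builds (but does not send) a search request with the given verb.
    static HttpRequest search(String verb) {
        return HttpRequest.newBuilder(SEARCH_URI)
            .header("Content-Type", "application/json")
            .method(verb, BodyPublishers.ofString(QUERY))
            .build();
    }

    public static void main(String[] args) {
        for (String verb : new String[] {"GET", "POST"}) {
            HttpRequest request = search(verb);
            long bodyLength = request.bodyPublisher().orElseThrow().contentLength();
            System.out.println(request.method() + " " + request.uri() + " body=" + bodyLength + " bytes");
        }
    }
}
```

A client that cannot send GET-with-body would simply use the POST form; the URI and body stay the same.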
|
|
||
| HTTP transport provides two options for content processing: aggregate fully and | ||
| incremental. Aggregated content is a preferable choice for a small messages that | ||
| cannot be parsed incrementally (like JSON). But aggregation has drawbacks, it |
cannot be parsed incrementally (like JSON)
Mmm sorta depends what you mean by "incrementally" but I'd say that the SAX-style parsing we do is in fact working incrementally. It doesn't really make sense to start parsing before we've received the whole body, but that's different from saying "cannot".
| The job of the REST handler is to parse and validate HTTP request and construct | ||
| typed version of request, often Transport request (see Transport section below). |
Might be worth pointing out here that (if security is enabled then) authentication happens before the REST handler but authorization happens after (when entering the transport layer).
|
|
||
| `Transport` is an umbrella term for a node-to-node communication. It's a | ||
| TCP-based custom binary protocol. Every node in a cluster is a client and server | ||
| at the same time. Node-to-node communication never uses HTTP transport. |
never
except for reindex-from-remote :)
| connections for different purposes: ping, node-state, bulks, etc. Pool structure | ||
| is defined in `ConnectionProfile` class. | ||
|
|
||
| ES has resilience to disconnects, frequent reconnects, but in general we assume |
resilience
I don't think we are particularly resilient to frequent reconnects. At least, not without being more specific. ES never behaves incorrectly (e.g. loses data) in the face of network outages but it may become unavailable unless the network is stable.
|
|
||
| Another area of different networking clients is snapshotting. ES supports | ||
| snapshotting to remote repositories. These repositories usually come with their | ||
| own SDK and networking stack. For example AWS SDK comes with tomcat or netty, |
Although the Apache client we use may have originally come from Tomcat, it's not called that any more and this'll confuse folks that don't know the history. Let's just call it the Apache client.
Apache indeed, no idea why I wrote Tomcat.
| be forked to another thread pool. But forking comes with overhead, doing forking | ||
| on every tiny request is a wasted CPU work. As a rule of thumb: don't fork | ||
| simple requests that can be served from memory and do not require heavy | ||
| computations (seconds), otherwise fork. |
A pause of "seconds" is still pretty disastrous
| computations (seconds), otherwise fork. | |
| computations (milliseconds), otherwise fork. |
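The rule of thumb in the quoted text can be sketched with plain JDK executors. This is an illustration only: the `cheap` flag, the `handle` helper and the thread names are hypothetical, the main thread stands in for an event-loop thread, and none of this is the actual Elasticsearch thread-pool API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class ForkingSketch {
    // Runs cheap work inline on the calling thread and forks everything else to the worker pool.
    static <T> CompletableFuture<T> handle(boolean cheap, Supplier<T> work, ExecutorService workers) {
        if (cheap) {
            return CompletableFuture.completedFuture(work.get());
        }
        return CompletableFuture.supplyAsync(work, workers);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService workers = Executors.newFixedThreadPool(2, r -> new Thread(r, "worker"));
        // The main thread plays the role of an event-loop thread; results arrive via callbacks,
        // so the "event loop" never waits on the forked work.
        handle(true, () -> Thread.currentThread().getName(), workers)
            .thenAccept(name -> System.out.println("cheap request ran on:  " + name));
        handle(false, () -> Thread.currentThread().getName(), workers)
            .thenAccept(name -> System.out.println("costly request ran on: " + name));
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

The inline path avoids the hand-off cost for tiny requests; the forked path keeps the calling thread free to serve its other connections.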
| One of the performance edges of netty is controlled memory allocation. Netty | ||
| manages byte buffer pools and reuse them heavily. This performance gain comes | ||
| with a cost. And cost is reference counting on developer's shoulders. Netty | ||
| reads socket bytes into pooled byte-buffers and passes them to application. Then |
Might be worth saying a bit more about how this looks in a heap dump - there's a pool of 1MiB byte[] objects which Netty slices into 16KiB pages, not all of which may be in use, so you have to account for this when investigating memory usage.
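The accounting described in this comment is simple arithmetic, sketched below. The chunk and page sizes are the ones stated in the comment; the class, the method names and the example counts are hypothetical, and real numbers would come from the heap dump or pool metrics.

```java
final class PooledChunkAccounting {
    static final int CHUNK_BYTES = 1024 * 1024; // one pooled 1 MiB byte[]
    static final int PAGE_BYTES = 16 * 1024;    // one 16 KiB page sliced from a chunk

    // Number of pages a single chunk is sliced into.
    static int pagesPerChunk() {
        return CHUNK_BYTES / PAGE_BYTES;
    }

    // Bytes the pool holds on the heap, whether or not the pages are in use.
    static long retainedBytes(int chunks) {
        return (long) chunks * CHUNK_BYTES;
    }

    // Bytes actually handed out to the application.
    static long inUseBytes(int pagesInUse) {
        return (long) pagesInUse * PAGE_BYTES;
    }

    public static void main(String[] args) {
        int chunks = 10;       // hypothetical: number of 1 MiB byte[] seen in a heap dump
        int pagesInUse = 200;  // hypothetical: pages currently referenced by live buffers
        System.out.println("pages per chunk: " + pagesPerChunk());
        System.out.println("retained bytes:  " + retainedBytes(chunks));
        System.out.println("in-use bytes:    " + inUseBytes(pagesInUse));
    }
}
```

The gap between retained and in-use bytes is pool overhead rather than a leak, which is why a heap dump alone can overstate live network buffer usage.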
| One of the performance edges of netty is controlled memory allocation. Netty | ||
| manages byte buffer pools and reuse them heavily. This performance gain comes | ||
| with a cost. And cost is reference counting on developer's shoulders. Netty | ||
| reads socket bytes into pooled byte-buffers and passes them to application. Then |
Also Netty technically reads the bytes into a direct buffer and then CopyBytesSocketChannel copies them into the pooled buffer.
| Another area of different networking clients is snapshotting. ES supports | ||
| snapshotting to remote repositories. These repositories usually come with their | ||
| own SDK and networking stack. For example AWS SDK comes with tomcat or netty, | ||
| Azure with netty-based project-reactor, GCP uses default java HTTP |
Netty needs a capital N (here and in several other places)
ywangd
left a comment
I have some minor comments. Will leave the approval to David. Thanks!
| Security is not enabled by default, meaning no TLS and authentication. These | ||
| features are available in the x-pack/security module. |
I would not say "Security is not enabled by default" but instead just say "security features are achieved with separate x-pack modules". We had dedicated effort to ensure security is on by default and basic security including authc/authz/TLS is in the free tier. It is rather disheartening to say the opposite.
| Once node discovers cluster it will open a pool of connections to every other | ||
| node in a cluster, and every other node will open a pool of connections | ||
| back. That means a connection between nodes A and B is A->B pool + B->A pool. A | ||
| node sends requests only on connections it opens (as a client). | ||
|
|
||
| A default connection pool is around 13 connections, a pool has sub-pools of | ||
| connections for different purposes: ping, node-state, bulks, etc. Pool structure | ||
| is defined in `ConnectionProfile` class. |
I wonder whether we should align the wording here with the code, where `Connection` is a higher-level abstraction that encapsulates a pool of `Channel`s. Similarly, `ConnectionProfile` is at a higher level while a transport profile is for each channel. The most prominent `Connection` implementation is `NodeChannels` (note the plural form). It took a while to get my head around them when I first read the code.
Agree with aligning terminology. "Connection is a pool of Channels" might need another explanation that Channel is a non-blocking TCP connection based on Java NIO definition.
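The terminology discussed here ("a Connection is a pool of Channels") can be pictured with a small model. This is a hedged sketch, not the real classes: the enum and the method names are hypothetical, and the per-type counts are assumed from the default `transport.connections_per_node.*` settings, chosen so that they add up to the "around 13" figure in the quoted text. Check `ConnectionProfile` for the authoritative values.

```java
import java.util.EnumMap;
import java.util.Map;

final class ConnectionSketch {
    // Hypothetical stand-ins for the channel purposes mentioned in the quoted text.
    enum ChannelType { RECOVERY, BULK, REG, STATE, PING }

    // Assumed default channel counts per type; verify against ConnectionProfile.
    static Map<ChannelType, Integer> defaultProfile() {
        Map<ChannelType, Integer> counts = new EnumMap<>(ChannelType.class);
        counts.put(ChannelType.RECOVERY, 2);
        counts.put(ChannelType.BULK, 3);
        counts.put(ChannelType.REG, 6);
        counts.put(ChannelType.STATE, 1);
        counts.put(ChannelType.PING, 1);
        return counts;
    }

    // One Connection (for example A->B) owns every channel listed in its profile.
    static int channelsPerConnection(Map<ChannelType, Integer> profile) {
        return profile.values().stream().mapToInt(Integer::intValue).sum();
    }

    // Two nodes hold two Connections (A->B and B->A), so twice the channels.
    static int channelsBetweenTwoNodes(Map<ChannelType, Integer> profile) {
        return 2 * channelsPerConnection(profile);
    }

    public static void main(String[] args) {
        Map<ChannelType, Integer> profile = defaultProfile();
        System.out.println("channels per Connection:    " + channelsPerConnection(profile));
        System.out.println("channels between two nodes: " + channelsBetweenTwoNodes(profile));
    }
}
```

Each channel in this model corresponds to one non-blocking TCP connection, which is the distinction the two comments above are asking the guide to spell out.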
|
|
||
| ## Snapshots | ||
|
|
||
| Another area of different networking clients is snapshotting. ES supports |
There are other features such as SAML/JWT metadata reloading, watcher http action, reindex and probably ML related features such as inference that also use HTTP clients. I wonder whether we should mention them here. Or do you see snapshots being the most relevant one for the team?
It would be nice to mention all networking interactions in a high-level overview, to show the big picture of ins and outs. I will add the ones you mentioned, but will leave the details for the next round of docs improvements.
|
I addressed all comments in new revision b691644.
ℹ️ Important: Docs version tagging
👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes.
When to use applies_to tags: ✅ At the page level to indicate which products/deployments the content applies to (mandatory)
What NOT to do: ❌ Don't remove or replace information that applies to an older version
|
DaveCTurner
left a comment
Looks good, just a few nits.
| ## HTTP Transport | ||
| The HTTP Transport Server (simply HTTP Transport) is a single entry point for |
We typically use the word "transport" to mean the node-to-node protocol, to distinguish it from the HTTP network interface. So yes although "HTTP Transport" is kinda correct, it is confusing in this context. Could we avoid using "transport" here?
| by middle boxes. | ||
|
|
||
| There is no connection limit, but a limit on payload size exists. The default | ||
| maximum payload is 100MB after compression. It's a very large number and almost |
Maybe mention the relevant setting here?
| chunks. It's more complicated for application code, but provides better control | ||
| over memory usage. | ||
|
|
||
| Incremental bulk indexing includes a back-pressure feature.When memory pressure |
| Incremental bulk indexing includes a back-pressure feature.When memory pressure | |
| Incremental bulk indexing includes a back-pressure feature. When memory pressure |
| chunks. It's more complicated for application code, but provides better control | ||
| over memory usage. | ||
|
|
||
| Incremental bulk indexing includes a back-pressure feature.When memory pressure |
Could we link to the org.elasticsearch.index.IndexingPressure here?
| ES supports multiple `Content-Types` for the payload, collectively referred to | ||
| internally as `XContentType`: CBOR, JSON, SMILE, YAML, and their versioned types. |
We can also handle other content-types, it's just that these are the best-supported ones.
|
|
||
| HTTP routing is based on a combination of Method and URI. For example, | ||
| `RestCreateIndexAction` handler uses `("PUT", "/{index}")`, where curly braces | ||
| indicate path variables. RestBulkAction specifies a list of routes |
| indicate path variables. RestBulkAction specifies a list of routes | |
| indicate path variables. `RestBulkAction` specifies a list of routes |
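The routing idea in the quoted text (method plus URI template, with curly braces marking path variables) can be sketched as a naive matcher. The `match` helper is hypothetical and deliberately simplistic: it compares segment by segment and does not model how the real router resolves precedence between literal and wildcard segments.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class RouteSketch {
    // Returns the extracted path variables if the method and path match the route, else empty.
    static Optional<Map<String, String>> match(String routeMethod, String template, String method, String path) {
        if (!routeMethod.equals(method)) {
            return Optional.empty();
        }
        String[] templateParts = template.split("/");
        String[] pathParts = path.split("/");
        if (templateParts.length != pathParts.length) {
            return Optional.empty();
        }
        Map<String, String> variables = new LinkedHashMap<>();
        for (int i = 0; i < templateParts.length; i++) {
            String part = templateParts[i];
            if (part.startsWith("{") && part.endsWith("}")) {
                // A {name} segment captures whatever the request path has in that position.
                variables.put(part.substring(1, part.length() - 1), pathParts[i]);
            } else if (!part.equals(pathParts[i])) {
                return Optional.empty();
            }
        }
        return Optional.of(variables);
    }

    public static void main(String[] args) {
        System.out.println(match("PUT", "/{index}", "PUT", "/my-index"));
        System.out.println(match("PUT", "/{index}", "GET", "/my-index"));
    }
}
```

A handler that registers several routes, as the quoted text says `RestBulkAction` does, would simply be tried against each of its (method, template) pairs in turn.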
| servers and clients, handling potentially hundreds or thousands of connections. | ||
|
|
||
| Event-loop threads serve many connections each, it's critical to not block | ||
| threads for a long time.Fork any blocking operation or heavy computation to |
| threads for a long time.Fork any blocking operation or heavy computation to | |
| threads for a long time. Fork any blocking operation or heavy computation to |
burqen
left a comment
I left some suggestions for improvement as I found them while reading.
In the GeneralArchitectureGuide there is a section about # REST and Transport Layers and there is a bit of overlap between that section and what you add here. I think it would be a good idea to combine them into one thing but I also realize that it would be quite a bit of additional work to do so. I leave it as a suggestion to be dealt with either now or in the Glorious Future™.
|
|
||
| HTTP transport provides two options for content processing: aggregate fully and | ||
| incremental. Aggregated content is a preferable choice for small messages that | ||
| do not fit for incremental parsing (e.g., JSON). Aggregation has drawbacks, it |
| do not fit for incremental parsing (e.g., JSON). Aggregation has drawbacks, it | |
| are not fit for incremental parsing (e.g., JSON). Aggregation has drawbacks, it |
| maximum payload is 100MB after compression. It's a very large number and almost | ||
| never a good target that the client should approach. | ||
|
|
||
| Security features, including basic security (authc/authz/TLS) in the free tier, |
| Security features, including basic security (authc/authz/TLS) in the free tier, | |
| Security features, including basic security (authentication/authorization/Transport Layer Security) in the free tier, |
For someone that is not used to working with networking and security (me) it's not obvious what authc/authz/TLS means without googling. If target audience is newcomers and open source community I think it's worth spelling those out.
| Large delimited content, such as bulk indexing, which is processed in byte | ||
| chunks. It's more complicated for application code, but provides better control | ||
| over memory usage. |
| Large delimited content, such as bulk indexing, which is processed in byte | |
| chunks. It's more complicated for application code, but provides better control | |
| over memory usage. | |
| Large delimited content, such as bulk indexing, which is processed in byte | |
| chunks, provides better control over memory usage but is more complicated | |
| for application code. |
|
|
||
| `Netty4HttpServerTransport` is a single implementation of | ||
| `AbstractHttpServerTransport` from the `transport-netty4` | ||
| module. Security module injects SSL and headers validator. |
| module. Security module injects SSL and headers validator. | |
| module. Security module injects SSL (Secure Sockets Layer) | |
| and headers validator. |
Not obvious abbreviation for me.
Let's call it TLS rather than SSL - SSL has been deprecated for over a decade at this point
|
Thanks for the feedback!
I think overlap is OK, as long as it's consistent. It also overlaps with the public docs https://www.elastic.co/docs/reference/elasticsearch/configuration-reference/networking-settings. I will add this to the doc too.
Sometimes I wonder if my life would be better had I never learned computer networks. |
|
@DaveCTurner, @burqen. Addressed feedback in 08991aa |
ywangd
left a comment
LGTM
I have only optional editorial level comments.
| ## HTTP Server | ||
|
|
||
| The HTTP Server is a single entry point for all external clients (excluding | ||
| cross-cluster communication). Management,ingestion, search, and all other |
Nit
| cross-cluster communication). Management,ingestion, search, and all other | |
| cross-cluster communication). Management, ingestion, search, and all other |
| This mechanism protects against unbounded memory usage and `OutOfMemory` | ||
| errors (OOMs). | ||
|
|
||
| ES supports multiple `Content-Types` for the payload. These are |
Nit
| ES supports multiple `Content-Types` for the payload. These are | |
| ES supports multiple `Content-Type`s for the payload. These are |
?
| implementations of `MediaType` interface. A common ones internally called | ||
| `XContentType`: CBOR, JSON, SMILE, YAML, and their versioned types. X-pack |
Nit
| implementations of `MediaType` interface. A common ones internally called | |
| `XContentType`: CBOR, JSON, SMILE, YAML, and their versioned types. X-pack | |
| implementations of `MediaType` interface. A common implementation is called | |
| `XContentType`, including CBOR, JSON, SMILE, YAML, and their versioned types. X-pack |
| implementations of `MediaType` interface. A common ones internally called | ||
| `XContentType`: CBOR, JSON, SMILE, YAML, and their versioned types. X-pack | ||
| extensions includes PLAIN_TEXT, CSV, etc. Classes that implement | ||
| `ToXContent...` can be serialized and sent over HTTP. |
Nit
| `ToXContent...` can be serialized and sent over HTTP. | |
| `ToXContent` and friends can be serialized and sent over HTTP. |
|
|
||
| `Netty4HttpServerTransport` is a single implementation of | ||
| `AbstractHttpServerTransport` from the `transport-netty4` | ||
| module. Security module injects TLS and headers validator. |
| module. Security module injects TLS and headers validator. | |
| module. The `x-pack/security` module injects TLS and headers validator. |
| `Connections` between any two nodes `(A→B and B→A)`. A node sends requests only | ||
| on the `Connection` it opens (acting as a client). The default pool is around 13 | ||
| `Channels`, divided into sub-pools for different purposes (e.g., ping, |
Not sure if there is a convention here. I personally prefer `Connection`s over `Connections`, since the literal string seems to indicate the corresponding Java concept, e.g. a class.
| The compiler does not help detect these issues. They require careful testing | ||
| using Netty's LeakDetector with a Paranoid level. |
I think it's enabled by default for tests? If so, maybe worth mentioning.
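For readers unfamiliar with why the compiler cannot catch these issues, here is a minimal reference-counting model in plain Java. It is a sketch under stated assumptions: `RefCountedSketch` is a hypothetical class that only mirrors the retain/release discipline, it is not Netty's buffer API, and it does no leak detection of its own.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class RefCountedSketch {
    // The creator holds the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);
    // Hypothetical hook that returns the underlying memory to its pool.
    private final Runnable onClose;

    RefCountedSketch(Runnable onClose) {
        this.onClose = onClose;
    }

    // Takes an additional reference; fails if the resource was already fully released.
    RefCountedSketch retain() {
        int previous = refs.getAndUpdate(count -> count > 0 ? count + 1 : count);
        if (previous <= 0) {
            throw new IllegalStateException("already released");
        }
        return this;
    }

    // Drops one reference; returns true when this call dropped the last one.
    boolean release() {
        int previous = refs.getAndUpdate(count -> count > 0 ? count - 1 : count);
        if (previous <= 0) {
            throw new IllegalStateException("released too many times");
        }
        if (previous == 1) {
            onClose.run();
            return true;
        }
        return false;
    }

    int refCount() {
        return refs.get();
    }

    public static void main(String[] args) {
        RefCountedSketch buffer = new RefCountedSketch(() -> System.out.println("returned to pool"));
        buffer.retain();                       // a second owner, e.g. a forked handler
        System.out.println(buffer.release());  // first owner is done
        System.out.println(buffer.release());  // last owner is done, memory goes back to the pool
    }
}
```

Forgetting the final `release()` compiles fine and simply never returns the memory, while an extra `release()` only fails at runtime; both cases are what a leak detector run during tests is meant to surface.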
|
Things are looking good, approving 😄 |
|
@DaveCTurner, are you ok with current version? |
A high level overview of networking for the distributed architecture guide. ES-7886