New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add read timeouts to http module #27713

Merged
merged 1 commit into from Dec 8, 2017

Conversation

Projects
None yet
3 participants
@tbrooks8
Contributor

tbrooks8 commented Dec 7, 2017

We currently do not have any server-side read timeouts implemented in
elasticsearch. This commit adds a read timeout setting that defaults to
30 seconds. If after 30 seconds a read has not occurred, the channel
will be closed. A timeout of value of 0 will disable the timeout.

Add read timeouts to http module
We currently do not have any server-side read timeouts implemented in
elasticsearch. This commit adds a read timeout setting that defaults to
30 seconds. If after 30 seconds a read has not occurred, the channel
will be closed. A timeout of value of 0 will disable the timeout.
@jasontedor

LGTM. Good catch.

@tbrooks8 tbrooks8 merged commit ad8a571 into elastic:master Dec 8, 2017

2 checks passed

CLA Commit author is a member of Elasticsearch
Details
elasticsearch-ci Build finished.
Details

tbrooks8 added a commit that referenced this pull request Dec 8, 2017

Add read timeouts to http module (#27713)
We currently do not have any server-side read timeouts implemented in
elasticsearch. This commit adds a read timeout setting that defaults to
30 seconds. If after 30 seconds a read has not occurred, the channel
will be closed. A timeout of value of 0 will disable the timeout.

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Dec 8, 2017

Merge branch 'master' into do-not-open-broken-indices
* master: (414 commits)
  Set ACK timeout on indices service test
  Implement byte array reusage in `NioTransport` (elastic#27696)
  [TEST] Remove leftover ES temp directories before Vagrant tests (elastic#27722)
  Cleanup split strings by comma method
  Remove unused import from AliasResolveRoutingIT
  Add read timeouts to http module (elastic#27713)
  Fix routing with leading or trailing whitespace
  remove await fix from FullClusterRestartIT.testRecovery
  Add missing 's' to tmpdir name (elastic#27721)
  [Issue-27716]: CONTRIBUTING.md IntelliJ configurations settings are confusing. (elastic#27717)
  [TEST] Now actually wait for merges
  Test out of order delivery of append only index and retry with an intermediate delete
  [TEST] remove code duplications in RequestTests
  [Tests] Add test for GeoShapeFieldType#setStrategyName (elastic#27703)
  Remove unused *Commit* classes (elastic#27714)
  Add test for writer operation buffer accounting (elastic#27707)
  [TEST] Wait for merging to complete before testing breaker
  Add Open Index API to the high level REST client (elastic#27574)
  Correcting some minor typos in comments
  Add unreleased v5.6.6 version
  ...

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Dec 9, 2017

Merge branch 'master' into disable-package-distribution-integ-tests
* master:
  Fix index with unknown setting test
  Remove internal channel tracking in transports (elastic#27711)
  Improve error msg when a field name contains only white spaces (elastic#27709)
  Do not open indices with broken settings
  Set ACK timeout on indices service test
  Implement byte array reusage in `NioTransport` (elastic#27696)
  [TEST] Remove leftover ES temp directories before Vagrant tests (elastic#27722)
  Cleanup split strings by comma method
  Remove unused import from AliasResolveRoutingIT
  Add read timeouts to http module (elastic#27713)
  Fix routing with leading or trailing whitespace
  remove await fix from FullClusterRestartIT.testRecovery
  Add missing 's' to tmpdir name (elastic#27721)
  [Issue-27716]: CONTRIBUTING.md IntelliJ configurations settings are confusing. (elastic#27717)
  [TEST] Now actually wait for merges
@tlrx

This comment has been minimized.

Show comment
Hide comment
@tlrx

tlrx Dec 13, 2017

Member

@tbrooks8 Do you know how it plays when it comes to requests that must block until completion? I'm thinking of restoring a snapshot (see #27791) or waiting for the cluster to reach a status etc

Member

tlrx commented Dec 13, 2017

@tbrooks8 Do you know how it plays when it comes to requests that must block until completion? I'm thinking of restoring a snapshot (see #27791) or waiting for the cluster to reach a status etc

@tbrooks8

This comment has been minimized.

Show comment
Hide comment
@tbrooks8

tbrooks8 Dec 14, 2017

Contributor

@tlrx My understanding of the netty ReadTimeoutHandler it that if the timeout period passes without a read, the channel closes. Even if a read has already occurred. Reads need to keep happening for the channel to stay open. The exception is when we are in the middle of a read (like a message has been partially read, but readComplete has not been triggered). If you are in the middle of a read, the timeout will not occur.

So if the read completes and it takes longer than the read timeout period for us to write the response back, I think that the read timeout will be triggered.

We could change the timeout handler to be ALL_IDLE which means that the channel will be closed if neither a read or a write happens for a certain time period. This would prevent us from closing in the middle of a write. Although this does not solve an issue where the request processing time is taking too long. We could also default the timeout to 0 which means no timeout. And then just have the facility for adding timeouts if you want them. Or we could change the timeout for the test (if we think that this is kind of a specialized thing).

Contributor

tbrooks8 commented Dec 14, 2017

@tlrx My understanding of the netty ReadTimeoutHandler it that if the timeout period passes without a read, the channel closes. Even if a read has already occurred. Reads need to keep happening for the channel to stay open. The exception is when we are in the middle of a read (like a message has been partially read, but readComplete has not been triggered). If you are in the middle of a read, the timeout will not occur.

So if the read completes and it takes longer than the read timeout period for us to write the response back, I think that the read timeout will be triggered.

We could change the timeout handler to be ALL_IDLE which means that the channel will be closed if neither a read or a write happens for a certain time period. This would prevent us from closing in the middle of a write. Although this does not solve an issue where the request processing time is taking too long. We could also default the timeout to 0 which means no timeout. And then just have the facility for adding timeouts if you want them. Or we could change the timeout for the test (if we think that this is kind of a specialized thing).

@tlrx

This comment has been minimized.

Show comment
Hide comment
@tlrx

tlrx Dec 14, 2017

Member

Thanks for the details @tbrooks8. I do like this change but to be honest I'm a bit worried about the impact it can have for users. I think this change must be clearly documented, the read timeout must be logged using a higher level than TRACE and we must have a bypass or something for all requests that often take more than 30s to be processed. If we don't have all of this I think it would be better to revert this change for now, what do you think?

Member

tlrx commented Dec 14, 2017

Thanks for the details @tbrooks8. I do like this change but to be honest I'm a bit worried about the impact it can have for users. I think this change must be clearly documented, the read timeout must be logged using a higher level than TRACE and we must have a bypass or something for all requests that often take more than 30s to be processed. If we don't have all of this I think it would be better to revert this change for now, what do you think?

@tbrooks8

This comment has been minimized.

Show comment
Hide comment
@tbrooks8

tbrooks8 Dec 14, 2017

Contributor

@jasontedor thoughts on @tlrx's concern? We can also default the timeout to not enabled.

Contributor

tbrooks8 commented Dec 14, 2017

@jasontedor thoughts on @tlrx's concern? We can also default the timeout to not enabled.

@jasontedor

This comment has been minimized.

Show comment
Hide comment
@jasontedor

jasontedor Dec 15, 2017

Member

Let's set the default to not enabled indeed.

Member

jasontedor commented Dec 15, 2017

Let's set the default to not enabled indeed.

tbrooks8 added a commit to tbrooks8/elasticsearch that referenced this pull request Dec 18, 2017

Default to no http read timeout
Elasticsearch offers a number of http requests that can take a while to
execute. In elastic#27713 we introduced an http read timeout that defaulted to
30 seconds. This means that if no reads happened for 30 seconds (even
after a request is received), the connection would be closed due to
timeout.

This commit disables the read timeout by default to allow us to evaluate
the impact of read timeouts and to avoid introducing distruptive
behavior.

tbrooks8 added a commit that referenced this pull request Dec 19, 2017

Default to no http read timeout (#27879)
Elasticsearch offers a number of http requests that can take a while to
execute. In #27713 we introduced an http read timeout that defaulted to
30 seconds. This means that if no reads happened for 30 seconds (even
after a request is received), the connection would be closed due to
timeout.

This commit disables the read timeout by default to allow us to evaluate
the impact of read timeouts and to avoid introducing distruptive
behavior.

tbrooks8 added a commit that referenced this pull request Dec 19, 2017

Default to no http read timeout (#27879)
Elasticsearch offers a number of http requests that can take a while to
execute. In #27713 we introduced an http read timeout that defaulted to
30 seconds. This means that if no reads happened for 30 seconds (even
after a request is received), the connection would be closed due to
timeout.

This commit disables the read timeout by default to allow us to evaluate
the impact of read timeouts and to avoid introducing distruptive
behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment