feat(resource): deprecate `auto_restart_interval` in favor of `health_check_interval` #10910

thalesmg · 2023-06-01T13:19:51Z

See:
https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options

Current problem:

In 5.0.x, we have two timer options that control the state changing of buffer worker
resources: auto_restart_interval and health_check_interval.

auto_restart_interval controls how often the resource attempts to transition from
disconnected to connected.
health_check_interval controls how often the resource is checked and potentially moved
from connected to disconnected or connecting.

The existence of two independent timers for very similar purposes is confusing to users,
QA and even developers. Also, an intimately related configuration is request_timeout,
which can interact badly with auto_restart_interval if the latter is poorly configured:
requests may always expire if request_timeout < auto_restart_interval and if the resource
enters the disconnected state. For health_check_interval, we attempt to derive a sane
default that gives requests a chance to retry (if request timeout is finite, then the
resource retries requests with a period of min(health_check_interval, request_timeout /
3).

Another problem with the separate auto_restart_interval is that its default value (60 s)
is too high when compared to the default request timeout and health check, leading to the
problems described above if not tuned.

Proposed solution:

We propose to drop auto_restart_interval in favor of health_check_interval, which will be
used for both disconnected -> connected and connected -> {disconnected, connecting}
transition checks. With that, the resource will attempt to reconnect at the same interval
as the health check, which currently is 15 s.

Also, as two smaller changes to accompany this one:

Increase the default request_timeout from 15 s to 45 s.
Rename request_timeout to request_ttl.

Summary

`🤖 Generated by Copilot at 34d1c9e`

Refactor the bridge resource management and configuration options to improve consistency, clarity, and simplicity. Remove the auto_restart_interval option and feature from all bridge types, as it is redundant and problematic. Rename the request_timeout option to request_ttl in all bridge modules and test suites, to avoid naming conflicts and unify the meaning of the request time-to-live option. Bump the version numbers of the affected bridge applications to reflect the changes.

PR Checklist

Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:

Added tests for the changes
Changed lines covered in coverage report
Change log has been added to changes/{ce,ee}/(feat|perf|fix)-<PR-id>.en.md files
For internal contributor: there is a jira ticket to track this change
If there should be document changes, a PR to emqx-docs.git is sent, or a jira ticket is created to follow up
Schema changes are backward compatible

Checklist for CI (.github/workflows) changes

If changed package build workflow, pass this action (manual trigger)
Change log has been added to changes/ dir for user-facing artifacts update

…_check_interval` See: https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options Current problem: In 5.0.x, we have two timer options that control the state changing of buffer worker resources: auto_restart_interval and health_check_interval. - auto_restart_interval controls how often the resource attempts to transition from disconnected to connected. - health_check_interval controls how often the resource is checked and potentially moved from connected to disconnected or connecting. The existence of two independent timers for very similar purposes is confusing to users, QA and even developers. Also, an intimately related configuration is request_timeout, which can interact badly with auto_restart_interval if the latter is poorly configured: requests may always expire if request_timeout < auto_restart_interval and if the resource enters the disconnected state. For health_check_interval, we attempt to derive a sane default that gives requests a chance to retry (if request timeout is finite, then the resource retries requests with a period of min(health_check_interval, request_timeout / 3). Another problem with the separate auto_restart_interval is that its default value (60 s) is too high when compared to the default request timeout and health check, leading to the problems described above if not tuned. Proposed solution: We propose to drop auto_restart_interval in favor of health_check_interval, which will be used for both disconnected -> connected and connected -> {disconnected, connecting} transition checks. With that, the resource will attempt to reconnect at the same interval as the health check, which currently is 15 s. Also, as two smaller changes to accompany this one: - Increase the default request_timeout from 15 s to 45 s. - Rename request_timeout to request_ttl.

See https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options

coveralls · 2023-06-01T18:05:17Z

Pull Request Test Coverage Report for Build 5155352642

16 of 16 (100.0%) changed or added relevant lines in 5 files are covered.
431 unchanged lines in 42 files lost coverage.
Overall coverage decreased (-0.1%) to 81.618%

Files with Coverage Reduction	New Missed Lines	%
apps/emqx_bridge_oracle/src/emqx_bridge_oracle.erl	1	93.33%
apps/emqx_bridge_rocketmq/src/emqx_bridge_rocketmq_connector.erl	1	74.74%
apps/emqx_bridge_sqlserver/src/emqx_bridge_sqlserver_connector.erl	1	82.18%
apps/emqx_conf/src/emqx_conf_schema.erl	1	95.0%
apps/emqx_ft/src/emqx_ft_storage.erl	1	91.3%
apps/emqx_gateway_mqttsn/src/emqx_mqttsn_channel.erl	1	73.18%
apps/emqx_gateway/src/emqx_gateway_conf.erl	1	89.2%
apps/emqx_gateway/src/emqx_gateway_schema.erl	1	88.89%
apps/emqx_management/src/emqx_mgmt_app.erl	1	85.71%
apps/emqx_modules/src/emqx_delayed.erl	1	95.65%

Totals
Change from base Build 5144947922:	-0.1%
Covered Lines:	28888
Relevant Lines:	35394

💛 - Coveralls

zmstone

please add a changelog to describe in detail the changed fields and default values.

apps/emqx_resource/src/schema/emqx_resource_schema.erl

Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>

thalesmg · 2023-06-02T12:24:12Z

please add a changelog to describe in detail the changed fields and default values.

✔️

yanzhiemq · 2023-06-13T06:02:53Z

Enhancements

The data bridge resource option auto_restart_interval was deprecated in favor of health_check_interval, and request_timeout was renamed to request_ttl. Also, the default request_ttl value went from 15 seconds to 45 seconds.

The previous existence of both auto_restart_interval and health_check_interval was a source of confusion, as both parameters influenced the recovery of data bridges under failures. An inconsistent configuration of those two parameters could lead to messages being expired without a chance to retry. Now, health_check_interval is used both to control the periodicity of health checks that may transition the data bridge into disconnected or connecting states, as well as recovering from disconnected.

yanzhiemq · 2023-06-13T06:02:55Z

增强

数据桥接资源选项 auto_restart_interval 已被弃用，改为使用 health_check_interval，而 request_timeout 则被重命名为 request_ttl。此外，默认的 request_ttl 值从 15 秒增加到了 45 秒。

之前同时存在 auto_restart_interval 和 health_check_interval 会导致混淆，因为这两个参数都会影响数据桥接在故障下的恢复。这两个参数的配置不一致可能导致消息过期而无法重试。现在，health_check_interval 用于控制定期进行健康检查的频率，这可能会将数据桥接转换为 disconnected 或 connecting 状态，并从 disconnected 状态中进行恢复。

thalesmg added 2 commits June 1, 2023 11:20

feat(resource): increase default request timeout to 45 s

f42ccb6

See https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options

thalesmg force-pushed the unify-restart-interval-v50 branch from 34d1c9e to e6cb0ee Compare June 1, 2023 14:21

thalesmg added 2 commits June 1, 2023 13:01

refactor(resource): rename request_timeout -> request_ttl

9979622

See https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options

test(pulsar_producer): fix flaky test

3e4790e

thalesmg force-pushed the unify-restart-interval-v50 branch from e6cb0ee to 3e4790e Compare June 1, 2023 16:02

thalesmg marked this pull request as ready for review June 1, 2023 17:57

thalesmg requested a review from a team as a code owner June 1, 2023 17:57

thalesmg requested a review from zmstone June 1, 2023 18:43

zmstone reviewed Jun 2, 2023

View reviewed changes

apps/emqx_resource/src/schema/emqx_resource_schema.erl Outdated Show resolved Hide resolved

thalesmg and others added 2 commits June 2, 2023 09:08

refactor: use default's type as first union member

0790c88

Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>

docs: add changelog

940353c

savonarola approved these changes Jun 2, 2023

View reviewed changes

thalesmg merged commit 33aa879 into emqx:master Jun 2, 2023
165 of 166 checks passed

thalesmg deleted the unify-restart-interval-v50 branch June 2, 2023 19:20

thalesmg mentioned this pull request Jun 6, 2023

refactor(bridge): update request_timeout emqx/emqx-dashboard5#1448

Merged

zmstone mentioned this pull request Jun 9, 2023

0609 fix schema for dashboard use #10990

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(resource): deprecate `auto_restart_interval` in favor of `health_check_interval` #10910

feat(resource): deprecate `auto_restart_interval` in favor of `health_check_interval` #10910

thalesmg commented Jun 1, 2023 •

edited by ghost

coveralls commented Jun 1, 2023 •

edited

zmstone left a comment

thalesmg commented Jun 2, 2023

yanzhiemq commented Jun 13, 2023 •

edited

yanzhiemq commented Jun 13, 2023 •

edited

feat(resource): deprecate auto_restart_interval in favor of health_check_interval #10910

feat(resource): deprecate auto_restart_interval in favor of health_check_interval #10910

Conversation

thalesmg commented Jun 1, 2023 • edited by ghost

Summary

🤖 Generated by Copilot at 34d1c9e

PR Checklist

Checklist for CI (.github/workflows) changes

coveralls commented Jun 1, 2023 • edited

Pull Request Test Coverage Report for Build 5155352642

💛 - Coveralls

zmstone left a comment

Choose a reason for hiding this comment

thalesmg commented Jun 2, 2023

yanzhiemq commented Jun 13, 2023 • edited

Enhancements

yanzhiemq commented Jun 13, 2023 • edited

增强

feat(resource): deprecate `auto_restart_interval` in favor of `health_check_interval` #10910

feat(resource): deprecate `auto_restart_interval` in favor of `health_check_interval` #10910

thalesmg commented Jun 1, 2023 •

edited by ghost

`🤖 Generated by Copilot at 34d1c9e`

coveralls commented Jun 1, 2023 •

edited

yanzhiemq commented Jun 13, 2023 •

edited

yanzhiemq commented Jun 13, 2023 •

edited