[6X] Backdown and retry for "No route to host" in gpfdist ext table #11406

Merged


@huiliang-liu (Member) commented on Jan 25, 2021

Reading from a gpfdist external table can generate a lot of HTTP traffic
in a very short time window. This may cause intermittent network issues
that result in a "No route to host" error. Back off and retry in this scenario.

Reuse the back-off-and-retry logic of the writable external table, factored
out as gp_perform_backoff_and_check_response. The read- or write-specific
part is passed in as a function pointer (multi_perform_work/easy_perform_work).

(cherry picked from commit 7f1589a)

This patch depends on the gpfdist_retry_timeout GUC, so the commit that adds gpfdist_retry_timeout is backported too.
(cherry picked from commit ab73713)

* Add GUC write_to_gpfdist_timeout

write_to_gpfdist_timeout controls the timeout (in seconds) for writing data to the gpfdist server. The default value is 300 and the valid range is [1, 7200].

Set CURLOPT_TIMEOUT to write_to_gpfdist_timeout.
On any error, retry with a doubled interval each time; return an SQL ERROR once write_to_gpfdist_timeout is reached.

Add a regression test for the GUC writable_external_table_timeout

(cherry picked from commit ab73713)
Peifeng Qiu and others added 2 commits February 2, 2021 16:39
…plum-db#11401)

This scenario is difficult to reproduce automatically, so it was only
tested manually by setting the socket's IP_TTL option to 1. This
immediately produces "TTL exceeded" packets, which cause the
"No route to host" error.

(cherry picked from commit 7f1589a)

The GUC controls how long gpdb waits before returning when it connects
or writes to gpfdist, including the time spent retrying after
socket/network errors. So write_to_gpfdist_timeout is not an accurate
name and needs to be renamed.
@huiliang-liu huiliang-liu changed the title [6X backport] Add GUC write_to_gpfdist_timeout [6X backport] Backdown and retry for "No route to host" in gpfdist ext table Feb 2, 2021
@huiliang-liu huiliang-liu changed the title [6X backport] Backdown and retry for "No route to host" in gpfdist ext table [6X] Backdown and retry for "No route to host" in gpfdist ext table Feb 3, 2021