[6X] Backdown and retry for "No route to host" in gpfdist ext table #11406

huiliang-liu · 2021-01-25T03:17:44Z

Reading from a gpfdist external table can generate a lot of HTTP traffic
in a very short time window. This may cause intermittent network issues
that result in a "No route to host" error. Back off and retry in that case.

Reuse the back-off and retry logic of writable external tables as
gp_perform_backoff_and_check_response. Pass the read/write-specific
part as a function pointer (multi_perform_work/easy_perform_work).

(cherry picked from commit 7f1589a)

This patch depends on the gpfdist_retry_timeout GUC, so the commit that adds that GUC is backported too.
(cherry picked from commit ab73713)
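
A minimal sketch of the retry scheme described above, not the code from this patch: one loop owns the back-off and the timeout check, and the read- or write-specific attempt is passed in as a function pointer. All names here (backoff_and_retry, perform_work_fn, wait_fn, the demo_* stubs) are hypothetical, the wait hook stands in for a real sleep, and the libcurl calls made by the actual multi_perform_work/easy_perform_work callbacks are left out. The doubling interval and the overall time budget follow the commit messages; the 1-second starting interval is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: one transfer attempt; 0 on success, errno-style code on failure. */
typedef int (*perform_work_fn)(void *ctx);
/* Hypothetical wait hook, so the loop can be exercised without really sleeping. */
typedef void (*wait_fn)(unsigned seconds);

/*
 * Call `work` until it succeeds or `timeout_s` seconds of waiting are used up.
 * The wait between attempts starts at 1 second (assumed) and doubles each time.
 * Returns 0 on success, otherwise the error code of the last failed attempt.
 */
static int
backoff_and_retry(perform_work_fn work, void *ctx,
                  unsigned timeout_s, wait_fn wait)
{
    unsigned waited = 0;
    unsigned interval = 1;

    assert(work != NULL && wait != NULL);

    for (;;)
    {
        int rc = work(ctx);

        if (rc == 0)
            return 0;
        if (waited >= timeout_s)
            return rc;              /* budget exhausted: report the error */

        /* Never wait past the overall budget. */
        if (interval > timeout_s - waited)
            interval = timeout_s - waited;
        wait(interval);
        waited += interval;

        /* Double the interval without overflowing; it is clamped above anyway. */
        interval = (interval > timeout_s / 2) ? timeout_s : interval * 2;
    }
}

/* Demo stubs (illustration only): fail a fixed number of times, then succeed. */
struct demo_ctx { int failures_left; int attempts; };

static int
demo_work(void *ctx)
{
    struct demo_ctx *c = ctx;

    c->attempts++;
    if (c->failures_left > 0)
    {
        c->failures_left--;
        return EHOSTUNREACH;        /* "No route to host" */
    }
    return 0;
}

static unsigned demo_total_wait = 0;

static void
demo_wait(unsigned seconds)
{
    demo_total_wait += seconds;     /* record instead of sleeping */
}
```

Keeping the loop separate from the callback is what lets the readable and writable paths share one retry policy: only the function passed as `work` differs between them.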

Here are some reminders before you submit the pull request

Add tests for the change
Document changes
Communicate in the mailing list if needed
Pass make installcheck
Review a PR in return to support the community

* Add GUC write_to_gpfdist_timeout

write_to_gpfdist_timeout controls the timeout value (in seconds) for writing data to the gpfdist server. The default value is 300 and the valid range is [1, 7200].

Set CURLOPT_TIMEOUT to write_to_gpfdist_timeout.

For any error, retry with a doubled interval; return an SQL ERROR if write_to_gpfdist_timeout is reached.

Add a regression test for the GUC writable_external_table_timeout.

(cherry picked from commit ab73713)

…plum-db#11401)

Reading from a gpfdist external table can generate a lot of HTTP traffic in a very short time window. This may cause intermittent network issues that result in a "No route to host" error. Back off and retry in that case.

Reuse the back-off and retry logic of writable external tables as gp_perform_backoff_and_check_response. Pass the read/write-specific part as a function pointer (multi_perform_work/easy_perform_work).

It is difficult to reproduce this scenario automatically, so we only tested it manually by setting the IP_TTL socket option to 1. This immediately produces a TTL-exceeded packet, which causes the "No route to host" error.

(cherry picked from commit 7f1589a)
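
For anyone repeating the manual test, the following is a rough sketch of setting IP_TTL to 1 on a TCP socket with the standard POSIX socket calls. The commit message does not say where in the code path the option was set, so the helper names (open_socket_with_ttl, get_socket_ttl) and the idea of applying it to a freshly created socket are assumptions for illustration. The error should only appear when the gpfdist host is at least one router hop away, since a TTL of 1 still reaches hosts on the same subnet.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create an IPv4 TCP socket whose outgoing packets carry the given TTL.
 * With ttl = 1 the first router on the path drops the packet and replies
 * with an ICMP "time exceeded" message.
 * Returns the socket descriptor, or -1 on failure.
 */
static int
open_socket_with_ttl(int ttl)
{
    int fd;

    assert(ttl >= 1 && ttl <= 255);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read the TTL back from the socket; returns -1 on failure. */
static int
get_socket_ttl(int fd)
{
    int ttl = -1;
    socklen_t len = sizeof(ttl);

    if (getsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, &len) != 0)
        return -1;
    return ttl;
}
```

A connect() on such a socket to a host behind a router would then be expected to fail, which is the condition the retry logic is meant to absorb.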

The GUC controls how long gpdb waits before giving up when it connects or writes to gpfdist. It also governs how long retries continue after a socket or network error. So write_to_gpfdist_timeout is not an accurate name and needs to be renamed.

huiliang-liu added backport version: 6X_STABLE labels Jan 25, 2021

Peifeng Qiu and others added 2 commits February 2, 2021 16:39

huiliang-liu changed the title ~~[6X backport] Add GUC write_to_gpfdist_timeout~~ [6X backport] Backdown and retry for "No route to host" in gpfdist ext table Feb 2, 2021

huiliang-liu requested review from pf-qiu and lij55 February 2, 2021 09:26

huiliang-liu changed the title ~~[6X backport] Backdown and retry for "No route to host" in gpfdist ext table~~ [6X] Backdown and retry for "No route to host" in gpfdist ext table Feb 3, 2021

pf-qiu approved these changes Feb 3, 2021


huiliang-liu mentioned this pull request Feb 7, 2021

Rename write_to_gpfdist_timeout as gpfdist_retry_timeout #11434

Merged


huiliang-liu merged commit 9760b15 into greenplum-db:6X_STABLE Mar 2, 2021

This was referenced Mar 11, 2021

Docs - gpfdist guc for 6X_STABLE #11634

Closed

Docs gpfdist guc 6 x stable #11636

Closed

Docs - Adding GUC gpfdist_retry_timeout to 6X_STABLE #11637

Merged
