Skip to content

Conversation

@ptiurin
Copy link
Contributor

@ptiurin ptiurin commented Jan 24, 2022

https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout
ALB timeout is set to 350 seconds. If any connection is kept for longer than that it will be abandoned. This leads to long-running queries being stuck.
Those two settings set keepalive mechanism on TCP to avoid the above.
Fix can be tested using

cursor.execute(
    "SELECT sleepEachRow(1) from numbers(360)", set_parameters={"advanced_mode": "1", "use_standard_sql": "0"}
)

#114 is a pre-requisite since httpcore has changed how it interacts with sockets and different way of overriding is needed with the old version.

@ptiurin
Copy link
Contributor Author

ptiurin commented Jan 24, 2022

Unit tests will succeed after #114 is merged.

Copy link
Contributor

@stepansergeevitch stepansergeevitch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have an integration test for this case?

@ptiurin
Copy link
Contributor Author

ptiurin commented Jan 25, 2022

Can we have an integration test for this case?

We can, but it would significantly slow down integration tests execution. The test needs to run for 360 seconds.


DEFAULT_TIMEOUT_SECONDS: int = 5
KEEPALIVE_FLAG: int = 1
KEEPIDLE_RATE: int = 60 # seconds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do I understand correctly, as long as this value is less than 350, the trick will work, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically yes, I was trying to find a typical expected value for this but it differs a lot. Wiki gives 45-60 range so I went with the latter.

Copy link
Contributor

@yuryfirebolt yuryfirebolt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good

@yuryfirebolt yuryfirebolt self-requested a review January 25, 2022 12:31
@ptiurin
Copy link
Contributor Author

ptiurin commented Jan 25, 2022

As discussed offline, added integration tests here anyway. Will look into parallelising the integration tests as a separate PR.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

87.6% 87.6% Coverage
0.0% 0.0% Duplication

@ptiurin ptiurin merged commit d241cca into main Jan 25, 2022
@ptiurin ptiurin deleted the tcp-timeout-fix branch January 25, 2022 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants