@danzh2010
Contributor

Signed-off-by: Dan Zhang <danzh@google.com>

Commit Message: Add a new state, FailedRecently, to the status tracker, and make the connectivity grid check this state in newStream(). If the h3 pool is in this state, do not delay TCP racing.

Risk Level: low, grid only
Testing: added unit tests
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
Part of #18715
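
To make the described behavior concrete, here is a minimal sketch of the decision the grid makes per stream. It is not the code from this PR: `Http3Status`, `StreamPlan`, and `planNewStream` are hypothetical names, and treating a broken HTTP/3 status as "skip HTTP/3 entirely" is an assumption of the sketch.

```cpp
// Hypothetical, simplified stand-in for the HTTP/3 status tracker states.
enum class Http3Status { Pending, Confirmed, Broken, FailedRecently };

// Hypothetical summary of what one newStream() call would do.
struct StreamPlan {
  bool attempt_http3;     // Start with the HTTP/3 pool.
  bool delay_tcp_attempt; // Wait for the failover timer before trying TCP.
};

// Assumption: HTTP/3 is skipped when it is marked broken or the stream options
// forbid it. In that case TCP is the first attempt, so nothing is delayed.
StreamPlan planNewStream(Http3Status status, bool can_use_http3) {
  if (status == Http3Status::Broken || !can_use_http3) {
    return {false, false};
  }
  // HTTP/3 goes first. TCP normally waits for the failover timer, but it
  // races immediately when HTTP/3 failed recently.
  return {true, status != Http3Status::FailedRecently};
}
```

With these assumptions, `planNewStream(Http3Status::FailedRecently, true)` still attempts HTTP/3 but does not delay the TCP attempt, while `Http3Status::Pending` keeps the usual delay.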

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Contributor Author

The failed thrift_proxy:integration_test on Windows seems unrelated.

@danzh2010
Contributor Author

/assign @RyanTheOptimist

@danzh2010
Contributor Author

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #20722 (comment) was created by @danzh2010.

see: more, trace.

Contributor

@RyanTheOptimist left a comment

Looks great!

  auto attempt = std::make_unique<ConnectionAttemptCallbacks>(*this, current_);
  LinkedList::moveIntoList(std::move(attempt), connection_attempts_);
- if (!next_attempt_timer_->enabled()) {
+ if (next_attempt_timer_ != nullptr && !next_attempt_timer_->enabled()) {
Contributor

I thought that next_attempt_timer_ was initialized in the constructor and could never be null. Is this check needed?

Contributor Author

You're right. It's not needed.

EXPECT_FALSE(tracker_.isHttp3Confirmed());

EXPECT_CALL(*timer_, enabled()).WillOnce(Return(false));
EXPECT_CALL(*timer_, enableTimer(std::chrono::milliseconds(5 * 60 * 1000), nullptr));
Contributor

Should we also test what happens when the timer fires? Or, maybe we tested that earlier?

Contributor Author

We tested in MarkBrokenThenExpires that when the timer fires, isHttp3Broken() returns false and hasHttp3FailedRecently() returns true.
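
For readers without the test file at hand, the transition described here can be sketched roughly as follows. This is not the tracker from the PR: `FakeTimer`, `SketchTracker`, and `onExpirationTimeout` are hypothetical names, and the five-minute delay is only borrowed from the `enableTimer` expectation quoted above.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical fake timer: records the requested delay and is fired by hand.
class FakeTimer {
public:
  explicit FakeTimer(std::function<void()> cb) : cb_(std::move(cb)) {}
  void enableTimer(std::chrono::milliseconds delay) {
    delay_ = delay;
    enabled_ = true;
  }
  bool enabled() const { return enabled_; }
  std::chrono::milliseconds delay() const { return delay_; }
  // Simulates the timer expiring: disarm it, then run the callback.
  void fire() {
    enabled_ = false;
    cb_();
  }

private:
  std::function<void()> cb_;
  std::chrono::milliseconds delay_{0};
  bool enabled_{false};
};

// Hypothetical tracker with only the states needed for this illustration.
class SketchTracker {
public:
  SketchTracker() : timer_([this] { onExpirationTimeout(); }) {}
  SketchTracker(const SketchTracker&) = delete; // The timer callback captures `this`.

  void markHttp3Broken() {
    state_ = State::Broken;
    if (!timer_.enabled()) {
      timer_.enableTimer(std::chrono::minutes(5)); // Assumed initial delay.
    }
  }
  bool isHttp3Broken() const { return state_ == State::Broken; }
  bool hasHttp3FailedRecently() const { return state_ == State::FailedRecently; }
  FakeTimer& timer() { return timer_; }

private:
  enum class State { Pending, FailedRecently, Broken };

  // When the broken period expires, HTTP/3 may be tried again, but the
  // tracker remembers that it failed recently.
  void onExpirationTimeout() {
    if (state_ == State::Broken) {
      state_ = State::FailedRecently;
    }
  }

  State state_{State::Pending};
  FakeTimer timer_;
};
```

Calling `markHttp3Broken()` and then `timer().fire()` on a `SketchTracker` leaves `isHttp3Broken()` false and `hasHttp3FailedRecently()` true, which mirrors the expectation described for MarkBrokenThenExpires.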

PoolIterator pool = pools_.begin();
if (!shouldAttemptHttp3() || !options.can_use_http3_) {
Instance::StreamOptions overriding_options(options);
bool delay_tcp_attempt{true};
Contributor

nit: we have two initializations, one uses {} and the other (). Could they both use {} (or even = to be more idiomatic)?

Contributor Author

done

EXPECT_NE(grid_->first(), nullptr);
// The 2nd pool should be TCP pool and it should have been created together with h3 pool.
EXPECT_NE(grid_->second(), nullptr);
EXPECT_EQ(2u, grid_->callbacks_.size());
Contributor

I'm not quite sure I understand how this tests that the TCP attempt is not delayed.

Contributor Author

By checking EXPECT_NE(grid_->second(), nullptr). The 2nd pool (TCP) is created immediately, together with the QUIC pool, without waiting for the alarm to fire.

Contributor

Ah, thanks. I see.
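
The reasoning in this thread can be illustrated with a small sketch in which pools are created lazily, so a non-null second pool right after `newStream()` implies the TCP attempt was not delayed. Only `first()` and `second()` are names taken from the test above; `SketchGrid`, `Pool`, `onFailoverTimer`, and the lazy-creation behavior are assumptions made for illustration.

```cpp
#include <memory>
#include <vector>

// Hypothetical pool record; only the protocol label matters here.
struct Pool {
  const char* protocol;
};

// Hypothetical grid that creates pools lazily, in order: HTTP/3 first, then TCP.
class SketchGrid {
public:
  // Both accessors return nullptr while the pool has not been created yet.
  const Pool* first() const { return pools_.size() > 0 ? pools_[0].get() : nullptr; }
  const Pool* second() const { return pools_.size() > 1 ? pools_[1].get() : nullptr; }

  void newStream(bool http3_failed_recently) {
    createNextPool(); // HTTP/3 pool.
    if (http3_failed_recently) {
      createNextPool(); // TCP pool, created immediately instead of on the timer.
    }
  }

  // In the delayed case, the TCP pool only appears once the failover timer fires.
  void onFailoverTimer() { createNextPool(); }

private:
  void createNextPool() {
    if (pools_.empty()) {
      pools_.push_back(std::make_unique<Pool>(Pool{"HTTP/3"}));
    } else if (pools_.size() == 1) {
      pools_.push_back(std::make_unique<Pool>(Pool{"TCP"}));
    }
  }

  std::vector<std::unique_ptr<Pool>> pools_;
};
```

In this sketch, `second()` stays null after `newStream(false)` until `onFailoverTimer()` runs, whereas after `newStream(true)` it is non-null straight away.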

}
if (!delay_tcp_attempt) {
  // Immediately start TCP attempt if HTTP/3 failed recently.
  wrapped_callbacks_.front()->tryAnotherConnection();
Contributor

Should we stop the failover timer at this point since there is nothing else to try?

Contributor Author

next_attempt_timer_ is not accessible from the grid, so I left it as is. The alarm should trigger another call to tryAnotherConnection(), which returns early if there is no next pool. We can definitely cancel it explicitly by adding a getter interface, but I'm wondering if it's worth it.

Contributor

Yeah, fair enough!
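
Why leaving the timer armed is harmless can be shown with a short sketch of the early-return behavior described above. `SketchAttempts` and its members are hypothetical; only the method name `tryAnotherConnection` comes from the thread, and the boolean return value is an assumption added so the effect is observable.

```cpp
#include <cstddef>

// Hypothetical stand-in for the per-stream attempt bookkeeping.
class SketchAttempts {
public:
  explicit SketchAttempts(std::size_t pool_count) : pool_count_(pool_count) {}

  // Returns true if a new attempt was started, false if every pool was already tried.
  bool tryAnotherConnection() {
    if (next_pool_ >= pool_count_) {
      return false; // Early return: a late timer callback becomes a no-op.
    }
    ++next_pool_;
    ++attempts_started_;
    return true;
  }

  std::size_t attemptsStarted() const { return attempts_started_; }

private:
  std::size_t pool_count_;
  std::size_t next_pool_{0};
  std::size_t attempts_started_{0};
};
```

With two pools, the first two calls start the HTTP/3 and TCP attempts, and a third call triggered by the still-armed timer starts nothing.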

Signed-off-by: Dan Zhang <danzh@google.com>
@RyanTheOptimist enabled auto-merge (squash) April 11, 2022 17:22
@RyanTheOptimist
Contributor

FYI: I've enabled auto-merge so once CI passes it should land automatically.

@danzh2010
Contributor Author

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #20722 (comment) was created by @danzh2010.

see: more, trace.

@RyanTheOptimist merged commit 833ed92 into envoyproxy:main Apr 11, 2022
vehre-x41 pushed a commit to vehre-x41/envoy that referenced this pull request Apr 19, 2022
…oyproxy#20722)

Signed-off-by: Andre Vehreschild <vehre@x41-dsec.de>
ravenblackx pushed a commit to ravenblackx/envoy that referenced this pull request Jun 8, 2022
…oyproxy#20722)
