hds: prevent timer reset on every update #5977

sschepens · 2019-02-15T15:04:18Z

If we push frequent updates but with the same interval, we may never get responses back because the timer gets reset on every update.
This change resets the timer only if the interval has changed or a disconnect occurred.

Signed-off-by: Sebastian Schepens <sebastian.schepens@mercadolibre.com>
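
For readers skimming the thread, the behavior described above can be sketched roughly as follows. This is only an illustration and not the code from the PR: `FakeTimer`, `Reporter`, `onUpdate`, and `onDisconnect` are hypothetical names, and the timer is a stand-in that just counts how often it gets armed.

```cpp
#include <chrono>

// Hypothetical stand-in for an event-loop timer; it only records how often
// it is (re)armed and with which duration.
struct FakeTimer {
  int arm_count = 0;
  std::chrono::milliseconds armed_for{0};

  void enable(std::chrono::milliseconds d) {
    armed_for = d;
    ++arm_count;
  }
};

// Hypothetical reporter that re-arms its response timer only when needed.
class Reporter {
public:
  explicit Reporter(FakeTimer& timer) : timer_(timer) {}

  // Called for every update pushed by the management server.
  void onUpdate(std::chrono::milliseconds interval) {
    if (!armed_ || interval != interval_) {
      // First update, first update after a disconnect, or a new interval.
      interval_ = interval;
      armed_ = true;
      timer_.enable(interval);
    }
    // Same interval: leave the pending timer alone so it can still fire.
  }

  // Called when the stream drops; the next update must re-arm the timer.
  void onDisconnect() { armed_ = false; }

private:
  FakeTimer& timer_;
  std::chrono::milliseconds interval_{0};
  bool armed_ = false;
};
```

In this sketch, three consecutive updates with the same interval arm the timer once, while an interval change or a disconnect followed by an update arms it again.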

sschepens · 2019-02-15T15:05:11Z

@markatou @htuch could you take a look at this please?

htuch

Thanks, looks legit, can you add a test case in hds_integration_test to cover this behavior?

sschepens · 2019-02-15T17:28:27Z

@htuch I was wondering how to test this in an integration test. Do you have any ideas?
I was thinking about making an update loop and then verifying that reports have been received, but this seems somewhat tricky.

htuch · 2019-02-19T15:37:36Z

@sschepens I think some variant of

envoy/test/integration/hds_integration_test.cc

Line 341 in 90a7a3e

    
           // Tests Envoy TCP health checking an endpoint that doesn't respond and reporting that it is

with multiple updates and a sufficiently long timeout to cover them should work. Bonus points for using Test/SimulatedTimeSystem (not sure how easy it is to use inside integration tests yet).

sschepens · 2019-02-19T20:04:23Z

@htuch we have two things that could be tested. One is the interval update: if I change the interval, I should receive a new response at the new interval. The other is the bug where the timer is reset every time an update is received.
I'm not sure we want to test both. Testing an interval update is probably trivial, but testing the timer reset is somewhat tricky. I was thinking of making a loop of update + response check with a wait time lower than the interval; if the timer is reset every time, this would generate a never-ending loop.

htuch · 2019-02-19T20:21:48Z

@sschepens

I was thinking of making a loop of update + response check with a wait time lower than the interval; if the timer is reset every time, this would generate a never-ending loop.

Maybe just iterate some arbitrary number of times, e.g. 10, and see if we observe 10 responses?
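
The loop idea discussed here can be modeled without any real timers. The sketch below is a hypothetical simulation, not the integration test from this PR: `countResponses` and its parameters are made-up names, time is a plain millisecond counter, and the timer is assumed to re-arm itself for another interval each time it fires.

```cpp
#include <cstdint>

// Hypothetical model of the scenario: `updates` updates arrive `gap_ms`
// apart while a response timer with period `interval_ms` is pending.
// Returns how many responses (timer firings) would be observed.
int countResponses(int updates, std::int64_t gap_ms, std::int64_t interval_ms,
                   bool reset_on_every_update) {
  std::int64_t now = 0;
  std::int64_t deadline = interval_ms;  // armed by an initial update at t = 0
  int responses = 0;

  for (int i = 0; i < updates; ++i) {
    now += gap_ms;  // wait less than the interval, then push another update

    // Fire the timer for every deadline that has passed (assumed periodic).
    while (deadline <= now) {
      ++responses;
      deadline += interval_ms;
    }

    if (reset_on_every_update) {
      // Buggy behavior: every update pushes the deadline further out.
      deadline = now + interval_ms;
    }
  }
  return responses;
}
```

With 10 updates spaced 400 ms apart and a 1000 ms interval, the resetting variant never observes a response in this model, while the non-resetting variant does. A bounded iteration count like this avoids the never-ending loop while still distinguishing the two behaviors.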

sschepens · 2019-02-19T21:54:39Z

@htuch

Maybe just iterate some arbitrary number of times, e.g. 10, and see if we observe 10 responses?

We would need to sleep enough in each iteration to allow for responses to arrive, would that be OK?

I just committed a test for interval update, could you take a look at it please?

test/integration/hds_integration_test.cc

htuch · 2019-02-19T23:07:40Z

Interval update test looks good (although I wish we could use simulated time; I will need to check with @jmarantz on this when he is back). As guidance, we don't want these tests to take more than a few seconds each, so treat that as the upper bound on any real-time delays you add.

stale · 2019-02-27T23:01:55Z

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

jmarantz · 2019-02-27T23:10:17Z

+1 for using simulated time, which is definitely done in at least one integration test:

envoy/test/integration/integration_admin_test.cc

Line 454 in 50c2357

public Event::SimulatedTimeSystem,

I'm not 100% sure whether that will work in this case, because if the timing is done in grpc itself it won't be using the Envoy time system abstraction.

It's easy enough to tell: just switch to a SimulatedTimeSystem per the pattern above, and then run the test. It will run very fast, and probably fail 99% of the time, or work 100% of the time.

sschepens · 2019-02-28T14:32:27Z

It's easy enough to tell: just switch to a SimulatedTimeSystem per the pattern above, and then run the test. It will run very fast, and probably fail 99% of the time, or work 100% of the time.

@jmarantz it would seem that Event::SimulatedTimeSystem conflicts with BaseIntegrationTest: they both provide a timeSystem() getter, which is actually being used in hds_integration_test.
What would be the way to work around this?

jmarantz · 2019-02-28T14:37:18Z

See the example I pointed to. Just inherit from SimulatedTimeSystem in your test class.

TimeSystem is the base class for SimulatedTimeSystem.

sschepens · 2019-02-28T14:48:08Z

Maybe make the time skew 250ms instead of 100ms between expect/timeout, that should hopefully not flake.

@htuch not exactly sure what you meant here, but I increased the timeouts to avoid flakes.

jmarantz · 2019-02-28T15:25:24Z

@sschepens ping me on Slack if you have more questions about how sim-time works with integration tests. The admin integration test I described above inherits indirectly from BaseIntegrationTest, but it inherits from SimulatedTimeSystem first, which establishes what time system to use for the duration of the test.
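
The "inherit from it first" point relies on C++ constructing base classes in declaration order. The sketch below illustrates only that language rule with made-up types (`FakeSimulatedTime`, `FakeIntegrationBase`, `activeTimeSystem`); it is not how Envoy's SimulatedTimeSystem or BaseIntegrationTest are actually implemented.

```cpp
#include <string>

// Hypothetical process-wide record of which time system is active.
inline std::string& activeTimeSystem() {
  static std::string name = "real";
  return name;
}

// Hypothetical simulated-time base: selects simulated time for its lifetime.
struct FakeSimulatedTime {
  FakeSimulatedTime() { activeTimeSystem() = "simulated"; }
  ~FakeSimulatedTime() { activeTimeSystem() = "real"; }
};

// Hypothetical integration base: captures whichever time system is active
// at the moment it is constructed.
struct FakeIntegrationBase {
  FakeIntegrationBase() : time_system_at_setup(activeTimeSystem()) {}
  std::string time_system_at_setup;
};

// Base classes are constructed in declaration order, so listing the
// simulated-time base first means the integration base sees it.
struct SimTimeTest : public FakeSimulatedTime, public FakeIntegrationBase {};

// Without the simulated-time base, the integration base sees real time.
struct RealTimeTest : public FakeIntegrationBase {};
```

Constructing a `SimTimeTest` leaves `time_system_at_setup` equal to `"simulated"`, whereas a `RealTimeTest` records `"real"`. Swapping the order of the two bases in `SimTimeTest` would make the integration base capture `"real"` instead.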

htuch · 2019-02-28T19:24:41Z

@sschepens Looks good, but could you chat with @jmarantz and figure out if we could use simulated time here without too much extra work? Thanks.

jmarantz · 2019-02-28T23:00:29Z

SimulatedTimeSystem works with this test, but actually (and surprisingly to me) makes it 10x slower. I want to go investigate why that is and follow up. In the meantime I suggest going forward with this as is, using real time and I can address later.

htuch · 2019-02-28T23:36:22Z

@jmarantz ack, thanks for looking into this folks.

htuch

Thanks!

If we push frequent updates but with the same interval, we may never get responses back because the timer gets reset on every update. This change resets the timer only if the interval has changed or a disconnect occurred.
Signed-off-by: Sebastian Schepens <sebastian.schepens@mercadolibre.com>
Signed-off-by: Fred Douglas <fredlas@google.com>

hds: prevent timer reset on every update

fd50174

Signed-off-by: Sebastian Schepens <sebastian.schepens@mercadolibre.com>

htuch reviewed Feb 15, 2019

htuch added the waiting label Feb 15, 2019

htuch self-assigned this Feb 15, 2019

add test for timer update

4277f73

Signed-off-by: Sebastian Schepens <sebastian.schepens@mercadolibre.com>

repokitteh-read-only bot removed the waiting label Feb 19, 2019

htuch reviewed Feb 19, 2019

Merge remote-tracking branch 'envoyproxy/master' into hds-timer

27c2953

Signed-off-by: Sebastian Schepens <sebastian.schepens@mercadolibre.com>

htuch added the waiting label Feb 20, 2019

stale bot added the stale label Feb 27, 2019

stale bot removed the stale label Feb 27, 2019

Merge remote-tracking branch 'envoyproxy/master' into hds-timer

9e5716b

Signed-off-by: Sebastian Schepens <sebastian.schepens@mercadolibre.com>

repokitteh-read-only bot removed the waiting label Feb 28, 2019

increase timeouts to avoid flakes

89cf57a

Signed-off-by: Sebastian Schepens <sebastian.schepens@mercadolibre.com>

htuch added the waiting:any label Feb 28, 2019

repokitteh-read-only bot removed the waiting:any label Feb 28, 2019

htuch approved these changes Feb 28, 2019

htuch merged commit 8c3321e into envoyproxy:master Feb 28, 2019

sschepens deleted the hds-timer branch March 1, 2019 02:24

htuch mentioned this pull request Jul 22, 2020

test: hds_integration_test failures with tsan #12184

Closed
