Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix a rare race condition in treeaction timeout detection #38

Merged
merged 3 commits into from
Jan 26, 2021

Conversation

zhengmu
Copy link
Collaborator

@zhengmu zhengmu commented Jan 22, 2021

No description provided.

@TravisOnGit
Copy link
Member

TravisOnGit commented Jan 26, 2021

                break;

Iteration 1 introduced a small behavior change that I think we should address. If we hit this break statement (line 798), we are now going to check if we will hit timeout before throwing ActionTimeoutException. I'm leaning towards keeping the original behavior of throwing exception in this case.

Example 1) Do not retry, and timeout after 60 seconds.
"Timeout": 600000,
"ContinuationOnTimeout": true

Let's say the Action returned unsuccessfully close to the 60 second timeout. It hits this break statement since no retry policy is set and no continuation is set. We then check if it hits a timeout (new code). We discover that it does/will hit a timeout, so we commit TimeoutOnAction.

Example 2) Retry 3 times, then fail. If it times out, continue.
"RetryPolicy": {
"Type": "FixedCount",
"MaxRetryCount": 3,
"MinBackoffMs": 10000
}
"Timeout": 600000,
"ContinuationOnRetryExhaustion": false
"ContinuationOnTimeout": true

In this example, let's say we finish our 3 retries after 55 seconds. We hit this break statement. We then check if it hits a timeout (new code). We discover that it will hit a timeout, since 55 (stopwatch) + 10 (waitTime) > 60 (timeout). Though it is true it would hit a timeout if we waited for another waitTime cycle, but in this case we have already exhausted our retries. It does not make sense to check timeout logic after we know we've hit retry exhausted. #Closed


Refers to: Forge.TreeWalker/src/TreeWalkerSession.cs:798 in 3173bfb. [](commit_id = 3173bfb, deletion_comment = False)

@TravisOnGit
Copy link
Member

                break;

Considering solutions to this. An easy way forward I believe would be to duplicate your changes for ContinuationOnRetryExhaustion. Add a check outside the while loop, before the timeout check.

There may be a more clever way to solve this without having to duplicate the checks.


In reply to: 767879309 [](ancestors = 767879309)


Refers to: Forge.TreeWalker/src/TreeWalkerSession.cs:798 in 3173bfb. [](commit_id = 3173bfb, deletion_comment = False)

return;
}
}

// Retries are exhausted. Throw ActionTimeoutException with executeAction exception as innerException.
Copy link
Member

@TravisOnGit TravisOnGit Jan 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retries are exhausted [](start = 15, length = 21)

Let's update this message as well since it should be hitting timeouts and not retries exhausted. #Resolved

Copy link
Member

@TravisOnGit TravisOnGit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@zhengmu zhengmu merged commit 23f4846 into master Jan 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants