Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context instead of timeout #7

Closed
gammazero opened this issue Oct 13, 2020 · 15 comments
Closed

Context instead of timeout #7

gammazero opened this issue Oct 13, 2020 · 15 comments
Assignees
Labels
enhancement New feature or request

Comments

@gammazero
Copy link

gammazero commented Oct 13, 2020

When creating a new WorkerPoolXT let the caller pass in a Context instead of specifying a timeout. The direct caller may need to use a context because:

  • They may need it to timeout or cancel it depending on circumstances
  • They may have been given a context that times out or gets canceled, which they must use directly or create a child context from.

Consider the case where a server is processing a client's request, and receives a context with the request. The server wants to use WorkerPoolXT to process the request, and needs to use the request's context to know if processing should be abandoned.

Also, this makes the user choose a timeout, or non at all - whichever is most appropriate. This means that the default timeout should also probably go away. A default timeout is just a guess at how long something should take, and WPXT has know way to know this. Often having no timeout is the correct thing, and is typically the expected default behavior.

@oze4
Copy link
Owner

oze4 commented Oct 13, 2020

"Under the hood" the timeout is using context but I do agree this is a better solution.

Thank you for your feedback!

@oze4
Copy link
Owner

oze4 commented Oct 13, 2020

@gammazero Again, I cannot thank you enough for your feedback! I have been toying with this today and I think I understand what you are getting at now.

I should have some updated code later today - whenever you are free, I'd love another review :)

@oze4 oze4 added the enhancement New feature or request label Oct 14, 2020
@oze4 oze4 added this to High priority in Enhancements Oct 14, 2020
@oze4
Copy link
Owner

oze4 commented Oct 14, 2020

@gammazero I was finally able to get this knocked out. While I still need to clean up the code, am I using context correctly or could it still use some work?

Thanks again!

@oze4 oze4 self-assigned this Oct 15, 2020
@oze4
Copy link
Owner

oze4 commented Oct 15, 2020

I went ahead and cleaned up the code, please, in your own time, open a new issue if you find something that looks fishy or could be improved.

This was an excellent idea and I can clearly see how it improves things a ton. Cannot thank you enough.

https://github.com/oze4/workerpoolxt/releases/tag/v1.1.0

@oze4 oze4 closed this as completed Oct 15, 2020
Enhancements automation moved this from High priority to Closed Oct 15, 2020
@oze4 oze4 mentioned this issue Oct 16, 2020
@gammazero
Copy link
Author

The implementation is not quite what I think is needed. You need to have a separate context per request (per job), so that individual requests can time out, and given that, there should probably not be a "default timeout" at all whether in the form of a duration or a context. IMO, the job timeout should be determined by a context passed into call to Submit, and it should be only up to the caller to determine the timeout for any request (even if they use some default ctx of their own). What I am thinking is that, New should not take a context at all, nor should a context be a member of WorkerPoolXT.

Then SubmitXT will look like this:

func (wp *WorkerPoolXT) SubmitXT(ctx context.Context, job *Job) {
	wp.Submit(wp.wrap(ctx, job))
}

Then your wrap function looks like this:

func (wp *WorkerPoolXT) wrap(ctx context.Context, job *Job) func() {
	// This is the func we ultimately pass to workerpool
	return func() {
		// Allow job options to override default pool options
		if j.Options == nil {
			j.Options = p.options
		}

		j.result = make(chan Result)
		j.startedAt = time.Now()

		go j.run()
		// Wait for job to complete or context to timeout/cancel
		select {
		case r := <-j.result:
			p.result <- r
		case <- ctx.Done():
			p.result <- j.errResult(j.childCtx.Err())
		}
	}
}

There are a few things to note above:

  • The job does not need a Context member, since this is passed as a separate argument.
  • The job's childCtx and done are not needed (job.done() is never called to cancel job anyway)
  • Job does not need getResult() as this is done in-line above
  • Job does not need runDone() method

@oze4 oze4 reopened this Oct 21, 2020
Enhancements automation moved this from Closed to Needs triage Oct 21, 2020
@oze4
Copy link
Owner

oze4 commented Oct 21, 2020

@gammazero I seriously cannot thank you enough. Your response and explanation mean more to me than you could imagine!

This makes so much sense to me now.

One day I will be like you lol. Sad to admit, but this has been kicking my a** for like 10 hours...

Seriously. Thank you. You are a life saver.

@gammazero
Copy link
Author

FYI - this fix demonstrated in playground

@BredSt
Copy link

BredSt commented Oct 21, 2020

@gammazero - I think that the tricky part here is how to combine the ctx.timeout with retry, from my POV the timeout should win always, and in case that there is a timeout the job should be killed to free resources, no matter how much retries you have in queue, WDYT?

@oze4
Copy link
Owner

oze4 commented Oct 21, 2020

@BredSt if we solve the issue with "lingering jobs" that also solves it for retry.

As of now, the timeout does win out over retry. There are tests for this as well. Just that jobs will keep running behind the scenes, while appearing to cancel/timeout.

@oze4
Copy link
Owner

oze4 commented Oct 21, 2020

@gammazero
Copy link
Author

gammazero commented Oct 21, 2020

@oze4 @BredSt Comments here may be helpful: gammazero/workerpool#40 (comment)

For handling the retries, the problem is the same whether workerpoolxt is involved or not. Consider that you are given a function by some outside caller. You have no idea what the function does, or if it will ever return. All you can do is to call it in a separate goroutine and wait for that goroutine to complete or give up waiting. The retry can happen when the task function completes and returns an error, but while the task function is running you are waiting on that completion or for timeout. If a timeout happens then there will be no retry when (if ever) the task completes. Even if an error is reported at this point, it is best to keep waiting for the task goroutine so that the pooled goroutine is still occupied until the task finishes.

Note that tracking task information (execution time, number or retries, etc.) is a separate concern from limiting concurrency. In other words, most of the things that workerpoolxt does could also be done if run in any goroutine. As a design idea, consider creating a Job that can be run in a plain goroutine (go job.Run()) or in a pooled goroutine (wp.Submit(job.Run)) and either way you can get stats on your job once it is completed: result <-job.Result()

@gammazero
Copy link
Author

Instead of handling the timeout yourself, by waiting for <-ctx.Done(), you can pass the context into the task function. Even though the caller defines what the function does, you are giving them a context that hopefully their function will look at to see if it is time to stop and return an appropriate error (ctx.Err()).

@oze4
Copy link
Owner

oze4 commented Oct 21, 2020

Excellent idea. I really like this.

I had that thought yesterday - instead of having the caller return something from a Task, pass a "done" func to each Task which the caller can use to aggregate results. Like x.Done(theJobResult) instead of returning a Result.

I also plan on testing all of the solutions you've provided. I also like your idea of returning a chan a lot.

Cannot thank you enough.

@oze4
Copy link
Owner

oze4 commented Oct 21, 2020

I just had an idea... What if instead of (or in addition to) passing in the context, the caller passes in the cancel func? Wouldn't this offer us more control over things?

@gammazero
Copy link
Author

@oze4 Do not pass in the cancel function. The task has no idea when to call cancel. Only the caller (or where ever ctx was created) knowns when to cancel.

@oze4 oze4 moved this from Needs triage to Closed in Enhancements Oct 25, 2020
@oze4 oze4 closed this as completed Oct 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Development

No branches or pull requests

3 participants