Context instead of timeout #7

gammazero · 2020-10-13T16:26:47Z

When creating a new WorkerPoolXT let the caller pass in a Context instead of specifying a timeout. The direct caller may need to use a context because:

They may need it to timeout or cancel it depending on circumstances
They may have been given a context that times out or gets canceled, which they must use directly or create a child context from.

Consider the case where a server is processing a client's request, and receives a context with the request. The server wants to use WorkerPoolXT to process the request, and needs to use the request's context to know if processing should be abandoned.

Also, this makes the user choose a timeout, or non at all - whichever is most appropriate. This means that the default timeout should also probably go away. A default timeout is just a guess at how long something should take, and WPXT has know way to know this. Often having no timeout is the correct thing, and is typically the expected default behavior.

The text was updated successfully, but these errors were encountered:

oze4 · 2020-10-13T18:00:57Z

"Under the hood" the timeout is using context but I do agree this is a better solution.

Thank you for your feedback!

oze4 · 2020-10-13T20:05:29Z

@gammazero Again, I cannot thank you enough for your feedback! I have been toying with this today and I think I understand what you are getting at now.

I should have some updated code later today - whenever you are free, I'd love another review :)

oze4 · 2020-10-14T23:15:30Z

@gammazero I was finally able to get this knocked out. While I still need to clean up the code, am I using context correctly or could it still use some work?

Thanks again!

oze4 · 2020-10-15T03:36:03Z

I went ahead and cleaned up the code, please, in your own time, open a new issue if you find something that looks fishy or could be improved.

This was an excellent idea and I can clearly see how it improves things a ton. Cannot thank you enough.

https://github.com/oze4/workerpoolxt/releases/tag/v1.1.0

gammazero · 2020-10-21T06:02:40Z

The implementation is not quite what I think is needed. You need to have a separate context per request (per job), so that individual requests can time out, and given that, there should probably not be a "default timeout" at all whether in the form of a duration or a context. IMO, the job timeout should be determined by a context passed into call to Submit, and it should be only up to the caller to determine the timeout for any request (even if they use some default ctx of their own). What I am thinking is that, New should not take a context at all, nor should a context be a member of WorkerPoolXT.

Then SubmitXT will look like this:

func (wp *WorkerPoolXT) SubmitXT(ctx context.Context, job *Job) {
	wp.Submit(wp.wrap(ctx, job))
}

Then your wrap function looks like this:

func (wp *WorkerPoolXT) wrap(ctx context.Context, job *Job) func() {
	// This is the func we ultimately pass to workerpool
	return func() {
		// Allow job options to override default pool options
		if j.Options == nil {
			j.Options = p.options
		}

		j.result = make(chan Result)
		j.startedAt = time.Now()

		go j.run()
		// Wait for job to complete or context to timeout/cancel
		select {
		case r := <-j.result:
			p.result <- r
		case <- ctx.Done():
			p.result <- j.errResult(j.childCtx.Err())
		}
	}
}

There are a few things to note above:

The job does not need a Context member, since this is passed as a separate argument.
The job's childCtx and done are not needed (job.done() is never called to cancel job anyway)
Job does not need getResult() as this is done in-line above
Job does not need runDone() method

oze4 · 2020-10-21T06:15:29Z

@gammazero I seriously cannot thank you enough. Your response and explanation mean more to me than you could imagine!

This makes so much sense to me now.

One day I will be like you lol. Sad to admit, but this has been kicking my a** for like 10 hours...

Seriously. Thank you. You are a life saver.

gammazero · 2020-10-21T06:51:45Z

FYI - this fix demonstrated in playground

BredSt · 2020-10-21T07:55:52Z

@gammazero - I think that the tricky part here is how to combine the ctx.timeout with retry, from my POV the timeout should win always, and in case that there is a timeout the job should be killed to free resources, no matter how much retries you have in queue, WDYT?

oze4 · 2020-10-21T08:08:04Z

@BredSt if we solve the issue with "lingering jobs" that also solves it for retry.

As of now, the timeout does win out over retry. There are tests for this as well. Just that jobs will keep running behind the scenes, while appearing to cancel/timeout.

oze4 · 2020-10-21T08:10:20Z

@BredSt see these links:

gammazero/workerpool#40 (comment)

gammazero · 2020-10-21T17:22:16Z

@oze4 @BredSt Comments here may be helpful: gammazero/workerpool#40 (comment)

For handling the retries, the problem is the same whether workerpoolxt is involved or not. Consider that you are given a function by some outside caller. You have no idea what the function does, or if it will ever return. All you can do is to call it in a separate goroutine and wait for that goroutine to complete or give up waiting. The retry can happen when the task function completes and returns an error, but while the task function is running you are waiting on that completion or for timeout. If a timeout happens then there will be no retry when (if ever) the task completes. Even if an error is reported at this point, it is best to keep waiting for the task goroutine so that the pooled goroutine is still occupied until the task finishes.

Note that tracking task information (execution time, number or retries, etc.) is a separate concern from limiting concurrency. In other words, most of the things that workerpoolxt does could also be done if run in any goroutine. As a design idea, consider creating a Job that can be run in a plain goroutine (go job.Run()) or in a pooled goroutine (wp.Submit(job.Run)) and either way you can get stats on your job once it is completed: result <-job.Result()

gammazero · 2020-10-21T18:44:40Z

Instead of handling the timeout yourself, by waiting for <-ctx.Done(), you can pass the context into the task function. Even though the caller defines what the function does, you are giving them a context that hopefully their function will look at to see if it is time to stop and return an appropriate error (ctx.Err()).

oze4 · 2020-10-21T18:54:03Z

Excellent idea. I really like this.

I had that thought yesterday - instead of having the caller return something from a Task, pass a "done" func to each Task which the caller can use to aggregate results. Like x.Done(theJobResult) instead of returning a Result.

I also plan on testing all of the solutions you've provided. I also like your idea of returning a chan a lot.

Cannot thank you enough.

oze4 · 2020-10-21T20:20:32Z

I just had an idea... What if instead of (or in addition to) passing in the context, the caller passes in the cancel func? Wouldn't this offer us more control over things?

gammazero · 2020-10-22T03:18:34Z

@oze4 Do not pass in the cancel function. The task has no idea when to call cancel. Only the caller (or where ever ctx was created) knowns when to cancel.

oze4 added the enhancement New feature or request label Oct 14, 2020

oze4 self-assigned this Oct 15, 2020

oze4 closed this as completed Oct 15, 2020

oze4 mentioned this issue Oct 16, 2020

Using timeouts #8

Closed

oze4 reopened this Oct 21, 2020

oze4 mentioned this issue Oct 21, 2020

Stop job while running gammazero/workerpool#40

Closed

BredSt mentioned this issue Oct 22, 2020

Jobs still run after context is done #10

Closed

oze4 mentioned this issue Oct 25, 2020

using workerpool in production #12

Closed

oze4 closed this as completed Oct 25, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Context instead of timeout #7

Context instead of timeout #7

gammazero commented Oct 13, 2020 •

edited

Loading

oze4 commented Oct 13, 2020

oze4 commented Oct 13, 2020

oze4 commented Oct 14, 2020 •

edited

Loading

oze4 commented Oct 15, 2020

gammazero commented Oct 21, 2020

oze4 commented Oct 21, 2020

gammazero commented Oct 21, 2020

BredSt commented Oct 21, 2020 •

edited

Loading

oze4 commented Oct 21, 2020

oze4 commented Oct 21, 2020

gammazero commented Oct 21, 2020 •

edited

Loading

gammazero commented Oct 21, 2020

oze4 commented Oct 21, 2020

oze4 commented Oct 21, 2020

gammazero commented Oct 22, 2020

Context instead of timeout #7

Context instead of timeout #7

Comments

gammazero commented Oct 13, 2020 • edited Loading

oze4 commented Oct 13, 2020

oze4 commented Oct 13, 2020

oze4 commented Oct 14, 2020 • edited Loading

oze4 commented Oct 15, 2020

gammazero commented Oct 21, 2020

oze4 commented Oct 21, 2020

gammazero commented Oct 21, 2020

BredSt commented Oct 21, 2020 • edited Loading

oze4 commented Oct 21, 2020

oze4 commented Oct 21, 2020

gammazero commented Oct 21, 2020 • edited Loading

gammazero commented Oct 21, 2020

oze4 commented Oct 21, 2020

oze4 commented Oct 21, 2020

gammazero commented Oct 22, 2020

gammazero commented Oct 13, 2020 •

edited

Loading

oze4 commented Oct 14, 2020 •

edited

Loading

BredSt commented Oct 21, 2020 •

edited

Loading

gammazero commented Oct 21, 2020 •

edited

Loading