Add sync lock to create job functions #75
Add sync lock to make create job calls thread safe.
This method has a select {} at the end, so my thinking is that it may block for a while if we take the lock there. In my view, why not set lastcall = "c" + job-handle, or some random value, to prevent overwriting? |
The function we saved is used when we get a response that is JOB_CREATED in the processLoop() function. That's the first time we get the job handle so we can't use that. We also need to be able to find the correct handler at that point so a random value would not work. The only alternative I can think of that might work would be to make a queue instead of a single function to handle the JOB_CREATED responses we get. However I can't find it documented anywhere in gearman that the server guarantees that we get JOB_CREATED in the same order we sent the SUBMIT_JOB requests. |
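For illustration, the queue alternative mentioned above could look roughly like the following sketch. The `pending` type and the `dispatch` helper are hypothetical names, not part of gearman-go. The sketch is only correct under the assumption that the server answers SUBMIT_JOB requests in the order they were sent, which, as noted, is not documented.

```go
package main

import (
	"fmt"
	"sync"
)

// pending is a hypothetical FIFO of callbacks, one per SUBMIT_JOB in flight.
type pending struct {
	mu    sync.Mutex
	queue []func(handle string)
}

// push appends the callback for a request that was just sent.
func (p *pending) push(cb func(handle string)) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.queue = append(p.queue, cb)
}

// pop removes and returns the oldest callback, or false if none is waiting.
func (p *pending) pop() (func(handle string), bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.queue) == 0 {
		return nil, false
	}
	cb := p.queue[0]
	p.queue = p.queue[1:]
	return cb, true
}

// dispatch queues one callback per job, then feeds the JOB_CREATED handles
// to the queue in the order given and reports which job got which handle.
func dispatch(jobs, handles []string) map[string]string {
	var p pending
	got := make(map[string]string)
	for _, job := range jobs {
		job := job // capture a per-iteration copy
		p.push(func(handle string) { got[job] = handle })
	}
	for _, h := range handles {
		if cb, ok := p.pop(); ok {
			cb(h)
		}
	}
	return got
}

func main() {
	got := dispatch([]string{"job-a", "job-b"}, []string{"H:1", "H:2"})
	fmt.Println(got["job-a"], got["job-b"])
}
```

If the handles arrive in a different order than the requests were sent, the same code silently pairs each handle with the wrong job, which is exactly the concern about the missing ordering guarantee.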
Putting "c" in the inner handler is safe because it stores the same handler for every call, so nothing overlaps. But after digging a bit deeper I found that this mapping is not required. Job failure returned with cc: @mikespook |
The function is not the same on every call, because it uses the "result" Go channel from the outer scope. This means that when the innerhandler["c"] function is called in processLoop() while there are several callers, it will assign the job handle to the wrong job. We would then return the wrong handle in the select statement, so the results of a job may be sent as the response to the wrong caller. |
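A minimal sketch of the overwrite described above, using a hypothetical stripped-down `client` type rather than the real gearman-go code. It is deliberately single-threaded so the outcome is deterministic: two callers register a callback in the same slot before any JOB_CREATED response arrives, and both handles end up with the second caller.

```go
package main

import "fmt"

// client is a hypothetical stand-in for the real client: it keeps a single
// callback slot for the next JOB_CREATED response.
type client struct {
	innerHandler map[string]func(handle string)
}

// submit registers a callback that captures the caller's own result channel,
// overwriting whatever callback was stored before.
func (c *client) submit(result chan<- string) {
	c.innerHandler["c"] = func(handle string) { result <- handle }
}

// jobCreated mimics processLoop() dispatching one JOB_CREATED response.
func (c *client) jobCreated(handle string) {
	c.innerHandler["c"](handle)
}

// demo submits two jobs before any response arrives and reports how many
// handles each caller ends up receiving.
func demo() (first, second int) {
	c := &client{innerHandler: map[string]func(handle string){}}
	resA := make(chan string, 2)
	resB := make(chan string, 2)

	c.submit(resA) // caller A registers its callback
	c.submit(resB) // caller B overwrites it before A got a response

	c.jobCreated("H:1") // meant for A, but delivered to B's channel
	c.jobCreated("H:2") // delivered to B's channel as well

	return len(resA), len(resB)
}

func main() {
	a, b := demo()
	fmt.Printf("caller A received %d handles, caller B received %d\n", a, b)
}
```

In this sketch caller A never receives a handle at all, while caller B receives two, one of which belongs to A's job.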
@No-ops, I work with @sadlil and we are trying to solve this issue in our fork https://github.com/appscode/g2. Based on my understanding, I think this problem can't be solved using a shared TCP connection between client and server.

Say multiple goroutines in an application p1 are trying to submit background jobs. The first goroutine submits a job with uid_1. Then the second goroutine submits a job with uid_2. Now the gearman server might return the JOB_CREATED responses with handle_1 and handle_2 in any order. In the protocol spec, no ordering is guaranteed. So the only way to be completely sure is to use a separate TCP connection (a separate client). Then the gearman server will return the JOB_CREATED response with the respective job handle to the correct client.

Using a mutex to block until the JOB_CREATED response for uid_1 is returned will fix this issue, but I think that may severely hamper the performance of the client. Using a queue will not work, because the JOB_CREATED response does not return any field from the original client request, so there is no way to associate a SUBMIT_JOB request with a JOB_CREATED response. IMHO, this is a mistake in the gearman protocol spec. All the other client requests can be done over a shared connection, since the request and response both contain the job handle.

I would like to hear your thoughts, @No-ops. |
I agree that the inability to give some sort of id when doing a SUBMIT_JOB that can be used to identify the related JOB_CREATED is an oversight in the gearman protocol. I also agree that my solution will slow down job creation significantly, but seeing that it only slows things down in the same cases where we would otherwise get a race condition, I find it acceptable. I haven't looked that deeply into the gearman protocol, but wouldn't WORK_COMPLETE be sent to the client on the same connection that the SUBMIT_JOB was sent on? In that case the whole client library would have to work with a dedicated connection for each job. |
I am making the assumption that in case of a background job, server will always send JOB_CREATED response before sending |
I think @No-ops is correct. There is a race in the function. |
@mikespook If we unlock at line 242 we don't solve the race condition. The race condition happens because the innerhandler["c"] callback will get a new function with a different result channel before we get a response and go through the select statement. We need the call to complete before we can unlock and allow anyone to write to the innerhandler["c"] callback which means the unlock needs to happen after the select statement. |
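The point about where to unlock can be sketched as follows. This is a simplified model, not the actual gearman-go code: the `client` type, its `requests` channel (standing in for the shared connection) and the `serve` loop (standing in for the server plus processLoop()) are all hypothetical. The lock is released only after the response has been received, so no other caller can replace the callback in between.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical model with a single callback slot.
type client struct {
	mu       sync.Mutex
	callback func(handle string) // plays the role of innerhandler["c"]
	requests chan string         // stands in for the shared connection
}

// serve answers each request by invoking whichever callback is registered.
func (c *client) serve() {
	n := 0
	for payload := range c.requests {
		n++
		c.callback(fmt.Sprintf("H:%d:%s", n, payload))
	}
}

// createJob holds the lock across register, send and wait, so the callback
// cannot be overwritten before this caller has received its handle.
func (c *client) createJob(payload string) string {
	c.mu.Lock()
	defer c.mu.Unlock() // runs only after the response below was received

	result := make(chan string, 1)
	c.callback = func(handle string) { result <- handle }
	c.requests <- payload
	return <-result
}

// submitAll submits every payload from its own goroutine and collects the
// handle each caller got back.
func submitAll(payloads []string) map[string]string {
	c := &client{requests: make(chan string)}
	go c.serve()

	handles := make(map[string]string)
	var hmu sync.Mutex
	var wg sync.WaitGroup
	for _, p := range payloads {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			h := c.createJob(p)
			hmu.Lock()
			handles[p] = h
			hmu.Unlock()
		}(p)
	}
	wg.Wait()
	close(c.requests)
	return handles
}

func main() {
	for payload, handle := range submitAll([]string{"a", "b", "c"}) {
		fmt.Println(payload, "->", handle)
	}
}
```

The fake handle embeds the payload only so that the pairing can be checked; the cost of this design is that job creation is fully serialized while a response is outstanding.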
@No-ops Oops, I think you are right. I've added you (@No-ops, @sadlil, and @tamalsaha) as collaborators, so let's discuss what the better solution for this is. Thanks to all of you! PS: I'm not using gearman in a production environment anymore, but I'm still interested in working on gearman-go. |
Thanks for adding us as collaborators, @mikespook. We think the client must be concurrency-safe. Here is my opinion on this issue:
|
Since this has been open for a while now and there are no objections, I'm going to go ahead and merge this. |
@No-ops is there any plan to move forward with the proposal? |
Do you mean the suggestion @tamalsaha made? I don't think we can completely eliminate the risk of having to wait for the gearman server to send JOB_CREATED. The "inner connection" that we create must be kept alive until the job finalizes if we want a response, so making one for each job would be resource-intensive and error-prone. I suggest we make a connection pool for creating jobs. That way we can mitigate the problem without risking having too many long-running TCP connections alive. |
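One possible shape for such a pool, as a rough sketch only. The `conn` and `pool` types are hypothetical and no real network traffic is involved. It assumes that a connection only needs to be used exclusively for the SUBMIT_JOB / JOB_CREATED exchange and can go back to the pool, still open, afterwards.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for one client connection to the server.
type conn struct{ id int }

// createJob pretends to send SUBMIT_JOB and wait for JOB_CREATED on this
// connection; the returned handle is made up for illustration.
func (c *conn) createJob(payload string) string {
	return fmt.Sprintf("H:conn%d:%s", c.id, payload)
}

// pool hands out a fixed number of connections through a buffered channel.
type pool struct{ idle chan *conn }

// newPool creates size connections up front and marks them all idle.
func newPool(size int) *pool {
	p := &pool{idle: make(chan *conn, size)}
	for i := 0; i < size; i++ {
		p.idle <- &conn{id: i}
	}
	return p
}

// createJob borrows a connection (blocking if all are busy), performs the
// exchange on it exclusively, and returns the connection to the pool.
func (p *pool) createJob(payload string) string {
	c := <-p.idle
	defer func() { p.idle <- c }()
	return c.createJob(payload)
}

func main() {
	p := newPool(2)
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			fmt.Println(p.createJob(fmt.Sprintf("job-%d", i)))
		}(i)
	}
	wg.Wait()
}
```

With a pool of size N, at most N job creations wait on the server at once, and the number of open connections stays bounded regardless of how many jobs are submitted.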
Created a new issue on this #77 so we can move the discussion over there. |
…ob functions Signed-off-by: sadlil <sadlil@appscode.com>
* remove rescheduling job while running worker stopped
* Seperate jobDone, jobFailed, jobFailedWithException Support CanDOTimeout
* remove dependency from db & fix some error
* Server and worker restart support - set timeout for every running job - monitor all running job & remove if timeout expire
* report to client that job failed with timeout exception
* generic db call for all.
* DB test fixed
* fix
* Merge Pull Request mikespook/gearman-go#75, Add sync lock to create job functions Signed-off-by: sadlil <sadlil@appscode.com>