Multiple executions for a single job. #28

Closed
dgioulakis opened this issue Mar 18, 2019 · 6 comments

Comments

@dgioulakis

Please see: HangfireIO/Hangfire#1025 (comment)

It doesn't seem that memory storage actually dequeues the job or marks it as dequeued (I don't know what the correct behavior should be).

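```csharp
// `config` is the Hangfire configuration being set up.
config.UseMemoryStorage(
    new MemoryStorageOptions
    {
        // After this period a fetched job becomes visible to other workers again.
        FetchNextJobTimeout = TimeSpan.FromSeconds(10)
    });
```

```csharp
using System;
using System.Threading.Tasks;

public class MyJobPerformer
{
    // A new random ID per instance, so two different IDs in the output
    // mean two separate executions of the job.
    private readonly string _performerId;

    public MyJobPerformer()
    {
        _performerId = Guid.NewGuid().ToString("N");
    }

    // RequestBase is the reporter's own request type (not shown here).
    // Logs a timestamp every 5 seconds; one run takes about 15 seconds.
    public async Task Perform(RequestBase request)
    {
        Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
        await Task.Delay(TimeSpan.FromSeconds(5));
        Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
        await Task.Delay(TimeSpan.FromSeconds(5));
        Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
        await Task.Delay(TimeSpan.FromSeconds(5));
        Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
    }
}
```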

Results

```
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:20 PM
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:25 PM
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:30 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:32 PM <<< Duplicate job execution! 10 seconds elapsed.
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:35 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:37 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:42 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:47 PM
```

Enqueuing a single job results in multiple worker executions. It gets even worse if FetchNextJobTimeout is reduced further. I'm not sure whether this is a problem with Hangfire, MemoryStorage, or both.

@dgioulakis
Author

dgioulakis commented Mar 18, 2019

I believe the problematic, or at least curious, line is:

```csharp
&& (!q.FetchedAt.HasValue || q.FetchedAt.Value < timeout)
```

in particular the `|| q.FetchedAt.Value < timeout` part. Can anyone explain why this provider allows already "fetched" jobs to be dequeued again? As far as I can tell, FetchNextJobTimeout is essentially a window of time in which a job must complete before it is selected again for execution.

Why should the storage connection fetch a job that has already been fetched? Or why are jobs not really dequeued here?

Update

https://github.com/HangfireIO/Hangfire/blob/3186f81549e068500192709764293d47b28317d6/src/Hangfire.Core/Server/Worker.cs#L140

Still digging. It looks like Hangfire only removes the job from the queue after it has finished processing, so a job that keeps running longer than FetchNextJobTimeout can be picked up and re-executed concurrently by another worker. Something doesn't seem right here.
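A minimal, hypothetical simulation of this behavior may make it concrete. It is not Hangfire or Hangfire.MemoryStorage source code: `IsFetchable`, `jobDuration` and the one-second polling loop are invented for illustration. It applies the predicate quoted earlier (a job is fetchable if it was never fetched, or was fetched before `now - FetchNextJobTimeout`) and, as described for Worker.cs, does not remove the job until it finishes.

```csharp
using System;

// Hypothetical model of the fetch predicate quoted above; not library code.
// A fetched job stays in the queue and only records when it was fetched.
static bool IsFetchable(DateTime? jobFetchedAt, DateTime pollTime, TimeSpan window)
{
    var timeout = pollTime - window;
    return !jobFetchedAt.HasValue || jobFetchedAt.Value < timeout;
}

var fetchNextJobTimeout = TimeSpan.FromSeconds(10);
var jobDuration = TimeSpan.FromSeconds(15);   // Perform() above runs for about 15 s
var start = new DateTime(2019, 3, 17, 23, 38, 20, DateTimeKind.Utc);

DateTime? fetchedAt = null;   // never cleared: the job is only removed after it completes
var executions = 0;

// Idle workers poll once per second until the first execution would finish.
for (var now = start; now < start + jobDuration; now += TimeSpan.FromSeconds(1))
{
    if (IsFetchable(fetchedAt, now, fetchNextJobTimeout))
    {
        fetchedAt = now;   // mark as fetched, but leave the job in the queue
        executions++;
        Console.WriteLine($"{now:HH:mm:ss} job fetched (execution #{executions})");
    }
}
```

With a 10-second window and a 15-second job, the loop records two executions of the single enqueued job, the second starting 11 seconds after the first. Shrinking `fetchNextJobTimeout` to 4 seconds produces three executions, consistent with the observation that smaller values make it worse.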

@dgioulakis
Author

Same problem referenced here: HangfireIO/Hangfire#1197

@perrich
Owner

perrich commented Mar 18, 2019

Hi,
I think it's by design that jobs which have not completed are retrieved again.
You can define a longer timeout, for instance 24 hours, by setting FetchNextJobTimeout in the MemoryStorageOptions.

@praveenlobo7

Hi,
Is there any solution for this? We are using the Oracle provider and see the same issue.

@dgioulakis
Author

dgioulakis commented Mar 21, 2019

Why would we want to retrieve a job that's currently being processed by a different Worker?

Update

Looks like this is partially the answer: HangfireIO/Hangfire#936 (comment)
Like InvisibilityTimeout and SlidingInvisibilityTimeout in the SQL Server storage implementation, FetchNextJobTimeout is really a window of time in which a job MUST complete; otherwise it may be started again by another Worker, either concurrently or on the next dequeue from the storage connection. I just don't understand why Hangfire can't handle this automatically, given that we can find out a job's state. Perhaps that's not guaranteed?
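For comparison, here is a hypothetical sketch of a sliding window, which is roughly the idea behind SlidingInvisibilityTimeout in SQL Server storage: while a worker still owns a job, it periodically refreshes the job's fetch timestamp, so the same predicate only lets another worker take the job if those refreshes stop (for example, after a crash). The heartbeat interval, `IsFetchable` and the simulation loop are assumptions for illustration, not Hangfire's implementation.

```csharp
using System;

// Hypothetical sliding-window model; not Hangfire source code.
// Same predicate as a fixed window, but the owning worker keeps moving FetchedAt forward.
static bool IsFetchable(DateTime? jobFetchedAt, DateTime pollTime, TimeSpan window) =>
    !jobFetchedAt.HasValue || jobFetchedAt.Value < pollTime - window;

var invisibilityTimeout = TimeSpan.FromSeconds(10);
var heartbeatInterval = TimeSpan.FromSeconds(2);   // assumed; must be well below the timeout
var jobDuration = TimeSpan.FromSeconds(15);
var start = new DateTime(2019, 3, 17, 23, 38, 20, DateTimeKind.Utc);

DateTime? fetchedAt = null;
DateTime? runningSince = null;   // non-null while a live worker owns the job
var removed = false;
var executions = 0;

for (var now = start; !removed; now += TimeSpan.FromSeconds(1))
{
    if (runningSince.HasValue)
    {
        if (now - runningSince.Value >= jobDuration)
        {
            removed = true;   // job finished: the worker deletes it from the queue
            continue;
        }
        if (now - fetchedAt.Value >= heartbeatInterval)
        {
            fetchedAt = now;  // heartbeat: slide the invisibility window forward
        }
    }

    // Any idle worker polling at this moment evaluates the same predicate.
    if (IsFetchable(fetchedAt, now, invisibilityTimeout))
    {
        fetchedAt = now;
        runningSince = now;
        executions++;
    }
}

Console.WriteLine($"Executions: {executions}");
```

This run reports a single execution even though the job outlives the 10-second timeout. If the worker died, the heartbeats would stop and the unchanged predicate would make the job fetchable again once `invisibilityTimeout` had passed, which is the situation the timeout is meant to cover.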

@haga2112

We no longer have the FetchNextJobTimeout property in version 1.7 (memory storage 1.4); I don't know what to do.
