Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Quartz job periodically fails in clustered environment with "Transaction not connected, or was disconnected" exception #91

Closed
IlyaGrebnov opened this Issue · 9 comments

4 participants

@IlyaGrebnov

We upgraded yesterday. Unfortunately issue is not fixed. We still see "Transaction not connected, or was disconnected" exception.

Environment:
We are using SQL Server 2012 running in Windows Azure Virtual Machine.

Configuration:

            properties["quartz.scheduler.instanceName"]                 = "default";
            properties["quartz.scheduler.instanceId"]                   = "AUTO";
            properties["quartz.threadPool.type"]                        = "Quartz.Simpl.SimpleThreadPool, Quartz";
            properties["quartz.threadPool.threadCount"]                 = "10";
            properties["quartz.threadPool.threadPriority"]              = "Normal";
            properties["quartz.jobStore.type"]                          = "Quartz.Impl.AdoJobStore.JobStoreTX, Quartz";
            properties["quartz.jobStore.tablePrefix"]                   = "QRTZ_";
            properties["quartz.jobStore.clustered"]                     = "true";
            properties["quartz.jobStore.driverDelegateType"]            = "Quartz.Impl.AdoJobStore.SqlServerDelegate, Quartz";
            properties["quartz.jobStore.dataSource"]                    = "default";
            properties["quartz.jobStore.useProperties"]                 = "true";
            properties["quartz.dataSource.default.provider"]            = "SqlServer-20";
            properties["quartz.dataSource.default.connectionString"]    = connectionString;

            this.Scheduler = new StdSchedulerFactory(properties).GetScheduler();

Exception:

2013-01-04 18:13:34,933 [default_QuartzSchedulerThread] ERROR Quartz.Impl.AdoJobStore.JobStoreTX - Couldn't rollback ADO.NET connection. Transaction not connected, or was disconnected

System.Data.DataException: Transaction not connected, or was disconnected
at Quartz.Impl.AdoJobStore.JobStoreSupport.RollbackConnection(ConnectionAndTransactionHolder cth) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 3251
2013-01-04 18:13:34,933 [default_QuartzSchedulerThread] ERROR Quartz.Core.ErrorLogger - An error occurred while scanning for the next trigger to fire.

Quartz.JobPersistenceException: Couldn't acquire next trigger: Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. ---> System.Data.SqlClient.SqlException: Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
at System.Data.SqlClient.SqlDataReader.Read()
at Quartz.Impl.AdoJobStore.StdAdoDelegate.SelectTriggerToAcquire(ConnectionAndTransactionHolder conn, DateTimeOffset noLaterThan, DateTimeOffset noEarlierThan, Int32 maxCount) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\StdAdoDelegate.cs:line 2099
at Quartz.Impl.AdoJobStore.JobStoreSupport.AcquireNextTrigger(ConnectionAndTransactionHolder conn, DateTimeOffset noLaterThan, Int32 maxCount, TimeSpan timeWindow) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 2383
--- End of inner exception stack trace ---
at Quartz.Impl.AdoJobStore.JobStoreSupport.AcquireNextTrigger(ConnectionAndTransactionHolder conn, DateTimeOffset noLaterThan, Int32 maxCount, TimeSpan timeWindow) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 2462
at Quartz.Impl.AdoJobStore.JobStoreSupport.<>c__DisplayClass65.<AcquireNextTriggers>b__62(ConnectionAndTransactionHolder conn) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 2354
at Quartz.Impl.AdoJobStore.JobStoreSupport.ExecuteInNonManagedTXLock(String lockName, Func
2 txCallback) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 3432
at Quartz.Impl.AdoJobStore.JobStoreSupport.AcquireNextTriggers(DateTimeOffset noLaterThan, Int32 maxCount, TimeSpan timeWindow) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 2352
at Quartz.Core.QuartzSchedulerThread.Run() in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Core\QuartzSchedulerThread.cs:line 277 [See nested exception: System.Data.SqlClient.SqlException (0x80131904): Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
at System.Data.SqlClient.SqlDataReader.Read()
at Quartz.Impl.AdoJobStore.StdAdoDelegate.SelectTriggerToAcquire(ConnectionAndTransactionHolder conn, DateTimeOffset noLaterThan, DateTimeOffset noEarlierThan, Int32 maxCount) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\StdAdoDelegate.cs:line 2099
at Quartz.Impl.AdoJobStore.JobStoreSupport.AcquireNextTrigger(ConnectionAndTransactionHolder conn, DateTimeOffset noLaterThan, Int32 maxCount, TimeSpan timeWindow) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 2383
ClientConnectionId:1323bab5-d482-456e-98db-32859cc353d3]

@lahma
Owner

Are there any other processes accessing the database, like backup jobs or other applications? do your jobs create their own locks (access the database somehow)?

@IlyaGrebnov

Thanks for reminding me. Our database is dedicated to Quartz.net. There is no agents or other applications accessing it. But I have one job which try to unblock triggers which got stuck in BLOCKED state using following sql.

I understand this is a hack and would be happy to remove this. I just need some ability to unblock trigger based on timeout. Do you know other way of doing this?

BEGIN TRANSACTION

DECLARE @dt_TriggerDeadlockThreshold AS DATETIME = DATEADD(MINUTE, -10, GETUTCDATE());

DECLARE @tv_QRTZ_TRIGGERS AS TABLE
(
    SCHED_NAME     NVARCHAR (250) NOT NULL,
    TRIGGER_NAME   NVARCHAR (250) NOT NULL,
    TRIGGER_GROUP  NVARCHAR (250) NOT NULL
)

INSERT INTO
    @tv_QRTZ_TRIGGERS(SCHED_NAME, TRIGGER_NAME, TRIGGER_GROUP)
SELECT
    SCHED_NAME, TRIGGER_NAME, TRIGGER_GROUP
FROM
    dbo.QRTZ_TRIGGERS
WITH
    (NOLOCK)
WHERE
    NEXT_FIRE_TIME IS NOT NULL AND CAST(NEXT_FIRE_TIME/864000000000.0 - 693595.0 AS DATETIME) < @dt_TriggerDeadlockThreshold
OPTION
    (FORCE ORDER)
;

DELETE
    FIRED_TRIGGERS
FROM
    @tv_QRTZ_TRIGGERS AS STUCK_TRIGGERS
INNER JOIN
    dbo.QRTZ_FIRED_TRIGGERS AS FIRED_TRIGGERS
WITH
    (ROWLOCK, UPDLOCK)
ON
    STUCK_TRIGGERS.SCHED_NAME        = FIRED_TRIGGERS.SCHED_NAME    AND
    STUCK_TRIGGERS.TRIGGER_NAME      = FIRED_TRIGGERS.TRIGGER_NAME  AND
    STUCK_TRIGGERS.TRIGGER_GROUP     = FIRED_TRIGGERS.TRIGGER_GROUP
OPTION
    (FORCE ORDER)
;

UPDATE
    FIRED_TRIGGERS
SET
    TRIGGER_STATE = 'WAITING'
FROM
    @tv_QRTZ_TRIGGERS AS STUCK_TRIGGERS
INNER JOIN
    dbo.QRTZ_TRIGGERS AS FIRED_TRIGGERS
WITH 
    (ROWLOCK, UPDLOCK)
ON
    STUCK_TRIGGERS.SCHED_NAME        = FIRED_TRIGGERS.SCHED_NAME    AND
    STUCK_TRIGGERS.TRIGGER_NAME      = FIRED_TRIGGERS.TRIGGER_NAME  AND
    STUCK_TRIGGERS.TRIGGER_GROUP     = FIRED_TRIGGERS.TRIGGER_GROUP
WHERE
    FIRED_TRIGGERS.TRIGGER_STATE <> 'WAITING'
OPTION
    (FORCE ORDER)
;

COMMIT TRANSACTION

@lahma
Owner

First of all, I having this kind of automatic workaround might lead to more trouble. Not knowing the amount of triggers you run (nor how frequently this is ran) it could also create locks with big amount of triggers, you are doing joins without indexes anyway.

I'd suggest trying to run without this and see if the deadlocks still occur.

@IlyaGrebnov

Ok. Let me try to upgrade to 2.1, remove this hack and see that happens. We also recently move away from Azure Database to Dedicated SQL Sever running on Azure Virtual Machine. This could also helps. I should be back with update in week or two. Thanks for feedback.

@IlyaGrebnov

I remove this hack, but I still see this error. Here is another stack trace:

2013-01-07 22:57:48,698 [Role Start Thread] WARN Quartz.Simpl.SimpleInstanceIdGenerator - Host name 'RD00155D4656B0.reddog.microsoft.com' was too long, shortened to 'RD00155D4656B0.reddog.microsof'

2013-01-07 23:01:54,743 [default_QuartzSchedulerThread] ERROR Quartz.Impl.AdoJobStore.JobStoreTX - Couldn't rollback ADO.NET connection. Transaction not connected, or was disconnected

System.Data.DataException: Transaction not connected, or was disconnected
at Quartz.Impl.AdoJobStore.JobStoreSupport.RollbackConnection(ConnectionAndTransactionHolder cth) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 3251
2013-01-07 23:01:54,962 [default_QuartzSchedulerThread] ERROR Quartz.Core.ErrorLogger - An error occurred while scanning for the next trigger to fire.

Quartz.JobPersistenceException: Couldn't acquire next trigger: Transaction (Process ID 56) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. ---> System.Data.SqlClient.SqlException: Transaction (Process ID 56) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
at System.Data.SqlClient.SqlDataReader.Read()
at Quartz.Impl.AdoJobStore.StdAdoDelegate.SelectTriggerToAcquire(ConnectionAndTransactionHolder conn, DateTimeOffset noLaterThan, DateTimeOffset noEarlierThan, Int32 maxCount) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\StdAdoDelegate.cs:line 2099
at Quartz.Impl.AdoJobStore.JobStoreSupport.AcquireNextTrigger(ConnectionAndTransactionHolder conn, DateTimeOffset noLaterThan, Int32 maxCount, TimeSpan timeWindow) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 2383
--- End of inner exception stack trace ---
at Quartz.Impl.AdoJobStore.JobStoreSupport.AcquireNextTrigger(ConnectionAndTransactionHolder conn, DateTimeOffset noLaterThan, Int32 maxCount, TimeSpan timeWindow) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 2462
at Quartz.Impl.AdoJobStore.JobStoreSupport.<>c__DisplayClass65.<AcquireNextTriggers>b__62(ConnectionAndTransactionHolder conn) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 2354
at Quartz.Impl.AdoJobStore.JobStoreSupport.ExecuteInNonManagedTXLock(String lockName, Func
2 txCallback) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 3432
at Quartz.Impl.AdoJobStore.JobStoreSupport.AcquireNextTriggers(DateTimeOffset noLaterThan, Int32 maxCount, TimeSpan timeWindow) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 2352
at Quartz.Core.QuartzSchedulerThread.Run() in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Core\QuartzSchedulerThread.cs:line 277 [See nested exception: System.Data.SqlClient.SqlException (0x80131904): Transaction (Process ID 56) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
at System.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
at System.Data.SqlClient.SqlDataReader.Read()
at Quartz.Impl.AdoJobStore.StdAdoDelegate.SelectTriggerToAcquire(ConnectionAndTransactionHolder conn, DateTimeOffset noLaterThan, DateTimeOffset noEarlierThan, Int32 maxCount) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\StdAdoDelegate.cs:line 2099
at Quartz.Impl.AdoJobStore.JobStoreSupport.AcquireNextTrigger(ConnectionAndTransactionHolder conn, DateTimeOffset noLaterThan, Int32 maxCount, TimeSpan timeWindow) in c:\Work\OpenSource\Quartz.NET\quartznet\src\Quartz\Impl\AdoJobStore\JobStoreSupport.cs:line 2383
ClientConnectionId:fcc852a1-2e8a-4a19-b08c-d59c7922cfa0]

@grzpas

@lahma : I have a question. Is quartz capable of reconnecting to the database after a problem with connection? Are there any settings which allow to configure the reconnection count etc ? This seems to be quite important when using database based job store. Is there any simple way to add such a possibility if it does not exist?

@lahma
Owner

@grzpas release 2.2 has actually functionality ported from Java Quartz that tries to address this issue with better retry policy that you can see in JobStoreSupport's method RetryExecuteInNonManagedTXLock. The queries are tried again until they succeed.

It would nice to hear from users from this thread whether this has helped the situation.

@lloydcotten

Just an FYI, I commented on a similar issues in the discussion in issue #67. Last week we were able to upgrade to release 2.2 and put that into production. I am happy to report that since doing that upgrade we've not encountered a single job getting stuck in the blocked state, whereas previously we were getting multiple per day.

@lahma
Owner

@lloydcotten this is indeed happy news, thanks for reporting! I'm closing this issue for now.

@lahma lahma closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.