Describe the bug
Updating the work chunk status on a failure for a batch2 job fails when using MS SQL Server, causing the job to remain in the IN_PROGRESS state. This prevents the job from transitioning to the FAILED state after the maximum number of retries is exhausted.
This happens on SQL Server; it works fine on H2 and Postgres.
To Reproduce
We will use a delete expunge operation (which is executed as a batch2 job) to reproduce the issue.
Create a bunch of Patient resources
Connect to your MS SQL Server using a tool such as DBeaver, and set the connection's Transaction Isolation Level to Repeatable Read, as shown in the screenshot.
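If you prefer to set the isolation level in SQL rather than through the DBeaver connection settings, the following statement (a sketch; run it in the same session, before the script below) should have the same effect:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;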
Execute the following script in DBeaver. It reads Patient resources in a transaction and deliberately does not commit or roll back. After executing the script, leave the DBeaver connection open (don't close or disconnect). Because the isolation level was set to Repeatable Read in the previous step, no other connection will be able to modify Patient resources until this transaction ends. This will make the delete expunge operation invoked next fail, because the job won't be able to delete the Patient resources.
BEGIN TRANSACTION;
SELECT * FROM HFJ_RESOURCE hr WHERE RES_TYPE = 'Patient';
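Once you have finished reproducing the issue, you can release the locks by ending the open transaction from the same DBeaver connection (cleanup sketch, not part of the repro itself):
ROLLBACK;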
Send a delete expunge request for Patients, e.g. DELETE http://{your-server-address}/Patient/?_expunge=true
The delete expunge job will get stuck in the IN_PROGRESS state, never transitioning to FAILED after the maximum number of retries is reached. The job will keep retrying the work in the background; you can see this in the logs, which will contain errors similar to the following for each retry:
ERROR M: R: o.h.e.jdbc.spi.SqlExceptionHelper - The query has timed out.
INFO M: R: c.uhn.fhir.log.batch_troubleshooting - Temporary problem executing job DELETE_EXPUNGE step expunge, marking chunk 0bce69c5-9844-4403-ad54-e9ba1b1f3079 as retriable ERRORED
WARN M: R: o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 245, SQLState: S0001
ERROR M: R: o.h.e.jdbc.spi.SqlExceptionHelper - Conversion failed when converting the varchar value 'Too many errors: ' to data type int.
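The last error suggests what goes wrong when the chunk is finally marked as failed with a message like 'Too many errors: <count>': on SQL Server, '+' performs numeric addition when either operand is an integer (int has higher data type precedence than varchar), so concatenating the varchar literal with an integer count forces an implicit cast of the literal to int and fails with error 245. A minimal illustration of the SQL Server behaviour (illustrative only, not the exact SQL that Hibernate generates for the work chunk update):
-- Fails with error 245: '+' is treated as numeric addition, and the varchar literal is implicitly cast to int.
SELECT 'Too many errors: ' + 3;
-- Succeeds: CONCAT converts its arguments to strings before concatenating.
SELECT CONCAT('Too many errors: ', 3);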
Expected behavior
The job should transition to the FAILED state after the maximum number of retries, and not get stuck in IN_PROGRESS.
Screenshots
Environment (please complete the following information):
Additional context
Related issue on Hibernate: https://hibernate.atlassian.net/jira/software/c/projects/HHH/issues/HHH-3627