Severe JobRunr Exception #337
Comments
Hi Dat, is it an option to try Postgres? I see this error mostly with MariaDB/MySQL and, to be honest, I can't find the root cause of what is going on. |
@datnguyen293 What is your MariaDB setup? Is it a High Availability configuration? |
Yes I have a semi-sync replication. |
Are there multiple instances / regions where JobRunr is connecting to these replicated MariaDB's? |
Hi Ronald, I have 2 instances of JobRunr connected to the same active master instance of MySQL. The standby master instance is read-only and currently acts as a backup server; it is not used by the apps. |
I had a similar situation in a PostgreSQL installation with a Pgpool-II cluster. jobrunr/core/src/main/java/org/jobrunr/server/JobZooKeeper.java Lines 186 to 192 in 545f972
I wonder whether the PostgreSQL instance used by JobZooKeeper may differ from the one used by some other BackgroundJobServer, and the data had not been synchronized yet. I have no idea how to deal with this issue inside JobRunr, and ended up setting up another non-HA database for JobRunr to rely on. |
@tan9 - thanks for helping us think through what the reason could be. I may not know enough about database replication, but I guess JobRunr just connects to the MariaDB/Postgres instance that is configured in the Datasource: the active master. Is your idea that these databases internally read from other instances and therefore do not have the latest data at hand because it is not synced yet? Are the concurrent modifications completely solved after setting up a non-HA database? |
Some history: I added this logic (everything related to concurrent job modifications) to make sure that users report errors as it was not easy to get the code right from the first time due to the concurrency (inside one JVM and even multiple if there are multiple background job servers). I also solved some issues thanks to these big reports in the past. Lately however, I'm not getting a lot of these reports though and in JobRunr 4.0.3 I added the option to set the However, the If @datnguyen293 is sure there are no concurrent job modification errors when using a non HA database (and thus I can rest assured there are no programming errors), the But I still don't understand the root cause as to why these exceptions pop up which frustrates me. |
Thanks @rdehuyss. I will have a look at the |
Hi @rdehuyss ,
Also, I don't think this is a timeout issue, as the database (MySQL) is currently idle without much load. I have tested with only 2 recurring jobs, which run every 5 and 10 minutes. |
Hi Dat, Can you try JobRunr Pro 4.0.2 for this latest bug report? Thanks, |
I will try today.
Thanks Ronald,
Dat
|
@rdehuyss @datnguyen293 I can confirm this issue was gone after we ran JobRunr against a single PostgreSQL instance.
I am not familiar with MySQL HA setups. But if the standby master is configured as a read replica and queries can be executed on that instance, outdated data may be fetched.
@datnguyen293 in what situation will the master be marked as down? |
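To make the stale-read hypothesis above concrete, here is a minimal in-memory sketch of a version check failing when one writer was served an outdated row. It is not JobRunr's actual code: `StaleReplicaSketch`, `JobRow`, `Primary` and `VersionConflict` are hypothetical names made up for illustration, and the "replica" is simply an old snapshot of the row. It also does not establish that this is the root cause of the reports in this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleReplicaSketch {

    /** Hypothetical stand-in for a stored job row: a state name plus a version counter. */
    static final class JobRow {
        final String state;
        final int version;

        JobRow(String state, int version) {
            this.state = state;
            this.version = version;
        }
    }

    /** Hypothetical exception, only loosely modelled on the message format in the report. */
    static final class VersionConflict extends RuntimeException {
        VersionConflict(int localVersion, int storageVersion) {
            super("Local version: " + localVersion + "; Storage version: " + storageVersion);
        }
    }

    /** In-memory "primary" that accepts a write only if the caller's version matches the stored one. */
    static final class Primary {
        private final Map<String, JobRow> rows = new HashMap<>();

        void insert(String id, String state) {
            rows.put(id, new JobRow(state, 1));
        }

        JobRow read(String id) {
            return rows.get(id);
        }

        JobRow update(String id, JobRow local, String newState) {
            JobRow stored = rows.get(id);
            if (stored.version != local.version) {
                throw new VersionConflict(local.version, stored.version);
            }
            JobRow next = new JobRow(newState, stored.version + 1);
            rows.put(id, next);
            return next;
        }
    }

    /** Runs the scenario and returns a short description of what happened. */
    public static String simulate() {
        Primary primary = new Primary();
        primary.insert("job-1", "ENQUEUED");

        // Assumption: a lagging replica still holds this snapshot while the primary moves on.
        JobRow replicaSnapshot = primary.read("job-1");

        // First writer reads from the primary and succeeds: version 1 becomes 2.
        primary.update("job-1", primary.read("job-1"), "PROCESSING");

        // Second writer was served the stale snapshot (version 1) but writes to the primary.
        try {
            primary.update("job-1", replicaSnapshot, "PROCESSING");
            return "no conflict";
        } catch (VersionConflict e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this sketch the second write is rejected because its local version is one behind the stored version, which is the same shape as the "Local version / Storage version" lines in the diagnostics. If every read really goes to the active master, as described earlier in the thread, this scenario would not apply.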
I would like to report a bug when upgrading JobRunr Pro from 4.0.1 to 4.0.2. It would need a migration, I guess.
|
Hi @datnguyen293, indeed, my mistake. I'm doing a little bit too much at the moment 😞. This should have become JobRunr Pro 5, as there is now support for scheduling jobs with an interval and it is not backwards compatible. I'll release JobRunr Pro 5 soon and keep you posted. Sorry for the inconvenience. |
Thanks Ronald, I think I can reset the job tables when upgrading to version 5. Thanks for your hard work, |
Are you trying to run JobRunr in an HA active-active environment? That won't work, because there is no single source of truth (the DB is replicated). |
That's exactly what I'm trying to check. I'm going to turn off one database instance (so we have only one DB node). But in my experience, that won't be the issue: the standby DB instance isn't used for queries. |
Hi @datnguyen293 , were you able to reproduce it with only one DB instance? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
SevereJobRunrException occurred in BackgroundJobServer 89d7c0b4-19ab-4da1-93bb-5e34e7629426: Could not resolve ConcurrentJobModificationException
Runtime information
Background Job Servers
(workerPoolSize: 4, pollIntervalInSeconds: 5, firstHeartbeat: 2022-01-21T05:40:38.121464Z, lastHeartbeat: 2022-01-21T05:41:33.343126Z)
Diagnostics from exception
Concurrent modified jobs:
Job id: eab73d9e-3c68-4249-852a-c82d14a098e8
Local version: 5; Storage version: 6
Local state: DELETED (at 2022-01-21T05:41:04.576913404Z) ← PROCESSING (at 2022-01-21T05:41:04.537562334Z on BackgroundJobServer 89d7c0b4-19ab-4da1-93bb-5e34e7629426) ← ENQUEUED (at 2022-01-21T05:36:05.000067844Z)
Storage state: DELETED (at 2022-01-21T05:41:01.534887099Z) ← PROCESSING (at 2022-01-21T05:40:59.403639452Z on BackgroundJobServer 89d7c0b4-19ab-4da1-93bb-5e34e7629426) ← ENQUEUED (at 2022-01-21T05:36:05.000067844Z)
Exception
SevereJobRunrException occurred in BackgroundJobServer 8ce1ecda-52c8-4034-9d41-8cc6a0a5a1b0: Could not resolve ConcurrentJobModificationException
Runtime information
Background Job Servers
(workerPoolSize: 4, pollIntervalInSeconds: 5, firstHeartbeat: 2022-01-22T03:46:05.448662Z, lastHeartbeat: 2022-01-22T03:48:19.568798Z)
Diagnostics from exception
Concurrent modified jobs:
Job id: 8f72af34-9efb-424c-9a51-9b8bdd12c25a
Local version: 12; Storage version: 13
Local state: SCHEDULED (at 2022-01-22T03:48:19.589220618Z) ← FAILED (at 2022-01-22T03:48:19.583438212Z) ← PROCESSING (at 2022-01-22T03:47:26.967061153Z on BackgroundJobServer 8ce1ecda-52c8-4034-9d41-8cc6a0a5a1b0)
Storage state: SUCCEEDED (at 2022-01-22T03:48:19.568197190Z) ← PROCESSING (at 2022-01-22T03:47:26.967061153Z on BackgroundJobServer 8ce1ecda-52c8-4034-9d41-8cc6a0a5a1b0) ← ENQUEUED (at 2022-01-22T03:47:25.748309585Z)
Exception