Jobs continue to run after service restart. #7139

Open
jvermillion-identifi opened this issue Jun 25, 2021 · 9 comments

@jvermillion-identifi

Describe the bug
Recently upgraded from 3.3.3 to 3.3.11.
When the Rundeck service is restarted after the web GUI stops responding, jobs that were running at the time still show as running after the restart. The work those jobs were doing has already completed, but Rundeck keeps the executions in a running state. In 3.3.3, Rundeck marked all running jobs as "Incomplete" when the service restarted.

The stuck "running" executions prevent future scheduled executions from starting and have to be killed manually, as they will not end on their own.

IRC chat support (MegaDrive) suggested this was a bug.

My Rundeck detail

Rundeck version: 3.3.11
install type: Deb
OS Name/version: Debian 4.9
DB Type/version: mssql

To Reproduce
Steps to reproduce the behavior:

  1. Have "running" jobs in Rundeck.
  2. Restart the rundeckd service from the shell (see the sketch below the steps).
  3. When WebUI comes back up there will be jobs "running" that are doing no work.
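For step 2, on a Deb install this is just the normal service restart. A minimal sketch (whether systemctl or service applies depends on the init system in use):

# Restart the Rundeck service; the Deb package installs it as "rundeckd".
sudo systemctl restart rundeckd    # or: sudo service rundeckd restart

# Confirm the service came back up before checking the web UI.
sudo systemctl status rundeckd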

Expected behavior
Previously, running jobs were marked "Incomplete" after a restart.

Screenshots
N/A

Desktop (please complete the following information):
N/A

Additional context
Attaching service.log from the time of the issue, approximately 3:05 PM EST on 06/24/2021.

service.log

@MegaDrive68k

Confirmed by following @jvermillion-identifi's steps.

On 3.3.3:

Screenshot_9

On 3.3.11 (the execution progress bar gets stuck):

Screenshot_10

service.log output:

[2021-06-25T14:14:30,175] ERROR quartzjobs.ExecutionJob - Unable to start Job execution: Job "HelloWorld" {{Job f91370b2-1b50-4152-88aa-db38e7930b7c}} is currently being executed {{Execution 10}}

Happening on 3.4.0:

Screenshot_11

Thanks for the feedback @jvermillion-identifi!

@NapoSky

NapoSky commented Jul 1, 2021

Got the same problem with certain jobs on 3.3.11 and 3.4.0.

In service.log I can see an "error parsing JSON" when trying to delete the execution, even though the job still appears to be running. There is no dead process on the instance.

(Edit)
I've found a crude workaround to clear a stuck execution and avoid piling up executions if your job isn't meant to run concurrently.
If you're using a MySQL database, you can check the execution state with a select on the "execution" table:
SELECT * FROM execution WHERE id = 'yourExecutionID'\G

Comparing an execution that has already succeeded with one that is stuck, you can see that date_completed is NULL and the status is "running".

*************************** 1. row ***************************
id: 309173
version: 1
scheduled_execution_id: 352
do_nodedispatch:
node_exclude_os_arch: NULL
node_keepgoing:
succeeded_node_list: NULL
retry_attempt: 0
node_include: NULL
retry_prev_id: NULL
success_on_empty_node_filter:
extra_metadata: NULL
node_exclude_os_version: NULL
timeout: 10m
node_exclude_precedence:
node_exclude_name: NULL
node_include_os_version: NULL
node_exclude_os_name: NULL
retry: NULL
filter: NULL
orchestrator_id: NULL
node_include_name: NULL
rduser: admin
retry_original_id: NULL
execution_type: scheduled
node_include_os_name: NULL
abortedby: NULL
filter_exclude: NULL
node_exclude: NULL
node_rank_order_ascending:
node_include_os_arch: NULL
loglevel: INFO
node_exclude_os_family: NULL
node_include_os_family: NULL
cancelled:
retry_delay: NULL
workflow_id: 309822
timed_out:
failed_node_list: NULL
arg_string: NULL
user_role_list: users
node_rank_attribute: NULL
date_completed: NULL
outputfilepath: /var/lib/rundeck/logs/rundeck/myProject/job/7fc98cd3-50af-47f6-9fb6-78c9b7947733/logs/309173.rdlog
server_nodeuuid: 1ec72185-acd2-41ad-a96a-0cb12a5ab7f1
will_retry:
retry_execution_id: NULL
node_exclude_tags: NULL
exclude_filter_uncheck:
node_threadcount: 1
node_include_tags: NULL
date_started: 2021-06-09 10:00:00.167000
status: running
node_filter_editable:
project: myProject

In this example, I made it work by updating those values with a later date and changing the status to "succeeded":

UPDATE execution SET status = 'succeeded' WHERE id = '309173';
UPDATE execution SET date_completed = '2021-06-09 10:05:00.167000' WHERE id = '309173';

I hope it works for you.
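For a broader cleanup, something along these lines should close out every stuck execution at once. This is only a sketch (untested; the "rundeck" database name and user are placeholders), and you should stop Rundeck or add a date_started cutoff first so you don't touch executions that really are still running:

# Connection details below are hypothetical; use your own.
mysql -u rundeck -p rundeck <<'SQL'
-- Close out every execution still flagged as running that never completed,
-- using the same two columns as the single-row statements above.
UPDATE execution
SET status = 'succeeded',
    date_completed = NOW()
WHERE status = 'running'
  AND date_completed IS NULL;
SQL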

@josemrs

josemrs commented Jul 27, 2021

This has happened to me too. Rundeck 3.3.12. It is really annoying.

In my environment I have several jobs scheduled to run very often, and the AWS instance where Rundeck runs (only the service; the DB is persisted in RDS) may be redeployed at will. Sometimes when the instance gets redeployed, lots of "running" jobs get stuck and start to pile up.

@WestXu

WestXu commented Oct 18, 2021

Rundeck 3.4.4 still experiences this. Don't know what to do with it.

@gpochiscan

Same behaviour on 3.4.1

@kefa8

kefa8 commented Mar 18, 2022

This seems like a fairly serious bug; why hasn't it been fixed yet? I've had to create my own workaround using the Rundeck CLI tool (see the sketch after the steps below).

  1. Get details of currently running jobs: rd executions list -p <project>
  2. Get the Rundeck start time using: rd system info
  3. Find executions whose start time is before the Rundeck start time and kill them: rd executions kill -e <exec_id>
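A rough script of the same idea (untested; the --outformat flag and the %id token are assumptions about the rd version in use, so adjust to whatever your rd executions list actually prints):

PROJECT=myProject   # hypothetical project name

# Note the server start time / uptime for reference.
rd system info

# Kill every execution still listed as running. This is only safe right after a
# restart, when nothing in the list can genuinely be running; otherwise compare
# each execution's start time against the server start time first, as in the
# steps above.
for id in $(rd executions list -p "$PROJECT" --outformat '%id'); do
  rd executions kill -e "$id"
done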

@stale

stale bot commented Apr 2, 2023

In an effort to focus on bugs and issues that impact currently supported versions of Rundeck, we have elected to notify GitHub issue creators if their issue is classified as stale and close the issue. An issue is identified as stale when there have been no new comments, responses or other activity within the last 12 months. If a closed issue is still present please feel free to open a new Issue against the current version and we will review it. If you are an enterprise customer, please contact your Rundeck Support to assist in your request.
Thank you, The Rundeck Team

@stale stale bot added the wontfix:stale label Apr 2, 2023
@ahamilto-nodal

We are still seeing this bug on Rundeck 4.12.1.

@stale stale bot removed the wontfix:stale label Jun 27, 2023
@siilike

siilike commented Dec 31, 2023

Still happening in 5.0.0.20231216-1.
