Jobs continue to run after service restart. #7139

Open
jvermillion-identifi opened this issue Jun 25, 2021 · 9 comments

@jvermillion-identifi

Describe the bug
Recently upgraded from 3.3.3 to 3.3.11.
When the Rundeck service is restarted after the web GUI stops responding, jobs that were running at the time still show as running after the restart. The work those jobs were doing has already completed, but Rundeck keeps the executions in a running state. In 3.3.3, Rundeck marked all running jobs as "Incomplete" when the service restarted.

The stuck "running" executions prevent future scheduled executions from starting and have to be killed manually, as they will not end on their own.

IRC chat support (MegaDrive) suggested this was a bug.

My Rundeck detail

Rundeck version: 3.3.11
install type: Deb
OS Name/version: Debian 4.9
DB Type/version: mssql

To Reproduce
Steps to reproduce the behavior:

  1. Have "running" jobs in Rundeck.
  2. Restart the rundeckd service from the shell (see the sketch below the steps).
  3. When WebUI comes back up there will be jobs "running" that are doing no work.
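For step 2, on a Deb install this is just the normal service restart. A minimal sketch (whether systemctl or service applies depends on the init system in use):

# Restart the Rundeck service; the Deb package installs it as "rundeckd".
sudo systemctl restart rundeckd    # or: sudo service rundeckd restart

# Confirm the service came back up before checking the web UI.
sudo systemctl status rundeckd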

Expected behavior
Previously, running jobs were marked "Incomplete" after a restart.

Screenshots
N/A

Desktop (please complete the following information):
N/A

Additional context
Attaching service.log from the time of the issue, approximately 3:05 PM EST on 06/24/2021.

service.log

@MegaDrive68k

Confirmed by following @jvermillion-identifi's steps.

On 3.3.3:

Screenshot_9

On 3.3.11 (the execution progress bar gets stuck):

Screenshot_10

service.log output:

[2021-06-25T14:14:30,175] ERROR quartzjobs.ExecutionJob - Unable to start Job execution: Job "HelloWorld" {{Job f91370b2-1b50-4152-88aa-db38e7930b7c}} is currently being executed {{Execution 10}}

Happening on 3.4.0:

Screenshot_11

Thanks for the feedback @jvermillion-identifi!

@NapoSky

NapoSky commented Jul 1, 2021

Got the same problem with certain jobs on 3.3.11 and 3.4.0.

In service.log I can see an "error parsing JSON" when trying to delete the execution, even though the job still appears to be running. There is no dead process on the instance.

(Edit)
I've found a crude workaround to clear a stuck execution and avoid piling up executions if your job isn't meant to run concurrently.
If you're using a MySQL database, you can check the execution state with a select on the "execution" table:
SELECT * FROM execution WHERE id = 'yourExecutionID'\G

Comparing an execution that has already succeeded with one that is stuck, you can see that date_completed is NULL and the status is "running".

*************************** 1. row ***************************
id: 309173
version: 1
scheduled_execution_id: 352
do_nodedispatch:
node_exclude_os_arch: NULL
node_keepgoing:
succeeded_node_list: NULL
retry_attempt: 0
node_include: NULL
retry_prev_id: NULL
success_on_empty_node_filter:
extra_metadata: NULL
node_exclude_os_version: NULL
timeout: 10m
node_exclude_precedence:
node_exclude_name: NULL
node_include_os_version: NULL
node_exclude_os_name: NULL
retry: NULL
filter: NULL
orchestrator_id: NULL
node_include_name: NULL
rduser: admin
retry_original_id: NULL
execution_type: scheduled
node_include_os_name: NULL
abortedby: NULL
filter_exclude: NULL
node_exclude: NULL
node_rank_order_ascending:
node_include_os_arch: NULL
loglevel: INFO
node_exclude_os_family: NULL
node_include_os_family: NULL
cancelled:
retry_delay: NULL
workflow_id: 309822
timed_out:
failed_node_list: NULL
arg_string: NULL
user_role_list: users
node_rank_attribute: NULL
date_completed: NULL
outputfilepath: /var/lib/rundeck/logs/rundeck/myProject/job/7fc98cd3-50af-47f6-9fb6-78c9b7947733/logs/309173.rdlog
server_nodeuuid: 1ec72185-acd2-41ad-a96a-0cb12a5ab7f1
will_retry:
retry_execution_id: NULL
node_exclude_tags: NULL
exclude_filter_uncheck:
node_threadcount: 1
node_include_tags: NULL
date_started: 2021-06-09 10:00:00.167000
status: running
node_filter_editable:
project: myProject

In this example, I made it work by updating those values with a later date and changing the status to "succeeded":

UPDATE execution SET status = 'succeeded' WHERE id = '309173';
UPDATE execution SET date_completed = '2021-06-09 10:05:00.167000' WHERE id = '309173';

I hope it works for you.
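For a broader cleanup, something along these lines should close out every stuck execution at once. This is only a sketch (untested; the "rundeck" database name and user are placeholders), and you should stop Rundeck or add a date_started cutoff first so you don't touch executions that really are still running:

# Connection details below are hypothetical; use your own.
mysql -u rundeck -p rundeck <<'SQL'
-- Close out every execution still flagged as running that never completed,
-- using the same two columns as the single-row statements above.
UPDATE execution
SET status = 'succeeded',
    date_completed = NOW()
WHERE status = 'running'
  AND date_completed IS NULL;
SQL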

@josemrs

josemrs commented Jul 27, 2021

This has happened to me too. Rundeck 3.3.12. It is really annoying.

In my environment I have several jobs scheduled to run very often, and the AWS instance where Rundeck runs (only the service; the DB is persisted in RDS) may be redeployed at will. Sometimes when the instance gets redeployed, lots of "running" jobs get stuck and start to pile up.

@WestXu

WestXu commented Oct 18, 2021

Rundeck 3.4.4 still experiences this. Don't know what to do with it.

@gpochiscan

Same behaviour on 3.4.1

@kefa8

kefa8 commented Mar 18, 2022

This seems like a fairly serious bug; why hasn't it been fixed yet? I've had to create my own workaround using the Rundeck CLI tool (see the sketch after the steps below).

  1. Get details of currently running jobs: rd executions list -p <project>
  2. Get the Rundeck start time using: rd system info
  3. Find executions whose start time is before the Rundeck start time and kill them: rd executions kill -e <exec_id>
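A rough script of the same idea (untested; the --outformat flag and the %id token are assumptions about the rd version in use, so adjust to whatever your rd executions list actually prints):

PROJECT=myProject   # hypothetical project name

# Note the server start time / uptime for reference.
rd system info

# Kill every execution still listed as running. This is only safe right after a
# restart, when nothing in the list can genuinely be running; otherwise compare
# each execution's start time against the server start time first, as in the
# steps above.
for id in $(rd executions list -p "$PROJECT" --outformat '%id'); do
  rd executions kill -e "$id"
done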

@stale

stale bot commented Apr 2, 2023

In an effort to focus on bugs and issues that impact currently supported versions of Rundeck, we have elected to notify GitHub issue creators if their issue is classified as stale and close the issue. An issue is identified as stale when there have been no new comments, responses or other activity within the last 12 months. If a closed issue is still present please feel free to open a new Issue against the current version and we will review it. If you are an enterprise customer, please contact your Rundeck Support to assist in your request.
Thank you, The Rundeck Team

@stale stale bot added the wontfix:stale label Apr 2, 2023
@ahamilto-nodal

We are still seeing this bug on Rundeck 4.12.1.

@stale stale bot removed the wontfix:stale label Jun 27, 2023
@siilike

siilike commented Dec 31, 2023

Still happening in 5.0.0.20231216-1.
