
Problem: Tasks stuck in Waiting #3007

@lug-gh

Issue

After upgrading from v2.13.12 to v2.14.8, my tasks stop starting after a short period of time. After restarting Semaphore it works again for a few minutes before it gets stuck again.
It's not specific to Ansible; it affects every task type.

Impact

Ansible (task execution)

Installation method

Docker

Database

Postgres

Browser

No response

Semaphore Version

v2.14.8

Ansible Version

Logs & errors

Here's an example.
Task 5847 was started and executed. It failed, but not because of Semaphore: the playbook failed because of the host.

After that, task 5848 was started, but the task log stayed blank and nothing happened. After 4 minutes, task 5848 was stopped manually with the stop button.

Then tasks 5849 and 5850 were triggered (in this case via the webhook integration, but it happens regardless of how a task is started, as task 5848 shows). These tasks are still in "waiting", even though no other task is running.

time="2025-05-07T10:44:08+02:00" level=info msg="Task 5847 added to queue"
time="2025-05-07T10:44:11+02:00" level=info msg="Set resource locker with TaskRunner 5847"
time="2025-05-07T10:44:11+02:00" level=info msg="Task 5847 removed from queue"
time="2025-05-07T10:45:02+02:00" level=info msg="Stopped running TaskRunner 5847"
time="2025-05-07T10:45:02+02:00" level=info msg="Release resource locker with TaskRunner 5847"
time="2025-05-07T10:46:04+02:00" level=error msg="websocket: close 1006 (abnormal closure): unexpected EOF" fields.level=Error
time="2025-05-07T10:46:04+02:00" level=error msg="write tcp 100.64.1.3:3000->100.64.1.4:49364: use of closed network connection" error="Cannot send close message"

time="2025-05-07T10:46:27+02:00" level=info msg="Task 5848 added to queue"
time="2025-05-07T10:46:31+02:00" level=info msg="Set resource locker with TaskRunner 5848"
time="2025-05-07T10:46:31+02:00" level=info msg="Task 5848 removed from queue"



time="2025-05-07T10:48:57+02:00" level=error msg="websocket: close sent" error="Cannot send close message"
time="2025-05-07T10:48:59+02:00" level=info msg="Receiving Integration from: 192.168.120.12"
time="2025-05-07T10:48:59+02:00" level=info msg="1 integrations found for alias *censored*"
time="2025-05-07T10:48:59+02:00" level=info msg="Running integration 2"
time="2025-05-07T10:49:02+02:00" level=info msg="Receiving Integration from: 192.168.120.12"
time="2025-05-07T10:49:02+02:00" level=info msg="1 integrations found for alias *censored*"
time="2025-05-07T10:49:02+02:00" level=info msg="Running integration 2"

time="2025-05-07T10:49:13+02:00" level=error msg="websocket: close sent" error="Cannot send close message"
time="2025-05-07T10:49:15+02:00" level=error msg="strconv.Atoi: parsing \"null\": invalid syntax" error="Bad request. Cannot get task_id from request"
2025/05/07 10:49:15 http: superfluous response.WriteHeader call from github.com/semaphoreui/semaphore/api/projects.GetTaskMiddleware.func1 (tasks.go:111)

time="2025-05-07T10:49:28+02:00" level=error msg="websocket: close sent" error="Cannot send close message"
time="2025-05-07T10:49:33+02:00" level=error msg="no rows in result set" error="Bad request. Cannot get task from database"

time="2025-05-07T10:49:59+02:00" level=error msg="websocket: close 1006 (abnormal closure): unexpected EOF" fields.level=Error
time="2025-05-07T10:49:59+02:00" level=error msg="write tcp 100.64.1.3:3000->100.64.1.4:55434: use of closed network connection" error="Cannot send close message"

time="2025-05-07T10:50:14+02:00" level=error msg="websocket: close 1006 (abnormal closure): unexpected EOF" fields.level=Error
time="2025-05-07T10:50:14+02:00" level=error msg="write tcp 100.64.1.3:3000->100.64.1.4:45434: use of closed network connection" error="Cannot send close message"
time="2025-05-07T10:50:18+02:00" level=error msg="websocket: close sent" error="Cannot send close message"

time="2025-05-07T10:50:25+02:00" level=info msg="Stopped running TaskRunner 5848"
time="2025-05-07T10:50:25+02:00" level=info msg="Release resource locker with TaskRunner 5848"
time="2025-05-07T10:51:07+02:00" level=error msg="no rows in result set" error="Bad request. Cannot get task from database"
time="2025-05-07T10:51:08+02:00" level=error msg="no rows in result set" error="Bad request. Cannot get task from database"
time="2025-05-07T10:51:12+02:00" level=error msg="no rows in result set" error="Bad request. Cannot get task from database"

time="2025-05-07T10:51:25+02:00" level=error msg="websocket: close 1006 (abnormal closure): unexpected EOF" fields.level=Error

time="2025-05-07T10:51:25+02:00" level=error msg="websocket: close 1006 (abnormal closure): unexpected EOF" fields.level=Error
time="2025-05-07T10:51:25+02:00" level=error msg="write tcp 100.64.1.3:3000->100.64.1.4:55082: use of closed network connection" error="Cannot send close message"
time="2025-05-07T10:52:04+02:00" level=error msg="websocket: close sent" error="Cannot send close message"


time="2025-05-07T10:52:27+02:00" level=error msg="websocket: close 1006 (abnormal closure): unexpected EOF" fields.level=Error
time="2025-05-07T10:52:27+02:00" level=error msg="write tcp 100.64.1.3:3000->100.64.1.4:36484: use of closed network connection" error="Cannot send close message"

So I ran docker compose down && docker compose up -d, but the tasks that were waiting are still waiting. If I stop them and then start them again manually, they are executed immediately without any problem.
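
For anyone comparing their own logs against the excerpt above, here is a minimal Python sketch that extracts the task lifecycle messages and reports how long each task held the resource locker. It is not part of Semaphore. The event labels (`queued`, `locked`, and so on) and the function names are my own, and the message patterns are copied from the log lines in this report, so they may differ in other versions.

```python
import re
from datetime import datetime

# Matches logrus-style text lines such as:
# time="2025-05-07T10:44:08+02:00" level=info msg="Task 5847 added to queue"
LINE_RE = re.compile(
    r'time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>(?:[^"\\]|\\.)*)"'
)

# Message patterns copied from the log excerpt; the labels are hypothetical names.
EVENTS = [
    ("queued", re.compile(r"^Task (\d+) added to queue$")),
    ("locked", re.compile(r"^Set resource locker with TaskRunner (\d+)$")),
    ("dequeued", re.compile(r"^Task (\d+) removed from queue$")),
    ("stopped", re.compile(r"^Stopped running TaskRunner (\d+)$")),
    ("released", re.compile(r"^Release resource locker with TaskRunner (\d+)$")),
]


def task_timeline(lines):
    """Return {task_id: {event_label: datetime}} for lifecycle messages found."""
    timeline = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a structured log line (e.g. the plain "http:" line)
        when = datetime.fromisoformat(m.group("time"))
        for label, pattern in EVENTS:
            hit = pattern.match(m.group("msg"))
            if hit:
                timeline.setdefault(int(hit.group(1)), {})[label] = when
                break
    return timeline


def lock_durations(timeline):
    """Seconds each task held the resource locker; None if it was never released."""
    durations = {}
    for task_id, events in timeline.items():
        if "locked" not in events:
            continue
        released = events.get("released")
        durations[task_id] = (
            (released - events["locked"]).total_seconds() if released else None
        )
    return durations


# A few lines taken verbatim from the excerpt above.
SAMPLE = [
    'time="2025-05-07T10:44:11+02:00" level=info msg="Set resource locker with TaskRunner 5847"',
    'time="2025-05-07T10:45:02+02:00" level=info msg="Release resource locker with TaskRunner 5847"',
    'time="2025-05-07T10:46:31+02:00" level=info msg="Set resource locker with TaskRunner 5848"',
    'time="2025-05-07T10:50:25+02:00" level=info msg="Release resource locker with TaskRunner 5848"',
]

if __name__ == "__main__":
    print(lock_durations(task_timeline(SAMPLE)))  # {5847: 51.0, 5848: 234.0}
```

Run against the full excerpt, this shows task 5848 holding the locker for 234 seconds until it was stopped manually. Tasks 5849 and 5850 do not show up in the timeline at all, because the excerpt contains no "added to queue" line for them.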

Manual installation - system information

No response

Configuration

No response

Additional information

No response
