We recently noticed that a large number of our River jobs (around 5,000) are stuck in the running state. We have a global timeout of 1 hour and expected the rescuer to resolve this. Is that not the case? The stuck jobs are spread across different job queues.
River job stats
SELECT
    COUNT(*) AS stuck_count,
    COUNT(DISTINCT kind) AS distinct_kinds,
    COUNT(DISTINCT queue) AS distinct_queues,
    MIN(created_at) AS oldest_created_at,
    MAX(created_at) AS newest_created_at,
    NOW() - MIN(created_at) AS oldest_age,
    NOW() - MAX(created_at) AS newest_age,
    AVG(NOW() - created_at) AS avg_age,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY EXTRACT(EPOCH FROM NOW() - created_at)) * INTERVAL '1 second' AS median_age,
    MIN(attempt) AS min_attempts,
    MAX(attempt) AS max_attempts,
    AVG(attempt)::numeric(10,2) AS avg_attempts,
    COUNT(*) FILTER (WHERE attempted_at < NOW() - INTERVAL '1 hour') AS not_touched_in_1h,
    COUNT(*) FILTER (WHERE attempted_at < NOW() - INTERVAL '6 hours') AS not_touched_in_6h
FROM river_jobs.river_job
WHERE state = 'running'
  AND created_at < NOW() - INTERVAL '1 day';
[
{
"stuck_count": 5133,
"distinct_kinds": 42,
"distinct_queues": 7,
"oldest_created_at": "2026-02-24 22:08:09.618834+00",
"newest_created_at": "2026-05-25 19:10:35.474235+00",
"oldest_age": "91 days 00:29:38.338464",
"newest_age": "1 day 03:27:12.483063",
"avg_age": "40 days 29:02:53.066961",
"median_age": "1155:35:03.048939",
"min_attempts": 1,
"max_attempts": 4,
"avg_attempts": 1.08,
"not_touched_in_1h": 5133,
"not_touched_in_6h": 5133
}
]
River version
github.com/riverqueue/river v0.31.0
github.com/riverqueue/river/rivertype v0.31.0
Let me know if any other information would be helpful.