Slow evalsha causing timeouts #668

Closed
glennpjones opened this issue Dec 5, 2021 · 9 comments · Fixed by #835

@glennpjones

Hi Mikael, first of all, thank you so much for your work on this lib! We are very satisfied with it. I've tried my best below to be as precise as possible and to provide as much context as I can, as I believe the issue is related to sidekiq-unique-jobs, but I am aware I could be mistaken.

Describe the bug
Our workers are reporting a lot of Redis::TimeoutError connection timed out errors. On the server I can see Redis using 100% CPU (one core). The slowlog is filled with entries that I can, to some degree, trace back to sidekiq-unique-jobs through the delete_job_by_digest argument: see 1) 4) 11) in the entry below. The entries look like this:

1) (integer) 226683
   2) (integer) 1638716967
   3) (integer) 935819
   4)  1) "evalsha"
       2) "14ab482616f7d7df8b71783a1c3db6eb2103a2a6"
       3) "3"
       4) "queue:low"
       5) "schedule"
       6) "retry"
       7) "uniquejobs:3ee9129f6972ac426c645a6b1b4b845e"
       8) "1638716966.1943657"
       9) "false"
      10) "1000"
      11) "delete_job_by_digest"
      12) "5.0.7"
   5) "172.17.0.14:55328"
   6) ""

When the queues are small, this does not seem to be an issue, but when we increased from 180,000 to 500,000 enqueued jobs, the number of timeouts increased tremendously. We used to run 60 concurrent workers of this class without any issue. What stands out to me: we have not defined any unique-jobs-specific config in the worker, and the worker itself does nothing complex.

Expected behavior
No timeouts

Current behavior
±150 very slow evalsha calls

Worker class

class MyDearWorker
  include Sidekiq::Worker
  include Sidekiq::Throttled::Worker

  sidekiq_options(
    queue: :low,
    retry: 8
  )

  # sidekiq_throttle config

  def perform
    # do something
  end
end

Additional context

SidekiqUniqueJobs was updated today from 7.0.8 to 7.1.12 to no avail
Sidekiq 6.3.1

redis-cli info returns

# Server
redis_version:5.0.7
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:5f6bfe2b13cc4617
redis_mode:standalone
os:Linux 4.15.0-74-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:8.3.0
process_id:1
run_id:1afc9ece9143d52e06cc7dcd1e2ea967779735e2
tcp_port:6379
uptime_in_seconds:2744770
uptime_in_days:31
hz:10
configured_hz:10
lru_clock:11328868
executable:/data/redis-server
config_file:/usr/local/etc/redis/redis.conf

# Clients
connected_clients:79
client_recent_max_input_buffer:2
client_recent_max_output_buffer:0
blocked_clients:64

# Memory
used_memory:349476912
used_memory_human:333.29M
used_memory_rss:360284160
used_memory_rss_human:343.59M
used_memory_peak:367916168
used_memory_peak_human:350.87M
used_memory_peak_perc:94.99%
used_memory_overhead:3616774
used_memory_startup:791504
used_memory_dataset:345860138
used_memory_dataset_perc:99.19%
allocator_allocated:349562064
allocator_active:350646272
allocator_resident:361467904
total_system_memory:67543748608
total_system_memory_human:62.91G
used_memory_lua:103424
used_memory_lua_human:101.00K
used_memory_scripts:35744
used_memory_scripts_human:34.91K
number_of_cached_scripts:12
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:1084208
allocator_rss_ratio:1.03
allocator_rss_bytes:10821632
rss_overhead_ratio:1.00
rss_overhead_bytes:-1183744
mem_fragmentation_ratio:1.03
mem_fragmentation_bytes:10767344
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:2647718
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:863
rdb_bgsave_in_progress:0
rdb_last_save_time:1638718611
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:2232320
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0

# Stats
total_connections_received:221466
total_commands_processed:1636072405
instantaneous_ops_per_sec:72
total_net_input_bytes:93096542534
total_net_output_bytes:53079207633
instantaneous_input_kbps:19.85
instantaneous_output_kbps:7.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:47100
expired_stale_perc:0.14
expired_time_cap_reached_count:0
evicted_keys:0
keyspace_hits:1508620767
keyspace_misses:2491703
pubsub_channels:1
pubsub_patterns:0
latest_fork_usec:10812
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0

# Replication
role:master
connected_slaves:0
master_replid:18bafaa2c497072bb991b4bdade7721b38155c46
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:3475.941359
used_cpu_user:134248.522011
used_cpu_sys_children:67.156182
used_cpu_user_children:596.732377

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=1507,expires=1349,avg_ttl=127866632386

Does this ring any bells? Thanks a lot in advance, if I can provide any more info please let me know

@mhenrixon
Owner

mhenrixon commented Dec 6, 2021

At first glance, there is nothing there that points to sidekiq-unique-jobs. I'd go as far as to suggest you look into monitoring Redis.

Keep in mind that with 60 workers, you'll be hitting Redis pretty hard.

If you also have Rails or something else hitting Redis from the other side, you could be running out of connections.

If you are on something like Heroku, you likely need to increase your Redis timeouts.

There is also the whole topic of connection pools (Sidekiq usually does a good job, but you might still want to look into it).

As for timeouts, I believe I saw something like 8 seconds recommended for Heroku, but I could be mistaken.
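
For illustration, a minimal sketch of raising the client-side Redis timeouts in Sidekiq; the specific values and the initializer path are assumptions, and you should check which options your Sidekiq version supports:

# config/initializers/sidekiq.rb
# Sketch: raise the network timeout so brief Redis stalls don't immediately
# surface as Redis::TimeoutError. Values are illustrative, not recommendations.
redis_options = {
  url: ENV.fetch("REDIS_URL", "redis://localhost:6379/0"),
  network_timeout: 5, # seconds the client waits on a single Redis command
  pool_timeout: 5     # seconds a thread waits to check out a connection
}

Sidekiq.configure_server { |config| config.redis = redis_options }
Sidekiq.configure_client { |config| config.redis = redis_options }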

@mhenrixon
Owner

blocked_clients:64 looks a little suspicious if you ask me.

@glennpjones
Author

glennpjones commented Dec 13, 2021

Figured it out!

First of all thanks for your reply.

I've continued monitoring and digging, and I kept being pointed in the direction of that Lua script being called through evalsha.

This particular queue receives two types of workers: a worker that fetches some data (without any uniqueness config), and a cleanup worker, which had a uniqueness constraint. Every "fetching" worker would enqueue said cleanup worker some time in the future. But that cleanup worker had the following config:

    lock:        :until_executed,
    on_conflict: :replace

As the queue was being processed at between 50 and 100 jobs a second, Redis clogged up, which I think makes sense, considering that https://github.com/mhenrixon/sidekiq-unique-jobs/blob/f09d41b818286ed5549ac90ccc2ea386be9b5ca4/lib/sidekiq_unique_jobs/lua/shared/_delete_from_queue.lua has to iterate over (part of) 500,000+ jobs in steps of 50.
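
To illustrate the cost, here is a rough Ruby approximation (an assumption, not the gem's actual code) of that kind of scan: walk the queue list in slices of 50 and string-match each serialized job for the digest, which is linear in the queue size.

# Sketch of a batched linear scan over a Sidekiq queue list (e.g. "queue:low"),
# assuming redis-rb; with 500k+ entries the worst case fetches every slice.
require "redis"

def find_job_by_digest(redis, queue, digest, per: 50)
  offset = 0
  loop do
    slice = redis.lrange(queue, offset, offset + per - 1)
    return nil if slice.empty?

    hit = slice.find { |payload| payload.include?(digest) }
    return hit if hit

    offset += per
  end
end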

Moving the cleanup job to a short queue completely fixed it.

Is this expected behavior?

@ezekg
Contributor

ezekg commented Feb 6, 2024

I was bitten pretty hard by this. I had a couple of workers that used the on_conflict: :replace strategy, and the schedule queue had 50k-1M of these jobs, depending on the day. I didn't realize this gem was scanning the entire schedule sorted set and doing a string find on each job payload. Suffice it to say, this was incredibly slow and ended up blocking Redis, resulting in severe downtime (not blaming, just sharing; the downtime is entirely on me, and there is no better way I can think of to search the schedule sorted set). To resolve the issue, I moved away from scheduling unique jobs, and our Redis problems disappeared after clearing the unique jobs out of Sidekiq's schedule queue. 🙂

Do you think a separate set should be used for unique scheduled jobs, maybe keyed on unique ID? Something with constant-time deletes would be super beneficial for the :replace strategy. But then we venture out of Sidekiq's schedule set…

@mhenrixon
Owner

Uff! Sorry to hear that @ezekg. I just saw your status update on Twitter/X.

I wasn't aware you were using the gem as well. I've got some heavy hitters using the gem.

At that scale, the scheduled queue would be a problem.

It is late right now, and I have been working on recursion and converting MongoDB to PostgreSQL, but I will look at this first thing tomorrow morning.

I could emit a warning when rescheduling the job in case the queue size goes above a certain number.

That said, monitoring the Sidekiq latency and queue size is likely a great idea.

If you want to jump on a call and discuss our options, I'm always down.

I have been working alone for so long, I wouldn't mind some company.

@mhenrixon mhenrixon reopened this Feb 6, 2024
@ezekg
Contributor

ezekg commented Feb 6, 2024

Not your fault. We typically had ~20k scheduled jobs in there, but as a new feature was adopted, the queue size started to grow. The straw that broke the camel's back was a new customer adopting the feature for a large user base, which ballooned the queue from ~20k to 250k in a few hours, and it kept growing and growing as they added more users (and it didn't help that they scheduled their jobs out days and weeks, not hours like the rest of our users).

As it grew, our performance dropped and Redis read/write/connect timeouts started to happen, but I couldn't figure out why. We've handled that load before without a hitch, and our Redis instance is beefy, but this time, something was different.

The difference was the workload. Nobody else pushed that many jobs to the schedule queue, and so each time a job was pushed, Redis would block while it iterated the schedule queue trying to find any unique jobs to replace on conflict.

Don't feel bad. It's absolutely not your fault, and this gem has been monumental in scaling Keygen (which is why we sponsor it!). I just wanted to share, so that we could think of a solution that could scale for the next users. 🙂

I knew my heavy usage of scheduled jobs was going to bite me eventually, and oh did it.

@mhenrixon
Owner

My idea was to move this part to Ruby to avoid doing such a long-running task in Lua.

Or perhaps check the latency or the size of the queues before calling the Lua script, and either fail with a helpful error or fall back to a Ruby script.

I initially thought the Lua script ensures the job isn't picked up by Sidekiq while I am trying to replace it.

Maybe that's an edge case, and it might be possible to solve with a reliable fetcher.

@ezekg
Contributor

ezekg commented Feb 8, 2024

Yeah, I guess moving to Ruby could potentially be better, since it'd be non-blocking compared to a single Lua script. But it'd still be iterating over potentially millions of scheduled jobs, and I think that's where the inefficiency lies. Iterating a queue that big just hammers Redis if it's done too often, and in my case it was being done tens of times per second. And the job being searched for was likely towards the end of the queue, since these jobs were scheduled far out.

I'm not sure what else you could do, though: you can't apply any kind of search algorithm, because the data is essentially unordered with respect to the digest, and finding the job requires a string search of each payload for the unique job digest. If the schedule queue were a different data structure (it's a sorted set right now, IIRC), e.g. a hash keyed on jid, it would make O(1) lookups (and deletions/replacements) possible by keeping an internal mapping of unique job digest to jid, but right now, that's not how Sidekiq is built.

The only other solution I can think of would be to keep a separate queue for unique jobs, with a better data structure, and maybe tell Sidekiq to pull from that queue as well when looking for scheduled jobs.

@ezekg
Contributor

ezekg commented Feb 8, 2024

Thinking some more: if we kept an internal mapping of unique job digest to job payload, we could do a fast ZREM. But that seems like a big storage overhead, keeping the same job payload in two separate places. It could also be brittle if Sidekiq or somebody else modifies the job payload at all (which I'm sure it does, e.g. when setting the retry count), which would break the ZREM lookup if the copies weren't kept in sync. So let's scratch that off as too brittle.

But if we kept a mapping of digest to the score (timestamp) of the job in Sidekiq's schedule sorted set, we could look at only a small slice of the queue, instead of the whole thing, using ZRANGE with BYSCORE. We would need to handle reschedules to keep the score in sync, but that seems doable to me.

Here's some pseudo-code to show what I'm thinking:

local function delete_from_sorted_set(name, digest)
  local per   = 50
  -- Look up the score (scheduled timestamp) previously recorded for this digest.
  local score = tonumber(redis.call("hget", "digest-scores", digest))
  if not score then return nil end

  -- Only scan the slice of the sorted set around that score, not the whole set.
  local items = redis.call("zrange", name, score, score + per - 1, "byscore")
  local result

  for _, item in pairs(items) do
    if string.find(item, digest) then
      redis.call("zrem", name, item)
      result = item
      break
    end
  end

  return result
end

May need to be adjusted to loop if more than 50 jobs are scheduled at the exact same time (score).

What do you think? Using the score would at least reduce the search space.
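
For the write side (keeping the score in sync), here is a hypothetical sketch, not part of the gem: record the digest-to-score mapping whenever a unique job is (re)scheduled. The "digest-scores" key comes from the pseudo-code above; the method names and arguments are assumptions.

# Hypothetical client-side helpers so the Lua function above can narrow its
# ZRANGE ... BYSCORE scan; assumes redis-rb.
require "redis"

def record_digest_score(redis, digest, scheduled_at)
  # Sidekiq's schedule set is scored by the job's epoch timestamp.
  redis.hset("digest-scores", digest, scheduled_at.to_f)
end

def clear_digest_score(redis, digest)
  # Drop the mapping once the job has been deleted or has run.
  redis.hdel("digest-scores", digest)
end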
