uniquejobs hash doesn't get cleaned up #195

Closed
xjlu opened this issue Sep 13, 2016 · 8 comments · Fixed by #200

@xjlu

xjlu commented Sep 13, 2016

The versions we are using in one service are:

gem 'sidekiq', '4.1.4'
gem 'sidekiq-unique-jobs', '4.0.18'

We use both until_executed and until_timeout. The uniquejobs:* keys are generated and cleaned up properly, but a copy of the mapping is left inside the uniquejobs hash, and that hash keeps growing. This problem started exactly at 4.1.2 (we tested various versions around that time), and it still seems to be a problem with the latest version, 4.1.8.

The problem here is that the hash size grows to such an extent that it eventually fills up the redis memory. Normal metrics such as the number of keys don't help debug this issue.

The internal structure of the hash is

["0", [["e820b3a5db2bfe94ea686c10", "uniquejobs:2782395ef5070394f7d5bbf7a6c8b934"], ["1d51c1ce36fa7efda2356a07", "uniquejobs:1b8d30e357006dbf484fbec15d5ffab9"], ["79dc6c691669d857d561875a", "uniquejobs:629965e9a1a162901f6b22918d5ebcbb"], ["90531aeb90f014cb00f17349", "uniquejobs:1b8d30e357006dbf484fbec15d5ffab9"], ["9e6710a97fdfb3f5438477e4", "uniquejobs:1b8d30e357006dbf484fbec15d5ffab9"],
...
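
A quick way to check whether you are affected is to look at how many entries the hash has accumulated. This is a minimal sketch (not part of the original report), using the same Sidekiq.redis helper as the scripts below:

require 'sidekiq'

# HLEN reports how many jid => uniquejobs:* mappings are sitting in the hash.
Sidekiq.redis do |conn|
  puts "uniquejobs hash entries: #{conn.hlen('uniquejobs')}"
end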
@tyrauber

tyrauber commented Nov 4, 2016

We are seeing the same problem:

gem 'sidekiq', '~> 4.2.3'
gem 'sidekiq-unique-jobs', '~> 4.0.13'

Periodically, we get a memory error with redis because of this:

Production Redis::CommandError: OOM command not allowed when used memory > 'maxmemory'.

We then have to run a rake task to clear these jobs:

require 'sidekiq'
Sidekiq.redis do |conn|
  puts 'Removing uniquejobs:* keys'
  conn.keys('uniquejobs:*').each do |key|
    conn.del(key)
  end

  puts 'Removing uniquejobs hash'
  conn.del('uniquejobs')
end
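
One caveat (not from the original comment): KEYS blocks Redis while it scans the whole keyspace, so on a large instance an incremental SCAN-based variant may be gentler. A sketch, assuming redis-rb's scan_each helper is available:

require 'sidekiq'

Sidekiq.redis do |conn|
  puts 'Removing uniquejobs:* keys'
  # scan_each walks the keyspace incrementally instead of blocking like KEYS
  conn.scan_each(match: 'uniquejobs:*') do |key|
    conn.del(key)
  end

  puts 'Removing uniquejobs hash'
  conn.del('uniquejobs')
end

Like the KEYS version, this still deletes locks that are live, so it is really only safe once the queue has drained.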

carlosmartinez added a commit to carlosmartinez/sidekiq-unique-jobs that referenced this issue Nov 10, 2016
@riyad

riyad commented Nov 14, 2016

We've come across the same problem.

I've tried to put together a more careful/verbose script (though still not foolproof). It does a few checks before actually deleting data: https://gist.github.com/riyad/9086d2b17ff1e8c091cdb1c7ac501b62

carlosmartinez added a commit to alphagov/sidekiq-unique-jobs that referenced this issue Nov 23, 2016
Relates to this issue in the original gem:
mhenrixon#195
@girak

girak commented Dec 6, 2016

@mhenrixon any idea how to fix this? We have been holding off on upgrading to versions > 4.0.11.

@agarcher

agarcher commented Apr 6, 2017

Any idea when this fix will be in a versioned release?

@agarcher

agarcher commented Apr 12, 2017

So great that the fix is released! 🎉

We have deployed it and confirmed that our memory footprint has stopped growing. Now we are turning our minds to clean-up. Has anyone else invested effort into a script that will look through Redis and clean out orphaned uniquejobs entries? With 5.0.0 the leak is fixed, but the already-leaked keys are not cleaned up.

@mhenrixon
Owner

I haven't looked into this yet. It would be totally awesome to have the possibility to clean up. Maybe I can find some time to do this over Easter.

It should be a matter of matching the hash's jids against the jids that are in Redis, but... there could be an Easter 🥚 hidden somewhere.
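
One possible shape for that clean-up, sketched here purely as an illustration (it checks whether the mapped uniquejobs:* key still exists rather than walking the queues for jids, and assumes redis-rb's hscan_each):

require 'sidekiq'

Sidekiq.redis do |conn|
  removed = 0
  # The hash maps jid => "uniquejobs:<digest>"; walk it incrementally.
  conn.hscan_each('uniquejobs') do |jid, unique_key|
    # EXISTS returns true/false in older redis-rb and an integer in newer ones.
    present = conn.exists(unique_key)
    next if present == true || present == 1

    # The lock key is gone, so this mapping is an orphan.
    conn.hdel('uniquejobs', jid)
    removed += 1
  end
  puts "Removed #{removed} orphaned uniquejobs entries"
end

Entries whose lock key still exists are left alone, which is what would distinguish this from the wipe-everything rake task above.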

@agarcher

There is #195 (comment), but won't that wipe out uniquejobs entries that are actually still relevant?

@agarcher

agarcher commented May 9, 2017

@mhenrixon Any update on a clean-up solution? Our Redis instance is sitting at a steady state of 600MB of bogus keys. I've been letting it sit there hoping I will "get something for free" to clean it out. Should there be a separate issue to track this?

h-lame added a commit to alphagov/publishing-api that referenced this issue Mar 13, 2018
This reverts #927.  Sidekiq-unique-jobs 5.x requires redis 3.x but our
infrastructure uses 2.8.  We also have to use the fork rather than a
released version of 4.x because the last release of 4.x doesn't include
a fix for mhenrixon/sidekiq-unique-jobs#195
which means the `uniquejobs` hash key in redis never gets smaller.

Although there is a fix for this in 5.x (see:
https://github.com/mhenrixon/sidekiq-unique-jobs/pulls/200 - this
commit is what is on our fork of 4.x) it may have been changed to rely
on expiry features of redis 3.x that are not available in redis 2.x.

On staging this key is currently 6.5M entries, and consumes ~500Mb.
On production it's only 2.5M entries and consumes ~200Mb.
We're running a (much simplified version of) this script:
https://gist.github.com/riyad/9086d2b17ff1e8c091cdb1c7ac501b62
in a screen session to remove any expired keys from this hash.
h-lame added a commit to alphagov/publishing-api that referenced this issue Mar 13, 2018
This reverts #927.  Sidekiq-unique-jobs 5.x requires redis 3.x but our
infrastructure uses 2.8.  We also have to use the fork rather than a
released version of 4.x because the last release of 4.x doesn't include
a fix for mhenrixon/sidekiq-unique-jobs#195
which means the `uniquejobs` hash key in redis never gets smaller.

Although there is a fix for this in 5.x (see:
https://github.com/mhenrixon/sidekiq-unique-jobs/pulls/200 - this
commit is what is on our fork of 4.x) it may have been changed to rely
on expiry features of redis 3.x that are not available in redis 2.x.

On staging this key is currently 6.5M entries, and consumes ~500Mb.
On production it's only 2.5M entries and consumes ~200Mb.
We tried running a (much simplified version of) this script:
https://gist.github.com/riyad/9086d2b17ff1e8c091cdb1c7ac501b62
in a screen session to remove any expired keys from this hash, but
unfortunately the rate of adding keys to the uniquejobs hash was
greater than the rate of removal.  Instead we waited until the queue
was drained and deleted the key.
h-lame added a commit to alphagov/link-checker-api that referenced this issue Mar 13, 2018
Sidekiq-unique-jobs 5.x requires redis 3.x but our infrastructure uses
2.8.  We also have to use the fork rather than a released version of 4.x
because the last release of 4.x doesn't include a fix for
mhenrixon/sidekiq-unique-jobs#195 which means the `uniquejobs` hash key
in redis never gets smaller.

Although there is a fix for this in 5.x (see:
https://github.com/mhenrixon/sidekiq-unique-jobs/pulls/200 - this
commit is what is on our fork of 4.x) it may have been changed to rely
on expiry features of redis 3.x that are not available in redis 2.x.

On staging this key is currently 2.5M entries, and consumes ~200Mb.
On production it's only 1.2M entries and consumes ~100Mb.
We tried running a (much simplified version of) this script:
https://gist.github.com/riyad/9086d2b17ff1e8c091cdb1c7ac501b62
in a screen session to remove any expired keys from this hash, but
unfortunately the rate of adding keys to the uniquejobs hash was
greater than the rate of removal.  Instead we waited until the queue
was drained and deleted the key.
h-lame added a commit to alphagov/link-checker-api that referenced this issue Mar 14, 2018