
Handle "LOADING Redis is loading the dataset in memory" #358

Open
shaharmor opened this issue Aug 15, 2016 · 20 comments
@shaharmor
Collaborator

Hi,

When a slave first connects to a master it needs to load the entire DB, which takes time.
Any command sent to that slave during this time will receive a LOADING Redis is loading the dataset in memory response.

I think we should handle this and retry the command (maybe even on a different node within the same slot).
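For illustration, a minimal application-level sketch of that kind of retry (the key name, delay, and attempt count are made up; this is a workaround at the caller, not a change inside ioredis):

const Redis = require('ioredis');

const cluster = new Redis.Cluster([{ host: '127.0.0.1', port: 6379 }]);

// Retry a read a few times if the target node is still loading its dataset.
async function getWithLoadingRetry(key, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await cluster.get(key);
    } catch (err) {
      if (!/^LOADING/.test(err.message) || i === attempts - 1) throw err;
      await new Promise((resolve) => setTimeout(resolve, 200)); // brief back-off, then retry
    }
  }
}

The proposal here would move that retry inside ioredis itself, ideally directing it at another node serving the same slot.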

@luin thoughts?

@shaharmor
Collaborator Author

It's possible that during a failover to a slave, the old master will sync from the new master and return this error, which makes the whole failover mechanism not so failsafe.

@luin
Collaborator

luin commented Aug 15, 2016

ioredis already supports detecting loading in standalone mode: https://github.com/luin/ioredis/blob/master/lib/redis.js#L420-L428. Seems we just need to wait for the "ready" event of the new redis node here: https://github.com/luin/ioredis/blob/master/lib/cluster/connection_pool.js#L58-L63
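For reference, a minimal standalone sketch of that detection (host and port are placeholders): with enableReadyCheck left at its default, the client keeps re-running the INFO-based check and only emits "ready" once the server no longer reports that it is loading.

const Redis = require('ioredis');
const redis = new Redis({ host: '127.0.0.1', port: 6379 }); // enableReadyCheck defaults to true

redis.on('connect', () => console.log('TCP connection established'));
// "ready" fires only after the ready check passes, i.e. the node has
// finished loading its dataset into memory.
redis.on('ready', () => console.log('node finished loading; safe to send commands'));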

@shaharmor shaharmor added bug and removed question labels Aug 15, 2016
@shaharmor
Collaborator Author

@luin something like this?

redis = new Redis(_.defaults({
  retryStrategy: null,
  readOnly: readOnly
}, node, this.redisOptions, { lazyConnect: true }));

var _this = this;
// Register the node in the pool only after the ready check has passed,
// i.e. once the node has finished loading its dataset.
redis._readyCheck(function (err) {
  // TODO: handle error
  _this.nodes.all[node.key] = redis;
  _this.nodes[readOnly ? 'slave' : 'master'][node.key] = redis;

  redis.once('end', function () {
    delete _this.nodes.all[node.key];
    delete _this.nodes.master[node.key];
    delete _this.nodes.slave[node.key];
    _this.emit('-node', redis);
    if (!Object.keys(_this.nodes.all).length) {
      _this.emit('drain');
    }
  });

  _this.emit('+node', redis);

  redis.on('error', function (error) {
    _this.emit('nodeError', error);
  });
});

Also, how should we handle an error in the _readyCheck function?

@luin
Collaborator

luin commented Sep 9, 2016

Hmm...I just checked the code, and it seems that when a node has not finished loading data from disk, the commands sent to it will be added to its offline queue instead of being sent to Redis immediately.
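For illustration, a minimal standalone sketch of that behaviour (the key name is made up): with enableOfflineQueue left at its default of true, a command issued before "ready" sits in the offline queue and is only sent once the node is ready.

const Redis = require('ioredis');
// enableOfflineQueue defaults to true; shown explicitly for clarity.
const redis = new Redis({ host: '127.0.0.1', port: 6379, enableOfflineQueue: true });

// Issued right after construction, while the node may still be loading:
// the command is buffered in the offline queue and flushed on "ready".
redis.get('some-key', function (err, value) {
  if (err) return console.error(err);
  console.log(value);
});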

@shaharmor
Collaborator Author

So that means this should already be fixed? I've seen this happen in production, so it's definitely an issue.

Could it be that it only happens to slaves, or when using scaleReads?

@shaharmor
Collaborator Author

It's also possible that it happens if the slave was connected at some point but then got restarted for some reason.

@luin
Collaborator

luin commented Sep 9, 2016

That's strange. Whether the node is a slave or a master doesn't affect the offline queue support. Are you able to reproduce the issue? Or maybe enable the debug log?
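(For reference, ioredis logs through the debug module, so the log can usually be enabled by starting the app with the DEBUG environment variable set; app.js below is a placeholder for your entry point.)

DEBUG=ioredis:* node app.js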

@kishorpawar

I found this issue when I did the following:

  1. I accidentally ran FLUSHALL in redis-cli (I had been trying to press Ctrl-D).
  2. Without stopping redis-server, I copied a backed-up RDB file over dump.rdb and restarted redis-server. It turned out the copy did not actually take effect.
  3. I stopped redis-server, then copied the backed-up RDB file over dump.rdb and started redis-server. This time the copy worked.
  4. Started redis-cli.
  5. Ran KEYS * and got (error) LOADING Redis is loading the dataset in memory (a quick way to confirm this state is shown below).
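For anyone hitting step 5: INFO is one of the commands Redis still answers while loading, and its persistence section keeps reporting loading:1 until the RDB file has been read back into memory, so something like the following can confirm the state (default port assumed):

redis-cli -p 6379 INFO persistence | grep ^loading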

@kaidiren

@shaharmor So how did you deal with it in the end?

@stale

stale bot commented Oct 23, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot added the wontfix label Oct 23, 2017
@stale stale bot closed this as completed Oct 30, 2017
@shaharmor
Collaborator Author

Hey @luin, I just encountered this issue again, and I think we should see how we can fix it.

@shaharmor shaharmor reopened this Feb 13, 2018
@stale stale bot removed the wontfix label Feb 13, 2018
@stale

stale bot commented Mar 15, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot added the wontfix label Mar 15, 2018
@stale stale bot closed this as completed Mar 22, 2018
@Eywek

Eywek commented Apr 9, 2019

Hello,

Any news on this? I got the same error on ioredis v4.0.10

@alavers
Contributor

alavers commented Apr 5, 2020

@Eywek @shaharmor Do you have any more details on how you reproduce this issue?

Is it possible you're connected to a slave that has begun a resync, e.g. if the master it was pointing to performed a failover? A Redis slave returns -LOADING errors during a resync, which might explain how you encounter them without a connection reset.

What happens if you implement a reconnectOnError that returns 2 when a LOADING error is encountered?

@xiandong79

Any update?

@alavers
Contributor

alavers commented Apr 15, 2020

I have a hypothesis that an error handler like this:

    reconnectOnError: function(err) {
      if (err.message.includes("LOADING")) {
        return 2;
      }
    }

might solve this problem (returning 2 tells ioredis to reconnect and then resend the failed command) and, if so, should perhaps be made default ioredis behavior. But I haven't built a repeatable way to reproduce this issue.

@bartpeeters

We were able to reproduce this issue by setting up an AWS ElastiCache cluster with the following config:

  • 3 shards, 1 replica per shard
  • Engine: Clustered Redis
  • Engine Version Compatibility: 3.2.10
  • Auto-failover: enabled

We filled this cluster with about 700 MB of data.
Then we set up an ioredis application that continuously sent redis.get calls, all with keys belonging to hash slots of one of our shards.
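For reference, a rough sketch of that kind of load generator (it reuses the cluster config shown further down; the key pattern mirrors the theKey names seen in the errors, and the real test additionally restricted keys to one shard's hash slots):

const Redis = require('ioredis');

const cluster = new Redis.Cluster(
  [{ host: 'bart-test.rmoljo.clustercfg.euw1.cache.amazonaws.com', port: 6379 }],
  { enableReadyCheck: true, scaleReads: 'slave' }
);

// Continuously read keys; LOADING replies surface in the catch handler.
setInterval(() => {
  const key = 'theKey' + Math.floor(Math.random() * 100000);
  cluster.get(key).catch((err) => {
    console.log(`got error during get key ${key}, error: ${err}`);
  });
}, 10);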

We deleted the replica node in the chosen shard; no gets failed.

But when we added a node back to this shard, we got multiple errors like:

got error during get key theKey93923, error: ReplyError: LOADING Redis is loading the dataset in memory

We used the following config for ioredis:

const Redis = require('ioredis');

const redis = new Redis.Cluster(
  [
    {
      host: 'bart-test.rmoljo.clustercfg.euw1.cache.amazonaws.com',
      port: 6379,
    },
  ],
  {
    enableReadyCheck: true,
    scaleReads: 'slave',
  }
);

Using @alavers's snippet did indeed solve the issue:

const redis = new Redis.Cluster(
  [
    {
      host: 'bart-test.rmoljo.clustercfg.euw1.cache.amazonaws.com',
      port: 6379,
    },
  ],
  {
    enableReadyCheck: true,
    scaleReads: 'slave',
    redisOptions: {
      reconnectOnError: function (err) {
        if (err.message.includes("LOADING")) {
          console.log('got one of dem loading ones');
          return 2;
        }
      },
    },
  }
);

We see the log message

got one of dem loading ones

and not a single error.

Note that we were only able to reproduce it if we used the option scaleReads: 'slave'.

We also tried the exact same scenario with a Redis Cluster on our dev PC and were unable to reproduce it that way.
ioredis kept sending requests to the master while the new replica node was LOADING the Redis dataset into memory.
No idea why the behaviour differs between an ElastiCache and a non-ElastiCache Redis Cluster.

@bartpeeters

Should we make @alavers's error handler:

    reconnectOnError: function(err) {
      if (err.message.includes("LOADING")) {
        return 2;
      }
    }

the default ioredis behaviour, given that we were able to reproduce the issue (see the comment above)?

If yes, we could open a PR for this.

@michel-el-hajj

Sometimes this means you simply have too much data in Redis, and on a restart Redis has to load all of that data back into memory. This leads to a long loading phase during which every query is blocked. If the data isn't important, delete the saved data on the server and restart Redis again. FLUSHALL won't help, because it just queues up behind the loading; you need to delete the data files directly.

@hktalent

@shaharmor On macOS (Homebrew-installed Redis):

# WARNING: permanently deletes the on-disk dataset (RDB/AOF files)
rm -rf /usr/local/var/db/redis/*
# restart the Homebrew Redis service
brew services restart redis
# flush database 4 once the server is back up
redis-cli -n 4 FLUSHDB
