
Subscriber receives all cached messages upon initial connection on Redis-backed channel regardless of buffer length #445

Closed
danjbh opened this issue Feb 26, 2018 · 16 comments

Comments

danjbh commented Feb 26, 2018

I'm currently testing nchan w/ a Redis back-end and noticed that my websocket clients will receive all messages cached by the app instead of the number I have set in nchan_message_buffer_length (with nchan_subscriber_first_message set to oldest).

Basically, I need clients who request a specific location to get just the 5 messages in the channel buffer. I'm not sure if it makes a difference, but I am running this inside of Docker (nchan version 1.1.14). If I use the backup storage mode, this problem goes away, but then I can't scale this app horizontally.
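
For reference, switching to backup mode presumably amounts to something like the following (a sketch; the nchan_redis_storage_mode directive and its "backup"/"distributed" values are taken from the nchan README, so verify against the version in use):

    nchan_use_redis on;
    nchan_redis_url "redis://redis:6379";
    # "backup" keeps the channel buffer in local nginx memory and uses Redis only as a
    # backup store, which avoids the symptom described here but, as noted above, prevents
    # sharing messages across multiple nginx instances ("distributed" is the default mode)
    nchan_redis_storage_mode backup;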

Also, I noticed that if I run a flushall on Redis after hitting the buffer length, it seems to un-stick the messages somehow (perhaps that's a clue).

Let me know if there's anything I can do to help track this one down. This is a great piece of software, and I'd be willing to send over a healthy donation if we could get this fixed and into production!

Here is my config and steps to reproduce...

server {
    nchan_use_redis on;
    nchan_redis_url "redis://redis:6379";
    nchan_message_buffer_length 5;
    nchan_message_timeout 0;

    listen       80;
    server_name  localhost;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }

    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }

    location = /oldest {
      nchan_subscriber websocket;
      nchan_channel_id "foo";
      nchan_subscriber_first_message oldest;
    }

    location = /pub {
      nchan_publisher;
      nchan_channel_id "foo";
    }
}

Steps to reproduce

  1. Use provided config above
  2. Write 5 messages to the foo channel using the pub endpoint (via curl or preferred method)
  3. Subscribe to the channel via a WebSocket connection so that it gets cached
  4. Write 5 additional messages to the foo channel
  5. Subscribe to the channel via WebSocket again and you get all 10 cached messages from steps 2 and 4, instead of only the 5 allowed by nchan_message_buffer_length

danjbh commented Feb 26, 2018

Also, I do realize that I could just set nchan_subscriber_first_message to -5 and call it a day; however, I'm actually looking to set the buffer length to 100, which is beyond the maximum allowed value of -32 for that setting.

I've essentially reduced the example to use a buffer length of 5 to help simplify and speed up the steps to reproduce.
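
For anyone else hitting this, that stopgap would look roughly like the following in the subscriber location (a sketch based on the config from the original report; a negative nchan_subscriber_first_message delivers that many of the newest buffered messages):

    location = /oldest {
      nchan_subscriber websocket;
      nchan_channel_id "foo";
      # deliver only the 5 newest buffered messages to each new subscriber;
      # the value is capped at -32, so this does not cover a buffer length of 100
      nchan_subscriber_first_message -5;
    }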

@concreted

I'm also running into this issue. I was able to alleviate it for a while with nchan_subscriber_first_message -<num_messages>, but am still seeing it occasionally. From what I've seen, it happens more often on resource-constrained deployments: on AWS, I almost never see it with 4 t2.small nodes, but see it fairly consistently with 1 t2.small.

@concreted

Restarting the Nchan servers also clears out the old messages. So it seems like they are cached to memory and getting 'stuck'.

danjbh commented Mar 23, 2018

Any thoughts on this or anything we can do to help move it along?

slact (Owner) commented Mar 24, 2018

I already have a pretty good idea of what's happening here; the fix is just a bit tricky. If you're interested, I wouldn't mind if you did some regression testing to see if an earlier version worked as expected.

danjbh commented Mar 24, 2018

Definitely. Just let me know which version and I'll give it a shot. Thanks!

@concreted

I'm also willing to do some testing. Do you have an idea of which range of older versions may have worked as expected?

danjbh commented Mar 29, 2018

*nudge* :)

slact (Owner) commented Jun 27, 2018

@danjbh, @concreted: This issue should be fixed with d3a6557. Please rebuild Nchan from latest master and let me know if the problem persists, as I have trouble replicating it with any consistency.

slact added the testing label Jun 27, 2018

slact (Owner) commented Jun 27, 2018

Err, nope, I just reproduced it. Not fixed yet...

slact (Owner) commented Jun 27, 2018

Aaand fixed in a778114. Please rebuild from master and give it a try.

slact changed the title from "Websocket client receives all cached messages upon initial connection regardless of buffer length" to "Subscriber receives all cached messages upon initial connection on Redis-backed channel regardless of buffer length" Jun 28, 2018

danjbh commented Jun 28, 2018

I'm heading out of town for the next week but I'll give it a shot and see if I can reproduce once I return. Thanks again for looking into this @slact !

danjbh commented Jul 9, 2018

Alright, I'm back in action and was able to perform some limited testing this morning. So far I have been unable to reproduce and will do some extensive ("production style") testing later this afternoon/evening.

Anyhow, everything looks promising so far -- thanks again and stay tuned!

danjbh commented Jul 10, 2018

Just finished rolling these changes into production and everything is looking great so far. I'll monitor closely over the next few days and will report any issues, but typically I would have seen a problem by now. I'm able to successfully scale the app horizontally into multiple containers w/ a shared redis back-end. Good stuff!

Thanks a bunch for getting this working -- this software has been extremely valuable and even more so now that we can scale it properly!

slact (Owner) commented Jul 10, 2018

Great, this code will be part of the upcoming release. Redis connection management got a big rewrite too: you can now load-balance message delivery onto slaves and optimize for Redis CPU or bandwidth, and there's auto-failover for clusters and master-slave setups, plus a bunch of other stuff. I'll be putting out an official release with all of this in a week or two.

slact closed this as completed Jul 10, 2018
slact removed the testing label Jul 10, 2018

ivanovv commented Jun 19, 2020

@danjbh It would be awesome if you could share the horizontal scaling setup that you use.

From this thread and the nginx.conf 2016 slides by Leo, I have the impression that your setup looks something like this:

  • several t2.small instances in AWS EC2 use the same nginx/nchan config
  • ELB accepts traffic and passes connections to a random instance

Am I right?
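
For concreteness, here is a minimal per-node sketch of that presumed setup, pieced together from the config earlier in this thread; the Redis hostname and endpoint paths are placeholders:

    server {
        listen 80;

        # every node points at the same shared Redis, so the load balancer can hand
        # a connection to any node and it will see the same channels and messages
        nchan_use_redis on;
        nchan_redis_url "redis://shared-redis.internal:6379";
        nchan_message_buffer_length 5;

        location = /sub {
            nchan_subscriber websocket;
            nchan_channel_id "foo";
            nchan_subscriber_first_message oldest;
        }

        location = /pub {
            nchan_publisher;
            nchan_channel_id "foo";
        }
    }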

Also, I have a couple of questions:

  1. Why did you decide to go with several smaller instances instead of one big one? Failover? If so, do you have any failover procedures/scripts?
  2. What is your overall impression of running nchan in such a setup? Any issues or problems?

Thanks in advance!
