Issue with possible loop on run #12

Closed · jwoertink opened this issue Jul 17, 2022 · 11 comments

@jwoertink
Contributor

I'm trying to integrate this shard into Cable using the examples by @mjeffrey18 in #7 (comment), but when I run the specs for Cable, they seem to just hang after the first spec.

My guess is that since we call Cable.restart between each spec, that runs the shutdown, which calls this:

def shutdown
  redis_subscribe.run({"unsubscribe"})

  redis_subscribe.close
  redis_publish.close
  connections.each do |_, v|
    v.close
  end
end

If I add some puts after that redis_subscribe.run, it never prints. Now, if I swap that out for this:

redis_subscribe.unsubscribe("")

That will run the specs, though I get a bunch of IO::Error exceptions.

@jgaskins
Owner

Looks like UNSUBSCRIBE doesn't return anything, but run expects to read a response from the server. So Connection#encode plus a flush should do the trick. I just had a look and it turns out that's actually the exact implementation of Connection#unsubscribe. Looks like Cable also took that into account originally by only writing to the server (calling a very similar method) but not reading back from it.
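
For reference, a write-only version of that looks roughly like this (a sketch of the pattern only; encode and socket stand in for the connection's internals):

# Sketch: send UNSUBSCRIBE without waiting for a reply.
# encode and socket are placeholders for the connection's internals.
def unsubscribe(*channels : String)
  encode({"unsubscribe"} + channels)
  socket.flush
end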

As for the IO::Error, that seems weird. It doesn't look like you're reusing the same Connection instance because you're using a fresh Cable::Server, which should give you a fresh Redis::Connection, which should give you a fresh TCPSocket.

@jgaskins
Owner

I just noticed why you were trying to use run. It didn't occur to me that splat args in Crystal require at least one arg (a lesson I learn every couple of months and promptly forget), so there's no way to send a plain unsubscribe with the current implementation.

@jwoertink
Contributor Author

Yeah, I want to unsubscribe from all channels, so I have to pass at least one arg. It sounds like the right path is to just use unsubscribe() instead of run() here. Thanks for the insight.

I'm still not sure why the IO::Error on the specs, so I'll keep digging around on that. I must be missing something.

@jgaskins
Owner

Cool, I'll add some method overloads to accommodate that.
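
Something like a bare overload for the unsubscribe-from-everything case, for example (just a sketch of the idea, not the final change; encode and socket are placeholders for the connection's internals):

# Hypothetical zero-argument overload: UNSUBSCRIBE with no channel names
# tells Redis to drop every subscription.
def unsubscribe
  encode({"unsubscribe"})
  socket.flush
end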

I've been playing with this, too, and I can't seem to reproduce it without Cable. I'll post my findings on cable-cr/cable#48.

@jgaskins
Owner

Oh wait, I think I know what this is. It's not happening when the new connection spins up, which is what I thought was happening. It's that the Redis::Parser#read from your previous spec is still running in the other fiber when the connection is closed. You're correct: the problem is that this shard isn't accounting for the connection closing.

I was able to reproduce that behavior very simply:

require "../src/redis"

subscriber = Redis::Connection.new

# The SUBSCRIBE call runs in its own fiber, writing the command and then
# reading messages from the connection
spawn { subscriber.subscribe "foo" {} }

# Closing the connection out from under that fiber reproduces the error
subscriber.close

I'll try to have a fix out (along with the empty unsubscribe overload) this evening.
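
The general pattern for that kind of fix, sketched here with plain stdlib IO rather than this shard's code, is to treat the read error as expected once the connection has been closed on purpose:

# Sketch with IO.pipe instead of a Redis connection: the reader fiber
# ignores the IO::Error that the close triggers, but only because we
# flagged the close as intentional first.
reader, writer = IO.pipe
closed = false

spawn do
  begin
    while line = reader.gets
      puts "received: #{line}"
    end
  rescue ex : IO::Error
    raise ex unless closed # only swallow the error after a deliberate close
  end
end

writer.puts "hello"
writer.flush
Fiber.yield   # let the reader fiber consume the line and block again
closed = true
reader.close  # wakes the blocked read with an IO::Error
Fiber.yield   # give the reader fiber a chance to run its rescue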

@jgaskins
Owner

This also explains why the exception wasn't failing your specs.

@jwoertink
Contributor Author

Nice! Glad you were able to find it. Thanks for the help on it.

@jgaskins
Owner

Released v0.5.0 with the fix for this. I confirmed that the specs on cable-cr/cable#48 pass locally with this change.

@jwoertink
Contributor Author

Sweet! I ran it locally and most of those IO::Error issues did go away. There still seems to be one left, but maybe it's not related?

cable on  issues/46 is 📦 v0.2.1 via 🔮 v1.4.1 
❯ crystal spec 
.....Unhandled exception in spawn(name: Cable::Server - subscribe): Closed stream (IO::Error)
  from /usr/share/crystal/src/io.cr:118:5 in 'check_open'
  from /usr/share/crystal/src/io/buffered.cr:165:5 in 'write_byte'
  from /usr/share/crystal/src/char.cr:897:9 in 'to_s'
  from /usr/share/crystal/src/io.cr:174:5 in '<<'
  from lib/redis/src/writer.cr:12:7 in 'encode'
  from lib/redis/src/connection.cr:125:5 in '->'
  from /usr/share/crystal/src/fiber.cr:146:11 in 'run'
  from /usr/share/crystal/src/fiber.cr:98:34 in '->'
  from ???
.....*...........................

Digging into this one, it looks like the handler spec calls Cable.restart a second time:
https://github.com/cable-cr/cable/blob/a6672c4f19dc25a61b35c403db4b883ae86f834e/spec/cable/handler_spec.cr#L107

Removing this line fixes it (which is nice), but it seems calling Cable.restart twice in a row borks it... I'm gonna just update the spec and move on, though. This is awesome!

@jgaskins
Owner

I did see that once when I ran it, but it didn't show up again. I think it's due to asynchrony in spawn in Cable::Server#subscribe_to_internal. By the time that fiber runs for the first restart, the second one may or may not have closed its socket. Increased latency to the server might make it more reliably reproducible.

If I understand the code in Stefan's client (and there is no guarantee of that), it may have been silently reconnecting to send the SUBSCRIBE command instead of raising an exception.

This implementation doesn't do that because, while Connection#run does reconnect and retry if a connection is lost, it can be dangerous to do that for encode, since it's used by things like pipelines. For example, if we disconnect in the middle of a pipeline block, we may try to read the results of 5 commands but the server only sends 3 results. Coincidentally, I added a warning to the docs about this right as you posted this comment.
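
To make that concrete, here's roughly the failure mode, assuming a pipeline block API along these lines (the commands and counts are just for illustration):

require "redis"

redis = Redis::Client.new

# Every command in the block is encoded and flushed up front; the client
# then reads back one reply per command it queued.
redis.pipeline do |pipe|
  5.times { |i| pipe.set "key#{i}", i.to_s }
end
# If the connection dropped partway through and encode silently reconnected,
# the new connection would only have received the commands written after the
# reconnect, so the client would wait for replies the server never sends.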

@jwoertink
Contributor Author

Ah, that makes sense. Good call. Yeah, I think calling that twice in a row should be avoided anyway, so definitely not an issue in this shard. 👍
