
reuse connection in the same fiber for connection_pool #72

Closed
wants to merge 4 commits

Conversation

qqshfox
Contributor

@qqshfox qqshfox commented Oct 24, 2011

No description provided.

@igrigorik
Owner

This patch assumes that the connection can be reused, which unfortunately is not true most of the time. Most protocols require that you wait for the response before sending another command. I can see how we may want this behavior in certain cases, but I don't think it should be the default.

@qqshfox
Contributor Author

qqshfox commented Oct 24, 2011

I see...

But with the current implementation, we will lose the previously reserved connection if we call 'acquire' twice from the same fiber:

def acquire(fiber)
  if conn = @available.pop
    @reserved[fiber.object_id] = conn
    conn
  else
    Fiber.yield @pending.push fiber
    acquire(fiber)
  end
end

and then 'release' twice:

def execute(async)
  f = Fiber.current

  begin
    conn = acquire(f)
    yield conn
  ensure
    release(f) if not async
  end
end

def release(fiber)
  @available.push(@reserved.delete(fiber.object_id))

  if pending = @pending.shift
    pending.resume
  end
end

The second 'acquire' overwrites the reservation for the fiber, so the first connection is lost. The second 'release' then finds nothing reserved ('@reserved.delete' returns nil) and pushes a 'nil' onto the 'available' list.

This will lead to a bug, right?
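For illustration, here is a minimal sketch of a guard that would at least stop the double release from pushing nil. The class name 'GuardedPool' and the 'fiber_id' parameter are hypothetical stand-ins for the real pool and 'Fiber.current.object_id'; this is not the actual patch:

```ruby
# Hypothetical sketch: guard release so a double release cannot push nil
# into @available. Not the actual em-synchrony code.
class GuardedPool
  def initialize(conns)
    @available = conns
    @reserved  = {}
  end

  def acquire(fiber_id)
    conn = @available.pop
    @reserved[fiber_id] = conn if conn
    conn
  end

  def release(fiber_id)
    conn = @reserved.delete(fiber_id)
    @available.push(conn) if conn   # second release becomes a no-op
  end
end

pool = GuardedPool.new([:conn_a, :conn_b])
pool.acquire(1)
pool.release(1)
pool.release(1)   # no nil pushed into @available
```

Note that this only papers over the symptom; the overwritten reservation from the nested 'acquire' still needs a real fix.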

@qqshfox
Contributor Author

qqshfox commented Oct 24, 2011

How to reproduce:

require 'em-synchrony'

class Client
  def request
    yield if block_given?
  end
end

POOL = EM::Synchrony::ConnectionPool.new :size => 2 do
  Client.new
end

def log(key)
  puts "+" * 20
  puts key
  puts "available: #{POOL.instance_variable_get(:@available).inspect}"
  puts "available size: #{POOL.instance_variable_get(:@available).size}"
  puts "reserved: #{POOL.instance_variable_get(:@reserved).inspect}"
  puts "reserved size: #{POOL.instance_variable_get(:@reserved).size}"
  puts "-" * 20
end

log "0"
POOL.request do # acquire
  log "1"
  POOL.request  # acquire
                # release
  log "2"
end             # release
log "3"

@igrigorik
Owner

Hmm, yeah that's a good catch. So, I guess what we're saying here is:

  • a fiber is allowed to check out a connection from the pool
  • if a "sync" call is made, then fiber is paused until operation is done and connection is automatically returned to the pool
  • if an "async" call is made, then we return immediately after firing the command, and the connection is returned to the pool sometime later when the operation completes. In the meantime, if the same fiber asks to acquire another connection from the same pool, then the same connection object will be returned.

The gotcha that this introduces is as I mentioned before: returning the same connection object means that as a developer you have to be careful to make sure that the connection supports pipelining. As an example, I think mysql will break with this behavior.
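A minimal sketch of that reuse rule, stripped of the EventMachine machinery ('ReusePool' and 'fiber_id' are hypothetical names, not the merged implementation):

```ruby
# Hypothetical sketch of the "same fiber gets the same connection" rule.
class ReusePool
  def initialize(conns)
    @available = conns
    @reserved  = {}
  end

  def acquire(fiber_id)
    @reserved[fiber_id] ||= @available.pop   # reuse if this fiber already holds one
  end

  def release(fiber_id)
    conn = @reserved.delete(fiber_id)
    @available.push(conn) if conn
  end
end
```

A nested acquire by the same fiber returns the identical object, which is only safe when the underlying protocol supports pipelining.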

Q: Does this spec work with your patch: https://github.com/igrigorik/em-synchrony/blob/master/spec/mysql2_spec.rb#L49

I have a feeling it may break..

ig

@qqshfox
Contributor Author

qqshfox commented Oct 25, 2011

Yes, you are right, the spec is broken.

Should we create a new issue to talk about that?

@qqshfox
Contributor Author

qqshfox commented Oct 25, 2011

The spec is fixed.

@qqshfox
Contributor Author

qqshfox commented Oct 25, 2011

I will add a spec to reproduce my problem soon.

@qqshfox
Contributor Author

qqshfox commented Oct 25, 2011

The timeout block never returns. Because the conn popped from 'available' is 'nil', the 'if' branch fails and the fiber is pushed into the 'pending' queue. The fiber yields, nothing resumes it, and no more code is executed.

@igrigorik
Owner

Let's merge #73 and this back together; there is no point in keeping them apart, because I can't merge this one alone and introduce a bug in the current behavior.

Also, on further reflection, I think while there is definitely an issue here, it is an edge case. If we're simulating a blocking API, then the block syntax is actually a bit misleading since that's usually the pattern we use to signal that the block may be executed later. If you skip the block syntax, your example will actually work just fine. Having said that, it is still something we probably should address.

As I mentioned earlier, it's also not a question of arguing which behavior is "correct" - we need both, to cover the pipelined and serial cases. To minimize impact on current clients, we should stick with the default behavior, which is effectively "serial", but we also need to address the nested calls scenario. (That's issue 1.)

An entirely separate issue (2) is checking out / returning same connection. This in itself introduces another bug, which can't be addressed without putting some burden on the user to declare which behavior they want. So with that, I'm currently thinking something like the following

ConnectionPool.new size: 2 # each call returns new connection + fixes (1)
ConnectionPool.new size: 2, pipeline: true # call by same fiber returns same connection: fixes (2), basically current patch
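A rough sketch of how that option could select between the two behaviors. Everything here is hypothetical ('SketchPool', 'fiber_id', and the ':pipeline' flag are from the discussion above, not a shipped API):

```ruby
# Hypothetical sketch: branch acquire on a proposed :pipeline option.
class SketchPool
  def initialize(size:, pipeline: false, &blk)
    @pipeline  = pipeline
    @available = Array.new(size) { blk.call }  # build connections eagerly
    @reserved  = {}
  end

  def acquire(fiber_id)
    if @pipeline && @reserved.key?(fiber_id)
      @reserved[fiber_id]            # pipelined: same fiber reuses its conn
    else
      conn = @available.pop          # serial: each call checks out a new conn
      @reserved[fiber_id] = conn if conn
      conn
    end
  end
end
```

In the serial branch a nested acquire still overwrites the fiber's reservation (issue 1), so the real fix would also need to track multiple checkouts per fiber or guard the release path.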

@qqshfox
Contributor Author

qqshfox commented Oct 26, 2011

If my words made you uncomfortable, I apologize; I'm not a native English speaker. I just want to explain my problem clearly.

Anyway, we have the same idea now.

Please add a spec to test my code when pipeline == false; I have no experience with that. Then I can modify my patch. Thanks.

@igrigorik
Owner

Hanfei, not at all! Apologies if it came across that way. I'm just trying to document what's scrolling through my head.

I'll take a look at the code and see what I can come up with.

@igrigorik
Owner

Hanfei, sorry about the delay. I've been thinking about this over the weekend, and I think there is a simple solution here: the block syntax shouldn't even be required - all of the connection checkout logic is, and should be, hidden as is.

Calling some_pool.cmd guarantees clean check-in and check-out, and since we are trying to reproduce a procedural API, the nested block syntax is not what we should be going after in any case. There is one case where the block syntax makes sense: you want to send pipelined requests on a single connection, in which case your Client example should be made to yield the connection, and you should be executing async methods on it.

@igrigorik igrigorik closed this Nov 26, 2011
@qqshfox
Contributor Author

qqshfox commented Nov 27, 2011

It seems we are not on the same page yet. Maybe my example is not so clear.


In my example, Client#request is not a pipelined method. It is a synchronous method: send data, receive data, and yield the block. The actual client is Dalli (with async => true):

https://github.com/mperham/dalli/blob/master/lib/dalli/client.rb#L105

Both the GET and ADD methods are synchronous, so I think the FETCH method is not pipelined.

I know there is no guarantee whether the client is pipelined or not, but we still have not solved the problem, right?


Finally I understand what you mean. Thank you so much.
