Feature/overhaul #12

Merged
merged 68 commits into master on Mar 27, 2012

Conversation

Collaborator

nate commented Mar 21, 2012

The overhaul is feature complete and well tested. All that remains is code review and documentation.

If you check out the benchmark results in benchmark/sieve.rb, you'll see that I managed to squeeze pretty good performance out of it. I don't know that there's much more work to be done there on a larger scale, but we can play around with micro-optimizing.

Let me know what you think.

Collaborator

nate commented on lib/agent/channel.rb in 492632f Mar 9, 2012

I'm emulating go's channel behavior right here, but I'm not really sure if that's a good idea.

With go, if a channel gets closed while you're waiting to write to it, you get a panic. If some code is trying to 'receive' on a channel and it gets closed, then you get a multi-return value like this.
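For reference, a rough sketch of the Go semantics being emulated; the value/ok return, #close, and the channel! constructor are written here as assumptions, not necessarily the final API:

require "agent"

c = channel!(String, 1)   # buffered, so the send below doesn't block

c << "hello"
msg, ok = c.receive       # like Go's "v, ok := <-ch"  => ["hello", true]

c.close
msg, ok = c.receive       # closed channel: returns [nil, false] instead of blocking
c << "boom"               # closed channel: raises, mirroring Go's panic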

I like it.

Owner

igrigorik replied Mar 9, 2012

Since we're trying to emulate Go's behavior... we should probably defer to its mechanics as much as we can. In other words, lgtm.

Collaborator

nate commented on lib/agent/queue.rb in 492632f Mar 13, 2012

I realized last night that this method and this entire class only work on buffered queues. Setting the size to zero breaks absolutely everything. I need to re-think things. I'm wondering if I shouldn't split up the buffered and unbuffered functionality completely, since they behave so differently. I might just check out how go does it at this point... except they can cheat by controlling the scheduler... and I really don't want to use Thread.critical... ugh.

Owner

igrigorik replied Mar 13, 2012

A separate implementation for unbuffered is actually pretty reasonable.. I think.

I've long wanted to do - in fact, had some code half way there, but never finished - an unbuffered channel over 0MQ. Benefit being you could piece together multiple workers beautifully.. but it only works on unbuffered channels. :)

(not that unbuffered channel needs to run over 0mq..)

Collaborator

nate replied Mar 13, 2012

Ah, right, I guessed that was the idea behind making the transports rather agnostic. Again, I was going for the lowest time to implement a correct representation of go's channels. Also, I'd like to wait and see what go replaces netchan with.

Owner

igrigorik replied Mar 13, 2012

From what I can tell.. netchan was a fun experiment that's gone pretty much nowhere - unfortunately, exactly because it's unbuffered. I'm surprised they're still keeping it in stdlib.

Collaborator

nate replied Mar 13, 2012

Well, at least they're putting it under the "old" package.

Owner

igrigorik replied Mar 13, 2012

s/old/dead/g :)

Hmm, any particular reason why you're going with the ! convention for channels and 'goroutines'?

Owner

igrigorik replied Mar 13, 2012

To answer my own question (after reading through the code): guessing it's to avoid conflicts with Kernel::select and the like.

Maybe it's the novelty factor, but not yet sold on it.. although don't have a much better recommendation (yet).

Collaborator

nate replied Mar 13, 2012

Yep, that's the issue I'm trying to solve with the new syntax, and since I was already doing it with select, I thought I'd make it consistent across all the Kernel extensions that Agent adds.
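For illustration, the convention ends up reading something like this (the helper signatures here are assumptions based on this thread, not the settled API):

require "agent"

# The bang suffix keeps Agent's Kernel extensions from colliding with
# built-ins such as Kernel#select; exact signatures are assumptions.
c = channel!(String, 1)

go! { c << "hello" }                     # spawn a 'goroutine' (a Ruby thread underneath)

select! do |s|
  s.case(c, :receive) { |msg| puts msg } # wait on one or more channels without Kernel#select
  s.timeout(1.0)
end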

Pop and Push are different, but share a lot of common code.. perhaps something that can be refactored?

Collaborator

nate replied Mar 13, 2012

Yep. I was keeping them separate so I could move quickly until I figured out what exactly they needed to do. I figured I should shoot for correctness first, and whatever got me there sooner...

Owner

igrigorik replied Mar 13, 2012

sounds good

guessing you want to increment that up, not down? :)

Collaborator

nate replied Mar 13, 2012

ha, yep! I have that fixed in my local branch, but haven't pushed it up yet. I thought that might be a bit hasty given that I completely missed unbuffered channels...

Hmm, really?

Collaborator

nate replied Mar 13, 2012

Channels in go don't require a name. Do you think they should? It seems to me that channel names are just an implementation detail, for the most part, unless you want to initialize two channels completely independently and have them use the same queue.

Owner

igrigorik replied Mar 13, 2012

Hmm, now I'm trying to remember why I forced the name requirement to begin with..

Collaborator

nate replied Mar 13, 2012

Maybe it was a requirement initially before you started doing the UUID stuff with selector?

Collaborator

nate replied Mar 13, 2012

Sometimes first implementations become permanent implementations. :)

Owner

igrigorik replied Mar 13, 2012

Still scratching my head. If it hits me, I'll let you know. :-)

In the meantime, let's go with no names.

this works for you across all vms?

Collaborator

nate replied Mar 13, 2012

Yes, but by accident, I believe, since buffered queues are the default.

Collaborator

nate replied Mar 13, 2012

Looks like it's broken on jruby. Heh...

Owner

igrigorik replied Mar 13, 2012

Right, that's why I had the :vm => :ruby in there ;-)

Collaborator

nate replied Mar 13, 2012

Fixed. :)

Seems like for :send the block is potentially optional?

Collaborator

nate replied Mar 13, 2012

Probably, yes.

Collaborator

nate replied Mar 13, 2012

Fixed.

Yikes, do we really need to pass in and do an explicit wait for every operation?

Collaborator

nate replied Mar 13, 2012

We need to make sure that the Push is enqueued in the buffered channel before releasing control back to the calling thread. We're doing it this way because the selector needs deferred channel operations to work, which means non-deferred operations have to block until the operation completes. It might seem slow, but it's actually not too bad. We achieved pretty good throughput on the sieve benchmark with this implementation on ruby 1.9.2, somewhere around 7500 messages per second (measured by adding a puts in Channel#receive and running grep and wc -l on the output).
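To make the pattern concrete, here's a toy stand-in for what the blocking path does; the real Push/Queue internals in this branch are assumptions here, not the actual code:

require "thread"

class Push
  def initialize(object)
    @object = object
    @mutex  = Mutex.new
    @cvar   = ConditionVariable.new
    @done   = false
  end

  # Block the calling thread until the operation has been completed.
  def wait
    @mutex.synchronize { @cvar.wait(@mutex) until @done }
  end

  # Called by whoever services the queue once the value is actually enqueued.
  def complete!
    @mutex.synchronize { @done = true; @cvar.signal }
  end
end

# The explicit setup: build the operation, hand it off, block on it.
push = Push.new("hello")
Thread.new { push.complete! }   # stands in for the queue servicing the operation
push.wait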

Owner

igrigorik replied Mar 13, 2012

I'm actually more worried about the explicit 3 line push/pop setup from an API standpoint, although the extra objects don't help either. :-)

Any reason why we can't "push" that code down into the library itself, instead of placing the burden on the user?

Collaborator

nate replied Mar 13, 2012

Users don't directly interface with the queue, though, so this API is hidden in the channel and the selector. I could shove this off into the queue, I suppose?

Collaborator

nate replied Mar 13, 2012

And users typically won't use deferred operations, so they'll never even see the push and pop objects. These operations are really only for internal use. I considered hiding them away in a subdirectory for the queue to imply that they're not for public consumption, but I felt it would be better to document what is supposed to be used (eventually).

Owner

igrigorik replied Mar 13, 2012

facepalm .. it's been a long day. Nevermind, you're right.. the users wouldn't see this anyway - all good.

Collaborator

nate replied Mar 13, 2012

No worries, I absolutely know the feeling. :)

Collaborator

nate commented on lib/agent/queue.rb in d5040c3 Mar 18, 2012

Ugh, I just found out that this breaks when we do a select with a send and a receive on the same unbuffered channel. The select ends up executing both the send and the receive, and both succeed, which is not at all what we want. This means that the mechanism that makes select possible, deferred push and pop operations, depends on buffered channels.
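Concretely, the problematic case looks something like this (the select! block syntax is an assumption about the API, not a quote from the branch):

require "agent"

# Selecting over a send and a receive on the SAME unbuffered channel should
# never let the two cases satisfy each other, but that's what was happening.
c = channel!(Integer)   # unbuffered

select! do |s|
  s.case(c, :send, 1)                    # this send...
  s.case(c, :receive) { |v| puts v }     # ...was being matched by this receive
end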

I'm going to have to think about how to fix this. I'm afraid I might have to rewrite it again, this time more along the lines of how go does it internally. Translating that to ruby won't be easy, though, since go's version makes a lot of assumptions about how goroutines will be scheduled, and we don't really have good control over ruby's thread scheduler. On top of that, ruby's scheduler is preemptive and go's is cooperative. The closest things we have for controlling scheduling are Thread.critical and wrapping everything we want to execute exclusively in the same Monitor/Mutex.

I really need to think long and hard about this again.

Collaborator

nate replied Mar 18, 2012

Oh, and any input is most definitely welcome.

Owner

igrigorik replied Mar 18, 2012

So, how long before you're reimplementing all of your code in Go instead? ;-)

Collaborator

nate replied Mar 18, 2012

Heh, I've just about thrown in the towel on this a few times already, and rewriting some things in Go is something I've been considering, but I'd still like this in ruby. I suspect that I'll need to get my hands very, very dirty to do this correctly.

Collaborator

nate replied Mar 19, 2012

Looking more into Thread.critical, it's not really an option: http://ruby.11.n6.nabble.com/ruby-core-20999-Supporting-Thread-critical-with-native-threads-td3552902.html

I think I might have a solution to this, though. I just need to work out some more issues and I'll commit it/push it up.

Collaborator

nate commented on lib/agent/queue.rb in 0fd2693 Mar 19, 2012

I did it! This line and the similar one below were the checks needed to make it work properly with unbuffered channels. Basically, don't trigger a push or a pop if it's from the same blocking group.
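For reference, the guard amounts to something like the following, using "blocking group" from the wording above; the attribute names are illustrative, not the actual code:

require "ostruct"

group  = Object.new   # stands in for the select! statement's blocking group
pop    = OpenStruct.new(:blocking_group => group)
pushes = [
  OpenStruct.new(:blocking_group => group, :object => 1),  # same select!, must be skipped
  OpenStruct.new(:blocking_group => nil,   :object => 2),  # another party, allowed to pair up
]

# A pop is never satisfied by a push from its own blocking group.
match = pushes.find do |push|
  !(pop.blocking_group && pop.blocking_group == push.blocking_group)
end

puts match.object   # => 2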

Next up for me is to refactor, document, and optimize like hell. I'm super excited that this is working correctly.

Owner

igrigorik replied Mar 22, 2012

Sweet!

Collaborator

nate commented on lib/agent/channel.rb in 492632f Mar 20, 2012

What do you think about this change in syntax?

# You only need the type in the case of unbuffered channels
string_unbuffered_channel = channel!(String)
# But if you want to give it a name, that could work like this:
string_unbuffered_channel_with_opts = channel!(String, :name => "foo")

# Allow easier passing of the size of a channel (more go-like in its declaration)
string_buffered_channel = channel!(String, 1)
# But still allow the other options (name)
string_buffered_channel_with_opts = channel!(String, 1, :name => "foo")

It could be implemented like this:

def initialize(*args)
  opts = args.last.is_a?(Hash) ? args.pop : {} # default to an empty hash so opts[:name] below works without options
  type = args.shift
  size = args.shift || 0 # size 0 (the default) means an unbuffered channel

  raise Untyped unless type

  # Module includes both classes and modules
  raise InvalidType unless type.is_a?(Module)

  @state      = :open
  @name       = opts[:name] || Agent::UUID.generate
  @max        = size
  @type       = type
  @direction  = opts[:direction] || :bidirectional

  @close_mutex = Mutex.new

  @queue = Queues.register(@name, @max)
end
Owner

igrigorik replied Mar 22, 2012

works for me

Collaborator

nate commented on a5ee664 Mar 21, 2012

It's crazy how much slower Monitors are than Mutexes, especially on ruby 1.8. The sieve benchmark takes 79.7s in ruby 1.8 w/ Monitors. The same benchmark takes 32.5s with Mutexes. In ruby 1.9.2 it goes from 10.5s to 9.0s. A big improvement, imho.
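For anyone curious, a quick micro-benchmark along these lines (not the sieve benchmark itself; absolute numbers will vary by Ruby version and machine):

require "thread"
require "benchmark"
require "monitor"

N     = 1_000_000
mutex = Mutex.new
mon   = Monitor.new

Benchmark.bm(10) do |b|
  b.report("Mutex")   { N.times { mutex.synchronize { } } }
  b.report("Monitor") { N.times { mon.synchronize { } } }
end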

Owner

igrigorik replied Mar 22, 2012

Huh, well that's good to know. Now I'm curious to see where else Monitors are used.. and could benefit from this observation.

Owner

igrigorik commented Mar 22, 2012

This is one epic merge :-)

Collaborator

nate commented Mar 22, 2012

I'm pretty much being the worst contributor ever, I know. ;)

Owner

igrigorik commented Mar 24, 2012

To the contrary! :)

I'd merge this now.. feel free to go live with it once you feel it's ready.

Collaborator

nate commented Mar 25, 2012

Will do. I have one more significant change to make and then I'll just go merge-crazy.

nate merged commit d9e489b into master Mar 27, 2012

Collaborator

nate commented Mar 27, 2012

Alright, it's merged in. Can we release the gem then?

Owner

igrigorik commented Mar 28, 2012

Yup. Specs seem to pass on this end - looks great.

Are you using this in prod? Wondering if we should jump to 0.9 or something like that, give it a bit of time to settle in, and then push out a 1.0? Or, happy to call it 0.2 and keep iterating.

Collaborator

nate commented Mar 28, 2012

Jumping to 0.9 would be fine. We're not using it in prod just yet, but will be in a few weeks, probably. We could wait until then before considering pushing 1.0.

Owner

igrigorik commented Mar 28, 2012

Sounds good. 0.9 should be up on rubygems now. Kudos for the awesome work!

Collaborator

nate commented Mar 28, 2012

Thanks so much!
