io/wait #303
Conversation
Can you also submit an example demonstrating net-ssh with celluloid-io and eventmachine? I guess what remains is the core of the event loop: I'm not sure what your plan is for replacing it.
lib/net/ssh/proxy/command.rb
does wait_readable ever return false?
According to http://ruby-doc.org/stdlib-2.0.0/libdoc/io/wait/rdoc/IO.html#method-i-wait_readable it returns IO, true, false or nil. All I've managed to test is: if the timeout triggers, it returns nil; if the wait_readable call is successful, it returns the IO object itself. I assumed that it returns false if "the command fails", that is, if the descriptor would be placed in the errors fd array. But to be honest, that's just an assumption. Maybe one should read the relevant bit of the Ruby source code?
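For reference, the two return values observed above are easy to probe with a plain pipe. This is a standalone sanity check, not part of the patch:

```ruby
require "io/wait"

r, w = IO.pipe

# Nothing written yet: the 0.1s timeout elapses and wait_readable returns nil.
raise "expected nil on timeout" unless r.wait_readable(0.1).nil?

# Once data is pending, wait_readable returns a truthy value (the IO itself
# on current rubies) without consuming the data.
w.write("x")
raise "expected readiness" unless r.wait_readable(1)
raise "data should still be pending" unless r.read(1) == "x"

r.close
w.close
```

The false case never shows up in this experiment, which matches the suspicion that it is unreachable for ordinary pipes.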
if result = self.wait_readable(60)
if result == false
# cannot be reached
end
end
looks fishy to me.
I'm not sure in which cases IO.select can return an exceptional event for pipes, but those are not handled by wait_readable for sure.
Currently I only have a patch for celluloid-io. EventMachine is theoretically possible, as em-synchrony provides a TCPSocket quack, but I haven't had the time to fiddle with it yet. My current production patch is quite ugly, an "include and extend" sausage fest I don't want to discuss further (I think I provided a gist link for it in a previous issue regarding celluloid-io), but with these changes it has the potential of becoming completely irrelevant. Basically, I was search-and-replacing every instance of the io_select calls with the respective celluloid-io #wait_readable and #wait_writable calls. I have a not-yet-tested patch rewrite which looks like this: module Net::SSH
module SSHCompatExtensions
def io_select_with_celluloid(*params)
if ::Celluloid::IO.evented?
IOExtensions.select(*params)
else
io_select_without_celluloid(*params)
end
end
end
module IOExtensions
NEARZERO = 0.00001
def self.wait(t = nil)
if t
time = t.zero? ? NEARZERO : t
timeout(time) { yield }
else
yield
end
rescue ::Timeout::Error,
::Celluloid::TaskTimeout
nil
end
def self.select(read_array, write_array = nil, error_array = nil, tout = nil)
wait(tout) do
readers = read_array
writers = write_array
writers = write_array.map { |w| w.wait_writable ; w } if write_array
readers = (writers ? read_array-writers : read_array).map { |r| r.wait_readable ; r }
writers ? [ readers, writers ] : [readers]
end
end
end
end which seems to work, but still needs further testing. You can already see how I worked around the core session hook.
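The io_select_with_celluloid/io_select_without_celluloid naming above suggests classic alias chaining around net-ssh's select wrapper. A minimal, self-contained sketch of that wiring follows; the module layout and the evented? predicate here are stand-ins for illustration, not net-ssh's or celluloid-io's real API:

```ruby
module Net
  module SSH
    # Stand-in for net-ssh's select wrapper (illustrative only).
    def self.io_select(*params)
      IO.select(*params)
    end

    class << self
      alias_method :io_select_without_celluloid, :io_select

      # Hypothetical predicate standing in for ::Celluloid::IO.evented?
      def evented?
        false
      end

      # The chained replacement: dispatch to the evented implementation
      # only when running inside an event loop, otherwise fall through
      # to the original IO.select-based method.
      def io_select(*params)
        if evented?
          # would call IOExtensions.select(*params) here
        else
          io_select_without_celluloid(*params)
        end
      end
    end
  end
end

r, w = IO.pipe
w.write("x")
readers, = Net::SSH.io_select([r], nil, nil, 1)
raise "reader should be ready" unless readers == [r]
```

The point of the chain is that code calling Net::SSH.io_select never has to know which backend is active.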
I would like to discuss the best way to replace the https://github.com/net-ssh/net-ssh/blob/master/lib/net/ssh/connection/session.rb#L210 line with the new API. The thing I can't get out of my head is that inside a session there is always one descriptor open, but it can be waiting on reads or writes or both, as the channels feature multiplexes on the same socket. So, multiple events on the same IO. So, why not just call #wait on it? If you look at my patch, I'm first waiting on writes and then on reads, in case that approach is usable.
Just as a sidenote, I have created celluloid/celluloid-io#169, as the current Celluloid IO socket API doesn't support the optional timeout parameter.
@mfazekas, here are some micro-benchmarks: # my script looks more or less like this;
# main thing: open a connection, perform a blocking requests, exit
Net::SSH.start("testhost", user, password: pass) do |session|
session.exec!("somecommand")
end
The overall gist is: it doesn't make it that much faster, but it reduces a few Array allocations. Still, it's amazing how many strings are used just for a simple script. In the readers/writers thing, there are a few things that bother me: the max size of each will never be more than 1, as sessions only retain one socket (please correct me if I'm wrong, but looking at the code it seems that way). And it's either going to be (readers/writers) 0/0, 1/0, 0/1, or 1/1. Currently I don't think there is a way in Ruby (besides using IO.select) to wait for read AND write. If you look at io/wait, there's #wait_writable, #wait_readable and #wait (an alias of #wait_readable). NIO4R, which celluloid-io uses, allows registering for reads and writes, but its sockets don't, as they quack like Ruby sockets. This is crap for certain timeout scenarios, because I assume this: readers, writers = IO.select([sock], [sock], nil, 3)
# above will be more efficient waiting for events, below will add less pressure on the GC.
sock.wait_readable(3) ; sock.wait_writable(3)
I added some more info in the commit messages. Tell me what you think.
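The trade-off sketched in that snippet can be shown concretely on a connected socket pair: one IO.select call multiplexes read and write readiness in a single wait, while the io/wait style makes two sequential calls (and, with a timeout, can block up to the sum of both in the worst case). A standalone illustration:

```ruby
require "io/wait"
require "socket"

a, b = UNIXSocket.pair
b.write("ping")               # make `a` readable; it is also writable

# Single multiplexed call: one syscall, but result arrays are allocated.
readers, writers = IO.select([a], [a], nil, 3)
raise "expected read readiness"  unless readers == [a]
raise "expected write readiness" unless writers == [a]

# Sequential io/wait calls: no result arrays, but two waits, each with
# its own 3-second budget (up to 6s total in the worst case).
raise "expected readable" unless a.wait_readable(3)
raise "expected writable" unless a.wait_writable(3)

a.close
b.close
```

When both events fire immediately, as here, the two styles behave the same; the divergence only matters when one readiness condition lags the other.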
Regarding the missing API: https://bugs.ruby-lang.org/issues/12013
@TiagoCardoso1983 I don't think that a few extra array allocations per select have any significance. And yes, in select we can have more than one socket. The primary use is sockets opened for port forwarding.
I don't understand this; can you elaborate? NIO4R is an abstraction over select and its alternatives. Or are you saying that NIO4R is used internally by celluloid-io and celluloid-io sockets will not work with NIO4R? Moreover, it would be good to define your use case. Why do you need to use celluloid-io with net-ssh; what is the issue with the normal socket implementation?
Batch processing. The same way gems such as eventmachine and celluloid-io are used for concurrent HTTP page fetching. In the ISP world, a lot of routers/switches have to be accessed, and most of the networks are >9000 boxes; without concurrency, software updates can take days instead of hours. Think of it as capistrano, but not sequential and not session-per-task (I think it's still this way). Also to clarify, I'm already using celluloid-io with net-ssh, but I have all the patches stored in an ugly file I'd prefer to get rid of if these changes are accepted, with the added benefit of others being able to take advantage of it.
Not in the most common case of capistrano start-stop scripts/deploying to 20 servers, but when we're talking about more than 5000 boxes and long-running processes, all allocations count (this goes for the strings as well, but that can be mitigated soon with the string freeze from ruby 2.3). Another use case I'm exploring net-ssh with: web apps acting as proxies to run command-line scripts and pipe output back to a websocket/SSE/long-polling client.
It's a bit off-topic and (for now) not a concern of net-ssh; it's just that Ruby sockets miss an API to wait for both readable and writable. Not a big hindrance. I thought about what I said yesterday, and due to the results I'm seeing in my probes with this patch, I do see a benefit when I scale a script to 400 remote ssh servers. And thanks for the tip on port forwarding. It's not a feature I'm actively using, but who knows, maybe in the future it might become a topic.
My final idea, after we discuss this patch, is to get a client for celluloid-io directly usable from the gem. Just to give you an example of what I mean: https://github.com/brianmario/mysql2/blob/master/lib/mysql2/em.rb is the raw EM mysql client that is at one's disposal when someone wants to integrate it in the EM reactor. The idea would be to create one for celluloid-io and eventually one for EM. As you might know, there is this gem, but it's a very hacky and incomplete implementation and is semi-abandoned.
This still doesn't explain the advantages of celluloid-io. I get that you use a lot of connections; does celluloid-io allow you to handle 200 net-ssh connections in fewer threads than the native one? Can you show simple sample code using net-ssh with celluloid-io so it shares threads?
If all allocations count, you should not use Ruby; use C or another lower-level language. Select is an expensive call anyway, and we usually wait for something there, so I don't think we optimize much by eliminating 3-4 arrays per select. I'm not saying optimizations are unwelcome, but you should be able to demonstrate benefits. I don't buy the argument that every allocation counts, especially not if it's not in an obvious hotspot.
lib/net/ssh/connection/session.rb
This is not an acceptable replacement for select. We can abstract it away so you can use NIO4R, but you have to use some select/poll/... primitive in the end.
#wait_readable and #wait_writable call select already, please see the source implementation.
I meant a single multiplex call (in this case select). Waiting for a collection of sockets to become either readable or writable is not the same as waiting on them one by one, first for write and then for read.
The first-for-write-then-for-read ordering is a concern indeed; see my comment with the linked Ruby issue.
Here's a script; it'll assume that you have a list of boxes you can pass to stdin and that the authentication method and credentials are the same. For the sake of simplicity, assume the pkey authentication method:
To test this properly, you'd have to run with this branch and this one
More info about celluloid-io here: https://github.com/celluloid/celluloid-io To sum it up, a Celluloid::IO actor has its own thread, mailbox, and optimized event loop, which uses libev/NIO (java), which in turn use epoll/poll/kqueue/select/whatever's available, provided one uses the io/wait API.
Yes. celluloid-io (and eventmachine, if there were a client already) allows me to access 200 clients concurrently on 1 thread. celluloid-io allows me to distribute the load over n (number of cores) threads, each one running its own event loop.
Most of the people writing scripts for ISPs don't have a CS background and have never seen C. They need a basic interface to access the boxes and write commands to automate tasks, and I already have to support a library that abstracts the specific vendors on top of net-ssh and celluloid-io. Please, don't ask me to teach them C :)
Debatable, but still off-topic. The main purpose of this pull request is to replace the IO.select calls with the io/wait variant, which is more maintainable in all cases but one (the one we are still discussing). Fewer allocations are a minor benefit in this case, I agree.
celluloid-io should have a pattern to replace IO.select which it can work with. Multiple wait_readable and wait_writable calls instead of a single select don't sound like an option. It will not work for raw TCP sockets; not sure about celluloid sockets.
It can't. eventmachine can't. No scalable event loop does. select is the past, and usually the last fallback if nothing else is available. There is a cap on the number of sockets you can listen to, and this is the reason the epoll and kqueue APIs exist. This is the C10K problem: nginx, node.js. celluloid-io won't do it. eventmachine won't do it. Don't believe me? There are a lot of papers about the performance and scalability issues of the select system call. I understand that the main goal of net-ssh is not web-scale scalability; I'm just proposing a replacement using the new socket APIs.
I kind of agreed with you before re-evaluating it. The only part where that would have influence would be loop calls with a timeout parameter. The worst case scenario would be
It will. I understand your concern: net-ssh has been quite stable and working for a lot of use cases, and this might seem like a huge change. But it's not. I've been working with net-ssh and following its changelog for the last 3 years, and patching its internals for my purposes. IO.select is a powerful tool and was the proper solution for timing out nonblocking usage. But since 2.0, it's not. IO.select usage is an artifact from old Ruby APIs, and only libraries supporting 1.9.3 and below still have it implemented. More importantly, do you have any scripts you could test these changes with? I've been testing these changes since last week against scripts using celluloid-io, other scripts using raw net-ssh with the channel multiplex feature in a sinatra app, and also with capistrano. FYI, capistrano is the one giving me a harder time, as the uploads are not working.
So IO.select has its limitations, and it can be replaced with NIO4R or any similar library which ends up calling some kind of multiplexing primitive in the system (select, epoll, kqueue...)
The integration test is good, but insufficient. There are a lot of usages in the wild that could potentially be broken. Just as an example, the problem I'm having with capistrano is probably due to some interaction with net-scp. Yup, the change in #process will be controversial without better socket APIs or architecture changes, like each listener socket handling its own events internally. I could try to come up with some proposals. I don't want to criticize the current implementation in any way; I use it as well and I'm quite comfortable with it (it just works). I just wanted to get a more compatible socket API upstream (without public API changes) for a common use case: integration with concurrent cli/app clients. I like to monkey-patch as much as the next guy, but I'd just like a better alternative to em-ssh, which clearly didn't work out.
After some examination of the Ruby source code, I confirmed that:
So my previous statement was incomplete. See http://daniel.haxx.se/docs/poll-vs-select.html for a comparison.
I tried to hack around it a little bit today, and I'm no longer confident that this is possible with the current design. The default way to interact with the session is to loop at the end without a timeout. As that means one loops indefinitely on a group of listeners, there is no other way to wait for any event on a group of sockets than to use IO.select. Also, the library uses .send and .recv, which is the blocking Socket API (maybe #256 has to do with it?). IMO making it truly nonblocking would involve a different library, which means one can only hack around a solution to make it fully compatible with eventmachine or celluloid-io. Should I revert the changes concerning the .process method and wait for a merge, or are these changes not going to be accepted at all?
I think wait_readable is more readable than IO.select for sure. I'll just need some time to check some corner cases - like the error case.
ok, will do. FYI the linked Ruby issue has been handled; there'll probably be an API to poll for readable/writable, which might be useful in the future.
lib/net/ssh/buffered_io.rb
I don't think the next unless result is necessary
When running integration tests, I'm also seeing this error: test_with_rate_limit_and_spurious_wakeup(TestProxy):
IOError: not opened for writing
/net-ssh/lib/net/ssh/proxy/command.rb:86:in `wait_writable'
/net-ssh/lib/net/ssh/proxy/command.rb:86:in `rescue in send'
/net-ssh/lib/net/ssh/proxy/command.rb:82:in `send'
/net-ssh/lib/net/ssh/buffered_io.rb:100:in `send_pending'
/net-ssh/lib/net/ssh/connection/session.rb:245:in `block in postprocess'
/net-ssh/lib/net/ssh/connection/session.rb:244:in `each'
/net-ssh/lib/net/ssh/connection/session.rb:244:in `postprocess'
/net-ssh/lib/net/ssh/connection/session.rb:212:in `process'
/net-ssh/lib/net/ssh/connection/session.rb:170:in `block in loop'
/net-ssh/lib/net/ssh/connection/session.rb:170:in `loop'
/net-ssh/lib/net/ssh/connection/session.rb:170:in `loop'
/net-ssh/lib/net/ssh/connection/channel.rb:269:in `wait'
/net-ssh/lib/net/ssh/connection/session.rb:365:in `exec!'
./test/integration/test_proxy.rb:76:in `block (3 levels) in test_with_rate_limit_and_spurious_wakeup'
./test/integration/test_proxy.rb:60:in `with_spurious_write_wakeup_emulate'
./test/integration/test_proxy.rb:75:in `block (2 levels) in test_with_rate_limit_and_spurious_wakeup'
/net-ssh/lib/net/ssh.rb:235:in `start'
./test/integration/test_proxy.rb:74:in `block in test_with_rate_limit_and_spurious_wakeup'
./test/integration/test_proxy.rb:29:in `block in setup_ssh_env'
/net-ssh/test/integration/common.rb:19:in `block in tmpdir'
/usr/local/rvm/rubies/ruby-2.0.0-p598/lib/ruby/2.0.0/tmpdir.rb:88:in `mktmpdir'
/net-ssh/test/integration/common.rb:18:in `tmpdir'
./test/integration/test_proxy.rb:24:in `setup_ssh_env'
./test/integration/test_proxy.rb:69:in `test_with_rate_limit_and_spurious_wakeup'
So what happens is that we write a command, and during the write of the command the other end closes the connection because we've sent too much data already. With wait_writable I get an IOError. I'm a bit worried about such corner-case differences between IO.select and wait_readable/wait_writable: f = IO.popen("/bin/sh","r+")
f.close_write
f.wait_writable
IOError: not opened for writing
IO.select(nil,[f],nil,10)
# no error
The way I see it, the IO was closed by the server, so the client should fail. Then again, this is a unix pipe, right? Why ignore the error and retry the write? Is the socket going to reconnect under any circumstances? This might require further investigation; it doesn't seem trivial.
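If the exception is to be surfaced deliberately rather than crashing retry loops, one option is a thin wrapper that maps the closed-pipe IOError onto the same "not ready" signal IO.select would give. A hedged sketch; the helper name is made up and this is not what the patch does:

```ruby
require "io/wait"

# Hypothetical helper: behave like IO.select's "no writable fds" result
# instead of raising when the write side has already been closed.
def wait_writable_or_nil(io, timeout)
  io.wait_writable(timeout)
rescue IOError
  nil
end

f = IO.popen("/bin/sh", "r+")
f.close_write

# wait_writable on the closed write side raises IOError; the wrapper
# turns that into nil, matching the select-style contract.
raise "expected nil for a closed write side" unless wait_writable_or_nil(f, 1).nil?

f.close
```

Whether swallowing the error like this is correct depends on the caller: for net-ssh's send path it may be better to let the IOError propagate so the session fails fast.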
Another question: does this enable an infinite loop in case of constant EINTR?
It makes sense for wait_writable to raise an error if the other end closed. But our code is not written to handle those exceptions, so we'll need to handle those differences, which is not trivial.
I'm not sure I understand this question.
For what it's worth, I can't reproduce the issue with network sockets; it seems like something specific either to unix pipes or to the OS (EINTR is an interrupted syscall, and IO.select triggers a syscall too, so if it were all 1:1, IO.select shouldn't work either).
I've just (#334) updated the scripts so they should work with ansible2 too.
@mfazekas, how familiar are you with ansible? I'm behind a proxy, and ansible, for some reason, doesn't inherit the environment, so I have to set the proxy variables for it again. Currently:
is failing. Do you know how/where I can update the environment? I've seen in the documentation that I can set the "environment" in a role yml file, but I tried to place it in many places, and it's always wrong.
According to this post you'll need to ignore certs, but ansible 2.0 should consider proxy environments:
I'm using ansible 2.0.1.0. I tried the |
Maybe not the right assumption, but it does fail there. Here's the log:
For what it's worth, that URL redirects to another, but I assume that the underlying library takes care of that.
So, I finally managed to come back to ansible. I've reverted that particular usage of wait_writable for the popen call, at least until it gets fixed upstream.
Something's wrong with the travis builds.
@mfazekas can you have a look? I've also added an option for a socket class factory, which will hopefully solve another issue with net-ssh.
Sorry, I have limited time these days:
|
Not only did I open the PR mentioned above, I've also added an issue to extend the |
…ed since v3, according to documentation)
…able methods provided by the io/wait library; the biggest advantage of this strategy is fewer array allocations, as one was allocating arrays every time select was called, thereby increasing pressure on the GC; also, this way is much more readable, and has the nice advantage of playing nicely with libraries with their own event loops, like celluloid-io
… is, one iterates once over the listeners: wait on the write event and process, wait on the read event and process; if neither happened in the loop, allow the session to rekey; this approach reduces the number of loops, O(n) instead of O(2n), and reduces allocations (select creates an array, the select call creates more arrays, and the Array() conversion creates an array for when select returns nil)
…lect timed out, that is, if any of the waits was unsuccessful
…nt if select timed out, that is, if any of the waits was unsuccessful" This reverts commit 0a5faa4.
…so, plan is, one iterates once on the listeners; wait on write event and process, wait on read event and process; if none happened in the loop, allow the session to rekey; this approach reduces the number of loops, O(n) instead of O(2n), reduces allocations (select creates array, select call creates more arrays, Array() conversion takes care of creating an array for when select returns nil" This reverts commit 9883d44.
…e is a bug in ruby which prevents this from working
Using the IO#wait_readable and IO#wait_writable methods for non-blocking exception handling. Advantages:
All but one usages of IO.select have been replaced. Would very much like to discuss a workaround for the remaining one.
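For the single-socket case, the replacement pattern this PR applies boils down to the following; an illustrative sketch, not the actual diff:

```ruby
require "io/wait"

r, w = IO.pipe
w.write("data")

# Before: IO.select allocates argument and result arrays even for one IO.
readers, = IO.select([r], nil, nil, 5)
raise "expected readiness" unless readers == [r]

# After: ask the IO directly; a nil return means the timeout expired.
raise "expected readiness" unless r.wait_readable(5)

r.close
w.close
```

The remaining IO.select is the multi-listener loop in Session#process, where several descriptors must be watched in one call.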