Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

found a hang bug in poor network condition #793

Closed
boywhp opened this issue Aug 2, 2016 · 2 comments
Closed

found a hang bug in poor network condition #793

boywhp opened this issue Aug 2, 2016 · 2 comments

Comments

@boywhp
Copy link

boywhp commented Aug 2, 2016

I found a hang bug when i used paramiko lib (release paramiko-2.0.2) in some poor network . here is the hang stack in eclipse:

    thread0 - pid2994_seq5  
        wait_for_response [auth_handler.py:193] 
        auth_password [transport.py:1269]   
        _auth [client.py:590]   
        connect [client.py:380] 
        handle_task [ssh.py:84] 
        __worker_thread [engine.py:507] 
        run [threading.py:763]  
        __bootstrap_inner [threading.py:810]    
        __bootstrap [threading.py:783]  
    Thread-218 - pid2994_seq155 
        read_all [packet.py:254]    
        read_message [packet.py:391]    
        run [transport.py:1760] 
        __bootstrap_inner [threading.py:810]    
        __bootstrap [threading.py:783]  

I check paramiko codes and found paramiko used socket.settimeout to prevent read hang,
But actually socket.settimeout will only do effect in send, in some poor network sock.recv will hang for ever!!!
So I did a temp patch on paramiko-2.0.2:

diff -uN paramiko-2.0.2/client.py paramiko/client.py
--- paramiko-2.0.2/client.py    2016-07-26 12:11:12.000000000 +0800
+++ paramiko/client.py  2016-08-02 11:26:04.082162387 +0800
@@ -335,7 +335,7 @@
             t.set_log_channel(self._log_channel)
         if banner_timeout is not None:
             t.banner_timeout = banner_timeout
-        t.start_client()
+        t.start_client(timeout=timeout)
         ResourceManager.register(self, t)

         server_key = t.get_remote_server_key()
diff -uN paramiko-2.0.2/packet.py paramiko/packet.py
--- paramiko-2.0.2/packet.py    2016-07-26 12:11:12.000000000 +0800
+++ paramiko/packet.py  2016-08-02 16:55:07.020972741 +0800
@@ -22,7 +22,7 @@

 import errno
 import os
-import socket
+import socket,select
 import struct
 import threading
 import time
@@ -102,7 +102,7 @@
         self.__timer = None
         self.__handshake_complete = False
         self.__timer_expired = False
-
+        self.recv_timeout = 5
     @property
     def closed(self):
         return self.__closed
@@ -230,6 +230,16 @@
             self.__timer_expired = False
             self.__handshake_complete = True

+    def recv_without_hang(self, n):
+        '''
+        prevent hang on socket recv! patched by boywhp@gmail.com
+        '''
+        ready = select.select([self.__socket], [], [], self.recv_timeout)
+        if ready[0]:
+            return self.__socket.recv(n)
+        
+        return ''
+    
     def read_all(self, n, check_rekey=False):
         """
         Read as close to N bytes as possible, blocking as long as necessary.
@@ -251,7 +261,7 @@
             if self.handshake_timed_out():
                 raise EOFError()
             try:
-                x = self.__socket.recv(n)
+                x = self.recv_without_hang(n) #self.__socket.recv(n)
                 if len(x) == 0:
                     raise EOFError()
                 out += x
@@ -482,7 +492,7 @@
         start = time.time()
         while True:
             try:
-                x = self.__socket.recv(128)
+                x = self.recv_without_hang(128) #self.__socket.recv(128)
                 if len(x) == 0:
                     raise EOFError()
                 break
diff -uN paramiko-2.0.2/transport.py paramiko/transport.py
--- paramiko-2.0.2/transport.py 2016-07-26 12:11:12.00000   Thread-218 - pid2994_seq155 
        read_all [packet.py:254]    
        read_message [packet.py:391]    
        run [transport.py:1760] 
        __bootstrap_inner [threading.py:810]    
        __bootstrap [threading.py:783]  ```
0000 +0800
+++ paramiko/transport.py   2016-08-02 16:48:28.160962185 +0800
@@ -444,7 +444,7 @@
         # We need the FQDN to get this working with SSPI
         self.gss_host = socket.getfqdn(gss_host)

-    def start_client(self, event=None):
+    def start_client(self, event=None, timeout=None):
         """
         Negotiate a new SSH2 session as a client.  This is the first step after
         creating a new `.Transport`.  A separate thread is created for protocol
@@ -472,6 +472,9 @@
         :param .threading.Event event:
             an event to trigger when negotiation is complete (optional)

+        :param float timeout:
+            a timeout, in seconds, for SSH2 session negotiation (optional)
+
         :raises SSHException: if negotiation fails (and no ``event`` was passed
             in)
         """
@@ -481,10 +484,12 @@
             self.completion_event = event
             self.start()
             return
-
+        
+        if timeout is not None: self.packetizer.recv_timeout = timeout
         # synchronous, wait for a result
         self.completion_event = event = threading.Event()
         self.start()
+        max_time = time.time() + timeout if timeout is not None else None
         while True:
             event.wait(0.1)
             if not self.active:
@@ -492,7 +497,7 @@
                 if e is not None:
                     raise e
                 raise SSHException('Negotiation failed.')
-            if event.is_set():
+            if event.is_set() or (timeout is not None and time.time() >= max_time):
                 break

     def start_server(self, event=None, server=None):
@cool-RR
Copy link

cool-RR commented Dec 7, 2016

I'm also experiencing a hang with version 2.0.2 and I suspect it's the same issue. Using the debugger, it seems like my code is in an infinite loop in Packetizer.read_all, it gets a timeout from the socket, sets got_timeout = True but then the while loop keeps repeating forever.

@bitprophet
Copy link
Member

This sounds like it's #520 going by symptom & location of apparent hang. There's a partial fix or two coming out in 2.0.3 which should be released soon. Please follow #520 for more, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants