Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HAPServer doesn't close sockets and eventually hits OS limit #145

Closed
undersun81 opened this issue Aug 20, 2018 · 11 comments
Closed

HAPServer doesn't close sockets and eventually hits OS limit #145

undersun81 opened this issue Aug 20, 2018 · 11 comments

Comments

@undersun81
Copy link

undersun81 commented Aug 20, 2018

After running hap for several days my system hit the limit of listen sockets (128 default on raspbian). Most of them in CLOSE_WAIT, so hap-client sent FIN but hap-server code didn't handle it correctly (it suppose to close socket).

>netstat -an | egrep 51826.+CLOSE_WAIT | wc -l
122

Briefly looked into hap_server.py, can't see how is this situation handled

@ikalchev
Copy link
Owner

Hi and thanks for the report. I will take a look a this in the following days

@ikalchev
Copy link
Owner

ikalchev commented Nov 6, 2018

I confirm that I see the issue as well. However, I don't know how this happens yet - I suspect the handler doesn't finish its requests properly, but we will see.

@ikalchev
Copy link
Owner

ikalchev commented Nov 6, 2018

More on that: I have 36 threads running. There is one for the server, one for the event thread, 3 for accessories and whatever asyncio has (say 10-15 that actually do something). Each handler spawns another thread and stays alive until the socket is closed.

So probably the handlers are stuck waiting on a closed socket.

@ikalchev
Copy link
Owner

ikalchev commented Nov 7, 2018

Narrowing down: It appears that a HAP socket thinks its connected and is blocked on recv (see Frame 12 at the bottom of the snippet)

(gdb) info threads
  Id   Target Id         Frame 
  36   Thread 0x75b4d460 (LWP 934) "/usr/local/bin/" 0x76de0964 in select () at ../sysdeps/unix/syscall-template.S:81
  35   Thread 0x751ff460 (LWP 935) "/usr/local/bin/" 0x76f05c50 in do_futex_timed_wait (isem=isem@entry=0x748006b0, rt=rt@entry=0x751fe7a0)
    at ../nptl/sysdeps/unix/sysv/linux/sem_timedwait.c:41
  34   Thread 0x747a9460 (LWP 938) "/usr/local/bin/" 0x76f070c0 in recv () at ../sysdeps/unix/syscall-template.S:81
  33   Thread 0x73dff460 (LWP 939) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x73401110) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  32   Thread 0x733ff460 (LWP 940) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x735006b0) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  31   Thread 0x72bff460 (LWP 941) "/usr/local/bin/" 0x76dddb80 in poll () at ../sysdeps/unix/syscall-template.S:81
  30   Thread 0x721ff460 (LWP 949) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x723006b0) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  29   Thread 0x711ff460 (LWP 980) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x72300b70) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  28   Thread 0x709ff460 (LWP 981) "/usr/local/bin/" 0x76f070c0 in recv () at ../sysdeps/unix/syscall-template.S:81
  27   Thread 0x701ff460 (LWP 993) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x73500910) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  26   Thread 0x6f9ff460 (LWP 1019) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x73401370) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  25   Thread 0x6f1ff460 (LWP 1025) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x73e00b48) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  24   Thread 0x6e9ff460 (LWP 1031) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x74800910) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  23   Thread 0x6e1ff460 (LWP 1039) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x752006c8) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  22   Thread 0x6d9ff460 (LWP 1045) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x1a39720) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  21   Thread 0x6d1ff460 (LWP 1052) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x72300b58) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  20   Thread 0x6c9ff460 (LWP 1058) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x722055f8) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  19   Thread 0x6c1ff460 (LWP 1064) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x73500b70) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  18   Thread 0x6b9ff460 (LWP 1071) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x734015d0) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  17   Thread 0x6b1ff460 (LWP 1079) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x73e00ff8) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  16   Thread 0x6a9ff460 (LWP 1105) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x74800b70) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  15   Thread 0x6a1ff460 (LWP 1113) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x75200db0) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  14   Thread 0x699ff460 (LWP 1120) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x1a3be20) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  13   Thread 0x691ff460 (LWP 1130) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x72300dd0) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  12   Thread 0x689ff460 (LWP 1136) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x72205858) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  11   Thread 0x681ff460 (LWP 1142) "/usr/local/bin/" 0x76f05a40 in do_futex_wait (isem=isem@entry=0x73500dd0) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:48
  10   Thread 0x719ff460 (LWP 5720) "/usr/local/bin/" 0x76f070c0 in recv () at ../sysdeps/unix/syscall-template.S:81
  9    Thread 0x679ff460 (LWP 7515) "/usr/local/bin/" 0x76f070c0 in recv () at ../sysdeps/unix/syscall-template.S:81
  8    Thread 0x671ff460 (LWP 8288) "/usr/local/bin/" 0x76f070c0 in recv () at ../sysdeps/unix/syscall-template.S:81
  7    Thread 0x661ff460 (LWP 11440) "/usr/local/bin/" 0x76f070c0 in recv () at ../sysdeps/unix/syscall-template.S:81
  6    Thread 0x659ff460 (LWP 17585) "/usr/local/bin/" 0x76f070c0 in recv () at ../sysdeps/unix/syscall-template.S:81
  5    Thread 0x669ff460 (LWP 23268) "/usr/local/bin/" 0x76f070c0 in recv () at ../sysdeps/unix/syscall-template.S:81
  4    Thread 0x649ff460 (LWP 30521) "/usr/local/bin/" 0x76f070c0 in recv () at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x651ff460 (LWP 2295) "/usr/local/bin/" 0x76f070c0 in recv () at ../sysdeps/unix/syscall-template.S:81
  2    Thread 0x641ff460 (LWP 10204) "/usr/local/bin/" 0x76f070c0 in recv () at ../sysdeps/unix/syscall-template.S:81
* 1    Thread 0x76f73000 (LWP 739) "/usr/local/bin/" 0x76de7c14 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
(gdb) thread 2
[Switching to thread 2 (Thread 0x641ff460 (LWP 10204))]
#0  0x76f070c0 in recv () at ../sysdeps/unix/syscall-template.S:81
81	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) py-bt
Traceback (most recent call first):
  <built-in method recv of HAPSocket object at remote 0x73f89e88>
  File "/usr/local/lib/python3.6/site-packages/pyhap/hap_server.py", line 658, in recv
    block_length_bytes = socket.socket.recv(self, self.LENGTH_LENGTH)
  File "/usr/local/lib/python3.6/site-packages/pyhap/hap_server.py", line 635, in recv_into
    data = self.recv(nbytes, flags)
  File "/usr/local/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  <built-in method readline of _io.BufferedReader object at remote 0x73fa6938>
  File "/usr/local/lib/python3.6/http/server.py", line 386, in handle_one_request
    self.raw_requestline = self.rfile.readline(65537)
  File "/usr/local/lib/python3.6/http/server.py", line 420, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.6/socketserver.py", line 696, in __init__
    self.handle()
  File "/usr/local/lib/python3.6/site-packages/pyhap/hap_server.py", line 141, in __init__
    super(HAPServerHandler, self).__init__(sock, client_addr, server)
  File "/usr/local/lib/python3.6/site-packages/pyhap/hap_server.py", line 800, in finish_request
    self.RequestHandlerClass(sock, client_addr, self, self.accessory_handler)
  File "/usr/local/lib/python3.6/socketserver.py", line 639, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
(gdb) py-up
#12 Frame 0x73fa7558, for file /usr/local/lib/python3.6/site-packages/pyhap/hap_server.py, line 658, in recv (self=<HAPSocket(_connected=True, shared_key=b'^\xe2\x84`n\\5V\x8fZ\x86\xe4\xf2E\xa2\n~\xd8>\x8b\xa4\x0b\xb7\xc6\xb4\xf5)\xd6\x8f\xa4\x12\x16', out_count=7, in_count=1, out_cipher=<CHACHA20_POLY1305(isBlockCipher=False, isAEAD=True, nonceLength=12, tagLength=16, implementation='python', name='chacha20-poly1305', key=b'\xf2FZ\xbfy\x7fT"\xe0\xedf\x10\xe3[&&dc\x1f~\xb0rq\xd6<\xc6\xfb\xc1h\xb2\x0bU') at remote 0x73f2c5f0>, in_cipher=<CHACHA20_POLY1305(isBlockCipher=False, isAEAD=True, nonceLength=12, tagLength=16, implementation='python', name='chacha20-poly1305', key=b"\x9b\xeb\\G#F|\xe9\xa5'\xa5F/\xb1\xc5\x8f\x0f\xc9\x0ew\x06\xe93V\x10\x88Q\x1e\x07\xaa\xd8\t") at remote 0x73f2cef0>, out_lock=<_thread.RLock at remote 0x73f95590>, curr_in_total=74, num_in_recv=74, curr_in_block=None) at remote 0x73f89bb8>, buflen=1042, flags=0, result=b'')
    block_length_bytes = socket.socket.recv(self, self.LENGTH_LENGTH)

@ikalchev
Copy link
Owner

ikalchev commented Nov 9, 2018

I think I got it - The HAPSocket inherits and is initialised from a socket to implement the custom HAP crypto. During initialisation something is not copied properly as the socket is both in CLOSE_WAIT and blocked in recv. I don't know the exact issue though.

So instead of subclassing I went another way (a Proxy pattern) and the issue is gone now. I will submit the fix tomorrow.

@ikalchev
Copy link
Owner

1549717

@ikalchev
Copy link
Owner

Not fully fixed - makefile method is not patched correctly and messes up the states again (the issue is different now, but related).

I hope to fix it soon.

@ikalchev ikalchev reopened this Nov 11, 2018
@ikalchev
Copy link
Owner

Apparently the object created by makefile (SocketIO) uses a member of the socket - _io_refs, directly. Since it assigns to it, the call does not go through __getattr__ and ends up in HAPSocket instead of in the socket.

I currently fixed this by defining the same property in HAPSocket and proxying all calls to it, but I will ry to unit test this so we really know we are done here.

ikalchev added a commit that referenced this issue Nov 11, 2018
Apparently the object created by makefile (SocketIO) uses a member of the
socket - _io_refs, directly. Since it assigns to it, the call does not go
through __getattr__ and ends up in HAPSocket instead.

This change also adds tests which cover the above case.
@ikalchev
Copy link
Owner

Added a simple test to check for the issue. Also had the fix running for 1 hour with 2 iOS clients.

@alessandrocolos
Copy link

still having the issue of Homekit intermittent problem with 'No response' on IOS devices :-(
Running one raspberry pi 3, hassio 0.88.1 (latest)

@undersun81
Copy link
Author

undersun81 commented Feb 23, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants