Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3.7 Support & Task Queue Refactoring (again) #86

Merged
merged 47 commits into from
Dec 18, 2018
Merged

Conversation

lgrahl
Copy link
Member

@lgrahl lgrahl commented Aug 29, 2018

Changes:

  • Adds support for Python 3.7
  • Dependencies have been upgraded (most importantly websockets to version 6.0).
  • The task queue has been refactored (once again) because it was still a pain. It should actually work as intended now. And as a nice side effect, it's also less awkward to work with.
  • Adds support for the timeout close code (3008), see Add timeout close code saltyrtc-meta#150

To do:

  • Evaluate whether to use asyncio.gather instead of asyncio.wait in ServerProtocol.handle_client. Decision: No since asyncio.wait allows to provide more details in the log.
  • Sanity-check all occurrences of asyncio.gather, asyncio.wait and asyncio.wait_for (regarding cancellation behaviour). Update: Moved into Revise cancellation of futures #87.
  • Run client tests
    • JS client (including slow tests)
      • Two errors, two warnings, one exception (when closing the server) left
      • Add tests for newly found errors
    • Java client
    • Rust client Update: Omitted for now since it doesn't build on ArchLinux with OpenSSL 1.1.1a
  • Improve code coverage of this PR
    • Add a test for a misbehaving task (blocking the join_task_queue)
  • Evaluate whether this should be a major release or a minor one (probably major due to the change to the close code in case of a ping timeout). Decision: Major.
  • Wait for Add timeout close code saltyrtc-meta#150 to be reviewed
  • Wait for release of websockets 7.0

After merge, prepare a release.

Resolves #70. Resolves #79. Resolves #80. Resolves #81. Resolves #82. Resolves #83. Resolves #91.

Fix indentation
Fix typo
Fix relay receiver connection lost test (used a future in two places which had side effects when cancelled)
Fix sleep calls in tests (the event loop was missing)
…ain)

Expose keyword arguments of `websockets.serve`, resolves #83
Remove subprotocol selection function workaround
Use InternalError instead of SignalingError where appropriate

Add internal _TaskQueueState enum
Mark a task as 'done' when it has been handled
A task queue cannot be closed any longer, only cancelled
When a task queue is cancelled, all enqueued tasks are also cancelled
Add task queue join method which blocks until all enqueued tasks (including the currently active task) have been handled
Try to join the task queue when closing a client's connection
Fix close the connection as early as possible after the client's handler returned
@codecov-io
Copy link

codecov-io commented Aug 30, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@43c1215). Click here to learn what that means.
The diff coverage is 81.55%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master      #86   +/-   ##
=========================================
  Coverage          ?   77.48%           
=========================================
  Files             ?        9           
  Lines             ?     1528           
  Branches          ?      190           
=========================================
  Hits              ?     1184           
  Misses            ?      281           
  Partials          ?       63
Impacted Files Coverage Δ
saltyrtc/server/bin.py 33.33% <ø> (ø)
saltyrtc/server/common.py 72.46% <100%> (ø)
saltyrtc/server/message.py 75.15% <33.33%> (ø)
saltyrtc/server/server.py 84.93% <81.32%> (ø)
saltyrtc/server/protocol.py 87.73% <81.81%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 43c1215...2225967. Read the comment docs.

@lgrahl lgrahl force-pushed the py37-support branch 2 times, most recently from fc06571 to c8ea6a8 Compare August 30, 2018 12:49
The CLI tests also fail when running Python 3.4 locally. But the error
that's being spit out makes absolutely zero sense. Since the CLI works
perfectly fine in 3.4 when running it by hand and it only happens
in 3.4, let's ignore this for now.
Copy link
Member

@dbrgn dbrgn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't really review the refactoring since I'm not well-versed with asyncio, but I looked through the code and it looks clean!

@lgrahl
Copy link
Member Author

lgrahl commented Sep 3, 2018

Sadly, there are still a bunch of errors when running the client tests. I want to finally track those down or we'll end up with yet another refactoring.

Dead clients always have an associated protocol instance. That instance
should eventually take care of cleaning up the connection. However, with
the approach prior to this change, a connection may become closed before
the protocol instance realises it. Thus, the protocol instance would
never have the chance to clean up after itself. This might have been the
cause of memory leaks (#52).
Move event tests into the TestServer class
Add missing docstring to tests
Fix only cancel if the coroutine turns out to be a future
Close (actual) coroutines in the task queue
Multiple tasks can be resolved with an exception at once. The previous
code did handle only one exception, resulting in errors bubbling up
Add possibility to access log records that were recorded during a test
Raise an exception in case an error has been logged in any test
Add a filter for specific log messages that will not raise an exception
Add a test that checks that the server handles a disconnect correctly
when sending a ping.
Add another test that does the same when waiting for a pong.
Add a test that ensures a disconnect during receive is handled correctly
Add a test that ensures a disconnect during send is handled correctly
Add a test that ensures a disconnect while handling a task is handled correctly
@lgrahl
Copy link
Member Author

lgrahl commented Sep 26, 2018

New error to investigate when running the saltyrtc-client-js tests:

[2018-09-26 09:43:57.194947] ERROR: websockets.server: Error in connection handler
Traceback (most recent call last):
  File "/saltyrtc-server-python37/lib/python3.7/site-packages/websockets/server.py", line 164, in handler
    yield from self.ws_handler(self, path)
  File "/saltyrtc/saltyrtc-server-python/saltyrtc/server/server.py", line 832, in handler
    yield from protocol.handler_task
  File "/saltyrtc/saltyrtc-server-python/saltyrtc/server/server.py", line 285, in handler
    yield from asyncio.gather(*coroutines, loop=self._loop)
  File "/saltyrtc/saltyrtc-server-python/saltyrtc/server/protocol.py", line 513, in enqueue_task
    self._task_queue_state.name))
saltyrtc.server.exception.InternalError: Task queue is already cancelled

And when stopping the server:

Traceback (most recent call last):
  File "/saltyrtc-server-python37/bin/saltyrtc-server", line 11, in <module>
    load_entry_point('saltyrtc.server', 'console_scripts', 'saltyrtc-server')()
  File "/saltyrtc/saltyrtc-server-python/saltyrtc/server/bin.py", line 267, in main
    cli(obj=obj)
  File "/saltyrtc-server-python37/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/saltyrtc-server-python37/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/saltyrtc-server-python37/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/saltyrtc-server-python37/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/saltyrtc-server-python37/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/saltyrtc-server-python37/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/saltyrtc/saltyrtc-server-python/saltyrtc/server/bin.py", line 252, in serve
    loop.run_until_complete(server_.wait_closed())
  File "/usr/lib64/python3.7/asyncio/base_events.py", line 568, in run_until_complete
    return future.result()
  File "/saltyrtc/saltyrtc-server-python/saltyrtc/server/server.py", line 866, in wait_closed
    yield from self._wait_connections_closed()
  File "/saltyrtc/saltyrtc-server-python/saltyrtc/server/server.py", line 876, in _wait_connections_closed
    yield from asyncio.gather(*tasks, loop=self._loop)
  File "/saltyrtc-server-python37/lib/python3.7/site-packages/websockets/server.py", line 164, in handler
    yield from self.ws_handler(self, path)
  File "/saltyrtc/saltyrtc-server-python/saltyrtc/server/server.py", line 832, in handler
    yield from protocol.handler_task
  File "/saltyrtc/saltyrtc-server-python/saltyrtc/server/server.py", line 285, in handler
    yield from asyncio.gather(*coroutines, loop=self._loop)
  File "/saltyrtc/saltyrtc-server-python/saltyrtc/server/protocol.py", line 513, in enqueue_task
    self._task_queue_state.name))
saltyrtc.server.exception.InternalError: Task queue is already cancelled

Fixed by ededc4f and ce17f2b

Add error when enqueuing a task in an already cancelled task queue
Few logging improvements
This prevents a race condition that happens when client A has started
to disconnect but client B was still able to start the enqueue
procedure. This can happen since the enqueue procedure itself is async
and not sync.
This prevents yet another race condition that can happen when the
exception from ServerProtocol.handle_client is being moved back into
the event loop to be caught by ServerProtocol.handler. Another client
would be able to pick up the PathClient instance and enqueue
further tasks.
@lgrahl lgrahl mentioned this pull request Oct 25, 2018
This ensures that no further task enqueues can be made when the
connection is about to be closed (triggered by another client or by the
server itself)
We do our own keep-alive pinging already and the currently exposed API
of websockets is too inflexible for this purpose.
Formerly, all uncancelled tasks have been treated as if they are done.
This resulted in exceptions when trying to fetch the result (or
exception) from the task for logging purposes.

Now, all tasks are cancelled if they are not already cancelled or not
done.

Tests have been added for cancelling active and queued tasks as well as
coroutines since this wasn't adequately tested previously.
Add a test for relayed messages enqueued towards the responder before
the initiator dropped the responder.

Add a test for relayed messages enqueued towards the first initiator
before the second initiator dropped the first initiator.
Formerly, it was possible that a client enqueues a message towards
another client that had already been queued to be dropped.

Add a test for a relayed message enqueued towards the responder after
the initiator dropped the responder.
Formerly, it was possible that a client, who is currently in the
process of being dropped, could still forward messages to other clients.

In order to fix this, we have to store the running task loops on the
client, so the receive (and forward) loop is being cancelled once the
client is being dropped.
Add a test for dropping a client which should not trigger a 'disconnected' message
Replace `authenticated` flag with `ClientState` on `PathClient`
Slightly more robust interaction between `Path` and `PathClient`
More robust interaction between `Server` and `ServerProtocol`
Update tests
@lgrahl lgrahl merged commit f9ed942 into master Dec 18, 2018
@lgrahl lgrahl deleted the py37-support branch December 18, 2018 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants