@martindurant
Member

Could be used, for example, with python logging-to-socket

@martindurant
Member Author

I would also like to make an HTTP server source, i.e., one event for each incoming GET, say, containing the URL parameters and data. How, though, do you tell a Tornado Application or Server to start on a given event loop, or otherwise get it to run in the right place?

@CJ-Wright
Member

I'm not certain. I'd like this as well, since I've been looking at making a ZMQ source.

@martindurant
Member Author

Of course, this fails on kafka, as all the PRs seem to now. If #226 is to be merged, should we simply drop the unbatched kafka sources, because they cause us so much trouble? Or, @skmatti, might your reuse of confluent objects help those too?

@martindurant
Member Author

@CJ-Wright OK, solved the tornado server too.

@martindurant
Member Author

@mrocklin, can you review? We should also come to an agreement on how to move ahead regarding the stalling kafka tests.

@skmatti
Contributor

skmatti commented Mar 13, 2019

I think unbatched kafka source failures could be fixed by this commit from #216.

@martindurant mentioned this pull request Mar 13, 2019
Collaborator

@mrocklin left a comment

Some comments/questions

time.sleep(0.02)
assert l == []
sock.send(b'\n')
time.sleep(0.02)
Collaborator

I've found that fixed time sleeps like this can be brittle, especially on travis-ci, where a stray GC call may take more than 20ms (or a full second if you're unlucky).

Typically, rather than sleeps, I do something like:

start = time.time()
while not l:
    time.sleep(0.01)
    assert time.time() < start + 2

Or, even better, you could find some way to trigger when things show up.
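The polling idea above can be wrapped in a small helper. This is only a sketch: the name `wait_until` and its default timeout and period are assumptions made for illustration, not part of the PR or of streamz.

```python
import threading
import time


def wait_until(predicate, timeout=2.0, period=0.01):
    """Poll ``predicate`` until it is truthy, failing after ``timeout`` seconds."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise AssertionError("condition not met within %.2f s" % timeout)
        time.sleep(period)


# Usage: the list is filled from another thread, much as a source would do it.
received = []
threading.Timer(0.05, received.append, args=[b"data\n"]).start()
wait_until(lambda: received == [b"data\n"])
```

The test returns as soon as the data shows up, and the deadline only matters on a slow machine, so a generous timeout costs nothing in the common case.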

sock2.close()
sock.close()

assert l == [b'data\n', b'data\n', b'data2\n']
Collaborator

It might be nice also to have an async test, if you're comfortable with the async API

self.sock.bind(("", self.port))
self.sock.listen(128)
self.loop.add_handler(self.sock.fileno(), self.connection_ready,
                      self.loop.READ)
Collaborator

I'm somewhat surprised to see the use of raw sockets here. I'm curious, was there a reason to go this route rather than create a Tornado IOStream here directly? I see that you use them later on.

I don't know a ton about managing nonblocking sockets manually, so I'm probably just missing something.

Member Author

This code is adapted from the tornado IOLoop documentation: as far as I understand it, you do not have any IOStream until you accept an incoming connection on the open socket.
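To illustrate that point without Tornado, here is a rough sketch using the standard-library `selectors` module in place of `IOLoop.add_handler`. The callback name mirrors `connection_ready` from the diff; everything else (port 0, the `accepted` list) is an assumption made for the demo.

```python
import selectors
import socket

selector = selectors.DefaultSelector()
accepted = []


def connection_ready(listening_sock):
    # Only accept() yields a per-connection socket (the thing Tornado would
    # wrap in an IOStream); before this call there is nothing to read from.
    conn, _address = listening_sock.accept()
    conn.setblocking(False)
    accepted.append(conn)


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setblocking(False)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(128)
selector.register(listener, selectors.EVENT_READ, connection_ready)

client = socket.create_connection(listener.getsockname())

# One turn of a hand-rolled event loop: dispatch to the registered callback.
for key, _events in selector.select(timeout=2):
    key.data(key.fileobj)

selector.unregister(listener)
for sock in [client, listener] + accepted:
    sock.close()
selector.close()
```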

Collaborator

Ah, maybe the right abstraction here then is a Tornado TCPServer?
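For comparison, this is roughly what "let the framework own the accept loop" looks like. The sketch uses the standard-library `asyncio.start_server` as a stand-in, not Tornado's TCPServer, and the names `handle` and `received` are illustrative only.

```python
import asyncio


async def main():
    received = []

    async def handle(reader, writer):
        # One event per newline-terminated message.
        while True:
            line = await reader.readline()
            if not line:  # empty bytes means the peer closed the connection
                break
            received.append(line)
        writer.close()

    # Port 0 asks the OS for any free port (an assumption made for the demo).
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    _reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"data\n")
    writer.write(b"data2\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()

    # Poll with a deadline rather than sleeping for a fixed time.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + 2
    while len(received) < 2:
        assert loop.time() < deadline
        await asyncio.sleep(0.01)

    server.close()
    await server.wait_closed()
    return received


result = asyncio.run(main())
```

The handler never touches the listening socket; it only sees the per-connection reader and writer, which is the main practical difference from the raw-socket version.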

I didn't remove the socket server version yet, so that the two could be
compared
@martindurant
Member Author

@mrocklin, I added a tornado.TCPServer version of the same. I could just remove the direct socket version, but I am not convinced that either is better than the other.

@codecov-io

codecov-io commented Mar 20, 2019

Codecov Report

Merging #227 into master will increase coverage by 0.14%.
The diff coverage is 96.92%.

@@            Coverage Diff             @@
##           master     #227      +/-   ##
==========================================
+ Coverage   93.38%   93.53%   +0.14%     
==========================================
  Files          13       13              
  Lines        1481     1546      +65     
==========================================
+ Hits         1383     1446      +63     
- Misses         98      100       +2
Impacted Files        Coverage Δ
streamz/sources.py    95.31% <96.92%> (+0.54%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 090b9ba...f453a96.

@mrocklin
Collaborator

> I could just remove the direct socket version, but I am not convinced that either is better than the other

Yeah, I don't know either. I don't know non-blocking sockets that well. I tend to just let Tornado/asyncio handle it.

@martindurant
Member Author

> I tend to just let Tornado/asyncio handle it.

I strongly suspect that the two forms are actually identical, so it might just be a matter of choice - or leave both.

@mrocklin
Collaborator

> I strongly suspect that the two forms are actually identical, so it might just be a matter of choice - or leave both.

I recommend that we stick with TCPServer if that's ok.

Any comments on the async test and timing questions above?

@martindurant
Member Author

> Any comments on the async test and timing questions above?

Sorry, missed that. I added an async test, and changed sleeps to (a)wait_for.

Collaborator

@mrocklin left a comment

Thanks for working on this. A few minor comments/requests. Hopefully nothing major though.

"""Shutdown HTTP server"""
if not self.stopped:
    self.server.stop()
    self.server = None
Collaborator

You remove the server here, but not in the TCP solution. Is there a reason?

application = Application([
    (self.path, Handler),
])
self.server = HTTPServer(application)
Collaborator

I can imagine wanting to pass keyword arguments to the constructor and have them percolate to here (same with TCPServer). I'm not sure, though.
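A minimal sketch of that forwarding pattern follows. `ServerSource` and `FakeServer` are hypothetical stand-ins (not streamz or Tornado classes), and `max_buffer_size` is used purely as an example option name.

```python
class FakeServer:
    """Records the keyword arguments it was built with (illustration only)."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class ServerSource:
    """Hypothetical source that owns a server object."""

    def __init__(self, port, server_factory, server_kwargs=None):
        self.port = port
        self.server_factory = server_factory
        # Normalise None to a dict so start() can always unpack it.
        self.server_kwargs = server_kwargs or {}
        self.server = None

    def start(self):
        # Extra options travel untouched to the server constructor.
        self.server = self.server_factory(**self.server_kwargs)
        return self.server


source = ServerSource(8000, FakeServer, server_kwargs={"max_buffer_size": 1024})
server = source.start()
```

Using one parameter name for every server-backed source keeps the call sites uniform, whatever options the underlying server accepts.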

finally:
    s.stop()

wait_for(lambda: out == [b'data\n', b'data\n', b'data2\n'], 2, period=0.01)
Collaborator

Do we need to clean up the sockets here? Does this leak file descriptors?
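One way to make sure test sockets are always released is to use them as context managers, so they are closed even when an assertion fails partway through. The sketch below uses a plain listening socket as a stand-in for the source under test; the port and payload are assumptions.

```python
import socket

# A listening socket on an OS-chosen port stands in for the source under test.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

with listener:
    # Each "with" block closes its socket on exit, including on exceptions.
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.send(b"data\n")
        conn, _address = listener.accept()
        with conn:
            data = conn.recv(1024)
```

After the blocks exit, `fileno()` on each socket returns -1, which is a cheap way to check in a test that no descriptor was left open.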

s = Source.from_http_server(port)
out = s.sink_to_list()
s.start()
time.sleep(0.02) # allow loop to run
Collaborator

Is this necessary? If so, why this time?

In general, I'll probably challenge any sleep like this. If it's necessary to make things run, then it's likely too short to be safe in all cases (GC on Travis can take arbitrarily long), and so results in intermittent failures. I spend a non-trivial amount of time tracking these things down in the dask/distributed codebase, so I've grown fairly allergic to them.

- Removed sleeps
- Closed sockets at test end
- Allowed for kwargs to pass to tornado servers (not tested - don't know
  what parameters may be reasonable here)
- Removed TCP server instance on stop; not necessary, but reasonable

def __init__(self, port, path='/.*', start=False, server_kwargs=None):
    self.port = port
    self.path = path
    self.server_kwargs = server_kwargs or {}
Collaborator

In one case you use tcp_kwargs and in the other server_kwargs. Maybe use server_kwargs in both places for consistency of the API?

@mrocklin merged commit 0b93c94 into python-streamz:master Mar 21, 2019
@mrocklin
Collaborator

Thanks for this, @martindurant. Should be fun to play with!

@martindurant deleted the sockets branch March 21, 2019 23:20