Kafka fixes #230
    @@ -1231,6 +1231,88 @@ def cb(self):
            yield self._emit(x)


    @Stream.register_api()
    class to_kafka(Stream):
        """ Writes data in the stream to Kafka

        This stream accepts a string or bytes object. Call ``flush`` to ensure all
        messages are pushed. Responses from Kafka are pushed downstream.

        Parameters
        ----------
        topic : string
            The topic which to write
        producer_config : dict
            Settings to set up the stream, see
            https://docs.confluent.io/current/clients/confluent-kafka-python/#configuration
            https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
            Examples:
            bootstrap.servers: Connection string (host:port) to Kafka

        Examples
        --------
        >>> from streamz import Stream
        >>> ARGS = {'bootstrap.servers': 'localhost:9092'}
        >>> source = Stream()
        >>> kafka = source.map(lambda x: str(x)).to_kafka('test', ARGS)
        <to_kafka>
        >>> for i in range(10):
        ...     source.emit(i)
        >>> kafka.flush()
        """
        def __init__(self, upstream, topic, producer_config, **kwargs):
            import confluent_kafka as ck

            self.topic = topic
            self.producer = ck.Producer(producer_config)

            Stream.__init__(self, upstream, ensure_io_loop=True, **kwargs)
            self.stopped = False
            self.polltime = 0.2
            self.loop.add_callback(self.poll)
            self.futures = []

        @gen.coroutine
        def poll(self):
            while not self.stopped:
                # executes callbacks for any delivered data, in this thread
                # if no messages were sent, nothing happens
                self.producer.poll(0)
                yield gen.sleep(self.polltime)

        def update(self, x, who=None):
            future = gen.Future()
            self.futures.append(future)

            @gen.coroutine
            def _():
                while True:
                    try:
                        # this runs asynchronously, in C-K's thread
                        self.producer.produce(self.topic, x, callback=self.cb)

Collaborator
Can you verify that this doesn't block if the producer is busy?
Collaborator
Also, does the
Collaborator
If it gets run in C-K's thread then you may have to use
Member
Author
I should have been more verbose here. The doc says that the method call is always async (returns immediately), and I have found no case where it isn't. The actual marshalling of the message to kafka happens in C-K's own thread, a C-level place. But the python callbacks for delivery are triggered upon
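
To make the behaviour described in this reply concrete, here is a minimal sketch of the produce, retry-on-`BufferError`, and poll cycle. It does not use `confluent_kafka`: `StandInProducer` is a hypothetical in-memory stand-in written only for this illustration, assuming that `produce` merely enqueues locally and that delivery callbacks fire inside whichever thread calls `poll`. Unlike the PR, which sleeps and lets its `poll` coroutine drain the queue, this sketch calls `poll` directly when the queue is full.

```python
import threading
from collections import deque


class StandInProducer:
    """Hypothetical stand-in for illustration; NOT confluent_kafka.Producer."""

    def __init__(self, max_queued=2):
        self._queue = deque()
        self._max_queued = max_queued

    def produce(self, topic, value, callback=None):
        # Returns immediately; raises BufferError when the local queue is full.
        if len(self._queue) >= self._max_queued:
            raise BufferError("local queue full")
        self._queue.append((topic, value, callback))

    def poll(self, timeout=0):
        # Delivery callbacks run here, in the caller's thread.
        served = 0
        while self._queue:
            _topic, value, callback = self._queue.popleft()
            if callback is not None:
                callback(None, value)
            served += 1
        return served


def produce_with_retry(producer, topic, value, callback, max_attempts=5):
    """Try to enqueue a message, polling to free space after a BufferError."""
    for _ in range(max_attempts):
        try:
            producer.produce(topic, value, callback=callback)
            return True
        except BufferError:
            producer.poll(0)
    return False


delivered = []
callback_threads = set()


def on_delivery(err, msg):
    # Record which thread ran the callback and what was delivered.
    callback_threads.add(threading.get_ident())
    delivered.append(msg)


producer = StandInProducer(max_queued=2)
results = [
    produce_with_retry(producer, "test", str(i), on_delivery) for i in range(5)
]
producer.poll(0)  # drain whatever is still queued
```

In this model every callback runs in the thread that called `poll`, which is the property the reply relies on when it argues that no cross-thread hand-off is needed.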

                        return
                    except BufferError:
                        yield gen.sleep(self.polltime)
                    except Exception as e:
                        future.set_exception(e)
                        return

            self.loop.add_callback(_)
            return future

        @gen.coroutine
        def cb(self, err, msg):

Collaborator
Does confluent_kafka run the callback in a separate thread? If so then we can't make this a coroutine. That might be OK. to_kafka is likely to be used as a sink, and so we don't expect to get a future from downstream that we're supposed to wait on. It might be enough here to just set the result and be done.

If cb is called in the thread where

    def cb(self, err, msg):
        if good:
            future.set_result(...)
        else:
            future.set_exception(...)

However, if it's going to be called within a separate thread, then we'll need to use

    def cb(...):
        def _():
            future.set_result(...)
            ...
        self.loop.add_callback(_)
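
The second pattern in this comment, where a callback running in a foreign thread hands the future resolution back to the event loop, can be sketched as follows. This is an illustration only and swaps in the standard library: it uses `asyncio` and `loop.call_soon_threadsafe` as an analogue of Tornado's `IOLoop.add_callback`, and `delivery_callback` and the payload are hypothetical names, not part of the PR.

```python
import asyncio
import threading


async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    def delivery_callback(err, value):
        # Assumed to run in a thread other than the event loop's thread,
        # so it must not touch the future directly.
        def _resolve():
            if err is None:
                future.set_result(value)
            else:
                future.set_exception(err)

        # Schedule the resolution on the loop's own thread.
        loop.call_soon_threadsafe(_resolve)

    # Simulate a library invoking the callback from its own thread.
    worker = threading.Thread(target=delivery_callback, args=(None, b"payload"))
    worker.start()

    result = await future
    worker.join()
    return result


result = asyncio.run(main())
```

If the callback is instead guaranteed to run in the loop's thread, the inner `_resolve` wrapper is unnecessary and the future can be resolved directly, as in the first pattern above.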
Member
Author
No, it runs in the thread which calls
There is no

            future = self.futures.pop(0)
            if msg is not None and msg.value() is not None:
                future.set_result(None)
                yield self._emit(msg.value())
            else:
                future.set_exception(err or msg.error())

        def flush(self, timeout=-1):
            self.producer.flush(timeout)


    def sync(loop, func, *args, **kwargs):
        """
        Run coroutine in loop running in separate thread.
