Stream #193
Hi, I switched one of my projects to that branch, and the new […]
@ook thanks for exercising the code; this behavior is documented here: https://github.com/kafkaex/kafka_ex/blob/stream/lib/kafka_ex.ex#L260. We have a choice of allowing the stream to be finite or infinite, and I chose the former. Maybe the stream function should take a flag to determine whether it's a finite stream or an infinite one and act accordingly. Then you could either have a stream that never returns, like you want, or one that returns once it's done consuming the available offsets, as was mentioned in another issue.
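A minimal sketch of what such a flag could look like, built on Stream.resource/3. Everything here is hypothetical (the option values, the module name, and fetch_messages are made up for illustration), and a real implementation would block inside the fetch call rather than spin when the partition is empty:

```elixir
defmodule StreamSketch do
  # fetch_messages is assumed to be a function that takes an offset and
  # returns {messages, next_offset}.
  def build(fetch_messages, start_offset, stream_mode) do
    Stream.resource(
      fn -> start_offset end,
      fn offset ->
        case fetch_messages.(offset) do
          # Partition drained: a :finite stream halts, an :infinite one keeps polling.
          {[], _next} when stream_mode == :finite -> {:halt, offset}
          {[], _next} -> {[], offset}
          {messages, next_offset} -> {messages, next_offset}
        end
      end,
      fn _offset -> :ok end
    )
  end
end
```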
Both aspects are interesting. May I suggest a big warning in the […]
dantswain
left a comment
This is definitely a step in the right direction. I like this about 1000% better than the previous implementation.
I think there should definitely be an option for the stream to be indefinite, and quite possibly that it should be the default. The kafka partition is, in concept, an indefinite thing.
Unfortunately there's a lot of room for this to be misunderstood, and I think it's our responsibility to ensure that the documentation clarifies it well. That doesn't necessarily have to be part of this PR; I can take a stab at it, I'm usually decent at writing documentation.
    defimpl Enumerable do
      def reduce(data, acc, fun) do
        next_fun = fn offset ->
Can we make this an actual function and/or refactor it into a composition of smaller functions?
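One way to read this suggestion, sketched as a fragment with hypothetical helper names (fetch_batch/2 and handle_batch/3 do not exist in the PR):

```elixir
# Instead of one large anonymous next_fun, compose small named functions.
defp next_fun(data) do
  fn offset ->
    data
    |> fetch_batch(offset)
    |> handle_batch(data, offset)
  end
end
```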
lib/kafka_ex/stream.ex
Outdated
    end
    response = data.worker_name
    |> GenServer.call({:fetch, %{data.fetch_request | offset: offset}})
    |> hd |> Map.get(:partitions) |> hd
I'd prefer not to use hd so much - it's a bit opaque. We can either pattern match [element] = thing or add helper functions to the appropriate modules, like FetchResponse.first_response. Also I think we should try to stick to one pipe per line (I think credo would tell us this as well).
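Both alternatives sketched below; the match pattern is written against a plain map shape, and first_response/1 is the helper name proposed in the comment, not an existing function:

```elixir
# Pattern matching instead of hd/1 makes the expected shape explicit and
# fails with a descriptive MatchError if the response has a different shape:
[%{partitions: [partition_response]}] =
  GenServer.call(data.worker_name, {:fetch, fetch_request})

# Or a named helper on the response module (proposed name):
def first_response([response | _rest]), do: response
```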
(FYI, I changed it into a recursive call to stream: works like a charm.)
@bjhaid Any plans to revisit this?

Will do sometime this week. I have not had enough time to dedicate to finishing this, but I will try to find time this week.

OK, no worries, I just thought I'd ping you. Let me know if there's anything I can do to help.

@bjhaid Monthly ping on this ;)

Hello, @bjhaid! This is your first Pull Request that will be reviewed by Ebert, an automatic Code Review service. It will leave comments on this diff with potential issues and style violations found in the code as you push new commits. You can also see all the issues found on this Pull Request on its review page. Please check our documentation for more information.

@dantswain I will try to finish this today, and get it behind me.
    stream = KafkaEx.stream(random_string, 0, worker_name: :stream, offset: 0, auto_commit: false)
    log = TestHelper.wait_for_accum(
      fn() -> GenEvent.call(stream.manager, KafkaEx.Handler, :messages) end,
      fn() -> stream |> Enum.take(2) end,
Use a function call when a pipeline is only one function long
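The credo rule being cited, illustrated in miniature (a sketch, not code from this PR):

```elixir
# Flagged: a pipeline that is only one function long
fn -> stream |> Enum.take(2) end

# Preferred: a plain function call
fn -> Enum.take(stream, 2) end
```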
    # make sure we consume at least one message before we assert that there is no offset committed
    _log = TestHelper.wait_for_any(
      fn() -> GenEvent.call(stream.manager, KafkaEx.Handler, :messages) |> Enum.take(2) end
      fn() -> stream |> Enum.take(2) end
Use a function call when a pipeline is only one function long
    stream = KafkaEx.stream(random_string, 0, worker_name: worker_name)
    log = TestHelper.wait_for_any(
      fn() -> GenEvent.call(stream.manager, KafkaEx.Handler, :messages) |> Enum.take(2) end
      fn() -> stream |> Enum.take(2) end
Use a function call when a pipeline is only one function long
    stream = KafkaEx.stream(random_string, 0, worker_name: :stream_auto_commit, offset: 0)
    log = TestHelper.wait_for_any(
      fn() -> GenEvent.call(stream.manager, KafkaEx.Handler, :messages) |> Enum.take(2) end
      fn() -> stream |> Enum.take(2) end
Use a function call when a pipeline is only one function long
… lazy stream implementation, this closes #145.
… behavior and also update the implementation to halt if there are no more messages to stream.
… Fetch.Response.partition_messages function.
… stream to the user.
lib/kafka_ex.ex
Outdated
    Returns a stream that consumes fetched messages, the stream will halt once the max_bytes number of messages is reached, if you want to halt the stream early supply a small max_bytes, for the inverse supply a large max_bytes.

    Optional arguments(KeywordList)

    - stream_mode: the mode the stream will be in, `:infinite` for an infinite stream and `:finite` for a finite stream, a finite stream will return once there's no data left to consume from the partition while an infinite stream will block till there's data to consume from the stream.
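A usage sketch of the two modes as documented at this point in the review (the topic name is a placeholder, and the option name changed later in the PR):

```elixir
# :finite returns once the partition is drained...
finite_stream = KafkaEx.stream("my_topic", 0, stream_mode: :finite)
messages = Enum.to_list(finite_stream)

# ...while :infinite blocks until data arrives.
infinite_stream = KafkaEx.stream("my_topic", 0, stream_mode: :infinite)
first_ten = Enum.take(infinite_stream, 10)
```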
I think it would be good if this mirrors the command line tools and uses something like :no_wait_at_logend https://cwiki.apache.org/confluence/display/KAFKA/System+Tools#SystemTools-SimpleConsumerShell
I can publish 100 messages to a topic and create a stream for that topic: […]

Note it sets the offset to 0 because the consumer group is new. Now I take 20 messages, which should be well over 100 bytes. It actually ends up with duplicate messages because it's over the max_bytes: […]

If I take 1 again, it actually starts back at 0, because the […]. This makes sense if you look at the implementation of […].

This is even more fun: if I create a NEW stream with the same consumer group, I get offset 2. Furthermore, if I […]

I'm pretty sure I can sort this all out. The question is, from a design standpoint: should the stream continue along the topic from the last offset that was committed, or should it start at the same […]?

Regardless, I definitely agree that documenting this is really important.
Can you verify what the behavior of […]

nvm, I just verified, and this is consistent with the behavior of […]
lib/kafka_ex/stream.ex
Outdated
    no_wait_at_logend: false

    defimpl Enumerable do
      def reduce(data = %KafkaEx.Stream{}, acc, fun) do
This file has the variable name before the pattern, while most of the files have the variable name after the pattern when naming parameter pattern matches.
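The two styles side by side (a sketch; do_reduce/3 is a made-up body just to keep the heads comparable):

```elixir
# As written in this file: variable before the pattern
def reduce(data = %KafkaEx.Stream{}, acc, fun), do: do_reduce(data, acc, fun)

# Convention elsewhere in the codebase: pattern first, then the variable
def reduce(%KafkaEx.Stream{} = data, acc, fun), do: do_reduce(data, acc, fun)
```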
      assert m3.value == "message 3"
      assert m4.value == "message 4"
    end
There should be no trailing white-space at the end of a line.
Fixes #214
I fixed the double-consume issue. I think we may be committing in the wrong spot, so I want to add some tests for that. Still need to update the documentation as well. There's a potential weirdness with […]
lib/kafka_ex/stream.ex
Outdated
    # this function returns a Stream.resource stream, so we need to define
    # start_fun, next_fun, and after_fun callbacks
There should be no more than 1 consecutive blank lines.
@bjhaid @joshuawscott I've handled everything I could think of, and updated the documentation. Could you guys give this another review, please? One thing I considered doing was delegating stream creation to a function in the […]
      consumer_group: consumer_group
    )

    [m1, m2, m3, m4] = Enum.take(stream, 10)
Worth testing that this actually only returns 4 elements?
This match would fail if I didn't get 4 elements, or am I misunderstanding you?
goes off to drink more coffee before reviewing more
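To make the exchange concrete (a sketch; either form fails the test when the count is wrong, but the failure messages differ):

```elixir
# The bare match raises MatchError if Enum.take/2 returns anything
# other than exactly four elements:
[m1, m2, m3, m4] = Enum.take(stream, 10)

# An explicit assertion makes the intent and the failure message clearer:
messages = Enum.take(stream, 10)
assert length(messages) == 4
```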
    )

    assert ["Msg 1", "Msg 2"] == stream1
    |> Enum.take(2)
nitpicky, but I kind of prefer assigning to a variable first so that the assert is more self-contained:

    messages = stream1
    |> Enum.take(2)
    |> Enum.map(&(&1.value))
    assert ["Msg 1", "Msg 2"] == messages

Mainly because I was super confused that a list == a stream before I noticed the pipeline on the next line.
joshuawscott
left a comment
LGTM
bjhaid
left a comment
LGTM minus slight var name vs string literal comment
      worker_name: :stream,
      offset: 0,
      auto_commit: true,
      consumer_group: "stream_test"
We should use the consumer_group var here instead of "stream_test"
Ebert has finished reviewing this Pull Request and has found: […]

You can see more details about this review at https://ebertapp.io/github/kafkaex/kafka_ex/pulls/193.
This replaces the existing GenEvent based stream implementation with a purely lazy stream implementation, fixing issues described in:
#191
#185
#145
#141