Clarification Request for brod:fetch #251

randysecrist · 2017-12-12T22:26:02Z

See attached screen shot ...

brod:fetch using the earliest offset seems to return only a subset of messages.

In this case, there are 5 messages in partition 0. Earliest offset is 0, latest is 5. A call to fetch using offset 0 returns messages 0, 1 and 2. Calling fetch with offset 3 returns the last two messages.

Is this expected? Is there any way to get back all 5 with a single fetch call?

zmstone · 2017-12-12T23:04:43Z

Which brod version and which kafka version were you testing?
On the latest version (3.3.4), :brod.fetch/4 by default uses 100K bytes as max_bytes in the fetch request.
:brod.fetch/8 allows you to specify max_bytes.

However, the messages are not that big, i.e. 3 messages are obviously less than 100K.
This seems quite odd to me. Not sure why kafka would not return all 5 messages in one batch.

zmstone · 2017-12-12T23:12:50Z

There are 40 hours between the timestamps of the messages at offset 2 and offset 3,
so maybe they are from two different segments in kafka.

Could you check in the kafka log-data directory to see if there are two segments for that partition?

randysecrist · 2017-12-13T17:20:58Z

Yes. I had to find another partition and offset that was doing this, but it looks like there are multiple segments.

We use kafka 2.11 and brod 3.3.4.

zmstone · 2017-12-13T17:29:17Z

That should be it.

randysecrist · 2017-12-13T17:43:50Z

Ok, so it is expected. The code I have doing the fetch looks like this; does this seem correct?

  defp fetch(acc, hosts, topic, partition, start, total_count) do
    case :brod.fetch(default_hosts(), topic, partition, start) do
      {:ok, []} -> acc
      {:error, reason} -> {:error, reason}
      {_, messages} ->
        result = acc ++ messages
        case (length(result) < total_count) do
          true -> fetch(result, hosts, topic, partition, start + length(result), total_count)
          false -> result
        end
    end
  end

zmstone · 2017-12-13T18:17:21Z

Seems correct. But it’s probably better to connect to the leader and then use brod_util:fetch/4, to avoid reconnecting for each attempt.
See brod_cli fetch for an example.

randysecrist · 2017-12-13T20:07:46Z

Thank you for the review.

zmstone · 2017-12-13T22:29:52Z

The next offset should be start+length(messages), not start+length(result).
Also, the next begin offset is not always start+length(messages), because offsets may not be sequential in compacted partitions. It should be the last message’s offset+1.
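
For reference, the two corrections above could be applied to the loop roughly as follows. This is only a sketch under stated assumptions, not code from brod: `FetchSketch`, `fetch_all` and `fetch_fun` are hypothetical names, `fetch_fun` stands in for a wrapper around `:brod.fetch(hosts, topic, partition, offset)` that returns `{:ok, messages}` or `{:error, reason}`, and each message is assumed to be a `:kafka_message` record tuple with the offset as its second element.

```elixir
defmodule FetchSketch do
  # fetch_fun is a hypothetical 1-arity function:
  #   offset -> {:ok, messages} | {:error, reason}
  # Assumption: each message is a record tuple {:kafka_message, offset, ...},
  # so the offset is read with elem(message, 1).
  def fetch_all(fetch_fun, start, total_count, acc \\ []) do
    case fetch_fun.(start) do
      {:ok, []} ->
        # Nothing more to read: return what has been collected so far.
        acc

      {:ok, messages} ->
        result = acc ++ messages

        if length(result) < total_count do
          # Resume right after the last message actually received instead of
          # start + length(...): offsets can have gaps in compacted partitions.
          next_offset = elem(List.last(messages), 1) + 1
          fetch_all(fetch_fun, next_offset, total_count, result)
        else
          result
        end

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

With brod 3.3.4 the call might look like `FetchSketch.fetch_all(fn offset -> :brod.fetch(hosts, topic, partition, offset) end, 0, 5)`. Passing the fetch function in also makes it easy to swap in a call that reuses one connection to the leader, as suggested earlier in the thread.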

randysecrist closed this as completed Dec 13, 2017
