#318 avoid losing an element in leveldb seeker #320

mrahs · 2021-05-07T15:56:56Z

This fix covers iterators for partition tables.
Views use the merge iterator which was not affected by the bug due to calling Next upon initialization and caching the pair before calling Next again (which was perfect for Seek).

That means the merge iterator would have paniced with other implementations such as in-memory which is used the tester package. I have added a test case to demonstrate the panic and the fix.

frairon

The code looks good, it should fix the issue. Also thanks for the regression test!

But I'm actually wondering whether we should just remove that comment? Checking the history I found that the comment was added by a bigger refactoring last week, but the iterator implementation actually did not change as far as I remember.
So maybe just the comment is wrong and after removing it we should be fine?
What do you think?

mrahs · 2021-05-11T09:16:36Z

It's not just a comment. There are two implementations with different semantics. In this patch, I changed the leveldb one to match the comment on the common interface. I'm fine if patching the memory iterator is preferred. The important thing is that all implementations have the same semantics.

mrahs · 2021-05-12T16:09:33Z

Just noticed there is an old related issue: #181

frairon · 2021-05-17T14:27:48Z

Yep, so it seems that something is quite odd here. I took the comments from the old branch mentioned in #181 because I thought it was improving storage, but actually that branch was adding some layer to track offsets along with keys for some retention-cleanup that we will probably not add.
So I'd say, the comment is wrong, the storage doesn't need a call to Next after Seek.
The memory-storage is a different thing though. It's mostly used for testing, the Seek-Iterate is semantically different from the other storages, because it's not ordered. When seeking it's just selecting the keys that contain the seeked key somehow, which is most likely not what users would expect from the storage. However, as said it's just used for testing (unless you know what you're doing), so we should probably just make it consistent with leveldb's iterator and maybe use strings.HasPrefix as proposed in #181.
And fix the comment of course. Would you agree on that?

mrahs · 2021-05-17T14:41:27Z

Agreed. When I surveyed the implementations (memory, redis, merged, leveldb), it looked like all of them were compatible with calling Next after a Seek except for leveldb's.

I agree with your reasoning. Let's fix the comment and the memory iterator.
I can rework my PR sometime within the next couple of weeks.

frairon · 2021-05-17T14:54:24Z

Wait what? But if all are compatible with calling next after seek, that means that we need to allow calling Next in leveldb too, so should rather use the current PR?

mrahs · 2021-05-17T15:05:50Z

It comes down to memory vs leveldb. Redis' seek is a noop. The merged iterator is primarily used with leveldb. And I'm getting the impression that LevelDB is probably the only one used in production.

Given that, I'd rather avoid a hack in production code. Thoughts?

frairon · 2021-05-17T15:10:03Z

Yep, agreeing to that. So let's not use a hack, fix the comment and memory iterator. Just got a bit confused by your earlier comment, because it sounded that "next" after "seek" works, but it doesn't without losing one element.

frairon · 2021-05-19T08:44:19Z

@mrahs, while you're on it, could you maybe also replace the type header.Headers in the memory-storage (and in its iterator) here with a raw data type like map[string][]byte?
I think the headers type added their by mistake by find/replace when introducing the header-type, but it logically does not belong there.

mrahs · 2021-05-19T14:06:07Z

My bad 🤦
Will definitely do.

mrahs · 2021-07-04T15:10:30Z

Updated the PR to resolve multiple issues since they are all related: #318, #331, and #181.

options.go

partition_table.go

frairon · 2021-07-26T08:35:40Z

@mrahs, would you by any chance have any progress on the PR?
We're kind of waiting for that so we can merge other features and release the breaking changes together. But no pressure :).
No seriously, I don't want to stress you in any way. As an alternative, we could also merge this PR into a feature-branch in goka and I'll implement the remaining issues we aggreed upon, then we merge everything into master when it's done. I just don't want to steal that from you. What do you think?

mrahs · 2021-07-27T12:43:22Z

@mrahs, would you by any chance have any progress on the PR?
We're kind of waiting for that so we can merge other features and release the breaking changes together. But no pressure :).
No seriously, I don't want to stress you in any way. As an alternative, we could also merge this PR into a feature-branch in goka and I'll implement the remaining issues we aggreed upon, then we merge everything into master when it's done. I just don't want to steal that from you. What do you think?

Sorry for the silence. I've been meaning to wrap this up asap, but life got busy. I'll update the PR later today.

Using goka.Headers enables implementations of the view callback to avoid importing Sarama as a dependency. However, it is a breaking change that requires all users to update their implementations to explicitly accept the headers argument.

- Corrected the comment on the iterator interface to match LevelDB semantics. All other implementations were compatible with LevelDB except for the memory iterator which is also fixed in this change set. - Updated the memory iterator to be semantically compatible with LevelDB. Iterators traverse the storage using ordered keys and Seek no longer replaces the iterators view.

This should make it easier to implement and extend the behaviour of the callback. Also introduced a default implementation that should be sufficient for most use cases.

frairon

Looks good, many thanks for the extra changes. Your contribution is really appreciated here!!
I think we can leave it like this, so we can go forward :).

One question though: the conversion between sarama headers and goka-headers is now moved to its own structure. But the only two places where it's being stored for lazy-conversion is the cbContext and the UpdateContext, whereas both could also just store the sarama.Headers and convert on request.
It just seems a bit of overkill to me to have this rather complex Headers structure supporting both header-types, with things like eval, Len, ToSaramaPtr, ToSarama and all that, which is mostly needed for testing, instead of the simpler way like before. So I'm just wondering if I'm missing the bigger picture here why this complexity is needed? :)

But again, thanks a lot for your work and your time on all that!

mrahs · 2021-07-30T07:04:19Z

Looks good, many thanks for the extra changes. Your contribution is really appreciated here!!
I think we can leave it like this, so we can go forward :).

One question though: the conversion between sarama headers and goka-headers is now moved to its own structure. But the only two places where it's being stored for lazy-conversion is the cbContext and the UpdateContext, whereas both could also just store the sarama.Headers and convert on request.
It just seems a bit of overkill to me to have this rather complex Headers structure supporting both header-types, with things like eval, Len, ToSaramaPtr, ToSarama and all that, which is mostly needed for testing, instead of the simpler way like before. So I'm just wondering if I'm missing the bigger picture here why this complexity is needed? :)

But again, thanks a lot for your work and your time on all that!

Thanks Franz!
It sounds like you're not fully content with the code. I'm open to further discussions.

The last iteration attempts to satisfy the following requirements and constraints:

Avoid forcing users to import Sarama as a dependency. Using a custom header type rather than exposing Sarama's achieves this. It was first introduced in this commit with a type that matches what Apache uses for its implementation.
Sarama's ProducerMessage type takes a slice of values while its ConsumerMessage type takes a slice of pointers. Conversion functions provide the convenience of switching between these formats.
Optimize for processors that don't make use of headers, but make headers readily available for processors that need them. Lazy evaluation achieves this.

Now all of this can still be achieved with a simpler header type of type Headers map[string][]byte if it weren't for the storage proxy which has a public function with headers in the argument list. Keeping it as the sarama type forces implementers to import sarama. Switching to the goka type requires more complexity for lazy evaluation.

Now that I've typed this, I believe it should be possible to change the proxy interface to use the UpdateContext type which would achieve both goals (no external dependency and lazy evaluation). I'm thinking:

func (s *storageProxy) Update(ctx UpdateContext, k string, v []byte) error {
  return s.update(ctx, s, k, v)
}

I've just tried this out and it looks better. The headers are type Headers map[string][]byte with only Merged, ToSarama and ToSaramaPtr functions. I can push this commit if that looks good to you.

frairon · 2021-07-30T07:42:55Z

Awesome, that looks even better!
Funnily after writing my comment I also thought your solution is fine and maybe the encapsulation in the extra structure isn't such a bad idea. Also since you're the main contributor around headers you probably know better what seems more elegant there. But you were faster responding.
Anyway, if you ask me this new solution is even better and we can push that. Thanks also for removing the unused fields and functions in DefaultUpdateContext :).

frairon reviewed May 10, 2021

View reviewed changes

mrahs force-pushed the master branch 2 times, most recently from 8658538 to 2fb25b0 Compare July 4, 2021 15:07

frairon reviewed Jul 5, 2021

View reviewed changes

options.go Outdated Show resolved Hide resolved

frairon reviewed Jul 8, 2021

View reviewed changes

options.go Outdated Show resolved Hide resolved

partition_table.go Outdated Show resolved Hide resolved

mrahs added 3 commits July 28, 2021 09:27

Introduce UpdateContext type

1f3373b

This should make it easier to implement and extend the behaviour of the callback. Also introduced a default implementation that should be sufficient for most use cases.

mrahs force-pushed the master branch from 93f0b26 to 3402a80 Compare July 28, 2021 16:38

Refactor headers for performance and legibility

d32ef29

mrahs force-pushed the master branch from 3402a80 to d32ef29 Compare July 28, 2021 16:49

frairon previously approved these changes Jul 30, 2021

View reviewed changes

Simplify the headers type

e06eeb9

mrahs dismissed frairon’s stale review via e06eeb9 July 30, 2021 13:40

frairon approved these changes Aug 2, 2021

View reviewed changes

frairon merged commit eaaa070 into lovoo:master Aug 2, 2021

frairon mentioned this pull request Sep 17, 2022

RedisIterator skips first key when it.Next() is called #394

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#318 avoid losing an element in leveldb seeker #320

#318 avoid losing an element in leveldb seeker #320

mrahs commented May 7, 2021

frairon left a comment

mrahs commented May 11, 2021

mrahs commented May 12, 2021

frairon commented May 17, 2021

mrahs commented May 17, 2021

frairon commented May 17, 2021

mrahs commented May 17, 2021

frairon commented May 17, 2021

frairon commented May 19, 2021

mrahs commented May 19, 2021

mrahs commented Jul 4, 2021

frairon commented Jul 26, 2021 •

edited

mrahs commented Jul 27, 2021

frairon left a comment

mrahs commented Jul 30, 2021 •

edited by frairon

frairon commented Jul 30, 2021

#318 avoid losing an element in leveldb seeker #320

#318 avoid losing an element in leveldb seeker #320

Conversation

mrahs commented May 7, 2021

frairon left a comment

Choose a reason for hiding this comment

mrahs commented May 11, 2021

mrahs commented May 12, 2021

frairon commented May 17, 2021

mrahs commented May 17, 2021

frairon commented May 17, 2021

mrahs commented May 17, 2021

frairon commented May 17, 2021

frairon commented May 19, 2021

mrahs commented May 19, 2021

mrahs commented Jul 4, 2021

frairon commented Jul 26, 2021 • edited

mrahs commented Jul 27, 2021

frairon left a comment

Choose a reason for hiding this comment

mrahs commented Jul 30, 2021 • edited by frairon

frairon commented Jul 30, 2021

frairon commented Jul 26, 2021 •

edited

mrahs commented Jul 30, 2021 •

edited by frairon