Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#318 avoid losing an element in leveldb seeker #320

Merged
merged 5 commits into from
Aug 2, 2021
Merged

Conversation

mrahs
Copy link
Contributor

@mrahs mrahs commented May 7, 2021

This fix covers iterators for partition tables.
Views use the merge iterator which was not affected by the bug due to calling Next upon initialization and caching the pair before calling Next again (which was perfect for Seek).

That means the merge iterator would have paniced with other implementations such as in-memory which is used the tester package. I have added a test case to demonstrate the panic and the fix.

Copy link
Contributor

@frairon frairon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good, it should fix the issue. Also thanks for the regression test!

But I'm actually wondering whether we should just remove that comment? Checking the history I found that the comment was added by a bigger refactoring last week, but the iterator implementation actually did not change as far as I remember.
So maybe just the comment is wrong and after removing it we should be fine?
What do you think?

@mrahs
Copy link
Contributor Author

mrahs commented May 11, 2021

It's not just a comment. There are two implementations with different semantics. In this patch, I changed the leveldb one to match the comment on the common interface. I'm fine if patching the memory iterator is preferred. The important thing is that all implementations have the same semantics.

@mrahs
Copy link
Contributor Author

mrahs commented May 12, 2021

Just noticed there is an old related issue: #181

@frairon
Copy link
Contributor

frairon commented May 17, 2021

Yep, so it seems that something is quite odd here. I took the comments from the old branch mentioned in #181 because I thought it was improving storage, but actually that branch was adding some layer to track offsets along with keys for some retention-cleanup that we will probably not add.
So I'd say, the comment is wrong, the storage doesn't need a call to Next after Seek.
The memory-storage is a different thing though. It's mostly used for testing, the Seek-Iterate is semantically different from the other storages, because it's not ordered. When seeking it's just selecting the keys that contain the seeked key somehow, which is most likely not what users would expect from the storage. However, as said it's just used for testing (unless you know what you're doing), so we should probably just make it consistent with leveldb's iterator and maybe use strings.HasPrefix as proposed in #181.
And fix the comment of course. Would you agree on that?

@mrahs
Copy link
Contributor Author

mrahs commented May 17, 2021

Agreed. When I surveyed the implementations (memory, redis, merged, leveldb), it looked like all of them were compatible with calling Next after a Seek except for leveldb's.

I agree with your reasoning. Let's fix the comment and the memory iterator.
I can rework my PR sometime within the next couple of weeks.

@frairon
Copy link
Contributor

frairon commented May 17, 2021

Wait what? But if all are compatible with calling next after seek, that means that we need to allow calling Next in leveldb too, so should rather use the current PR?

@mrahs
Copy link
Contributor Author

mrahs commented May 17, 2021

It comes down to memory vs leveldb. Redis' seek is a noop. The merged iterator is primarily used with leveldb. And I'm getting the impression that LevelDB is probably the only one used in production.

Given that, I'd rather avoid a hack in production code. Thoughts?

@frairon
Copy link
Contributor

frairon commented May 17, 2021

Yep, agreeing to that. So let's not use a hack, fix the comment and memory iterator. Just got a bit confused by your earlier comment, because it sounded that "next" after "seek" works, but it doesn't without losing one element.

@frairon
Copy link
Contributor

frairon commented May 19, 2021

@mrahs, while you're on it, could you maybe also replace the type header.Headers in the memory-storage (and in its iterator) here with a raw data type like map[string][]byte?
I think the headers type added their by mistake by find/replace when introducing the header-type, but it logically does not belong there.

@mrahs
Copy link
Contributor Author

mrahs commented May 19, 2021

My bad 🤦
Will definitely do.

@mrahs mrahs force-pushed the master branch 2 times, most recently from 8658538 to 2fb25b0 Compare July 4, 2021 15:07
@mrahs
Copy link
Contributor Author

mrahs commented Jul 4, 2021

Updated the PR to resolve multiple issues since they are all related: #318, #331, and #181.

options.go Outdated Show resolved Hide resolved
options.go Outdated Show resolved Hide resolved
partition_table.go Outdated Show resolved Hide resolved
@frairon
Copy link
Contributor

frairon commented Jul 26, 2021

@mrahs, would you by any chance have any progress on the PR?
We're kind of waiting for that so we can merge other features and release the breaking changes together. But no pressure :).
No seriously, I don't want to stress you in any way. As an alternative, we could also merge this PR into a feature-branch in goka and I'll implement the remaining issues we aggreed upon, then we merge everything into master when it's done. I just don't want to steal that from you. What do you think?

@mrahs
Copy link
Contributor Author

mrahs commented Jul 27, 2021

@mrahs, would you by any chance have any progress on the PR?
We're kind of waiting for that so we can merge other features and release the breaking changes together. But no pressure :).
No seriously, I don't want to stress you in any way. As an alternative, we could also merge this PR into a feature-branch in goka and I'll implement the remaining issues we aggreed upon, then we merge everything into master when it's done. I just don't want to steal that from you. What do you think?

Sorry for the silence. I've been meaning to wrap this up asap, but life got busy. I'll update the PR later today.

Using goka.Headers enables implementations of the view callback to avoid importing Sarama
as a dependency. However, it is a breaking change that requires all users to update
their implementations to explicitly accept the headers argument.
- Corrected the comment on the iterator interface to match LevelDB semantics.
All other implementations were compatible with LevelDB except for the memory
iterator which is also fixed in this change set.
- Updated the memory iterator to be semantically compatible with LevelDB.
Iterators traverse the storage using ordered keys and Seek no longer
replaces the iterators view.
This should make it easier to implement and extend the behaviour of the callback.
Also introduced a default implementation that should be sufficient for most use cases.
frairon
frairon previously approved these changes Jul 30, 2021
Copy link
Contributor

@frairon frairon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, many thanks for the extra changes. Your contribution is really appreciated here!!
I think we can leave it like this, so we can go forward :).

One question though: the conversion between sarama headers and goka-headers is now moved to its own structure. But the only two places where it's being stored for lazy-conversion is the cbContext and the UpdateContext, whereas both could also just store the sarama.Headers and convert on request.
It just seems a bit of overkill to me to have this rather complex Headers structure supporting both header-types, with things like eval, Len, ToSaramaPtr, ToSarama and all that, which is mostly needed for testing, instead of the simpler way like before. So I'm just wondering if I'm missing the bigger picture here why this complexity is needed? :)

But again, thanks a lot for your work and your time on all that!

@mrahs
Copy link
Contributor Author

mrahs commented Jul 30, 2021

Looks good, many thanks for the extra changes. Your contribution is really appreciated here!!
I think we can leave it like this, so we can go forward :).

One question though: the conversion between sarama headers and goka-headers is now moved to its own structure. But the only two places where it's being stored for lazy-conversion is the cbContext and the UpdateContext, whereas both could also just store the sarama.Headers and convert on request.
It just seems a bit of overkill to me to have this rather complex Headers structure supporting both header-types, with things like eval, Len, ToSaramaPtr, ToSarama and all that, which is mostly needed for testing, instead of the simpler way like before. So I'm just wondering if I'm missing the bigger picture here why this complexity is needed? :)

But again, thanks a lot for your work and your time on all that!

Thanks Franz!
It sounds like you're not fully content with the code. I'm open to further discussions.

The last iteration attempts to satisfy the following requirements and constraints:

  • Avoid forcing users to import Sarama as a dependency. Using a custom header type rather than exposing Sarama's achieves this. It was first introduced in this commit with a type that matches what Apache uses for its implementation.
  • Sarama's ProducerMessage type takes a slice of values while its ConsumerMessage type takes a slice of pointers. Conversion functions provide the convenience of switching between these formats.
  • Optimize for processors that don't make use of headers, but make headers readily available for processors that need them. Lazy evaluation achieves this.

Now all of this can still be achieved with a simpler header type of type Headers map[string][]byte if it weren't for the storage proxy which has a public function with headers in the argument list. Keeping it as the sarama type forces implementers to import sarama. Switching to the goka type requires more complexity for lazy evaluation.

Now that I've typed this, I believe it should be possible to change the proxy interface to use the UpdateContext type which would achieve both goals (no external dependency and lazy evaluation). I'm thinking:

func (s *storageProxy) Update(ctx UpdateContext, k string, v []byte) error {
  return s.update(ctx, s, k, v)
}

I've just tried this out and it looks better. The headers are type Headers map[string][]byte with only Merged, ToSarama and ToSaramaPtr functions. I can push this commit if that looks good to you.

@frairon
Copy link
Contributor

frairon commented Jul 30, 2021

Awesome, that looks even better!
Funnily after writing my comment I also thought your solution is fine and maybe the encapsulation in the extra structure isn't such a bad idea. Also since you're the main contributor around headers you probably know better what seems more elegant there. But you were faster responding.
Anyway, if you ask me this new solution is even better and we can push that. Thanks also for removing the unused fields and functions in DefaultUpdateContext :).

@frairon frairon merged commit eaaa070 into lovoo:master Aug 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants