Proposal: Add joins with watermarks #94

Closed
burdiyan opened this issue Feb 27, 2018 · 3 comments

Comments

@burdiyan
Contributor

burdiyan commented Feb 27, 2018

I’ve found that we often want to ensure that a certain part of a topic has been processed before we start doing joins with that topic.

For example, we join stream A with table B. We know a point in time by which topic B would be mostly read. It may not be completely recovered, because messages are constantly coming in, but we know that at least the portion before that point in time must be available for joins.

So the idea would be to wait until topic B has been processed up to a certain point in time before starting to process topic A.

I assume something like this could be achieved using the processor’s stats, but I’m not sure.

I’ve tried sleeping in A’s callback until a particular record of topic B is available, but found that topic B stalls while topic A sleeps.

@db7
Collaborator

db7 commented Feb 27, 2018

Currently, the processor does the following: when it starts, it queries the current high watermark (HWM) of all joined tables and saves them. The processor then recovers its own state and the joined tables up to those saved HWMs. Once that is done, it starts consuming the streams. In other words, the processor starts consuming the streams only once the joined tables are recovered at least up to the HWMs they had when the processor started.
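The startup order described above can be modeled with a small Go sketch. It does not use Goka's API: `kv`, `tableLog`, `snapshotHWMs`, `recoverTables` and `consumeStream` are hypothetical names over an in-memory log, and the ongoing table updates that a running processor keeps applying after recovery are left out. It shows that a record written to B after the HWM snapshot is not guaranteed to be visible to the first stream messages, which is the risk when B is not yet in the right state at startup.

```go
package main

import "fmt"

// kv is a hypothetical table record (key and latest value).
type kv struct{ Key, Value string }

// tableLog is a hypothetical in-memory stand-in for a table topic partition.
// Its high watermark (HWM) is the offset the next record will get.
type tableLog struct{ records []kv }

func (l *tableLog) hwm() int { return len(l.records) }

// snapshotHWMs records the current HWM of every joined table (step 1).
func snapshotHWMs(tables map[string]*tableLog) map[string]int {
	snaps := make(map[string]int, len(tables))
	for name, l := range tables {
		snaps[name] = l.hwm()
	}
	return snaps
}

// recoverTables replays each table up to its snapshotted HWM (step 2).
// Records appended after the snapshot are not waited for.
func recoverTables(tables map[string]*tableLog, snaps map[string]int) map[string]map[string]string {
	state := make(map[string]map[string]string, len(tables))
	for name, l := range tables {
		st := make(map[string]string)
		for _, r := range l.records[:snaps[name]] {
			st[r.Key] = r.Value
		}
		state[name] = st
	}
	return state
}

// consumeStream joins stream keys against one recovered table (step 3).
// It is only called after recoverTables has returned.
func consumeStream(state map[string]map[string]string, table string, stream []string) []string {
	out := make([]string, 0, len(stream))
	for _, k := range stream {
		v, ok := state[table][k]
		if !ok {
			v = "<missing>"
		}
		out = append(out, k+"="+v)
	}
	return out
}

func main() {
	tables := map[string]*tableLog{
		"B": {records: []kv{{"user-1", "alice"}, {"user-2", "bob"}}},
	}
	snaps := snapshotHWMs(tables) // B's HWM is 2 at startup
	// A record written after startup is not covered by the snapshot.
	tables["B"].records = append(tables["B"].records, kv{"user-3", "carol"})
	state := recoverTables(tables, snaps)
	fmt.Println(consumeStream(state, "B", []string{"user-1", "user-3"})) // [user-1=alice user-3=<missing>]
}
```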

If the topic being joined is not yet in the right state when the processor starts (and queries the HWM), then you may have problems. Goka does not support joins with time windows or similar, and hacking that with stats sounds error-prone.

@burdiyan
Contributor Author

Oh, if that's how it works, it is exactly what I need. I was worried whether that's always the case and thought I needed to do something else to ensure it. Great! Thanks!

@db7
Collaborator

db7 commented Feb 27, 2018

Please report an issue if it does not behave as I described! :-)
