New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LINQ] Add timestamp join #215
[LINQ] Add timestamp join #215
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I'm in favor of renaming it to
TimestampJoinOperator
to remain consistent with the rest of the operators. Maybe at some point we could have a genericJoinOperator
whose joining behavior (i.e., window, tumbling, timestamp etc.) could be chosen through anenum
. - We have a
JoinSumOperator
in thefull_pipeline
example that could be replaced with aTimestampJoinOperator
followed by aMapOperator
. (I believe Flink does this using anapply
method, which we could also provide in the LINQ API and just use aMapOperator
underneath)
I revamped the LINQ example with the timestamp join. Do we also want to include it in the full pipeline example? If so, can you let me know the purpose of the full pipeline example? Is the intent to mirror the LINQ example using the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the intent of the full_pipeline
example was to show the operator construction and connection API, which is simplified by the LINQ API since they are all data transformation operators. Maybe we can leave it untouched for now, since we're planning to replace that with a Perception example anyways.
Also, what is the behavior of a TimestampJoinOperator
when you get no inputs from one of the streams for a timestamp?
Messages are dropped when there are no inputs from one of the streams for a given timestamp. The behavior is documented in the table above the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Can you add a comment on the connect
methods about why we require them to take a state initialization just in case we forget again?
Introduces the
TimestampJoin
, which joins messages with matching timestamps from two different streams by performing a cartesian product. I've tested the operator locally, and it seems to work.A few things I wasn't sure about that can hopefully be addressed in review:
TimestampJoin
seems more ergonomic, but this should be namedTimestampJoinOperator
to stay consistent with the other operators.