Add support for self (left / right / inner / cross) join #290

Open
Spike2050 opened this issue Mar 3, 2017 · 5 comments

@Spike2050
Contributor

Spike2050 commented Mar 3, 2017

Why isn't there a shortcut method for joining a Seq with itself, like this (as an addition to Seq.java)?

/**
 * Cross join stream with itself
 * <p>
 * <code><pre>
 * // (tuple(1, 1), tuple(1, 2), tuple(2, 1), tuple(2, 2))
 * Seq.of(1, 2).crossJoin()
 * </pre></code>
 */
default Seq<Tuple2<T, T>> crossJoin() {
    // A stream is one-shot, so buffer it before reading it twice.
    List<T> list = toList();
    return Seq.crossJoin(Seq.seq(list), Seq.seq(list));
}

Or why wouldn't that make sense? OK, streams are one-shot, so I have to convert the stream to a list first, which isn't lazy. But it would save me a lot of hassle, e.g. writing the conversion myself beforehand.

The same applies, of course, to inner, left outer, and right outer joins.
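
For readers without jOOL at hand, the idea can be sketched with plain java.util.stream. This is only an illustration under assumptions: `CrossSelfJoinSketch` and its `crossSelfJoin` helper are hypothetical names, not part of jOOL, and `Map.Entry` stands in for `Tuple2`.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CrossSelfJoinSketch {

    // Hypothetical helper (not a jOOL method): pairs every element with every element.
    static <T> Stream<Map.Entry<T, T>> crossSelfJoin(Stream<T> stream) {
        // A stream can be consumed only once, so it is buffered eagerly first.
        List<T> buffered = stream.collect(Collectors.toList());
        return buffered.stream()
            .flatMap(left -> buffered.stream()
                .<Map.Entry<T, T>>map(right -> new SimpleImmutableEntry<>(left, right)));
    }

    public static void main(String[] args) {
        // Prints the four pairs of {1, 2} x {1, 2}, one per line, as key=value.
        crossSelfJoin(Stream.of(1, 2)).forEach(System.out::println);
    }
}
```

The buffering step is the eager conversion mentioned above; only the pairing that follows it is lazy.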

@lukaseder
Member

Hmm, we could support cartesian powers up to a certain number of degrees (although, they'd be very quickly impractical...)

What would be the point of doing this with outer joins?

@lukaseder lukaseder changed the title Joins with itself Add support for cartesian powers Mar 3, 2017
@Spike2050
Contributor Author

Spike2050 commented Mar 3, 2017

Hey lukaseder
I have this example for an OuterJoin (in Pseudocode):

If you have a Person(Id, ParentId) and this Dataset:

Fred(1,0)
Kelly(2,1)
Cindy(3,1)

You could do a left outer join on d1.ParentId == d2.Id to get each person paired with their parent in a tuple, plus the persons that don't have a parent (maybe to handle these separately):

Tuple<Fred,Null>
Tuple<Kelly,Fred>
Tuple<Cindy,Fred>
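
The pseudocode above could be made concrete roughly as follows. This is a sketch in plain Java rather than jOOL: `Person`, `LeftOuterSelfJoinSketch`, and `leftOuterSelfJoin` are hypothetical names chosen for illustration, and `Map.Entry` stands in for `Tuple2`, with a null value marking a row that has no match.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class LeftOuterSelfJoinSketch {

    // Hypothetical data class mirroring Person(Id, ParentId) from the example.
    static final class Person {
        final String name;
        final int id;
        final int parentId;

        Person(String name, int id, int parentId) {
            this.name = name;
            this.id = id;
            this.parentId = parentId;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Hypothetical helper: every left row is kept; unmatched rows get a null partner.
    static <T> List<Map.Entry<T, T>> leftOuterSelfJoin(
            List<T> items, BiPredicate<? super T, ? super T> predicate) {
        List<Map.Entry<T, T>> result = new ArrayList<>();
        for (T left : items) {
            boolean matched = false;
            for (T right : items) {
                if (predicate.test(left, right)) {
                    result.add(new SimpleImmutableEntry<>(left, right));
                    matched = true;
                }
            }
            if (!matched) {
                result.add(new SimpleImmutableEntry<>(left, null));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
            new Person("Fred", 1, 0),
            new Person("Kelly", 2, 1),
            new Person("Cindy", 3, 1));

        // Join each person to the person whose id equals their parentId.
        leftOuterSelfJoin(people, (child, parent) -> child.parentId == parent.id)
            .forEach(System.out::println);
    }
}
```

Fred has no parent in the dataset, so his pair carries null on the right, while Kelly and Cindy are each paired with Fred.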

About your number-of-degrees question: you could start with 2, which I think would cover most use cases. If you wanted more, you could go the established route and pass the dataset again and again as a parameter. If you supported more degrees, I think you would also need a TriPredicate, QuadPredicate, etc.?

Your thoughts?

@lukaseder lukaseder changed the title Add support for cartesian powers Add support for self (left / right / inner) join Mar 4, 2017
@lukaseder lukaseder changed the title Add support for self (left / right / inner) join Add support for self (left / right / inner / cross) join Mar 4, 2017
@lukaseder
Member

OK, I see. That might make sense indeed, at least for degree 2. I don't think it would be a reasonable extension for higher degree joins as the ordering semantics might become quite complex.

@Spike2050
Contributor Author

Spike2050 commented Mar 4, 2017

Yes, so I suggest that a join (left / right / inner / cross) without another stream as a parameter would automatically be a self join, like this:

default <T> Seq<Tuple2<T, T>> crossJoin()
default <T> Seq<Tuple2<T, T>> innerJoin(BiPredicate<? super T, ? super T> predicate)
default <T> Seq<Tuple2<T, T>> leftOuterJoin(BiPredicate<? super T, ? super T> predicate)
default <T> Seq<Tuple2<T, T>> rightOuterJoin(BiPredicate<? super T, ? super T> predicate)

There's also a case for naming them separately instead of overloading (selfCrossJoin, selfInnerJoin, etc.); that depends on your philosophy. If you're interested, I'll send you a pull request.

@lukaseder
Member

The keyword "self" should definitely be part of the naming; otherwise, the overloads alongside the existing joins might be:

  • Confusing
  • Ambiguous (for the compiler)

So:

<T> Seq<Tuple2<T, T>> crossSelfJoin()
<T> Seq<Tuple2<T, T>> innerSelfJoin(BiPredicate<? super T, ? super T> predicate)
<T> Seq<Tuple2<T, T>> leftOuterSelfJoin(BiPredicate<? super T, ? super T> predicate)
<T> Seq<Tuple2<T, T>> rightOuterSelfJoin(BiPredicate<? super T, ? super T> predicate)

Sure, a pull request would be greatly appreciated!
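
One way to read the signatures above is that the inner self join is the cross self join filtered by the predicate. The following is a rough sketch of that relationship in plain Java, not the eventual jOOL implementation: the class `SelfJoinFamilySketch` and the list-based static helpers are assumptions made for illustration, and `Map.Entry` again stands in for `Tuple2`.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

public class SelfJoinFamilySketch {

    // Hypothetical helper: all ordered pairs of the list with itself.
    static <T> List<Map.Entry<T, T>> crossSelfJoin(List<T> items) {
        List<Map.Entry<T, T>> pairs = new ArrayList<>();
        for (T left : items) {
            for (T right : items) {
                pairs.add(new SimpleImmutableEntry<>(left, right));
            }
        }
        return pairs;
    }

    // Hypothetical helper: keeps only the pairs that satisfy the predicate.
    static <T> List<Map.Entry<T, T>> innerSelfJoin(
            List<T> items, BiPredicate<? super T, ? super T> predicate) {
        return crossSelfJoin(items).stream()
            .filter(pair -> predicate.test(pair.getKey(), pair.getValue()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Pairs (a, b) from {1, 2, 3} where a < b.
        innerSelfJoin(Arrays.asList(1, 2, 3), (a, b) -> a < b)
            .forEach(System.out::println);
    }
}
```

Because the pairs come out in the encounter order of the nested loops, a degree-2 self join keeps the ordering simple, which matches the concern raised earlier about higher degrees.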
