Sparse left and right outer joins #1386
A version of sparse left and right outer joins, taken from the original sparse outer join implementation.
In some of our pipelines we have a number of right outer joins where the RHS of the join is around 5% of the LHS, but the RHS still doesn't fit in memory. Currently we extract the keys from the RHS as a side input, which barely fits in memory, and then filter the LHS using those keys before doing the right outer join. Using a Bloom filter (BF) and sparse joins would help us scale, which we will need to do very soon, but that requires a right outer join variant. So I thought of adding the right and left versions.
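To illustrate the idea, here is a minimal in-memory sketch in Python of a sparse right outer join: build a Bloom filter from the RHS keys, use it to drop LHS rows that cannot match, then run an ordinary right outer join on what is left. This is not the code in this pull request; `BloomFilter`, `sparse_right_outer_join`, and the `num_bits` / `num_hashes` parameters are hypothetical names chosen for the example, and the dictionary grouping stands in for the shuffle a real distributed join would perform.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def sparse_right_outer_join(lhs, rhs, num_bits=1 << 16, num_hashes=4):
    """Right outer join of (key, value) pairs, assuming rhs is much smaller than lhs."""
    rhs = list(rhs)

    # Step 1: summarise the RHS keys in a Bloom filter (small, fixed size).
    bf = BloomFilter(num_bits, num_hashes)
    for k, _ in rhs:
        bf.add(k)

    # Step 2: drop LHS rows whose key is definitely absent from the RHS.
    # False positives may let a few extra rows through; they are harmless.
    lhs_by_key = defaultdict(list)
    for k, v in lhs:
        if bf.might_contain(k):
            lhs_by_key[k].append(v)

    # Step 3: ordinary right outer join. Every RHS row is emitted,
    # paired with None when no LHS row shares its key.
    out = []
    for k, w in rhs:
        matches = lhs_by_key.get(k)
        if matches:
            out.extend((k, (v, w)) for v in matches)
        else:
            out.append((k, (None, w)))
    return out
```

The filter replaces the exact key set that currently has to fit in memory as a side input. Because a Bloom filter never yields false negatives, no matching LHS row is lost, and any false positives are discarded by the join itself. The left outer variant is the mirror image: the filter is built from the LHS keys and applied to the RHS.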