Operator for join #57

Merged: 16 commits merged Apr 9, 2024

@tantaman commented Apr 2, 2024

Join is a bit verbose to test. 900 lines of tests 😅.

Next PR will expose JOIN to ZQL proper.

Left and right joins can be done without much extra work: if a row has no match in the right (or left) table, add it to the result set instead of skipping it.
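A minimal sketch of that idea over plain arrays, assuming a simple nested-loop join. The real operator works incrementally on difference streams; the `leftJoin` name, the `Row` type, and the key-selector parameters here are hypothetical. The only change from an inner join is the `matched` check: an unmatched left row is emitted paired with `undefined` instead of being skipped.

```typescript
// Hypothetical row type for this sketch.
type Row = Record<string, unknown>;

// Nested-loop left join: every left row appears at least once in the output.
function leftJoin<L extends Row, R extends Row>(
  left: readonly L[],
  right: readonly R[],
  leftKey: (l: L) => unknown,
  rightKey: (r: R) => unknown,
): Array<[L, R | undefined]> {
  const out: Array<[L, R | undefined]> = [];
  for (const l of left) {
    let matched = false;
    for (const r of right) {
      if (leftKey(l) === rightKey(r)) {
        out.push([l, r]);
        matched = true;
      }
    }
    // An inner join would skip the row here; a left join keeps it.
    if (!matched) {
      out.push([l, undefined]);
    }
  }
  return out;
}

const issues = [
  {id: 1, title: 'a', owner_id: 10},
  {id: 2, title: 'b', owner_id: 99},
];
const users = [{id: 10, name: 'alice'}];

// Issue 1 pairs with alice; issue 2 has no owner row, so it pairs with undefined.
const joined = leftJoin(issues, users, i => i.owner_id, u => u.id);
```

A right join is the same function with the two inputs swapped.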

Commits shouldn't be reviewed in isolation.


Kept for history:

Decided on a nest operator that can appear in the position of a selected column, in addition to supporting normal joins.

Until then, I'm fiddling with the best way to alias tables, e.g.:

Issue.as('t1')
  .select('t1.title')
  .join(Issue, 'parent_id', 'id').as('t2')
  .where('t2.status', '=', 'blocked')

vs

Issue.select('title').join(Issue, 't2', 'parent_id', 'id').where('t2.status', '=', 'blocked')

Or do all joins as nested joins 🤔, in which case the where/select parts become paths:

Issue.select('title', 'parent.title')
  .nest(Issue, 'parent', 'parent_id', 'id')
  .where('issue.parent.status', '=', 'blocked');
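A minimal sketch of how a where clause over a path could be evaluated against a nested row, assuming rows are plain objects with the nested record stored under its alias. `resolvePath`, `whereEq`, and the sample data are hypothetical; paths here are relative to the row, so the leading `issue.` segment from the snippet above is omitted.

```typescript
// Hypothetical nested row: the parent issue is embedded under `parent`
// instead of being flattened into extra columns.
type NestedRow = {[key: string]: unknown};

// Follow a dotted path such as 'parent.status' through nested objects.
// Returns undefined as soon as a segment is missing or not an object.
function resolvePath(row: NestedRow, path: string): unknown {
  let cur: unknown = row;
  for (const segment of path.split('.')) {
    if (cur === null || typeof cur !== 'object') {
      return undefined;
    }
    cur = (cur as NestedRow)[segment];
  }
  return cur;
}

// A where clause over a path: keep rows whose value at `path` equals `value`.
function whereEq(rows: readonly NestedRow[], path: string, value: unknown): NestedRow[] {
  return rows.filter(r => resolvePath(r, path) === value);
}

const rows: NestedRow[] = [
  {title: 'fix bug', parent: {title: 'epic', status: 'blocked'}},
  {title: 'ship it', parent: {title: 'launch', status: 'open'}},
  {title: 'orphan'},
];
const blocked = whereEq(rows, 'parent.status', 'blocked');
```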

Alternative nest-join ideas:

Issue.select(
  'title',
  nest('parent', Issue.select(...)).on('parent_id', 'id')
).where('issue.parent.status', '=', 'blocked');

Issue.select(
  'title',
  nest('owner', User.select('name'))
    .on('owner_id', 'id')
)

I think the nest idea is cleanest.

  1. It can replace correlated sub-queries for most use cases
  2. Where clauses become simpler since the user can operate on the entire aggregate

Drawbacks:

  1. The outer object is copied to nest something inside it
  2. The nested collection is copied if something inside of it changes
  3. Junction tables show up in the results
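A minimal sketch of a nest join over plain arrays, assuming rows are plain objects; `nest` and its parameters are hypothetical, not the ZQL API. Each outer row is shallow-copied and gets its matching inner rows attached under the alias, which illustrates the first drawback: nesting means copying the outer object. An incremental version would also have to rebuild the nested array whenever a row inside it changes (the second drawback).

```typescript
type Row = Record<string, unknown>;

// Hypothetical nest join: instead of flattening each match into a joined
// tuple, attach all matching inner rows to the outer row under `alias`.
function nest<O extends Row, I extends Row, K extends string>(
  outer: readonly O[],
  inner: readonly I[],
  alias: K,
  outerKey: keyof O,
  innerKey: keyof I,
): Array<O & Record<K, I[]>> {
  // Index inner rows by join key so each outer row needs one lookup.
  const byKey = new Map<unknown, I[]>();
  for (const row of inner) {
    const key = row[innerKey];
    const bucket = byKey.get(key);
    if (bucket) {
      bucket.push(row);
    } else {
      byKey.set(key, [row]);
    }
  }
  return outer.map(o => {
    // Shallow copy of the outer row (drawback 1).
    const copy: Row = {...o};
    copy[alias] = byKey.get(o[outerKey]) ?? [];
    return copy as O & Record<K, I[]>;
  });
}

const issues = [
  {id: 1, title: 'child', parent_id: 2},
  {id: 2, title: 'epic', parent_id: 0},
];
// Self-join: each issue gets a `parent` array holding its parent issue, if any.
const withParent = nest(issues, issues, 'parent', 'parent_id', 'id');
```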

src/zql/ivm/graph/operators/reduce-operator.test.ts (outdated, resolved)
@@ -38,6 +39,10 @@ export abstract class OperatorBase<O extends object> implements Operator {
}

commit(v: Version) {
if (v <= this.#lastCommit) {
Contributor

How does this happen again?

Author

It happens for operators with more than one input. We send the commit notification down the graph, so such an operator receives one commit notification per input (twice for a two-input operator like join).

An alternative would be to not send commit down the graph and instead have every effect and view register with the global materialite instance if they run in a given transaction. On commit, they'd be notified directly.
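A minimal sketch of the guard in the diff above, assuming `Version` is a monotonically increasing number. `TwoInputOperator` and its `forwarded` log are hypothetical stand-ins for a real operator and its downstream notification. Both inputs forward the same commit, and the `v <= this.#lastCommit` check turns the second delivery into a no-op.

```typescript
type Version = number;

// Hypothetical operator with two upstream inputs. Each input forwards the
// same commit, so the guard ensures downstream sees each version only once.
class TwoInputOperator {
  #lastCommit: Version = -1;
  readonly forwarded: Version[] = [];

  commit(v: Version): void {
    // A repeat notification for an already-committed version is ignored.
    if (v <= this.#lastCommit) {
      return;
    }
    this.#lastCommit = v;
    // Stand-in for notifying downstream operators.
    this.forwarded.push(v);
  }
}

const op = new TwoInputOperator();
op.commit(1); // arrives via the left input
op.commit(1); // arrives via the right input: ignored
op.commit(2);
```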

Author

Working on a separate PR that does not send commit through the graph. Only effects and views need to know about commits (so they can call their listeners), so not sending commit through the graph saves work.
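A rough sketch of that alternative, assuming a single registry object plays the role of the global materialite instance. `CommitRegistry`, `CommitListener`, `register`, and `View` are hypothetical names. Effects and views register when they run in a transaction, and on commit each one is notified directly, exactly once, without anything flowing through the operator graph.

```typescript
type Version = number;

// Hypothetical listener interface implemented by effects and views.
interface CommitListener {
  onCommit(v: Version): void;
}

// Hypothetical stand-in for the global materialite instance.
class CommitRegistry {
  #pending = new Set<CommitListener>();

  // Called when an effect or view runs inside the current transaction.
  // The Set de-duplicates repeat registrations within one transaction.
  register(listener: CommitListener): void {
    this.#pending.add(listener);
  }

  // Notify each registered listener directly, then reset for the next transaction.
  commit(v: Version): void {
    const listeners = Array.from(this.#pending);
    this.#pending.clear();
    for (const listener of listeners) {
      listener.onCommit(v);
    }
  }
}

class View implements CommitListener {
  readonly seen: Version[] = [];
  onCommit(v: Version): void {
    this.seen.push(v);
  }
}

const registry = new CommitRegistry();
const view = new View();
registry.register(view);
registry.register(view); // touched twice in the same transaction
registry.commit(1);
registry.commit(2); // the view did not run in this transaction, so it is not notified
```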

src/zql/ivm/types.ts (outdated, resolved)
}
}

// export
Contributor

remove

src/zql/ivm/graph/operators/join-operator.ts (outdated, resolved)
src/zql/ivm/graph/operators/join-operator.ts (outdated, resolved)
src/zql/ivm/graph/operators/join-operator.test.ts (outdated, resolved)
],
]);
items.length = 0;
});
Contributor

Please add some tests where newDifference is called more than once per stream and version.

Author

Good catch. I need to take the version into account in the operator to handle this case.

Author

This was actually working fine; I had written the test case incorrectly 😅. I added a new test for this case that does the correct thing.
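One way to see why repeated calls within a version can work without version tracking: a minimal incremental equi-join sketch, assuming each side keeps an index of every row it has seen and that changes carry multiplicities (+1 for an insert, -1 for a delete). `IncrementalJoin`, `pushLeft`, and `pushRight` are hypothetical stand-ins for per-input newDifference handling, not the PR's operator. Each delta is joined against the other side's current index and then added to its own, so two deltas arriving on the same input in one version compose correctly.

```typescript
// A row paired with its multiplicity: +1 for an insert, -1 for a delete.
type Entry<T> = [row: T, mult: number];

// Append an entry to the bucket for `key`, creating the bucket if needed.
function appendTo<T>(index: Map<unknown, Entry<T>[]>, key: unknown, entry: Entry<T>): void {
  const bucket = index.get(key);
  if (bucket) {
    bucket.push(entry);
  } else {
    index.set(key, [entry]);
  }
}

// Hypothetical incremental inner equi-join; not the PR's join operator.
class IncrementalJoin<A, B> {
  readonly #left = new Map<unknown, Entry<A>[]>();
  readonly #right = new Map<unknown, Entry<B>[]>();

  constructor(
    private readonly leftKey: (a: A) => unknown,
    private readonly rightKey: (b: B) => unknown,
  ) {}

  // Join a left delta against everything indexed on the right so far,
  // then index it so later right deltas can match it.
  pushLeft(delta: Entry<A>[]): Entry<[A, B]>[] {
    const out: Entry<[A, B]>[] = [];
    for (const [a, m] of delta) {
      const key = this.leftKey(a);
      for (const [b, n] of this.#right.get(key) ?? []) {
        out.push([[a, b], m * n]);
      }
      appendTo(this.#left, key, [a, m]);
    }
    return out;
  }

  // Mirror image of pushLeft for the right input.
  pushRight(delta: Entry<B>[]): Entry<[A, B]>[] {
    const out: Entry<[A, B]>[] = [];
    for (const [b, n] of delta) {
      const key = this.rightKey(b);
      for (const [a, m] of this.#left.get(key) ?? []) {
        out.push([[a, b], m * n]);
      }
      appendTo(this.#right, key, [b, n]);
    }
    return out;
  }
}

type Issue = {id: number; parent_id: number};
type Parent = {id: number; title: string};
const issueJoin = new IncrementalJoin<Issue, Parent>(i => i.parent_id, p => p.id);

// Two differences on the same (left) input within one version.
const r1 = issueJoin.pushLeft([[{id: 1, parent_id: 7}, 1]]);
const r2 = issueJoin.pushLeft([[{id: 2, parent_id: 7}, 1]]);
// The parent arrives on the right input and matches both children.
const r3 = issueJoin.pushRight([[{id: 7, title: 'epic'}, 1]]);
// Deleting child 1 retracts its joined row.
const r4 = issueJoin.pushLeft([[{id: 1, parent_id: 7}, -1]]);
```

For brevity the indexes are never compacted, so a deleted row stays in its bucket as a +1 and a -1 entry whose joined outputs cancel out.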

@arv (Contributor) left a comment

LGTM

@tantaman (Author) commented Apr 9, 2024

Updated to always iterate over the smaller collection in the outer loop.

This made our synthetic IDs for join rows non-deterministic, because the outer-loop row's ID was concatenated first, followed by the inner-loop row's. I added a sort step so that a join always produces the same ID for the same two rows.
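A minimal sketch of both changes, assuming rows carry an `id` of type string or number. `joinRowId`, `nestedLoopJoin`, and `Joined` are hypothetical names, and JSON-encoding the sorted pair is one possible ID format, not necessarily the PR's. The same (child, parent) pair is visited with the child in the outer loop in the first call and in the inner loop in the second, yet it gets the same ID both times.

```typescript
type Id = string | number;
interface Row {
  id: Id;
}

// Hypothetical synthetic id for a joined row. Sorting the two component ids
// makes the result independent of which row came from the outer loop.
// JSON encoding avoids ambiguity when an id contains a separator character.
function joinRowId(a: Id, b: Id): string {
  return JSON.stringify([String(a), String(b)].sort());
}

interface Joined<A, B> {
  id: string;
  left: A;
  right: B;
}

// Nested-loop join that always puts the smaller collection in the outer loop.
function nestedLoopJoin<A extends Row, B extends Row>(
  left: readonly A[],
  right: readonly B[],
  on: (a: A, b: B) => boolean,
): Joined<A, B>[] {
  const out: Joined<A, B>[] = [];
  const emit = (a: A, b: B) => {
    if (on(a, b)) {
      out.push({id: joinRowId(a.id, b.id), left: a, right: b});
    }
  };
  if (left.length <= right.length) {
    for (const a of left) for (const b of right) emit(a, b);
  } else {
    for (const b of right) for (const a of left) emit(a, b);
  }
  return out;
}

const parents = [{id: 'p1'}, {id: 'p2'}];
const fewChildren = [{id: 'c1'}];
const manyChildren = [{id: 'c1'}, {id: 'c2'}, {id: 'c3'}];

// Children are the outer loop in the first call and the inner loop in the second.
const before = nestedLoopJoin(fewChildren, parents, () => true);
const after = nestedLoopJoin(manyChildren, parents, () => true);
```

One consequence of sorting is that the ID is symmetric: in a self-join, the pairs (a, b) and (b, a) would map to the same ID.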

@tantaman merged commit 12607f1 into main on Apr 9, 2024
4 checks passed
@tantaman deleted the mlaw/join branch on April 9, 2024, 00:38
}
}
}
return ret;
}

#concatIds(idA: string | number, idB: string | number) {
Contributor

It is generally better to use module-level functions than private methods. Only use private methods when you need access to other (private) state.

@arv (Contributor) left a comment

Still LGTM
