implement a reduce operator & ZQL group-by on top of it #45

Merged (12 commits) on Mar 29, 2024

Conversation

tantaman

All aggregate functions can be modeled as reduce

This adds a reduce operator so we can do:

  1. group by
  2. avg, sum, count, min, max, etc.
  3. group_array

This targets SQL behavior for group-by.
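
A minimal sketch of the claim that aggregates can be modeled as a reduce over each group. The `groupReduce` helper and the `Issue` shape are hypothetical and are not the operator added in this PR; the sketch shows only batch semantics, not incremental maintenance.

```typescript
// Hypothetical row shape, for illustration only.
type Issue = { status: string; assignee: string; created: number };

// Groups rows by a key, then folds each group with a reducer.
function groupReduce<T, K, A>(
  rows: readonly T[],
  keyOf: (row: T) => K,
  init: () => A,
  step: (acc: A, row: T) => A,
): Map<K, A> {
  const out = new Map<K, A>();
  for (const row of rows) {
    const key = keyOf(row);
    const acc = out.has(key) ? (out.get(key) as A) : init();
    out.set(key, step(acc, row));
  }
  return out;
}

const issues: Issue[] = [
  { status: 'open', assignee: 'a', created: 3 },
  { status: 'open', assignee: 'b', created: 1 },
  { status: 'closed', assignee: 'c', created: 2 },
];

// count, min, and group_array expressed as a single reduce per group.
const summary = groupReduce(
  issues,
  r => r.status,
  () => ({ count: 0, minCreated: Infinity, assignees: [] as string[] }),
  (acc, r) => ({
    count: acc.count + 1,
    minCreated: Math.min(acc.minCreated, r.created),
    assignees: [...acc.assignees, r.assignee],
  }),
);
```

An average fits the same shape by carrying a running sum and count in the accumulator and dividing at read time.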

@tantaman
Author

note to self: pullHistory messages can stop at a reduce operator if the operator has already been seeded with history.

if (entries === undefined) {
  continue;
}
this.#index.delete(key);
Contributor

This changes the iteration order of the keys. Is that intentional? I think it would be simpler to only delete if length === 0.

Author

not intentional but also shouldn't matter?

Updating to only delete on length === 0

Contributor

Also, slightly more efficient because there are fewer tombstones in the map storage ;-)
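
A minimal sketch of the "only delete on length === 0" approach discussed in this thread. The `MultiIndex` class is hypothetical and is not the PR's index implementation; it only shows that a key keeps its position in the `Map` iteration order as long as its bucket is non-empty.

```typescript
// Hypothetical index: key -> entries. Not the PR's actual index class.
class MultiIndex<K, V> {
  private readonly index = new Map<K, V[]>();

  add(key: K, value: V): void {
    const entries = this.index.get(key);
    if (entries === undefined) {
      this.index.set(key, [value]);
    } else {
      entries.push(value);
    }
  }

  remove(key: K, value: V): void {
    const entries = this.index.get(key);
    if (entries === undefined) {
      return;
    }
    const i = entries.indexOf(value);
    if (i !== -1) {
      entries.splice(i, 1);
    }
    // Only drop the key once its bucket is empty. Deleting and re-setting
    // a key would move it to the end of the Map's insertion order.
    if (entries.length === 0) {
      this.index.delete(key);
    }
  }

  keys(): K[] {
    return Array.from(this.index.keys());
  }
}

const idx = new MultiIndex<string, number>();
idx.add('a', 1);
idx.add('a', 2);
idx.add('b', 3);
idx.remove('a', 1); // 'a' still has an entry, so it keeps its position
const orderAfterPartialRemove = idx.keys();
idx.remove('a', 2); // bucket is now empty, so the key is deleted
const orderAfterFullRemove = idx.keys();
```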

@tantaman tantaman changed the title implement a reduce operator to support group-by implement a reduce & group-by Mar 25, 2024
@tantaman
Author

tantaman commented Mar 25, 2024

Not sure that I like the aggregation API:

https://github.com/rocicorp/rails/pull/45/files#diff-c88168576b354654e2ec061b5bbab48370b2e04ff6aeec9f2a52d741558d68daR449-R454

q
    .select('status')
    .groupBy('status')
    .array('assignee')
    .min('created');

maybe it should be:

q
    .select('status', agg.array('assignee'), agg.min('created'))
    .groupBy('status')

@tantaman tantaman changed the title implement a reduce & group-by implement a reduce operator & ZQL group-by on top of it Mar 25, 2024
@arv
Contributor

arv commented Mar 25, 2024

Another strawman:

q
  .select('status')
  .agg('array', 'assignee')
  .agg('min', 'created')

But I think I prefer the first suggested API.

constructor(context: Context, tableName: string, ast?: AST) {
  this.#ast = ast ?? {
    table: tableName,
    alias: aliasCount++,
Contributor

What is the alias used for?

Author

it'll be used whenever we compile the query to SQL.

I think I mentioned this before but since the same table could be present many times (through joins and sub-queries) it'll be simplest to give each occurrence a unique alias.

Maybe it is irrelevant to do at this step and the AST -> SQL compiler can do this itself.
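
A rough sketch of the alternative mentioned above, where the AST -> SQL compiler assigns the aliases itself. The `TableRef` type, the `assignAliases` function, and the `t0`, `t1`, ... naming are all assumptions for illustration; the PR does not include a SQL compiler.

```typescript
// Hypothetical, simplified table reference; the real AST has more fields.
type TableRef = { table: string };

// Assigns each occurrence of a table a unique alias at compile time,
// instead of storing a global counter value on the AST.
function assignAliases(refs: readonly TableRef[]): string[] {
  let next = 0;
  return refs.map(ref => `${ref.table} AS t${next++}`);
}

// The same table appearing twice (e.g. a self-join) gets two distinct aliases.
const fromItems = assignAliases([
  { table: 'issue' },
  { table: 'issue' },
  { table: 'label' },
]);
```

Keeping the counter local to one compilation makes the output deterministic for a given query, which a module-level `aliasCount++` does not guarantee.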

Base automatically changed from mlaw/zql-hoist to main March 27, 2024 11:34
new API:

```ts
q.select('foo', agg.min('bar'), agg.array('baz')).groupBy('foo')
```
@tantaman tantaman marked this pull request as draft March 27, 2024 14:32
@tantaman tantaman marked this pull request as ready for review March 27, 2024 14:51
@tantaman
Author

updated the API to allow putting aggregate calls in select:

q
    .select(
      'status',
      agg.array('assignee'),
      agg.min('created', 'minCreated'),
      agg.max('created', 'maxCreated'),
    )

we could eventually support this.
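
A minimal sketch of how aggregate calls inside `select` could be represented as plain data. The `Aggregate` shape and the `make` helper here are hypothetical and may differ from the types in the PR; only the call shapes `agg.array(field)` and `agg.min(field, alias)` come from the discussion above.

```typescript
// Hypothetical descriptor shape; the PR's actual `Aggregate` type may differ.
type Aggregate = {
  readonly aggregate: 'min' | 'max' | 'array';
  readonly field: string;
  readonly alias: string;
};

// Builds one helper per aggregate kind. The alias defaults to the field name.
const make =
  (aggregate: Aggregate['aggregate']) =>
  (field: string, alias: string = field): Aggregate => ({
    aggregate,
    field,
    alias,
  });

const agg = { min: make('min'), max: make('max'), array: make('array') };

// A select list mixing plain columns and aggregate descriptors.
const selection: (string | Aggregate)[] = [
  'status',
  agg.array('assignee'),
  agg.min('created', 'minCreated'),
  agg.max('created', 'maxCreated'),
];

// Output column names, as a query builder might derive them.
const aliases = selection.map(s => (typeof s === 'string' ? s : s.alias));
```

Because each helper returns data rather than a lambda, a builder could in principle compile the same selection to a reduce operator or to SQL.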
@@ -5,6 +5,18 @@
// input to the query builder.
export type Ordering = readonly [readonly string[], 'asc' | 'desc'];
export type Primitive = string | number | boolean | null;

// I think letting users provide their own lambda functions
Contributor

That introduces the possibility of exceptions in the pipeline.

It also makes converting to SQL impossible.

Author

Yeah, the lambdas would only be available client side. What is run server side would have to return a superset of the data.
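
A small sketch of the split described above: the server evaluates only the declarative part and returns a superset, and the client narrows it with a user lambda. `serverQuery`, `clientQuery`, and the `Row` shape are hypothetical stand-ins, and treating a throwing lambda as "no match" is an assumption, not a decision made in this PR.

```typescript
type Row = { id: number; title: string };

// Stand-in for the server: it only understands declarative conditions,
// so it returns every row that might match (a superset).
function serverQuery(rows: readonly Row[], minId: number): Row[] {
  return rows.filter(r => r.id >= minId);
}

// Client-only step: an arbitrary user lambda narrows the superset.
// A throwing lambda is treated as "no match" here (an assumed policy).
function clientQuery(
  rows: readonly Row[],
  minId: number,
  predicate: (row: Row) => boolean,
): Row[] {
  return serverQuery(rows, minId).filter(r => {
    try {
      return predicate(r);
    } catch {
      return false;
    }
  });
}

const rows: Row[] = [
  { id: 1, title: 'fix bug' },
  { id: 2, title: 'add agg' },
  { id: 3, title: 'fix docs' },
];

const superset = serverQuery(rows, 2);
const narrowed = clientQuery(rows, 2, r => r.title.startsWith('fix'));
```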

});
}

groupBy<K extends keyof S['fields']>(...x: K[]) {
Contributor

Suggested change
- groupBy<K extends keyof S['fields']>(...x: K[]) {
+ groupBy<K extends Selectable<S>>(...x: K[]) {

- type alias on `Aggregate<
  AsString<keyof S['fields']>,
  string
>;`
- aggrable - aggregable
- remove operator-index
- test flatMapIter
- tree-shakable `agg` exports
- remove unused await
* Flat maps the items returned from the iterable.
*
* `iter` is a lambda that returns an iterable
* so this function can return an `IterableIterator`
Contributor

Maybe it should take an IterableIterator then?

Also, these should be compatible with the standard methods to simplify transition to them later:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Iterator/flatMap

Author

going to iterate on this separately. IterableIterator is also getting exhausted on me :/

E.g.,

x = flatMapIter(iter, fn);
console.log([...x])
console.log([...x])

second spread is empty :/

Or maybe a better question to answer for myself is why would we need to pull the iterable twice. Seems like it should only ever happen once.

Contributor

For the second log to be non-empty, x needs to be an object whose [Symbol.iterator](): Iterator method returns a new iterator every time.

This API was something that I felt we didn't get quite right for ES6. It is always confusing.
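
A minimal sketch of the point above: an object whose `[Symbol.iterator]()` returns a fresh iterator on every call can be spread more than once. `flatMapIterable` is a hypothetical variant written for illustration, not the `flatMapIter` in this PR; it takes a factory for the source so that each pass starts from the beginning.

```typescript
// Hypothetical variant of `flatMapIter` that returns a re-iterable object.
// `source` is a factory so each pass starts from a fresh iterable.
function flatMapIterable<T, U>(
  source: () => Iterable<T>,
  fn: (item: T) => Iterable<U>,
): Iterable<U> {
  return {
    // Called once per `for...of` or spread, so every pass gets a new generator.
    *[Symbol.iterator]() {
      for (const item of source()) {
        yield* fn(item);
      }
    },
  };
}

const x = flatMapIterable(
  () => [1, 2, 3],
  n => [n, n * 10],
);
const first = [...x];
const second = [...x]; // not exhausted: a new generator is created per spread
```

By contrast, a generator object is its own iterator (its `[Symbol.iterator]()` returns `this`), which is why a second spread over an `IterableIterator` comes back empty.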

@tantaman tantaman merged commit 6f1ba16 into main Mar 29, 2024
4 checks passed
@tantaman tantaman deleted the mlaw/group-by branch March 29, 2024 19:55