Skip to content
This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

Aggregates #39

Closed
rnewman opened this issue Aug 23, 2016 · 5 comments
Closed

Aggregates #39

rnewman opened this issue Aug 23, 2016 · 5 comments

Comments

@rnewman
Copy link
Collaborator

rnewman commented Aug 23, 2016

Implement support for aggregates in projection, starting with max.

Aggregation implies grouping, so we will also need to support :with and be careful to deliberately hit the number-of-heads bug.

@rnewman
Copy link
Collaborator Author

rnewman commented Aug 23, 2016

Note also that SQLite supports distinct aggregates:

SELECT sum(distinct x) FROM foo

which will be useful for operations like count-distinct.

@rnewman
Copy link
Collaborator Author

rnewman commented Aug 24, 2016

“Query variables not in aggregate expressions will group the results and appear intact in the result.”

Generate SQL with a GROUP BY clause containing all non-aggregate columns in the projection, plus all columns in a :with clause. (SQLite allows for GROUP BY to refer to non-projected columns, behaving as a subquery, which is exactly what we want in this case.)

The resulting rows (including all variables used in aggregate expressions) should themselves be distinct. We might need work to do this.

The aggregate expressions should translate fairly neatly to SQL, modulo any lurking edge-case semantic differences in comparison:

  • (min ?xs) -> min(xs)
  • (max ?xs) -> max(xs)
  • (count ?xs) -> count(xs)
  • (count-distinct ?xs) -> count(distinct xs)
  • (sum ?xs) -> total(xs) (sum has its oddities)
  • (avg ?xs) -> avg(xs)
  • (median ?xs) — not yet.
  • (variance ?xs) — not yet.
  • (stddev ?xs) — not yet.

@rnewman
Copy link
Collaborator Author

rnewman commented Aug 24, 2016

I believe that queries featuring aggregates must, in the general case, be produced with a subquery. That's because the aggregates are considered to apply to a basis set, the elements of which are unique across the projected and :with-preserved bindings.

So

[:find ?url ?title max(?visitTime)
 :in $ :with ?page :where
 [?page :page/url ?url]
 [?page :page/title ?title]
 [?page :visit/time ?visitTime]]

should result in

SELECT DISTINCT inner.url AS url, inner.title AS title, max(inner.visitTime) AS _max_visitTime
FROM (
  SELECT DISTINCT datoms123.v AS url, datoms…, … AS page
) AS inner
GROUP BY inner.url, inner.title, inner.page

The alternative is that the aggregate function is called for non-distinct rows — non-distinct with or without the influence of :with. That will yield results counter to the expected behavior of q. SELECT DISTINCT applies to values after grouping and aggregation.

It's a little perverse — after all, :with exists precisely to avoid unwanted pruning of results, and it might even be impossible to add enough columns to match SQL's usual behavior, which is a weakness — but so it goes.

@rnewman
Copy link
Collaborator Author

rnewman commented Aug 24, 2016

N.B., type conversion plays in here. For instance, if I query :find (max ?date) against an attribute of type :db.type/instant, I expect the result to be converted accordingly.

I don't think we can simply extract type tags alongside the maxed value: SQLite doesn't guarantee which row from the group will contribute. Instead, this will require schema-aware projection. Fortunately, that's already in place for ordinary queries.

rnewman added a commit that referenced this issue Aug 25, 2016
@rnewman
Copy link
Collaborator Author

rnewman commented Aug 31, 2016

Fixed in b2a1af3.

@rnewman rnewman closed this as completed Aug 31, 2016
rnewman added a commit that referenced this issue Sep 22, 2016
rnewman added a commit that referenced this issue Sep 22, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant