Group By and Aggregated Values #1312

sorenbs · 2017-11-21T11:26:32Z

#70 was a wide-ranging discussion of how to support GroupBy and Aggregations in a type-safe GraphQL API. This issue takes the learnings from previous discussions and provides a final API proposal.

Throughout this proposal the examples will be based on this data schema:

type User {
  id: ID! @unique
  name: String!
  age: Int!
  salaryBracket: String!
  city: String!
}

Note: According to #353 we will introduce a new API version that combines the capabilities of the Simple and Relay API. The API is not final yet, but there will be a relay-style connection field for all relations, providing us a convenient place to introduce aggregation fields.

Retrieving all users who live in Aarhus:

{
  allUsersConnection(where: {city: "Aarhus"}) {
    edges {
      node { id, name }
    }
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    edges: [
      { node: { id: "1", name: "Søren" } },
      { node: { id: "2", name: "Tim" } }
    ]
  }
}

Aggregations

Aggregate functions

avg
median
max
min
count
sum

API

Getting the average age of people living in Aarhus is accomplished like this in SQL:

SELECT AVG(age) FROM User WHERE city = 'Aarhus'

With Prisma it would look like this:

{
  allUsersConnection(where: {city: "Aarhus"}) {
    aggregate {
      avg {
        age
      }
    }
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    aggregate: {
      avg: {
        age: 33
      }
    }
  }
}
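
As a rough illustration of the intended semantics (not of how Prisma would implement it), the query above can be emulated in Python over the example data. The `users` list and the `avg_age` helper are names made up for this sketch.

```python
from statistics import mean

# Example data from above, trimmed to the fields the query touches.
users = [
    {"id": "1", "name": "Søren", "age": 23, "city": "Aarhus"},
    {"id": "2", "name": "Tim", "age": 43, "city": "Aarhus"},
    {"id": "3", "name": "Nilan", "age": 99, "city": "Magdeburg"},
]


def avg_age(rows, city):
    """Hypothetical helper: apply the where filter, then average the age field."""
    selected = [row["age"] for row in rows if row["city"] == city]
    return mean(selected)


# Mirror the shape of the GraphQL response.
result = {
    "allUsersConnection": {"aggregate": {"avg": {"age": avg_age(users, "Aarhus")}}}
}
print(result)
```

Only the two Aarhus rows (ages 23 and 43) enter the average, which matches the return value shown above.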

Limiting the scope of aggregations

The normal where, skip, first and orderBy arguments can be used to limit the scope of data included in the aggregations:

{
  allUsersConnection(where: {city: "Aarhus"}, first: 5, orderBy: AGE_DESC) {
    aggregate {
      avg {
        age
      }
    }
  }
}

This will return the average age of the 5 oldest people in Aarhus.

See example return value

Data:

[
  {id: "1", name: "Søren", age: 99, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 99, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Aarhus"},
  {id: "4", name: "Johannes", age: 99, salaryBracket: "0-5", city: "Aarhus"},
  {id: "5", name: "Mathias", age: 99, salaryBracket: "50-80", city: "Aarhus"},
  {id: "6", name: "Marcus", age: 5, salaryBracket: "0-5", city: "Aarhus"}
]

Return value:

{
  allUsersConnection: {
    aggregate: {
      avg: {
        age: 99
      }
    }
  }
}
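
The proposal does not spell out the order in which the arguments apply. The sketch below assumes filter, then sort, then skip/first, then aggregate, which is the reading consistent with "the 5 oldest people in Aarhus". `scoped_avg_age` is a hypothetical helper, not part of any API.

```python
from statistics import mean

# Example data from above, trimmed to the fields the query touches.
users = [
    {"name": "Søren", "age": 99, "city": "Aarhus"},
    {"name": "Tim", "age": 99, "city": "Aarhus"},
    {"name": "Nilan", "age": 99, "city": "Aarhus"},
    {"name": "Johannes", "age": 99, "city": "Aarhus"},
    {"name": "Mathias", "age": 99, "city": "Aarhus"},
    {"name": "Marcus", "age": 5, "city": "Aarhus"},
]


def scoped_avg_age(rows, city, first, skip=0):
    """Assumed order: where -> orderBy AGE_DESC -> skip/first -> avg."""
    filtered = [row for row in rows if row["city"] == city]
    ordered = sorted(filtered, key=lambda row: row["age"], reverse=True)
    window = ordered[skip:skip + first]
    return mean([row["age"] for row in window])


print(scoped_avg_age(users, "Aarhus", first=5))
```

With `first=5` the youngest user (Marcus, age 5) falls outside the window, so he does not affect the average.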

Larger example

Combining aggregations and data retrieval:

{
  allUsersConnection(where: {city: "Aarhus"}) {
    aggregate {
      avg {
        age
      }
      max {
        age
      }
    }
    edges {
      node { name, age }
    }
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    aggregate: {
      avg: {
        age: 33
      }
      max: {
        age: 43
      }
    }
    edges: [
      { node: { name: "Søren", age: 23 } },
      { node: { name: "Tim", age: 43 } }
    ] 
  }
}

Group

In relational databases, GROUP BY is most often used together with aggregation functions, like this: SELECT city, AVG(age) FROM User GROUP BY city

Because GraphQL returns tree structured data, it is quite compelling to use groupBy without aggregation functions:

{
  allUsersConnection {
    groupBy {
      city {
        key
        connection {
          edges {
            node { id, name }
          }
        }
      }
    }    
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    groupBy: {
      city: [
        {
          key: "Aarhus"
          connection: {
            edges: [
              { node: { id: "1", name: "Søren" } },
              { node: { id: "2", name: "Tim" } }
            ]
          }
        },
        {
          key: "Magdeburg"
          connection: {
            edges: [
              { node: { id: "3", name: "Nilan" } }
            ]
          }
        }
      ]
    }    
  }
}
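
A minimal Python sketch of what groupBy without aggregation amounts to: rows are bucketed by a single field and each bucket becomes a `key` plus a nested `connection`. The `group_by` helper is hypothetical, and the order of groups (first-seen here) is an assumption; the proposal does not define it.

```python
# Example data from above, trimmed to the fields the query touches.
users = [
    {"id": "1", "name": "Søren", "city": "Aarhus"},
    {"id": "2", "name": "Tim", "city": "Aarhus"},
    {"id": "3", "name": "Nilan", "city": "Magdeburg"},
]


def group_by(rows, field):
    """Bucket rows by one field, keeping keys in first-seen order."""
    buckets = {}
    for row in rows:
        buckets.setdefault(row[field], []).append(row)
    return [
        {
            "key": key,
            "connection": {
                "edges": [
                    {"node": {"id": r["id"], "name": r["name"]}} for r in members
                ]
            },
        }
        for key, members in buckets.items()
    ]


result = {"allUsersConnection": {"groupBy": {"city": group_by(users, "city")}}}
print(result)
```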

Or even in multiple levels:

{
  allUsersConnection {
    groupBy {
      city {
        key
        connection {
          groupBy {
            salaryBracket {
              key
              connection {
                edges {
                  node { id, name }
                }
              }
            }
          }
        }
      }
    }    
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"},
  {id: "4", name: "Dom", age: 99, salaryBracket: "50-80", city: "Aarhus"}
]

Return value:

{
  allUsersConnection: {
    groupBy: {
      city: [
        {
          key: "Aarhus"
          connection: {
            groupBy: {
              salaryBracket: [
                {
                  key: "0-5"
                  connection: {
                    edges: [
                      { node: { id: "1", name: "Søren" } }
                    ]
                  }
                },
                {
                  key: "50-80"
                  connection: {
                    edges: [
                      { node: { id: "2", name: "Tim" } },
                      { node: { id: "4", name: "Dom" } }
                    ]
                  }
                }
              ]
            }
          }
        },
        {
          key: "Magdeburg"
          connection: {
            groupBy: {
              salaryBracket: [
                {
                  key: "0-5"
                  connection: {
                    edges: [
                      { node: { id: "3", name: "Nilan" } }
                    ]
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
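
Multi-level grouping is the same bucketing step applied recursively inside each group's connection. The sketch below assumes that reading; `group_nested` is a hypothetical helper and the first-seen ordering of keys is again an assumption.

```python
# Example data from above, trimmed to the fields the query touches.
users = [
    {"id": "1", "name": "Søren", "salaryBracket": "0-5", "city": "Aarhus"},
    {"id": "2", "name": "Tim", "salaryBracket": "50-80", "city": "Aarhus"},
    {"id": "3", "name": "Nilan", "salaryBracket": "0-5", "city": "Magdeburg"},
    {"id": "4", "name": "Dom", "salaryBracket": "50-80", "city": "Aarhus"},
]


def group_nested(rows, fields):
    """Group by fields[0], then recurse on the remaining fields."""
    if not fields:
        # Innermost level: a plain connection with edges.
        return {"edges": [{"node": {"id": r["id"], "name": r["name"]}} for r in rows]}
    field, rest = fields[0], fields[1:]
    buckets = {}
    for row in rows:
        buckets.setdefault(row[field], []).append(row)
    return {
        "groupBy": {
            field: [
                {"key": key, "connection": group_nested(members, rest)}
                for key, members in buckets.items()
            ]
        }
    }


result = {"allUsersConnection": group_nested(users, ["city", "salaryBracket"])}
print(result)
```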

Combining groupBy and aggregations

The following query groups by city and returns, for each city, the 2 oldest users, the average age of those 2 users, and the average age of everyone in the city:

{
  allUsersConnection {
    groupBy {
      city {
        key
        firstTwo: connection(first: 2, orderBy: AGE_DESC) {
          edges {
            node { name }
          }
          aggregate {
            avg {
              age
            }
          }
        }
        allInCity: connection {
          aggregate {
            avg {
              age
            }
          }
        }
      }
    }    
  }
}

See example return value

Data:

[
  {id: "1", name: "Emanuel", age: 11, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "3", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "4", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    groupBy: {
      city: [
        {
          key: "Aarhus"
          firstTwo: {
            edges: [
              { node: { name: "Tim" } },
              { node: { name: "Søren" } }
            ]
            aggregate: {
              avg: {
                age: 33
              }
            }
          }
          allInCity: {
            aggregate: {
              avg: {
                age: 25.666
              }
            }
          }
        },
        {
          key: "Magdeburg"
          firstTwo: {
            edges: [
              { node: { name: "Nilan" } }
            ]
            aggregate: {
              avg: {
                age: 99
              }
            }
          }
          allInCity: {
            aggregate: {
              avg: {
                age: 99
              }
            }
          }
        }
      ]
    }    
  }
}
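
To check the numbers in this example, the two aliased connections can be emulated in Python. This is only a sketch of the assumed semantics: within each group, `first` and `orderBy` narrow the rows, and the aggregate is computed over that same narrowed set. The `connection` helper and its parameter names are made up for illustration.

```python
from statistics import mean

# Example data from above, trimmed to the fields the query touches.
users = [
    {"name": "Emanuel", "age": 11, "city": "Aarhus"},
    {"name": "Søren", "age": 23, "city": "Aarhus"},
    {"name": "Tim", "age": 43, "city": "Aarhus"},
    {"name": "Nilan", "age": 99, "city": "Magdeburg"},
]


def connection(rows, first=None, order_by_age_desc=False):
    """Optional sort and limit, then edges plus avg age over the same rows."""
    if order_by_age_desc:
        rows = sorted(rows, key=lambda r: r["age"], reverse=True)
    if first is not None:
        rows = rows[:first]
    return {
        "edges": [{"node": {"name": r["name"]}} for r in rows],
        "aggregate": {"avg": {"age": mean([r["age"] for r in rows])}},
    }


# Group by city (first-seen key order is an assumption).
by_city = {}
for user in users:
    by_city.setdefault(user["city"], []).append(user)

result = [
    {
        "key": city,
        "firstTwo": connection(members, first=2, order_by_age_desc=True),
        "allInCity": connection(members),
    }
    for city, members in by_city.items()
]
print(result)
```

A group with fewer rows than `first` simply returns what it has, which is why Magdeburg's `firstTwo` contains a single edge.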

Limitations

Both groupBy and aggregations are on single fields only. You can filter the data that goes into the aggregation, but there is no way to use expressions as keys in a group by query.


ejoebstl · 2017-11-21T17:14:30Z

Hello Soren,
currently contemplating over your proposal. Could you please add the underlying schema as well? It's probably trivial, but I would like to rule out mistakes on my end.

ejoebstl · 2017-11-21T18:22:15Z

For the multiple level group, can you please add example data (ungrouped as well as grouped)? I can't quite grasp the concept of multi-level groups.

sorenbs · 2017-11-21T22:07:03Z

@ejoebstl I have added example responses to all queries. This should make the proposed dynamics very clear :-) Looking forward to your feedback.

The multi-level groups are really very simple: they exploit the fact that we have a wonderful tree structure to place data into. The more interesting question is whether this is useful or not.

ejoebstl · 2017-11-22T18:49:09Z

It's an excellent idea to allow grouping without aggregation by exploiting the tree structure. That's a main limitation of SQL.

The feature itself is very useful. Until now, when you wanted to group data, you needed to come up with either a relation or do it in your application. Grouping and aggregation is not only incredibly useful for building powerful frontends (think of a search feature for thousands of nodes, where you can filter by fields), but also decreases overhead in the backend by a lot. Even if I just want to gather some statistics about my data using the playground, this makes everything easier.

Some considerations:

Right now it's not possible to use a combination of multiple fields in a groupBy, correct?
Is it possible to use an aggregation inside a filter? Use case for your example: select all users with more than medium age.
I'd suggest to also add a count_distinct aggregation to count all distinct values of a field.
Will this proposal also work for the Simple API, or is the Simple API a thing of the past anyway?

I'm quite sure the proposal is a good way though. The few points above can most likely be added afterwards without any complication.

sorenbs · 2017-11-22T19:02:06Z

> Right now it's not possible to use a combination of multiple fields in a groupBy, correct?

Correct. It's also not possible to use an arbitrary expression. I think this ability might be worth giving up in trade for a simple type-safe API.

> Is it possible to use an aggregation inside a filter? Use case for your example: select all users with more than medium age.

See proposal #1279

> I'd suggest to also add a count_distinct aggregation to count all distinct values of a field.

Great idea!

> Will this proposal also work for the Simple API, or is the Simple API a thing of the past anyway?

In the future there will be only a single API flavour as described in #353

nikolasburk · 2017-12-11T17:15:00Z

There is no example for a count aggregation, I'm guessing it looks like this:

{
  postsConnection {
    aggregate {
      count
    }
  }
}

Please confirm or correct!

kieusonlam · 2017-12-16T03:08:12Z

Is it possible to order by an aggregated value? I'm trying to do something like:
Course
-- Episodes
---- Views
Views model

{
  date: DateTime! @unique 
  views: Int!
}

I want to query the top Courses ordered by daily / weekly / ... views. It would sum all episodes' views between two dates and order by that sum.

jvbianchi · 2018-01-26T11:38:20Z

Why was this issue moved to the graphcool-framework repo?

I thought that Group By and Aggregated Values would be implemented in Prisma.

The Prisma documentation links to this issue

kieusonlam · 2018-01-26T12:09:32Z

@jvbianchi

As far as I know, Graphcool Framework is a GraphQL backend solution. A lot of people, like me, are still using it.

Prisma is not a replacement. It is an open-source GraphQL query engine that can connect to many different databases, not just Graphcool Framework. It's a standalone version of Graphcool 1.0, and the two will go separate ways from now on.

You can read it here: https://www.graph.cool/forum/t/graphcool-framework-and-prisma/2237

I'm still waiting for them to add this feature, because I think I'll stick with Graphcool Framework. :)

Everyone can correct me if I'm wrong.

jvbianchi · 2018-01-26T12:12:59Z

@kieusonlam Ok, but that doesn't explain why this feature will not be implemented in Prisma as well.

The count aggregate function has already been implemented, why not the others too?

kieusonlam · 2018-01-26T12:17:05Z

@jvbianchi It already has this feature. You can check the example here: https://github.com/graphcool/graphql-server-example
The topHomes query has numRatings, which is defined in
https://github.com/graphcool/graphql-server-example/blob/master/src/resolvers/Home.ts

jvbianchi · 2018-01-26T12:23:52Z

@kieusonlam That is what I just said. count has been implemented.

But avg, median, max, min, sum and group by have not.

Do you have an example with any of these other aggregate functions?

kieusonlam · 2018-01-26T12:27:56Z

@jvbianchi Hmm, yup, that's my bad. It's still missing avg, median, max, min, sum. We may have to wait for the Graphcool team to give the right answer.

arnabkd · 2018-06-22T12:01:27Z

shameless bump: begging for this feature ;)

oae · 2018-07-24T18:46:34Z

Any update for this feature?

gentle-noah · 2018-07-26T01:46:16Z

Going to bump as well. Not having this feature == lots more work and poor client performance. :)

kirgene · 2018-08-25T23:06:52Z

Will it be possible to use aggregates in filter query?
For example, to get active users by number of commits they made:

query activeUsers {
  users(where: {
      commits: {
        date_gte: "THIS_MONTH_DATE",
        aggregate: {
          count_gte: 5
        }
      }
    }) {
     email
   }
}

FluorescentHallucinogen · 2018-09-01T08:44:07Z

@sorenbs @schickling This feature is planned for Q3 in 2018. Only 1 month till the end of Q3. Any progress? Will aggregate functions be implemented at once or one by one? I really need avg for my project.

FluorescentHallucinogen · 2018-10-16T13:47:05Z

Q3 2018 is over. Any news?

dortamiguel · 2018-10-23T13:47:30Z

I need max. Is there any way I can get this functionality?

kevinmarrec · 2018-10-30T14:25:54Z

@sorenbs Any news on this? Can the Roadmap label be updated if it's planned for later?

MJones180 · 2018-11-03T22:40:48Z

Q3 has been over for a while and still no response as to the current status of this. An update would be nice 👍

stephen-bunn · 2018-11-06T14:53:36Z

Also looking for an update on the status of this.

sorenbs · 2018-11-14T13:50:58Z

This continues to be an important feature for us. I'll update this issue when we have a concrete timeframe. See also this explanation for why we were unable to ship this feature in Q3 as planned.

@FluorescentHallucinogen - we will likely implement a large chunk of this feature in one go as each individual aggregation is comparatively little work.

joshhopkins · 2019-01-25T11:36:15Z

Any ETA on this? Very much needed 🙏🏼

impowski · 2019-02-05T21:01:10Z

Waiting for this one to drop, there will be a big use in our project in production!

cihadturhan · 2019-03-08T08:04:05Z

At least implement sum :)

terion-name · 2019-03-12T12:49:11Z

any eta?(

par6n · 2019-05-20T08:06:35Z

Bump. :)

EddiG · 2019-07-10T14:46:31Z

@sorenbs The Prisma 2 was released, my congrats 🎉
But what about this issue, do you have any rough estimations?

anton6 · 2019-07-14T11:26:52Z

Will it be possible to use aggregates in filter query?
For example, to get active users by number of commits they made:

query activeUsers {
  users(where: {
      commits: {
        date_gte: "THIS_MONTH_DATE",
        aggregate: {
          count_gte: 5
        }
      }
    }) {
     email
   }
}

I want to echo @kirgene's question. I want to be able to do something similar, but it looks like there is no way to do this.

pbassut · 2021-05-07T21:35:33Z

Are we forgotten?
It's sad to be left with a library with some obvious limitations. I would migrate over to prisma 2 but the process is not so easy.

sorenbs added the rfc/1-draft label Nov 21, 2017

sorenbs self-assigned this Nov 21, 2017

sorenbs mentioned this issue Nov 21, 2017

Grouping and Aggregation #70

Closed

nikolasburk mentioned this issue Nov 21, 2017

[Prisma 1.0] Specifications #353

Closed

32 tasks

sorenbs mentioned this issue Nov 21, 2017

Support filtering on aggregated values #1279

Closed

1 task

schickling mentioned this issue Nov 30, 2017

Prisma: Generated type names #1341

Closed

3 tasks

sorenbs added rfc/2-accepted and removed rfc/1-draft labels Dec 1, 2017

sorenbs added rfc/1-draft and removed rfc/2-accepted labels Dec 9, 2017

mavilein mentioned this issue Dec 14, 2017

[WIP] Prisma #1318

Merged

42 tasks

marktani added this to the 1.0-beta3 milestone Dec 15, 2017

sorenbs modified the milestones: 1.0-beta3, 1.1 Jan 1, 2018

marktani removed this from the 1.1 milestone Jan 22, 2018

marktani mentioned this issue Jan 23, 2018

Group By and Aggregated Values Graphcool/graphcool-framework#416

Closed

marktani closed this as completed Jan 23, 2018

schickling added the roadmap/2018-Q3 label Jul 26, 2018

jesstelford mentioned this issue Aug 3, 2018

Include meta fields along side relationships with a many modifier keystonejs/keystone#181

Closed

Jannis mentioned this issue Aug 15, 2018

Design API for data aggregations graphprotocol/research#65

Open

ejoebstl mentioned this issue Oct 4, 2018

WIP - Update Spec opencrud/opencrud#24

Open

sorenbs removed the roadmap/2018-Q3 label Nov 14, 2018

pantharshit00 added the rfc/1-draft label Jan 10, 2019

techniq mentioned this issue Jan 23, 2019

Support Group By / Aggregated values SimonCropp/GraphQL.EntityFramework#61

Closed

jesstelford mentioned this issue Apr 9, 2019

Add groupBy & aggregate to queries keystonejs/keystone#1016

Closed

Vadorequest mentioned this issue May 28, 2019

Driver: GraphQL metabase/metabase#6297

Open

PascalSenn mentioned this issue Jul 23, 2019

Filters: Aggregations ChilliCream/graphql-platform#924

Closed

janpio closed this as completed Sep 1, 2022
