Group By and Aggregated Values #416

marktani · 2018-01-23T14:24:54Z

Issue by sorenbs
Tuesday Nov 21, 2017 at 11:26 GMT
Originally opened as https://github.com/graphcool/prisma/issues/1312

#70 Was a wide ranging discussion of how to support GroupBy and Aggregations in a type safe GraphQL API. This issue takes the learnings from previous discussions and provides a final API Proposal.

Throughout this proposal the examples will be based on this data schema:

type User {
  id: ID! @unique
  name: String!
  age: Int!
  salaryBracket: String!
  city: String!
}

Note: According to #353 we will introduce a new API version that combines the capabilities of the Simple and Relay API. The API is not final yet, but there will be a relay-style connection field for all relations, providing us a convenient place to introduce aggregation fields.

Retrieving all users who live in Aarhus:

{
  allUsersConnection(filter: {city: "Aarhus"}) {
    edges {
      node { id, name }
    }
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection {
    edges: [
      { node: { id: "1", name: "Søren" } },
      { node: { id: "2", name: "Karl" } }
    ]
  }
}

Aggregations

Aggregate functions

avg
median
max
min
count
sum

API

Getting the average age of people living in Aarhus is accomplished like this in SQL:

SELECT AVG(age) FROM User WHERE city = 'Aarhus'

With Graphcool it would look like this:

{
  allUsersConnection(filter: {city: "Aarhus"}) {
    aggregate {
      avg {
        age
      }
    }
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    aggregate: {
      avg: {
        age: 33
      }
    }
  }
}

Limiting the scope of aggregations

The normal filter, skip, first and orderBy arguments can be used to limit the scope of data included in the aggregations:

{
  allUsersConnection(filter: {city: "Aarhus"}, first: 5, orderBy AGE_DESC) {
    aggregate {
      avg {
        age
      }
    }
  }
}

This will return the average age of the 5 oldest people in Aarhus

See example return value

Data:

[
  {id: "1", name: "Søren", age: 99, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 99, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Aarhus"},
  {id: "4", name: "Johannes", age: 99, salaryBracket: "0-5", city: "Aarhus"},
  {id: "5", name: "Mathias", age: 99, salaryBracket: "50-80", city: "Aarhus"},
  {id: "6", name: "Marcus", age: 5, salaryBracket: "0-5", city: "Aarhus"}
]

Return value:

{
  allUsersConnection: {
    aggregate: {
      avg: {
        age: 99
      }
    }
  }
}

Larger example

combining aggregations and data retrieval:

{
  allUsersConnection(filter: {city: "Aarhus"}) {
    aggregate {
      avg {
        age
      }
      max {
        age
      }
    }
    edges {
      node { name, age }
    }
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection {
    aggregate: {
      avg: {
        age: 33
      }
      max: {
        age: 43
      }
    }
    edges: [
      { node: { name: "Søren", age: 23 } },
      { node: { name: "Tim", age: 43 } }
    ] 
  }
}

Group

In relational databases, GROUP BY is most often used together with aggregation functions like this SELECT city, AVG(age) FROM User GROUP BY city

Because GraphQL returns tree structured data, it is quite compelling to use groupBy without aggregation functions:

{
  allUsersConnection {
    groupBy {
      city {
        key
        connection {
          edges {
            node { id, name }
          }
        }
      }
    }    
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    groupBy: {
      city: [
        {
          key: "Aarhus"
          connection: {
            edges: [
              { node: { id: "1", name: "Søren" } },
              { node: { id: "2", name: "Tim" } }
            ]
          }
        },
        {
          key: "Magdeburg"
          connection: {
            edges: [
              { node: { id: "3", name: "Nilan" } }
            ]
          }
        }
      ]
    }    
  }
}

Or even in multiple levels:

{
  allUsersConnection {
    groupBy {
      city {
        key
        connection {
          groupBy {
            salaryBracket {
              key
              connection {
                edges {
                  node { id, name }
                }
              }
            }
          }
        }
      }
    }    
  }
}

See example return value

Data:

[
  {id: "1", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "3", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"},
  {id: "4", name: "Dom", age: 99, salaryBracket: "50-80", city: "Aarhus"}
]

Return value:

{
  allUsersConnection: {
    groupBy: {
      city: [
        {
          key: "Aarhus"
          connection: {
            groupBy: {
              salaryBracket: [
                {
                  key: "0-5"
                  connection: {
                    edges: [
                      { node: { id: "1", name: "Søren" } }
                    ]
                  }
                },
                {
                  key: "50-80"
                  connection: {
                    edges: [
                      { node: { id: "2", name: "Tim" } },
                      { node: { id: "4", name: "Dom" } }
                    ]
                  }
                ]
              }
            }
          }
        },
        {
          key: "Magdeburg"
          connection: {
            groupBy: {
              salaryBracket: [
                {
                  key: "0-5"
                  connection: {
                    edges: [
                      { node: { id: "3", name: "Nilan" } }
                    ]
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Combining groupBy and aggregations

The following query will group by city, return first 5 Users, average age of first 5 users and average age of everyone in city

{
  allUsersConnection {
    groupBy {
      city {
        key
        firstTwo: connection(first: 2, orderBy: AGE_DESC) {
          edges {
            node { name }
          }
          aggregate {
            avg {
              age
            }
          }
        }
        allInCity: connection {
          aggregate {
            avg {
              age
            }
          }
        }
      }
    }    
  }
}

See example return value

Data:

[
  {id: "1", name: "Emanuel", age: 11, salaryBracket: "0-5", city: "Aarhus"},
  {id: "2", name: "Søren", age: 23, salaryBracket: "0-5", city: "Aarhus"},
  {id: "3", name: "Tim", age: 43, salaryBracket: "50-80", city: "Aarhus"},
  {id: "4", name: "Nilan", age: 99, salaryBracket: "0-5", city: "Magdeburg"}
]

Return value:

{
  allUsersConnection: {
    groupBy {
      city: [
        {
          key: "Aarhus"
          firstTwo: {
            edges: [
              { node: { name: "Tim" } },
              { node: { name: "Søren" } }
            ]
            aggregate: {
              avg: {
                age: 33
              }
            }
          }
          allInCity: connection {
            aggregate: {
              avg: {
                age: 25.666
              }
            }
          }
        },
        {
          key: "Magdeburg"
          firstTwo: {
            edges: [
              { node: { name: "Nilan" } },
              { node: { name: "Søren" } }
            ]
            aggregate: {
              avg: {
                age: 99
              }
            }
          }
          allInCity: connection {
            aggregate: {
              avg: {
                age: 99
              }
            }
          }
        }
      ]
    }    
  }
}

Limitations

Both groupBy and aggregations are on single fields only. You can filter the data that goes into the aggregation, but there is no way to use expressions as keys in a group by query.

The text was updated successfully, but these errors were encountered:

marktani · 2018-01-23T14:24:57Z

Comment by ejoebstl
Tuesday Nov 21, 2017 at 17:14 GMT

Hello Soren,
currently contemplating over your proposal. Could you please add the underlying schema as well? It's probably trivial, but I would like to rule out mistakes on my end.

marktani · 2018-01-23T14:24:58Z

Comment by ejoebstl
Tuesday Nov 21, 2017 at 18:22 GMT

For the multiple level group, can you please add example data (ungrouped as well as grouped)? I can't quite grasp the concept of multi-level groups.

marktani · 2018-01-23T14:25:00Z

Comment by sorenbs
Tuesday Nov 21, 2017 at 22:07 GMT

@ejoebstl I have added example responses to all queries. This should make the proposed dynamics very clear :-) Looking forward to your feedback.

The multi level groups are really very simple. By exploiting the fact that we have a wonderful tree structure to place data into. The more interesting question is wether this is useful or not.

marktani · 2018-01-23T14:25:02Z

Comment by ejoebstl
Wednesday Nov 22, 2017 at 18:49 GMT

It's an excellent idea to allow grouping without aggregation by exploiting the three structure. That's a main limitation of SQL.

The feature itself is very useful. Until now, when you wanted to group data, you needed to come up with either a relation or do it in your application. Grouping and aggregation is not only incredibly useful for building powerful frontends (think of a search feature for thousands of nodes, where you can filter by fields), but also decreases overhead in the backend by a lot. Even if I just want to gather some statistics about my data using the playground, this makes everything easier.

Some considerations:

Right now it's not possible to use a combination of multiple fields in a groupBy, correct?
Is it possible to use an aggregation inside a filter? Use case for your example: select all users with more than medium age.
I'd suggest to also add a count_distinct aggregation to count all distinct values of a field.
Will this proposal also work for the Simple API, or is the Simple API a thing of the past anyway?

I'm quite sure the proposal is a good way though. The few points above can most likely be added afterwards without any complication.

marktani · 2018-01-23T14:25:04Z

Comment by sorenbs
Wednesday Nov 22, 2017 at 19:02 GMT

Right now it's not possible to use a combination of multiple fields in a groupBy, correct?

Correct. It's also not possible to use an arbitrary expression. I think this ability might be worth giving up in trade for a simple type-safe API

Is it possible to use an aggregation inside a filter? Use case for your example: select all users with more than medium age.

See proposal #1279

I'd suggest to also add a count_distinct aggregation to count all distinct values of a field.

Great idea!

Will this proposal also work for the Simple API, or is the Simple API a thing of the past anyway?

In the future there will be only a single API flavour as described in #353

marktani · 2018-01-23T14:25:06Z

Comment by nikolasburk
Monday Dec 11, 2017 at 17:15 GMT

There is no example for a count aggregation, I'm guessing it looks like this:

{
  postsConnection {
    aggregate {
      count
    }
  }
}

Please confirm or correct!

marktani · 2018-01-23T14:25:08Z

Comment by kieusonlam
Saturday Dec 16, 2017 at 03:08 GMT

Is it possible to order by aggregated value? I try to do a something like:
Course
-- Episodes
---- Views
Views model

{
  date: DateTime! @unique 
  views: Int!
}

I want to query top Course order by daily / weekly / ... views. It will sum all episiodes views between 2 date and order by that sum.

marktani · 2018-01-26T12:29:52Z

Sorry, this issue shouldn't have been created :) Please see the discussion in https://github.com/graphcool/prisma/issues/1312 as well.

alan345 · 2019-01-22T23:56:50Z

link is not working anymore.. :/

ejoebstl · 2019-01-23T15:10:44Z

link is not working anymore.. :/

prisma/prisma#1312

venikx · 2020-04-29T09:55:04Z

@marktani I'm wondering how'd deal with a large number of groups, so large that you kinda want to implement some sort of pagination in the groupBy itself.

Let's say you goupBy date, in total you have 10 000 entitities and each data contians 2 entities. So if you groupBy date, you'd get a list of 5000 keys. How would the query look like (notice the edges under city instead of being able to get the key directly)?
Or would you include some pagination without using connection/edge fields? How'd would that look?

{
  allUsersConnection {
    groupBy {
      city {
        edges {
          node {
            key
            connection {
              edges {
                node {
                  id
                  name
                }
              }
            }
          }
        }
      }
    }
  }
}

marktani added the proposal/needs-review label Jan 23, 2018

marktani closed this as completed Jan 26, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Group By and Aggregated Values #416

Group By and Aggregated Values #416

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 26, 2018

alan345 commented Jan 22, 2019

ejoebstl commented Jan 23, 2019

venikx commented Apr 29, 2020 •

edited

Group By and Aggregated Values #416

Group By and Aggregated Values #416

Comments

marktani commented Jan 23, 2018

Aggregations

Aggregate functions

API

Limiting the scope of aggregations

Larger example

Group

Combining groupBy and aggregations

Limitations

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 23, 2018

marktani commented Jan 26, 2018

alan345 commented Jan 22, 2019

ejoebstl commented Jan 23, 2019

venikx commented Apr 29, 2020 • edited

venikx commented Apr 29, 2020 •

edited