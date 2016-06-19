
Pagination implementation for real-life database #94
Comments
OlegIlyenko
commented
Jun 19, 2016
Recently @chandu0101 created a nice and relatively big example of a Relay application which uses MongoDB and Sangria (a Scala GraphQL implementation).
If you don't mind looking at some Scala code, I would suggest checking out this function:
https://github.com/chandu0101/sri-sangria-example/blob/master/server/src/main/scala/sri/sangria/mongoserver/services/BaseService.scala#L52-L99
It creates a relay
Even though this example uses sangria-relay, conceptually it is very similar to the reference implementation.
GuillaumeLeclerc
commented
Jul 7, 2016
In all the solutions I found on the internet (including this one), pagination does not seem to work if data is inserted into the database during the lifetime of a cursor.
Am I wrong?
Is there a generic solution?
GuillaumeLeclerc
commented
Jul 8, 2016
I thought about a solution to solve the data insertion:
I have a compliant implementation for a Mongoose database that I can share if anyone is interested.
My only concern is how long I should keep the queries in my Redis server.
sibelius
commented
Jul 12, 2016
@mattecapu this package relay-mongodb-connection creates relay connection from mongodb cursors, it also uses
mattecapu
commented
Jul 12, 2016
Thank you @sibelius and @OlegIlyenko, your links were a good starting place for understanding what was going on in the creation of a ConnectionType response.
My solution was to not use
Essentially I encode information about the ordering field (i.e. the ID) in the cursor and then use that to query results with ID >= or <= than cursor. The
My actual use case was a little more complicated because I'm fetching from multiple tables. Thus after retrieving the minimum required records I'm doing an additional round of sorting and cutting.
I'd be happy to provide some code examples if someone needs them.
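To make the idea of encoding the ordering field in the cursor concrete, here is a minimal sketch. It is not mattecapu's actual code: the helper names `encodeCursor` and `decodeCursor`, the `id:` prefix, and the assumption that IDs are non-negative integers are all made up for illustration.

```javascript
// Hypothetical helpers (not from the thread): the cursor is the row ID,
// prefixed and base64-encoded so that clients treat it as an opaque string.
const CURSOR_PATTERN = /^id:(\d+)$/

// Turn a numeric row ID into an opaque cursor string.
function encodeCursor(id) {
  return Buffer.from('id:' + String(id), 'utf8').toString('base64')
}

// Recover the row ID from a cursor, rejecting anything we did not produce.
function decodeCursor(cursor) {
  const decoded = Buffer.from(cursor, 'base64').toString('utf8')
  const match = CURSOR_PATTERN.exec(decoded)
  if (match === null) {
    throw new Error('Invalid cursor')
  }
  return Number(match[1])
}
```

The decoded ID can then be used directly in a comparison against the ordering column, so only rows beyond the cursor are fetched.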
sibelius
commented
Jul 12, 2016
@mattecapu I think you should provide some code examples to help people better understand how to implement a cursor.
mattecapu
commented
Jul 13, 2016
Well the point is, if you implement a ConnectionType for an endpoint you just need to
Cursors can be any string you want. Normally Relay defaults to
I used
Then a request to my endpoint looks like this:
endpoint(after: "sdflkjsdlfkjslkdf", first: 10) {
  # stuff
}
Basically, any request described by the specification is supported.
Thus when I get the request I process it in the following way:
Using this "algorithm", only the needed data is fetched. While
Notably, I return an object with the shape described above (and in the linked spec) and I don't use the
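The processing steps themselves did not survive in this copy of the thread, so the following is only a sketch of the general approach discussed here (filter by the cursor's ID, fetch one extra row to detect a further page, reverse the rows for backward pagination). The table name `things`, the function names, and the choice to use the stringified ID as the cursor are assumptions for illustration.

```javascript
// Sketch only: `things` is a hypothetical table with a unique, sortable
// integer `id`, and the cursor is assumed to be that ID as a string.
function buildPageQuery({ first, after, last, before }) {
  if (first != null && last != null) {
    throw new Error('Use either first or last, not both')
  }
  const limit = first != null ? first : last
  if (!Number.isInteger(limit) || limit < 0) {
    throw new Error('first or last must be a non-negative integer')
  }
  const where = []
  const params = []
  if (after != null) {
    where.push('id > ?')
    params.push(Number(after))
  }
  if (before != null) {
    where.push('id < ?')
    params.push(Number(before))
  }
  // Forward pagination reads ascending; backward pagination reads descending.
  const direction = first != null ? 'ASC' : 'DESC'
  const sql =
    'SELECT * FROM things' +
    (where.length ? ' WHERE ' + where.join(' AND ') : '') +
    ` ORDER BY id ${direction} LIMIT ?`
  params.push(limit + 1) // one extra row to peek at the following page
  return { sql, params }
}

// Shape the fetched rows into the connection object described by the spec.
function toConnection(rows, { first, last }) {
  const limit = first != null ? first : last
  const hasMore = rows.length > limit
  const page = rows.slice(0, limit)
  if (last != null) page.reverse() // restore ascending order for the client
  const edges = page.map(row => ({ cursor: String(row.id), node: row }))
  return {
    edges,
    pageInfo: {
      hasNextPage: first != null ? hasMore : false,
      hasPreviousPage: last != null ? hasMore : false,
      startCursor: edges.length ? edges[0].cursor : null,
      endCursor: edges.length ? edges[edges.length - 1].cursor : null,
    },
  }
}
```

The query text and parameters are kept separate so they can be handed to whatever SQL driver is in use as a parameterized statement.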
wincent
commented
Aug 30, 2016
That's a really great write-up @mattecapu. I'm going to close this out now but that is exactly the kind of thing that would work well in the documentation (could be something in a code comment, in the
GuillaumeLeclerc
commented
Aug 30, 2016
@mattecapu It seems the algorithm does not work if there was a deletion in the database between two paginated queries and/or if we want to sort by anything other than IDs, right?
oexza
commented
Sep 23, 2016
@mattecapu what happens when the ids are not ordered? i.e they are UUIDs,
mattecapu
commented
Sep 24, 2016
Sorry @GuillaumeLeclerc I lost your question in a notification misunderstanding with GitHub.
As laid out in the comment above, my algorithm has both of the limitations you noticed, but we can easily generalize it to overcome them.
To support dynamic data getting deleted or added in between queries, paginating with
Finally, we can generalize further by not fixing the ordering field but letting it be dynamic, effectively allowing many different orderings of the data, which can come in handy. This is pretty simple to implement too, but gets complicated once you have to support dynamic data.
@wincent I'll see what I can do! Thanks for the appreciation.
joonhocho
commented
Sep 30, 2016
I came across this issue and I wanted to share a Node.js library that I created a while ago.
Naoto-Ida
commented
Nov 2, 2016
@mattecapu Great post! Just wanted to say it helped out a lot. We had some tweaking to do since we accept arguments other than the Relay args spec.
Naoto-Ida
commented
Nov 8, 2016
Everything was working great with @mattecapu's solution,
We have one query for events that returns data in a specific order.
We have an events query like this:
{
  viewer {
    events(first: 1, inCountry: Japan) {
      edges {
        node {
          id
          name
        }
      }
    }
  }
}
But when you introduce
mattecapu
commented
Nov 8, 2016
@Naoto-Ida if I understood correctly and your order is deterministic,
Naoto-Ida
commented
Nov 9, 2016
Our ORDER statement in our SQL would include something like:
We compute it by base64decoding it, and splitting it into
So in the end, because there were not that many records and we had time constraints, we cached all the event records in Redis. When a query with the same arguments comes in, we fetch them from the cache and, based on the supplied cursor, slice the full list and serve the records before/after it.
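A rough sketch of that slicing step, for the forward direction only. The function name `sliceAfterCursor` is hypothetical, `cached` stands in for the full, already ordered list read back from the cache, and the cursor is assumed to be the record's ID as a string.

```javascript
// Sketch: serve a page out of a fully cached, already ordered result list.
// The list order is arbitrary (it comes from the original ORDER BY), so the
// cursor is located by position rather than compared by value.
function sliceAfterCursor(cached, { first, after }) {
  // -1 means "no cursor given, start from the beginning".
  const index = after == null
    ? -1
    : cached.findIndex(row => String(row.id) === after)
  if (after != null && index === -1) {
    throw new Error('Cursor not found in cached result')
  }
  const start = index + 1
  const page = cached.slice(start, start + first)
  return { page, hasNextPage: start + first < cached.length }
}
```

Because the whole list is held in memory, this only stays practical while the number of records is small, which matches the situation described above.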
mattecapu
commented
Nov 9, 2016
Now I see the problem.
Your data will be split into several of these atomic operations, which aggregated together will give you the most recent version of it. So for example if I want to retrieve attribute
SELECT new_value FROM mytable WHERE object_id='34' ORDER BY timestamp DESC LIMIT 1
But now if I want to know what state my DB was in at a given time, I simply exclude all updates made after a specific timestamp.
Voilà, I can now run queries against any version of my DB.
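The same filter can be illustrated over an in-memory update log. This is only a sketch of the idea, not the schema used by anyone in the thread: the function name `valueAt` is hypothetical, the column names follow the SQL above, and timestamps are assumed to be plain numbers.

```javascript
// Sketch: `updates` models rows of an append-only update log with the
// columns object_id, new_value and timestamp (a number in this example).
function valueAt(updates, objectId, asOf = Infinity) {
  let latest = null
  for (const update of updates) {
    if (update.object_id !== objectId) continue
    if (update.timestamp > asOf) continue // ignore updates made after the snapshot
    if (latest === null || update.timestamp > latest.timestamp) latest = update
  }
  // undefined means the object had no value yet at that point in time.
  return latest === null ? undefined : latest.new_value
}
```

With the default `asOf`, this returns the current value; passing an earlier timestamp returns the value as it was at that moment.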
Speaking for your specific case, @Naoto-Ida, I think you could get away with a far less disruptive change: create a table
luckydrq
commented
Oct 31, 2017
@mattecapu why
mattecapu
commented
Nov 5, 2017
@luckydrq yeah it basically peeks at the next page to see if there is one.
Re-reading my last post, it occurs to me that there's a less invasive way to support updates: just use an
pcattori
commented
Jan 3, 2018
I implemented some helper functions (namely paginate) so that SQL support would be easy (ordering and filtering are supported). I followed @mattecapu's suggested approach.
usage:
import {
  GraphQLObjectType,
} from 'graphql'
import {
  connectionArgs,
  connectionDefinitions, // needed by the connectionType helper below
} from 'graphql-relay'
import * as helpers from 'the-gist-linked-above' // not an actual npm lib yet :P
// helper
const connectionType = nodeType => connectionDefinitions({nodeType}).connectionType
const Query = new GraphQLObjectType({
  name: 'Query',
  fields: () => ({
    // ... other queries here
    things: {
      type: connectionType(Thing),
      args: connectionArgs,
      resolve: (_, paginationArgs) => {
        // you could get `orderBy` from args, but just hard-coded here for simplicity
        return helpers.paginate(models.Thing, paginationArgs, {orderBy: [['name', 'ASC']]})
      }
    },
  })
})
It's currently coupled with
https://gist.github.com/pcattori/2bb645d587e45c9fdbcabf5cef7a7106
sibelius
commented
Jan 3, 2018
We use this in production:
https://github.com/entria/graphql-mongoose-loader
It solves pagination and dataloader for MongoDB, using Mongoose.
We have the same concept for other data sources, such as REST APIs and SQL (Oracle and Postgres); it is very easy to extend to any data source.
crisu83
commented
Apr 26, 2018
@mattecapu Thanks for posting your solution here; it was very helpful.
I implemented the same logic in our project on top of our GraphQL and Relay implementations in PHP.
enumag
commented
Oct 2, 2018
@mattecapu I read your guide on how to handle GraphQL connections with SQL. It's pretty much what I came up with when I was analyzing it myself (plus some details like how to handle hasNextPage).
The problem is that the SQL query gets really complicated really fast when I add an optional sorting argument to the GraphQL connection, especially if the sorting can be a combination of fields.
Do you have any tips on how to handle that and how to do it efficiently?
mattecapu
commented
Oct 2, 2018
Hi @enumag, what exactly do you mean by 'SQL complexity'? The length in chars? Execution time? Other metrics?
enumag
commented
Oct 3, 2018
Primarily execution time and efficient usage of indexes on that table.
Of secondary concern are the SQL query length and the number of conditions in the WHERE clause, but I'm already using an SQL builder so I can deal with that myself.
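For reference, one common way to express a cursor over a combination of sort fields is a chain of OR-ed comparisons, one per column. The sketch below is not taken from any library mentioned in this thread: the function name `keysetWhere` and its argument shapes are assumptions, and the last sort column is assumed to be unique (for example the primary key) so that the order is total.

```javascript
// Sketch: builds the "rows after the cursor" predicate for keyset pagination
// over several sort columns.
// `orderBy` is a list of [column, 'ASC' | 'DESC'] pairs; `cursorValues` holds
// the last seen row's values for those columns, in the same order.
// Column names are interpolated into the SQL, so they must come from a fixed
// whitelist and never from user input.
function keysetWhere(orderBy, cursorValues) {
  const clauses = []
  const params = []
  orderBy.forEach(([column, direction], i) => {
    // All earlier columns tie with the cursor; this column moves past it.
    const equalities = orderBy.slice(0, i).map(([col]) => `${col} = ?`)
    const op = direction === 'DESC' ? '<' : '>'
    clauses.push('(' + [...equalities, `${column} ${op} ?`].join(' AND ') + ')')
    params.push(...cursorValues.slice(0, i + 1))
  })
  return { sql: clauses.join(' OR '), params }
}
```

Whether the database can answer such a predicate from an index depends on the engine and on having a composite index that matches the ORDER BY columns, so it is worth checking the query plan for the specific table.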
binajmen
commented
Apr 13, 2020
Thank you @mattecapu. I had the same "algorithm" in my head, but I forgot to reverse the result when using last / before, which makes sense from the UI point of view.
If someone is interested, here is my current implementation with Node, Apollo, and Knex on top of MySQL. Constructive feedback is more than welcome!
I took some shortcuts that I could improve in the future:
const userSchema = gql`
  type User implements Node {
    id: ID!
    email: String!
    ...
  }
  extend type Query {
    ...
    usersPaginated(input: UserPaginatedInput!): UserPaginatedConnection
  }
  input UserPaginatedInput {
    first: Int
    after: ID
    last: Int
    before: ID
  }
  type UserPaginatedConnection {
    pageInfo: PageInfo
    edges: [UserPaginatedEdge]
  }
  type PageInfo {
    hasNextPage: Boolean
    endCursor: ID
    hasPreviousPage: Boolean
    startCursor: ID
  }
  type UserPaginatedEdge {
    cursor: ID
    node: User
  }
  ...
`
const userResolvers = {
  ...
  usersPaginated: async (parent, args, { db }, info) => {
    try {
      const { first, after, last, before } = args.input
      let hasNextPage = false
      let hasPreviousPage = false
      let res = []
      if (first && last)
        throw new ApolloError('Ambiguous query: first and last should not be used together')
      if (!first && !last)
        throw new ApolloError('Missing first or last argument')
      const query = db.from('gql_user')
      if (first) {
        if (after)
          query.where('id', '>', after)
        // Fetch one extra row to find out whether another page follows.
        query.limit(first + 1)
          .orderBy('id', 'asc')
        res = await query
        if (res.length > first) {
          hasNextPage = true
          res = res.slice(0, first)
        }
      } else {
        if (before)
          query.where('id', '<', before)
        query.limit(last + 1)
          .orderBy('id', 'desc')
        res = await query
        if (res.length > last) {
          hasPreviousPage = true
          res = res.slice(0, last)
        }
        // Rows were fetched in descending order; restore ascending order for the client.
        res.reverse()
      }
      // An empty page has no cursors; avoid reading res[0] of an empty array.
      const startCursor = res.length > 0 ? res[0].id : null
      const endCursor = res.length > 0 ? res[res.length - 1].id : null
      const pageInfo = {
        hasNextPage,
        endCursor,
        hasPreviousPage,
        startCursor
      }
      const edges = res.map(row => ({
        cursor: row.id,
        node: row
      }))
      return {
        pageInfo,
        edges
      }
    } catch (err) {
      // Re-throw our own validation errors untouched; wrap database errors.
      if (err instanceof ApolloError) throw err
      throw new ApolloError(err.sqlMessage || err.message, err.code, err)
    }
  }
  ...
}
ChristianIvicevic
commented
May 14, 2020
Sorry for spamming, but I'd like to thank @mattecapu for his explanation of the algorithm in one of his prior posts. I was misled by the examples whose cursors are based on static arrays, and realized there must be a different way to paginate dynamic data, such as data from a database, where rows deleted between page requests or pages accessed out of order cause anomalies. The detailed description really helped me come up with a correct solution in my project.
mattecapu commented
Jun 19, 2016
In all the examples I can find, paginated queries are made against a mock database which is just a JS array, so it is simply passed through connectionFromArray to return the correct paginated result (like the Star Wars example mentioned in the README).
For a real-life database, querying all records and then passing them to connectionFromPromisedArray doesn't seem to be a good solution, because it will easily break your performance or crash your server as soon as you're doing anything at (even modest) scale.
So what solutions should you use to avoid insane database fetching?
(I'm using a SQL database but I think a good solution to this problem applies to pretty much every not-a-JS-array DBMS.)