How does pagination work with dataloader? #71

Closed
tonyghita opened this Issue Jan 31, 2017 · 7 comments


tonyghita commented Jan 31, 2017

I've been having trouble wrapping my head around how pagination works with a batched data loader.

scalar Cursor # an opaque string which identifies a resource

query GetUsers($first: Int, $after: Cursor) {
  users(first: $first, after: $after) {
    name
  }
}

Do the same pagination parameters $first and $after get passed into every possible batch? Is there some page accounting that has to happen a layer above the dataloader layer? Or is dataloader batching incompatible with pagination?

Would love to get your thoughts @leebyron (cc @nicksrandall)

leebyron commented Jan 31, 2017

DataLoader is typically not responsible for pagination. The batching behavior it provides applies to loading many keys in one dispatch to a key-value store. It shouldn't matter if those keys are related as part of a "page" or not. Since fetching ranges of lists doesn't really fit the "load by key" model, it typically happens orthogonally to DataLoader.

I've seen pagination implemented in lots of different ways, each with different tradeoffs that may make it a better fit in different scenarios (normalized vs. key-value store, 10 servers vs. 100,000 servers, on-box backend vs. distributed backend, etc.).

Here is one way that is simple to implement: a two-phase loading of paged information.

You'll need to provide two data APIs (let's use SQL as an example):

  1. Given some paging criteria, produce a list of keys.
    SELECT id2 FROM friends WHERE id1 = :from AND id2 > :after ORDER BY id2 LIMIT :first

  2. Given keys, provide values.
    SELECT * FROM users WHERE id IN (:ids)

When paginating, you'll likely run something like the first query without batching or caching help from DataLoader. The result is a list of IDs to load data for. You can then pass those IDs to the second kind of query, which is exactly the kind of query DataLoader is good at batching and caching. DataLoader's .loadMany() is intended for this purpose.
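
The two phases described above can be sketched roughly as follows. This is a minimal Python illustration using an in-memory SQLite database, not DataLoader itself: the `UserLoader` class is a hypothetical, synchronous stand-in for a per-request loader (the real DataLoader is a JavaScript library that batches asynchronously), and the schema, sample rows, and function names are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friends (id1 INTEGER, id2 INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat'), (4, 'dan'), (5, 'eve');
    INSERT INTO friends VALUES (1, 2), (1, 3), (1, 4), (1, 5);
""")


def page_of_friend_ids(from_id, after, first):
    """Phase 1: paging criteria in, ordered list of keys out (no loader involved)."""
    rows = conn.execute(
        "SELECT id2 FROM friends WHERE id1 = ? AND id2 > ? ORDER BY id2 LIMIT ?",
        (from_id, after, first),
    ).fetchall()
    return [row[0] for row in rows]


class UserLoader:
    """Hypothetical stand-in for a per-request loader: batches by key and caches."""

    def __init__(self):
        self.cache = {}
        self.queries = 0  # counts how many IN queries were actually issued

    def load_many(self, ids):
        """Phase 2: keys in, values out, fetching only keys not yet cached."""
        missing = [i for i in ids if i not in self.cache]
        if missing:
            placeholders = ",".join("?" * len(missing))
            rows = conn.execute(
                f"SELECT id, name FROM users WHERE id IN ({placeholders})", missing
            ).fetchall()
            self.queries += 1
            self.cache.update({r[0]: {"id": r[0], "name": r[1]} for r in rows})
        # Return values in the same order as the requested keys.
        return [self.cache.get(i) for i in ids]


loader = UserLoader()  # one loader per request
first_page = loader.load_many(page_of_friend_ids(1, after=0, first=2))
# The last ID of a page serves as the cursor for the next page.
second_page = loader.load_many(
    page_of_friend_ids(1, after=first_page[-1]["id"], first=2)
)
```

Note that the paging query never touches the loader; only the by-key lookup does, which is why the pagination arguments do not need to be passed into the batch function.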

Advanced techniques

Of course, with SQL you could write this as a join query, though that could result in over-fetching if a single request contains many queries with a high probability of overlapping (that is, high odds of a DataLoader cache hit). Also, if you want to populate the DataLoader cache after a join query, you'll need to use .prime().
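
As a rough sketch of the join-then-prime idea, again in Python with an in-memory SQLite database: the `UserCache` class below is a hypothetical stand-in for a loader's per-request cache, and its `prime` method only imitates the behavior of DataLoader's .prime() of leaving an already cached key unchanged. The schema, data, and function names are assumptions for illustration.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friends (id1 INTEGER, id2 INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat'), (4, 'dan'), (5, 'eve');
    INSERT INTO friends VALUES (1, 2), (1, 3), (1, 4), (1, 5);
""")


class UserCache:
    """Hypothetical stand-in for the per-request cache inside a loader."""

    def __init__(self):
        self._values = {}

    def prime(self, key, value):
        # Keep the existing entry if the key is already cached.
        self._values.setdefault(key, value)

    def get(self, key):
        return self._values.get(key)


def friends_page_joined(cache, from_id, after, first):
    """Fetch one page of friends with a single join, then prime the cache."""
    rows = conn.execute(
        """
        SELECT u.id, u.name
        FROM friends AS f
        JOIN users AS u ON u.id = f.id2
        WHERE f.id1 = ? AND f.id2 > ?
        ORDER BY f.id2
        LIMIT ?
        """,
        (from_id, after, first),
    ).fetchall()
    page = [{"id": r[0], "name": r[1]} for r in rows]
    for user in page:
        # Later by-key loads in the same request can now be served from cache.
        cache.prime(user["id"], user)
    return page


cache = UserCache()
page = friends_page_joined(cache, 1, after=0, first=3)
```

The tradeoff mentioned above still applies: the join fetches full rows for every user on the page, even ones another part of the request may already have loaded.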

Other sorts of data storage backends have different best practices for pagination which I recommend you investigate before thinking about how you would use DataLoader alongside it.

tonyghita commented Jan 31, 2017

Great explanation, that helps a ton. Thanks @leebyron!

tvvignesh commented Mar 27, 2018

@leebyron Thanks for the great explanation. One doubt, though: since DataLoader is typically used to cache on a per-request basis, is this needed? Moving to the next page is a brand new request, so why should we even bother doing that? What if we just store the keys from the pagination results in the DataLoader? The next time the user changes the page, all existing keys in the DataLoader are erased anyway, and we store the next set of keys.

Am I right in my assumption?

leebyron commented Mar 27, 2018

You're correct that a DataLoader instance is best scoped to a single request.

In my previous comment I explained how DataLoader can help as part of pagination in modeling a "join query" to request a set of elements as part of a single request. Both loading the edges in a page and loading the data at the end of each edge occur within a single request.

jychen7 commented Jun 13, 2018

@leebyron Sorry, I still have a question about how DataLoader works with pagination.

For example,

{
  me {
    name
    followers(first: 3) {
      name
      followers(first: 2) {
        name
      }
    }
  }
}

Suppose the database is MySQL, with this table:

followers

id  user_id  follower_id
1   m        a
2   m        b
3   m        c
4   m        d
5   a        e
6   a        f
7   a        g
8   b        h
9   b        i
10  b        j
11  c        k
12  c        l
13  c        m

Without pagination, the batch resolver can be

select follower_id from followers where user_id = 'm'; # result is [a, b, c, d]
select user_id, follower_id from followers where user_id IN ('a', 'b', 'c', 'd');

But if the first query is

select follower_id from followers where user_id = 'm' limit 3; # result is [a, b, c]

how can we batch the query for the 2 followers of each user in [a, b, c]?

Thanks.

leebyron commented Jun 13, 2018

I’m definitely not a SQL expert and don’t have an answer to your question. There may be a way to model what you’re trying to do in a single query, but I’m not aware of it. Perhaps https://www.postgresql.org/docs/9.1/static/queries-with.html could be helpful?

jychen7 commented Jun 14, 2018

@leebyron We are using MySQL here, but never mind. Thanks for your help as well.
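
For readers with the same question, one technique that may fit the per-user limit asked about above is a window function: number the rows within each user_id and keep only the first N. The sketch below is not from the thread. It runs the idea against an in-memory SQLite database (which needs SQLite 3.25 or later for window functions) using the followers table from the example; MySQL gained ROW_NUMBER() in version 8.0, so whether this applies depends on the server version in use. The function name and return shape are assumptions chosen to resemble a batch function that returns one list per key.

```python
import sqlite3

# The followers table from the example above, loaded into in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE followers (id INTEGER PRIMARY KEY, user_id TEXT, follower_id TEXT)"
)
conn.executemany(
    "INSERT INTO followers (user_id, follower_id) VALUES (?, ?)",
    [
        ("m", "a"), ("m", "b"), ("m", "c"), ("m", "d"),
        ("a", "e"), ("a", "f"), ("a", "g"),
        ("b", "h"), ("b", "i"), ("b", "j"),
        ("c", "k"), ("c", "l"), ("c", "m"),
    ],
)


def first_followers_batch(user_ids, first):
    """Hypothetical batch function: the first `first` followers of every user, in one query."""
    placeholders = ",".join("?" * len(user_ids))
    rows = conn.execute(
        f"""
        SELECT user_id, follower_id
        FROM (
            SELECT user_id, follower_id,
                   ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY id) AS rn
            FROM followers
            WHERE user_id IN ({placeholders})
        ) AS ranked
        WHERE rn <= ?
        ORDER BY user_id, rn
        """,
        [*user_ids, first],
    ).fetchall()
    # Group rows by key and return one list per requested key, in key order.
    grouped = {user_id: [] for user_id in user_ids}
    for user_id, follower_id in rows:
        grouped[user_id].append(follower_id)
    return [grouped[user_id] for user_id in user_ids]


result = first_followers_batch(["a", "b", "c"], first=2)
```

Because the limit is part of the query, a loader built this way would need `first` to be the same for every key in a batch, for example by creating one loader per distinct page size.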
