Pagination with offset should scale like after (#5807)

@EnricoMi

Experience Report

What you wanted to do

I want to retrieve my result set via pagination because (1) I expect large results and (2) I want memory use to stay bounded while processing the stream of results.

What you actually did

I have a predicate with 40 million triples. I want to retrieve every uid that has this predicate, together with the predicate's value:

{
  result (func: has(<http://www.w3.org/2000/01/rdf-schema#label>), first: 1000, offset: 0) {
    uid
    <http://www.w3.org/2000/01/rdf-schema#label>
  }
}

Why that wasn't great, with examples

The time each query takes grows roughly linearly with the offset:

| offset | total_ns (in seconds) |
|---|---|
| 1,000 | 0.001s |
| 10,000 | 0.028s |
| 100,000 | 0.313s |
| 1,000,000 | 3.057s |
| 10,000,000 | 35.067s |

So pages at the end of the result set take much longer than at the beginning of it.
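
To make the scaling concrete, the measurements in the offset table can be normalized to a cost per skipped result. This is only a small arithmetic sketch over the numbers reported above; the names `OFFSET_TIMINGS` and `micros_per_skipped_result` are made up for illustration.

```python
# (offset, seconds) pairs copied from the offset table in this report.
OFFSET_TIMINGS = [
    (1_000, 0.001),
    (10_000, 0.028),
    (100_000, 0.313),
    (1_000_000, 3.057),
    (10_000_000, 35.067),
]


def micros_per_skipped_result(timings):
    """Return (offset, microseconds spent per skipped result) pairs."""
    return [(offset, seconds / offset * 1e6) for offset, seconds in timings]


if __name__ == "__main__":
    for offset, micros in micros_per_skipped_result(OFFSET_TIMINGS):
        print(f"offset {offset:>10,}: {micros:.2f} microseconds per skipped result")
```

The per-result cost stays within the same order of magnitude (roughly 1 to 3.5 microseconds) across four orders of magnitude of offset, which is what a cost proportional to the offset looks like.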

With after, by contrast, query time stays roughly constant no matter how deep into the result set the page starts:

| offset of uid in after | total_ns (in seconds) |
|---|---|
| 1,000 | 0.016s |
| 10,000 | 0.019s |
| 100,000 | 0.008s |
| 1,000,000 | 0.007s |
| 10,000,000 | 0.019s |

Pagination with offset should scale as well as pagination with after. If that is not possible, the documentation at https://dgraph.io/docs/query-language/#pagination should clearly state that the two approaches scale differently.
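
For reference, this is roughly how I page through the result set with after instead of offset. It is a minimal sketch, not a definitive client: `run_query` is a hypothetical callable that sends a query string to Dgraph and returns the parsed JSON data as a dict, and the loop assumes results come back ordered by uid so that the last uid of one page can serve as the cursor for the next.

```python
from typing import Callable, Dict, Iterator, Optional

LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def build_query(first: int, after: Optional[str]) -> str:
    """Build the query for one page; `after` is the last uid already seen."""
    cursor = f", after: {after}" if after is not None else ""
    return (
        "{\n"
        f"  result (func: has({LABEL}), first: {first}{cursor}) {{\n"
        "    uid\n"
        f"    {LABEL}\n"
        "  }\n"
        "}"
    )


def paginate_after(
    run_query: Callable[[str], Dict], first: int = 1000
) -> Iterator[Dict]:
    """Yield all nodes page by page, keeping only one page in memory.

    `run_query` is a hypothetical helper (not part of any Dgraph client)
    that executes the query and returns a dict like {"result": [...]}.
    """
    after: Optional[str] = None  # no cursor on the first page
    while True:
        page = run_query(build_query(first, after)).get("result", [])
        yield from page
        if len(page) < first:
            return  # a short (or empty) page means the result set is exhausted
        after = page[-1]["uid"]  # assumes the page is sorted by uid
```

Because each request only carries the last uid seen, the client needs no running offset, and memory use is bounded by the page size.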

Labels: area/performance, area/querylang/pagination, exp/intermediate, kind/enhancement