Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.x] Mixed orders in cursor paginate #37762

Merged
merged 3 commits into from
Jul 8, 2021

Conversation

halaei
Copy link
Contributor

@halaei halaei commented Jun 21, 2021

It is possible to remove the following limitation from cursorPaginate() function:

It requires that the "order by" directions (descending / ascending) are the same if there are multiple "order by" clauses.

Example:

$cursor = ['test' => 'foo', 'another' => 'bar', 'third' => 'baz'];
$query->...->orderBy('test')->orderByDesc('another')->orderBy('third');

A desired SQL statement:

SELECT * FROM table
WHERE
    (test > 'foo' or
       (test = 'foo' and (another < 'bar' or
           (another = 'bar' and (third > 'baz')))))
ORDER BY test asc, another desc, third asc
LIMIT 100

Todo

  • Remove the limitation from docs.
  • Remove the deprecated CursorPaginationException in 9.x.
  • Check if DBMS can handle queries generated by this PR as efficiently as queries generated by the current Laravel release.
  • Remove the redundant parentheses generated by this PR.
  • Add more tests to check it handles cursors with more that 2 variables and mixed directions.

Tests

I tried to fix tests by checking the resulting query and bindings every time the mocked get() function is called.

@halaei halaei force-pushed the mix-orders-for-cursor-paginate-8 branch from 0638ded to cce9fe3 Compare June 21, 2021 13:17
@driesvints
Copy link
Member

Ping @paras-malhotra

@paras-malhotra
Copy link
Contributor

paras-malhotra commented Jun 24, 2021

@halaei, the reason I didn't take this path is because of the changing behaviour of index usage based on the DB engine. I guess if we construct queries like this, we'll need to check if all 4 databases are able to use: (a) a composite index and (b) an index on the first column if a composite index is missing, on the queries constructed. The benefit of the tuple comparison operator is that the DB engine takes care of this automatically.

I haven't looked into index support for database engines but you may find this article useful: https://use-the-index-luke.com/sql/partial-results/fetch-next-page

@halaei
Copy link
Contributor Author

halaei commented Jun 24, 2021

I am not sure about database support either. Ideally DB engines should be able to find and use the best index in the most efficient way without any hint, but I don't know if they are all as perfect as we need them to be. Regardless of this PR, I would really like to know how DB engines work in this regard.
If there is a performance issue for queries in this PR, beside reporting the issue to DB maintainers :) maybe we can use the simpler tuple comparison clause if possible and fallback to queries generated by this PR if it isn't possible.

One important consideration is that according to sql-workbenck, tuple comparison with < or > is not supported in SQL-Server and Oracle. So this PR removes another limitation for SQL-Server, which is officially supported by Laravel.
Moreover, since this PR doesn't use raw queries, it should probably work for jenssegers/laravel-mongodb too, which is personally important to me.

@paras-malhotra
Copy link
Contributor

paras-malhotra commented Jun 24, 2021

I'd recommend we stick to the tuple comparison to avoid extra complexity on constructing queries for index usage. If we go with this PR, we should test it or check out the RDBMS docs to ensure the index is actually used. The whole idea of cursor pagination is for better query performance, and if it messes up the execution plan, in the end it may not be worth it from a performance perspective when compared to offset pagination.

For the issue with the incompatibility with SQL Server, I think we can just mention that in the docs. This way, at least we're sure that cursor pagination does in fact increase query performance for the databases that are supported.

@halaei
Copy link
Contributor Author

halaei commented Jun 25, 2021

According to MySQL optimization docs, there are scenarios where using row constructor, e.g. tuples, actually degrade the performance. So maybe not using tuple comparison is actually a performance optimization! I try to check it with some benchmark.

@halaei
Copy link
Contributor Author

halaei commented Jun 25, 2021

My benchmark result for MySQL 8.0.25:

Table

CREATE TABLE `foos` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `foo` int NOT NULL,
  `bar` int NOT NULL,
  `baz` int NOT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `foos_foo_bar_baz_index` (`foo`,`bar`,`baz`)
) ENGINE=InnoDB AUTO_INCREMENT=2000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Data

The table is seeded with 2 million records, with 0<=foo<10, 0<=bar<10, 0<=baz<20000. This distribution noticibly slows down queries that can't use the entire 12-byte length of the multi-column index.

for ($foo = 0; $foo < 10; $foo++) {
    for ($bar = 0; $bar < 10; $bar++) {
        for ($i = 0; $i < 20000; $i += 100) {
            $values = [];
            $date = now();
            for ($baz = $i; $baz < $i + 100; $baz++) {
                $values[] = [
                    'foo' => $foo,
                    'bar' => $bar,
                    'baz' => $baz,
                    'created_at' => $date,
                    'updated_at' => $date,
                ];
            }
            \App\Models\Foo::query()->insert($values);
        }
    }
}

Queries

In the following queries we try to get 50 elements from the near end of the index, to make sure they will be slow unless they perfectly use the multi-column index.

1. Paginating all items:

1.1. Using single column comparison:

select * from `foos` where (`foo` > 9 or (`foo` = 9 and (`bar` > 9 or (`bar` = 9 and (`baz` > 19000))))) order by `foo` asc, `bar` asc, `baz` asc limit 51;

Runtime: less than 2 milliseconds.
key_len reported by explain select statement: 12.

1.2. Using tuple comparison:

select * from `foos` where (`foo`, `bar`, `baz`) > (9, 9, 19000) order by `foo` asc, `bar` asc, `baz` asc limit 51

Runtime: 16 seconds.
key_len reported by explain select statement: 12.
This result must be a bug from MySQL. It is multiple times slower than scanning the whole table, e.g. by select count(*).

2. Paginating items with foo = 9:

2.1. Using single column comparison:

select * from `foos` where `foo` = 9 and (`bar` > 9 or (`bar` = 9 and (`baz` > 19000))) order by `bar` asc, `baz` asc limit 51

Runtime: less than 2 milliseconds.
key_len reported by explain select statement: 12.

2.2. Using tuple comparison:

select * from `foos` where `foo` = 9 and (`bar`, `baz`) > (9, 19000) order by `bar` asc, `baz` asc limit 51

Runtime: 0.5 seconds.
key_len reported by explain select statement: 4.
This result agrees with MySQL documentation: https://dev.mysql.com/doc/refman/8.0/en/row-constructor-optimization.html

Conclusion

With MySQL, it is better to use single column comparison for cursor pagination of rows sorted by a 3-column index. Using tuple comparison is slower than simple pagination for some cases. This is most probably caused by a bug in MySQL and InnoDB engine.

@taylorotwell
Copy link
Member

@paras-malhotra any other thoughts on this?

@paras-malhotra
Copy link
Contributor

Based on the analysis by @halaei and #37216 (comment), it does seem that this PR will improve performance for mySQL but I'm not sure how it compares for PostgreSQL.

I guess, since there are more mySQL users than PostgreSQL and also the fact that SQL Server isn't supported by the current cursor pagination, I think it makes sense to accept this PR. 👍

@halaei
Copy link
Contributor Author

halaei commented Jun 28, 2021

For the record, MySQL has confirmed the performance bug I reported as "a duplicate of an internally-filed bug report, which is not yet scheduled for fixing":
https://bugs.mysql.com/bug.php?id=104128

@taylorotwell
Copy link
Member

Do we have any integration tests on this feature that actually hit the database?

@halaei
Copy link
Contributor Author

halaei commented Jun 29, 2021

@taylorotwell there are some tests here:

public function testCursorPaginatedModelCollectionRetrieval()
{
EloquentTestUser::create(['id' => 1, 'email' => 'taylorotwell@gmail.com']);
EloquentTestUser::create($secondParams = ['id' => 2, 'email' => 'abigailotwell@gmail.com']);
EloquentTestUser::create(['id' => 3, 'email' => 'foo@gmail.com']);
CursorPaginator::currentCursorResolver(function () {
return null;
});
$models = EloquentTestUser::oldest('id')->cursorPaginate(2);
$this->assertCount(2, $models);
$this->assertInstanceOf(CursorPaginator::class, $models);
$this->assertInstanceOf(EloquentTestUser::class, $models[0]);
$this->assertInstanceOf(EloquentTestUser::class, $models[1]);
$this->assertSame('taylorotwell@gmail.com', $models[0]->email);
$this->assertSame('abigailotwell@gmail.com', $models[1]->email);
$this->assertTrue($models->hasMorePages());
$this->assertTrue($models->hasPages());
CursorPaginator::currentCursorResolver(function () use ($secondParams) {
return new Cursor($secondParams);
});
$models = EloquentTestUser::oldest('id')->cursorPaginate(2);
$this->assertCount(1, $models);
$this->assertInstanceOf(CursorPaginator::class, $models);
$this->assertInstanceOf(EloquentTestUser::class, $models[0]);
$this->assertSame('foo@gmail.com', $models[0]->email);
$this->assertFalse($models->hasMorePages());
$this->assertTrue($models->hasPages());
}
public function testPreviousCursorPaginatedModelCollectionRetrieval()
{
EloquentTestUser::create(['id' => 1, 'email' => 'taylorotwell@gmail.com']);
EloquentTestUser::create(['id' => 2, 'email' => 'abigailotwell@gmail.com']);
EloquentTestUser::create($thirdParams = ['id' => 3, 'email' => 'foo@gmail.com']);
CursorPaginator::currentCursorResolver(function () use ($thirdParams) {
return new Cursor($thirdParams, false);
});
$models = EloquentTestUser::oldest('id')->cursorPaginate(2);
$this->assertCount(2, $models);
$this->assertInstanceOf(CursorPaginator::class, $models);
$this->assertInstanceOf(EloquentTestUser::class, $models[0]);
$this->assertInstanceOf(EloquentTestUser::class, $models[1]);
$this->assertSame('taylorotwell@gmail.com', $models[0]->email);
$this->assertSame('abigailotwell@gmail.com', $models[1]->email);
$this->assertTrue($models->hasMorePages());
$this->assertTrue($models->hasPages());
}
public function testCursorPaginatedModelCollectionRetrievalWhenNoElements()
{
CursorPaginator::currentCursorResolver(function () {
return null;
});
$models = EloquentTestUser::oldest('id')->cursorPaginate(2);
$this->assertCount(0, $models);
$this->assertInstanceOf(CursorPaginator::class, $models);
Paginator::currentPageResolver(function () {
return new Cursor(['id' => 1]);
});
$models = EloquentTestUser::oldest('id')->cursorPaginate(2);
$this->assertCount(0, $models);
}
public function testCursorPaginatedModelCollectionRetrievalWhenNoElementsAndDefaultPerPage()
{
$models = EloquentTestUser::oldest('id')->cursorPaginate();
$this->assertCount(0, $models);
$this->assertInstanceOf(CursorPaginator::class, $models);
}

@driesvints
Copy link
Member

@halaei can you rebase with 8.x please?

@halaei halaei force-pushed the mix-orders-for-cursor-paginate-8 branch from cce9fe3 to 8d656a8 Compare June 30, 2021 07:07
@taylorotwell
Copy link
Member

Can you mark this as draft until all your checkbox tasks are complete?

@driesvints driesvints marked this pull request as draft July 6, 2021 13:56
@halaei halaei marked this pull request as ready for review July 7, 2021 20:26
@halaei
Copy link
Contributor Author

halaei commented Jul 7, 2021

The only remaining checkbox can only be done after this PR is merged. And I don't think it is really required to "remove the redundant parentheses generated by this PR", but let me know if I you need me to work on it.

@taylorotwell taylorotwell merged commit 80f0112 into laravel:8.x Jul 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants