Stricter ORDER BY, LIMIT, OFFSET #10649
Labels
P1: important
status: in discussion
topic: limit & offset
type: refactor
Add restrictions on ORDER BY, LIMIT and OFFSET. Which programming patterns should be allowed, which should be warned against and which should throw errors?
Related to #10331
Underlying restrictions
These are the built-in restrictions of some dialects:
ORDER BY is required by OFFSET, which is required by LIMIT. Graphically, ORDER BY -> OFFSET -> LIMIT
LIMIT is required by OFFSET. Graphically, ORDER BY, LIMIT -> OFFSET
It's possible to work around all of these restrictions so that the clauses appear independent in every dialect.
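As a rough sketch of such workarounds: I am assuming here that the first restriction is the one SQL Server imposes on OFFSET ... FETCH and the second is the one MySQL imposes on LIMIT ... OFFSET (the dialect names are my assumption, the list above does not name them). paginationClause is a hypothetical helper, not sequelize's query generator, and it expects an already escaped ORDER BY expression and a positive limit.

```javascript
// Hypothetical helper: builds the trailing ORDER BY / LIMIT / OFFSET part of a
// SELECT so that each option can be passed independently of the others.
function paginationClause(dialect, { order, limit, offset } = {}) {
  const hasLimit = limit !== undefined && limit !== null;
  const hasOffset = offset !== undefined && offset !== null;
  const parts = [];

  if (dialect === 'mssql') {
    if (order) {
      parts.push(`ORDER BY ${order}`);
    } else if (hasLimit || hasOffset) {
      // Satisfies the syntax only; the row order stays arbitrary.
      parts.push('ORDER BY (SELECT NULL)');
    }
    if (hasLimit || hasOffset) {
      // FETCH needs OFFSET, so default the offset to 0.
      parts.push(`OFFSET ${hasOffset ? offset : 0} ROWS`);
      if (hasLimit) parts.push(`FETCH NEXT ${limit} ROWS ONLY`);
    }
    return parts.join(' ');
  }

  if (dialect === 'mysql') {
    if (order) parts.push(`ORDER BY ${order}`);
    if (hasLimit) {
      parts.push(`LIMIT ${limit}`);
    } else if (hasOffset) {
      // OFFSET needs LIMIT, so use the largest value MySQL accepts.
      parts.push('LIMIT 18446744073709551615');
    }
    if (hasOffset) parts.push(`OFFSET ${offset}`);
    return parts.join(' ');
  }

  throw new Error(`Dialect not covered by this sketch: ${dialect}`);
}

console.log(paginationClause('mssql', { limit: 10 }));
console.log(paginationClause('mysql', { offset: 20 }));
```

The workarounds only make the statements syntactically valid. They do nothing about the arbitrary results discussed in the next section.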
Arbitrary behavior
Let's first address ORDER BY. Assuming that the order clause is unambiguous (no two rows tie), ORDER BY alone results in deterministic behavior. Moreover, in most cases ORDER BY is needed to avoid arbitrary behavior if OFFSET or LIMIT is used. OFFSET Y without ordering excludes Y arbitrarily selected rows, so the result is arbitrary. The result of LIMIT X without ordering is in general arbitrary, but it depends on the total number of rows N. If N > X, the query returns X arbitrarily selected rows. However, if N <= X, the query returns all the rows, so the set of returned rows is deterministic. If N is not known in advance, either case may apply.
Now assume that there is ordering. Does LIMIT without OFFSET, or vice versa, result in undefined behavior? An ordered LIMIT X without an offset is perfectly fine and is equivalent to LIMIT X OFFSET 0. It retrieves the first X rows according to the order clause (deterministic).
An ordered OFFSET Y without a limit just skips the first Y rows according to the order clause, which is deterministic.
Defensive programming
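Before going back to LIMIT without ordering, the cases from the Arbitrary behavior section can be checked with a small in-memory model. This is only a sketch: select is a hypothetical stand-in for a query, and the two arrays stand for two physical orders in which a database might happen to scan the same three rows.

```javascript
// Hypothetical in-memory model of SELECT ... [ORDER BY] [LIMIT] [OFFSET].
function select(rows, { orderBy, limit, offset = 0 } = {}) {
  const scanned = orderBy
    ? [...rows].sort((a, b) => a[orderBy] - b[orderBy])
    : [...rows]; // no ORDER BY: whatever order the scan produced
  const end = limit === undefined ? undefined : offset + limit;
  return scanned.slice(offset, end);
}

// The same table (N = 3), scanned in two different physical orders.
const scanA = [{ id: 1 }, { id: 2 }, { id: 3 }];
const scanB = [{ id: 3 }, { id: 1 }, { id: 2 }];

// Compare results as sets of ids, ignoring the order of the returned rows.
const idSet = (rows) => rows.map((r) => r.id).sort((a, b) => a - b).join(',');

// LIMIT 2 without ordering, N > X: the two scans disagree.
console.log(idSet(select(scanA, { limit: 2 })), idSet(select(scanB, { limit: 2 })));
// LIMIT 3 without ordering, N <= X: both scans return every row.
console.log(idSet(select(scanA, { limit: 3 })), idSet(select(scanB, { limit: 3 })));
// OFFSET 1 without ordering: the two scans disagree again.
console.log(idSet(select(scanA, { offset: 1 })), idSet(select(scanB, { offset: 1 })));
// Ordered LIMIT 2 and ordered OFFSET 1: both scans agree.
console.log(idSet(select(scanA, { orderBy: 'id', limit: 2 })), idSet(select(scanB, { orderBy: 'id', limit: 2 })));
console.log(idSet(select(scanA, { orderBy: 'id', offset: 1 })), idSet(select(scanB, { orderBy: 'id', offset: 1 })));
```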
Let's go back to the case of LIMIT X without ordering. Assume that the number of rows per query will never exceed N. Some developers use a defensive programming mindset: they set LIMIT X without ordering, where X >= N, to avoid overloading the DB in case a bug makes the query return more rows than expected. Adding an extra ORDER BY clause to a query with a small LIMIT is inexpensive.
E.g. there is a maximum of 2 phones per customer. You expect to be queried with options.where.user = 1234, which returns at most 2 rows. However, a malicious user tampered with the query parameters and managed to set options.where.user = [1234, 5678, 9012, ...]. Luckily your statement contained LIMIT 3 without ordering. In this case you detect that 3 rows were returned and abort the operation. You also avoid overloading the database. Ordering was never important or required.
In my opinion, this practice is not mainstream. In terms of performance, forcing developers to add ORDER BY will have a negligible impact, since ordering a small number of rows is very fast.
ORDER BY with OFFSET but without LIMIT
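Before turning to OFFSET without LIMIT, the defensive pattern from the previous section can be sketched as follows. The names MAX_PHONES_PER_USER, phoneQueryOptions and assertExpectedRowCount are made up for illustration, and the rows would come from whatever query function the app uses.

```javascript
// Hypothetical business rule from the example: at most 2 phones per customer.
const MAX_PHONES_PER_USER = 2;

// Ask for one row more than the expected maximum, without any ordering.
function phoneQueryOptions(userId) {
  return { where: { user: userId }, limit: MAX_PHONES_PER_USER + 1 };
}

// Abort when the extra row shows up: the query matched more than it should.
function assertExpectedRowCount(rows) {
  if (rows.length > MAX_PHONES_PER_USER) {
    throw new Error(`Expected at most ${MAX_PHONES_PER_USER} rows, got ${rows.length}`);
  }
  return rows;
}

console.log(phoneQueryOptions(1234));
```

Under the ORDER BY requirement discussed in this issue, the same options object would also need an order entry.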
In the context of regular pagination, using OFFSET without LIMIT is usually the symptom of an error. Pagination works like this: ORDER BY ... LIMIT X, then ORDER BY ... LIMIT X OFFSET 1*X, ORDER BY ... LIMIT X OFFSET 2*X, etc. If LIMIT was omitted, it was likely due to some exception or coding mistake. In this case it makes sense to throw an error. Otherwise, a limitless query could overload the DB server.
Are there legitimate uses of OFFSET without LIMIT? Yes. Let's say your app contains playlists, folders with files, photo albums, files in a pull request, etc. On a page you display the first 20 results (ORDER BY ... LIMIT 20), and when the user clicks on a "show all" button, all remaining items are loaded (ORDER BY ... OFFSET 20).
If sequelize requires developers to set a LIMIT in order to use OFFSET, developers can work around this limitation by using an absurdly high limit, options.limit = 9999999.
Summary
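For reference, the pagination and "show all" cases from the previous section can be written as plain options objects. This is a sketch only: the helper names and the position column are hypothetical, and no sequelize call is made.

```javascript
// Hypothetical ordering; any unambiguous order clause would do.
const ORDER = [['position', 'ASC']];

// Regular pagination: page 0 is LIMIT X, page k is LIMIT X OFFSET k*X.
function pageOptions(pageIndex, pageSize) {
  return { order: ORDER, limit: pageSize, offset: pageIndex * pageSize };
}

// "Show all": load everything after the rows already displayed (no LIMIT).
function showAllOptions(alreadyShown) {
  return { order: ORDER, offset: alreadyShown };
}

// The same request if OFFSET were only allowed together with LIMIT.
function showAllOptionsWithLimit(alreadyShown) {
  return { ...showAllOptions(alreadyShown), limit: 9999999 };
}

console.log(pageOptions(2, 20));
console.log(showAllOptions(20));
console.log(showAllOptionsWithLimit(20));
```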
To avoid undefined behavior, it is enough to require ORDER BY whenever LIMIT or OFFSET is used. Graphically, ORDER BY -> (LIMIT, OFFSET). In some cases, this will add a tiny performance penalty to some already deterministic defensive programming practices.
Runtime pagination errors are prevented if LIMIT is required by OFFSET. Graphically, ORDER BY -> LIMIT -> OFFSET. Legitimate uses of OFFSET without limit can be easily worked around with options.limit = 9999999.
Proposal
ORDER BY -> LIMIT -> OFFSET is not satisfied.
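A minimal sketch of how the ORDER BY -> LIMIT -> OFFSET rule could be checked against an options object. checkOrderLimitOffset is a hypothetical function, not existing sequelize code, and it only reports violations, leaving it to the caller to throw or warn.

```javascript
// Hypothetical check: returns a list of violated rules (empty when the
// options satisfy ORDER BY -> LIMIT -> OFFSET).
function checkOrderLimitOffset(options = {}) {
  const hasOrder = Array.isArray(options.order)
    ? options.order.length > 0
    : Boolean(options.order);
  const hasLimit = options.limit !== undefined && options.limit !== null;
  const hasOffset = options.offset !== undefined && options.offset !== null;

  const violations = [];
  if ((hasLimit || hasOffset) && !hasOrder) {
    violations.push('ORDER BY is required when LIMIT or OFFSET is used');
  }
  if (hasOffset && !hasLimit) {
    violations.push('LIMIT is required when OFFSET is used');
  }
  return violations;
}

console.log(checkOrderLimitOffset({ order: [['id', 'ASC']], limit: 20, offset: 40 }));
console.log(checkOrderLimitOffset({ limit: 3 }));
console.log(checkOrderLimitOffset({ order: [['id', 'ASC']], offset: 20 }));
```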