Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stricter ORDER BY, LIMIT, OFFSET #10649

Open
javiertury opened this issue Mar 29, 2019 · 2 comments
Open

Stricter ORDER BY, LIMIT, OFFSET #10649

javiertury opened this issue Mar 29, 2019 · 2 comments
Labels
P1: important For issues and PRs. status: in discussion For issues and PRs. Community and maintainers are discussing the applicability of the issue. topic: limit & offset For issues and PRs. Things that involve queries with limit and/or offset. type: refactor For issues and PRs. Things that improve the code readability, maintainability, testability, etc.

Comments

@javiertury
Copy link
Contributor

javiertury commented Mar 29, 2019

Add restrictions on ORDER BY, LIMIT and OFFSET. Which programming patterns should be allowed, which should be warned against and which should throw errors?

Related to #10331

Underlying restrictions

These are the built-in restrictions of some dialects:

  • mssql: ORDER BY is required by OFFSET which is required by LIMIT. Graphically, ORDER BY -> OFFSET -> LIMIT
  • sqlite, mysql and mariadb: ORDER BY is independent. LIMIT is required by OFFSET. Graphically, ORDER BY, LIMIT -> OFFSET
  • postgres: they are all independent.

It's possible to workaround all restrictions to make clauses seemingly independent in all dialects.

Arbitrary behavior

Let's first address ORDER BY. Assuming that the order clause is unambiguous, ORDER BY alone results in deterministic behavior. Moreover, in most cases ORDER BY is needed to avoid arbitrary behavior if OFFSET or LIMIT are used.

OFFSET Y without ordering will exclude Y randomly selected rows, which results in arbitrary behavior. The result of LIMIT X without ordering is in general arbitrary, but it depends on the number of total rows N. If N is known and N > X, the resulting set will return X randomly selected rows, which is arbitrary. However if N is unknown or N <= X, the query will return all the rows, which is deterministic.

Now assume that there is ordering, does LIMIT without OFFSET or viceversa result in undefined behavior? An ordered LIMIT X without offsetting is perfectly fine and is equivalent to LIMIT X OFFSET 0. It retrieves the first X rows according to the order clause (deterministic).

An ordered OFFSET Y without limit just hides the first Y rows according to the order clause, which is deterministic.

Defensive programming

Let's go back to the case of LIMIT X without ordering. Let's assume that the number of rows per query will never exceed N. Some developers use a defensive programming mindset. They set LIMIT X without ordering, where X >= N, to avoid overloading the DB in case there is a bug in the query that brings more results. Adding an extra ORDER BY clause to a small LIMIT is inexpensive.

E.g. there is a maximum of 2 phones per customer. You expect to be queried in this way options.where.user = 1234 which will return at much 2 rows. However an evil user overloaded query parameters and managed to set options.where.user = [1234, 5678, 9012, ...]. Luckily your statement contained LIMIT 3 without ordering. In this case you detect that 3 rows were returned and abort the operation. Also you avoid overloading the database. Ordering was never important or required.

In my opinion, this practice is not mainstream. In terms of performance, forcing developers to add ORDER BY will have a negligible impact. Ordering is lightning fast for small numbers of rows.

ORDER BY with OFFSET but without LIMIT

In the context of regular pagination, using OFFSET without LIMIT is usually the symptom of an error. Pagination works like this ORDER BY ... LIMIT X, then ORDER BY ... LIMIT X OFFSET 1*X, ORDER BY ... LIMIT X OFFSET 2*X, etc. If LIMIT was omitted, it was likely due to some exception or some coding mistake. In this case it makes sense to throw an error. Otherwise, a limitless query could overload the DB server.

Are there legitimate uses of OFFSET without LIMIT? Yes. Let's say your app contains playlists, folders with files, photo albums, files in a pull request, etc. In a page you display the first 20 results (ORDER BY ... LIMIT 20), and when the user clicks in a show all button, all remaining items are loaded (ORDER BY ... OFFSET 20).

If sequelize requires developers to set a LIMIT in order to use OFFSET, developers can workaround this limitation using a absurdly high limit, options.limit = 9999999.

Summary

To avoid undefined behavior, it is enough to require ORDER BY in case LIMIT or OFFSET are used. Graphically ORDER BY -> (LIMIT, OFFSET). In some cases, this will add a tiny performance penalty to some already deterministic defensive programming practices.

Runtime pagination errors are prevented if LIMIT is required by OFFSET. Graphically, ORDER BY -> LIMIT -> OFFSET. Legitimate uses of OFFSET without limit can be easily worked around with options.limit = 9999999.

Proposal

  1. Throw errors if the requirement ORDER BY -> LIMIT -> OFFSET is not satisfied.
  2. Modify tests
@sushantdhiman
Copy link
Contributor

Good idea @javiertury , this is especially an issue with MSSQL which often generate incorrect queries for misuse of limit, offset and order.

As you suggested, we should enforce ORDER BY when LIMIT and OFFSET are present. This is a breaking change, we can perhaps supply default order by to PRIMARY KEY with warning to slide this in current v5 release.

@sushantdhiman sushantdhiman added status: in discussion For issues and PRs. Community and maintainers are discussing the applicability of the issue. type: refactor For issues and PRs. Things that improve the code readability, maintainability, testability, etc. labels Mar 29, 2019
@papb papb added topic: limit & offset For issues and PRs. Things that involve queries with limit and/or offset. P1: important For issues and PRs. labels Oct 30, 2019
@github-actions
Copy link
Contributor

This issue has been automatically marked as stale because it has been open for 14 days without activity. It will be closed if no further activity occurs within the next 14 days. If this is still an issue, just leave a comment or remove the "stale" label. 🙂

@github-actions github-actions bot added the stale label Nov 15, 2021
@WikiRik WikiRik removed the stale label Nov 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1: important For issues and PRs. status: in discussion For issues and PRs. Community and maintainers are discussing the applicability of the issue. topic: limit & offset For issues and PRs. Things that involve queries with limit and/or offset. type: refactor For issues and PRs. Things that improve the code readability, maintainability, testability, etc.
Projects
None yet
Development

No branches or pull requests

4 participants