Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forced order by ID in sql.js causing 40+ second queries for single row queries. #169

Open
SippieCup opened this issue Feb 21, 2020 · 5 comments

Comments

@SippieCup
Copy link

@SippieCup SippieCup commented Feb 21, 2020

On tables with 100 million+ rows, when querying for a single row using loopback it'll insert an "order by id" forcing the entire table read.

Steps to reproduce

I am using a postgres server, but it is for pretty much all sql queries.

Example:
We have a model "foo" attached to a SQL table with the columns id and hash, where id is unique, and hash is not unique. This table has 100,000,000 Rows. and some hashes have up to 100,000 rows. (note this is simplified, in my database there are far more columns and every row is unique)

We use this curl command to query the loopback API (I know that I can use findOne instead, but it'll still go toprototype.all so it really doesn't make a difference besides returning a single object rather than an array):

curl --location --request GET 'localhost:3000/api/foos' --header 'Content-Type: application/x-www-form-urlencoded' --form 'filter={"where": { "hash": "d41d8cd98f00b204e9800998ecf8427e" }, "limit": 1 }'

Current Behavior

Loopback generates the following SQL statement

SELECT * FROM "public"."foo" WHERE "hash"='d41d8cd98f00b204e9800998ecf8427e' ORDER BY "id" LIMIT 1

Because hash is non-unique, and can have thousands of rows, (in this case, 115,000) this takes 40.632s on my server to complete.

Expected Behavior

Loopback should instead generate the following SQL statement:

SELECT hash FROM "public"."foo" WHERE "hash"='d41d8cd98f00b204e9800998ecf8427e' LIMIT 1

resulting in a query time of on my server of 12ms.

Link to reproduction sandbox

If you really need me to setup a reproduction sandbox I can, but the problem should be pretty apparent.

Additional information

link to offending code: https://github.com/strongloop/loopback-connector/blob/master/lib/sql.js#L1388

You could say that this is "working as intended" and that its the database structure is bad/wrong/whatever, but there is no reason why the connector should be dictating an order on all queries.

This functionality should be defined in the default scope of the datasource and model, not built into the base loopback connector.

Acceptance Criteria

  • add a flag to disable default order by the identifier on queries. Reference: strongloop/loopback-connector-postgresql#417
  • add test cases for main SQL connectors (MySQL for example) + mongodb
  • update documentations. Might need to update several README files of connectors
@SippieCup SippieCup added the bug label Feb 21, 2020
@SippieCup SippieCup changed the title Forced Order by ID in sql.js causing 40+ second queries for single row queries. Forced order by ID in sql.js causing 40+ second queries for single row queries. Feb 21, 2020
@agnes512

This comment has been minimized.

Copy link
Contributor

@agnes512 agnes512 commented Feb 24, 2020

@SippieCup thanks for bringing up the issue. I agree that order is not needed for many use cases.

We currently have a community PR ongoing: strongloop/loopback-connector-postgresql#417. I think it's trying to fix the issue you mentioned. It's proposing to have flag defaultIdSort to disable sorting on id. Feel free to join the discussion!

@SippieCup

This comment has been minimized.

Copy link
Author

@SippieCup SippieCup commented Feb 26, 2020

Thanks, didn't notice that PR before.

While its good for the postgresql connector, and acceptable for my use case, I still think it might be worth backporting it to the parent connector for other sql-based datasources.

@agnes512

This comment has been minimized.

Copy link
Contributor

@agnes512 agnes512 commented Feb 26, 2020

@SippieCup Agreed. I will edit the issue description/acceptance criteria a bit and let the team to estimate it. Thanks!

@jannyHou

This comment has been minimized.

Copy link
Contributor

@jannyHou jannyHou commented Feb 27, 2020

Problem: If order field is not specified, pk will be the default order field in the generated query
Solution:

  • try some quick workaround, e.g. order by 1 or order by '', see if it fixes the problem
  • then we can think of introducing a flag to avoid automatically use the pk when order field is not specified
@SippieCup

This comment has been minimized.

Copy link
Author

@SippieCup SippieCup commented Feb 28, 2020

order by 1 (or whichever column you would like), would only give an improvement if you are selecting the column used in the where filter, at which point, it is just redundant versus removing the order by clause entirely. Furthermore, If the user is not selecting fields to return, they would have in know the order of the columns as they appear in the database and select the right column (while also handling hidden properties / columns potentially not defined in the loopback model) to figure out the correct column number to select.

It would would make code unreadable, breaking on model/database changes, and would be adding another magnitude of complexity. But I guess it does "work." If thats the solution strongloop chooses, I'll just continue using my forked connector where I just removed the offending code entirely, but I'd feel sorry for other users.

order by '' I'm fairly certain will give you an error as its would be a non-integer constant.

I see two functional solutions: Adding a flag as you have stated and in the PR, or removing the order by id default, and have users explicitly declare if they want order by id within the model scope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.