-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(server) Random Endpoint use querybuilder and return exifInfo #9301
Conversation
exifInfo: true, | ||
}); | ||
|
||
return builder.orderBy('RANDOM()').limit(count).getMany(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not work, right? RANDOM()
always returns a number between 0 and 1, whereas before we intentionally multiplied it by the total number of assets to make sure we actually get a random value between 0 and x (x >> 1 usually).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I did not do too much research. I just found that ORDER BY RANDOM()
should work fine and read through this articles:
https://www.commandprompt.com/education/postgresql-order-by-random/
https://www.educba.com/postgresql-order-by-random/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh ok, nice! TIL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works fine for me:
SELECT *
FROM "assets"
ORDER BY RANDOM()
LIMIT 5
Run 1:
id
69c79d1e-8e32-4560-8e87-fae91b897746
400a7631-7d72-48c4-92e4-c07c278b35f5
e4281bfb-80f6-41e9-8313-387cb0667621
fc7929a3-1c93-4a08-a7d1-220c4bb3c012
804ae1fc-3298-426b-88c9-70110bab041d
Run 2:
id
3fbb6493-dcec-41e9-9fbd-a218f32db997
03e16ccb-c73c-487a-8ba5-0eda18b4efa0
d9420371-8d05-4b57-93e0-f95bc051704d
670f2b7d-f2cc-41be-a350-2d7a30b6f7f2
a03c965d-ef86-4164-96e1-32fae95170e1
Run 3:
id
bda1ea0d-43f1-423d-a8f6-5cc46e7fd460
4be225ef-43be-4a36-951e-77067b25767b
23e53ef9-7fa7-4b46-ac09-f6ea1de7e19a
66232d1c-06f6-4220-bcfb-670ed7a7639c
7fd2e9e8-8aa8-4f0a-8c27-68e6734d69c3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, ordering by random is one of the main ways. It unfortunately sorts the entire table before applying the limit so it doesn't scale that well, but the current query isn't particularly efficient either so it's probably fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ordering by random is actually more random than what we have now since each row is random, vs. the current implementation returns a slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mertalev that it orders the whole table first does indeed sound not so ideal. Do you have anything in mind how that could be optimized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using TABLESAMPLE SYSTEM_ROWS
is very fast, e.g.
SELECT * FROM assets TABLESAMPLE SYSTEM_ROWS(5) INNER JOIN exif ON assets.id = exif."assetId";
With 2 million assets, this query takes about 2ms vs. 6s for ORDER BY random()
.
This does require the tsm_system_rows
extension, but it's easy to create since it's trusted and hence doesn't need superuser permission. The main disadvantage is that it'd have to be a raw query, which is easy when there are no relations but get more ugly when relations are involved.
Something like this should work:
this.repository.query(`
SELECT assets.*, lat."exifInfo"
FROM assets TABLESAMPLE SYSTEM_ROWS($1)
LEFT JOIN LATERAL (
SELECT to_json(exif) AS "exifInfo"
FROM exif
WHERE assets.id = exif."assetId"
LIMIT 1) lat ON true
WHERE "ownerId" = ANY($2::uuid[])`, [count, [ownerId]]);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah dang, it looks like it applies the WHERE
clause after sampling. I guess the ORDER BY random()
is okay for now and we can look at optimizing this in the future.
@danieldietzler I assume this is what you meant with |
The
/asset/random
-Endpoint does not return anyexifInfo
. This should fix it.