Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(server) Random Endpoint use querybuilder and return exifInfo #9301

Merged
merged 3 commits into from
May 8, 2024

Conversation

JW-CH
Copy link
Contributor

@JW-CH JW-CH commented May 7, 2024

The /asset/random-Endpoint does not return any exifInfo. This should fix it.

exifInfo: true,
});

return builder.orderBy('RANDOM()').limit(count).getMany();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work, right? RANDOM() always returns a number between 0 and 1, whereas before we intentionally multiplied it by the total number of assets to make sure we actually get a random value between 0 and x (x >> 1 usually).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I did not do too much research. I just found that ORDER BY RANDOM() should work fine and read through this articles:
https://www.commandprompt.com/education/postgresql-order-by-random/
https://www.educba.com/postgresql-order-by-random/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, nice! TIL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works fine for me:

SELECT *
FROM "assets"
ORDER BY RANDOM()
LIMIT 5

Run 1:

id
69c79d1e-8e32-4560-8e87-fae91b897746
400a7631-7d72-48c4-92e4-c07c278b35f5
e4281bfb-80f6-41e9-8313-387cb0667621
fc7929a3-1c93-4a08-a7d1-220c4bb3c012
804ae1fc-3298-426b-88c9-70110bab041d

Run 2:

id
3fbb6493-dcec-41e9-9fbd-a218f32db997
03e16ccb-c73c-487a-8ba5-0eda18b4efa0
d9420371-8d05-4b57-93e0-f95bc051704d
670f2b7d-f2cc-41be-a350-2d7a30b6f7f2
a03c965d-ef86-4164-96e1-32fae95170e1

Run 3:

id
bda1ea0d-43f1-423d-a8f6-5cc46e7fd460
4be225ef-43be-4a36-951e-77067b25767b
23e53ef9-7fa7-4b46-ac09-f6ea1de7e19a
66232d1c-06f6-4220-bcfb-670ed7a7639c
7fd2e9e8-8aa8-4f0a-8c27-68e6734d69c3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, ordering by random is one of the main ways. It unfortunately sorts the entire table before applying the limit so it doesn't scale that well, but the current query isn't particularly efficient either so it's probably fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering by random is actually more random than what we have now since each row is random, vs. the current implementation returns a slice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mertalev that it orders the whole table first does indeed sound not so ideal. Do you have anything in mind how that could be optimized?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using TABLESAMPLE SYSTEM_ROWS is very fast, e.g.

SELECT * FROM assets TABLESAMPLE SYSTEM_ROWS(5) INNER JOIN exif ON assets.id = exif."assetId";

With 2 million assets, this query takes about 2ms vs. 6s for ORDER BY random().

This does require the tsm_system_rows extension, but it's easy to create since it's trusted and hence doesn't need superuser permission. The main disadvantage is that it'd have to be a raw query, which is easy when there are no relations but get more ugly when relations are involved.

Something like this should work:

this.repository.query(`
    SELECT assets.*, lat."exifInfo"
    FROM assets TABLESAMPLE SYSTEM_ROWS($1)
    LEFT JOIN LATERAL (
        SELECT to_json(exif) AS "exifInfo"
        FROM exif
        WHERE assets.id = exif."assetId"
        LIMIT 1) lat ON true
    WHERE "ownerId" = ANY($2::uuid[])`, [count, [ownerId]]);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah dang, it looks like it applies the WHERE clause after sampling. I guess the ORDER BY random() is okay for now and we can look at optimizing this in the future.

@JW-CH
Copy link
Contributor Author

JW-CH commented May 7, 2024

@danieldietzler I assume this is what you meant with @GenerateSql. Correct me if I am wrong.

@jrasm91 jrasm91 merged commit 535c7a8 into immich-app:main May 8, 2024
22 checks passed
@JW-CH JW-CH deleted the dev.jw.randomendpoint branch May 8, 2024 06:19
@JW-CH JW-CH changed the title Random Endpoint use querybuilder and return exifInfo Enhancement(server) Random Endpoint use querybuilder and return exifInfo May 8, 2024
@JW-CH JW-CH changed the title Enhancement(server) Random Endpoint use querybuilder and return exifInfo feat(server) Random Endpoint use querybuilder and return exifInfo May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants