feat(server): separate face clustering job #5598

mertalev · 2023-12-10T03:14:33Z

Description

This PR separates the current facial recognition job into two parts: Face Detection (generating faces) and Facial Recognition (clustering faces). This forms the basis for future improvements to clustering that can leverage the fact that each job has access to the full set of face embeddings.

The job that queues facial recognition now waits for face detection to finish, and each face detection job queues this queueing job. The queueing job is assigned a job ID so that BullMQ only queues the first instance of it.
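
As a rough illustration of the job ID behavior described above, here is a minimal in-memory sketch. `DedupQueue`, the job name, and the ID string are hypothetical stand-ins, not the BullMQ API or the names used in this PR; the sketch only mirrors the idea that adding a job whose ID is already waiting is a no-op.

```typescript
// Hypothetical in-memory stand-in for a queue that de-duplicates by job ID.
// Not the BullMQ API: it only models "a second job with the same ID is ignored".
type Job = { id?: string; name: string };

class DedupQueue {
  private readonly pending = new Map<string, Job>();
  private anonymous = 0;

  // Returns true if the job was queued, false if a job with the same ID is already waiting.
  add(job: Job): boolean {
    // Jobs without an explicit ID get a unique generated key, so they are never de-duplicated.
    const key = job.id ?? `anon-${this.anonymous++}`;
    if (this.pending.has(key)) {
      return false;
    }
    this.pending.set(key, job);
    return true;
  }

  size(): number {
    return this.pending.size;
  }
}

const queue = new DedupQueue();

// Three face detection jobs each try to queue the same follow-up job.
const results = [1, 2, 3].map(() =>
  queue.add({ id: 'queue-facial-recognition', name: 'QUEUE_FACIAL_RECOGNITION' }),
);

console.log(results, queue.size()); // [ true, false, false ] 1
```

Only the first attempt is queued, so however many detection jobs finish, a single queueing job is left waiting.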

The clustering job no longer requires embeddings to be re-generated, so it's very quick to change options and re-cluster. For both face detection and facial recognition, running on all assets still requires re-clustering.

The clustering algorithm is also overhauled. Due to the increased number of faces during search, looking at just the nearest neighbor led to many duplicate people.

As part of this, minFaces is now incorporated into the clustering algorithm. When minFaces is set to 1, this algorithm performs like a better version of the one on main (no distinction between core and non-core points, but all faces have access to the full set of embeddings and cluster sequentially). When set higher, it increases the precision of the clustering at the cost of increasing the chance that a face is not assigned to a person. The default is increased to 3 for higher precision; this produced the best results during testing without excluding relevant images.

A perk of this change is that default thumbnails for people will now be better on average, with a lower chance of blurred or off-angle faces. This is because only core faces (described below) can generate thumbnails.

Some other minor changes:

- The getAll and getAllFaces queries are now paginated
- Added a waitForQueueCompletion method for job hierarchy (e.g. running all detection jobs before recognition jobs can start)
- The unlink method now warns if a file doesn't exist instead of throwing
- ~~The embedding column of asset_faces is now (unfortunately) selected by default~~
  - ~~Trying to explicitly select this column doesn't work because the column "doesn't exist" according to TypeORM, but it works if it's selected by default~~
- Jobs with concurrency disabled now appear in settings as disabled

Algorithm

The clustering algorithm has been updated to a variant of DBSCAN, including a concept of "core" points (points with a minimum number of faces around them). During search, a person is only created if there are no points around with an assigned person and the current point is a core point. Core points are additionally allowed to assign a person to all un-assigned points around them, while non-core points can only assign to themselves.
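
The rules in the paragraph above can be sketched as follows. This is a simplified illustration, not the code in this PR: it uses a brute-force Euclidean search instead of an HNSW index, counts a face as its own neighbor when checking `minFaces`, and skips faces that already have a person. The names `Face`, `clusterFaces`, and `maxDistance` are assumptions made for the example.

```typescript
// Simplified sketch of the core / non-core assignment rules.
// Assumptions: brute-force search, Euclidean distance, a face counts as its own neighbor.
type Face = { id: string; embedding: number[]; personId: number | null };

function distance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return Math.sqrt(sum);
}

function clusterFaces(faces: Face[], maxDistance: number, minFaces: number): void {
  let nextPersonId = 1;
  for (const face of faces) {
    if (face.personId !== null) {
      continue; // already assigned by an earlier core face
    }

    // All faces within range, nearest first (includes the face itself).
    const neighbors = faces
      .filter((other) => distance(face.embedding, other.embedding) <= maxDistance)
      .sort((x, y) => distance(face.embedding, x.embedding) - distance(face.embedding, y.embedding));

    const isCore = neighbors.length >= minFaces;

    // Reuse the person of the nearest assigned neighbor, if there is one.
    const assigned = neighbors.find((n) => n.personId !== null);
    let personId: number | null = assigned?.personId ?? null;

    // Only a core face with no assigned neighbors may create a new person.
    if (personId === null && isCore) {
      personId = nextPersonId++;
    }
    if (personId === null) {
      continue; // non-core face with nothing to join stays unassigned
    }

    // Core faces assign to every unassigned neighbor; non-core faces only to themselves.
    const targets = isCore ? neighbors : [face];
    for (const target of targets) {
      if (target.personId === null) {
        target.personId = personId;
      }
    }
  }
}

const faces: Face[] = [0, 1, 2, 50, 51, 52, 100, 4].map((value, index) => ({
  id: `face-${index}`,
  embedding: [value],
  personId: null,
}));

clusterFaces(faces, 2.5, 3);
console.log(faces.map((f) => f.personId)); // [ 1, 1, 1, 2, 2, 2, null, 1 ]
```

In this toy run the isolated face at 100 is never assigned, while the non-core face at 4 joins person 1 because its only neighbor already belongs to that person.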

There ~~are~~ were two simplifications here:

1. ~~Core faces are allowed to extend from the people of non-core faces during search (normally non-core points cannot extend a cluster)~~
   - ~~This would require looking up the density of each neighbor, which is difficult to do efficiently with the current job system~~
2. ~~Core faces can only reassign 100-1000 faces at a time depending on library size (normally all points in range would be extended, not just the top K)~~
   - HNSW indices are not optimized for range queries, so a limit is needed for performance. This can theoretically cause duplicates if the number of faces for a person is very high, so it might need to be tweaked in the future.

I also experimented with clustering with the ML service using HDBSCAN, a more sophisticated algorithm than DBSCAN, but ran into some issues:

- HDBSCAN doesn't have a concept of cluster evolution, which is an important part of face clustering. Using it for partial clustering would mean using Jaccard similarity or another metric to associate the clusters it generates with existing clusters, a sub-optimal solution.
- The Python implementation indexes the points again with a KDTree or BallTree, which doesn't work as well as the HNSW index we use for high dimensional data.
- This approach doesn't scale well to larger libraries due to the amount of data being transferred to the ML service and the redundant overhead of indexing.
- Implementing it in the Node.js-based streaming-oriented job environment is considerably more complex than DBSCAN.
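
For context on the first point, this is roughly what associating freshly generated clusters with existing people via Jaccard similarity could look like. It is a hypothetical sketch of the rejected approach, not code from this PR; `jaccard`, `bestMatch`, the `Person` shape, and the threshold are all assumptions.

```typescript
// Hypothetical sketch: match a new cluster of face IDs to an existing person by Jaccard similarity.
type Person = { personId: string; faceIds: string[] };

// |A ∩ B| / |A ∪ B| over two lists of face IDs (duplicates are ignored).
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) {
    return 0; // treat two empty clusters as unrelated
  }
  let intersection = 0;
  setA.forEach((id) => {
    if (setB.has(id)) {
      intersection++;
    }
  });
  return intersection / (setA.size + setB.size - intersection);
}

// Returns the existing person with the highest similarity of at least `threshold`, or null.
function bestMatch(cluster: string[], existing: Person[], threshold: number): string | null {
  let best: string | null = null;
  let bestScore = -1;
  for (const person of existing) {
    const score = jaccard(cluster, person.faceIds);
    if (score >= threshold && score > bestScore) {
      best = person.personId;
      bestScore = score;
    }
  }
  return best;
}

const existing: Person[] = [
  { personId: 'person-a', faceIds: ['f1', 'f2', 'f3'] },
  { personId: 'person-b', faceIds: ['f4', 'f5'] },
];

const similarity = jaccard(['f2', 'f3', 'f6'], existing[0].faceIds); // 2 shared of 4 total = 0.5
const match = bestMatch(['f2', 'f3', 'f6'], existing, 0.3);
const noMatch = bestMatch(['f7', 'f8'], existing, 0.3);
console.log(similarity, match, noMatch); // 0.5 person-a null
```

The threshold has to be chosen by hand and every re-clustering needs this extra matching pass, which is part of why this route is described as sub-optimal.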

How Has This Been Tested?

Results on a toy dataset are perfect. Results on a library with a few thousand images are nearly perfect and better than the current clustering algorithm.

The only thing I'm unsure about is how the algorithm performs for other libraries. It's possible there are edge cases I haven't encountered.

Fixes #6441
Fixes #4087

cloudflare-pages · 2023-12-29T02:19:14Z

Deploying with Cloudflare Pages

Latest commit:	`157dcf4`
Status:	✅ Deploy successful!
Preview URL:	https://7e3ac1de.immich.pages.dev
Branch Preview URL:	https://feat-face-clustering-job.immich.pages.dev

jrasm91

This PR is quite large lol. Can you give a summary of the changes in the description? Like renamed the queue, added a second queue, etc.?

server/src/domain/person/person.service.ts

server/src/domain/repositories/machine-learning.repository.ts

mertalev · 2024-01-06T23:00:13Z

Haha yeah, I added some more details to the description

mertalev · 2024-01-09T04:38:15Z

I managed to address the clustering simplifications. (1) is solved by moving non-core faces to the back of the queue, effectively making it a priority queue. This gives a guarantee (at least for All jobs) that any person a core face finds is from another core face. (2) is solved by making a separate search for a person. This will work regardless of library size as the query will only return a face with a person. The results on a library with a few thousand images are almost perfect and better than the current algorithm.
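
The reordering described in (1) can be sketched like this. It is a simplified illustration rather than the PR's implementation: whether a face is core is given up front here, whereas the real job would only know this after searching for neighbors, and `processingOrder` and the `deferred` flag are hypothetical names.

```typescript
// Simplified sketch: non-core faces are sent to the back of the queue once,
// so every core face is handled before any deferred non-core face.
type QueuedFace = { id: string; isCore: boolean; deferred: boolean };

function processingOrder(faces: { id: string; isCore: boolean }[]): string[] {
  const queue: QueuedFace[] = faces.map((face) => ({ ...face, deferred: false }));
  const order: string[] = [];

  while (queue.length > 0) {
    const face = queue.shift()!;
    if (!face.isCore && !face.deferred) {
      // First visit of a non-core face: re-queue it at the back instead of handling it now.
      queue.push({ ...face, deferred: true });
      continue;
    }
    order.push(face.id);
  }
  return order;
}

const order = processingOrder([
  { id: 'a', isCore: false },
  { id: 'b', isCore: true },
  { id: 'c', isCore: false },
  { id: 'd', isCore: true },
]);
console.log(order); // [ 'b', 'd', 'a', 'c' ]
```

Because each non-core face is deferred at most once, the loop always terminates, and the core faces keep their original relative order.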

Now to make it not fail all of our checks...

danieldietzler

I obviously can't speak for the Python code, let alone the whole facial recognition logic. However, the code I could assess looks really good imo!
Great job :)

server/src/domain/person/person.service.spec.ts

server/src/infra/repositories/job.repository.ts

server/src/infra/repositories/person.repository.ts

jrasm91

This is a huge change, but I think it looks really good. Being able to run the clustering algorithm independently of the embedding computation will be very nice to have. LGTM!

server/src/domain/person/person.service.ts

server/src/infra/repositories/filesystem.provider.ts

server/src/infra/repositories/job.repository.ts

bo0tzz added the 🧠machine-learning label Dec 13, 2023

mertalev mentioned this pull request Dec 28, 2023

Age prediction #6037

Closed

mertalev force-pushed the feat/face-clustering-job branch from 2d6b965 to d6dc331 Compare December 29, 2023 02:19

jrasm91 reviewed Jan 5, 2024

mertalev force-pushed the feat/face-clustering-job branch from 4bd4ec1 to b354212 Compare January 6, 2024 22:25

mertalev force-pushed the feat/face-clustering-job branch 9 times, most recently from deeb0d2 to c75e9ac Compare January 10, 2024 06:22

mertalev marked this pull request as ready for review January 10, 2024 06:37

mertalev requested review from jrasm91, danieldietzler and martabal January 10, 2024 06:40

danieldietzler approved these changes Jan 10, 2024

jrasm91 reviewed Jan 12, 2024

jrasm91 approved these changes Jan 12, 2024

mertalev force-pushed the feat/face-clustering-job branch 2 times, most recently from 997cc65 to 858e68f Compare January 13, 2024 05:52

mertalev added 4 commits January 17, 2024 22:05

separate facial clustering job

435eee0

update api

b4d6a1a

fixed some tests

ad3beb2

invert clustering

dea9c9b

mertalev and others added 17 commits January 17, 2024 22:10

update sql

7db4c8e

formatting linting

simplify logic

9ed46c3

remove unused imports

more specific delete signature

d2d0c7d

more accurate typing for face stubs

c316aac

add migration

3a3eabd

formatting

chore: better typing

b263fe4

don't select embedding by default

4e5573c

remove unused import

updated sql

5d441d9

use normal try/catch

7c1af9e

stricter concurrency typing and enforcement

82f07c8

update api

43bd812

update job concurrency panel to show disabled queues

ba2be4a

formatting

check jobId in queueAll

de949ca

fix tests

remove outdated comment

269b4e9

better facial recognition icon

c9b3e8c

wording

4e7d97b

wording formatting

fixed tests

2457ab4

mertalev force-pushed the feat/face-clustering-job branch from 22f5ae7 to 2457ab4 Compare January 18, 2024 03:23

alextran1502 and others added 8 commits January 17, 2024 21:27

Merge branch 'main' of github.com:immich-app/immich into feat/face-clustering-job

7aecbd7

fix

0f5fe4f

formatting & sql

4060168

try to fix sql check

d5a84d9

more detailed description

4733b25

update sql

b6bc4c5

formatting

3de6298

wording

79ef030

alextran1502 approved these changes Jan 18, 2024

update minFaces description

157dcf4

mertalev merged commit 68f5281 into main Jan 18, 2024
21 checks passed

mertalev deleted the feat/face-clustering-job branch January 18, 2024 05:08
