feat(server): separate face clustering job #5598

Merged
merged 45 commits into from
Jan 18, 2024
Contributor

@mertalev mertalev commented Dec 10, 2023

Description

This PR separates the current facial recognition job into two parts: generating faces and clustering faces, or Face Detection and Facial Recognition. This forms the basis for future improvements to clustering that can leverage the fact that each job has access to the full set of face embeddings.

The job that queues facial recognition now waits for face detection to finish, and each face detection job queues it in turn. This queueing job is assigned a job ID, so BullMQ only queues the first instance of it.
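
The deduplication by job ID can be pictured with a small in-memory stand-in. This is a sketch, not BullMQ's API: `InMemoryQueue`, its methods, and the job name are hypothetical, and the only behavior modeled is that adding a job whose ID is already pending does nothing.

```typescript
// Hypothetical in-memory stand-in for a queue that deduplicates by job ID.
// It models one behavior only: adding a job whose ID is already pending is a no-op.
type Job = { id: string; name: string };

class InMemoryQueue {
  private readonly pendingIds = new Set<string>();
  private readonly jobs: Job[] = [];

  // Returns true if the job was queued, false if that ID was already pending.
  add(name: string, id: string): boolean {
    if (this.pendingIds.has(id)) {
      return false;
    }
    this.pendingIds.add(id);
    this.jobs.push({ id, name });
    return true;
  }

  // Removes and returns the oldest pending job, freeing its ID for reuse.
  take(): Job | undefined {
    const job = this.jobs.shift();
    if (job) {
      this.pendingIds.delete(job.id);
    }
    return job;
  }

  get size(): number {
    return this.jobs.length;
  }
}

// Three face detection jobs finish and each tries to queue the same recognition job.
const queue = new InMemoryQueue();
const results = [1, 2, 3].map(() =>
  queue.add('queue-facial-recognition', 'queue-facial-recognition'),
);
```

Only the first `add` call queues anything, so the queue ends up holding a single recognition job no matter how many detection jobs completed.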

The clustering job no longer requires embeddings to be re-generated, so it's very quick to change options and re-cluster. For both face detection and facial recognition, running on all assets still requires re-clustering.

The clustering algorithm is also overhauled. Due to the increased number of faces during search, looking at just the nearest neighbor led to many duplicate people.

As part of this, minFaces is now incorporated into the clustering algorithm. When minFaces is set to 1, this algorithm performs like a better version of the one on main (no distinction between core and non-core points, but all faces have access to the full set of embeddings and cluster sequentially). When set higher, it increases the precision of the clustering at the cost of increasing the chance that a face is not assigned to a person. The default is increased to 3 for higher precision; this produced the best results during testing without excluding relevant images.

A perk of this change is that default thumbnails for people will now be better on average, with a lower chance of blurred or off-angle faces. This is because only core faces (described below) can generate thumbnails.

Some other minor changes:

  • The getAll and getAllFaces queries are now paginated
  • Added a waitForQueueCompletion method for job hierarchy (e.g. running all detection jobs before recognition jobs can start)
  • The unlink method now warns if a file doesn't exist instead of throwing
  • The embedding column of asset_faces is now (unfortunately) selected by default
    • Trying to explicitly select this column doesn't work because the column "doesn't exist" according to TypeORM, but it works if it's selected by default
  • Jobs with concurrency disabled now appear in settings as disabled
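
As an illustration of why paginated queries help here, a job can walk the table one page at a time instead of loading every face into memory. The sketch below is not the helper used in this PR: `paginate`, `Page`, `FetchPage`, and `fakeGetAllFaces` are hypothetical names, and the query is simulated with an in-memory array.

```typescript
// Hypothetical pagination helper; names and shapes are assumptions for illustration.
interface Page<T> {
  items: T[];
  hasNextPage: boolean;
}

type FetchPage<T> = (skip: number, take: number) => Promise<Page<T>>;

// Yields one page at a time so the caller never holds the full result set in memory.
async function* paginate<T>(fetchPage: FetchPage<T>, take: number): AsyncGenerator<T[]> {
  let skip = 0;
  let hasNextPage = true;
  while (hasNextPage) {
    const page = await fetchPage(skip, take);
    // Stop on an empty page as well, so a misbehaving source cannot loop forever.
    hasNextPage = page.hasNextPage && page.items.length > 0;
    skip += page.items.length;
    if (page.items.length > 0) {
      yield page.items;
    }
  }
}

// Stand-in for a repository query, backed by an in-memory array.
function fakeGetAllFaces(rows: string[]): FetchPage<string> {
  return async (skip, take) => ({
    items: rows.slice(skip, skip + take),
    hasNextPage: skip + take < rows.length,
  });
}
```

A caller would consume it with `for await (const faces of paginate(fetch, 1000))` and queue one job per face in each page.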

Algorithm

The clustering algorithm has been updated to a variant of DBSCAN, including a concept of "core" points (points with a minimum number of faces around them). During search, a person is only created if there are no points around with an assigned person and the current point is a core point. Core points are additionally allowed to assign a person to all un-assigned points around them, while non-core points can only assign to themselves.
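
The rules in the paragraph above can be sketched in a few lines. This is a simplified in-memory illustration, not the code in this PR: it uses a brute-force Euclidean neighbor search instead of the HNSW index, processes faces in array order, and assumes the core-point count includes the face itself (which is what makes every face a core point when minFaces is 1). The names `Face`, `neighborsOf`, and `clusterFaces` are hypothetical.

```typescript
// Simplified sketch of the DBSCAN-like rules; all names here are hypothetical.
type Face = { id: string; embedding: number[]; personId: string | null };

function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Brute-force range search, nearest first. The face itself is included (distance 0).
function neighborsOf(face: Face, faces: Face[], maxDistance: number): Face[] {
  return faces
    .map((other) => ({ other, d: euclidean(face.embedding, other.embedding) }))
    .filter(({ d }) => d <= maxDistance)
    .sort((x, y) => x.d - y.d)
    .map(({ other }) => other);
}

// Assigns people in place and returns the number of people created.
function clusterFaces(faces: Face[], maxDistance: number, minFaces: number): number {
  let created = 0;
  for (const face of faces) {
    if (face.personId !== null) {
      continue; // already assigned by an earlier core face
    }
    const neighbors = neighborsOf(face, faces, maxDistance);
    const isCore = neighbors.length >= minFaces;

    // Reuse the person of the nearest neighbor that already has one.
    let personId = neighbors.find((n) => n.personId !== null)?.personId ?? null;

    // A person is only created by a core face with no assigned face nearby.
    if (personId === null && isCore) {
      created++;
      personId = `person-${created}`;
    }
    if (personId === null) {
      continue; // non-core face with no assigned face nearby stays unassigned
    }

    // Core faces assign all unassigned neighbors; non-core faces only themselves.
    const targets = isCore ? neighbors.filter((n) => n.personId === null) : [face];
    for (const target of targets) {
      target.personId = personId;
    }
  }
  return created;
}
```

Raising `minFaces` in this sketch leaves sparse faces unassigned rather than creating a person for each, which mirrors the precision trade-off described in the PR description.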

There were two simplifications here:

  1. Core faces are allowed to extend from the people of non-core faces during search (normally, non-core points cannot extend a cluster).
    • Enforcing the usual rule would require looking up the density of each neighbor, which is difficult to do efficiently with the current job system.
  2. Core faces can only reassign 100 to 1,000 faces at a time depending on library size (normally, all points in range would be extended, not just the top K).
    • HNSW indices are not optimized for range queries, so a limit is needed for performance. This can theoretically cause duplicates if the number of faces for a person is very high, so the limit might need to be tweaked in the future.

I also experimented with clustering with the ML service using HDBSCAN, a more sophisticated algorithm than DBSCAN, but ran into some issues:

  • HDBSCAN doesn't have a concept of cluster evolution, which is an important part of face clustering. Using it for partial clustering would mean using Jaccard similarity or another metric to associate the clusters it generates with existing clusters, a sub-optimal solution.
  • The Python implementation indexes the points again with a KDTree or BallTree, which doesn't work as well as the HNSW index we use for high dimensional data.
  • This approach doesn't scale well to larger libraries due to the amount of data being transferred to the ML service and the redundant overhead of indexing.
  • Implementing it in the Node.js-based streaming-oriented job environment is considerably more complex than DBSCAN.
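
For context on the first point, associating a freshly generated cluster with an existing person via Jaccard similarity could look roughly like the sketch below. This was not implemented in the PR; `jaccard`, `matchCluster`, and the threshold are hypothetical, and clusters are represented simply as sets of face IDs.

```typescript
// Hypothetical sketch: match a new cluster to an existing person by face-ID overlap.
// Jaccard similarity is |A ∩ B| / |A ∪ B|; two empty sets are treated as 0 here.
function jaccard<T>(a: Set<T>, b: Set<T>): number {
  if (a.size === 0 && b.size === 0) {
    return 0;
  }
  let intersection = 0;
  a.forEach((item) => {
    if (b.has(item)) {
      intersection++;
    }
  });
  return intersection / (a.size + b.size - intersection);
}

// Returns the existing person with the highest overlap, or null if none reaches the threshold.
function matchCluster(
  newCluster: Set<string>,
  existing: Map<string, Set<string>>,
  threshold: number,
): string | null {
  let bestPerson: string | null = null;
  let bestScore = -Infinity;
  existing.forEach((members, personId) => {
    const score = jaccard(newCluster, members);
    if (score > bestScore) {
      bestScore = score;
      bestPerson = personId;
    }
  });
  return bestScore >= threshold ? bestPerson : null;
}
```

The need for a hand-picked threshold, and the ambiguity when a new cluster overlaps several existing people, are part of why this kind of matching is described above as sub-optimal.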

How Has This Been Tested?

Results on a toy dataset are perfect. Results on a library with a few thousand images are nearly perfect and better than the current clustering algorithm.

The only thing I'm unsure about is how the algorithm performs for other libraries. It's possible there are edge cases I haven't encountered.

Fixes #6441
Fixes #4087


cloudflare-pages bot commented Dec 29, 2023

Deploying with Cloudflare Pages

Latest commit: 157dcf4
Status: ✅  Deploy successful!
Preview URL: https://7e3ac1de.immich.pages.dev
Branch Preview URL: https://feat-face-clustering-job.immich.pages.dev

Contributor

@jrasm91 jrasm91 left a comment

This PR is quite large lol. Can you give a summary of the changes in the description? Like renamed the queue, added a second queue, etc.?

server/src/domain/person/person.service.ts (outdated, resolved)
Contributor Author

mertalev commented Jan 6, 2024

Haha yeah, I added some more details to the description

Contributor Author

mertalev commented Jan 9, 2024

I managed to address the clustering simplifications. (1) is solved by moving non-core faces to the back of the queue, effectively making it a priority queue. This gives a guarantee (at least for All jobs) that any person a core face finds is from another core face. (2) is solved by making a separate search for a person. This will work regardless of library size as the query will only return a face with a person. The results on a library with a few thousand images are almost perfect and better than the current algorithm.
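
The fix for (1) can be pictured as follows. This is a minimal sketch under assumptions, not the job code from this PR: `processingOrder` and the `deferred` flag are hypothetical, with the flag marking a face that has already been sent to the back once so it is not deferred forever.

```typescript
// Hypothetical sketch of deferring non-core faces to the back of the queue.
type QueueItem = { faceId: string; deferred: boolean };

// Returns the order in which faces end up being handled.
function processingOrder(faceIds: string[], isCore: (faceId: string) => boolean): string[] {
  const queue: QueueItem[] = faceIds.map((faceId) => ({ faceId, deferred: false }));
  const handled: string[] = [];

  while (queue.length > 0) {
    const item = queue.shift();
    if (!item) {
      break;
    }
    // A non-core face seen for the first time goes to the back of the queue,
    // so every core face queued in the same run is handled before it.
    if (!isCore(item.faceId) && !item.deferred) {
      queue.push({ faceId: item.faceId, deferred: true });
      continue;
    }
    handled.push(item.faceId);
  }
  return handled;
}
```

With faces `a, b, c, d` where only `b` and `d` are core, this handles `b` and `d` first and `a` and `c` afterwards, so any person a core face finds during that run was created by another core face.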

Now to make it not fail all of our checks...

@mertalev mertalev force-pushed the feat/face-clustering-job branch 9 times, most recently from deeb0d2 to c75e9ac Compare January 10, 2024 06:22
@mertalev mertalev marked this pull request as ready for review January 10, 2024 06:37
Member

@danieldietzler danieldietzler left a comment

I obviously can't speak for the Python code, let alone the whole facial recognition logic. However, the code I could assess looks really good imo!
Great job :)

server/src/domain/person/person.service.spec.ts (outdated, resolved)
server/src/domain/person/person.service.spec.ts (outdated, resolved)
server/src/infra/repositories/person.repository.ts (outdated, resolved)
Contributor

@jrasm91 jrasm91 left a comment

This is a huge change, but I think it looks really good. Being able to run the clustering algorithm independently of the embedding computation will be very nice to have. LGTM!

server/src/domain/person/person.service.ts (resolved)
server/src/domain/person/person.service.ts (outdated, resolved)
server/src/infra/repositories/filesystem.provider.ts (outdated, resolved)
server/src/infra/repositories/job.repository.ts (outdated, resolved)
@mertalev mertalev force-pushed the feat/face-clustering-job branch 2 times, most recently from 997cc65 to 858e68f Compare January 13, 2024 05:52
@mertalev mertalev merged commit 68f5281 into main Jan 18, 2024
21 checks passed
@mertalev mertalev deleted the feat/face-clustering-job branch January 18, 2024 05:08
5 participants