Thousands of Clusters that Include Many Different People as Same Person #475

Closed
bsaggy opened this issue Nov 10, 2022 · 43 comments

@bsaggy

bsaggy commented Nov 10, 2022

Describe the bug
I'm running Recognize against ~35k images. It's creating way too many clusters, currently above 7k and growing.

MariaDB [nextcloud]> select count(*) from oc_recognize_face_clusters;
+----------+
| count(*) |
+----------+
|     7392 |
+----------+
1 row in set (0.005 sec)

The craziest part is that I'll click on a cluster in the Memories App with Mark Person in Preview enabled, and see multiple different people with the green bounding box around them across all the pictures in that same cluster.

For example, one cluster has myself, my wife, my grandmother, my mother in law, my sister in law, my brother in law, a friend of a different skin color - all as the person of interest in this cluster. Another cluster has my 2 month old son, myself, my wife, my sister in law, my grandfather, etc. all as the person of interest in the cluster.

While I understand there is a margin for error in facial recognition, I have to believe something is wrong here. With over 7,000 clusters and every cluster containing all kinds of people of interest as indicated by the green bounding box, this is pretty much useless to me at this point. ~92k queued files still yet to go.

To Reproduce
Steps to reproduce the behavior:
I can't say that this is necessarily "reproducible", but this is the evolution of how I have used Recognize thus far.

  1. Installed Recognize v2.x on Nextcloud v24.x and let it do its thing.
  2. Upgraded to Nextcloud 25.0.0 and then 25.0.1 while also upgrading to Recognize v3.1.2.
  3. At some point, I did a Reset Tags for Classified Files and a Reset Faces for Classified Files since I didn't seem to have made that much progress yet coming from v2.x and wanted to start fresh on v3.x. I also issued a recrawl. Otherwise I've been letting it just run for about a week now.

Expected behavior
Facial recognition should work more accurately and not identify my whole family, including friends, as the same person. It should create fewer clusters, with more accuracy.

Recognize (please complete the following information):

  • JS-only mode: WASM Mode
  • Enabled modes: face recognition only

Server (please complete the following information):
System Configuration

Ubuntu 20.04 VM
32 GB RAM (I increased memory & vCPU in hopes of speeding things up to no avail)
8 vCPU
NC v25.0.1
Recognize v3.1.2

Recognize Configuration:

Face Recognition is enabled
6 Cores allowed for use
WASM mode is enabled

Additional context
If there's anything else I can check or do, please let me know.

@marcelklehr
Member

Hi @bdevy

Thanks for your feedback, I'll try to help troubleshoot. There could be multiple things wrong here. One is the number of clusters, which is likely due to the recently increased sensitivity in clustering: it will create multiple clusters for the same person if the person looks a bit different on some days (think new haircut, new glasses, etc.). I'm thinking about exposing the sensitivity as an admin setting to let people play with it themselves. The second thing that could go wrong is the attribution of who a cluster is about. I take it you're using the "memories" app, as the photos app doesn't display the person the cluster is about. When looking at the photos of one cluster or another, can you make out a person that is in all photos of a cluster? That person is likely who the cluster is actually about. If there is no person that is in all photos of a cluster, then the actual clustering is at fault, which is quite weird. Perhaps it has to do with WASM mode not being as accurate as normal tensorflow mode. I'll have to investigate that.

@bsaggy
Author

bsaggy commented Nov 11, 2022

Hi @marcelklehr,

I think the sensitivity is definitely a factor - there are some pictures where the same person looks a bit different being sent to different unique clusters, but then there are also pictures of the same person where the pictures were taken just moments apart - the picture looks almost identical but those pictures are being sent to different unique clusters as well. Very confusing. A knob to tune sensitivity might be helpful in this scenario.

I am using the "Memories" app and it works very well - the Photos app had some severe performance issues as the number of clusters grew (One issue is that the merge person dialog would stop appearing when the number of clusters got to maybe around 100 - the second issue is that the photos page would hang while loading when clicking on People, almost as if it was reclassifying all of the recognized faces in real-time - there would even be node.js process running on the server during this time but it got to the point where the page was just unusable. I digress).

When looking at the photos of one cluster or another, can you make out a person that is in all photos of a cluster?

Anyway, no, I cannot make out a person that is in all photos of a given cluster. I do have the "Mark person in preview" enabled so I see the green box around the person - but this box is around different people in the same cluster. Some pictures still have the person I think the cluster is about, but many pictures don't. So my answer to your question has to be "no", and as you say the clustering may be at fault. I'm checking to see if I can enable AVX to utilize normal tensorflow mode instead of WASM mode if you think that might help. Over 10,200 clusters found now (~3k increase from 24 hours ago) and 88k files still in queue.

Thank you for your time in helping to understand these issues!

Brian

@marcelklehr
Member

the Photos app had some severe performance issues as the number of clusters grew

Yeah, we're aware of that now and I've provided a fix that will be released soon. :)

Anyway, no, I cannot make out a person that is in all photos of a given cluster.

Mmh, that's not good :/

@adrianog91

I have the same problem with the same configuration in WASM mode

@khlschrnk

Anyway, no, I cannot make out a person that is in all photos of a given cluster.

Mmh, that's not good :/

Same here, the people in several clusters seem to be completely random. I also see mixed faces between babies and elderly and also different colored hair.

@marcelklehr
Member

marcelklehr commented Nov 15, 2022

I tried but I cannot reproduce this. I enabled WASM mode on my dev instance and let it sift through my personal photo collection. The result is a well sorted collection of photos categorized by face. No false positives at all apart from one rubbish cluster with all the outliers.

@rhatguy

rhatguy commented Nov 16, 2022

Same issue here. I'm running in native tensorflow (not WASM). Mine has been running for 1-2 weeks now and still has 53k to go. I see on my dual 14 core server (28 real cores) that recognize keeps only one CPU thread spiked and isn't leveraging all available cores, so the initial run through my 41k picture collection is taking forever. I have cores set to 16 in the recognize settings, but it doesn't seem to do anything. Now Nextcloud is complaining that my cron hasn't run for over 4 hours because each recognize run is taking so long.

Below you can see an example of one of the clusters that is showing inside of Memories. If clustering is going to be this bad, it's probably less than useful.

[Screenshot: recognize]

@derekakelly

derekakelly commented Nov 17, 2022

I am also experiencing this problem.

MariaDB [nextcloud]> select count(*) from oc_recognize_face_clusters;
+----------+
| count(*) |
+----------+
|    20020 |
+----------+
1 row in set (0.007 sec)

@bsaggy
Author

bsaggy commented Nov 17, 2022

I tried but I cannot reproduce this. I enabled WASM mode on my dev instance and let it sift through my personal photo collection. The result is a well sorted collection of photos categorized by face. No false positives at all apart from one rubbish cluster with all the outliers.

Hi @marcelklehr,

I enabled CPU pass through on my VM and WASM mode was no longer necessary so I disabled it. However, the issue persisted. I finally disabled Facial Recognition at 20k clusters. I don't see any point in letting it continue in its current state.

I had installed and just started using Recognize v2 on NC 24 shortly before upgrading to NC 25 and Recognize v3. I don't know if that factors into the behavior at all, but thought it was worth mentioning.

@rhatguy's example is pretty spot on to what I've observed as well - clusters with many different people of interest. Is there any more info I can provide you to aid in the troubleshooting of this issue? Is there a way to "reset" Recognize back to its defaults as if it were a fresh install, and if so is it worth attempting that to see if it then behaves as expected?

Thanks,
Brian

@illnesse

fwiw, it happens without WASM too

marcelklehr added the bug, v3.x, and wasm labels and removed the wasm label Nov 21, 2022
@marcelklehr
Member

marcelklehr commented Nov 21, 2022

@illnesse Do you experience it, too? Didn't it work for you before?

@RedKage

RedKage commented Nov 23, 2022

Hello, same here, without WASM.

I have lots of clusters containing very different people. Sometimes the person in the preview bubble in the Memories app is in only one picture in the cluster. The rest of the faces can be from completely different people, from babies to grannies, males and females. It's all mixed up.

I have the "mark preview" activated so I see what the model sees, and what it thinks is the same face... and it's absolutely wrong

Of 200 clusters, only a handful actually contain a single person.

EDIT: Recognize v3.2.2, first time I am using this.
Only recognizing faces, 100 per cronjob. WASM disabled.

BTW I have an increasing number of faces queued which never get dequeued.
Not sure if related or not.
I can wait days, it doesn't remove the queued faces. It increases until there is about 12k queued. Then stops.
Cronjob is instant, it seems to do neither crawling nor classifying.
When manipulating the clusters, merging faces, removing people, the queue increases a bit more, but doesn't dequeue.
Very weird

EDIT2:
I'm just thinking about it, but I have a lot of "asian" faces. These get mixed up much more than caucasian faces, where I have very few false positives... hmmmm

@MB-Finski
Contributor

MB-Finski commented Dec 6, 2022

FWIW, I'm also experiencing this issue. Tens of different people get assigned to 1-2 mega clusters while quite often photos of the same person (taken only seconds apart) get assigned to separate identities/clusters. I tried manually reassigning all images to correct clusters but still processing more photos will result in new "mega clusters" (and occasionally new miniclusters of existing identities).

For me, the formation of the large clusters with tens of different people is the bigger issue. Basically, I have to manually assign the identity of almost every detected face.

NC 25.0.1
Recognize 3.2.3 (Native TF-mode)

@marcelklehr
Member

It can happen that 1-2 Mega clusters appear with faces that couldn't be assigned to a different cluster. That's one thing, but if every cluster is a random assortment of people, then something fishy is going on.

@MB-Finski
Contributor

MB-Finski commented Dec 10, 2022

Not all clusters are random. It's just that once these mega clusters form, most (something like 80%-90%) of the subsequently detected faces will be assigned to them.

I've previously tried to reset/remove the face tags from the GUI (i.e. from the admin settings), but I started wondering if that also resets all clusters that have been previously detected?

@bugsyb

bugsyb commented Dec 11, 2022

@rhatguy, I wonder whether the CPU being at 100% (#475 (comment)) could be related to the observation I described in a separate report (filed separately to avoid stealing the thread): #546.

@bsaggy
Author

bsaggy commented Dec 11, 2022

It can happen that 1-2 Mega clusters appear with faces that couldn't be assigned to a different cluster. That's one thing, but if every cluster is a random assortment of people, then something fishy is going on.

Hi @marcelklehr, if there's any testing or data gathering that I can do for you, please let me know.

@marcelklehr
Member

It's just that once these mega clusters form, most (something like 80%-90%) of the subsequently detected faces will be assigned to them.

That's a good point. Apparently the cluster algorithm is a bit overzealous and over time we get black hole clusters. I was able to mitigate this in my test sample by adding a constraint on the inner cluster distance. If a cluster is larger than what could possibly be the same face, we simply disregard it. I'll run some more tests to verify that there's no negative consequences to this.
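
The constraint described above can be sketched roughly as follows. This is a minimal illustration, not the code that shipped in Recognize: the helper names, the toy 2-D "embeddings", and the threshold value are all assumptions made for the example. It measures the largest pairwise distance inside each cluster and drops any cluster that exceeds a limit.

```python
import math
from itertools import combinations

# Hypothetical limit, chosen only for this toy data.
MAX_INNER_CLUSTER_DISTANCE = 0.7


def inner_distance(cluster):
    """Largest pairwise Euclidean distance between embeddings in a cluster."""
    return max(
        (math.dist(a, b) for a, b in combinations(cluster, 2)),
        default=0.0,  # a cluster with fewer than two members has no spread
    )


def drop_oversized(clusters, limit=MAX_INNER_CLUSTER_DISTANCE):
    """Keep only clusters compact enough to plausibly be one face."""
    return [c for c in clusters if inner_distance(c) <= limit]


# Toy 2-D stand-ins for face embeddings (real embeddings have many more dimensions).
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
black_hole = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (2.0, 2.0)]

kept = drop_oversized([tight, black_hole])
```

Here `kept` contains only `tight`. Note that the faces from the discarded cluster are not reassigned anywhere by this filter; they are simply left without a cluster.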

@marcelklehr
Member

v3.3.3 is out now with the fix. After installing the update, make sure to remove the mega-clusters. Once you add a new face picture the clusters will be recalculated and hopefully the mega-clusters won't come back.

@bsaggy
Author

bsaggy commented Dec 14, 2022

Hey @marcelklehr, that's great news!

Can you let me know the best way to delete thousands of mega-clusters? Would it be via database query?

Or better yet, I would be happy to just re-initialize the Recognize app and its database. I had only just begun using Recognize, so I don't mind wiping its progress and starting it from scratch. How could I do this?

Thanks,
Brian

@marcelklehr
Member

You can run occ recognize:reset-faces which removes all face detections and face clusters from the database. Then you may run recognize:classify or trigger classification in the background by toggling the face recognition setting in the admin settings.

@RedKage

RedKage commented Dec 14, 2022

@bdevy

Here's what I did to reset everything.

Admin:

Turn off all of the recognize toggles

OCC commands:

occ recognize:cleanup-tags
occ recognize:reset-tags
occ recognize:reset-faces

SQL:

delete from oc_jobs where class like '%Recognize%';
delete from oc_recognize_queue_faces;
delete from oc_recognize_queue_imagenet;
delete from oc_recognize_queue_landmarks;
delete from oc_recognize_queue_movinet;
delete from oc_recognize_queue_musicnn;
delete from oc_recognize_face_clusters;
delete from oc_recognize_face_detections;

Admin:

Turn on face recognition toggle

OCC command:

occ recognize:classify

@RedKage

RedKage commented Dec 15, 2022

Alright, since version v3.3.3 I do not have these big clusters.
However...
Now almost none of my photos get recognized.

With previous version I had around 500 faces recognized.
Now I have about 40.
And I still have 10k queued files according to the admin interface. I can run occ recognize:classify by hand, and it does output lots of things. Most notably "Face score too low".
The queue also keeps increasing after each cron.
It seems I can run occ recognize:classify endlessly.

@marcelklehr
Member

It seems I can run occ recognize:classify endlessly.

The classify command does not utilize the queue tables in the database, so the queue count in the interface doesn't apply to the command.

Now almost none of my photos gets recognized.

Mh, I also changed the threshold for face detection a bit, maybe that was too overzealous :/

@marcelklehr
Member

marcelklehr commented Dec 15, 2022

With previous version I had around 500 faces recognized.

500 face clusters or 500 face detections?

marcelklehr reopened this Dec 15, 2022
@RedKage

RedKage commented Dec 15, 2022

With previous version I had around 500 faces recognized.

500 face clusters or 500 face detections?

40 face clusters now, where before I had around 500.
It is also the number of total people count shown in the Memories > People.

I have currently 4K face detections (oc_recognize_face_detections).
Though before v3.3.3 I have never counted the face detections, so I can't compare with the current version.

@MB-Finski
Contributor

MB-Finski commented Dec 15, 2022

Similar outcome testing 3.3.3. ~6000 detected faces. Maybe 30 clusters. No more than 13 faces per cluster.

I'm currently experimenting with various inner radius values. As I'm testing anyways, would increasing minimum cluster density make any sense? With a somewhat larger value, outliers would perhaps not creep into clusters as easily and, on the other hand, a little bit larger radius could be used?

@MB-Finski
Contributor

MB-Finski commented Dec 15, 2022

Come to think of it, this outcome actually makes sense given the code changes. The mega-clusters created by DBSCAN didn't go anywhere; now they're just not saved as known clusters. This will likely block any face within these clusters from ever being added to any identity.

EDIT: So the solution could perhaps be to have an even stricter min radius and/or larger min density?

A further possible solution, just off the top of my head, would be to restrict DBSCAN to run on smaller batches of images (maybe grouping the batches by date). This would reduce the chance of pure "noise" connecting unrelated clusters, which eventually leads to these mega clusters given large enough sample sizes.

EDIT2: Currently testing with cluster density 8 and radius 0.25 -- getting pretty good results but I'm sure these are still far from optimal settings (and optimal settings will depend on the source data, of course). I also set MAX_INNER_CLUSTER_DISTANCE = 999.0 and the mega clusters have not made a reappearance.
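
The effect discussed in this comment, where stray "noise" faces chain two unrelated groups together, can be reproduced with a small textbook DBSCAN. The sketch below is illustrative only and is not Recognize's implementation; the 2-D points, the `eps` radius, and the `min_pts` density values are made up for the toy data and are unrelated to the 0.25 and 8 mentioned above.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: 0, 1, ... or NOISE."""

    def neighbors(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


# Two tight groups joined by a sparse chain of outliers.
group_a = [(x, 0) for x in (0, 1, 2, 3)]
bridge = [(x, 0) for x in range(5, 20, 2)]
group_b = [(x, 0) for x in (20, 21, 22, 23)]
points = group_a + bridge + group_b

loose = dbscan(points, eps=2.5, min_pts=3)   # chain points count as core
strict = dbscan(points, eps=2.5, min_pts=4)  # chain points are too sparse
```

With `min_pts=3` every point lands in a single cluster, because each outlier in the chain is dense enough to keep the expansion going. With `min_pts=4` the chain no longer qualifies, so the two groups stay separate and the middle of the chain is labeled as noise.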

@marcelklehr
Member

marcelklehr commented Dec 16, 2022

@MB-Finski feel free to drop by our matrix channel for discussing this, I'm also playing around with this atm :)

https://matrix.to/#/#marcelklehr_recognize:gitter.im

@farhills

Just some feedback: 3.3.3 is significantly improved! Mega clusters no more, and very few errors within clusters.

I'm still re-indexing everything, but so far my impression is face detections are a bit low for my personal preference (I'd rather err on the side of grabbing everyone in the photo, at the expense of more small clusters). +1 for the feature request to expose the detection and clustering parameters in the settings UI so that we can tweak the system to our preference.

@marcelklehr
Member

marcelklehr commented Dec 27, 2022

@farhills Thanks for the feedback!

I've just released https://github.com/nextcloud/recognize/releases/tag/v3.3.4 which should improve this even more and includes incremental clustering, which should significantly speed up clustering.

@farhills

farhills commented Dec 27, 2022

Before I install I'll grab some screenshots to compare before/after. Is there a need to fully remove and reinstall, or can I just recrawl to update the classifications on existing photos?

PS thanks for working on this!

@marcelklehr
Member

@farhills You can use the reset faces command (occ recognize:reset-faces) and then toggle the face recognition setting in the admin settings once.

@farhills

Results are in: substantially faster! I cheated on this go and only did faces (no object detection), but run time was ~24h whereas with 3.3.3 (face and object) it ran for 3-5 days. (No audio/video on either run)

Good news: the number of faces (sum of all clusters) is a lot higher. Only ~200 total faces in 3.3.3, now ~2000. 10x improvement!

Bad news: differentiation between individuals isn't tight enough. In 3.3.3 I had 50 'named' clusters, plus another 15-20 random people grabbed from photo backgrounds. In 3.3.4 it's only found 8 unique faces, lots of mixed clusters (dare I say mega clusters?). From what I've reviewed, there were zero false-positive 'is it a face' detections. The issue is the clusters include too many 'similar' but unique faces.

The 'is it a face' threshold can be lowered, but the 'is this the same face' threshold during the clustering needs to be raised.

The big cluster issue raises a second UI problem that is probably shared jointly between Recognize and photos/memories. In those 8 unique faces, there are many mis-identified individuals. But since those people don't have their own cluster, I don't have a GUI option to assign them anywhere. The only option is 'remove person' which removes them from the cluster, but hides them away forever. To fix:

  • Photos/Memories should have the option to 'add new person' when assigning photos to a different person during sorting of clusters. This would in effect be manually creating an empty cluster, then placing the selected photo(s) for reassignment into it. (I realize Photos/memories aren't your projects, but I don't know who would need to do what to make this possible)
  • Recognize maybe should re-process/cluster excluded faces? I realize this could get very annoying for anyone trying to make random strangers go away. It might require separate UI options for 'permanently ignore person' vs 'remove person from this cluster and re-process'.

Recognize v3.3.3 after processing and merging clusters:
[Screenshot: Recognize v3.3.3 complete run, redacted]

Recognize v3.3.4 after processing, partial merging of clusters:
[Screenshot: Recognize v3.3.4 complete run, redacted]

@marcelklehr
Member

@farhills thank you for reviewing! After evaluating the situation with @MB-Finski on gitter we believe that shit-clusters (as I like to call them) largely result from improper encoding of partially visible faces of the Neural network we employ, so there's nothing in the clustering algorithm we can do. As you've noted, I've tried my best to filter out non-faces from the face detections (by excluding small faces which are often not visible enough to the encoder, and by increasing the face probability threshold, which excludes faces that the network is not too confident about).

I've tried using a different library for extracting face embeddings, but didn't have much luck as it's not fine tuned yet
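
The pre-filtering described in this comment (dropping small faces and low-confidence detections before they reach the encoder) might look something like the sketch below. The `Detection` fields and both threshold values are hypothetical and chosen for illustration; the thread does not state the values Recognize uses.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
MIN_FACE_SCORE = 0.9   # detector confidence that the box contains a face
MIN_FACE_SIZE = 0.05   # box side as a fraction of the image side


@dataclass
class Detection:
    score: float
    width: float
    height: float


def is_usable(d, min_score=MIN_FACE_SCORE, min_size=MIN_FACE_SIZE):
    """Keep a detection only if it is confident and large enough."""
    return d.score >= min_score and min(d.width, d.height) >= min_size


detections = [
    Detection(score=0.99, width=0.20, height=0.25),  # clear, large face
    Detection(score=0.95, width=0.02, height=0.03),  # confident but tiny
    Detection(score=0.60, width=0.30, height=0.30),  # large but uncertain
]

usable = [d for d in detections if is_usable(d)]
```

Only the first detection survives. Raising either threshold trades fewer detected faces for cleaner embeddings, which matches the lower face counts reported after v3.3.3.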

@marcelklehr
Member

Personally, with v3.3.4 I'm seeing the best results yet on my production machine. One shit cluster, but that's to be expected as per above.

@bsaggy
Author

bsaggy commented Jan 2, 2023

Hi @marcelklehr, thank you for your work on this! v3.3.4 is much improved! "Mega-clusters" seem to be fixed. On my most recent run, there were several "shit clusters" with a dozen or two images in each. I reassigned those images to different clusters or just removed them as needed. While it was a bit of a pain, it was much more manageable than the thousands of clusters in earlier versions.

I have a couple clusters with 1-3k images in them. These were mostly all of the same person, but did exhibit some "shit cluster" attributes by at times including all different people in the cluster. It took some time, but I reassigned/removed those images as needed.

I am overall very impressed at the clustering of images in v3.3.4 - I have pictures of the same people from the past 15 years that were categorized into their respective clusters, whereas in previous versions this seemed to have created many different clusters!

I have a few comments/questions on issues or feature requests which I'll create separate issues on. One of those relates to #442 - I have 22k faces detected, 11k of which are categorized as NULL. I am confident that many of these should be added to existing clusters or warrant a new cluster. Is there any way to debug why a face is not getting categorized at all?

And can you help me understand your comment here?

we believe that shit-clusters (as I like to call them) largely result from improper encoding of partially visible faces of the Neural network we employ, so there's nothing in the clustering algorithm we can do.

I am sure that many of my NULL categorized images are not partially visible faces, but I definitely don't understand the inner workings of the Neural network.

Thanks!
Brian
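
A count like the "22k detected, 11k NULL" figure above can be obtained with a single aggregate query. The sketch below runs against an in-memory SQLite stand-in so it is self-contained; the `cluster_id` column name is an assumption about the `oc_recognize_face_detections` schema and should be checked against the real table before running anything similar on MariaDB.

```python
import sqlite3

# In-memory stand-in for the real table; cluster_id is an assumed column name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE oc_recognize_face_detections ("
    "id INTEGER PRIMARY KEY, cluster_id INTEGER)"
)
conn.executemany(
    "INSERT INTO oc_recognize_face_detections (cluster_id) VALUES (?)",
    [(1,), (1,), (None,), (2,), (None,)],  # toy rows, two without a cluster
)

# Total detections, and how many have no cluster assigned.
total, unclustered = conn.execute(
    "SELECT COUNT(*), SUM(cluster_id IS NULL) "
    "FROM oc_recognize_face_detections"
).fetchone()
```

For the toy rows this gives `total = 5` and `unclustered = 2`.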

@phil-lipp

Personally, with v3.3.4 I'm seeing the best results yet on my production machine. One shit cluster, but that's to be expected as per above.

I reset my recognized faces yesterday and initiated a recrawl and the results for now are really great! I can already see a great improvement concerning clustering compared to earlier 3.x versions - thanks for your work @marcelklehr !

@marcelklehr
Member

With v3.5.0 (out today) we've replaced the clustering algorithm with a better one, which should greatly improve the issues outlined here. After updating to v3.5.0 you can run occ recognize:reset-face-clusters and occ recognize:cluster-faces to re-run clustering. Let me know how it goes! 🚀

@adrianog91

adrianog91 commented Feb 12, 2023

Very very good improvements!

@marcelklehr
Member

the results for now are really great

Very very good improvements!

With this feedback I'm closing this thread for now. We'll continue to look into improving clustering, but I believe we've found a good spot in the solution space. (Thanks to @MB-Finski for working on this!)
