Faces: Error "Failed removing merged clusters for subject" seems to cause tagging of faces to become slow #2806
Another user had a similar experience here, with the same error, but that thread focuses on the other error, "has ambiguous subject jr1e4i93ltym0svq", rather than on "failed removing merged clusters". In both of my experiences of this happening, I have never had a "has ambiguous subject jr1e4i93ltym0svq" error, only "failed removing merged clusters for subject" errors. The first time, about 10 different IDs were affected. This time, a single ID is causing the same performance impact. |
Have you discovered the "photoprism faces audit" CLI command yet and the trace log mode? There's a big chance these can give you a clue as to what is wrong. It's absolutely true that failing to remove merged clusters would lead to this kind of behavior. We could be more aggressive here, but that could lead to data loss in rare cases, for example when the user provides inconsistent assignments that don't make sense. |
@lastzero was actually waiting for it to complete. Here's the audit log |
See also: #2022 (reply in thread) |
Could it be that these faces have been marked as "hidden" in the UI? A look at the code (note the arguments of the FacesByID function!) would explain why this warning is displayed further down, and also why they could not be matched or merging might fail: It comes down to better understanding what you did and what behavior is expected. Perhaps we should still work with these hidden faces when the alternative is poor performance or other problems. |
We look forward to an answer to the above question so we can find out what is causing this. |
Tuning in from #1598, I just ran
15 minutes later, I still see this in logs:
|
Alright, did you read my last comment? See question:
|
Also enable the debug or trace mode when you run the audit command, as it displays additional logs that can be helpful. Edit: In any event, there should never be "88 clusters" to merge every 15 minutes. One more thing that would be good to know for debugging is which database (and version) you use, and whether you know how we might reproduce this (what did you do?). |
Don't recall specifically hiding or seeing a hide button, though anything is possible as I've been fumbling my way through the software just clicking things and seeing what happens. That said I can say for sure that I:
Might have also jumped around between edit photo view and gallery/thumbnail view while doing this |
How do I debug/trace with a docker-compose based deploy?
Agreed, and it is exactly 88 clusters, every 15 minutes and maxes disk I/O plus takes the exact same ~12 seconds - every time.
The docker-compose stack runs |
Using different names for the same face ("typos") could cause conflict resolution issues, although we improved that a lot earlier this year. You can enable trace mode and run commands like this: |
See also the built-in command help: https://docs.photoprism.app/getting-started/docker-compose/#command-line-interface |
You're losing bropoints for the management script ignoring output redirection to a file, but fine - here's trace:
Let me know if this helps anything, or if there's anything else I can run to help. For reference, this is running on Proxmox, in a LXC with Alpine Linux installed, that has Docker installed in it, and PhotoPrism runs in that Docker environment as a docker-compose stack. |
There's also the faces optimize and index subcommands you might want to try. Use the command help please, these do have descriptions including descriptions for different flags you can use. What are "bropoints" and why do I lose them? I'm on my phone and preparing dinner right now 😋 |
Hah, sure will have a look around. Bropoints as in buddy/friend points, not being able to redirect to a file and copypasta straight to you (vs piecing together one screen of terminal at a time in a text editor) makes me a sad panda |
Enjoy your meal tho! No peeking at github randoms 👀 |
That's intentional, because otherwise you'll have debug logs in your SQL dump, or CSV export. This would make all our other users sad, and they wouldn't notice until they need the backup. |
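The reasoning in this comment can be illustrated with a small sketch. It is not PhotoPrism's code: it assumes the common convention that a command writes its data (for example a CSV export) to stdout and its log messages to stderr, and the function name `export_rows` is hypothetical.

```python
import io
import sys


def export_rows(rows, out=sys.stdout, log=sys.stderr):
    """Write rows as CSV-like lines to `out` and diagnostics to `log`.

    Hypothetical helper: keeping the two streams apart means that
    redirecting `out` to a file never mixes log lines into the data.
    """
    print(f"exporting {len(rows)} rows", file=log)  # diagnostic only
    for row in rows:
        print(",".join(row), file=out)  # actual payload


# Demo with in-memory streams standing in for stdout and stderr.
data_stream = io.StringIO()
log_stream = io.StringIO()
export_rows([("subject", "faces"), ("jane", "8")], out=data_stream, log=log_stream)
```

With this separation, a shell redirect of stdout captures only the export, while the log messages stay on the terminal unless stderr is redirected as well.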
Database migrations all say OK. I'm getting 8 faces matched to the same person in stats.
Does this mean the management script outputs SQL dumps / CSV to stdout instead of taking an output file to write to? Sounds a bit crazy... |
Try |
If you want to know details about the backup command, please check the command help :) |
Also, if you're nice, I might eventually show you how to pipe the logs to a file. |
Hmm. Full on
|
OK, so we know as much as before. Thanks for trying though! 👍 |
@ER-EPR Based on your profile picture, I think the user experience would be much better for you if our model worked better with Asian faces. Except for the noise your fan might make. We want to solve these and many other problems, but it takes time and money to build a team of specialists. Be aware that I have only studied Physics and read research papers, I am not a full time face recognition specialist who does nothing else in his life. We certainly can't hire someone like that with our current resources. So our next goal is to improve funding instead of introducing more and more workarounds and tiny improvements, like experimenting with the worker schedule. That ties up resources we need for the bigger goals. |
Using multiple tabs instead of waiting would almost certainly cause the behavior you observed. Thanks for letting us know! We should include that as a note in our docs. |
@QrchackOfficial The merge failure can be cured by untagging the small clusters of one subject, as I did in the above posts using MySQL. You can search the markers table for a subj_uid, look for the face_id listed in the warning, clear the tags of the small groups, and let them merge with the big group (don't untag the big group). But remember to back up your database first, as this is a risky operation. |
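Before changing anything, the clusters in question can be listed first. The following is a minimal, read-only sketch of that inspection step. It assumes a `markers` table with `subj_uid` and `face_id` columns, as named in the comment above; the `marker_uid` column, the sample rows, and the helper name `small_clusters` are made up for illustration, and an in-memory SQLite database stands in for the real MySQL/MariaDB database.

```python
import sqlite3


def small_clusters(conn, subj_uid):
    """Return (face_id, marker_count) for all clusters of a subject except the largest."""
    rows = conn.execute(
        "SELECT face_id, COUNT(*) AS n FROM markers "
        "WHERE subj_uid = ? AND face_id <> '' "
        "GROUP BY face_id ORDER BY n DESC, face_id",
        (subj_uid,),
    ).fetchall()
    return rows[1:]  # the first row is the big group, which should stay tagged


# Stand-in database with hypothetical data; the real schema has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE markers (marker_uid TEXT, subj_uid TEXT, face_id TEXT)")
conn.executemany(
    "INSERT INTO markers VALUES (?, ?, ?)",
    [
        ("m1", "subject-a", "FACE_BIG"),
        ("m2", "subject-a", "FACE_BIG"),
        ("m3", "subject-a", "FACE_BIG"),
        ("m4", "subject-a", "FACE_BIG"),
        ("m5", "subject-a", "FACE_SMALL_1"),
        ("m6", "subject-a", "FACE_SMALL_1"),
        ("m7", "subject-a", "FACE_SMALL_2"),
        ("m8", "subject-b", "FACE_OTHER"),
    ],
)

candidates = small_clusters(conn, "subject-a")
```

The sketch only reads data. Any actual untagging would still need a database backup first, as the comment above warns.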
You are welcome to privately provide us with a SQL dump of the subjects, faces, and markers tables for debugging. This would save us hours of reverse-engineering your (potential) database content based on the GitHub issue comments, which could eventually lead us to solutions that don't work for you because we got it wrong. Thank you very much! |
The documentation has been updated for you:
Legacy Hardware
It is a known issue that the user interface and backend operations, especially face recognition, can be slow or even crash on older hardware due to a lack of resources. Like most applications, PhotoPrism has certain requirements and our development process does not include testing on unsupported or unusual hardware.
Asian Faces and Children
It is a known issue that children and Asian-looking faces cannot be recognized reliably. Detection without automatic recognition should not be affected by that. This is because the model we use was trained with North American images, which unfortunately do not include many Asians. The absence of children in the training data comes from the fact that parents do not usually share such images under a public license (and may not have the right to do so). We will continue to improve our models over time as our resources allow.
Background Worker
Face recognition was developed and tested under the assumption that the background worker runs every 15 minutes, unless the backend is busy with other tasks like indexing. It has not been tested with much longer intervals and is not designed for that. PhotoPrism's background worker groups new faces by similarity, compares faces with clusters, and optimizes existing clusters as needed. Without these routine tasks, the number of faces to be processed becomes too large. The first and next time the worker runs, it can then cause a heavy server load until all the faces, face clusters, and related pictures have been updated. The longer you wait, the more CPU is required and the longer it takes. An important reason for the worker to run independently of actual changes in the main instance is that some users change the database content directly or run additional instances, for example for indexing. It is a problem that can be solved, but it takes time.
If we were to ignore this and not run the worker at all times, it could lead to many additional support requests, further reducing the amount of time we can spend on development. The handling of changes in multiple instances will be improved over time so that the worker can be run less frequently in future releases.
Removing Merged Clusters Fails
Under certain conditions, inconsistent face assignments cannot be automatically resolved by the background worker, which can result in an unusually high CPU load when it is running:
Running the following command in a terminal can resolve problems with inconsistent data:
docker compose exec photoprism photoprism faces audit --fix
It can also be helpful to manually check for inconsistent assignments and fix them in the user interface. Alternatively, you can use the photoprism faces reset command for a clean start if you haven't invested much time in assigning faces yet.
Advanced users affected by this are welcome to privately provide us with a SQL dump of their subjects, faces, and markers database tables for debugging. Thank you very much! |
I am closing this issue as we have not received any further feedback from the original reporter and there are now too many comments, making it difficult for anyone who has not participated in the discussion to understand. Everything important is now covered in the Known Issues. If someone affected provides us with a dump of the database tables, we can fix the problem - otherwise it requires too much time and luck: 👉 https://docs.photoprism.app/known-issues/#removing-merged-clusters-fails |
I have sent you two emails containing download links for the mysql dump for the three tables. The first link may have expired, please use the second one. |
@lastzero happy new year. I'm happy to provide you with the database files for debugging, but I'd probably need some guidance in getting them. I am running into the exact same issues. Single instance, on docker. I did not manually edit the database. Happy to help in any way here. |
Same over here. It sometimes takes a minute or two to add someone, when it works. It sometimes says it's "busy" or just "Have you lost network" (I can't remember the exact wording, but something went wrong). PS: I've tried to run the commands and workarounds from that issue, but none of them worked for me. |
I reset my faces, even though I already had a meaningful amount of faces tagged - no more clusters to tag, from 780 original clusters, as well as individual faces that needed to be fixed in clusters where they had been mis-tagged. I am now first taking care of organizing my full library, with albums and whatnot, and then I'll do a backup and try to go after the face tagging. The commands didn't really help. |
Unfortunately, even after 5 years, I'm still the only full-time developer. I'll look into it as soon as possible. |
Hi @lastzero . Never meant this to be taken in any way other than just updating here. My last post still stands: happy to share my "broken" database if it helps, though I think it's still around 1GB. Thanks for building Photoprism. It's a fantastic app. I know there are other priorities on the table. Just let us know how we can help. |
The dump might be really helpful. However, the subjects, markers and faces tables should be enough for debugging. That will reduce the size so you can send a link or attach it as a zip to an email. Thank you! ❤️ |
Just sent it to the hello@ email address, with a link to download the database. Hope it helps - let me know if you don't get it for some reason. Have a great weekend! |
Hello @lastzero 👋
Same as what @pjft said, we very much appreciate your time on the project and we know there are other priorities. We're just here to help, so that if you feel like looking back into this issue at some point, you've got the data you need. Sorry if my previous message came across the wrong way! I just sent a dump of the 3 tables you asked for to the hello address. Hope it helps 🤞 !
Based on this comment and the fact that you should now have the data you need to debug, would you consider reopening the issue? Have a good weekend |
Given the long list of comments, it seems best to open a new issue. The new issue should contain a summary of everything that seems important to get to a solution and, for example, has been mentioned here, but also in GitHub Discussions. Feel free to move ahead with this, as I'm drowning in work right now and don't know when I'll get to it. Thank you very much, also for the kind words! |
Under certain conditions, inconsistent face assignments cannot be automatically resolved by the background worker, which can result in an unusually high CPU load when it is running:
Running the following command in a terminal can resolve problems with inconsistent data:
docker compose exec photoprism photoprism faces audit --fix
It can also be helpful to manually check for inconsistent assignments and fix them in the user interface. Alternatively, you can use the
photoprism faces reset
command for a clean start if you haven't invested much time in assigning faces yet. Advanced users affected by this are welcome to privately provide us with a SQL dump of their subjects, faces, and markers database tables for debugging. Thank you very much!
1. What is not working as documented?
If there is an error of "faces: failed removing merged clusters for subject (ID)",
tagging any new face becomes incredibly slow: normally it takes 1-3 seconds, now 2-5 minutes.
This behavior happens whether tagging a random photo through the photo browser or a "new face" collection under People.
2. How can we reproduce it?
I don't know how to reproduce this error, but this is the second time that it has occurred.
The first time, I tried a few recommendations, such as increasing the InnoDB buffer size, but they didn't fix the problem.
Resetting the Face ID allowed me to tag quickly again (although it meant starting over) and it worked great over the next 60,000 photos.
Now the same error is back and the same lag is back.
Steps to reproduce the behavior:
I don't know how to FORCE the error to be created.
3. What behavior do you expect?
Some way to undo or stop the merge attempt. If a single "failed removing merged clusters" error for a single subject can cause this much of a performance issue, add an undo button or, better yet, undo it automatically. I'm just making up suggestions here, because I don't really know what the error means.
4. What could be the cause of your problem?
People have said this might be caused by tagging the same person with a similar but different name. If that's true, great! It would be nice if PhotoPrism stopped me from creating the conflict, or reverted the action once it realizes it is going to cause a conflict.
5. Can you provide us with example files for testing, error logs, or screenshots?
Here is a video I gave @lastzero when it happened the first time. https://1drv.ms/v/s!AjcUU9ZDJWosuu9yj-2Sf0U2_tD2tA?e=6r1SXm
6. Which software versions do you use?
(a) PhotoPrism Build Number: 220901-f493607b0
(b) Database Type & Version: MariaDB
(c) Operating System Types & Versions: Linux (Unraid + Docker)