Faces: Improve performance and strategy for manual tagging #3124
I am having this same issue and ran the same command. My background is mostly in Python, but I tried to do some digging through the code. The warning message
Just prior to this block of code, MatchMarkers() is called. It is trying to match 1 Face to 4 other Faces with the same subj_uid, but it looks like the embedding distance of the 4 other Faces is too far from the first Face for them to be considered a match, so none of the matches complete. Going back to the block of code shown above, that means:
So it falls into the else block with a warning message, although in this case a more accurate message would state that there were no orphaned clusters to remove.
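The matching step described above boils down to an embedding-distance check against a threshold. A minimal sketch in Go (the function names and threshold value here are illustrative assumptions, not PhotoPrism's actual API):

```go
package main

import (
	"fmt"
	"math"
)

// dist returns the Euclidean distance between two embedding vectors.
func dist(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// match reports whether candidate is close enough to the reference
// embedding to be considered the same subject.
func match(ref, candidate []float64, maxDist float64) bool {
	return dist(ref, candidate) <= maxDist
}

func main() {
	ref := []float64{0.1, 0.2, 0.3}
	near := []float64{0.12, 0.19, 0.31}
	far := []float64{0.9, 0.8, 0.7}

	fmt.Println(match(ref, near, 0.1)) // true: within threshold
	fmt.Println(match(ref, far, 0.1))  // false: too far to be a match
}
```

If all candidate faces with the same subj_uid land on the "too far" side, no match completes, which is the situation producing the warning above.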
Running into this issue. Tried running the audit fix command with no change. I have the merge failed error for one single subject. The library has 100k+ files, and I spent a week tagging faces, so I would prefer not to reset faces. :-) BTW, thanks for all your work! PhotoPrism is awesome, and I am a sponsor as well. TRAC[2023-02-05T19:31:33Z] config: defaults file /etc/photoprism/defaults.yml does not exist
I now have a clearer idea of the underlying problem and think we can improve the situation by not matching manually labeled faces if they do not meet the constraints set for automatic clustering (more specifically, the cluster core and core distance). When testing with the pictures we had been using during development, a more aggressive approach for manually labeled faces seemed to work well, but this is only feasible for faces that the model can categorize well. For example, all these faces were matched after labeling a single face in the edit dialog:
Thank you for the update - this looks promising! What consequences would this change bring about? Meaning, what will change by not matching manually labeled faces if they do not meet the constraints set for automatic clustering? Just curious to understand the "why the original decision, what problems did it solve" and "what consequences does it have". Not urgent, by any means - just intellectual curiosity. :) Thank you, and happy Friday!
@pjft We can't tell what the final effect will be until it's implemented. It is expected that you will have to tag a few more faces that belong to the same person before the automatic matching starts, which is basically how most other apps work. Since these changes require extensive testing, we will probably wait until after the next stable release to avoid delaying it even further. The sample faces we use for testing can be downloaded here:
Of course - appreciate the time and effort in looking into this. Let us know how we can help.
This may reduce server load and prevent disks from spinning up. We welcome test reports!
Just wanted to say that this seems to be solved in the latest builds! Awesome work :) It's a breeze to label hundreds of clusters now.
@pjft Thank you! That's good to know, since the feedback we've received on the optimizations hasn't been very enthusiastic so far. Most users didn't seem to notice (or care about) the changes when they tested our development preview builds.
Just wanted to add to this, and share how inconsistent it can be when manually tagging (or de-tagging) faces. As such a key part of the product for me (and a way to help my aging parents remember people from their past), anything that can be done to speed things up would be amazing. The face tagging feature is the most important feature for a number of people - thank you so much for making it available.
This has been bugging me for a while, but it's now taking 200-300 seconds per face. That's enough time to start looking through the code.
Background
The call that starts the blocking operation, and subsequently fails resulting in a 429, is
The issue is that the remainder of the function can take a VERY long time to return - upwards of 3 minutes. The application exhibits both CPU utilization (100% of 1 core) and DB traffic (distinct spikes of DB traffic and queries). The best I can tell from the logs, when
Based on log line timing, I suspect that the most time is being spent in (1),
To further investigate, I used DBeaver to export the faces table before and after one of these multi-minute, CPU-intensive operations.
From this, I have to assume that the clusters aren't actually being merged in a meaningful way in the long-running
I think I found the bug causing the Optimize loop to not terminate early (and instead run all 11 times). The check

```go
// Done?
if result.Merged <= c {
	break
}
```

checks to see whether any faces were merged on the last run through the loop. Basically, if a call to the merge step reports that clusters were merged, the loop runs again. This is a flawed assumption because:
Because the daemon runs this function every 15 minutes, it is highly likely that all face clusters which can be merged have already been merged. In a static library in steady state that has at least 2 clusters which could be merged according to the embeddings matching, but where the smaller cluster can't be removed because the new cluster doesn't match the face, this loop will run 11 times, checking every cluster against all others with the same subj_uid. This could be fixed in 2 ways:
@lastzero if you let me know whether you prefer fix (1) or fix (2), I am happy to submit a PR. There are a few other things I noticed while tracing this code: In the
To fix this, change the code to
A more aggressive refactoring could also improve performance significantly: rather than querying all faces each time through the loop, we should get the list of faces at the start, then only query for face IDs with a
Edit: One further (but schema-changing) optimization would be to add a column like the optimized_at used in the query below:

```sql
SELECT *
FROM faces f
WHERE
    f.face_hidden = false
    AND f.face_src = 'manual'
    AND f.subj_uid <> ''
    AND f.face_kind <= 1
    AND EXISTS (
        SELECT 1
        FROM faces sub_f
        WHERE
            sub_f.subj_uid = f.subj_uid
            AND (sub_f.optimized_at IS NULL OR sub_f.optimized_at = '')
    )
ORDER BY f.subj_uid, f.samples DESC;
```

Or thereabouts. In this way, we'd only be processing faces where at least one marker had changed, rather than doing the same work over and over.
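Under fix (1), the intent would be for the loop to stop as soon as a pass performs no effective merges, rather than running all 11 iterations in steady state. A minimal, self-contained sketch of that termination condition (the type and function names are illustrative, not the actual Optimize() code):

```go
package main

import "fmt"

// optimizeResult mirrors the counters returned by one pass of a
// hypothetical optimize step (names are illustrative, not PhotoPrism's).
type optimizeResult struct {
	Merged int // clusters actually merged, i.e. rows really changed
}

// optimize runs merge passes until a pass no longer changes anything,
// instead of always exhausting the maximum number of runs.
func optimize(passes []optimizeResult, maxRuns int) int {
	runs := 0
	for i := 0; i < maxRuns; i++ {
		runs++
		result := passes[i]
		// Done? Stop as soon as a pass performed no effective merges.
		if result.Merged == 0 {
			break
		}
	}
	return runs
}

func main() {
	// Two effective passes, then a no-op pass: the loop stops after 3
	// iterations instead of running all 11.
	passes := make([]optimizeResult, 11)
	passes[0] = optimizeResult{Merged: 2}
	passes[1] = optimizeResult{Merged: 1}
	fmt.Println(optimize(passes, 11)) // 3
}
```

The key difference from the current code is that Merged must count merges that actually modified rows, so a pass that "succeeds" without changing anything no longer keeps the loop alive.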
Have a PR going. No confirmed fix yet. #4691
(from the PR for visibility by @lastzero ) This issue is deeper than initially suspected - the proposed fix (1) is necessary but not sufficient. After implementing the proposed fix (1), the loop still runs all 11 times. It seems that when there are several clusters of the same size (in my case 6), it is possible for each iteration to attempt to merge them, resulting in one new cluster being created and one old cluster being dropped. Trying to grok what is going on here... When the new cluster is created, its ID comes from the embeddings: the centroid of all possible merge candidates. Somehow, this results in a new face hash (rounding issues? order of operations?) and the creation of a new automatic cluster which replaces the old one. Since I assumed that the intent of the md5
each time, the centroid is shifted slightly, resulting in a new face hash.
Initial thought: One way around this would be to do the match in-memory, i.e.
In the meantime, I will add a parameter to
Edit: Noticed another problem (bug?). The sort in
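The centroid drift described above can be reproduced in miniature: summing the same coordinates in a different order changes the float64 result, so any hash derived from the centroid changes too. A sketch (the hashing scheme below is an illustrative assumption, not PhotoPrism's actual face-hash code):

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// hashCentroid derives an ID from a centroid by hashing its printed
// coordinates (illustrative only; PhotoPrism's scheme may differ).
func hashCentroid(c []float64) string {
	return fmt.Sprintf("%x", md5.Sum([]byte(fmt.Sprintf("%v", c))))
}

func main() {
	x, y, z := 0.1, 0.2, 0.3

	// Float64 addition is not associative: the same three coordinates
	// summed in a different order give slightly different centroids.
	a := (x + y) + z // 0.6000000000000001
	b := x + (y + z) // 0.6

	fmt.Println(a == b)                                                   // false
	fmt.Println(hashCentroid([]float64{a}) == hashCentroid([]float64{b})) // false
}
```

This is consistent with the "rounding issues? order of operations?" hypothesis: if the set or order of merge candidates differs between iterations, the centroid (and therefore the hash) will not be stable, so each pass can create a "new" cluster that replaces an equivalent old one.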
@theshadow27 Thanks for taking a look and sharing your insights! 🧐 Our current plan is to offload (most of) the calculations to the new MariaDB vector functions in an upcoming release: This should yield better results and be much faster than our own vector comparisons, which were intended as a temporary solution until we have a proper vector database/library in place. Unfortunately, as we are currently performing a major upgrade of our frontend, I do not have any code or a more specific timeline/roadmap for this yet. The edge cases that need to be handled/investigated (see your comments and what I've commented in other places) further encourage us to go this way. If you like, you are welcome to experiment with the vector functions and work on a proof of concept! ✨ Of course, if you find improvements to the current code that we can test, merge, and release within a reasonable timeframe, we'd be happy to do that as well - but my feeling is that the effort required for a fast and completely bug-free implementation is beyond what anyone could maintain/contribute... 🤔
@lastzero thanks for getting back, and happy new year! 🥂 First off, IMHO the current implementation/"temporary" solution is VERY good at meeting its functional requirements - better than Apple Photos, for sure. In the rare situation where it has confused a subject, it's straightforward to eject them and fix it. New markers are identified and matched quickly and accurately. Youth sports are a brutal use case, and I sincerely appreciate the effort that went into the current version! The only issue I have is with the nonfunctional UX requirement of not having excessive delays while naming many markers in quick succession. To be clear: the current state of the draft PR offers a significant improvement to the UX by limiting the
Regarding the Vector implementation - I'm already thinking about it, but the scope will be much wider than this PR, with impacts from DBSCAN up to everywhere that Match is used (at least 11 places), likely requiring substantial rework. In addition to schema updates, a Vector implementation requires a minimum DB version, which could break a lot of existing embedded/hosted installs without manual intervention during the upgrade. To keep backwards compatibility, some version of the current implementation strategy needs to be encapsulated behind an interface, further expanding the scope of refactoring, and potentially running with two different Gorm entities depending on a flag. I don't know enough about Gorm to know if that's OK or not. This is a big project: lots of new code, lots of testing, bug hunting, and so on. As such, my suggestion is to try to get the minimum change for maximum UX improvement merged sooner rather than later - there are quite a few users who have reported this or derivative issues going back to October 2022. It doesn't fix the root cause, but it hides it well enough to make a big difference for those who are impacted. What do you think? P.S.
If you're not comfortable rolling it out everywhere, it is not much more work to hide it behind a flag (like
P.P.S. I am working on another branch that has a more substantial diff: a re-write of the
This PR implements a strategy to improve Optimize() performance by returning the correct number of modified rows in PurgeOrphanFaces and treating the result 0 as an error condition in MergeFaces. Logging around this has been improved in faces.go and faces_optimize.go. Related Comments: - #3124 (comment) - #4691 (comment)
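The fix described in this PR can be sketched as follows, with purgeFunc standing in for PurgeOrphanFaces and deliberately simplified signatures (illustrative assumptions, not the actual PhotoPrism code): the purge step reports how many rows it actually deleted, and the merge step treats zero deletions as a failure instead of silent progress.

```go
package main

import (
	"errors"
	"fmt"
)

// purgeFunc stands in for PurgeOrphanFaces: it returns how many rows
// were actually deleted (e.g. as reported by the database driver).
type purgeFunc func() (int64, error)

// mergeFaces sketches the fix: if purging the orphaned cluster removed
// zero rows, the merge did not really complete, so report an error
// instead of letting the caller count it as progress.
func mergeFaces(purge purgeFunc) error {
	affected, err := purge()
	if err != nil {
		return err
	}
	if affected == 0 {
		return errors.New("merge failed: no orphan faces were removed")
	}
	return nil
}

func main() {
	ok := func() (int64, error) { return 1, nil }   // purge deleted a row
	noop := func() (int64, error) { return 0, nil } // purge changed nothing

	fmt.Println(mergeFaces(ok))   // <nil>
	fmt.Println(mergeFaces(noop)) // merge failed: no orphan faces were removed
}
```

With this shape, the Optimize() loop from the earlier analysis can distinguish real merges from no-ops and terminate early.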
@theshadow27 Thank you so much for your patience while we completed the frontend upgrade! 🙏 The last batch of UI/UX enhancements was released last week, so I'm happy to report that your PR is now merged and ready to be tested with our development preview build:
Any help with that is much appreciated :)
While I am not OP, I have definitely noticed speedups with manual tagging when on the latest preview build, |
@Coloradohusky Thanks for testing! While the API could also be faster due to other performance improvements we've implemented in the meantime, it's definitely great to hear :)
Let me start by saying this is a new issue that summarizes a closed, but still occurring, issue. I'm creating this new issue based on @lastzero's comment.
1. What is not working as documented?
For faces that were not added automatically, tagging people manually is slow to extremely slow. With some photos or persons (unsure which), it takes only a few seconds (rare), but once the issue happens, most of the time it takes in my case between 1 and 2 minutes to tag 1 person. I've decided recently to move all my photos to PhotoPrism, so I've got quite a lot already, and quite a lot more to come. I have more than a couple hundred pictures with unrecognized faces waiting to be tagged, but I'm not able to wait 1 to 2 minutes for each; it'd be unmanageable.
It's documented as a known issue. I have tried the command
docker compose exec photoprism photoprism faces audit --fix
but it didn't help fix the issue.
2. How can we reproduce it?
I have no idea how exactly to reproduce this, but both @pjft and myself have shared parts of our databases by email directly to help debug it. See pjft's comment and mine. Best case scenario, it's enough to debug; if not, I'm more than happy to provide more info.
3. What behavior do you expect?
When a face is detected but unknown, it shouldn't take more than a few seconds to add it manually.
4. What could be the cause of your problem?
Really unsure, sorry.
5. Can you provide us with example files for testing, error logs, or screenshots?
As stated above, this has been done by email as it includes personal data.
6. Which software versions do you use?
(a) PhotoPrism Architecture & Build Number: AMD64, ARM64, ARMv7,...
Build 221118-e58fee0fb
(b) Database Type & Version: MariaDB, MySQL, SQLite,...
MariaDB
(c) Operating System Types & Versions: Linux, Windows, Android,...
Linux
(d) Browser Types & Versions: Firefox, Chrome, Safari on iPhone,...
Brave and Chrome
(e) Ad Blockers, Browser Plugins, and/or Firewall Software?
Probably irrelevant here. (I've tried to turn them off, doesn't change anything).
7. On what kind of device is PhotoPrism installed?
This is especially important if you are reporting a performance, import, or indexing issue. You can skip this if you're reporting a problem you found in our public demo, or if it's a completely unrelated issue, such as incorrect page layout.
(a) Device / Processor Type: Raspberry Pi 4, Intel Core i7-3770, AMD Ryzen 7 3800X,...
Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz
(b) Physical Memory & Swap Space in GB
4 GB and 4 GB
(c) Storage Type: HDD, SSD, RAID, USB, Network Storage,...
SSD
(d) Anything else that might be helpful to know?
8. Do you use a Reverse Proxy, Firewall, VPN, or CDN?
Yes, I use SWAG, which itself uses NGINX, but it's probably unrelated here, as it used to work perfectly with this config.
I'll summarize what was found in the previous issue: