Checking photos repeatedly after index completed #1618
Comments
That's expected. PhotoPrism regularly runs an optimize command in the background. |
I checked https://docs.photoprism.org/getting-started/config-options/ but couldn't find any option related to this task. How can I adjust it? I have more than 263k photos, so each run takes 3 days. |
Are you sure it needs 3 days to go through "just" 263k database rows? It's not indexing, just updating based on existing index data. You can increase the interval of the background worker, see https://docs.photoprism.org/getting-started/config-options/ for the available options. For example, PhotoPrism generates titles based on metadata or recognized faces. Since not all photos can be instantly updated when you change the name of a person, this happens in the background. |
Yes, it really takes 3 days. |
SELECT * FROM `photos` WHERE (place_id <> '' AND place_id <> 'zz' AND place_src <> '' AND place_src <> 'estimate') ORDER BY ABS(DATEDIFF(taken_at, '2017-02-20 15:29:32')) ASC,`photos`.`id` ASC LIMIT 1;
This query usually took more than 1.5 s to complete before I added an index. |
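A note on why this query can be expensive: ordering by ABS(DATEDIFF(...)) means the expression has to be evaluated for every row that passes the WHERE clause, so an index on taken_at cannot drive the sort. One common workaround is to run two range queries that an index can serve (the closest photo before and the closest photo after the reference time) and keep the nearer one. The sketch below illustrates the idea with Python's sqlite3 module instead of MariaDB. The reduced table layout and the `nearest_photo` helper are hypothetical, the place filters are omitted, and it compares exact timestamps rather than whole days as DATEDIFF does. It is not PhotoPrism's implementation.

```python
import sqlite3
from datetime import datetime

def nearest_photo(conn, taken_at):
    """Return (id, taken_at) of the photo closest in time to taken_at, or None."""
    # Closest photo at or before the reference time; an index on taken_at can serve this.
    before = conn.execute(
        "SELECT id, taken_at FROM photos WHERE taken_at <= ? "
        "ORDER BY taken_at DESC, id ASC LIMIT 1", (taken_at,)).fetchone()
    # Closest photo strictly after the reference time.
    after = conn.execute(
        "SELECT id, taken_at FROM photos WHERE taken_at > ? "
        "ORDER BY taken_at ASC, id ASC LIMIT 1", (taken_at,)).fetchone()
    candidates = [row for row in (before, after) if row is not None]
    if not candidates:
        return None
    ref = datetime.fromisoformat(taken_at)
    # Keep whichever candidate has the smaller absolute time difference.
    return min(candidates, key=lambda row: abs(datetime.fromisoformat(row[1]) - ref))

# Tiny in-memory table with made-up rows, only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, taken_at TEXT)")
conn.execute("CREATE INDEX idx_photos_taken_at ON photos (taken_at)")
conn.executemany("INSERT INTO photos (id, taken_at) VALUES (?, ?)", [
    (1, "2017-02-10 09:00:00"),
    (2, "2017-02-19 18:00:00"),
    (3, "2017-02-25 12:00:00"),
])
print(nearest_photo(conn, "2017-02-20 15:29:32"))  # (2, '2017-02-19 18:00:00')
```

Whether this is faster than the original query on a given MariaDB setup would have to be measured; the point is only that both sub-queries can use an index on taken_at.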
May I ask what hardware you have? Our test server here has 100k files, but uses an 8-year-old Core i3 processor. It's not doing this for days. Note that the first run may take longer. Was this the first time? |
I use a Synology DS918+ server and run PhotoPrism in Docker. It has only HDD drives and a weak J3455 processor. |
May I ask how long the initial indexing took? Looking at the hardware specs of the DS918+, in particular the processor, I'm not surprised you have performance issues (like 70% slower than even a Core i3): |
With these hardware specs, feel free to go ahead and set |
I understand your point, but this was not the first run. The first time, it took 24 days to generate thumbnails and complete the index.
|
So when it took 24 days to index, it's not surprising location estimates also take long. It's not a cheap query, but usually also no issue. I'd disable it in your case, which should also speed up the background check. Note the worker will also compare faces when you use facial recognition etc, so I wouldn't get rid of it completely unless you know what implications this has and are OK with it. |
Yes, this is because we use a lot of SQL. We know queries and tables can be optimized further, but don't consider it the most important issue right now. You're welcome to make specific suggestions and sponsor them to be included in a future release. |
Sure, I will check the SQL queries on weekends, and I will set PHOTOPRISM_WAKEUP_INTERVAL to 3 months or even higher. This will solve my problem, thanks. |
Adding this here instead of opening yet another issue. I found that mysql gets hammered with this query thousands of times according to The query gets truncated there, but maybe it's enough to know where it comes from. The individual query seems cheap and finishes almost immediately, it just runs many many times. Maybe this can be aggregated? This makes mysql occupy almost two cores for 15 minutes in my case. Not a dealbreaker per se, but I cannot start a new indexing run while this is happening, even though the UI suggests PP is idle at this point, i.e. nothing happening in the log output, "Start" button enabled again. |
The query is supposed to trigger a photo metadata / title / keywords refresh after people visible on a picture have been added, changed or removed. If it was running for 15 full minutes then maybe it needs optimization. Didn't observe any issues so far, otherwise we wouldn't have released it like that. |
@srett If it was running thousands of times then probably by a background worker while faces on photos were matched to people. Does it happen every day or did it stop? |
The background worker is independent of the indexing worker so that it doesn't get blocked randomly. Of course this doesn't mean the load caused by the background doesn't slow down indexing or that database table locking will never be an issue. We fix everything we can understand and reproduce unless we need historic hardware for this. |
Ah, so it seems it was just a coincidence that the worker kicked off after the indexing was done. However, at first glance, the log output now looks like it's doing the same thing over and over again with no apparent progress:
I guess whatever was up with the photos I added today confused the faces engine... Also, is this something to worry about? Btw. I should mention I'm still running on 210925-96168e4b-Linux-x86_64, in case there's been any improvements there already. Hopefully have time to upgrade this weekend. |
You may try updating as we fixed and improved a few things - before we invest more time in debugging. Shouldn't endlessly resolve collisions, but I can't prove it: https://en.wikipedia.org/wiki/Halting_problem |
Running this command in a terminal may help when there are conflicts, although the latest version should be able to resolve all of them automatically:
See https://docs.photoprism.org/getting-started/docker-compose/#command-line-interface for a general command overview. You can skip the prefix if not using Docker Compose. |
I left the old version running overnight to see if anything improved; it just kept doing this over and over though. Then I updated to the 2021-10-10 release, and after one more iteration, it was fixed. Although after updating I thought that I should have tested whether a simple restart of the old version would have fixed it. I'll check the logs next time I import new photos. :-) That's what I saw after updating, and after that, everything was back to normal:
|
Doesn't surprise me much, clustering and matching are a mathematical hell: A tiny mistake and you end up with infinite loops or bad results. Overall, it's really not too bad for a first version and the resources we have right now. |
version: 211010-83b4f783-Linux-x86_64 (latest)
MariaDB [photoprism]> select checked_at from photos where checked_at >= '2021-10-16 08:06:01' limit 10;
+---------------------+
| checked_at |
+---------------------+
| 2021-10-16 08:21:06 |
| 2021-10-16 08:21:07 |
| 2021-10-16 08:21:07 |
| 2021-10-16 08:21:08 |
| 2021-10-16 08:21:08 |
| 2021-10-16 08:21:09 |
| 2021-10-16 08:21:10 |
| 2021-10-16 08:21:10 |
| 2021-10-16 08:21:11 |
| 2021-10-16 08:21:11 |
+---------------------+
10 rows in set (0.001 sec) |
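As a rough cross-check of the runtimes discussed in this thread, the ten timestamps above can be turned into a throughput estimate. The sketch below assumes the sample is representative of the whole run and that the 263k figure mentioned earlier applies to this library; both are assumptions, and the one-second timestamp resolution makes the result coarse. `estimate_hours` is a hypothetical helper.

```python
from datetime import datetime

# The ten checked_at values from the query result above.
samples = [
    "2021-10-16 08:21:06", "2021-10-16 08:21:07", "2021-10-16 08:21:07",
    "2021-10-16 08:21:08", "2021-10-16 08:21:08", "2021-10-16 08:21:09",
    "2021-10-16 08:21:10", "2021-10-16 08:21:10", "2021-10-16 08:21:11",
    "2021-10-16 08:21:11",
]

def estimate_hours(timestamps, total_photos):
    """Extrapolate the total runtime in hours from a sample of check timestamps."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    elapsed = (times[-1] - times[0]).total_seconds()
    # Photos processed per second between the first and last sample.
    rate = (len(times) - 1) / elapsed
    return total_photos / rate / 3600

hours = estimate_hours(samples, 263_000)
print(f"{hours:.1f} hours")  # 40.6 hours
```

At about 1.8 photos per second this extrapolates to well over a day for a single pass, so a multi-day run on this hardware is at least plausible.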
a little strange? |
vs
Those configs don't take effect. |
You may use the GitHub search to find the code that handles specific config values. In this case, you'll find that the max value currently is 86400 (24 hours): photoprism/internal/config/config.go Line 479 in 09f50fc
This is because not running the worker will break certain functionality, like location estimates and facial recognition. We can add the ability to disable the worker completely (which is what you really want) in combination with the new |
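To illustrate why a value such as "3 months" would not take effect, here is a minimal sketch of an upper limit check. Only the 86400 second maximum comes from this thread. The default value, the function name, and the choice to fall back to the default (rather than clamp to the maximum) are assumptions made for illustration; this is not a copy of config.go.

```python
MAX_WAKEUP_INTERVAL = 86400    # seconds (24 hours), the maximum named in the thread
DEFAULT_WAKEUP_INTERVAL = 900  # seconds; assumed default, not confirmed by the thread

def effective_wakeup_interval(configured_seconds):
    """Return the interval a worker would use for a configured value (hypothetical)."""
    # Assumption: out-of-range values fall back to the default instead of being clamped.
    if configured_seconds <= 0 or configured_seconds > MAX_WAKEUP_INTERVAL:
        return DEFAULT_WAKEUP_INTERVAL
    return configured_seconds

three_months = 90 * 24 * 3600
print(effective_wakeup_interval(three_months))  # 900: the oversized value is ignored
print(effective_wakeup_interval(86400))         # 86400: the largest accepted value
```

Under these assumptions, the longest pause that can be configured between runs is 24 hours, regardless of how large the configured number is.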
Would you be able to test the runtime of the worker one more time with Estimates disabled in Library > Settings and |
Setting |
config.go explains my confusion. I added some new photos and will check how long it takes after I disable Estimates and FACES. Previously I hoped PP would finish its work ASAP, so I configured |
MariaDB [photoprism]> select count(*) from lenses WHERE lens_slug <> 'zz' AND id NOT IN (SELECT lens_id FROM photos);
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.291 sec)
MariaDB [photoprism]> DELETE FROM lenses WHERE lens_slug <> 'zz' AND id NOT IN (SELECT lens_id FROM photos);
Query OK, 0 rows affected (12 min 38.555 sec)
MariaDB [photoprism]> DELETE l FROM lenses l LEFT JOIN (SELECT DISTINCT lens_id FROM photos) p ON l.id = p.lens_id WHERE l.lens_slug <> 'zz' AND p.lens_id IS NULL;
Query OK, 0 rows affected (0.743 sec)
Please consider rewriting this SQL, perhaps using my optimized version? |
Sure, did you test it on SQLite as well? I don't remember how we came up with every single query, but it looks like this is something SQLite doesn't support so we implemented it in an "unoptimized" way: https://stackoverflow.com/questions/24511153/how-delete-table-inner-join-with-other-table-in-sqlite It's now possible to use different queries depending on the SQL dialect, so we can optimize it for MySQL / MariaDB only. |
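The idea of choosing a query per SQL dialect can be sketched as a simple lookup. The example below uses Python's sqlite3 module, so only the SQLite branch is actually executed; the MySQL / MariaDB statement is the one proposed above and is included as text only. The dictionary, the `delete_orphan_lenses` helper, and the reduced two-table schema are hypothetical and do not reflect how PhotoPrism organizes its queries.

```python
import sqlite3

# Hypothetical lookup table: one statement per SQL dialect, both taken from this thread.
DELETE_ORPHAN_LENSES = {
    "sqlite": "DELETE FROM lenses WHERE lens_slug <> 'zz' "
              "AND id NOT IN (SELECT lens_id FROM photos)",
    "mysql": "DELETE l FROM lenses l "
             "LEFT JOIN (SELECT DISTINCT lens_id FROM photos) p ON l.id = p.lens_id "
             "WHERE l.lens_slug <> 'zz' AND p.lens_id IS NULL",
}

def delete_orphan_lenses(conn, dialect):
    """Run the statement registered for the dialect and return the affected row count."""
    cur = conn.execute(DELETE_ORPHAN_LENSES[dialect])
    return cur.rowcount

# Made-up rows: lens 1 is the 'zz' placeholder, lens 2 is in use, lens 3 is orphaned.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lenses (id INTEGER PRIMARY KEY, lens_slug TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, lens_id INTEGER);
    INSERT INTO lenses VALUES (1, 'zz'), (2, 'used-lens'), (3, 'orphan-lens');
    INSERT INTO photos VALUES (10, 2);
""")
deleted = delete_orphan_lenses(conn, "sqlite")
remaining = [row[0] for row in conn.execute("SELECT id FROM lenses ORDER BY id")]
print(deleted, remaining)  # 1 [1, 2]
```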
I tested it on SQLite v3.32.3, on an SSD drive:
sqlite> .timer ON
sqlite> select count(*) from lenses WHERE lens_slug <> 'zz' AND id NOT IN (SELECT lens_id FROM photos);
0
Run Time: real 0.054 user 0.053050 sys 0.000524
sqlite> DELETE FROM lenses WHERE lens_slug <> 'zz' AND id NOT IN (SELECT lens_id FROM photos);
Run Time: real 0.054 user 0.052925 sys 0.000601
sqlite> DELETE l FROM lenses l LEFT JOIN (SELECT DISTINCT lens_id FROM photos) p ON l.id = p.lens_id WHERE l.lens_slug <> 'zz' AND p.lens_id IS NULL;
Run Time: real 0.000 user 0.000029 sys 0.000005
Error: near "l": syntax error
The subquery version works fine for SQLite. It still doesn't support
BTW, I used https://github.com/dumblob/mysql2sqlite to dump all data from MariaDB to SQLite. |
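One form that SQLite does accept, and that is also valid SQL in MariaDB, is a correlated NOT EXISTS. It is worth knowing that it is not strictly equivalent to NOT IN: if the subquery returns a NULL, `id NOT IN (...)` never evaluates to true, so nothing is deleted, whereas NOT EXISTS (like the LEFT JOIN form) still finds the orphans. The sketch below demonstrates this with Python's sqlite3 module on a made-up schema. Whether photos.lens_id can actually be NULL in PhotoPrism's schema is not stated in this thread, and whether NOT EXISTS is faster on MariaDB has not been measured here.

```python
import sqlite3

def make_db(with_null):
    """Create a small in-memory database; optionally add a photo with a NULL lens_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lenses (id INTEGER PRIMARY KEY, lens_slug TEXT);
        CREATE TABLE photos (id INTEGER PRIMARY KEY, lens_id INTEGER);
        CREATE INDEX idx_photos_lens_id ON photos (lens_id);
        INSERT INTO lenses VALUES (1, 'zz'), (2, 'used-lens'), (3, 'orphan-lens');
        INSERT INTO photos VALUES (10, 2);
    """)
    if with_null:
        conn.execute("INSERT INTO photos VALUES (11, NULL)")
    return conn

# The statement from this thread and a NOT EXISTS variant of it.
NOT_IN = ("DELETE FROM lenses WHERE lens_slug <> 'zz' "
          "AND id NOT IN (SELECT lens_id FROM photos)")
NOT_EXISTS = ("DELETE FROM lenses WHERE lens_slug <> 'zz' "
              "AND NOT EXISTS (SELECT 1 FROM photos WHERE photos.lens_id = lenses.id)")

def deleted_rows(sql, with_null):
    """Run the DELETE on a fresh database and return how many rows it removed."""
    conn = make_db(with_null)
    return conn.execute(sql).rowcount

for with_null in (False, True):
    print(with_null, deleted_rows(NOT_IN, with_null), deleted_rows(NOT_EXISTS, with_null))
# False 1 1
# True 0 1
```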
Please create a new GitHub issue for "SQL Optimization" when you're done so that we can put it on our roadmap 👍 |
PhotoPrism Version: 211007-8f55d6f8-Linux-x86_64
I didn't add or modify any photos, but it seems to be checking all photos again.