Index: Extract original filenames from Exiftool JSON #1892

Closed
joachimtingvold opened this issue Jan 5, 2022 · 20 comments
Labels
enhancement Refactoring, improvement or maintenance task released Available in the stable release ux Impacts User Experience

Comments

@joachimtingvold

joachimtingvold commented Jan 5, 2022

After import, while going through photos to add them to albums, I noticed that a photo could not be found (I was initially searching by original_name).

On closer inspection, I found that the photo was present in originals, but its original_name column was empty:

c28ec8ddbf6f075cdff8466ee98dfcb3  /storage/photos/archive/6D/2013/10/12/IMG_5023.CR2
c28ec8ddbf6f075cdff8466ee98dfcb3  originals/2013/10/20131012_155750_4286DBCD.cr2

root@photoprism1:/storage/photoprism# exiftool originals/2013/10/20131012_155750_4286DBCD.cr2 | grep -i date
File Modification Date/Time     : 2022:01:03 01:15:42+01:00
File Access Date/Time           : 2022:01:05 21:31:00+01:00
File Inode Change Date/Time     : 2022:01:03 01:15:42+01:00
Modify Date                     : 2013:10:12 15:57:50
Date/Time Original              : 2013:10:12 15:57:50
Create Date                     : 2013:10:12 15:57:50
Create Date                     : 2013:10:12 15:57:50.00
Date/Time Original              : 2013:10:12 15:57:50.00
Modify Date                     : 2013:10:12 15:57:50.00

MariaDB [photoprism]> select photo_uid, photo_path, photo_name, original_name, taken_at, taken_at_local, place_src, created_at from photos where photo_name like '%4286DBCD%';
+------------------+------------+--------------------------+---------------+---------------------+---------------------+-----------+---------------------+
| photo_uid        | photo_path | photo_name               | original_name | taken_at            | taken_at_local      | place_src | created_at          |
+------------------+------------+--------------------------+---------------+---------------------+---------------------+-----------+---------------------+
| pr5404t2mayh2gc3 | 2013/10    | 20131012_155750_4286DBCD |               | 2013-10-12 13:57:50 | 2013-10-12 15:57:50 | estimate  | 2022-01-03 01:00:29 |
+------------------+------------+--------------------------+---------------+---------------------+---------------------+-----------+---------------------+
1 row in set (0.134 sec)

Photos imported just before and just after it have original_name set as expected:

MariaDB [photoprism]> select photo_uid, photo_path, photo_name, original_name, taken_at, taken_at_local, place_src, created_at from photos where created_at like '%2022-01-03 01:00%';
+------------------+------------+--------------------------+------------------------+---------------------+---------------------+-----------+---------------------+
| photo_uid        | photo_path | photo_name               | original_name          | taken_at            | taken_at_local      | place_src | created_at          |
+------------------+------------+--------------------------+------------------------+---------------------+---------------------+-----------+---------------------+
| pr54041l7d327eh2 | 2013/11    | 20131119_212433_95595609 | 6D/2013/11/19/IMG_5419 | 2013-11-19 20:24:33 | 2013-11-19 21:24:33 | estimate  | 2022-01-03 01:00:01 |
| pr5404810mct4ixc | 2013/11    | 20131119_212433_FE04DDE0 | 6D/2013/11/19/IMG_5420 | 2013-11-19 20:24:33 | 2013-11-19 21:24:33 | estimate  | 2022-01-03 01:00:08 |
| pr5404k34w44u3nh | 2013/11    | 20131119_212434_616A971C | 6D/2013/11/19/IMG_5421 | 2013-11-19 20:24:34 | 2013-11-19 21:24:34 | estimate  | 2022-01-03 01:00:20 |
| pr5404okxznncmsx | 2013/11    | 20131119_212434_61396CBE | 6D/2013/11/19/IMG_5422 | 2013-11-19 20:24:34 | 2013-11-19 21:24:34 | estimate  | 2022-01-03 01:00:24 |
| pr5404t2mayh2gc3 | 2013/10    | 20131012_155750_4286DBCD |                        | 2013-10-12 13:57:50 | 2013-10-12 15:57:50 | estimate  | 2022-01-03 01:00:29 |
| pr540512dmitjb8p | 2013/11    | 20131119_212435_9C28778F | 6D/2013/11/19/IMG_5423 | 2013-11-19 20:24:35 | 2013-11-19 21:24:35 | estimate  | 2022-01-03 01:00:37 |
| pr540553a9xs31ea | 2013/11    | 20131119_212437_531F5E22 | 6D/2013/11/19/IMG_5424 | 2013-11-19 20:24:37 | 2013-11-19 21:24:37 | estimate  | 2022-01-03 01:00:41 |
| pr5405i2mcc8byx0 | 2013/11    | 20131119_212439_6715786F | 6D/2013/11/19/IMG_5425 | 2013-11-19 20:24:39 | 2013-11-19 21:24:39 | estimate  | 2022-01-03 01:00:54 |
| pr5405m1pz738sdu | 2013/11    | 20131119_212441_17E975D7 | 6D/2013/11/19/IMG_5426 | 2013-11-19 20:24:41 | 2013-11-19 21:24:41 | estimate  | 2022-01-03 01:00:58 |
+------------------+------------+--------------------------+------------------------+---------------------+---------------------+-----------+---------------------+
9 rows in set (0.108 sec)

original_name is also empty in the files table:

MariaDB [photoprism]> select photo_uid, file_name, original_name from files where photo_uid='pr5404t2mayh2gc3';
+------------------+------------------------------------------+---------------+
| photo_uid        | file_name                                | original_name |
+------------------+------------------------------------------+---------------+
| pr5404t2mayh2gc3 | 2013/10/20131012_155750_4286DBCD.cr2     |               |
| pr5404t2mayh2gc3 | 2013/10/20131012_155750_4286DBCD.cr2.jpg |               |
+------------------+------------------------------------------+---------------+
2 rows in set (0.000 sec)

In the logs, I found the following:

[Screenshot of the log entry showing the indexing error]

The database entry was created at 2022-01-03 01:00:29, and the error was logged at 2022-01-03 01:15:48. The image displays fine within PhotoPrism: two stacked files (RAW + JPEG) with correct metadata.

I'm not sure where the missing original_name comes from, since there was a gap of ~15 minutes between the import and the error (probably because the error was raised during indexing). I would assume that the import is what sets original_name, as the indexer has no way of knowing it (as far as I know, at least).

@joachimtingvold joachimtingvold added the bug Something isn't working label Jan 5, 2022
@joachimtingvold
Author

joachimtingvold commented Jan 5, 2022

There seem to be more photos without original_name, and all of them were imported (i.e. none were indexed directly from the originals/ folder). I have not looked into the results below in more detail.

MariaDB [photoprism]> select count(*) from photos where original_name='';
+----------+
| count(*) |
+----------+
|      389 |
+----------+
1 row in set (0.067 sec)

@lastzero
Member

lastzero commented Jan 5, 2022

Did you try the Development Preview? This looks like a problem with color profile detection, which has been refactored.

@joachimtingvold
Author

No, this was imported using photoprism:latest. But the error (read color metadata) would occur during indexing, not import, right? If so, I don't understand how it could affect original_name, which I'd expect to be set during import (a pure index of originals would not reveal the original filename in any way).

@lastzero
Copy link
Member

lastzero commented Jan 5, 2022

Import just moves the file; the index operation is then the same. There is only one additional parameter, the original name, which is empty when indexing existing originals.

@joachimtingvold
Author

But the import must write this info to the original_name column, right? Otherwise, the indexer would have no way of knowing it?

@lastzero
Member

lastzero commented Jan 5, 2022

It knows because the name is passed to the index function. Until indexing has finished successfully, there is no index entry and thus no row/column.
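The behaviour described above can be illustrated with a minimal Go sketch. It is not PhotoPrism's actual code: `indexFile`, `Row`, and the in-memory `store` are hypothetical stand-ins, and the color-metadata error is simulated by a flag. The point is that the original name only exists as a function argument until a row is written, so a failed first run followed by a successful plain index leaves it empty.

```go
package main

import (
	"errors"
	"fmt"
)

// Row is a hypothetical stand-in for a row in the files table.
type Row struct {
	FileName     string
	OriginalName string
}

// store is a hypothetical in-memory index keyed by file name.
var store = map[string]Row{}

// errColorMetadata simulates the indexing error seen in the logs.
var errColorMetadata = errors.New("failed to read color metadata")

// indexFile writes a row only after all indexing steps succeed.
// originalName is non-empty only when called from an import;
// indexing existing originals passes "".
func indexFile(fileName, originalName string, fail bool) error {
	if fail {
		// Nothing has been stored yet, so originalName is lost here.
		return errColorMetadata
	}
	store[fileName] = Row{FileName: fileName, OriginalName: originalName}
	return nil
}

func main() {
	name := "2013/10/20131012_155750_4286DBCD.cr2"

	// Import: the first index run fails, so no row exists afterwards.
	if err := indexFile(name, "6D/2013/10/12/IMG_5023.CR2", true); err != nil {
		fmt.Println("import index failed:", err)
	}

	// A later regular index run succeeds but no longer knows the original name.
	_ = indexFile(name, "", false)
	fmt.Printf("original_name=%q\n", store[name].OriginalName)
}
```

Running it prints the simulated error followed by `original_name=""`, matching the empty column observed in this issue.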

@joachimtingvold
Author

So if indexing fails (as it seems to have done here), the original_name info is basically lost forever?

@lastzero
Member

lastzero commented Jan 5, 2022

Yep, unless we add yet another sidecar file just for this. Logs may of course also contain this information. That's why indexing should be robust, and why the color profile detection was improved today. It's a new feature, and it wasn't clear what could trigger an error. It would be great to get failing files for testing, as we have none.

@lastzero
Member

lastzero commented Jan 5, 2022

JSON files in the cache folder might also contain the original name.

@joachimtingvold
Author

joachimtingvold commented Jan 5, 2022

root@photoprism1:/storage/photoprism# sha1sum originals/2013/10/20131012_155750_4286DBCD.cr2
54ef3374bd601168e9f107b90f264acf8085cceb  originals/2013/10/20131012_155750_4286DBCD.cr2
root@photoprism1:/storage/photoprism# grep SourceFile storage/cache/json/5/4/e/54ef3374bd601168e9f107b90f264acf8085cceb_exiftool.json
  "SourceFile": "/photoprism/import/6D/2013/10/12/IMG_5023.CR2",

Is that created by the import? If so, maybe the indexer could fetch it from there. This would only matter if the index run triggered by the import fails and a later regular index run succeeds.

@lastzero lastzero self-assigned this Jan 6, 2022
@lastzero lastzero added the please-test Ready for acceptance test label Jan 6, 2022
@lastzero lastzero changed the title Bug: Imported photo with missing original_name Index: Extract original file names from Exiftool JSON Jan 6, 2022
@lastzero lastzero added enhancement Refactoring, improvement or maintenance task and removed bug Something isn't working labels Jan 6, 2022
@lastzero lastzero changed the title Index: Extract original file names from Exiftool JSON Index: Extract original filenames from Exiftool JSON Jan 6, 2022
@lastzero lastzero added the ux Impacts User Experience label Jan 7, 2022
@graciousgrey graciousgrey added released Available in the stable release and removed please-test Ready for acceptance test labels Jan 7, 2022
@joachimtingvold
Author

Should index -f restore original_name from the JSON for previously indexed files?

@lastzero
Member

lastzero commented Jan 8, 2022

Yes, if it's available in the JSON and the file wasn't changed in the meantime (which would have caused the JSON to be updated). Check the version history for when JSON support was added to import; earlier versions didn't create it before renaming. It may have been implemented a few months ago.
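The conditions above could be expressed as a small decision function. This is a hypothetical Go sketch, not PhotoPrism's implementation: `restoreOriginalName` and its parameters are illustrative names, and "the JSON was updated" is approximated by checking whether `SourceFile` already points at the file's current location in originals (the `/photoprism/originals` path in the example is an assumption).

```go
package main

import (
	"fmt"
	"path/filepath"
)

// restoreOriginalName decides what original_name should be after a forced re-index.
// stored:      value currently in the index (may be empty)
// sourceFile:  SourceFile from the cached ExifTool JSON ("" if there is none)
// currentPath: the file's current absolute path below originals
func restoreOriginalName(stored, sourceFile, currentPath string) string {
	if stored != "" {
		return stored // never overwrite an existing value
	}
	if sourceFile == "" {
		return "" // no cached JSON for this file
	}
	if filepath.Clean(sourceFile) == filepath.Clean(currentPath) {
		return "" // JSON was regenerated from the original itself; import name is gone
	}
	return filepath.Base(sourceFile)
}

func main() {
	cur := "/photoprism/originals/2013/10/20131012_155750_4286DBCD.cr2"
	fmt.Println(restoreOriginalName("", "/photoprism/import/6D/2013/10/12/IMG_5023.CR2", cur))
	fmt.Println(restoreOriginalName("", cur, cur) == "")
}
```

The first call yields `IMG_5023.CR2`, which is what the re-index in the next comment stored; the second shows that a regenerated JSON yields nothing to restore.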

@joachimtingvold
Author

joachimtingvold commented Jan 9, 2022

MariaDB [photoprism]> select photo_uid, file_name, file_hash, original_name from files where photo_uid='pr5404t2mayh2gc3';
+------------------+------------------------------------------+------------------------------------------+---------------+
| photo_uid        | file_name                                | file_hash                                | original_name |
+------------------+------------------------------------------+------------------------------------------+---------------+
| pr5404t2mayh2gc3 | 2013/10/20131012_155750_4286DBCD.cr2     | 54ef3374bd601168e9f107b90f264acf8085cceb | IMG_5023.CR2  |
| pr5404t2mayh2gc3 | 2013/10/20131012_155750_4286DBCD.cr2.jpg | e6287f37c9fd942957098cd4d0e99b1e62ba2617 |               |
+------------------+------------------------------------------+------------------------------------------+---------------+
2 rows in set (0.001 sec)

root@photoprism1:/storage/photoprism# grep SourceFile storage/cache/json/5/4/e/54ef3374bd601168e9f107b90f264acf8085cceb_exiftool.json
  "SourceFile": "/photoprism/import/6D/2013/10/12/IMG_5023.CR2",

It seems to work partially. I'd expect the value of original_name to be 6D/2013/10/12/IMG_5023.CR2, not IMG_5023.CR2 as it is now.

If SourceFile could, in theory, have different root paths, which would make it hard to "remove" the leading part of the full path (i.e. should it remove the first two or first three directories? Or just the first? Or four?), I'd rather have the whole path than just the filename. The most relevant part here would be the path starting at the first subdirectory below the import folder (in this case 6D), as that could be used to filter/find specific photos (especially when migrating from other photo solutions).

@lastzero
Member

lastzero commented Jan 9, 2022

I decided against using the relative path, as the base path is not known at indexing time, and absolute paths should be avoided for many reasons. Note that the relative path at import time depends on which base directory you have chosen in the dropdown. On the command line, you can even import folders outside the import path.

@joachimtingvold
Author

joachimtingvold commented Jan 9, 2022

Is original_name used for anything besides "extra info"? If not, the full path should not be a problem, should it? In my case, the whole point of keeping the original path would be to use that information after import/index (e.g. for filtering). That is, original_name=IMG_5023.CR2 is just as "useless" as original_name='', at least for my use case.

@lastzero
Member

lastzero commented Jan 9, 2022

Just for reference / debugging from our side.

@joachimtingvold
Author

Since you can filter on it with the original: search filter, I'd say "keep as much info as possible". The full path (rather than the relative path) would only be used in the few edge cases where the initial index run (called from import) fails.

@lastzero
Member

lastzero commented Jan 9, 2022

I get that, but you have a specific use case with specific assumptions. We can't make it work like that for everyone, for the reasons outlined above, at least not before thinking all possibilities through. I don't want to expose any server internals, as path names may contain customer or user IDs. Security first.

@joachimtingvold
Author

joachimtingvold commented Jan 9, 2022

> I get that, but you have a specific use case with specific assumptions.

I'd say the whole point of original_name is to track the origin/original filename at a later point. I don't think that's a "specific assumption" on my part, to be honest.

If security is the primary reason the path is not included, then maybe the import (which creates the JSON) should store the import path in a separate field in the JSON, so that it can be taken into account in this scenario. For example, ImportPath=/photoprism/import in my case above; it could then be stripped from SourceFile if original_name has to be derived from the JSON during indexing.
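The proposal could be sketched in Go as follows. `ImportPath` is the hypothetical JSON field suggested in this comment (ExifTool does not write it), and `relativeOriginalName` is a hypothetical helper. It strips the import path with `filepath.Rel` and falls back to the base name when the source lies outside that path (which, per the maintainer, can happen with command-line imports), so absolute server paths never end up in original_name.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// relativeOriginalName strips a hypothetical ImportPath from SourceFile.
// If sourceFile is not inside importPath (or importPath is unknown), it falls
// back to the base name, so no server-side directory names leak into original_name.
func relativeOriginalName(importPath, sourceFile string) string {
	if importPath == "" {
		return filepath.Base(sourceFile)
	}
	rel, err := filepath.Rel(importPath, sourceFile)
	if err != nil || rel == "." || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return filepath.Base(sourceFile)
	}
	return filepath.ToSlash(rel)
}

func main() {
	src := "/photoprism/import/6D/2013/10/12/IMG_5023.CR2"
	fmt.Println(relativeOriginalName("/photoprism/import", src))                          // 6D/2013/10/12/IMG_5023.CR2
	fmt.Println(relativeOriginalName("", src))                                            // IMG_5023.CR2
	fmt.Println(relativeOriginalName("/photoprism/import", "/home/user/Pictures/IMG_1.JPG")) // IMG_1.JPG
}
```

Falling back to the base name keeps today's behaviour whenever the import path is missing or does not contain the file.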

@lastzero
Member

I forgot to mention that PhotoPrism can use original_name to extract a title or timestamp. But the path is often not required unless the date is encoded in a form like YEAR/MONTH/DAY_foobar.jpg.
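To illustrate when the path does carry a date, here is a Go sketch that recovers one from names shaped like YEAR/MONTH/DAY_foobar.jpg. The regular expression and `dateFromPath` are assumptions for illustration, not PhotoPrism's actual filename parser; invalid dates such as month 13 are rejected by `time.Parse`.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// pathDate matches .../YYYY/MM/DD_... or .../YYYY/MM/DD/... style paths.
var pathDate = regexp.MustCompile(`(?:^|/)(\d{4})/(\d{2})/(\d{2})(?:[_/]|$)`)

// dateFromPath returns the date encoded in the directory structure, if any.
func dateFromPath(p string) (time.Time, bool) {
	m := pathDate.FindStringSubmatch(p)
	if m == nil {
		return time.Time{}, false
	}
	t, err := time.Parse("2006-01-02", m[1]+"-"+m[2]+"-"+m[3])
	if err != nil {
		return time.Time{}, false // e.g. month 13 or day 32
	}
	return t, true
}

func main() {
	for _, p := range []string{"2013/10/12_foobar.jpg", "6D/2013/10/12/IMG_5023.CR2", "IMG_5023.CR2"} {
		if t, ok := dateFromPath(p); ok {
			fmt.Println(p, "->", t.Format("2006-01-02"))
		} else {
			fmt.Println(p, "-> no date")
		}
	}
}
```

For the three sample names it prints 2013-10-12 twice and "no date" for the bare IMG_5023.CR2, which is why a base name alone cannot supply such a date.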

Since we have other issues to work on for a while, there is nothing I can do in the short term. Within the available time box, it didn't seem completely safe to include the path, and it could also lead to wrong metadata if the full file path reflects the backend infrastructure rather than anything about the photo itself. These cases need to be tested properly before we can move forward.

Projects
Status: Release 🌈
Development

No branches or pull requests

3 participants