Metadata: Automatically sanitize bad Unicode strings #2897
Comments
The software column supports Unicode, and its validity is verified by MariaDB. PhotoPrism also does some sanitization, but not with a focus on Unicode-specific constraints, so invalid Unicode in the metadata can cause an issue. This is the first time someone has reported this, though. Feel free to suggest improvements, e.g. via pull request. See https://github.com/photoprism/photoprism/blob/develop/internal/entity/details.go
Signed-off-by: Michael Mayer <michael@photoprism.app>
Alright, I fixed the error by making sure that all strings extracted from metadata are valid Unicode. You are welcome to test this with the upcoming preview build!
An updated preview build will be available for testing soon. We hope you have a few minutes to let us know if it works so we can release the update tomorrow!
photoprism.details.software
Happy testing! 🎁
Thanks for the quick fix! Tested and can confirm the issue is fixed.
1. What is not working as documented?
After a scratch indexing run on a fresh instance (sidecar/cache dirs/DB dir empty) I get these errors in the logs:
This is one occurrence, I have about 15 (indexing 60k pictures).
2. How can we reproduce it?
Index the attached image.
3. What behavior do you expect?
No error in the logs.
4. What could be the cause of your problem?
I took a look at the picture from the log above.
Looking at the error message, I suppose the problematic info from EXIF is "Software": "ACD Systems ????????". I'm not sure whether the question marks are contained verbatim in EXIF, or just indicate some garbage data that is perhaps then stored directly into MariaDB as is. This happens for pictures related to China, which may have passed through some software that stores such strings there. At least the bytes in the error message are not valid UTF-8.
Is my understanding correct that this column, details.software, should contain UTF-8? If so, does PhotoPrism or MariaDB ensure the validity?
5. Can you provide us with example files for testing, error logs, or screenshots?
Attached
6. Which software versions do you use?
(a) AMD64, PhotoPrism® CE Build 221105-7a295cab4
(b) MariaDB, stock settings from docker-compose.yml, i.e.
(c) Linux
7. On what kind of device is PhotoPrism installed?
Shouldn't matter but:
(This is a pretty low-powered system, 10+ years old, indexing took a couple of days, 5k videos in addition to the pictures, but no other issues)