Umlaut in Title #358
An umlaut in the EXIF title written by Lightroom causes an error:
I do not know whether the image uploaded below preserves the EXIF data, so a copy for direct download is on
It seems that your IPTC data is encoded as iso8859-1:
IPTC has a CodedCharacterSet tag that should give the encoding for the file (drewnoakes/metadata-extractor#12, https://stackoverflow.com/questions/15003031/how-to-properly-write-utf8-iptc-metadata-with-python-library-iptcinfo) but I cannot see this tag in your file.
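For illustration, a minimal sketch of how the CodedCharacterSet tag could be honoured when it is present. It assumes the IPTC data is available as a dict keyed by `(record, dataset)` tuples with `bytes` values, which is the shape Pillow's `IptcImagePlugin.getiptcinfo` is understood to return. The helper names and the `iso8859-1` fallback are hypothetical, chosen to match this particular file, and are not part of any library.

```python
from typing import Dict, Optional, Tuple

# IPTC envelope record 1, dataset 90 is CodedCharacterSet.
CODED_CHARACTER_SET = (1, 90)
# ISO 2022 escape sequence "ESC % G", which announces UTF-8.
UTF8_ESCAPE = b"\x1b%G"
# IPTC application record 2, dataset 5 is the object name (title).
OBJECT_NAME = (2, 5)


def detect_iptc_charset(iptc: Dict[Tuple[int, int], bytes]) -> Optional[str]:
    """Return 'utf-8' if the file declares it, otherwise None."""
    if iptc.get(CODED_CHARACTER_SET) == UTF8_ESCAPE:
        return "utf-8"
    return None


def decode_title(iptc: Dict[Tuple[int, int], bytes],
                 fallback: str = "iso8859-1") -> str:
    """Decode the title with the declared charset, else with the fallback.

    The fallback is an assumption: files without the tag may use any
    legacy encoding, so errors are replaced rather than raised.
    """
    raw = iptc.get(OBJECT_NAME, b"")
    encoding = detect_iptc_charset(iptc) or fallback
    return raw.decode(encoding, errors="replace")
```

With a file like the one in this issue, where the tag is absent, `decode_title` would fall through to the fallback encoding.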
I also discovered and tried iptcinfo3, which seems to handle this encoding tag but is not able to decode the metadata in your file:
So one easy way to fix the issue would be to decode the tags less strictly. We could also think about switching to iptcinfo if it does a better job than Pillow.
added a commit
Dec 28, 2018
I agree, and I found the source of the problem: these fields were written by an old version of Lightroom, which switched to consistent UTF-8 only in a later version. After rewriting these fields with Lightroom 5, everything went smoothly.
I have also tried thumbsup, which handled both versions without problems; as far as I can see, ExifTool decodes the non-strict way.
I am not sure it is safe to always assume UTF-8, as the encoding seems to depend on the filesystem encoding when the file is written. For now I have added a change that replaces encoding errors and avoids the crash; we can revisit this later if better handling of IPTC encoding is needed.
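The idea of replacing encoding errors can be sketched as follows. This is only an illustration of non-strict decoding with Python's built-in `bytes.decode`, not the actual commit; the function name `decode_tag` is hypothetical.

```python
def decode_tag(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode a metadata value without raising on invalid bytes.

    Bytes that are not valid in `encoding` become U+FFFD (the Unicode
    replacement character) instead of raising UnicodeDecodeError.
    """
    return raw.decode(encoding, errors="replace")


# A title written as iso8859-1 but read as UTF-8:
latin1_title = "Müller".encode("iso8859-1")   # b'M\xfcller'
print(decode_tag(latin1_title))                # the umlaut is lost, but no crash
print(decode_tag("Müller".encode("utf-8")))    # valid UTF-8 is unaffected
```

The trade-off is that a wrongly encoded character is shown as a replacement mark rather than recovered, which is why a proper charset lookup may still be worth revisiting.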