Metadata: Embed XMP metadata in JPEG files #243

Open
tmb80c opened this issue Feb 5, 2020 · 22 comments
Assignees
Labels
enhancement Refactoring, improvement or maintenance task needs-analysis Requires further investigation priority Supported by early sponsors or popular demand

Comments


tmb80c commented Feb 5, 2020

Hi,

I came across this exciting software. I'm not sure whether the following is by design, a missing feature, or a bug.

I uploaded some pictures (with and without a sidecar file). The pictures were exported from Lightroom and already included a picture name/title and some keywords. After the import, I could see the EXIF data in PhotoPrism, but not the title or keywords from the picture file. I think title and keywords are not EXIF data but other metadata, which I can see under Details in Windows File Explorer. It looks like PhotoPrism imports only EXIF data into its database, not the title and keywords, although the imported picture file still includes them.

For example, one picture was a macro of a ladybug. The picture name/title (not the filename) was "ladybug", and the picture description included two keywords, "ladybug" and "Insect". Indexing identified it correctly as a beetle with 96% confidence. Cool, great work! After indexing, the title was set to "Beetle", and under labels I could only see "Beetle". I would have expected the import to use the picture name as the PhotoPrism title, and to list the imported keywords under labels as "manual" or "imported" alongside "Beetle" as the result of indexing.

Another user expectation would be that PhotoPrism uses the keywords from the imported picture to eliminate false positives. That is, if the file already includes a keyword that matches a label or category in PhotoPrism, this information should help the indexing, in particular when the confidence is low.

If the above cannot be implemented, then reindexing should at least not override manually edited labels and titles.

This is my first contribution on GitHub.
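
One way to picture the behaviour requested above is to record, next to each title, which source set it, and to let only a source of equal or higher priority replace it. The sketch below is purely illustrative: the `Source` type, its ordering, and `SetTitle` are hypothetical names chosen for this example, not PhotoPrism's actual data model.

```go
package main

import "fmt"

// Source records where a value came from. Higher values win.
// The names and their ordering are assumptions made for this example.
type Source int

const (
	SourceAuto   Source = iota // set by image classification
	SourceMeta                 // imported from embedded Exif/XMP metadata
	SourceManual               // edited by the user
)

// Photo holds a title together with the source that last set it.
type Photo struct {
	Title       string
	TitleSource Source
}

// SetTitle updates the title only if src has at least the priority of the
// source that set the current title. It reports whether the title changed.
func (p *Photo) SetTitle(title string, src Source) bool {
	if title == "" || src < p.TitleSource {
		return false
	}
	p.Title, p.TitleSource = title, src
	return true
}

func main() {
	p := &Photo{}
	p.SetTitle("Beetle", SourceAuto)  // first indexing run
	p.SetTitle("Ladybug", SourceMeta) // title found in the picture file
	p.SetTitle("Beetle", SourceAuto)  // reindexing is rejected
	fmt.Println(p.Title)              // prints "Ladybug"
}
```

The same rule could be applied per label, so that keywords imported from the file sit next to classifier results without being overwritten by them.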

Member

lastzero commented Feb 5, 2020

Thanks for your feedback! This absolutely makes sense. We import information from EXIF and, to a certain extent, also from XMP. It would be good to know which fields are actually used (could be "description", but XMP also has a DC title field that Lightroom uses). We should also add a "keywords" field to our database (we need to figure out which EXIF/XMP field this is as well).

@lastzero lastzero self-assigned this Feb 5, 2020
@lastzero lastzero added the enhancement Refactoring, improvement or maintenance task label Feb 5, 2020
@lastzero lastzero added this to the MVP milestone Feb 5, 2020
Author

tmb80c commented Feb 5, 2020

I loaded the picture of the ladybug into the tool "Get IPTC Photo Metadata" from the IPTC organisation. The relevant XMP fields are "title" and "keywords". If the title could be loaded into PhotoPrism's title field, that would be of great help. At the moment, reindexing overwrites the title, which is not what the user wants once the title has been manually updated.

With the keywords used as labels, I assume you can improve the tagging a lot. The system could learn from imported pictures.

In another thread I saw a discussion regarding face recognition. Let's assume the keyword field already includes the name of the person...

Member

lastzero commented Feb 5, 2020

Not difficult to implement, let's do this. Also need the keywords field for words extracted from file names so that the user can see and edit them. Thanks for the test files!

Member

lastzero commented Feb 7, 2020

We currently don't index Exif.Image.XPTitle and other fields starting with XP, probably because this is not included in the base standard @dsoprea?
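
For context, the XP* tags (XPTitle, XPComment, XPAuthor, XPKeywords, XPSubject) are Windows-specific extensions that store text as a byte array of UTF-16LE code units, usually with a trailing NUL, rather than as ASCII. Once the raw bytes are available from the Exif parser, decoding them could look roughly like the sketch below. The helper names `decodeXPString` and `splitKeywords` are made up for this example, and the semicolon split assumes the way Windows Explorer usually writes multiple keywords.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// decodeXPString converts the raw UTF-16LE bytes of an XP* tag to a Go
// string and strips trailing NUL terminators. An odd trailing byte is ignored.
func decodeXPString(raw []byte) string {
	units := make([]uint16, 0, len(raw)/2)
	for i := 0; i+1 < len(raw); i += 2 {
		// Little endian: low byte first, high byte second.
		units = append(units, uint16(raw[i])|uint16(raw[i+1])<<8)
	}
	return strings.TrimRight(string(utf16.Decode(units)), "\x00")
}

// splitKeywords splits a semicolon-separated keyword string and drops
// empty entries and surrounding whitespace.
func splitKeywords(s string) []string {
	var out []string
	for _, k := range strings.Split(s, ";") {
		if k = strings.TrimSpace(k); k != "" {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	// "Ladybug;Insect" encoded as UTF-16LE with a NUL terminator.
	raw := []byte{'L', 0, 'a', 0, 'd', 0, 'y', 0, 'b', 0, 'u', 0, 'g', 0,
		';', 0, 'I', 0, 'n', 0, 's', 0, 'e', 0, 'c', 0, 't', 0, 0, 0}
	fmt.Println(splitKeywords(decodeXPString(raw))) // prints [Ladybug Insect]
}
```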

[Screenshot 2020-02-07 at 14 37 04]

lastzero added a commit that referenced this issue Feb 7, 2020
Added: Subject, Keywords, Comment, CameraOwner and CameraSerial

Todo: Read values from Exif.Image.XPTitle, XPSubject, XPKeywords,...
Signed-off-by: Michael Mayer <michael@liquidbytes.net>
Member

lastzero commented Feb 7, 2020

Added a test for this. Hope it was OK to use the Ladybug as example image!

@lastzero lastzero added the in-progress Somebody is working on this label Feb 7, 2020
lastzero added a commit that referenced this issue Feb 7, 2020
Signed-off-by: Michael Mayer <michael@liquidbytes.net>
lastzero added a commit that referenced this issue Feb 7, 2020
Signed-off-by: Michael Mayer <michael@liquidbytes.net>
Member

lastzero commented Feb 7, 2020

Need to get ready for our journey now, hope this will do for now. Code is prepared to index additional Exif fields once we get them from our meta package. Hope @dsoprea can help with that.

Author

tmb80c commented Feb 7, 2020

> Added a test for this. Hope it was OK to use the Ladybug as example image!

Sure, please!

Collaborator

dsoprea commented Feb 7, 2020

How does the EXIF parsing depend on the 'meta' package: "is prepared to index additional Exif fields once we get them from our meta package."? Is it some kind of dynamic binding defined in the DB?

Collaborator

dsoprea commented Feb 7, 2020

Non-standard stuff can be indexed. We preload the standard tags at the top of the process, but can readily add more.

Member

lastzero commented Feb 8, 2020

> How does the EXIF parsing depend on the 'meta' package: "is prepared to index additional Exif fields once we get them from our meta package."? Is it some kind of dynamic binding defined in the DB?

We parse it there using your Exif library so that our indexer can read from the Data struct, regardless of where the data came from.

Member

lastzero commented Feb 8, 2020

> Non-standard stuff can be indexed. We preload the standard tags at the top of the process, but can readily add more.

That's what I thought, but I failed to figure out how yesterday... Do you have an example, or can you send a PR for our meta package?

Collaborator

dsoprea commented Feb 8, 2020

#243 (comment)

I was in the car at the time. I had forgotten that this was the name of the package that hosts go-exif.

#243 (comment)

It'd be here:

You'd insert something like:

// If nothing is loaded, this will be implicitly loaded at first access. 
// However, since we're about to intervene and add one, we'll become responsible 
// for loading the whole set.
exif.LoadStandardTags(ti)

it := &exif.IndexedTag{
    // The IFD that it is found in.
    IfdPath: exif.IfdPathStandardExif,

    // Its ID.
    Id: 0x1234,

    // A human-friendly name.
    Name: "SomeName",

    // The type of the data.
    Type: exifcommon.TypeShort,
}

err = ti.Add(it)
if err != nil {
    log.Errorf("exif: %s", err.Error())
    return nil
}

Let me know if you want me to help.

Collaborator

dsoprea commented Feb 8, 2020

Note that for the snippet above you'll have to import github.com/dsoprea/go-exif/v2/common (referred to there as exifcommon).

Member

lastzero commented Feb 8, 2020

Thank you! I'm on vacation for a week, pull requests welcome. I'll see what I can do while on the train. Already added a test image.

Member

lastzero commented Feb 8, 2020

Wow, looks like Adobe somehow managed to add XMP / Dublin Core data to the JPEG without using a sidecar file. That's why our Exif parser doesn't find it!

<rdf:Description rdf:about=''
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <dc:creator>
   <rdf:Seq>
    <rdf:li>Photographer: TMB</rdf:li>
   </rdf:Seq>
  </dc:creator>
  <dc:format>image/jpeg</dc:format>
  <dc:subject>
   <rdf:Bag>
    <rdf:li>Ladybug</rdf:li>
   </rdf:Bag>
  </dc:subject>
  <dc:title>
   <rdf:Alt>
    <rdf:li xml:lang='x-default'>Ladybug</rdf:li>
   </rdf:Alt>
  </dc:title>
</rdf:Description>

So what we need here is the XMP support we started working on plus a way to extract this data from a JPEG. In Exif, ImageDescription is the right field to store the title of an image. There is no Title field.
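
Once the XMP packet has been extracted from the JPEG, reading the Dublin Core title and keywords shown above could be sketched with the standard library alone. This is a minimal illustration, not PhotoPrism's XMP code: the type and function names are invented here, elements are matched by local name only (namespace prefixes are ignored), the packet is assumed to have an `x:xmpmeta` root, and the first `rdf:li` of `dc:title` is taken without checking `xml:lang`.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// xmpMeta mirrors only the parts of an XMP packet used in this example.
type xmpMeta struct {
	Descriptions []dublinCore `xml:"RDF>Description"`
}

// dublinCore collects the rdf:li values of a few Dublin Core properties.
type dublinCore struct {
	Title   []string `xml:"title>Alt>li"`
	Subject []string `xml:"subject>Bag>li"`
	Creator []string `xml:"creator>Seq>li"`
}

// sample is a reduced packet modelled on the fragment quoted above.
const sample = `<x:xmpmeta xmlns:x='adobe:ns:meta/'>
 <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <rdf:Description rdf:about='' xmlns:dc='http://purl.org/dc/elements/1.1/'>
   <dc:subject><rdf:Bag><rdf:li>Ladybug</rdf:li></rdf:Bag></dc:subject>
   <dc:title><rdf:Alt><rdf:li xml:lang='x-default'>Ladybug</rdf:li></rdf:Alt></dc:title>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>`

// parseDublinCore returns the first dc:title value and all dc:subject
// values found in any rdf:Description of the packet.
func parseDublinCore(packet []byte) (title string, keywords []string, err error) {
	var meta xmpMeta
	if err = xml.Unmarshal(packet, &meta); err != nil {
		return "", nil, err
	}
	for _, d := range meta.Descriptions {
		if title == "" && len(d.Title) > 0 {
			title = d.Title[0]
		}
		keywords = append(keywords, d.Subject...)
	}
	return title, keywords, nil
}

func main() {
	title, keywords, err := parseDublinCore([]byte(sample))
	if err != nil {
		fmt.Println("xmp:", err)
		return
	}
	fmt.Println(title, keywords)
}
```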

Member

lastzero commented Feb 8, 2020

For embedding XMP metadata in JPEG files, see https://wwwimages2.adobe.com/content/dam/acom/en/devnet/xmp/pdfs/XMP%20SDK%20Release%20cc-2016-08/XMPSpecificationPart3.pdf

[Screenshot 2020-02-08 at 12 43 06]

@dsoprea Any idea how we can implement this elegantly?

Member

lastzero commented Feb 8, 2020

@tmb80c Try using an XMP sidecar file instead for now. Is this possible?

Member

lastzero commented Feb 8, 2020

Apparently go-xmp has a method xmp.ScanPackets(io.Reader) which we could try, see trimmer-io/go-xmp#1
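
To illustrate what packet scanning means in general, independent of how go-xmp implements it: an XMP packet is wrapped in `<?xpacket begin=... ?>` and `<?xpacket end=... ?>` processing instructions, so it can be located by searching the raw file bytes without understanding the container format. The sketch below is a hand-rolled stand-in using only the standard library; `scanPacket` is a hypothetical name, it returns only the first packet, and it assumes the packet is UTF-8 encoded.

```go
package main

import (
	"bytes"
	"fmt"
)

var (
	packetBegin = []byte("<?xpacket begin=")
	packetEnd   = []byte("<?xpacket end=")
	piClose     = []byte("?>")
)

// scanPacket returns the first XMP packet found in data, including its
// xpacket wrapper instructions, or nil if no complete packet is found.
func scanPacket(data []byte) []byte {
	start := bytes.Index(data, packetBegin)
	if start < 0 {
		return nil
	}
	rel := bytes.Index(data[start:], packetEnd)
	if rel < 0 {
		return nil
	}
	end := start + rel
	// Find the "?>" that closes the trailing xpacket instruction.
	stop := bytes.Index(data[end:], piClose)
	if stop < 0 {
		return nil
	}
	return data[start : end+stop+len(piClose)]
}

func main() {
	file := []byte("\xff\xd8 binary data " +
		"<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?><x:xmpmeta/><?xpacket end='w'?>" +
		" more binary data")
	fmt.Printf("%s\n", scanPacket(file))
}
```

Scanning is format-agnostic but blind; walking the JPEG segments is the more precise alternative when the container is known.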

Collaborator

dsoprea commented Feb 8, 2020

@lastzero

There seems to be a lot of JPEG talk for being an XMP document. The XMP data is just in another segment? What is it that you're concerned won't be done well/elegantly? Seems like it would just be a simple enumeration of the JPEG segments (which can be done via go-jpeg-image-structure) and to just scan/grab/test the one specified in the text above, no?
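
The enumeration described above can be sketched without any third-party package. In a JPEG, the main XMP packet lives in an APP1 segment (marker 0xFFE1) whose payload starts with the identifier `http://ns.adobe.com/xap/1.0/` followed by a NUL byte, while Exif uses APP1 with the identifier `Exif`. The code below is a simplified illustration rather than go-jpeg-image-structure's API: `extractXMP` and `segment` are hypothetical helpers, parsing stops at the first SOS or EOI marker, and Extended XMP spread over several segments is not handled.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// xmpHeader is the identifier that prefixes an XMP packet in an APP1 segment.
const xmpHeader = "http://ns.adobe.com/xap/1.0/\x00"

var errNoXMP = errors.New("no XMP APP1 segment found")

// extractXMP walks the marker segments that precede the image data and
// returns the payload of the first APP1 segment carrying the XMP identifier.
func extractXMP(jpeg []byte) ([]byte, error) {
	if len(jpeg) < 4 || jpeg[0] != 0xFF || jpeg[1] != 0xD8 {
		return nil, errors.New("not a JPEG: missing SOI marker")
	}
	pos := 2
	for pos+4 <= len(jpeg) {
		if jpeg[pos] != 0xFF {
			return nil, fmt.Errorf("expected marker at offset %d", pos)
		}
		marker := jpeg[pos+1]
		if marker == 0xFF { // fill byte before the actual marker
			pos++
			continue
		}
		if marker == 0xDA || marker == 0xD9 { // SOS or EOI: no more metadata
			break
		}
		// The 16-bit big-endian length includes its own two bytes.
		size := int(binary.BigEndian.Uint16(jpeg[pos+2 : pos+4]))
		if size < 2 || pos+2+size > len(jpeg) {
			return nil, errors.New("truncated segment")
		}
		payload := jpeg[pos+4 : pos+2+size]
		if marker == 0xE1 && bytes.HasPrefix(payload, []byte(xmpHeader)) {
			return payload[len(xmpHeader):], nil
		}
		pos += 2 + size
	}
	return nil, errNoXMP
}

// segment builds a marker segment for the synthetic test file below.
func segment(marker byte, payload []byte) []byte {
	size := len(payload) + 2
	return append([]byte{0xFF, marker, byte(size >> 8), byte(size)}, payload...)
}

func main() {
	// A synthetic file: SOI, an Exif APP1, an XMP APP1, EOI.
	file := []byte{0xFF, 0xD8}
	file = append(file, segment(0xE1, []byte("Exif\x00\x00..."))...)
	file = append(file, segment(0xE1, []byte(xmpHeader+"<x:xmpmeta/>"))...)
	file = append(file, 0xFF, 0xD9)

	xmp, err := extractXMP(file)
	fmt.Printf("%s %v\n", xmp, err)
}
```

The returned bytes are the packet itself, which can then be handed to whatever XMP parser is in use.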

Author

tmb80c commented Feb 8, 2020

> @tmb80c Try using an XMP sidecar file instead for now. Is this possible?

I'm not using sidecar files in Lightroom.

Member

lastzero commented Feb 8, 2020

> @lastzero
>
> There seems to be a lot of JPEG talk for being an XMP document. The XMP data is just in another segment? What is it that you're concerned won't be done well/elegantly? Seems like it would just be a simple enumeration of the JPEG segments (which can be done via go-jpeg-image-structure) and to just scan/grab/test the one specified in the text above, no?

Possible, I didn't try. I'm on vacation and on a train, so that's as far as I got... By go-jpeg, do you mean the built-in JPEG lib that comes with Go? You're probably doing something similar to get the Exif data.

Collaborator

dsoprea commented Feb 8, 2020

go-jpeg-image-structure is my project, which is used from PhotoPrism to parse JPEGs and extract EXIF (there are convenience functions for that). I'll try to do it in the next couple of days. I'm fairly busy at the moment.

@lastzero lastzero changed the title from "Import does not read keywords nor title of an uploaded picture" to "Support for embedding XMP metadata in JPEG files" May 7, 2020
@lastzero lastzero added important and removed in-progress Somebody is working on this labels May 7, 2020
@lastzero lastzero removed this from the MVP milestone May 7, 2020
lastzero added a commit that referenced this issue May 13, 2020
Signed-off-by: Michael Mayer <michael@liquidbytes.net>
lastzero added a commit that referenced this issue May 13, 2020
Signed-off-by: Michael Mayer <michael@liquidbytes.net>
lastzero added a commit that referenced this issue May 13, 2020
Signed-off-by: Michael Mayer <michael@liquidbytes.net>
lastzero added a commit that referenced this issue May 13, 2020
Signed-off-by: Michael Mayer <michael@liquidbytes.net>
@graciousgrey graciousgrey changed the title from "Support for embedding XMP metadata in JPEG files" to "Metadata / Embed XMP metadata in JPEG files" Nov 26, 2020
@graciousgrey graciousgrey changed the title from "Metadata / Embed XMP metadata in JPEG files" to "Metadata: Embed XMP metadata in JPEG files" Jan 5, 2021
@graciousgrey graciousgrey added the needs-analysis Requires further investigation label Nov 3, 2021
@lastzero lastzero added priority Supported by early sponsors or popular demand and removed important labels Dec 14, 2023
Projects
Status: Ideas 💭
Development

No branches or pull requests

4 participants