Skip to content
This repository has been archived by the owner on Nov 1, 2020. It is now read-only.

Use EXIF taken date as sorting date for photos [$200] #40

Closed
1 task
oparoz opened this issue Sep 4, 2016 · 33 comments
Closed
1 task

Use EXIF taken date as sorting date for photos [$200] #40

oparoz opened this issue Sep 4, 2016 · 33 comments
Labels
bounty This issue have a bounty design Related to the design enhancement New feature or request

Comments

@oparoz
Copy link
Member

oparoz commented Sep 4, 2016

From @jancborchardt on July 20, 2015 17:2

Use-case: Doing a trip with friends and multiple people take photos. Afterwards when you move them all in the same folder from the different devices, the sorting is off. It’s not based off the EXIF date but instead the modified date.

We should sort photos where there is an EXIF date present by that date. @oparoz

Ref owncloud/gallery#101

Pre-requirements

  • Collect EXIF/XMP/IPTC data (#550)

There is a $200 open bounty on this issue. Add to the bounty at Bountysource.

Copied from original issue: owncloud/gallery#221

@oparoz oparoz added blocked by task bounty This issue have a bounty design Related to the design enhancement New feature or request labels Sep 4, 2016
@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

Agreed, but for this to work well, data should be extracted and indexed in the DB at upload time, otherwise it will slow down all operations since each file would have to be opened before we're able to get the list of files.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @jancborchardt on July 20, 2015 17:26

Maybe this is something for core then? The sorting should be right in the Files app already as well. cc @schiesbn @DeepDiver1975

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

Yes, I think the upload mechanism could do with adding more data to the FileInfo object attached to each file (dimensions, exif data object), but I'm not sure the Files app would benefit from it, even in thumbnails view. That kind of sorting works best for media files.

Also, I want to add the possibility of letting users add the missing data to other files so that they can mix PSD and JPEG files per example, but I'm not sure it's something people are going to do often (both adding data and mixing content).

A first step could be to introduce another of those settings which lets users who need the feature enable it despite the slowdowns. That way, it wouldn't affect the app for all the other users.
The main downside is that there is no easy way of caching the information between sessions...

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @patman15 on September 21, 2015 19:55

I just made in one evening a prototype implementation on v14.0.0 that adds the exif time to the file properties and allows sorting. Since I'm new to owncloud and galleryplus, I'm pretty sure it's not the best, but it works. Also I read using the file system might slow things down. Nevertheless, if there is interest, I could provide the patch.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @jancborchardt on September 22, 2015 7:58

@patman15 hey, cool! :) Of course there’s interest – it would be very nice if you can open a pull request here with the patch.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

@patman15 - Thanks for your work :). Please submit a PR and we'll know more about what you've done.

The actual EXIF parsing code is not really the problem here, but the fact that not all images contain that information and that if we don't store it, then we need to parse the files every time and it will be very slow for external storages.

Without a new sorting button (!), the sort by date button will mix together different types of information. We need to think about the best solution for users here. The behaviour needs to be predictable.
Let's list available options and outcomes.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @patman15 on September 22, 2015 14:21

Ok, sure I'll submit a PR soon.

Button is drawn and added, logic is enhanced from two to three sort types. I'm still thinking about how to handle files without EXIF. Either use the modification time as fallback for those cases or for the complete album or disable the button if not all files are with EXIF.

Regarding speed: I read about the issue before I started. Since I'm new to OC programming I couldn't figure out yet where file (content) is actually accessed. Thumbnail generation might be one good point to add the parsing. Currently, I put it into the searchmediaserivce.php and there is still an enhancement by switching to stream access. Nevertheless, yes, it loads every time you open the folder.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

Thumbnail generation might be one good point to add the parsing

Yes and that's done in core. There are 2 alternatives:

I'm in favour of the 2nd option as it offers much more flexibility and zero dependency on core.

Then in Gallery, we could have an ExifService (or use the one in the Exif app even if it's frown upon) which would collect the needed information.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @patman15 on September 23, 2015 12:30

@oparoz So, PR is there, if you want to have a look. I still have to fix a few issues to increase quality.

I just tested performance on a directory with only jpg images (worst case as other mime types do not impact performance). On the directory containing 1429 images my (very small home) server took 3.9s for scanning while the patch increases the time to 6.4s (+64%). I still have a few performance increasing options available, but it won't help too much. Nevertheless, I'm quite happy with it for my personal use case.

If you could give me some more infos on how your second option should look like or whether I should try to extend the core, I would be happy to invest some more time. Just let me know if I should finalize the current patch as an intermediate solution or directly focus on a more mature solution.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

Regarding performance, it would be good to time a few test cases, so that we can use it in the documentation, but that's only relevant if we don't store the information in the DB, so let's look at that first :)


Exif app

Requirements

  • Generic EXIF parser, not just limited to date collection
  • Stores the information in its own table(s)
  • Adds the information at upload time (GUI, WebDAV)
  • CRUD operations

Implementation

  • AppFramework app created with ocdev
  • Don't trust the documentation on hooks, it's bogus

I can definitely help you with setting things up. We just have to think hard to decide if an Exif apps would be useful to others, because otherwise, there is little point in keeping the functionality outside of the app.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @patman15 on September 23, 2015 20:57

I checked for apps that could use it as well. I only found

So basically put photos on a map or different galleries. Also when thinking about which apps (e.g. on Android) use EXIF it is mostly for rotating or displaying on a map (which could be part of gallery as well) Not sure it's worth the effort to make a separate app in the first run, but for sure you have more experience. My guess is to go for a service and if it's really demanded by/interesting for others we could still move it to a separate app.

What do you think?

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

I was about to ask on the mailing list, but I agree. If there is demand for it, it will be straightforward to re-package the service. People will be looking for a migration path and you can decide then if you want to offer/sell one.
There is an issue open for showing pictures on a map. That can be done by invoking the maps app or by using a layer provided by a maps API. In any cases, it should be easy to implement then, once we have the pins' coordinates.

So, let's talk about architecture. I think it would be good to implement this as a part of owncloud/gallery#101

So maybe we should have a MetaDataService and a folder (like preview or config) containing various parsers/writers and we would start with the ExifParser Class?
Feel free to come up with your own proposal :)

The database classes are described in #101 and you can use Activity as an example of how to use hooks.
https://github.com/owncloud/activity/blob/master/lib/fileshooksstatic.php#L36-L38

It creates untestable code unfortunately, but that would be the quickest way to get started and we can improve on that later.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @patman15 on September 24, 2015 9:0

I did a bit of research how EXIF information is handled in other viewers. The most general approach -- which is used in some famous viewers -- is to sort files with EXIF and files without in separate blocks. Then there is sometimes an option to use modification date as fall back (default disabled). I like this quite a lot as we would not need an additional sorting button but, unlike in my pull request.
@jancborchardt Nevertheless, if we parse all EXIF information there are a number of sorting options we would need a GUI as soon as not only date is interesting as mentioned in #101.

@oparoz To summarize:

Exif app

Requirements

  • Generic EXIF parser, not just limited to date collection
  • Stores the information in its own table(s)
  • Adds the information at upload time (GUI, WebDAV)
  • CRUD operations

Implementation

Next steps

  • Keep PR as patch for people not willing to wait?
  • Ask other people about whether to make a separate app?
  • I'll make a prototype implementation for the EXIF service

Sounds like a (reasonable) plan?

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

is to sort files with EXIF and files without in separate blocks

I think this is going to be a barrier, because it requires making big changes to the photowall to make it usable. People won't know there are more images much lower in the wall, in another block.

an option to use modification date as fall back (default disabled)

That could be very messy, with big blocks of pictures with exif data, in the middle of the rest of the pictures, unless people carefully curate their collections.
I guess, if this is optional and comes with a list of caveats, people enabling it know what they're getting into, until we can come up with a better design.

Keep PR as patch for people not willing to wait?

Yes. That could help us get some feedback. I'm not sure Gallery wants patches, so maybe keep this in Gallery+? @jancborchardt @karlitschek

Ask other people about whether to make a separate app?

I'd say, let's use your fresh stock of energy to get the service built first and we'll see about the app when that's done :)

I'll make a prototype implementation for the EXIF service

Cool. I'll move the specs over to #101, since this is the way forward.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

OK, so I've updated #101. @patman15 , please refer to that for the general guidelines and I'll update it if it needs be.
This issue here will be about the specifics of the Exif parser and how to make sorting by date work.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @deMattin on September 24, 2015 10:48

I'm the same opinion as @oparoz not to give an automatic date sorting on files date or exif date.
This can be very confusing in some not very well maintained collections.
It's not only a matter of existence of Exif data because often Exif dates exist, but are wrong (empty battery of cam before useage and forgotten to set date and time).

So sorting on Exif data should always be an additional option.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @patman15 on September 24, 2015 11:4

Well, we can keep a separate button (I'm happy with that) nevertheless we need to define what should happen if you choose Exif sorting and the information is partly missing. Maybe @oparoz you got me wrong, the current tools do sort all files without Exif info by modification time first and then all with exif information by this date. It's not perfect but if you choose the fall back it's like you said somehow mixed.
Before reading @deMattin comment I would have assumed Exif date <= modification date but even that is not always true. :-( Anyhow all of this can be done just by minor adjustments to the javascript sorting algorithm, after the data is available. :-)

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

Anyhow all of this can be done just by minor adjustments to the javascript sorting algorithm, after the data is available. :-)

Agreed :)

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @rhatguy on September 24, 2015 13:30

I'm more than thrilled to see the progress being made here. I wanted to throw out one use case in case you guys hadn't thought of it yet. I upload my pictures into a folder that is an external storage mountpoint in owncloud. Then sometime later (maybe years) I may go back and add further exif information (GPS coordinates...facetags..etc). It would be good for the implementation to have some way to trigger a re-scan of the exif data in case the files are updated outside of ownclouds control. I can understand though if the final answer is "the files should be under owncloud control" :)

I'm willing to help if testing is needed. I have a 27k file picture repository with pics and videos.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

@rhatguy - That could be solved:

  • via a TTL, but people may not like the slowdown happening on a weekly basis and it may always be the wrong time span
  • via the command line or a button in user settings

I think solution 1 is not good enough and solution 2 will take a while since we still don't have settings in the app.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @patman15 on September 24, 2015 16:53

Trying to get familiar with owncloud concepts ...

@oparoz Just thinking (maybe too) loud: if we talk about "meta data support" we need parsers for different types. Looking through the thumbnail service in core I can find excellent patterns, we would need/should use as well:

  • configurable list of supported mime types
  • providers for different mime types and all the patterns around that
  • all necessary functions/hooks for create/update, etc.

I know we discussed two days ago that this is the ideal point. Looking at the requirements #101 it's more or less duplicating thumbnail generation, name it meta data generation, and replace a bit of logic in the individual providers for the mime types. Further, meta data might be relevant also for non images, e.g. PDFs where you have author, ...

Just want to clarify before I start which patterns to use. I can of course skip the provider concept and just use a switch statement for the types and more or less put all logic into one "metadataservice".

So shouldn't we somehow reduce the scope (not metadata but image properties or so) or really go for a core extension? If (both) not, can someone give me guidance on which patterns are worth the effort for the intended functionality?

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

@patman15 - I'm guessing you're making reference to the Preview Class?

If I understand you correctly, you'd rather add a metaData() method to each preview provider, collecting the information and returning it, ready to be added to the database by the Preview class?

One problem is that the preview system can be disabled. That would stop all meta data collection. core's solution is...an app! owncloud-archive/metadata#1

Attributes - No visible attributes as key-values that can be attached to a file/folder/object. This can be used to store exif data for a picture, id3 data for an mp3 or other attributes.

Using that app means that you won't be able to start development until next year, at the earliest, but it means that all the data collected would be made available via WebDAV.

Hacking the preview class is not a bad idea, but beware. It's a dinosaur pining to be refactored.
Also, the Image class contains a couple of EXIF methods which should be extracted.

If you think it makes more sense, go for it. Just open an issue in core describing what you want to achieve, so that others can voice their opinion. I know the preview class and its providers are not very popular and that the latter will eventually move out of core, so there might not be many people jumping in joy.
Once the solution is ready, we can always include a patch of the PR in Gallery+ as a preview of what's coming in 9.0. And all that data can later be exposed via the metadata app.

The smaller project is, of course, to start here, focusing on media meta data, probably duplicating some of the patterns put in place in core, but without the need to use providers (I think). But it would be good to think big when designing the DB schema, so that it doesn't need too many updates if it moves to core.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @patman15 on September 25, 2015 6:3

@oparoz Thanks a lot for the overview, I think I got an impression about the surrounding problems.

I'm guessing you're making reference to the Preview Class?

Yes, I was browsing around there. :-)

I'll do something that can be made available earlier than one year but with concepts that can survive longer. Hopefully I can spend enough time. Let's start.

Thanks again!

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @jancborchardt on September 25, 2015 20:42

@oparoz @patman15 pull requests should always go to the original Gallery, never to a fork. Period. cc @karlitschek @DeepDiver1975

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

If the feature is intended for Gallery, yes. It makes things easier from a legal and organisational point of view.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @patman15 on September 27, 2015 7:36

I did some more research and also performance tests. Here is a short summary of the three options I found:

Method Effort Speed Dependency File Types example*
exif_read_data low fast (1x) none TIFF/JPG 15s
exiftool higher slower (2x) tool installed 65+ 35s
imagick low slow (20x) no additional ~5-10 348s

*) example: 1420 JPG files on local storage, initial scan

My suggestion would be to go with exiftool (can produce php, xml, or json output directly) and use exif_read_data as fall back. Exiftool can be run as a separate process in batch mode, so it has low overhead.
The only issue that still worries me at the moment are external storages as they would require to transfer all files locally first and then scan them. Thus, speed would go down to the one of imagick unless I find a way to pipe data in and read only the necessary (how much?) header of all files.
Database is rather easy as all tools use "[section]:[tag]" => "[value]" whereas [section] can be [section1]:[section2]:...

@oparoz, @jancborchardt are you ok with the additional dependency for exiftool? It would save a huge implementation effort regarding all the file types, etc. Any other thoughts?

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

I'm OK with using exiftool with a exif_*_data fallback.
As you've probably discovered, there is even a composer library to make things easier:
https://github.com/romainneutron/PHPExiftool

I'm not worried about the overhead since we should really only collect the data at upload time and provide a separate scanner for existing files.

Regarding external storage, the files are already transferred locally in order to create the preview. If the hook gives us access to that local representation, then there shouldn't be any problem. If it gives us a "link" to the external file, then it will be difficult to stream the data if I'm not mistaken as not all backends support it.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

Also, for performance reasons, I would use $this->configService->getSupportedMediaTypes(true, false) to make sure we only generate meta data for something we support.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @patman15 on September 27, 2015 16:32

As you've probably discovered, there is even a composer library to make things easier:
https://github.com/romainneutron/PHPExiftool

Yes, I did, but then I thought of KISS (keep it simple stupid). The library does a lot of type conversion and is dependent on the version of exiftool. Most things we do not need (including tag writing). And:

This driver is not suitable for production

So I thought more about using something like https://github.com/tsmgeek/ExifTool_PHP_Stayopen or even better take the ideas and build my own class with the functionality we need here.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

Fine by me :)

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

@cbxk1xg might be interested to collaborate with @patman15 on this as he's also experienced this area.

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

From @patman15 on November 19, 2015 19:43

Sorry, currently I'm pretty busy with my normal job so implementation is still not progressing. :-(

@oparoz
Copy link
Member Author

oparoz commented Sep 4, 2016

No problem. One thing that would be great, if you can, is to have some sort of roadmap for your project so that we can know a bit more about it and where you're at. It could be that some part can be implemented earlier and beneficial to other items.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bounty This issue have a bounty design Related to the design enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant