
Implement proof-of-concept Web upload / import #21

Closed
lastzero opened this issue Oct 8, 2018 · 48 comments
@lastzero (Member) commented Oct 8, 2018

It should not be required to use the command-line interface to start importing. Users should also be able to upload photos instead of putting them in the import directory directly.

lastzero added the important and enhancement labels Oct 8, 2018
@lastzero lastzero added this to the MVP milestone Oct 8, 2018
@lastzero lastzero added this to To do in Development Oct 8, 2018
@kngu9 commented Oct 17, 2018

@lastzero I can handle this.

@lastzero (Member Author) commented:

Have you done something like that before with Go & VueJS? One part is the UI, the other the server API (see internal/api). You should be able to use the existing indexMediaFile() function for indexing after upload. I didn't have time to write a full concept (and I'm on vacation right now), so I thought I'd do it myself. But any ideas / help are welcome. Let's use the Gophers Slack chat if you have questions.

@kngu9 commented Oct 18, 2018

@lastzero I already finished the uploading part; I'm going to start working on the indexing today. It shouldn't be hard, there's an example in your import command.

@lastzero (Member Author) commented:

Oh, that was fast. Let me know if you need a code review or you have questions. I'll start coding again next week. At the beach right now. 🌴

@lastzero (Member Author) commented Nov 9, 2018

See [WIP] Web Upload #44

@rayrrr commented Jun 14, 2019

Update: #44 went stale and was closed without merging. We need someone else to pick this up.

@lastzero (Member Author) commented:

@rayrrr I've done this but forgot to reference the commit: 60e9346

@lastzero (Member Author) commented:

[Screenshot, 2019-06-14 08:25]

lastzero added the released label Jun 14, 2019
@lastzero lastzero self-assigned this Jun 14, 2019
@rayrrr commented Jun 15, 2019

@lastzero I pulled the latest Docker image after you posted your message here. While I now see the Upload button, just like in your screenshot, and I can get it to accept and upload files, the uploaded images do NOT show up under "Photos". I've since tried running the index and thumbnail CLI commands, but still nothing. I'm on a Mac, if that helps, and can provide logs.

@lastzero (Member Author) commented:

Did you reload and sort by import date?

@lastzero (Member Author) commented:

Or try our demo at demo.photoprism.org - somebody already uploaded a photo there (we might disable that soon to prevent abuse)

@rayrrr commented Jun 17, 2019

@lastzero I tried the upload feature on the demo site with a couple of random images I downloaded from the internet and it works! Running with docker-compose locally, I can do the same with those two images.

I still can't upload images that I've taken myself with a DSLR camera to my local instance, though... I'm going to investigate further and file a new bug if needed.

Thank you for adding this feature!

@lastzero (Member Author) commented:

@rayrrr Are those RAW files? DNG and Canon RAW are tested; other RAW formats may be supported if Darktable supports them AND there is a readable Exif header (as in TIFF or JPG files).

@lastzero lastzero reopened this Jun 17, 2019
@rayrrr commented Jun 17, 2019

@lastzero I'm using a Canon 7D. Tried Canon RAWs and Canon JPGs. Can't get either to work. Here's an example of the log generated during the failed attempt:

photoprism_1  | [GIN] 2019/06/17 - 19:10:36 | 200 |  156.090306ms |      172.31.4.1 | POST     /api/v1/upload/1560798635945
photoprism_1  | time="2019-06-17T19:10:36Z" level=info msg="importing photos from /srv/photoprism/photos/Import/upload/1560798635945"
photoprism_1  | time="2019-06-17T19:10:36Z" level=info msg="moving main jpg file \"IMG_9729.JPG\" to \"/srv/photoprism/photos/Originals/-6147/04/-61470411_164528_83B434367A70.jpg\""
photoprism_1  | 
photoprism_1  | (/go/src/github.com/photoprism/photoprism/internal/photoprism/indexer.go:244) 
photoprism_1  | [2019-06-17 19:10:39]  Error 1292: Incorrect datetime value: '-6147-04-11 16:45:28' 
photoprism_1  | time="2019-06-17T19:10:39Z" level=info msg="adding labels: [{Name:house front Source:image Uncertainty:53 Priority:0 Categories:[window house building architecture]} {Name:house front Source:image Uncertainty:60 Priority:0 Categories:[window house building architecture]}]"
photoprism_1  | 
photoprism_1  | (/go/src/github.com/photoprism/photoprism/internal/models/photo_label.go:33) 
photoprism_1  | [2019-06-17 19:10:39]  Error 1364: Field 'photo_id' doesn't have a default value 
photoprism_1  | 
photoprism_1  | (/go/src/github.com/photoprism/photoprism/internal/models/photo_label.go:33) 
photoprism_1  | [2019-06-17 19:10:39]  Error 1364: Field 'photo_id' doesn't have a default value 
photoprism_1  | time="2019-06-17T19:10:39Z" level=info msg="added main jpg file \"-6147/04/-61470411_164528_83B434367A70.jpg\""
photoprism_1  | [GIN] 2019/06/17 - 19:10:39 | 200 |  3.438380163s |      172.31.4.1 | POST     /api/v1/import/upload/1560798635945

I see you reopened this issue; thanks! I'll keep the discussion here as well.

@rayrrr commented Jun 17, 2019

My first guess is that photoprism pulls the timestamp from the Exif data; perhaps my camera doesn't include it there. I'm hoping we can make photoprism fall back to the file creation timestamp when no Exif timestamp is available, to resolve this.

@lastzero (Member Author) commented:

There are fallbacks, but it looks more like there is a timestamp in the wrong format, or maybe a timezone issue... we intentionally fail in those cases so that it doesn't go unnoticed. Can you send us an example? Did you delete the db (either MySQL or the files in the database folder) and try again? The database tables are changing a lot right now and not all changes are backwards compatible...

@lastzero (Member Author) commented:

PS: If you open the demo, you'll see photos taken with a 7D, so it's not a general issue. But those were RAWs. Maybe it's a time format bug in the RAW-to-JPEG converter you use?

@rayrrr commented Jun 18, 2019

@lastzero I did not use a "converter" for the images. The 7D has a built-in setting where two images (one raw and one jpeg) are produced instantaneously when a photo is taken. That's how I got my jpegs, straight out of the camera.

Here is one of the photos from the batch that is causing errors. I hope this helps! https://gist.github.com/rayrrr/42fb47db62ed9574370fea03027f3cff

@lastzero (Member Author) commented Jun 19, 2019

@rayrrr I can confirm the bug... the year is not correct. Maybe it's an issue with our Exif library... I guess we simply stop using it and switch to XMP files created with exiftool ASAP. Those seem pretty clean and correct. The first indexing run might be a bit slower, though.

Alternatively we can try using a different go exif lib and see if the results are better: https://github.com/dsoprea/go-exif

@rayrrr commented Jun 19, 2019

@lastzero thanks for confirming. I will try to do some experimenting with those EXIF libs too and report back with any findings.

@rayrrr commented Jun 19, 2019

I can confirm that https://github.com/dsoprea/go-exif shows the correct DateTime attribute value for the example image I posted to that gist (using the command line reader tool). My vote would be to try that lib.

@lastzero (Member Author) commented:

@rayrrr Is there a chance you can send a pull request? Otherwise I'll put that on my todo for later.

@lastzero (Member Author) commented:

@rayrrr Done. The code is still a bit dirty and needs testing, but the bug is gone. I'll start a master build. Let me know if it works, and consider a donation if you're happy with the result. We also index ISO, exposure and GPS altitude now. More to come, including XMP support.

@lastzero (Member Author) commented Jun 20, 2019

Seems like GPS coordinates are rounded now... I need to investigate this further, but tomorrow.

Update: Degrees, Minutes and Seconds probably should be float, not int, in gps.go:

type GpsDegrees struct {
	Orientation               byte
	Degrees, Minutes, Seconds int
}
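For illustration, this is what the float-based struct and a DMS-to-decimal conversion look like (a sketch, not the go-exif API; the Decimal method is hypothetical). With integer Minutes and Seconds, the fractional second is truncated, which can shift a coordinate by up to roughly 30 m:

```go
package main

import "fmt"

// GpsDegrees with float64 fields, as suggested above, so fractional
// minutes/seconds survive the conversion.
type GpsDegrees struct {
	Orientation               byte
	Degrees, Minutes, Seconds float64
}

// Decimal converts degrees/minutes/seconds to decimal degrees, negating
// the result for south and west orientations.
func (g GpsDegrees) Decimal() float64 {
	d := g.Degrees + g.Minutes/60 + g.Seconds/3600
	if g.Orientation == 'S' || g.Orientation == 'W' {
		return -d
	}
	return d
}

func main() {
	lat := GpsDegrees{Orientation: 'N', Degrees: 52, Minutes: 31, Seconds: 12.53}
	fmt.Printf("%.6f\n", lat.Decimal()) // 52.520147
}
```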

@dsoprea (Collaborator) commented Dec 23, 2019

@lastzero I think we resolved the rounding problem with/in go-exif a few months ago. Is there anything else that I can do to support you, to get this to move forward?

@lastzero (Member Author) commented:

@dsoprea Thank you for that 👍 Forgot to close this ticket! Upload is done, import and index via Web UI too.

I'll update our issues and add new tasks for contributors, might take a couple of days because it's family time. Any help is most welcome.

@dsoprea (Collaborator) commented Dec 23, 2019

Ah. Got it. That's great.

ad93ad1

It looks like you're doing a brute-force search for the EXIF block:

rawExif, err := exif.SearchFileAndExtractExif(m.Filename())

It would be quicker to try a context-specific method first (if it's a JPEG, find and parse the APP1 segment directly; if a PNG, find and parse the "eXIf" chunk directly) and then fall back to the brute-force method, and only for file formats that support Exif at all (currently JPG, TIF, XMP, PNG, and HEIF/HEIC). That said, is there a chance you're doing this byte-by-byte search through formats that don't even support Exif? I'd be worried about how much time we might be losing, because at worst this can be very expensive and fruitless (like scanning RAW for Exif?).

Note the comment here:

Obviously, it is most efficient to properly parse the media file and then provide the specific EXIF data to be parsed, but there is also a heuristic for finding the EXIF data within the media blob, directly. This means that, at least for testing or curiosity, you do not have to parse or even understand the format of image or audio file in order to find and decode the EXIF information inside of it. See the usage of the SearchAndExtractExif method in the example.

I'm currently indexing my personal photo collection into Photoprism. The cost is relevant because there are a few hundred thousand images and it seems to be ticking along slowly (which makes sense if it's largely due to the NN analysis).
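A self-contained sketch of the context-specific approach for JPEG (illustrative only, not PhotoPrism or go-exif code): instead of scanning every byte, walk the marker segments and stop at the start of the compressed image data.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// findJpegExif walks the JPEG marker segments and returns the payload of the
// first APP1 segment carrying an "Exif\x00\x00" header. Unlike a brute-force
// byte search, this only touches the segment headers before the image data.
func findJpegExif(data []byte) ([]byte, error) {
	if len(data) < 2 || data[0] != 0xFF || data[1] != 0xD8 { // SOI marker
		return nil, errors.New("not a JPEG")
	}
	i := 2
	for i+4 <= len(data) {
		if data[i] != 0xFF {
			return nil, errors.New("invalid marker")
		}
		marker := data[i+1]
		if marker == 0xDA { // SOS: compressed image data follows, stop here
			break
		}
		size := int(binary.BigEndian.Uint16(data[i+2:])) // includes the 2 size bytes
		if i+2+size > len(data) {
			return nil, errors.New("truncated segment")
		}
		payload := data[i+4 : i+2+size]
		if marker == 0xE1 && bytes.HasPrefix(payload, []byte("Exif\x00\x00")) {
			return payload[6:], nil // the embedded TIFF structure
		}
		i += 2 + size
	}
	return nil, errors.New("no Exif APP1 segment")
}

func main() {
	// Minimal synthetic JPEG: SOI + APP1("Exif\0\0" + 4 payload bytes) + SOS.
	app1 := append([]byte("Exif\x00\x00"), 0x4D, 0x4D, 0x00, 0x2A)
	jpeg := []byte{0xFF, 0xD8, 0xFF, 0xE1, 0x00, byte(2 + len(app1))}
	jpeg = append(jpeg, app1...)
	jpeg = append(jpeg, 0xFF, 0xDA)
	tiff, err := findJpegExif(jpeg)
	fmt.Println(len(tiff), err) // 4 <nil>
}
```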

@lastzero (Member Author) commented Dec 23, 2019

@dsoprea Note that OpenStreetMap does not allow batch operations with their public API, so you shouldn't index large photo collections just yet. My current task is to replace it with our own service that will also be much faster.

Can you send a PR improving our Exif / metadata related code? RAW files may also contain Exif data, and we haven't had time so far to implement format-specific optimizations. I'd rather have it working for everything than just (faster) for JPEG.

Last but not least, XMP is also on our todo list, so our Exif code should evolve into a general metadata abstraction. I've met the author of go-xmp in Berlin and think we can build upon it, see #68.

@dsoprea (Collaborator) commented Dec 23, 2019

Sure. I can take a look.

Who's working on the mapping API? That sounds fun.

Is there any way to disable the queries? I don't really need anything we're probably doing with them. That said, my interest in "geographic enrichment" of my images is minimal; I can't imagine caring about more than coordinates for the foreseeable future. (I've already written tooling to identify coordinates for a list of images given a list of GPX files, and to group them by large cities based on the population data and coordinates from the free and rich GeoNames database, with the search optimized down to a string-prefix search by converting all coordinates to the Google S2 representation in Hilbert space.)

@lastzero (Member Author) commented:

We use geo information to generate meaningful titles and (soon) to group photos by time and location, see #152 and #154. Maybe we should also take a look at GeoNames and your implementation - is it on GitHub?

Use case: Many friends have thousands of photos taken at music festivals the last couple of years. If we have time and location, we can easily group them (and add the festival name to the title). Our geo service will later also return a list of public events that have taken place at a location.

@dsoprea (Collaborator) commented Dec 23, 2019

Yes, but we'd have to review the access and caching semantics to make sure they make sense for PhotoPrism. It would also need heavier caching in general to expose all of that location data efficiently to API clients. By default, it considers a "population center" to be a city of 100K or more, but it will fall back to the nearest city. I also wrote a time-series storage format that lets us bin the location data (and any miscellaneous metadata) so we can lazy-load it rather than scanning a fully populated geographic index every time.

The attractor: https://github.com/dsoprea/go-geographic-attractor
The greater autogrouping project (which integrates the attractor): https://github.com/dsoprea/go-geographic-autogroup-images

The in-memory index that serves as both the frontend and caching layer: https://github.com/dsoprea/go-geographic-index
The on-disk index: https://github.com/dsoprea/time-to-go

I then got redirected for six months writing a read-write-seekable archive format (serving both compressed filesystem representations and a derivative stream-only format) that I could use to compress the on-disk time series without first decompressing it into some intermediate format. I had to take a break, but only a couple of smaller requirements are left.

Where are we getting that event data from? We're not relying on crowdsourced data, are we?

@lastzero (Member Author) commented:

There are directories like residentadvisor.net and electronic-festivals.com, but I didn't check yet if they provide machine readable data and/or are willing to share (parts of) their database. In the worst case, we can start with adding the biggest, well-known festivals manually (e.g. Burning Man, Boom and Fusion).

@dsoprea (Collaborator) commented Dec 23, 2019

Yeah. Serving those massive crowds might be enough to motivate the others. It might be nice for that to be an independent service that can support other solutions and be supported by them in turn.

Is there an issue open for the geographic API task or its index/algorithms, yet?

@lastzero (Member Author) commented:

I'm on the geo API already but no code pushed yet. My plan is to use google/open-location-code as primary key for our database and get the data from a private OpenStreetMap instance, so basically it's a cache for our use case (labeling photos). Not sure if that works out, but it seems simple and fast.

@dsoprea (Collaborator) commented Dec 23, 2019

I would seriously consider using S2 to encode the addresses. Not only can it uniquely identify every reasonable geometric point on Earth with a 64-bit number, the IDs are also implicitly grouped by prefix: the more of the prefix two locations have in common, the closer they are in space (the localization characteristic of Hilbert curves; this is the secret sauce). Google also made some Earth-specific refinements on top of that. This means you reduce a clustering problem to a string-prefix search. There is some inherent inexactness in comparisons, because these are really one-dimensional distances along a curve that snakes through all points in space, so not all nearby points are adjacent all of the time, but the accuracy is good enough in exchange for a massive cost improvement.

Even if you don't need it now, if you're looking for a way to uniquely identify locations by a string then I'd encourage you to use S2. But, that's all of the soliciting that I'll do for it.
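The prefix-search idea can be sketched like this (the tokens below are made up for illustration; real S2 tokens come from the library). Sorting plus a prefix comparison replaces a pairwise distance computation:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupByPrefix clusters cell tokens by their first n characters. Because
// nearby cells share long token prefixes (the Hilbert-curve locality
// described above), adjacency in sorted order implies spatial proximity.
func groupByPrefix(tokens []string, n int) [][]string {
	sorted := append([]string(nil), tokens...)
	sort.Strings(sorted)
	var groups [][]string
	for _, t := range sorted {
		if len(groups) > 0 {
			last := groups[len(groups)-1]
			if len(t) >= n && len(last[0]) >= n && strings.HasPrefix(t, last[0][:n]) {
				groups[len(groups)-1] = append(last, t)
				continue
			}
		}
		groups = append(groups, []string{t})
	}
	return groups
}

func main() {
	tokens := []string{"47a84d3", "47a84d9", "47a84c1", "89c25a0", "89c25b7"}
	for _, g := range groupByPrefix(tokens, 5) {
		fmt.Println(g)
	}
}
```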

@dsoprea (Collaborator) commented Dec 23, 2019

Also, what about my question about disabling the geographic queries, given that I have no need for them, they're slowing down the indexing process, and you're concerned about API usage?

#21 (comment)

@lastzero (Member Author) commented:

@dsoprea It's easy to add a "disable location" flag; you can probably do it yourself in about 5 minutes. However, our photo titles will then be very plain, like "Building / 2019" or just "Unknown", and most users certainly want the additional information.

When we start testing with huge libraries and different types of users, we will certainly add more optimizations and customization options. It's also unclear whether the db structure will hold up or whether we need different indexes etc... we also want to index in parallel using goroutines. Just removing the geo query won't solve the performance issues.

@lastzero (Member Author) commented:

Thanks for the hint! I'll take a look at S2, didn't notice it can be useful to us :)

@dsoprea (Collaborator) commented Dec 24, 2019

I don't need the location, especially when it has such a high indexing cost, but it sounds like it's not possible to just use the name of the file, as would be expected of most applications. Especially since there doesn't currently seem to be a way to even see the filename (which seems odd). I guess that, in the current situation, I'm stuck.

@lastzero (Member Author) commented:

Dustin, we can do it... but it's Christmas and I'm on a train right now. What do you expect?

We're actually indexing the name of the file, but in many cases it would make an ugly title, e.g. IMG_1235.JPG, and including the path it would also be very long. So we need to add a setting for that, and maybe detection of whether the file name would make a useful title or contains useful information. Remember that this is not a simple file browser; there are certainly better tools if you just want to see a filename and a preview. Why index at all in that case? Every modern file system already has an index of file names...

I'll also be working on a metadata detail and edit view, but it's not done yet... First we need a stable data model, and that's why I'm doing locations first, which is what I'm working on right now. I also do commercial projects to finance this, because we only get very little financial support. We would be a lot faster otherwise!
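The filename-usefulness detection mentioned above could be sketched like this (a guess at the heuristic; the pattern list and function names are hypothetical, and a real implementation would likely be configurable):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// genericName matches typical camera filename patterns such as IMG_1234,
// DSC_0042, P1010001 or 20190617_191036. The exact list is illustrative.
var genericName = regexp.MustCompile(`^(?i)(IMG|DSC[F_]?|P\d{3}|DCIM|MVI)[-_]?\d+$|^\d{8}[-_]\d{6}$`)

// usefulTitle reports whether a filename (extension stripped) would make a
// meaningful photo title rather than a generated camera name.
func usefulTitle(filename string) bool {
	base := filename
	if i := strings.LastIndexByte(base, '.'); i >= 0 {
		base = base[:i]
	}
	return !genericName.MatchString(base)
}

func main() {
	fmt.Println(usefulTitle("IMG_1235.JPG"))             // false
	fmt.Println(usefulTitle("fusion-festival-2018.jpg")) // true
}
```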

@dsoprea (Collaborator) commented Dec 24, 2019

Sure, but I'm just suggesting the filename (not the whole path). It could also be truncated if it exceeds N bytes. Presumably, Photoprism is a tool to find images, and currently you can't figure out where an image is on disk, nor find any of the other images that would be in the same collection. So no, not merely file browsing; currently it's an index that provides no context for what's in it. I'm just putting my question in perspective. Yesterday I saw a picture, near the top of my photos while my Photoprism instance was indexing, that I hadn't seen in years, and there was absolutely no way to find it in my collection. I tried inspecting the DOM and still had nothing to go on. Even just hovering over the title to show the full path would be totally reasonable (and, again, intuitive according to what people expect based on popular conventions).

I'm not really presuming to ask you to do work, much less for usage scenarios that only originate with me. On the one hand, I'm fishing for information. On the other hand, I wouldn't say no to long-term goals that mitigate the above.

..on the other-other hand, I didn't know that you were still working to stabilize the underlying semantics. So it makes more sense to me now.

@lastzero (Member Author) commented:

You can find that info in the database, and probably also in the Web service response, for example via the Chrome dev tools. We do the easy stuff with low risk last... That includes displaying metadata. Bottom-up, logic and tests first. It started as a console app; the Web UI is relatively new.

@lastzero (Member Author) commented:

Check out 875245f and 366c70d to see what I mean... there is no way for us to provide database update scripts for each of those changes; they are too big. So it doesn't make sense to index large photo collections at the moment, because you would have to re-index every couple of hours or days. Once people start using this in production, we'll be done with breaking changes - that's also why the UI should not look like we are done!

I'd be very happy about feedback regarding my implementation of S2 location IDs. Decided to go for integers and level 15 cells to save memory and storage (compared to string tokens as index and smaller cells). Might further normalize location data and move city, state and country to a separate table ("places").
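For reference, truncating a 64-bit cell ID to level 15 follows the standard S2 parent computation (clear the low position bits, set the sentinel bit). The IDs below are hypothetical, but the formula mirrors the public S2 library:

```go
package main

import "fmt"

const maxLevel = 30 // S2 leaf cells are level 30

// lsbForLevel returns the least-significant set bit of a cell ID at the
// given level; the trailing one-bit is how S2 encodes the level.
func lsbForLevel(level uint) uint64 {
	return 1 << (2 * (maxLevel - level))
}

// parentAt truncates a cell ID to the given level, so all points inside
// the same level-15 cell (a few hundred meters across) map to one uint64
// key, which is what integer level-15 keys amount to.
func parentAt(id uint64, level uint) uint64 {
	lsb := lsbForLevel(level)
	return (id & -lsb) | lsb
}

func main() {
	// Two hypothetical leaf IDs that differ only in their low bits,
	// i.e. two points inside the same level-15 cell.
	a := uint64(0x47a84d3900000001)
	b := uint64(0x47a84d39000fffff)
	fmt.Println(parentAt(a, 15) == parentAt(b, 15)) // true
}
```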

@dsoprea (Collaborator) commented Dec 29, 2019

Got it. I'm right there with you.

I'd like to contribute, especially with the EXIF and geographic stuff, at least where previously suggested. I'll do so after I close a dangerous bug that I'm currently investigating in go-exif.

@lastzero (Member Author) commented:

@dsoprea Excellent :) It makes most sense for you to improve our Exif code as a first step, since you've already identified specific work that needs to be done (like checking the file type and then deciding on a read strategy).

I pushed more major changes yesterday to prepare our database for grouping / clustering of photos (country/year/month and camera/lens). I also added new fields to our search form; previously you needed to know what's in the database to get an idea of what you can search for. The code is still a bit rough and needs refactoring.

@lastzero (Member Author) commented:

@dsoprea Created issue #172 for you and added you as a collaborator so that you can be assigned to issues in general.

lastzero added the priority label Dec 14, 2023