
Implement proof-of-concept Web upload / import #21

Closed
lastzero opened this issue Oct 8, 2018 · 48 comments
@lastzero (Member) commented Oct 8, 2018

It should not be required to use the command-line interface to start importing. Users should also be able to upload photos instead of putting them in the import directory directly.

lastzero added the important and enhancement labels Oct 8, 2018
@lastzero lastzero added this to the MVP milestone Oct 8, 2018
@lastzero lastzero added this to To do in Development Oct 8, 2018
@kngu9 commented Oct 17, 2018

@lastzero I can handle this.

@lastzero (Member Author) commented:

Have you done something like that before with Go & VueJS? One part is the UI, the other the server API (see internal/api). You should be able to use the existing indexMediaFile() function for indexing after upload. I didn't have time to write a full concept (and I'm on vacation right now), so I thought I'd do it myself. But any ideas / help are welcome. Let's use the Gophers Slack chat if you have questions.

@kngu9 commented Oct 18, 2018

@lastzero I already finished the uploading part; I'm going to start working on the indexing today. It shouldn't be hard, there's an example in your import command.

@lastzero (Member Author) commented:

Oh, that was fast. Let me know if you need a code review or you have questions. I'll start coding again next week. At the beach right now. 🌴

@lastzero (Member Author) commented Nov 9, 2018

See [WIP] Web Upload #44

@rayrrr commented Jun 14, 2019

Update: #44 went stale and was closed without merging. We need someone else to pick this up.

@lastzero (Member Author) commented:

@rayrrr I've done this but forgot to reference the commit: 60e9346

@lastzero (Member Author) commented:

[Screenshot, 2019-06-14 08:25]

lastzero added the released label Jun 14, 2019
@lastzero lastzero self-assigned this Jun 14, 2019
@rayrrr commented Jun 15, 2019

@lastzero I pulled the latest Docker image after you posted your message here. While I now see the Upload button, just like in your screenshot, and I can get it to accept and upload files, the uploaded images do NOT show up under "Photos". I've since tried running the index and thumbnail CLI commands, but still nothing. I'm on a Mac, if that helps, and can provide logs.

@lastzero (Member Author) commented:

Did you reload and sort by import date?

@lastzero (Member Author) commented:

Or try our demo at demo.photoprism.org - somebody already uploaded a photo there (we might disable that soon to prevent abuse)

@rayrrr commented Jun 17, 2019

@lastzero I tried the upload feature on the demo site with a couple of random images I downloaded from the internet and it works! Running with docker-compose locally, I can do the same with those two images.

I still can't upload images that I've taken myself with a DSLR camera to my local instance, though... I'm going to investigate further and file a new bug if needed.

Thank you for adding this feature!

@lastzero (Member Author) commented:

@rayrrr Are those RAW files? DNG and Canon RAW are tested; other RAW formats may be supported if Darktable supports them AND there is a readable Exif header (as in TIFF or JPG files).

@lastzero lastzero reopened this Jun 17, 2019
@rayrrr commented Jun 17, 2019

@lastzero I'm using a Canon 7D. Tried Canon RAWs and Canon JPGs. Can't get either to work. Here's an example of the log generated during the failed attempt:

photoprism_1  | [GIN] 2019/06/17 - 19:10:36 | 200 |  156.090306ms |      172.31.4.1 | POST     /api/v1/upload/1560798635945
photoprism_1  | time="2019-06-17T19:10:36Z" level=info msg="importing photos from /srv/photoprism/photos/Import/upload/1560798635945"
photoprism_1  | time="2019-06-17T19:10:36Z" level=info msg="moving main jpg file \"IMG_9729.JPG\" to \"/srv/photoprism/photos/Originals/-6147/04/-61470411_164528_83B434367A70.jpg\""
photoprism_1  | 
photoprism_1  | (/go/src/github.com/photoprism/photoprism/internal/photoprism/indexer.go:244) 
photoprism_1  | [2019-06-17 19:10:39]  Error 1292: Incorrect datetime value: '-6147-04-11 16:45:28' 
photoprism_1  | time="2019-06-17T19:10:39Z" level=info msg="adding labels: [{Name:house front Source:image Uncertainty:53 Priority:0 Categories:[window house building architecture]} {Name:house front Source:image Uncertainty:60 Priority:0 Categories:[window house building architecture]}]"
photoprism_1  | 
photoprism_1  | (/go/src/github.com/photoprism/photoprism/internal/models/photo_label.go:33) 
photoprism_1  | [2019-06-17 19:10:39]  Error 1364: Field 'photo_id' doesn't have a default value 
photoprism_1  | 
photoprism_1  | (/go/src/github.com/photoprism/photoprism/internal/models/photo_label.go:33) 
photoprism_1  | [2019-06-17 19:10:39]  Error 1364: Field 'photo_id' doesn't have a default value 
photoprism_1  | time="2019-06-17T19:10:39Z" level=info msg="added main jpg file \"-6147/04/-61470411_164528_83B434367A70.jpg\""
photoprism_1  | [GIN] 2019/06/17 - 19:10:39 | 200 |  3.438380163s |      172.31.4.1 | POST     /api/v1/import/upload/1560798635945

I see you reopened this issue; thanks! I'll keep the discussion here as well.

@rayrrr commented Jun 17, 2019

My first guess is that photoprism pulls the timestamp from the Exif data; perhaps my camera doesn't include it there. I'm hoping we can make photoprism fall back to the file creation timestamp when no Exif timestamp is available, to resolve this.

@lastzero (Member Author) commented:

There are fallbacks, but it looks more like there is a timestamp in the wrong format, or maybe a timezone issue... we intentionally fail in those cases so that it doesn't go unnoticed. Can you send us an example? Did you delete the db (either MySQL or the files in the database folder) and try again? The database tables are changing a lot right now and not all changes are backwards compatible...

@lastzero (Member Author) commented:

PS: If you open the demo, you'll see photos taken with a 7D, so it's not a general issue. But those were RAWs. Maybe it's a time format bug in the RAW-to-JPEG converter you use?

@rayrrr commented Jun 18, 2019

@lastzero I did not use a "converter" for the images. The 7D has a built-in setting where two images (one raw and one jpeg) are produced instantaneously when a photo is taken. That's how I got my jpegs, straight out of the camera.

Here is one of the photos from the batch that is causing errors. I hope this helps! https://gist.github.com/rayrrr/42fb47db62ed9574370fea03027f3cff

@lastzero (Member Author) commented Jun 19, 2019

@rayrrr I can confirm the bug... the year is not correct. Maybe it's an issue with our Exif library... I guess we simply stop using it and switch to XMP files created with exiftool ASAP. Those seem pretty clean and correct. The first indexing run might be a bit slower, though.

Alternatively we can try using a different go exif lib and see if the results are better: https://github.com/dsoprea/go-exif

@rayrrr commented Jun 19, 2019

@lastzero thanks for confirming. I will try to do some experimenting with those EXIF libs too and report back with any findings.

@rayrrr commented Jun 19, 2019

I can confirm that https://github.com/dsoprea/go-exif shows the correct DateTime attribute value for the example image I posted to that gist (using the command line reader tool). My vote would be to try that lib.

@lastzero (Member Author) commented:

@rayrrr Is there a chance you can send a pull request? Otherwise I'll put that on my todo for later.

@lastzero (Member Author) commented:

@rayrrr Done. The code is still a bit dirty and needs testing, but the bug is gone. I'll start a master build. Let me know if it works, and consider a donation if you're happy with the result. We also index ISO, exposure and GPS altitude now. More to come, including XMP support.

@lastzero (Member Author) commented Jun 20, 2019

Seems like GPS coordinates are rounded now... I need to investigate this further, but tomorrow.

Update: Degrees, Minutes and Seconds probably should be float, not int, in gps.go:

type GpsDegrees struct {
	Orientation               byte
	Degrees, Minutes, Seconds int
}
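For illustration, this is what the float-based struct and a DMS-to-decimal conversion look like (a sketch, not the go-exif API; the Decimal method is hypothetical). With integer Minutes and Seconds, the fractional second is truncated, which can shift a coordinate by up to roughly 30 m:

```go
package main

import "fmt"

// GpsDegrees with float64 fields, as suggested above, so fractional
// minutes/seconds survive the conversion.
type GpsDegrees struct {
	Orientation               byte
	Degrees, Minutes, Seconds float64
}

// Decimal converts degrees/minutes/seconds to decimal degrees, negating
// the result for south and west orientations.
func (g GpsDegrees) Decimal() float64 {
	d := g.Degrees + g.Minutes/60 + g.Seconds/3600
	if g.Orientation == 'S' || g.Orientation == 'W' {
		return -d
	}
	return d
}

func main() {
	lat := GpsDegrees{Orientation: 'N', Degrees: 52, Minutes: 31, Seconds: 12.53}
	fmt.Printf("%.6f\n", lat.Decimal()) // 52.520147
}
```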

@dsoprea (Collaborator) commented Dec 23, 2019

@lastzero I think we resolved the rounding problem with/in go-exif a few months ago. Is there anything else that I can do to support you, to get this to move forward?

@lastzero (Member Author) commented:

@dsoprea Thank you for that 👍 Forgot to close this ticket! Upload is done, import and index via Web UI too.

I'll update our issues and add new tasks for contributors, might take a couple of days because it's family time. Any help is most welcome.

@dsoprea (Collaborator) commented Dec 23, 2019

Ah. Got it. That's great.

ad93ad1

It looks like you're doing a brute-force search for the EXIF block:

rawExif, err := exif.SearchFileAndExtractExif(m.Filename())

It would be quicker to try a context-specific method first (if it's a JPEG, find and parse the APP1 segment directly; if a PNG, find and parse the "eXIf" chunk directly) and then fall back to the brute-force method, and only for file formats that support Exif at all (currently JPG, TIF, XMP, PNG, and HEIF/HEIC). That said, is there a chance you're doing this byte-by-byte search through formats that don't even support Exif? I'd be worried about how much time we might be losing, because at worst this can be very expensive and fruitless (like scanning RAW for Exif?).

Note the comment here:

Obviously, it is most efficient to properly parse the media file and then provide the specific EXIF data to be parsed, but there is also a heuristic for finding the EXIF data within the media blob, directly. This means that, at least for testing or curiosity, you do not have to parse or even understand the format of image or audio file in order to find and decode the EXIF information inside of it. See the usage of the SearchAndExtractExif method in the example.

I'm currently indexing my personal photo collection into Photoprism. The cost is relevant because there are a few hundred thousand images and it seems to be ticking along slowly (which makes sense if it's largely due to the NN analysis).
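A self-contained sketch of the context-specific approach for JPEG (illustrative only, not PhotoPrism or go-exif code): instead of scanning every byte, walk the marker segments and stop at the start of the compressed image data.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// findJpegExif walks the JPEG marker segments and returns the payload of the
// first APP1 segment carrying an "Exif\x00\x00" header. Unlike a brute-force
// byte search, this only touches the segment headers before the image data.
func findJpegExif(data []byte) ([]byte, error) {
	if len(data) < 2 || data[0] != 0xFF || data[1] != 0xD8 { // SOI marker
		return nil, errors.New("not a JPEG")
	}
	i := 2
	for i+4 <= len(data) {
		if data[i] != 0xFF {
			return nil, errors.New("invalid marker")
		}
		marker := data[i+1]
		if marker == 0xDA { // SOS: compressed image data follows, stop here
			break
		}
		size := int(binary.BigEndian.Uint16(data[i+2:])) // includes the 2 size bytes
		if i+2+size > len(data) {
			return nil, errors.New("truncated segment")
		}
		payload := data[i+4 : i+2+size]
		if marker == 0xE1 && bytes.HasPrefix(payload, []byte("Exif\x00\x00")) {
			return payload[6:], nil // the embedded TIFF structure
		}
		i += 2 + size
	}
	return nil, errors.New("no Exif APP1 segment")
}

func main() {
	// Minimal synthetic JPEG: SOI + APP1("Exif\0\0" + 4 payload bytes) + SOS.
	app1 := append([]byte("Exif\x00\x00"), 0x4D, 0x4D, 0x00, 0x2A)
	jpeg := []byte{0xFF, 0xD8, 0xFF, 0xE1, 0x00, byte(2 + len(app1))}
	jpeg = append(jpeg, app1...)
	jpeg = append(jpeg, 0xFF, 0xDA)
	tiff, err := findJpegExif(jpeg)
	fmt.Println(len(tiff), err) // 4 <nil>
}
```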

@lastzero (Member Author) commented Dec 23, 2019

@dsoprea Note that OpenStreetMap does not allow batch operations with their public API, so you shouldn't index large photo collections just yet. My current task is to replace it with our own service that will also be much faster.

Can you send a PR improving our Exif / metadata related code? RAW files may also contain Exif data, and we haven't had time so far to implement format-specific optimizations. I'd rather have it working for everything than just (faster) for JPEG.

Last but not least, XMP is also on our todo list, so our Exif code should evolve into a general metadata abstraction. I've met the author of go-xmp in Berlin and think we can build upon it, see #68.

@dsoprea (Collaborator) commented Dec 23, 2019

Sure. I can take a look.

Who's working on the mapping API? That sounds fun.

Is there any way to disable the queries? I don't really need anything we're probably doing with them. That said, my interest in "geographic enrichment" of my images is minimal; I can't imagine caring about more than coordinates for the foreseeable future. (I've already written tooling to identify coordinates for a list of images given a list of GPX files, and to group them by large cities based on the population data and coordinates from the free and rich GeoNames database, with the search optimized down to a string-prefix search by converting all coordinates to the Google S2 representation in Hilbert space.)

@lastzero (Member Author) commented:

We use geo information to generate meaningful titles and (soon) to group photos by time and location, see #152 and #154. Maybe we should also take a look at GeoNames and your implementation - is it on GitHub?

Use case: Many friends have thousands of photos taken at music festivals the last couple of years. If we have time and location, we can easily group them (and add the festival name to the title). Our geo service will later also return a list of public events that have taken place at a location.

@dsoprea (Collaborator) commented Dec 23, 2019

Yes, but we'd have to review the access and caching semantics to make sure they make sense for PhotoPrism. It would also need heavier caching in general to expose all of that location data efficiently to API clients. By default, it considers a "population center" to be a city of 100K or more, but it will fall back to the nearest city. I also wrote a time-series storage format that lets us bin the location data (and any miscellaneous metadata) so we can lazy-load it rather than scanning a fully populated geographic index every time.

The attractor: https://github.com/dsoprea/go-geographic-attractor
The greater autogrouping project (which integrates the attractor): https://github.com/dsoprea/go-geographic-autogroup-images

The in-memory index that serves as both the frontend and caching layer: https://github.com/dsoprea/go-geographic-index
The on-disk index: https://github.com/dsoprea/time-to-go

I then got redirected for six months writing a read-write-seekable archive format (serving both compressed filesystem representations and a derivative stream-only format) that I could use to compress the on-disk time series without first decompressing it into some intermediate format. I had to take a break, but only a couple of smaller requirements are left.

Where are we getting that event data from? We're not relying on crowdsourced data, are we?

@lastzero (Member Author) commented:

There are directories like residentadvisor.net and electronic-festivals.com, but I didn't check yet if they provide machine readable data and/or are willing to share (parts of) their database. In the worst case, we can start with adding the biggest, well-known festivals manually (e.g. Burning Man, Boom and Fusion).

@dsoprea (Collaborator) commented Dec 23, 2019

Yeah. Serving those massive crowds might be enough to motivate the others. It might be nice for that to be an independent service that can support other solutions and be supported by them in turn.

Is there an issue open for the geographic API task or its index/algorithms, yet?

@lastzero (Member Author) commented:

I'm on the geo API already but no code pushed yet. My plan is to use google/open-location-code as primary key for our database and get the data from a private OpenStreetMap instance, so basically it's a cache for our use case (labeling photos). Not sure if that works out, but it seems simple and fast.

@dsoprea (Collaborator) commented Dec 23, 2019

I would seriously consider using S2 to encode the addresses. Not only can it uniquely identify every reasonable geometric point on Earth with a 64-bit number, the IDs are also implicitly grouped by prefix: the more of the prefix two locations have in common, the closer they are in space (the localization characteristic of Hilbert curves; this is the secret sauce). Google also made some Earth-specific refinements on top of that. This means you reduce a clustering problem to a string-prefix search. There is some inherent inexactness in comparisons, because these are really one-dimensional distances along a curve that snakes through all points in space, so not all nearby points are adjacent all of the time, but the accuracy is good enough in exchange for a massive cost improvement.

Even if you don't need it now, if you're looking for a way to uniquely identify locations by a string then I'd encourage you to use S2. But, that's all of the soliciting that I'll do for it.
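The prefix-search idea can be sketched like this (the tokens below are made up for illustration; real S2 tokens come from the library). Sorting plus a prefix comparison replaces a pairwise distance computation:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupByPrefix clusters cell tokens by their first n characters. Because
// nearby cells share long token prefixes (the Hilbert-curve locality
// described above), adjacency in sorted order implies spatial proximity.
func groupByPrefix(tokens []string, n int) [][]string {
	sorted := append([]string(nil), tokens...)
	sort.Strings(sorted)
	var groups [][]string
	for _, t := range sorted {
		if len(groups) > 0 {
			last := groups[len(groups)-1]
			if len(t) >= n && len(last[0]) >= n && strings.HasPrefix(t, last[0][:n]) {
				groups[len(groups)-1] = append(last, t)
				continue
			}
		}
		groups = append(groups, []string{t})
	}
	return groups
}

func main() {
	tokens := []string{"47a84d3", "47a84d9", "47a84c1", "89c25a0", "89c25b7"}
	for _, g := range groupByPrefix(tokens, 5) {
		fmt.Println(g)
	}
}
```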

@dsoprea (Collaborator) commented Dec 23, 2019

Also, what about my question about disabling the geographic queries, given that I have no need for them, they're slowing down the indexing process, and you're concerned about API usage?

#21 (comment)

@lastzero (Member Author) commented:

@dsoprea It's easy to add a "disable location" flag; you can probably do it yourself in about 5 minutes. However, our photo titles will then be very plain, like "Building / 2019" or just "Unknown", and most users certainly want the additional information.

When we start testing with huge libraries and different types of users, we will certainly add more optimizations and customization options. It's also unclear whether the db structure will hold up or whether we need different indexes etc... we also want to index in parallel using goroutines. Just removing the geo query won't solve the performance issues.

@lastzero (Member Author) commented:

Thanks for the hint! I'll take a look at S2, didn't notice it can be useful to us :)

@dsoprea (Collaborator) commented Dec 24, 2019

I don't need the location, especially when it has such a high indexing cost, but it sounds like it's not possible to just use the name of the file, as would be expected of most applications. Especially since there doesn't currently seem to be a way to even see the filename (which seems odd). I guess that, in the current situation, I'm stuck.

@lastzero (Member Author) commented:

Dustin, we can do it... but it's Christmas and I'm on a train right now. What do you expect?

We're actually indexing the name of the file, but in many cases it would make an ugly title, e.g. IMG_1235.JPG, and including the path it would also be very long. So we need to add a setting for that, and maybe detection of whether the file name would make a useful title or contains useful information. Remember that this is not a simple file browser; there are certainly better tools if you just want to see a filename and a preview. Why index at all in that case? Every modern file system already has an index of file names...

I'll also be working on a metadata detail and edit view, but it's not done yet... First we need a stable data model, and that's why I'm doing locations first, which is what I'm working on right now. I also do commercial projects to finance this, because we only get very little financial support. We would be a lot faster otherwise!
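The filename-usefulness detection mentioned above could be sketched like this (a guess at the heuristic; the pattern list and function names are hypothetical, and a real implementation would likely be configurable):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// genericName matches typical camera filename patterns such as IMG_1234,
// DSC_0042, P1010001 or 20190617_191036. The exact list is illustrative.
var genericName = regexp.MustCompile(`^(?i)(IMG|DSC[F_]?|P\d{3}|DCIM|MVI)[-_]?\d+$|^\d{8}[-_]\d{6}$`)

// usefulTitle reports whether a filename (extension stripped) would make a
// meaningful photo title rather than a generated camera name.
func usefulTitle(filename string) bool {
	base := filename
	if i := strings.LastIndexByte(base, '.'); i >= 0 {
		base = base[:i]
	}
	return !genericName.MatchString(base)
}

func main() {
	fmt.Println(usefulTitle("IMG_1235.JPG"))             // false
	fmt.Println(usefulTitle("fusion-festival-2018.jpg")) // true
}
```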

@dsoprea (Collaborator) commented Dec 24, 2019

Sure, but I'm just suggesting the filename (not the whole path). It could also be truncated if it exceeds N bytes. Presumably, Photoprism is a tool to find images, and currently you can't figure out where an image is on disk, nor find any of the other images that would be in the same collection. So no, not merely file browsing; currently it's an index that provides no context for what's in it. I'm just putting my question in perspective. Yesterday I saw a picture, near the top of my photos while my Photoprism instance was indexing, that I hadn't seen in years, and there was absolutely no way to find it in my collection. I tried inspecting the DOM and still had nothing to go on. Even just hovering over the title to show the full path would be totally reasonable (and, again, intuitive according to what people expect based on popular conventions).

I'm not really presuming to ask you to do work, much less for usage scenarios that only originate with me. On the one hand, I'm fishing for information. On the other hand, I wouldn't say no to long-term goals that mitigate the above.

..on the other-other hand, I didn't know that you were still working to stabilize the underlying semantics. So it makes more sense to me now.

@lastzero (Member Author) commented:

You can find that info in the database, and probably also in the Web service response, for example via the Chrome dev tools. We do the easy stuff with low risk last... That includes displaying metadata. Bottom-up, logic and tests first. It started as a console app; the Web UI is relatively new.

@lastzero (Member Author) commented:

Check out 875245f and 366c70d to see what I mean... there is no way for us to provide database update scripts for each of those changes; they are too big. So it doesn't make sense to index large photo collections at the moment, because you would have to re-index every couple of hours or days. Once people start using this in production, we'll be done with breaking changes - that's also why the UI should not look like we are done!

I'd be very happy about feedback regarding my implementation of S2 location IDs. Decided to go for integers and level 15 cells to save memory and storage (compared to string tokens as index and smaller cells). Might further normalize location data and move city, state and country to a separate table ("places").
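For reference, truncating a 64-bit cell ID to level 15 follows the standard S2 parent computation (clear the low position bits, set the sentinel bit). The IDs below are hypothetical, but the formula mirrors the public S2 library:

```go
package main

import "fmt"

const maxLevel = 30 // S2 leaf cells are level 30

// lsbForLevel returns the least-significant set bit of a cell ID at the
// given level; the trailing one-bit is how S2 encodes the level.
func lsbForLevel(level uint) uint64 {
	return 1 << (2 * (maxLevel - level))
}

// parentAt truncates a cell ID to the given level, so all points inside
// the same level-15 cell (a few hundred meters across) map to one uint64
// key, which is what integer level-15 keys amount to.
func parentAt(id uint64, level uint) uint64 {
	lsb := lsbForLevel(level)
	return (id & -lsb) | lsb
}

func main() {
	// Two hypothetical leaf IDs that differ only in their low bits,
	// i.e. two points inside the same level-15 cell.
	a := uint64(0x47a84d3900000001)
	b := uint64(0x47a84d39000fffff)
	fmt.Println(parentAt(a, 15) == parentAt(b, 15)) // true
}
```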

@dsoprea (Collaborator) commented Dec 29, 2019

Got it. I'm right there with you.

I'd like to contribute, especially with the EXIF and geographic stuff, at least where previously suggested. I'll do so after I close a dangerous bug that I'm currently investigating in go-exif.

@lastzero (Member Author) commented:

@dsoprea Excellent :) It makes most sense for you to improve our Exif code as a first step, since you've already identified specific work that needs to be done (like checking the file type and then deciding on a read strategy).

I pushed more major changes yesterday to prepare our database for grouping / clustering of photos (country/year/month and camera/lens). I also added new fields to our search form; previously you needed to know what's in the database to get an idea of what you can search for. The code is still a bit rough and needs refactoring.

@lastzero (Member Author) commented:

@dsoprea Created issue #172 for you and added you as a collaborator so that you can be assigned to issues in general.

lastzero added the priority label Dec 14, 2023