JPEG: Automatically check and repair broken/invalid images #2463

Closed
lastzero opened this issue Jun 24, 2022 · 17 comments · Fixed by #2721
Assignees
Labels
idea Feedback wanted / feature request released Available in the stable release ux Impacts User Experience

Comments

@lastzero
Member

lastzero commented Jun 24, 2022

Go happily refuses to open JPEGs if they contain any glitches. As a user with such files, you still want to index and view them. It would therefore make sense to automate the repair of JPEGs, especially since the process is always the same.

Step 1: Use GraphicsMagick to check a JPEG for issues

GraphicsMagick is a great tool for this task. Depending on what operating system you are using, you may need to install it first: http://www.graphicsmagick.org/identify.html

Now run this command and check the report for any problems:

gm identify -verbose broken.jpg

Note that ImageMagick may also have a command for this. This describes the specific process I used and that worked for me. There are variants and it can likely be optimized so that fewer tools and libraries need to be installed.

Step 2: Use the ImageMagick convert command to create a valid copy of the image

To create a valid copy of the original JPEG:

convert broken.jpg fixed.jpg

Any glitches should then be fixed, i.e. the repaired copy fixed.jpg can be opened without problems.
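
The two steps above could be automated roughly as follows. This is only a minimal sketch, not the actual PhotoPrism implementation: it uses Go's own image/jpeg decoder as the check instead of gm identify, and the helper names needsRepair, repairWithConvert and sampleJPEG are made up for illustration. It also assumes that the ImageMagick convert binary is on the PATH.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/jpeg"
	"os/exec"
)

// needsRepair reports whether Go's image/jpeg decoder rejects the data.
// (Hypothetical helper, used here in place of "gm identify".)
func needsRepair(data []byte) bool {
	_, err := jpeg.Decode(bytes.NewReader(data))
	return err != nil
}

// repairWithConvert runs "convert src dst", the same command as in Step 2.
// Assumption: ImageMagick's convert binary is available on the PATH.
func repairWithConvert(src, dst string) error {
	out, err := exec.Command("convert", src, dst).CombinedOutput()
	if err != nil {
		return fmt.Errorf("convert failed: %w: %s", err, out)
	}
	return nil
}

// sampleJPEG encodes a small grayscale image in memory so that the
// sketch can be run without any files on disk.
func sampleJPEG() []byte {
	img := image.NewGray(image.Rect(0, 0, 16, 16))
	for i := range img.Pix {
		img.Pix[i] = uint8(i)
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	valid := sampleJPEG()
	truncated := valid[:len(valid)/2] // simulate an incomplete file

	fmt.Println("valid needs repair:", needsRepair(valid))
	fmt.Println("truncated needs repair:", needsRepair(truncated))
	// For a real file one would then call, for example:
	//   repairWithConvert("broken.jpg", "fixed.jpg")
}
```

Checking with the Go decoder first means the external tool only has to run for files that actually fail to decode.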

Related Issues:

@lastzero lastzero added help wanted Well suited for external contributors! idea Feedback wanted / feature request ux Impacts User Experience labels Jun 24, 2022
lastzero added a commit that referenced this issue Jun 25, 2022
Found this here, although I'm really not sure how this should fix it:
- golang/go#45902 (comment)

According to the tests I added, the error "unexpected EOF" remains!
At least this change shouldn't break anything either.... Help is more
than welcome if someone has more time to read through all the issue
comments.
@pbek

pbek commented Aug 21, 2022

Thank you very much! Even doing that manually helped me a lot with errors like import: failed creating thumbnails for 2022/08/20220820_141434_F993F0A5.jpg (EOF while decoding).
gm said gm identify: Invalid SOS parameters for sequential JPEG (20220820_141434_3172D2BD.jpg). and I could repair those images with convert.

It would be a great joy for me if PhotoPrism could do that automatically! 😸

@Lash-L

Lash-L commented Sep 3, 2022

Would be willing to take a stab at this (no promises though). Are there any examples of some broken/invalid jpegs I could use for testing/debugging?

@pbek

pbek commented Sep 4, 2022

Certainly there are! Thank you very much for giving it a try!

Here are my two (rather large) samples:
https://cloud.tugraz.at/index.php/s/kD2AJpyqtmQYZtT

@krystiancharubin

@lastzero Looking at the database tables, it seems we should be able to identify all broken images by checking the files table for a non-empty file_error.
Has something changed since May? I am not seeing entries for new panoramas that do not show up in the UI.
There is also the errors table, but we would need to parse the error message to determine the file location:
import: failed creating thumbnails for 2022/09/20220906_170826_0582F88A.jpg (EOF while decoding)
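
Parsing such a message could look roughly like this. It is a sketch that assumes every relevant entry follows exactly the format of the line quoted above; the pattern and the function name parseThumbError are made up for illustration and are not part of PhotoPrism.

```go
package main

import (
	"fmt"
	"regexp"
)

// thumbErr matches messages shaped like the example from the errors table:
//   import: failed creating thumbnails for <path> (<reason>)
// Assumption: all relevant messages use exactly this format.
var thumbErr = regexp.MustCompile(`^import: failed creating thumbnails for (.+) \((.+)\)$`)

// parseThumbError extracts the file path and the reason from a message.
// ok is false if the message does not match the expected format.
func parseThumbError(msg string) (path, reason string, ok bool) {
	m := thumbErr.FindStringSubmatch(msg)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	msg := "import: failed creating thumbnails for 2022/09/20220906_170826_0582F88A.jpg (EOF while decoding)"
	if path, reason, ok := parseThumbError(msg); ok {
		fmt.Printf("file=%s reason=%s\n", path, reason)
	}
}
```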

@lastzero
Member Author

lastzero commented Sep 9, 2022

It is possible that files that cannot be read at all and are not yet indexed will be skipped. Accordingly, there is no entry for them in the files table.

@AlD
Contributor

AlD commented Sep 19, 2022

How about something like #2721?

Could of course be made configurable.

lastzero added a commit that referenced this issue Feb 21, 2023
Signed-off-by: Michael Mayer <michael@photoprism.app>
@lastzero lastzero added the please-test Ready for acceptance test label Feb 21, 2023
@lastzero lastzero self-assigned this Feb 21, 2023
@lastzero lastzero removed the help wanted Well suited for external contributors! label Feb 21, 2023
lastzero added a commit that referenced this issue Feb 21, 2023
Signed-off-by: Michael Mayer <michael@photoprism.app>
@lastzero
Member Author

We have added a new cache/media folder for this, which contains the fixed JPEG files with a hash as filename. This way we make sure that images are updated when the original JPEG changes.
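
To illustrate the idea of using a hash as the filename, here is a minimal sketch. The choice of SHA-1, the .jpg suffix, the exact directory layout and the names cacheFileName and cachePath are assumptions for this example only; the actual implementation may differ.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// cacheFileName derives a filename from the content of the original JPEG.
// Assumption: SHA-1 of the file content plus a ".jpg" suffix.
func cacheFileName(original []byte) string {
	sum := sha1.Sum(original)
	return hex.EncodeToString(sum[:]) + ".jpg"
}

// cachePath returns where the repaired copy would be stored.
// Assumption: a "media" folder inside the given cache directory.
func cachePath(cacheDir string, original []byte) string {
	return filepath.Join(cacheDir, "media", cacheFileName(original))
}

func main() {
	// Different content yields a different name, so a changed original
	// never reuses the repaired copy of an older version.
	fmt.Println(cachePath("cache", []byte("original bytes, version 1")))
	fmt.Println(cachePath("cache", []byte("original bytes, version 2")))
}
```

Because the name depends only on the content, an unchanged original always maps to the same cached file, and a changed original maps to a new one.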

seeschloss pushed a commit to seeschloss/photoprism that referenced this issue Apr 10, 2023
@graciousgrey graciousgrey added tested Changes have been tested successfully and removed please-test Ready for acceptance test labels Apr 24, 2023
@graciousgrey graciousgrey added released Available in the stable release and removed tested Changes have been tested successfully labels May 3, 2023
@maxime1992

For reference, I don't think this is fully fixed

#1407 (comment)

@lastzero
Member Author

lastzero commented May 3, 2023

Let's create a new issue for that, because we will never be able to fix all possible (broken) file problems.

@graciousgrey
Member

Here is the follow-up issue

lastzero added a commit that referenced this issue Jun 29, 2023
Signed-off-by: Michael Mayer <michael@photoprism.app>
@lastzero
Member Author

@Lash-L @maxime1992 It seems that Go's JPEG decoder received a fix earlier this year:

It would be great if anyone still following this issue could check if this resolves the problems with indexing affected JPEG files!

Our current workaround is to open the image with ImageMagick and then save an error-free copy. So this requires additional software, slows down indexing and consumes additional disk space...

@maxime1992

Interesting. I guess I could help with testing (next week though) but unsure what you'd like for me to test?

Do you want to make a temporary branch with photoprism where I could test this?

Also, if you prefer to directly try it out on your own (might be faster than me), I had made an issue and provided a file that had the issue in the first place here: #3363

If you try out with that file and it works, you can consider it's fixed 👌

@lastzero
Member Author

@maxime1992 One way to test this with our stable release would be to index previously affected files with ImageMagick enabled / disabled via PHOTOPRISM_DISABLE_IMAGEMAGICK:

You should also see info or debug logs when a file cannot be read with the built-in image library:

Note that the "ImageMagick" fallback does not just work for the EOF error, but all encoding issues that the Go library can't handle. So you should not expect that every "broken" JPEG file can now be indexed without it.
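
The fallback behavior described here can be pictured with the following sketch. Only the variable name PHOTOPRISM_DISABLE_IMAGEMAGICK comes from this thread; how the value is parsed, and the names imageMagickDisabled, chooseAction and errDecode, are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// errDecode stands in for any error returned by the built-in image library.
var errDecode = errors.New("unexpected EOF")

// imageMagickDisabled interprets the value of PHOTOPRISM_DISABLE_IMAGEMAGICK.
// Assumption: the value is parsed as a boolean; unset or invalid means false.
func imageMagickDisabled(value string) bool {
	disabled, err := strconv.ParseBool(value)
	return err == nil && disabled
}

// chooseAction decides what to do with a file after trying the Go decoder:
// index it directly, repair it with ImageMagick, or skip it.
func chooseAction(decodeErr error, disabled bool) string {
	switch {
	case decodeErr == nil:
		return "index"
	case disabled:
		return "skip"
	default:
		return "repair"
	}
}

func main() {
	disabled := imageMagickDisabled(os.Getenv("PHOTOPRISM_DISABLE_IMAGEMAGICK"))
	fmt.Println("broken file:", chooseAction(errDecode, disabled))
	fmt.Println("valid file:", chooseAction(nil, disabled))
}
```

The fallback is keyed on any decoder error rather than on the EOF case alone, which matches the note above that it covers all encoding issues the Go library cannot handle.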

We've started collecting JPEG test samples here, although we could use more as these two files probably don't cover all (potential) problems:

@graciousgrey
Member

I just tested the preview. It seems that the files still cannot be indexed when ImageMagick is disabled.

@lastzero
Member Author

@maxime1992 While the changes did not help to read the "broken" JPEGs without re-encoding them, they broke the repair mechanism that we implemented for Samsung S21 panorama images. I've just added a fix for this, see:

An updated preview build will be available for testing soon!

@holzfelix

I got this error. What could be the reason?
gm convert: Unsupported marker type 0x19 (00019402.jpg).

@graciousgrey
Member

@holzfelix could you send us a sample file to samples@photoprism.app so that we can have a look at this?

Projects
Status: Release 🌈

8 participants