Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not remove orientation EXIF data #683

Merged

Conversation

ekzyis
Copy link
Member

@ekzyis ekzyis commented Dec 11, 2023

Fix #678

Keeping this in draft until I confirmed this does indeed not strip orientation EXIF data - which I am going to do now.

I got the tag value 0x0112 from here.


Okay, 21a9c3e does not work because we are actually deleting all EXIF data at once. So the loop is not iterating over EXIF "tags". It's searching for the start of the EXIF data and then deleting it all at once. At least that is my current understanding which I am writing down here to not forget that this is my current "baseline of progress".

I think I need to dive deeper into the EXIF spec to be able to do what some tools can do on the CLI in JS using our existing approach.

There are libraries which handle EXIF data in JS but they seem to be either unmaintained, or not necessary. I also don't like new dependencies anymore. There is a way to do it within our code but ... might take me too long to pull it off. I'll keep trying for a few hours more at least :)

Worst case scenario, I have to look at the source code of these tools to see how they manage to keep orientation data.


Ok I stopped trying to get this to work without using a library since it is taking me too long and doing binary stuff in Javascript is definitely not my strength, lol. I am now trying to simply use piexifjs even though last update was from 4 years ago. But the EXIF spec seems to be from 1999 itself, so maybe the latest update doesn't matter? If I can get it to work with a library, maybe THEN I am able to do it without a library, lol.

I should learn to find the right order of doing things, lol. First try the simple thing so I at least have something working which I can fall back on when other things don't work out for whatever reason.


Tried to get piexifjs to work but somehow it didn't work. Didn't even get a real error message. Trying out sharp now because it was mentioned as an alternative in a ticket. Turns out that is super maintained since the last commit is literally from 15 hours ago, lol.

So I think we should really go with this if possible.


Seems like it only runs on the server, not in the browser:

It can be used with all JavaScript runtimes that provide support for Node-API v9, including Node.js >= 18.17.0, Deno and Bun.


lol, I found another custom code just for orientation now. Going to try this now. If not, I can try this SO code using the even older exif-js library but someone said:

The JS lib in the question looks little bit outdated, but would probably work.


Okay, finally was able to read the orientation metadata with the custom code provided, lol


For some reason, struggling with the last step of putting the extracted orientation value back in after deleting everything else ...

2023-12-14-085134_982x480_scrot


finding this piece of documentation was what i was missing all along. and it turns out, using a simple int was right all along (as I initially tried), I just must have gotten confused that there was still an error:

2023-12-15-063740_983x451_scrot

Looking at the code, the documentation and basically everything about this library: i guess this library really only supports JPEG, lol:

documentation:

Insert exif into JPEG.
-- https://piexifjs.readthedocs.io/en/latest/functions.html#insert

example:

var exifbytes = piexif.dump(exifObj)
var newJpeg = piexif.insert(exifbytes, jpegData)

code

    that.insert = function (exif, jpeg) {
        var b64 = false;
        if (exif.slice(0, 6) != "\x45\x78\x69\x66\x00\x00") {
            throw new Error("Given data is not exif.");
        }
        if (jpeg.slice(0, 2) == "\xff\xd8") {
        } else if (jpeg.slice(0, 23) == "data:image/jpeg;base64," || jpeg.slice(0, 22) == "data:image/jpg;base64,") {
            jpeg = atob(jpeg.split(",")[1]);
            b64 = true;
        } else {
            throw new Error("Given data is not jpeg.");
        }

        var exifStr = "\xff\xe1" + pack(">H", [exif.length + 2]) + exif;
        var segments = splitIntoSegments(jpeg);
        var new_data = mergeSegments(segments, exifStr);
        if (b64) {
            new_data = "data:image/jpeg;base64," + btoa(new_data);
        }

        return new_data;
    };

mhh, but I think that's not a problem since there aren't other file formats we allow that can have EXIF data according to my research while working on #634. So I guess this is just another additional indicator that there are indeed no other common image files that allow EXIF next to JPEG.


So I think the only stuff left to do is this (I am finally back with my TODOs in PR descriptions, lol):

  • make sure there are indeed no other image file types that allow orientation values in EXIF but aren't supported by the library
  • map orientation values from output of getOrientation to required values for piexifjs. I think that's the cause that some JPEGs still throw pack error since there is a val > 0 check in the library but getOrientation can return value < 0 (ah, no mapping required, just filter for value < 0)
  • fix JPEG detection since if there is no pack error, a "not JPEG" error comes up
  • ignore everything which is not JPEG + make sure there is indeed no EXIF with a sanity check
  • fix: JPEGs with orientation value > 0 throw an blank error if img.src is set to them

What the spec says on page 30 about the values for orientation:

2023-12-15-141955_653x566_scrot

VGsAj

source

@ekzyis ekzyis marked this pull request as draft December 11, 2023 16:30
@ekzyis ekzyis assigned ekzyis and unassigned ekzyis Dec 11, 2023
@ekzyis ekzyis added the bug label Dec 11, 2023
@ekzyis ekzyis changed the title Do not remove orientation EXIF markers Do not remove orientation EXIF data Dec 12, 2023
@ekzyis ekzyis force-pushed the 678-image-orientation-metadata-is-removed branch 3 times, most recently from 0b920c2 to 6e840db Compare December 17, 2023 14:42
@ekzyis
Copy link
Member Author

ekzyis commented Dec 17, 2023

Finally done:

2023-12-17.15-57-52.mp4

TODO:

  • make sure EXIF data looks as expected for all file formats

mhh, for some reason, EXIF is no longer removed 🤔

@ekzyis
Copy link
Member Author

ekzyis commented Dec 17, 2023

I think I found the error. Apparently, Javascript also has the notion of transferable objects, similar to languages like C++.

Because of that, the buffer that I passed into cleanBuffer forgot its length and thus the loop was a noop:

When an ArrayBuffer is transferred, its original copy becomes detached — this means it is no longer usable. At any moment, there will only be one copy of the ArrayBuffer that actually has access to the underlying memory. Detached buffers have the following behaviors:

  • byteLength becomes 0 (in both the buffer and the associated typed array views).

-- https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer#transferring_arraybuffers

Now trying to fix, for example via SharedArrayBuffers.


Tried to find out at which point the EXIF scrubbing gets broken by redoing all my steps, starting at 6763ef7. I have a feeling it's related to arrow functions vs regular functions. But I will continue after I slept.

@ekzyis
Copy link
Member Author

ekzyis commented Dec 18, 2023

Can't use SharedArrayBuffer because of Spectre. When I try to use it such that any modifications on the array buffer are reflected everywhere (and not just on a (detached) copy of it), I get ReferenceError: SharedArrayBuffer is not defined.

What I still don't get why removing EXIF data works in 6763ef7, but when I started to add code to save the orientation value before removing the EXIF data, it suddenly no longer works.

Sounds like just accessing the buffer in some way changes behavior down the line in the code regarding this buffer? I feel like Javascript is trolling or even conspiring against me, lol

I will try to read as much as I can about Transferable objects now. There MUST be a way. And it CAN'T be that hard. 😄

If I am still not done with this by 4PM CET, I will just start working on #661 and #662 as mentioned here and come back to this later. Also #684 annoys me a lot so maybe I'll also do that.


Okay, already the beginning tells me that this is most likely caused by all the promises that I use due to Javascript's event-driven architecture:

Transferable objects are commonly used to share resources that can only be safely exposed to a single JavaScript thread at a time. For example, an ArrayBuffer is a transferable object that owns a block of memory. When such a buffer is transferred between threads, the associated memory resource is detached from the original buffer and attached to the buffer object created in the new thread. The buffer object in the original thread is no longer usable because it no longer owns a memory resource.

but I thought this whole "array buffer getting detached and thus we no longer point to the buffer that is actually getting cleaned from EXIF data" problem is caused while cleanBuffer is running. And this function does not use any promises and we return the buffer at the end? In fact, I didn't change anything about this function?

@huumn
Copy link
Member

huumn commented Dec 18, 2023

An alternative approach would be to simply read the orientation data, remove the metadata, then rotate the image. I'm not sure how simple that is, but it sounds like most of the issue here is writing EXIF data so why don't we just not write it.

@ekzyis
Copy link
Member Author

ekzyis commented Dec 18, 2023

An alternative approach would be to simply read the orientation data, remove the metadata,

I am currently failing at the "remove the metadata" step because of this buffer transferring between threads. The cleaned buffer is no longer reflected in the thread that will upload the image. So according to my current understanding, it's basically removing the EXIF data in another thread and then throwing away this buffer later, lol.

It's weird. The existing code works but if I slightly try to change it (adding some promises for example), it suddenly no longer works. So writing is currently not the problem. Removing it is (I know how weird that sounds since we're already able to remove it in the running code).

One solution is probably to not add "some promises". Or just completely rewrite this code and use canvas drawing.

I even tried to use SharedArrayBuffer now (including matching the security requirements to use them) and that also didn't work for some reason (nothing changed).

@ekzyis ekzyis force-pushed the 678-image-orientation-metadata-is-removed branch from 6e840db to 4d03721 Compare December 19, 2023 10:02
@ekzyis
Copy link
Member Author

ekzyis commented Dec 19, 2023

Proof of fix:

2023-12-19.11-04-58.mp4

Going to lunch now. Will refactor afterwards and test some more.

Update: Done. Tested JPEGs with and without orientation metadata, WEBP, GIF, PNG. All good.

@ekzyis ekzyis force-pushed the 678-image-orientation-metadata-is-removed branch from 4d03721 to 55104fb Compare December 19, 2023 12:13
@ekzyis ekzyis marked this pull request as ready for review December 19, 2023 12:14
@huumn
Copy link
Member

huumn commented Dec 19, 2023

I think you forgot to include package.json

@ekzyis
Copy link
Member Author

ekzyis commented Dec 19, 2023

I think you forgot to include package.json

Oh right, piexifjs was already installed in node_modules but I undid the changes in package.json at some point. Fixed in edeb0e9.

@huumn huumn merged commit 6682817 into stackernews:master Dec 19, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Image orientation metadata is removed
2 participants