
Use Core Image to convert between color spaces rather than Metal #1

fumoboy007 opened this issue Jan 25, 2019 · 11 comments

@fumoboy007 commented Jan 25, 2019

Unless I am misunderstanding something, it seems like you are introducing a lot of unnecessary complexity by using Metal to convert between color spaces. It looks like Core Image can do the job for you?

(Caveat: I have not personally tested any of these functions except the first one.)
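A minimal sketch of the Core Image route, assuming the YCbCr data arrives in a CVPixelBuffer (untested; the helper name is mine). Core Image picks up the buffer's colorspace attachments and the context renders into sRGB:

```swift
import CoreImage
import CoreVideo

// Convert a YCbCr pixel buffer to an sRGB CGImage. Core Image reads the
// buffer's colorspace attachments to perform the YCbCr -> RGB conversion.
func renderToSRGB(pixelBuffer: CVPixelBuffer, context: CIContext) -> CGImage? {
    let image = CIImage(cvPixelBuffer: pixelBuffer)
    let srgb = CGColorSpace(name: CGColorSpace.sRGB)!
    return context.createCGImage(image,
                                 from: image.extent,
                                 format: .RGBA8,
                                 colorSpace: srgb)
}
```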

@mdejong (Owner) commented Jan 25, 2019

Hi fumo, yes, you are correct that Apple already implements colorspace conversion inside their own libraries. For example, one could simply play a .m4v video file with AVPlayerViewController; you can see working code that uses this approach in the Xcode build target named AVPlayerViewController. One could also use Core Image to decode from YCbCr -> sRGB; an example of that approach is defined in the target named CoreVideoDecodeiOS.

But the problem with Apple software in general is that it is all closed source, so it is very difficult to determine what they actually did when implementing things, and it is next to impossible to find actual working code that does any of this. What this project is attempting to do is create an actual working piece of software that is compatible with the way Apple decodes video, and also provide a way to encode RGB data to video such that it decodes back to the original RGB values.

The "colorspace conversion" you describe is actually just gamma correction when converting from BT.709 -> sRGB, since the color primaries are the same in both colorspaces. This Metal implementation addresses these problems and also provides a correct implementation of scaling, since sampling from non-linear pixels is not trivial when rescaling is involved.
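To make the gamma-correction point concrete, here is a per-channel sketch using the standard segmented BT.709 and sRGB curves (an illustration of the math, not this project's actual Metal code):

```swift
import Foundation

// Inverse of the BT.709 OETF; the linear segment ends at 4.5 * 0.018 = 0.081
// on the encoded side.
func bt709Decode(_ v: Double) -> Double {
    return v < 0.081 ? v / 4.5 : pow((v + 0.099) / 1.099, 1.0 / 0.45)
}

// sRGB encoding curve (IEC 61966-2-1): linear light -> encoded value.
func srgbEncode(_ L: Double) -> Double {
    return L <= 0.0031308 ? 12.92 * L : 1.055 * pow(L, 1.0 / 2.4) - 0.055
}

// Because the primaries match, BT.709 -> sRGB is just a per-channel
// transfer function swap: decode to linear light, then re-encode.
func bt709ToSRGB(_ v: Double) -> Double {
    return srgbEncode(bt709Decode(v))
}
```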

@fumoboy007 (Author)

Ah, so this project is just for academic purposes?

@mdejong (Owner) commented Jan 25, 2019

No, I am going to produce another, completely different commercial library based on this logic, but the point here is to create a real working implementation of what Apple actually did. What is critical is a correct implementation of the BT.709 matrix transform and the correct gamma correction at encode and decode time, because everything else depends on these two pieces of software being correctly implemented. After literally weeks of work, I think that what I have now is actually compatible, but it has been a strange trip getting to this point. Documentation stinks, there is almost no working code or examples, and nearly every piece of software I have looked at does things slightly differently; many of them are just plain wrong in one way or another.
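For reference, the matrix transform in question is the standard BT.709 RGB -> YCbCr conversion with video-range quantization; a sketch of the math (not this project's actual code):

```swift
import Foundation

// BT.709 RGB -> YCbCr (video range), applied to gamma-encoded R'G'B' in [0, 1].
func bt709YCbCr(r: Double, g: Double, b: Double) -> (y: Int, cb: Int, cr: Int) {
    // BT.709 luma coefficients
    let y  = 0.2126 * r + 0.7152 * g + 0.0722 * b
    // Chroma scale factors: 2 * (1 - Kb) = 1.8556, 2 * (1 - Kr) = 1.5748
    let cb = (b - y) / 1.8556
    let cr = (r - y) / 1.5748
    // Quantize to video range: Y in [16, 235], Cb/Cr in [16, 240]
    return (Int((16.0 + 219.0 * y).rounded()),
            Int((128.0 + 224.0 * cb).rounded()),
            Int((128.0 + 224.0 * cr).rounded()))
}

print(bt709YCbCr(r: 1.0, g: 1.0, b: 1.0)) // white -> (y: 235, cb: 128, cr: 128)
```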

@fumoboy007 (Author) commented Jan 25, 2019

For encode, I think you can use the regular ITU-R 709 color profile since that is what the rest of the industry uses?

~~(Note: Apple’s regular ITU-R 709 color profile is wrong. The d value in the transfer function should be 0.018 instead of 0.081.)~~ Oops, my bad; it’s correct…
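(For the record, the confusion is one of direction: the BT.709 OETF is specified on linear light $L$, with the linear segment ending at 0.018, while an ICC profile's parametric curve is written in the decode direction on the encoded value, so the corresponding cutoff there is $d = 4.5 \times 0.018 \approx 0.081$.)

$$
V =
\begin{cases}
4.5\,L & 0 \le L < 0.018 \\
1.099\,L^{0.45} - 0.099 & 0.018 \le L \le 1
\end{cases}
$$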

@mdejong (Owner) commented Jan 25, 2019

I tried that before, but it does not produce results that can be inverted by the default Apple gamma curve of 1.961. Basically, if RGB input is encoded that way, it would need to be boosted beforehand by a gamma of about 1.2 to 1.25 (before being BT.709 gamma curve encoded) in order to be near the level that would then decode back near the original values. Using the exact inverse of the 1.961 gamma curve when encoding gives me the most exact RGB -> YCbCr -> RGB results while also maintaining compatibility with the way Apple would display professionally authored video shot with a camera and touched up in post production. It is weird, but this seems to be the most mathematically correct approach for these two use cases.

You can see this in action by running the MetalBT709Decoder-iOS target and uncommenting the call to decodeCloudsiPadImage in decodeH264YCbCr in the file AAPLRenderer.m. The original JPEG examined side by side on the same screen looks as close to identical as one could expect given that the gamma is completely different, and the compressed M4V is almost half the size of the original JPEG.
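To illustrate why encoding with the exact inverse round-trips cleanly, here is a self-contained sketch assuming a pure 1.961 power curve with no linear segment (illustrative only, ignoring the YCbCr matrix and video-range quantization):

```swift
import Foundation

// Encode with the exact inverse of Apple's 1.961 decode gamma.
func encode196(_ linear: Double) -> Double { pow(linear, 1.0 / 1.961) }
func decode196(_ encoded: Double) -> Double { pow(encoded, 1.961) }

// Round trip through 8-bit quantization; because the curves are exact
// inverses, the error stays within about one 8-bit code point.
for code in 0...255 {
    let linear = Double(code) / 255.0
    let quantized = (encode196(linear) * 255.0).rounded() / 255.0
    let restored = decode196(quantized)
    let error = abs(restored - linear) * 255.0
    assert(error < 1.5, "unexpected round-trip error at code \(code)")
}
```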

@fumoboy007 (Author) commented Jan 26, 2019

Hmm I’ve confirmed your results using ColorSync Utility.

1. Original sRGB Image

[Image: 1- sRGB]

2. ✅ ITU-R 709 Image

Converted from sRGB→709 using ColorSync Utility’s Match to Profile function.

[Image: 2- ITU-R 709]

3. ❌ HDTV Image

Replaced 709 with Apple’s HDTV profile using ColorSync Utility’s Assign Profile function.

[Image: 3- HDTV]

There is not enough contrast, so it’s hard to see the object on the left side of the wall and the detail in the woman’s hair is lost.

@UliZappe I didn’t realize the difference between ITU-R 709 and Apple’s HDTV profile is so big. I assume the slope limit is already applied when viewing these images on an Apple operating system?

@mdejong (Owner) commented Jan 26, 2019

Here is a graph of the three encoding curves; note that the BT.709 curve here does not include any gamma boost for dark room offset. The thing to note is that the 1.96 AG line differs significantly from the sRGB and BT.709 lines in the first half of the X values.

[Image: comparegammacurves]
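For anyone who wants to regenerate the numbers behind this kind of graph, a quick sketch of the three curves (function names are mine; the 1.96 AG line is modeled as a pure power curve):

```swift
import Foundation

// sRGB OETF (IEC 61966-2-1): linear light -> encoded value.
func srgbEncode(_ L: Double) -> Double {
    return L <= 0.0031308 ? 12.92 * L : 1.055 * pow(L, 1.0 / 2.4) - 0.055
}

// BT.709 OETF with no dark-room gamma boost.
func bt709Encode(_ L: Double) -> Double {
    return L < 0.018 ? 4.5 * L : 1.099 * pow(L, 0.45) - 0.099
}

// Pure power curve matching the 1.961 decode gamma.
func appleGamma196Encode(_ L: Double) -> Double {
    return pow(L, 1.0 / 1.961)
}

// The divergence is largest in the lower half of the range.
for L in stride(from: 0.0, through: 1.0, by: 0.125) {
    print(String(format: "L=%.3f  sRGB=%.3f  BT.709=%.3f  AG196=%.3f",
                 L, srgbEncode(L), bt709Encode(L), appleGamma196Encode(L)))
}
```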

@UliZappe

> @UliZappe I didn’t realize the difference between ITU-R 709 and Apple’s HDTV profile is so big.

In fact, the huge original “Color Management (OS X): Image is too dark” thread, from which the whole discussion started, began with the observation that assuming BT.709 as the video color space did not work correctly, whereas, as we found out after some time, BT.709 with gamma 1.961 (= Apple HDTV) did.

Anyway, converting to one color space (ITU-R 709) and then assigning a different one (Apple HDTV) necessarily produces incorrect results in ICC color management.

> I assume the slope limit is already applied when viewing these images on an Apple operating system?

Yep.

This is also true for other operating systems, as long as they use an Apple, Adobe or Kodak CMM (or maybe even others I’m not aware of). If the Little CMS CMM is used, no slope limit is applied.

@fumoboy007 (Author)

> Anyway, converting to one color space (ITU-R 709) and then assigning a different one (Apple HDTV) necessarily produces incorrect results in ICC color management.

I was simulating the situation where an encoding application uses the ITU-R 709 transfer function and a decoding application uses the approximated transfer function to see the loss in quality.

@mdejong (Owner) commented Jan 27, 2019

One issue that I was confused about is which gamma curve AVFoundation makes use of when exporting with AVAssetWriter and AVAssetWriterInputPixelBufferAdaptor. Previously, I was seeing that the exported YCbCr values seemed to be larger than with BT.709, but I just retested and compared to sRGB, and the results indicate that AVFoundation is also making use of the HDTV profile, exporting with a 1.961 gamma value even when sRGB is indicated as the base colorspace of a CoreVideo pixel buffer. This seems to indicate that my other export process, which uses vImage and the 1.961 gamma settings, is in fact correct.

The weird thing is that there is a Y range issue with the AVAssetWriterInputPixelBufferAdaptor export: the emitted Y values are in the range [16, 237] as opposed to [16, 235].

In the graph below, the red line is the BT.709 segmented curve; the yellow cropped values over blue indicate the Apple 1.96 line, with the quantized AVFoundation Y values shown in yellow; finally, the purple line with the orange sRGB curve shows the original sRGB values along with the exact sRGB curve. The point of the graph is that Apple internally emits YCbCr values by raising to pow(x, 1.0/1.961), so that the Apple decoding curve outputs RGB as close to the original values as possible.

[Image: avfoundationexports196]
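For context on the export path, a pixel buffer handed to AVAssetWriterInputPixelBufferAdaptor would typically be tagged with the standard CoreVideo BT.709 attachments, along these lines (the helper name is mine); the surprising result above is that the emitted Y values follow the ~1.961 curve regardless:

```swift
import CoreVideo

// Tag a pixel buffer so downstream AVFoundation code treats it as BT.709.
func tagAsBT709(_ pixelBuffer: CVPixelBuffer) {
    CVBufferSetAttachment(pixelBuffer,
                          kCVImageBufferColorPrimariesKey,
                          kCVImageBufferColorPrimaries_ITU_R_709_2,
                          .shouldPropagate)
    CVBufferSetAttachment(pixelBuffer,
                          kCVImageBufferTransferFunctionKey,
                          kCVImageBufferTransferFunction_ITU_R_709_2,
                          .shouldPropagate)
    CVBufferSetAttachment(pixelBuffer,
                          kCVImageBufferYCbCrMatrixKey,
                          kCVImageBufferYCbCrMatrix_ITU_R_709_2,
                          .shouldPropagate)
}
```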

@mdejong (Owner) commented Feb 5, 2019

After additional testing, I am finding some very interesting results. I have reworked the srgb_to_bt709 command line tool, which converts an image to Y4M (to be encoded with ffmpeg/x264), and found something notable when it comes to compression. The encoding process now supports the Apple 1.96 gamma and also direct encoding with the sRGB gamma curve. The sRGB curve does just a little better in terms of round trip, though both approaches are very good in that they have a maximum round trip error of ±2. What is interesting is that encoding with the sRGB gamma seems to have an advantage when doing lossy encoding.

Thing is, the sRGB encoded image data is only 67 kB while the Apple gamma encoding is 349 kB; that is a huge compression performance benefit from encoding with the sRGB gamma. Further testing would be needed to determine whether this result can be seen with other example input. This specific example of a drop of water has some large smooth areas, so it might be an edge case.
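For anyone reproducing the comparison, the encode step with ffmpeg/x264 would be along these lines (filenames and settings are illustrative, not necessarily the ones used for these numbers):

```
# encode the Y4M produced by srgb_to_bt709 with x264
ffmpeg -i drop_of_water.y4m -c:v libx264 -crf 20 -pix_fmt yuv420p drop_of_water.m4v
```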
