Perceptual image precision + 90% speed improvement #628

Merged
merged 8 commits on Sep 21, 2022

Conversation

ejensen
Contributor

@ejensen ejensen commented Sep 12, 2022

Problem

The existing image matching precision strategy is not good at differentiating between a significant difference in a relatively small portion of a snapshot and imperceptible differences in a large portion of the snapshot. For example, the snapshots below show that a 99.5% precision value fails an imperceptible background color change while allowing noticeable changes (text and color) to pass:

Reference | Fails | Passes
(reference screenshot) | Imperceptible background color difference | Significant text and color changes

Solution

This PR adds a new optional perceptualPrecision parameter to image snapshotting that determines how perceptually similar a pixel must be to be considered a match. This parameter complements the existing precision parameter, which determines the percentage of pixels that must match for the whole image to be considered matching.
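For example, a test opts in by passing both parameters to the image strategy. A minimal sketch (the label, frame, and threshold values are illustrative, not recommendations from this PR):

import SnapshotTesting
import UIKit
import XCTest

final class ExampleSnapshotTests: XCTestCase {
    func testLabel() {
        let label = UILabel(frame: CGRect(x: 0, y: 0, width: 120, height: 44))
        label.text = "Hello"
        assertSnapshot(
            matching: label,
            as: .image(
                precision: 1,             // all compared pixels must meet the perceptual bar
                perceptualPrecision: 0.99 // each pixel may differ by an imperceptible amount
            )
        )
    }
}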

This approach is similar to #571 and #580 but uses perceptual distance rather than Euclidean distance of sRGB values. This is significant because the sRGB color space is not perceptually uniform. Pairs of colors with the same Euclidean distance can have large perceptual differences. The left and right colors of each row have the same Euclidean distance:

sRGB-Distance

The perceptualPrecision parameter is the inverse of a Delta E value. The following table can be used to determine the perceptualPrecision that suits your needs (a small conversion sketch follows the table):

Perceptual Precision | Description
100% | Must be exactly equal
≥ 99% | Allows differences not perceptible by human eyes
≥ 98% | Allows differences possibly perceptible through close observation
≥ 90% | Allows differences perceptible at a glance
≥ 50% | Allows differences when more similar than opposite
≥ 0% | Allows any differences
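A minimal sketch of the relationship behind that table, assuming perceptualPrecision maps to a Delta E tolerance of (1 - perceptualPrecision) × 100 (an inference from examples later in this thread, not the library's source):

// Hypothetical helper illustrating the inverse Delta E relationship described above.
func deltaETolerance(forPerceptualPrecision precision: Float) -> Float {
    (1 - precision) * 100
}

let exact = deltaETolerance(forPerceptualPrecision: 1.00)         // 0 — pixels must be exactly equal
let imperceptible = deltaETolerance(forPerceptualPrecision: 0.99) // 1 — below the threshold of perception
let closeLook = deltaETolerance(forPerceptualPrecision: 0.98)     // 2 — possibly visible on close observation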

This perceptual color difference calculation is performed by the CILabDeltaE Core Image filter, which is available on macOS 10.13+, iOS 11+, and tvOS 11+. Using Core Image to accelerate the image diffing results in a 90%+ speed improvement over the existing byte-by-byte comparison.

Additionally, when running on macOS 11+, iOS 14+, or tvOS 14+, the CILabDeltaE filter is joined with the CIColorThreshold and CIAreaAverage filters to accelerate the whole-image comparison, resulting in a 97% speed improvement.
The CIColorThreshold filter has been backported to macOS 10.13+, iOS 11+, and tvOS 11+ using MPSImageThresholdBinary, so the 97% speed improvement is available on all OS versions.
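A rough sketch of the kind of filter chain described above (not the PR's exact code): compute per-pixel Lab Delta E, threshold it at the allowed tolerance, then average the result so a single 1x1 readback yields the fraction of pixels outside the tolerance.

import CoreImage

func fractionOfPerceptuallyDifferentPixels(
    reference: CIImage,
    actual: CIImage,
    perceptualPrecision: Float,
    context: CIContext = CIContext()
) -> Float {
    // Delta E of each pixel pair, written into the output image's components.
    let deltaE = reference.applyingFilter("CILabDeltaE", parameters: ["inputImage2": actual])
    // Pixels whose Delta E exceeds the tolerance become 1, others 0.
    // (CIColorThreshold is iOS 14+/macOS 11+; the PR backports it with MPSImageThresholdBinary.)
    let thresholded = deltaE.applyingFilter(
        "CIColorThreshold",
        parameters: ["inputThreshold": (1 - perceptualPrecision) * 100]
    )
    // Average the whole extent down to a single pixel.
    let averaged = thresholded.applyingFilter(
        "CIAreaAverage",
        parameters: [kCIInputExtentKey: CIVector(cgRect: reference.extent)]
    )
    var pixel = [UInt8](repeating: 0, count: 4)
    context.render(
        averaged,
        toBitmap: &pixel,
        rowBytes: 4,
        bounds: CGRect(x: 0, y: 0, width: 1, height: 1),
        format: .RGBA8,
        colorSpace: nil
    )
    return Float(pixel[0]) / 255  // compare against (1 - precision) to decide pass/fail
}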

Benchmarks

Byte-Buffer Comparison | CILabDeltaE + MPSImageThresholdBinary + CIAreaAverage
macOS 10.10+, iOS 11+, tvOS 10+ | macOS 10.13+, iOS 11+, tvOS 11+
0.906s (baseline) | 0.025s (97% speed improvement)

Addressed Issues

Related PRs

@lukeredpath

I've just taken this for a spin in our project. So far I've converted a few of our snapshots over to this strategy, and images taken in an iOS 15/iPhone 12 simulator that would fail with imperceptible differences on an iOS 16/iPhone 14 simulator are now passing.

Unfortunately, due to some other packages having a dependency on this library, I was not able to switch over to this fork to test it out, so I had to collate the changes into a separate file in my project and rename the strategies. I've created a gist containing the file I'm using. I've not ported everything over from this PR, so far just the UIImage, UIView, UIViewController, and SwiftUI.View strategies:

https://gist.github.com/lukeredpath/9abc51d9eee349c2f209cc0431c8eb6f

@jlcvp

jlcvp commented Sep 13, 2022

This PR is pure gold, thank you.

I tested it on my project, and all the headaches and compromises we were having with the differences between Apple Silicon and Intel generated snapshots are gone using this new perceptualPrecision flag.

Looking forward to having this merged into the main branch by the reviewers.

@lukeredpath

FWIW, I've settled on perceptual precision of 0.98 and precision of 0.995 - the latter seems to account for very minor layout shifts of a few pixels between iOS 15 and iOS 16 without triggering any significant false positives (there's always the possibility that some will slip through with < 1 precision but I can live with that). Performance seems OK in most cases too.

@thedavidharris
Contributor

thedavidharris commented Sep 14, 2022

Is there any value in having perceptualPrecision adjustable as a global property, or is it something that should still only be set per snapshot?

@ejensen
Contributor Author

ejensen commented Sep 14, 2022

Is there any value in having perceptualPrecision adjustable as a global property, or is it something that should still only be set per snapshot?

In my test target I've added a global default by defining a defaultPerceptualPrecision global variable and adding Snapshotting extensions that take precedence over the ones in the SnapshotTesting framework:

import SnapshotTesting
import UIKit

/// The default `perceptualPrecision` to use if a specific value is not provided.
private let defaultPerceptualPrecision: Float = {
#if arch(x86_64)
    // When executing on Intel (CI machines) lower the `defaultPerceptualPrecision` to 98%, which avoids failing tests
    // due to imperceptible differences in anti-aliasing, shadows, and blurs between Intel and Apple Silicon Macs.
    return 0.98
#else
    // The snapshots were generated on Apple Silicon Macs, so they match 100%.
    return 1.0
#endif
}()

// Local extensions that override the default `perceptualPrecision` value with the `defaultPerceptualPrecision` global defined above.

extension Snapshotting where Value == UIView, Format == UIImage {
    /// A snapshot strategy for comparing views based on perceptual pixel equality.
    static let image = image(perceptualPrecision: defaultPerceptualPrecision)

    /// A snapshot strategy for comparing views based on perceptual pixel equality.
    ///
    /// - Parameters:
    ///   - drawHierarchyInKeyWindow: Utilize the simulator's key window in order to render `UIAppearance` and `UIVisualEffect`s. This option requires a host application for your tests and will _not_ work for framework test targets.
    ///   - perceptualPrecision: The percentage a pixel must match the source pixel to be considered a match. [98-99% mimics the precision of the human eye.](http://zschuessler.github.io/DeltaE/learn/#toc-defining-delta-e)
    ///   - size: A view size override.
    ///   - traits: A trait collection override.
    static func image(
        drawHierarchyInKeyWindow: Bool = false,
        perceptualPrecision: Float = defaultPerceptualPrecision,
        size: CGSize? = nil,
        traits: UITraitCollection = .init()
    ) -> Self {
        image(
            drawHierarchyInKeyWindow: drawHierarchyInKeyWindow,
            precision: 1,
            perceptualPrecision: perceptualPrecision,
            size: size,
            traits: traits
        )
    }
}

extension Snapshotting where Value == UIViewController, Format == UIImage {
    /// A snapshot strategy for comparing view controllers based on perceptual pixel equality.
    static let image = image(perceptualPrecision: defaultPerceptualPrecision)

    /// A snapshot strategy for comparing view controller views based on perceptual pixel equality.
    ///
    /// - Parameters:
    ///   - config: A set of device configuration settings.
    ///   - perceptualPrecision: The percentage a pixel must match the source pixel to be considered a match. [98-99% mimics the precision of the human eye.](http://zschuessler.github.io/DeltaE/learn/#toc-defining-delta-e)
    ///   - size: A view size override.
    ///   - traits: A trait collection override.
    static func image(
        on config: ViewImageConfig,
        perceptualPrecision: Float = defaultPerceptualPrecision,
        size: CGSize? = nil,
        traits: UITraitCollection = .init()
    ) -> Self {
        image(
            on: config,
            precision: 1,
            perceptualPrecision: perceptualPrecision,
            size: size,
            traits: traits
        )
    }
}

This lets me leave the assertSnapshot(matching: view, as: .image) lines in the test cases as-is while using a 98% perceptual precision (perceptually near-identical) when running on machines where the snapshots aren't exactly identical.

The defaultPerceptualPrecision global could also vary based on OS version in addition to architecture if needed, as sketched below.
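A hedged sketch of that variation (the OS versions and values here are illustrative, not recommendations from this PR):

private let defaultPerceptualPrecision: Float = {
#if arch(x86_64)
    // Intel CI machines: tolerate imperceptible anti-aliasing/shadow/blur differences.
    return 0.98
#else
    if #available(iOS 16, tvOS 16, macOS 13, *) {
        // Example: snapshots were recorded on an older OS, so allow tiny rendering drift.
        return 0.99
    } else {
        return 1.0
    }
#endif
}()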

@lukeredpath

I can confirm that this change also fixed our issue of tests not passing on M1 when using Intel generated snapshots, or vice versa. 🎉

@Kaspik

Kaspik commented Sep 17, 2022

Trying this PR on our project, where devs have M1 Ultra MacBooks and CI is M1 MacMini. Some of our snapshots are still randomly failing with "ghost" failures (attaching one example). Any idea what these could be? Do we need perceptualPrecision < 1?

Difference:
difference_3_214AC2A3-FDF0-4A2A-A4C8-C6AC7F45D91A

Failure:
failure_2_214AC2A3-FDF0-4A2A-A4C8-C6AC7F45D91A

Reference:
reference_1_214AC2A3-FDF0-4A2A-A4C8-C6AC7F45D91A

@lukeredpath

@Kaspik yes if you’ve got perceptual precision at 1 it won’t behave any differently. Try 0.98.

@ejensen
Contributor Author

ejensen commented Sep 17, 2022

Trying this PR on our project, where devs have M1 Ultra MacBooks and CI is M1 MacMini. Some of our snapshots are still randomly failing with "ghost" failures (attaching one example). Any idea what these could be? Do we need perceptualPrecision < 1?

@Kaspik, yes, a perceptualPrecision < 1 is needed since the images aren't exactly the same. The perceptualPrecision parameter is the inverse of a Delta E value. The following table can be used to determine the perceptualPrecision that suits your needs:

Perceptual Precision | Description
100% | Must be exactly equal
≥ 99% | Allows differences not perceptible by human eyes
≥ 98% | Allows differences possibly perceptible through close observation
≥ 90% | Allows differences perceptible at a glance
≥ 50% | Allows differences when more similar than opposite
≥ 0% | Allows any differences

The example images you attached have a 0.3 Delta E value, so .image(perceptualPrecision: 0.997) will match the images. Other generated images might have a slightly larger or smaller Delta E value, so I generally recommend a perceptualPrecision around 99%, but not less than 98%, since perceptible differences can pass the assertions when perceptualPrecision < 0.98.
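A worked instance of that conversion, assuming the 0-100 Delta E scale from the table above:

// The attached images differ by a Delta E of 0.3, so the matching
// perceptual precision is 1 - (0.3 / 100) = 0.997.
let measuredDeltaE: Float = 0.3
let requiredPerceptualPrecision = 1 - measuredDeltaE / 100  // 0.997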

@Kaspik

Kaspik commented Sep 18, 2022

@ejensen Thanks Eric, that makes sense. How did you get the image difference value? Asking because I have other examples where even setting perceptualPrecision = 0.99 fails with a ghost image (attaching). Trying to figure out what the difference is, what the reason is, and what the ideal precision is for cases like these (I have another 30-40 cases where 0.99 isn't working).

I can go with 0.98, but that sounds like the bottom limit recommended by you.

Difference:

Failure - Reference:

@ejensen
Contributor Author

ejensen commented Sep 18, 2022

@ejensen Thanks Eric, that makes sense. How did you get the image difference value? Asking because I have other examples where even setting perceptualPrecision = 0.99 fails with a ghost image (attaching). Trying to figure out what the difference is, what the reason is, and what the ideal precision is for cases like these (I have another 30-40 cases where 0.99 isn't working).

I can go with 0.98, but that sounds like the bottom limit recommended by you.

@Kaspik I'm using a new branch that reports the actual precision of the images when the tests fail. It reports that the precision of the second set of images you attached is 98.59%.

precision-reporting

I will PR that branch if/when this PR is merged. You could attempt to use 0.985 precision for all your test cases, or you can use the branch to find a precision for each of the snapshot cases.

98% is a good limit since it is generally difficult for an untrained eye to tell the difference, and display color accuracy is often inadequate to even represent differences within that range.

@Kaspik

Kaspik commented Sep 18, 2022

Ohh nice, thanks! I'll try that branch on top of this one, great job!

@cooksimo

I have tested this out and it works great for us as well 👍

Member

@stephencelis stephencelis left a comment


@ejensen Sorry for the delay! This looks like a great update to us. We're going to merge things and fast-track a release soon. There's a lot upstream, including some bug fixes that will require folks to rerecord some snapshots. This breakage has been holding up release, but we don't think snapshot backwards compatibility is feasible at this time, and it shouldn't hold up all the good things that have been contributed.

Do you think you can get a PR up for the two improvements you commented about?

  • Adding a global configuration option for projects that expect to run tests on both Intel/M1
  • Adding the extra messaging about the actual precision difference on failure

@simondelphia

simondelphia commented Sep 23, 2022

Yeah, just wondering why the error message I'm getting when I use 0.995 for precision is telling me the resulting snapshots have precision below 0.98 when I use a perceptualPrecision below 1.0 alongside the precision above. But when I don't specify perceptualPrecision, the same snapshots pass with 0.98 precision.

@ejensen
Contributor Author

ejensen commented Sep 23, 2022

Yeah, just wondering why the error message I'm getting when I use 0.995 for precision is telling me I have precision below 0.98 when I use a perceptualPrecision below 1.0 alongside the precision above. But when I don't specify perceptualPrecision, the same snapshots pass with 0.98 precision.

There is a difference in how the precision % is calculated depending on whether perceptual precision is enabled. The difference is whether the precision is the % of pixels matching or the % of bytes matching (each pixel has 4 bytes - one for each channel: Red, Green, Blue, and Alpha).

  • When perceptualPrecision < 1 the new algorithm calculates % of pixels matching
  • When perceptualPrecision = 1 the old algorithm calculates the % of bytes matching

When using the old algorithm, a pixel with multiple differing channels (e.g. Red + Green) skews the image precision more than if only one channel of the pixel were different. This seems intuitive at first but has some unintended side effects. For example, we'd expect these 2 images (solid white and solid black) to have 0% precision:
[reference and actual attachments: solid white vs. solid black]
However, when using the old algorithm the precision is 25%. This is because the alpha channel is the same across the two images, meaning 25% of the bytes match. Therefore any images with the same alpha have a floor of 25% precision when using the old algorithm. Not ideal.
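Worked arithmetic for that solid-white vs. solid-black example, assuming 4-byte RGBA pixels as described:

// Every pixel mismatches in R, G, and B but matches in alpha.
let matchedBytesPerPixel = 1.0  // alpha
let totalBytesPerPixel = 4.0    // red + green + blue + alpha
let oldAlgorithmPrecision = matchedBytesPerPixel / totalBytesPerPixel  // 0.25 — the 25% floor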

The new precision algorithm that's enabled when perceptualPrecision < 1 is simply a percentage of pixels that meet the perceptualPrecision requirement. Therefore it's not skewed by pixel channels, and doesn't suffer the same side effects:

Screen Shot 2022-09-23 at 12 20 29 PM

The old algorithm was left as-is when perceptualPrecision = 1 to prevent breaking existing tests that haven't opted in to the perceptual precision algorithm.

@simondelphia If most of the images in your repo are opaque, you were probably experiencing the 25% precision floor of the old algorithm. You might want to try a 74.6% (0.995 * 0.75) precision, since it is roughly equivalent to the old algorithm's 99.5% precision for opaque images.

@simondelphia

simondelphia commented Sep 23, 2022

Hm interesting, thank you that is really helpful!

@simondelphia

I used precision = 0.9 and perceptualPrecision = 0.98 and now all but two snapshots passed.

Here's one set:

Archive.zip

Actual image precision 0.03083378 is less than required 0.9
Actual perceptual precision 0.9574219 is less than required 0.98 (0.00s)

I'm confused by the 0.03 vs 0.9, and why is the perceptual precision almost 0.95 when there's no noticeable difference? I have a light mode version of the same snapshots, but only the dark mode ones failed.

The failing image was captured on an Intel Mac on remote CI (AWS) and the reference was captured on an M1 mac locally.

The other failing set was also a dark mode version for a somewhat similar screen.

Both have a speckled pattern in the background, which I suspect is part of the reason why it's saying there's a difference.

@ejensen
Contributor Author

ejensen commented Sep 27, 2022

@simondelphia When comparing the PNG images included in the zip they match with precision = 1.0 and perceptualPrecision = 0.99. You should be able to replicate this by writing a test that compares the UIImages similar to testImagePrecision(). I'm interested to see if the UIImage comparison test passes on your CI machines. If so, it means there are differences in the UIView/UIViewController snapshots that are lost when the snapshots are saved to PNGs. This could be CI color profile differences (possibly similar to #313 (comment) and #419) that are normalized when saved to PNGs. If your project is adjusting the output PNGs (similar to #446) the color space of the PNGs could differ from the original snapshots.
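For example, a hedged sketch of such a test, diffing the two saved PNGs directly with the image strategy (the asset names and bundle lookup are assumptions):

import SnapshotTesting
import UIKit
import XCTest

final class SavedSnapshotComparisonTests: XCTestCase {
    func testSavedPNGsMatchPerceptually() throws {
        let bundle = Bundle(for: Self.self)
        let reference = try XCTUnwrap(UIImage(named: "login_dark_reference", in: bundle, compatibleWith: nil))
        let failure = try XCTUnwrap(UIImage(named: "login_dark_failure", in: bundle, compatibleWith: nil))
        // The strategy's diffing closure returns nil when the two images are considered matching.
        let strategy = Snapshotting<UIImage, UIImage>.image(precision: 1, perceptualPrecision: 0.99)
        XCTAssertNil(strategy.diffing.diff(reference, failure)?.0)
    }
}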

I'm working on a Swift Playground that assists in debugging image differences by outputting all the intermediate values and providing suggestions for precision parameter values. It might help identify the causes of different machine rendering and suggest values that accommodate them.
NotMatching

@simondelphia

simondelphia commented Sep 28, 2022

In my test target I've added a global default by defining a defaultPerceptualPrecision global variable and adding Snapshotting extensions that take precedence over the ones in the SnapshotTesting framework:

@ejensen: According to #628 (comment), doesn't using a perceptual precision of 1 on Apple Silicon Macs cause the library to use the old snapshotting algorithm, and thus wouldn't this defeat the purpose of using perceptualPrecision?

Also I will give the UIImage test a try.

@simondelphia

simondelphia commented Sep 29, 2022

I tried assertSnapshot with the reference image above on the failed snapshot on my local M1 machine:

precision | perceptualPrecision | result | message
1 | unspecified | failure | Newly-taken snapshot does not match reference
0.99 | unspecified | success |
0.99 | 0.99 | success |
1 | 0.99 | failure | Actual perceptual precision 0.9862305 is less than required 0.99
unspecified | 0.99 | failure | Actual perceptual precision 0.9862305 is less than required 0.99

On CI with the same calls I got the same failures with the same messages.

On CI the snapshots on the original login UIView used 0.98 and 0.99 for precision and perceptual, respectively:

Actual image precision 0.02803135 is less than required 0.98
Actual perceptual precision 0.9574219 is less than required 0.99 (0.00s)

@simondelphia

@ejensen

@simondelphia

simondelphia commented Oct 21, 2022

I managed to produce a diff using an online tool that gave me something I could actually see as a difference in the PNG files from the same snapshots. It seems to be something about the edge around the capsule button, and I suppose that while the actual difference is imperceptible, there are a substantial number of pixels involved.

Notably the two failing snapshots are also only dark mode tests and the corresponding light mode ones worked fine.

@ejensen
Contributor Author

ejensen commented Oct 22, 2022

According to #628 (comment), doesn't using a perceptual precision of 1 on Apple Silicon Macs cause the library to use the old snapshotting algorithm, and thus wouldn't this defeat the purpose of using perceptualPrecision?

On the team where I'm using that strategy, everyone has the same M1 machine locally, so the snapshots match exactly. We generate new snapshots locally, so every local machine produces byte-identical snapshots and there's no need for a precision or perceptualPrecision below 100%. However, our CI machines are Intel, so we set perceptualPrecision < 1 when comparing snapshots in CI test runs.
This precision-setting strategy works for this team, but if your team generates snapshots on a mix of different machines, then you'd want perceptualPrecision < 1 for all architectures.
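An alternative, hedged way to express the same split is to branch on a CI environment variable instead of the architecture (the "CI" variable is an assumption; Bitrise and most other CI services set one):

import Foundation

// Use exact matching locally and relax only the perceptual tolerance on CI.
let isCI = ProcessInfo.processInfo.environment["CI"] != nil
let ciAwarePerceptualPrecision: Float = isCI ? 0.99 : 1.0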

On CI with the same calls I got the same failures with the same messages.

Actual perceptual precision 0.9862305 is less than required 0.99

On CI the snapshots on the original login UIView used 0.98 and 0.99 for precision and perceptual, respectively:

Actual image precision 0.02803135 is less than required 0.98
Actual perceptual precision 0.9574219 is less than required 0.99 (0.00s)

This points to the CI machine producing a snapshot of the original login UIView that differs from the image that is saved. This difference might be a color space difference. This PR #665 normalizes the image color spaces before comparison, which might be the solution to resolve your CI image differences.
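As a rough illustration of that normalization idea (not the code from #665), both images could be redrawn into the same color space before diffing:

import CoreGraphics

// Redraw a CGImage into sRGB so snapshots produced with different display
// profiles are compared in a single, common color space.
func normalizedToSRGB(_ image: CGImage) -> CGImage? {
    guard
        let colorSpace = CGColorSpace(name: CGColorSpace.sRGB),
        let context = CGContext(
            data: nil,
            width: image.width,
            height: image.height,
            bitsPerComponent: 8,
            bytesPerRow: 0,
            space: colorSpace,
            bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
        )
    else { return nil }
    context.draw(image, in: CGRect(x: 0, y: 0, width: image.width, height: image.height))
    return context.makeImage()
}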

@IlyaPuchkaTW

IlyaPuchkaTW commented Nov 22, 2022

We've started using this new precision parameter recently to mitigate the M1 vs Intel issue and have exactly the same setup: 1.0 precision for local runs on M1 and 0.999 for CI runs (we are using Bitrise). For whatever reason, though, while the tests fail locally as expected (both with 1.0 and 0.999 precision; the calculated precision is actually negative), the same tests pass on CI with that precision. At the same time, with a precision of 1.0 on CI they fail as expected. We have parallelized tests enabled.
Any tip on what the reason could be would be welcome.

@ejensen
Contributor Author

ejensen commented Nov 22, 2022

We've started using this new precision parameter recently to mitigate the M1 vs Intel issue and have exactly the same setup: 1.0 precision for local runs on M1 and 0.999 for CI runs (we are using Bitrise). For whatever reason, though, while the tests fail locally as expected (both with 1.0 and 0.999 precision; the calculated precision is actually negative), the same tests pass on CI with that precision. At the same time, with a precision of 1.0 on CI they fail as expected. We have parallelized tests enabled.
Any tip on what the reason could be would be welcome.

The issue sounds similar to the one addressed in #666 where some virtualized macOS environments silently fail due to the lack of Metal support.
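A minimal sketch of the kind of guard that fix implies (an assumption about the approach, not the exact code in #666):

import Metal

// Some virtualized macOS CI machines expose no Metal device; in that case the
// Core Image fast path can't run and a CPU comparison should be used instead.
var isMetalAvailable: Bool {
    MTLCreateSystemDefaultDevice() != nil
}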

@IlyaPuchkaTW

IlyaPuchkaTW commented Nov 23, 2022

Thanks, indeed CI is running on a VM. I'm trying the package from your fork with those changes, but it seems to result in CI timing out, although the tests seem to run fast enough =/

@tdrhq

tdrhq commented Nov 24, 2022

@IlyaPuchkaTW An alternative approach is to use an open-source tool like Vizzy or Screenshotbot, so that your screenshots are always recorded in CI (hopefully all CI machines have an identical environment).

https://github.com/workday/vizzy

https://github.com/screenshotbot/screenshotbot-oss

(I built Screenshotbot, so I'm a bit biased toward that. I know of teams using it with swift-snapshot-testing)

msadoon added a commit to kickstarter/ios-oss that referenced this pull request Nov 26, 2022
…eenshots and verify them on M1. Seems to work up to a perceptable difference of 98%

pointfreeco/swift-snapshot-testing#628 (comment)
niil-qb pushed a commit to quickbit/swift-snapshot-testing that referenced this pull request Feb 23, 2023
* Add an optional perceptualPrecision parameter for image snapshot comparisons

The perceptual precision is a number between 0 and 1 that gets translated to a CIE94 tolerance: https://en.wikipedia.org/wiki/Color_difference

* Use CIColorThreshold and CIAreaAverage for a 70% faster image diff on iOS 14, tvOS 14, and macOS 11

* Add a unit test demonstrating the difference between pixel precision and perceptual precision

* Update the reference image for the image precision test

* Backport the threshold filter to macOS 10.13 by creating a CIImageProcessorKernel implementation

* Update Sources/SnapshotTesting/Snapshotting/UIImage.swift

* Update NSImage.swift

Co-authored-by: Stephen Celis <stephen@stephencelis.com>
Co-authored-by: Stephen Celis <stephen.celis@gmail.com>
OksanaFedorchuk pushed a commit to urlaunched-com/swift-snapshot-testing that referenced this pull request Mar 28, 2024