
[Question] Visual testing in docker on different CPU architecture #13873

Open
nickofthyme opened this issue May 2, 2022 · 53 comments

Comments

@nickofthyme
Copy link
Contributor

nickofthyme commented May 2, 2022

Hey team 👋🏼

I am working to migrate my puppeteer visual regression testing to playwright.

My team has people working on Macs with either arm64 (M1 SoC) or amd64 (Intel) CPU architectures.

I'd like a way to run and update playwright tests/screenshots locally from either architecture and have the local screenshots match the screenshots running from the CI (linux/amd64).

Currently we use the mcr.microsoft.com/playwright:vx.x.x-focal docker image to run tests both locally and in CI. However, running the tests on these two architectures produces screenshots that differ ever so slightly; the differences are virtually imperceptible to the eye.
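For context, the invocation looks roughly like this; a minimal sketch of a typical setup (the image tag, mount paths and test command are illustrative, not our exact script):

# run the same playwright image locally and in CI, mounting the repo into the container
docker run --rm -v "$(pwd)":/work -w /work \
  mcr.microsoft.com/playwright:v1.22.0-focal \
  npx playwright test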

Screenshot from M1 mac - arm64

image

Screenshot from Intel mac - amd64

image

Diff screenshot

image

Diff gif

Screen Recording 2022-05-03 at 07 27 31 AM


So my question is: does anyone have a good strategy to avoid the above errors on these two architectures without reducing the threshold?

I've tried running docker with --platform=linux/amd64 on my M1 mac, but I run into #13724 (comment) when running the tests, even on the latest docker version (v20.10.8) with Rosetta 2 installed. Sounds like this could just be a known issue with docker.

@dgozman
Copy link
Contributor

dgozman commented May 3, 2022

@nickofthyme No good news, unfortunately. Overall, this sounds like a docker issue. It is expected that the arm docker image and the intel docker image produce different screenshots - after all, they have different libraries/executables inside. Ideally, you would force the intel image everywhere as you tried, but I guess that does not work, as you've linked above.

As for mitigation, we have maxDiffPixels and maxDiffPixelRatio in addition to threshold, but diff image suggests that you'll need pretty big values, so tests will not be that useful.
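For reference, all three knobs can be set globally in playwright.config.ts (or per assertion); a minimal sketch with illustrative values:

// playwright.config.ts
import type { PlaywrightTestConfig } from '@playwright/test';
const config: PlaywrightTestConfig = {
  expect: {
    toMatchSnapshot: {
      threshold: 0.2,          // per-pixel color difference tolerance (0..1)
      maxDiffPixels: 100,      // absolute number of differing pixels to tolerate
      maxDiffPixelRatio: 0.01, // fraction of differing pixels to tolerate
    },
  },
};
export default config;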

@aslushnikov Any more ideas?

@aslushnikov
Copy link
Collaborator

@nickofthyme any chance you can supply me with a short repro that illustrates these image differences? I'd love to play with it!

@aslushnikov aslushnikov self-assigned this May 4, 2022
@nickofthyme
Copy link
Contributor Author

nickofthyme commented May 4, 2022

Yeah totally! I put up a branch at elastic/elastic-charts#buildkite-arch-test. It's a little clunky to run right now but this should work...

git clone https://github.com/elastic/elastic-charts.git
cd ./elastic-charts
git checkout buildkite-arch-test

Start web server on localhost:9002

yarn install
cd ./e2e && yarn install && cd ../

# generate web server files
yarn test:e2e:generate

# starts local web server
yarn test:e2e:server

Run playwright against the local web server on http://localhost:9002 (http://host.docker.internal:9002 from inside docker)

yarn test:e2e stylings_stories.test.ts

The key part is the --platform=linux/arm64 option in e2e/scripts/start.sh, which calls docker run with the necessary playwright options. So to generate the diff above you'd need to run this command on both linux/arm64 and linux/amd64, changing that --platform flag accordingly. The current screenshots are based on linux/arm64.
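For anyone following along, the shape of that docker run call is roughly the following; a sketch only, not the actual e2e/scripts/start.sh (tag, paths and flags are illustrative):

# flip --platform between linux/arm64 and linux/amd64 to reproduce the diff;
# the tests reach the host web server via http://host.docker.internal:9002
docker run -it --rm --ipc=host --platform=linux/arm64 \
  -v "$(pwd)":/work -w /work/e2e \
  mcr.microsoft.com/playwright:v1.22.0-focal \
  npx playwright test "$@"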

Let me know if that gives you enough to play with! 👍🏼

@aslushnikov
Copy link
Collaborator

Start web server on localhost:9002

@nickofthyme I got stuck here; first, Puppeteer failed to download binary:

error /Users/andreylushnikov/tmp/node_modules/puppeteer: Command failed.
Exit code: 1
Command: node install.js
Arguments:
Directory: /Users/andreylushnikov/tmp/node_modules/puppeteer
Output:
The chromium binary is not available for arm64:
If you are on Ubuntu, you can install with:

apt-get install chromium-browser

/Users/andreylushnikov/tmp/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserFetcher.js:112
            throw new Error();

I ran with PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1, but this yielded another error:

[5/5] 🔨  Building fresh packages...
[7/10] ⠐ canvas
error /Users/andreylushnikov/tmp/node_modules/canvas: Command failed.
Exit code: 1
Command: node-pre-gyp install --fallback-to-build
Arguments:
Directory: /Users/andreylushnikov/tmp/node_modules/canvas

Followed by a few dozen error lines.

@nickofthyme
Copy link
Contributor Author

You shouldn't need any puppeteer* deps, so if you just remove those from the top-level package.json, that should fix things. What device/OS/architecture are you trying to run it on?

@aslushnikov
Copy link
Collaborator

@nickofthyme I'm on M1 Mac 12.3, running with Node.js 16.13

@nickofthyme
Copy link
Contributor Author

nickofthyme commented May 7, 2022

Interesting, that's pretty similar to mine. Let me know if it still gives you trouble and I can try running it on mine from scratch.

@aslushnikov
Copy link
Collaborator

Yep, still the same issue with node-pre-gyp

@nickofthyme
Copy link
Contributor Author

Ok I'm afk but I'll try it tomorrow and let you know.

@nickofthyme
Copy link
Contributor Author

nickofthyme commented May 7, 2022

Ok, I removed all references to puppeteer (used in the old visual tests) and canvas (used in the jest tests); neither should affect the playwright tests. I pushed the changes to elastic/elastic-charts#buildkite-arch-test.

Since you are on an M1 mac I also pushed a commit (elastic/elastic-charts@dbfd2ed) updating all screenshots on my intel Mac so you can see the diff from your M1.

I was able to install and run the above commands on my M1 Mac 12.3.1, running with Node.js 16.13.2.

I show running the stylings_stories.test.ts file because it includes the most problematic screenshot diffs, but you can swap in any other file under ./e2e/tests/. There are some 1100 tests/screenshots in total, so I limit the run 😅

yarn test:e2e stylings_stories.test.ts

Let me know if you still have troubles with these changes.

@uudens
Copy link

uudens commented Sep 20, 2022

Hey @nickofthyme 🙂 I'm wondering what you ended up doing with this? Did you manage to solve it or find some workaround?
Thanks!

@nickofthyme
Copy link
Contributor Author

nickofthyme commented Sep 20, 2022

No not really 😞

We tried everything in the book to get to a point where we could update screenshots locally on either arm64 or amd64 CPU architecture, interchangeably. Regardless of what we did, there was always about 10% of our screenshots that failed, with diffs ranging from 0.01% to over 5%. Relaxing the threshold to 5% would have defeated the utility of the screenshots, so that was never an option for us.

The only way we could make it consistent with a 0% threshold was to update the screenshots on CI machines, triggered via a GitHub PR comment (see elastic/elastic-charts#1819 (comment)), and commit the updated screenshots directly to the PR from CI. Not the most ideal solution, but it does the trick for us.
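For anyone curious, the CI-side update step boils down to something like this; a sketch only (the image tag, paths and commit message are illustrative, not our actual pipeline):

# re-generate screenshots inside the amd64 image, then commit them back to the PR branch
docker run --rm --platform=linux/amd64 -v "$(pwd)":/work -w /work/e2e \
  mcr.microsoft.com/playwright:v1.25.0-focal \
  npx playwright test --update-snapshots
git add e2e/screenshots
git commit -m "test(vrt): update screenshots from CI"
git push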

I don't see this changing anytime soon, since the two docker images use completely different binaries and the --platform flag appears to be meant for interim "best effort" usage, per the docs below.

Some images do not support the ARM64 architecture. You can add --platform linux/amd64 to run (or build) an Intel image using emulation.

However, attempts to run Intel-based containers on Apple silicon machines under emulation can crash as qemu sometimes fails to run the container. In addition, filesystem change notification APIs (inotify) do not work under qemu emulation. Even when the containers do run correctly under emulation, they will be slower and use more memory than the native equivalent.

In summary, running Intel-based containers on Arm-based machines should be regarded as “best effort” only. We recommend running arm64 containers on Apple silicon machines whenever possible, and encouraging container authors to produce arm64, or multi-arch, versions of their containers. We expect this issue to become less common over time, as more and more images are rebuilt supporting multiple architectures.

source: https://docs.docker.com/desktop/mac/apple-silicon/#known-issues

@uudens
Copy link

uudens commented Sep 20, 2022

I see.. It is what it is 🙁
Anyway thank you very much for the summary! 🙇

@aslushnikov
Copy link
Collaborator

Hey @nickofthyme, thank you again for the repro. I came back to this issue and can reproduce it reliably.

I don't see this changing anytime soon, since the two docker images use completely different binaries and the --platform flag appears to be meant for interim "best effort" usage, per the docs below.

Yeah, QEMU emulation is very slow and not advanced enough to run browser binaries, so things are likely to crash / be very slow there. Running docker on M1 was the main motivation for us to provide aarch64 linux browser binaries.

Now, back to your repro. I researched this topic again, and it turns out that chromium rendering is not necessarily deterministic even within a single platform (see https://crbug.com/919955). The differences tend to happen in the anti-aliasing, so while the human eye won't be able to see any difference between the images, they are not pixel-to-pixel identical.
In the case of different CPU architectures, the anti-aliasing differences are even more striking and easy to reproduce.

So with this, I draw the following conclusions:

  • Docker environment DOES guarantee structural identity, e.g. layout & fonts will be the same
  • Docker environment DOES NOT guarantee pixel-perfect screenshots

Realistically, though, the anti-aliasing differences are not important for the visual regression testing use cases. So with this, we need a pragmatic approach to discard anti-aliasing differences that might happen sporadically.

So far we have some tools to deal with anti-aliasing noise:

  • we use the https://github.com/mapbox/pixelmatch algorithm, which is configured to cancel some anti-aliasing pixels by default (these are marked with yellow pixels on the diff)
  • we have the threshold argument to increase color difference tolerance. E.g. running your tests with threshold: 0.1 makes them all pass for me locally on M1 (see the sketch after this list)
  • we have maxDiffPixels and maxDiffPixelRatio that let you tolerate some differing pixels. (These are probably too crude though)
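As an illustration of the threshold point above, the override can also be applied per assertion; a minimal sketch (the snapshot name is made up):

expect(await page.screenshot()).toMatchSnapshot('chart.png', { threshold: 0.1 });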

I see you chose to run threshold: 0 for your VRT tests. Do you consider this too big of a hammer to fight the anti-aliasing artifacts on the screenshots? Why? Have you been bitten by it before?

@nickofthyme
Copy link
Contributor Author

nickofthyme commented Sep 30, 2022

@aslushnikov Thank you for the incredibly detailed review of this!


Realistically, though, the anti-aliasing differences are not important for the visual regression testing use cases. So with this, we need a pragmatic approach to discard anti-aliasing differences that might happen sporadically.

Agreed, this would be amazing! I looked at the differences running pixelmatch directly against a subset of screenshots and found consistent results, though I haven't tested this against screenshots from different CPU architectures.

That being said, it would be great if playwright allowed passing all the options defined by pixelmatch and not only threshold, as it currently stands 👇🏼. This could be useful for fine-tuning the results.

const count = pixelmatch(expected.data, actual.data, diff.data, expected.width, expected.height, {
  threshold: options.threshold ?? 0.2,
});
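For reference, the remaining knobs pixelmatch accepts look roughly like this (option names come from the pixelmatch README; the values are just examples):

const count = pixelmatch(expected.data, actual.data, diff.data, expected.width, expected.height, {
  threshold: 0.1,         // per-pixel color difference tolerance
  includeAA: false,       // whether anti-aliased pixels are counted as differences
  alpha: 0.1,             // opacity of unchanged pixels in the diff output
  diffColor: [255, 0, 0], // color used for differing pixels in the diff output
  diffMask: false,        // draw the diff over a transparent background
});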


we have maxDiffPixels and maxDiffPixelsRatio that let you tolerate some pixels.

We have some screenshots with animations that are flaky and produce sporadic results on identical test runs, even when disabling or waiting for animations to complete. I'm guessing this is due to anti-aliasing, because the diff is imperceptible to the eye...

image

So we used maxDiffPixels to resolve this issue, see elastic/elastic-charts#1804.

A side note to this, now that you mention it... It was strange to me that setting the maxDiffPixels option when calling toMatchSnapshot does not clear threshold and maxDiffPixelRatio from playwright.config.ts, which still govern whether the screenshot fails (i.e. 0.1 > 0). This forces me to manually reset them, see the snippet below.

// playwright.config.ts
const config: PlaywrightTestConfig = {
  expect: {
    toMatchSnapshot: {
      threshold: 0,
      maxDiffPixelRatio: 0,
      maxDiffPixels: 0,
    },
  },
}

// some.test.ts
function getSnapshotOptions(options?: ScreenshotDOMElementOptions) {
  if (options?.maxDiffPixels !== undefined) {
    // need to clear default options for maxDiffPixels to be respected, else could still fail on threshold or maxDiffPixelRatio
    return {
      threshold: 1,
      maxDiffPixelRatio: 1,
      maxDiffPixels: 1000000, // some large value
      ...options,
    };
  }
  return options;
}

expect(await page.screenshot()).toMatchSnapshot('landing-page.png', getSnapshotOptions({ maxDiffPixels: 5 }));

The current approach would be sensible to me in most other config override scenarios, but in this case all three options are tightly interdependent. Ideally, if I set any of the three directly in the options of toMatchSnapshot, the others should be set to default values or the least restrictive value.

This may just be me, I could see an explanation for both ways, but thought I'd put it out there. 🤷🏼‍♂️


I see you chose to run threshold: 0 for your VRT tests. Do you consider this to be a too-big-of-a-hammer to fight the anti-aliasing artifacts on the screenshots? Why? Have you been bitten by it before?

I'd say we could probably get away with a looser threshold of, say, 0.1 with no noticeable impact. The 0 threshold has become somewhat of a point of pride in the stability of our VRT setup, starting with puppeteer, where a 0 threshold served us very well. That being said, we are testing very detailed canvas elements where even the smallest pixel difference could indicate a bug. This would not be on the order of tens of pixels, but we have caught issues where a 0.1 threshold would not have been sufficient to signal a bug/failure. Granted, I think our case is exceptionally rare, whereas for the general DOM use case of toMatchSnapshot a 0.5 threshold would be adequate to signal a failure.

All that being said, we do sometimes have PRs with large diffs that require going through hundreds of virtually identical screenshots, which is a pain when reviewing files in GitHub!

@gselsidi
Copy link

gselsidi commented Nov 8, 2022

Reviving this a bit, as I'm noticing similar issues. A few weeks ago tests were passing on different machines with pretty tight tolerances.

But now I'm realizing things are failing. The goal is to create baseline images locally and run them on CI, but reading the above, that seems like an issue.

Are the rendering issues chrome-specific, or do they affect any browser based on chromium, including firefox? Or maybe we can switch to Safari/WebKit? Or do they all have rendering issues that don't guarantee visual accuracy across different browsers?

Or is the only solution to run docker on one CPU architecture locally, so it matches CI?

The only way we're able to pass semi-accurately on two different macOS machines, both M1, is by setting maxDiffPixelRatio to 0.05 (5%), which seems fairly high.

@markov00
Copy link

markov00 commented Nov 9, 2022

Are the rendering issues chrome-specific, or do they affect any browser based on chromium, including firefox? Or maybe we can switch to Safari/WebKit? Or do they all have rendering issues that don't guarantee visual accuracy across different browsers?

I'm pretty sure there is no way to guarantee visual accuracy across different browsers, because the rendering engine differs across browsers (with some exceptions) and can render with slight differences. Font rendering stands out above everything else: I've never managed to render the exact same text path on two different browsers.

Visual accuracy should instead be guaranteed when rendering on the same CPU architecture and the same browser, even if you are running the tests on different machines (i.e. CI and local). We used to run an x86 Docker image on both macOS (x86 arch) and a different x86 CI with no visual differences.

Instead, what we are highlighting here is that the difference happens when rendering with the same browser but under two different CPU architectures (e.g. testing a screenshot generated using an x86 chrome-based image against an arm64 chrome-based image).

@gselsidi you mentioned:

The only way we're able to pass semi-accurately on two different macOS machines, both M1, is by setting maxDiffPixelRatio to 0.05 (5%), which seems fairly high.

Does this mean that you also get different screenshots when using the same browser and the same CPU arch?

@nickofthyme
Copy link
Contributor Author

nickofthyme commented Nov 9, 2022

What @markov00 said...

Also, a 5% maxDiffPixelRatio does seem a little high, but it really depends on your use case. For ours, even a few-pixel diff could signal a bug. But for testing the UI of a website, 5% may be just fine to catch the errors you care about. I'd suggest you run it with the lower accuracy for a bit and see if it misses key errors, then make the call on if/how to increase the accuracy, based on the suggestions above.

@gselsidi
Copy link

gselsidi commented Nov 9, 2022

I don't know anymore, screenshots behave weirdly. I just created baseline images on a Mac M1 and had a coworker run them on his Intel-based Mac; they all passed with super strict 0.0001 and 0 threshold settings. Yesterday a coworker with the same M1 would get constant failures.

@nickofthyme
Copy link
Contributor Author

nickofthyme commented Nov 9, 2022

I don't recall ever seeing inconsistencies in failures (flakiness) for the same screenshots across differing CPU architectures. But I have seen flakiness related to the nature of the screenshot itself, specifically when using animations.

@aslushnikov aslushnikov removed the v1.28 label Nov 11, 2022
@gselsidi
Copy link

gselsidi commented Dec 12, 2022

@aslushnikov

Yeah, how can I give them to you without uploading them here publicly? Or should I just upload them and delete them after you've saved them?

@aslushnikov
Copy link
Collaborator

@gselsidi feel free to send them over to anlushni[at]microsoft[dot]com

@aslushnikov
Copy link
Collaborator

Allow globally masking elements in the config and inside a test via test.use. Sometimes it's useful, while changes are being made to the site, to mask new elements etc., and you can update it in the future instead of masking elements at each individual step.

@gselsidi Please file a separate feature request for this!

@gselsidi
Copy link

gselsidi commented Dec 15, 2022

Did you receive my email with all the files?

@aslushnikov
Copy link
Collaborator

@gselsidi just checked - yes I did! The latest version of the comparator passes all your samples though. Could you please check with npm i @playwright/test@1.29.0-alpha-dec-13-2022?

@gselsidi
Copy link

I was on the Dec 09 build; this new Dec 13 version passes. Also, what else can we achieve with this?

Would headed vs headless pass?

Non docker images vs docker images pass?

Haven't gotten around to testing it in that much depth yet; or would those scenarios still fail?

@aslushnikov
Copy link
Collaborator

aslushnikov commented Dec 15, 2022

@gselsidi thank you for checking!

Also what else can we achieve with this?

The new comparator is designed to handle browser rendering non-determinism, so as long as the page layout is exactly the same, the screenshots should pass now.

Do note, though, that for the layout to be the same, the font metrics must be the same as well. Otherwise, line wrapping might happen in different places, boxes will have different sizes, and the images will in fact be different even to the human eye.
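One way to sanity-check whether two environments agree on font metrics is to measure a sample string in the page under test; a hypothetical diagnostic, not something Playwright does for you:

// log the measured width of a sample string and compare the number across environments
const width = await page.evaluate(() => {
  const ctx = document.createElement('canvas').getContext('2d')!;
  ctx.font = '16px sans-serif';
  return ctx.measureText('The quick brown fox jumps over the lazy dog').width;
});
console.log('measured text width:', width);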

Would headed vs headless pass?

According to some exploration I did back in the day, headless vs headed rendering differences consist of the following nuances:

  1. different font stacks: this should be fixed as of Dec 2022
  2. different anti-aliasing: this should be fixed by the new comparator
  3. showing vs hiding scrollbars: there's an open bug for this

We didn't aim to fix this, though, and we haven't experimented with it yet.

Non docker images vs docker images pass?

This won't work, since browsers inside and outside docker will use different font stacks, resulting in different font metrics, resulting in different layout, and finally yielding visually different screenshots.
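If you want to see the difference yourself, comparing the installed font families is a quick check; a sketch that assumes fontconfig's fc-list is available on the host (the image tag is illustrative):

# list font families on the host and inside the image, then diff the two lists
fc-list : family | sort -u > host-fonts.txt
docker run --rm mcr.microsoft.com/playwright:v1.29.0-focal fc-list : family | sort -u > docker-fonts.txt
diff host-fonts.txt docker-fonts.txt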

@pastelsky
Copy link
Contributor

I ran playwright on all the CSS tests from the Web Platform Test suite, with a threshold of 0.05 and maxDiffPixels of 0. The images were first generated locally on M1 and the tests ran on GitHub Actions (Intel), and the results were quite good. In the ones I scanned, I didn't notice any specific failures due to rendering differences between CPU architectures (most of the failures were due to race conditions during capture).

The run results are here —
(Look at the Print URL step for HTML reports)

Pixelmatch WPLAT
SSIM WPLAT

With SSIM I did notice a few cases where the actual and expected looked exactly the same, but there was a large diff —
https://pastelsky.github.io/wplat-playwright-tests/ssim/css-grid/index.html#?testId=2189cda207cfc3edfe7f-82682e3c967e85b9522c

Not sure what's happening here, but looks like the images being displayed aren't the ones actually used for comparison (some race condition?)

The number of failures with a threshold of 0 was a lot higher, but they would usually be anti-aliasing issues on text and around the edges of shapes — I doubt setting it that low is realistic.

Caveat: Web platform tests test breadth, and render cases aren't always complex.

@pastelsky
Copy link
Contributor

pastelsky commented Feb 12, 2023

On a related note,
I did notice the font stack is quite different between the official docker images offered by Playwright and Ubuntu.
e.g.

(Non-latin)
https://pastelsky.github.io/wplat-playwright-tests/ubuntu-change/css-content/index.html#?testId=8e873656b9d1ccb2286a-5929271583149bc0602e

(Monospace)
https://pastelsky.github.io/wplat-playwright-tests/ubuntu-change/css-backgrounds/index.html#?testId=afc163fe806768d40fba-6f0e37f2580394646066

It'd be convenient if the official docker image set fonts to something that is available on most Linux machines. Currently, it seems to use an obscure font (WenQuanYi Zen Hei) for rendering monospace fonts.
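As a stopgap, the official image can be extended with more common fonts; a sketch only (the tag and package names are assumptions about what Ubuntu's repositories provide):

FROM mcr.microsoft.com/playwright:v1.30.0-focal
RUN apt-get update \
    && apt-get install -y --no-install-recommends fonts-dejavu-core fonts-liberation fonts-noto-core \
    && rm -rf /var/lib/apt/lists/*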

@github-daniel-mcdonald
Copy link

Reposting what I saw here

Was this released in 1.30.0? I'm still not seeing the comparator option after pulling in the latest stable release

Has ssim-cie94 been removed or not yet released?

@aslushnikov
Copy link
Collaborator

@github-daniel-mcdonald the comparator is still under experiment / development.

@JPtenBerge
Copy link

JPtenBerge commented Apr 11, 2023

TL;DR: please try out the new experimental "ssim-cie94" image comparator.

...
The new image comparator is called "ssim-cie94". You can enable it in playwright config:

// playwright.config.ts
import type { PlaywrightTestConfig } from '@playwright/test';
const config: PlaywrightTestConfig = {
  expect: {
    toMatchSnapshot: { comparator: 'ssim-cie94', },
    toHaveScreenshot: { comparator: 'ssim-cie94', },
  }, 
};
export default config;

@aslushnikov Thanks for this tip. I've npm install-ed the current latest alpha (1.33.0-alpha-apr-10-2023), but am I correct that TypeScript doesn't recognize this new comparator property in the config yet? This is of course an easy workaround:

expect: {
	toMatchSnapshot: { comparator: 'ssim-cie94' } as any,
	toHaveScreenshot: { comparator: 'ssim-cie94' } as any,
},

But as the property is not recognized, it does bring up the question: is the new comparator being used at all, or is this just a property that's not read by the Playwright runner? Is there any way I can confirm the new comparator is being used, anything in the logs or in the HTML report to look for that mentions ssim-cie94?

@aslushnikov
Copy link
Collaborator

aslushnikov commented Apr 11, 2023

@JPtenBerge it is currently available as _comparator with underscore and is not officially released yet since we don't have enough feedback to release it.

@JPtenBerge
Copy link

JPtenBerge commented Apr 11, 2023

@JPtenBerge it is currently available as _comparator with underscore and is not officially released yet since we don't have enough feedback to release it.

Ah, great, that's doing something!

To other readers: _comparator is also not recognized as a property in my TypeScript file (I'm still using any here), but when I enter a non-existent comparator like this:

expect: {
	toMatchSnapshot: { _comparator: 'loremipsum' } as any,
	toHaveScreenshot: { _comparator: 'loremipsum' } as any,
},

It clearly gives an error: Error: Configuration specifies unknown comparator "loremipsum", so it does appear to be used.

@aslushnikov: Thanks!

@aslushnikov
Copy link
Collaborator

@JPtenBerge I'd appreciate it if you could share feedback on how it goes for you; things that worked and things that didn't work!

@JPtenBerge
Copy link

@aslushnikov Will do. Have some other work to focus on first, but I hope to experiment a bit further later this week.

@michalstrzelecki
Copy link

Hi! I am also using the ssim-cie94 comparator and struggled with it for a while. I was updating PW from a beta build released in December to 1.36. It was very hard to find out that comparator had been renamed to _comparator.

@aslushnikov
Copy link
Collaborator

aslushnikov commented Jul 14, 2023

@michalstrzelecki I'm sorry for the inconvenience. This is an experimental feature that has never been released or documented.

I'd also encourage everybody to share their experience with the ssim-cie94 comparator. Your feedback is crucial when it comes to releasing / sunsetting this experiment.

@nickofthyme
Copy link
Contributor Author

nickofthyme commented Jul 20, 2023

Finally had a chance to test this and found a pretty large reduction in failures (~83%), but it's not yet good enough to move away from single-arch testing/updating. See elastic/elastic-charts#2110 (comment).

Most failures with ssim-cie94 are small (<0.001) diffs related to fonts, lines or svgs, from a cursory review.

@aslushnikov make sure to download the html reports before they expire.

@aslushnikov
Copy link
Collaborator

@nickofthyme thank you so much for trying out the comparator!

I'm going over the HTML report, and so far the majority of the failures actually look legit to me.

  • all the annotations_stories.test.ts tests have a red dot in the top-left corner which differs between actual/expected.
  • same for the bar_stories.test.ts tests
  • the mixed_stories.test.ts tests have an eye icon rendered in a different color. The color difference is pretty striking to the eye; do you actually want this test to pass?
  • all tests in test_cases_stories.test.ts are actually failing for a good reason: the layout is different in these tests.
  • the failures in all.test.ts all look legit to me as well
  • wordcloud_stories.test.ts - all different layouts
  • area_stories.test.ts - the eye icon is again rendered in a different color, similarly to mixed_stories.test.ts.
  • legend_stories.test.ts - one test has an extra red dot, another test has the "eye" icon
  • axis_stories.test.ts - an extra red dot
  • chart.test.ts - different layout
  • dark_mode.test.ts - the "ASIA" wheel is actually rendered slightly differently. I wonder if this is supposed to be a passing test as well?

Two tests caught my eye: goal_stories.test.ts and stylings_stories.test.ts. These are failing, but could be fixed if we make the anti-aliasing detection algorithm slightly more lenient. Let us consider exposing a setting to control anti-aliasing aggressiveness.

@markov00
Copy link

Hey @aslushnikov @nickofthyme, a few things I've noticed here:

Actually that "red dot" is a dash of the red dashed line. It is really strange that one architecture renders it and the other doesn't. It looks like a different algorithm to me, where x86 renders only full dashes, while ARM always renders them and cuts them to the edges if needed.

The layout differences can only come from a difference in how text metrics are computed in SVG. I believe this is something that can't be fixed with a comparator and is specific to the browser's underlying implementation. I've already tested multiple times how differently browsers measure the same text with the same font, and I strongly believe this is the reason for these failures.

I also noticed different text ligatures for Arabic text across charts; this needs more investigation.

The different-colored icon: a strange fact here, I will investigate more, but testing with two real machines (x86 and arm), same browser, same OS, they render correctly with the same svg fill color.

@gajus
Copy link

gajus commented Oct 18, 2023

@aslushnikov It would be nice if this option were added behind an experimental option, or in some other way that does not require disabling linting.

await expect(page).toHaveScreenshot({
-  // https://github.com/microsoft/playwright/issues/20097#issuecomment-1382672908
-  // @ts-expect-error experimental feature
-  _comparator: 'ssim-cie94',
+  experimental: { comparator: 'ssim-cie94' },
  clip: box,
  fullPage: true,
});

@sweetcv
Copy link

sweetcv commented Dec 12, 2023

Hey folks, we're using the experimental { comparator: 'ssim-cie94' } to address some inconsistency with the WebKit visual testing, but we've hit an issue on Firefox Desktop (only) that we can't handle. The comparator gives us a false comparison of these images.

dashboard-1-actual
dashboard-1-expected
dashboard-1-diff

UPD1: We handled the difference in the box-shadow, but there's still a comparison error giving this diff for the logo.
image
