
[system] Add browser benchmark #22923

Merged (51 commits, Oct 12, 2020)

Conversation

@mnajdova (Member) commented Oct 7, 2020

This PR adds a browser benchmark for the system's performance. Try it by running the following command at the root of the project:

yarn benchmark

It will build the benchmark/ project, serve the root, and start the scripts/benchmark.js script.
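At a high level the runner does something along these lines (a sketch only; the port, test-case names, and structure here are assumptions, not the actual contents of scripts/benchmark.js):

```js
// Illustrative sketch of the runner loop; see scripts/benchmark.js for the real code.
const puppeteer = require('puppeteer');

const PORT = 3000; // assumed port of the served benchmark build
const CASES = ['box-emotion', 'box-styled-components']; // assumed test-case names

async function run() {
  const browser = await puppeteer.launch();
  try {
    for (const testCase of CASES) {
      const page = await browser.newPage();
      await page.goto(`http://localhost:${PORT}/?${testCase}`, { waitUntil: 'networkidle0' });
      // ...collect and print timings for this case (see the discussion below)
      await page.close();
    }
  } finally {
    await browser.close();
  }
}

run();
```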

scripts/perf.js (Outdated), comment on lines 32 to 34:
var start = performance.now();
const page = await browser.openPage(`http://${SERVER}:${PORT}/${APP}?${testCase}`);
const end = performance.now();
Member:

Is there a way we can benchmark what's happening in the browser and send that back? Otherwise we'll add a lot of noise from what the browser is doing that is unrelated to the actual benchmark.

Seems like there are plenty of existing APIs for that: https://addyosmani.com/blog/puppeteer-recipes/#runtime-perf-metrics
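For context, puppeteer's page.metrics() exposes Chromium's runtime metrics directly; a minimal sketch (the URL is a placeholder):

```js
// Sketch: reading Chromium runtime metrics with puppeteer's page.metrics().
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://localhost:3000/?box-emotion', { waitUntil: 'networkidle0' });
  // Includes e.g. ScriptDuration, LayoutDuration, RecalcStyleCount, TaskDuration, JSHeapUsedSize.
  const metrics = await page.metrics();
  console.log(metrics);
  await browser.close();
})();
```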

Member Author (@mnajdova):

This looks much better; let me refactor to use metrics().

Member:

It might also be interesting to compare these metrics() to React's Profiler component (see also How to use profiling in production).
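For reference, a minimal sketch of wrapping a test case in React's Profiler (the component being profiled and the way the result is exposed are assumptions):

```jsx
// Sketch: collecting mount timings with React's <Profiler>.
import * as React from 'react';
import Box from '@material-ui/core/Box'; // hypothetical test subject

function handleRender(id, phase, actualDuration, baseDuration, startTime, commitTime) {
  // Expose the result so the puppeteer script could read it via page.evaluate().
  window.profilerResult = { id, phase, actualDuration, baseDuration, startTime, commitTime };
}

export default function ProfiledBoxes() {
  return (
    <React.Profiler id="box-emotion" onRender={handleRender}>
      {Array.from({ length: 100 }, (_, index) => (
        <Box key={index} p={2} />
      ))}
    </React.Profiler>
  );
}
```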

Member Author (@mnajdova):

Refactored using

    const perf = await page.evaluate(_ => {
      const { loadEventEnd, navigationStart } = performance.timing;
      return loadEventEnd - navigationStart;
    });
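For what it's worth, the same number can also be read from the newer Navigation Timing Level 2 entries; a sketch, not what the PR uses:

```js
// Sketch: equivalent reading via the PerformanceNavigationTiming entry.
const perf = await page.evaluate(() => {
  const [navigation] = performance.getEntriesByType('navigation');
  // startTime is 0 for the navigation entry, so this is effectively loadEventEnd.
  return navigation.loadEventEnd - navigation.startTime;
});
```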

Member Author (@mnajdova):

Let me now compare the numbers with React's Profiler component :)

Member Author (@mnajdova):

I've added the Profiler to the box-emotion example. Profiler result:

actualDuration: 19.274999969638884
baseDuration: 15.385000151582062
commitTime: 119.98000007588416
id: "box-emotion"
interactions: Set(0) {}
phase: "mount"
startTime: 98.64500002004206

perf result (10 different runs; this includes the component as well):

Box emotion:

69.00ms
68.00ms
90.00ms
78.00ms
44.00ms
44.00ms
57.00ms
56.00ms
56.00ms
56.00ms
-------------
Avg: 61.80ms

Let me try this on more examples to see if the relative differences are similar.

Member Author (@mnajdova):

Here are more numbers:

  1. box-emotion
Box emotion:

69.00ms
67.00ms
77.00ms
93.00ms
43.00ms
57.00ms
56.00ms
56.00ms
57.00ms
43.00ms
-------------
Avg: 61.80ms
actualDuration: 18.609998980537057
baseDuration: 15.669998829253018
commitTime: 110.84500001743436
id: "box-emotion"
interactions: Set(0) {}
phase: "mount"
startTime: 90.1650000596419
  2. box-styled-components
Box styled-components:

67.00ms
63.00ms
64.00ms
80.00ms
97.00ms
61.00ms
61.00ms
47.00ms
48.00ms
48.00ms
-------------
Avg: 63.60ms
actualDuration: 15.089999069459736
baseDuration: 12.439999729394913
commitTime: 96.09999996609986
id: "naked-styled-components"
interactions: Set(0) {}
phase: "mount"
startTime: 79.29499994497746
  3. box @material-ui/styles
Box @material-ui/styles:

123.00ms
122.00ms
89.00ms
103.00ms
103.00ms
85.00ms
121.00ms
123.00ms
84.00ms
84.00ms
-------------
Avg: 103.70ms
actualDuration: 44.76000100839883
baseDuration: 41.65500064846128
commitTime: 130.44999993871897
id: "box-material-ui-system"
interactions: Set(0) {}
phase: "mount"
startTime: 83.49999994970858
  4. naked-styled-components
Box styled-components:

67.00ms
63.00ms
64.00ms
80.00ms
97.00ms
61.00ms
61.00ms
47.00ms
48.00ms
48.00ms
-------------
Avg: 63.60ms
actualDuration: 15.089999069459736
baseDuration: 12.439999729394913
commitTime: 96.09999996609986
id: "naked-styled-components"
interactions: Set(0) {}
phase: "mount"
startTime: 79.29499994497746

I am not sure how to compare the two sets of numbers, as they are quite different... From both measurements I can conclude, for example, that @material-ui/styles' Box is slower than the rest, but not much else... We could also try rendering more instances of the components; currently I am rendering 100 boxes (a sketch follows below). @eps1lon what do you think?
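A test case that renders a configurable number of instances could look roughly like this (the component import and the count are placeholders, not the benchmark's actual files):

```jsx
// Sketch: a test case rendering N instances of the component under test.
import * as React from 'react';
import Box from '@material-ui/core/Box'; // placeholder for the benchmarked Box variant

const COUNT = 100; // later bumped to 1000 in this PR

export default function BoxesCase() {
  return (
    <React.Fragment>
      {Array.from({ length: COUNT }, (_, index) => (
        <Box key={index} p={1}>
          {index}
        </Box>
      ))}
    </React.Fragment>
  );
}
```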

Member:

I'd keep both: Browser metrics and React metrics.

Member Author (@mnajdova):

Added with 02fc972

@mui-pr-bot commented Oct 7, 2020

No bundle size changes comparing 72ceb2d...2df44b7

Generated by 🚫 dangerJS against 2df44b7

@oliviertassinari (Member) left a comment:

The solution looks simple and seems to do the job. Nice. Regarding the folder location: does it mean that we should move /packages/material-ui-benchmark outside of the packages folder in the future?

@eps1lon (Member) commented Oct 8, 2020

> I have mixed up "mean" with "median" which are both two different types of averages.

I understood what you meant. I meant median as well (hence percentile). Median is not a type of average.

@oliviertassinari (Member)

@eps1lon I guess it depends on the definition. In https://en.wikipedia.org/wiki/Average, they define an average as "a single number taken as representative of a list of numbers" 🤷‍♂️ 🙃

@eps1lon (Member) commented Oct 9, 2020

> @eps1lon I guess it depends on the definition.

You conveniently skipped the first part of that quote: "colloquial", which is not the same as a definition. With "mean" I was being colloquial and meant the "arithmetic mean". But I'm not sure why you deflected to "definitions of average".

We want percentiles in benchmarks, not a discussion of what an average is.

@mnajdova (Member Author)

> I was confused to find the benchmark concern not self-contained. It took me a bit of time to understand that the logic was spread between two different folders. I was expecting everything inside /benchmark.

Moved the script to the /benchmark/scripts folder. In the future, if we want to add more tests like this, we can follow the pattern of adding, for example, a benchmark:system command that runs the system scripts.

> If you create a scenario that does nothing:
>
>     export default function Noop() {
>       return null;
>     }
>
> It reports Avg: 35.00ms. What are we testing? There is an upfront bias that isn't fully stable:
>
> 49.00ms
> 34.00ms
> 33.00ms
> 37.00ms
> 33.00ms
> 34.00ms
> 33.00ms
> 32.00ms
> 32.00ms
> 33.00ms

To be honest, I am mostly interested in the relative difference between the scenarios we are running. If we have some 30-40ms of noise on each scenario, it won't affect the relative difference. Should we add this noop scenario as a reference, or run it and subtract its time from all examples? I wouldn't change anything here at the moment, and we can experiment with adding the Profiler numbers later on.

> It would be great to log the results of each test as they are running. During the first run, I thought that something was wrong. Turns out, it was running but I needed to wait longer.

Done 👍

> It seems that for pure JavaScript object manipulations, relying on Node.js + Benchmark as done in /packages/material-ui-benchmark/, or in https://github.com/smooth-code/xstyled/blob/master/benchmarks/system.js and https://github.com/styled-system/styled-system/blob/c3cf5829f43749e688ca667263a51a5c8875d6f9/benchmarks/test.js, is more reliable. However, it doesn't tell the whole story once integrated inside styled-components/emotion.

Agreed, we should use the browser benchmark for scenarios that run some React code in the browser. I will do another iteration to clean up the examples and leave only the relevant ones in each of the benchmarks we currently have.

> Maybe we could consider using a trace, https://github.com/emotion-js/emotion/blob/c85378a204613885a356eaba1480c5151838c458/scripts/benchmarks/run.js#L62 (but the results might be hard to interpret).

I would go with the simplest thing at the start, and if we need to we can alter it in the future. What do you think?

> I also wonder about the tradeoff of having the user click on buttons in the UI to run the tests, like done in https://necolas.github.io/react-native-web/benchmarks/, or what we have done so far to measure Unstyled vs. JSS vs. emotion, or even running a page of the docs: https://material-ui.com/performance/table-component/.

If we see problems with this, that would be the way to go. However, until we run into them, I would keep the command output, as it is simpler and does not require any interaction.

> I have run the test suite 3 times to see how much reproducibility we can get.

Again, if we use this for the relative difference between the scenarios within one run, the numbers should be stable, in my opinion. I wouldn't use it to compare against a scenario running in a different environment anyway.

> Should we consider the average or the median? It seems that the mean is most frequently used.

Added the median as well as the average; I think those two numbers should be enough for a start. We print all the individual results when running a scenario anyway (a sketch of the calculation follows below).

In addition, I've changed the component scenarios to render 1000 instances of the component instead of 100, so that the numbers show a bigger difference (I tried 10000 as well, but the run time was too long and the relative values were similar, so I decided to stay with 1000).
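A minimal sketch of that calculation (the function name is illustrative, not the script's actual API):

```js
// Sketch: mean and median of the collected run times.
function summarize(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, value) => sum + value, 0) / sorted.length;
  const middle = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 0 ? (sorted[middle - 1] + sorted[middle]) / 2 : sorted[middle];
  return { mean, median };
}

// e.g. summarize([69, 67, 77, 93, 43, 57, 56, 56, 57, 43])
// -> { mean: 61.8, median: 57 }
```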

@oliviertassinari (Member) left a comment:

> I am interested in the relative difference between the scenarios we are running. If we have some 30-40ms of noise on each scenario, it won't affect the relative difference.

Definitely, the relative difference works great, with one limit (more in the next comment). I guess the next person who looks at the results needs to be aware that they can't make any absolute comparison between the results.

> Should we add this noop scenario as a reference?

I think the main value of including the noop scenario is to raise awareness of the above (don't make absolute comparisons) and to know whether a test case is slow enough. If the first test case runs for 5ms ± 1ms and the second runs for 4ms ± 1ms, but the noop takes 30ms ± 5ms, the uncertainty will likely be too high to make any comparison: 35ms ± 6ms vs. 34ms ± 6ms.

> I would go with the simplest thing at the start, and if we need to we can alter it in the future. What do you think?

Agreed, definitely for later (or even never).

@oliviertassinari (Member) commented Oct 11, 2020

I have tried to improve the precision of the measurement in mnajdova#10, to reduce the uncertainty. If we take the noop case, we go from 34ms ± 9ms to 4.63ms ± 0.25ms (I have used the mean for the baseline value, and the difference between the slowest and fastest run divided by two for the ± uncertainty). This was achieved using the following levers:

  • use performance.now(), which gives sub-ms precision.
  • bypass the JavaScript loading, parsing, and evaluation steps. We arguably already track that aspect of performance with the bundle size.
  • measure the time spent only up to the point where layout is finished (roughly sketched after this comment).

@eps1lon do you know how we can measure up to the composite stage? https://developers.google.com/web/fundamentals/performance/rendering But maybe we don't want to do that, as it would only introduce uncertainty without giving us a measurement of something we can optimize.
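A rough sketch of the "measure up to layout" idea, timed with performance.now() inside the page (the render() entry point is an assumption about how the test page might be wired, not the actual code of mnajdova#10):

```js
// Sketch: time the mount inside the page and stop after a forced synchronous layout.
const renderMs = await page.evaluate(() => {
  const start = performance.now();
  // render() is assumed to synchronously mount the test case into the page.
  window.render();
  // Reading a layout-dependent property forces the browser to finish layout.
  document.body.getBoundingClientRect();
  return performance.now() - start;
});
```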

@mnajdova (Member Author)

I've added a noop scenario and a README.md. I think we should merge the changes from mnajdova#10. Thanks @oliviertassinari for trying this alternative approach.

@mnajdova (Member Author)

@eps1lon @oliviertassinari merging this one; I had to rebase twice this morning because of conflicts in yarn.lock. If there are other suggestions, I'll do a second iteration.
