
[system] Add browser benchmark #22923

Merged (51 commits, Oct 12, 2020)

Conversation

@mnajdova (Member) commented Oct 7, 2020

This PR adds a browser benchmark for the system's performance. Try it by running the following command at the root of the project:

yarn benchmark

It will build the benchmark/ project, serve the root, and start the scripts/benchmark.js script.
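At a high level the runner does something along these lines (a sketch only; the port, test-case names, and structure here are assumptions, not the actual contents of scripts/benchmark.js):

```js
// Illustrative sketch of the runner loop; see scripts/benchmark.js for the real code.
const puppeteer = require('puppeteer');

const PORT = 3000; // assumed port of the served benchmark build
const CASES = ['box-emotion', 'box-styled-components']; // assumed test-case names

async function run() {
  const browser = await puppeteer.launch();
  try {
    for (const testCase of CASES) {
      const page = await browser.newPage();
      await page.goto(`http://localhost:${PORT}/?${testCase}`, { waitUntil: 'networkidle0' });
      // ...collect and print timings for this case (see the discussion below)
      await page.close();
    }
  } finally {
    await browser.close();
  }
}

run();
```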

scripts/perf.js (Outdated), comment on lines 32 to 34:
var start = performance.now();
const page = await browser.openPage(`http://${SERVER}:${PORT}/${APP}?${testCase}`);
const end = performance.now();
Member:

Is there a way we can benchmark what's happening in the browser and send that back? Otherwise we'll add a lot of noise from what the browser is doing that is unrelated to the actual benchmark.

Seems like there are plenty of existing APIs for that: https://addyosmani.com/blog/puppeteer-recipes/#runtime-perf-metrics
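For context, puppeteer's page.metrics() exposes Chromium's runtime metrics directly; a minimal sketch (the URL is a placeholder):

```js
// Sketch: reading Chromium runtime metrics with puppeteer's page.metrics().
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://localhost:3000/?box-emotion', { waitUntil: 'networkidle0' });
  // Includes e.g. ScriptDuration, LayoutDuration, RecalcStyleCount, TaskDuration, JSHeapUsedSize.
  const metrics = await page.metrics();
  console.log(metrics);
  await browser.close();
})();
```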

Member Author (@mnajdova):

This looks much better; let me refactor to use metrics().

Member:

It might also be interesting to compare these metrics() to React's Profiler component (see also How to use profiling in production).
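For reference, a minimal sketch of wrapping a test case in React's Profiler (the component being profiled and the way the result is exposed are assumptions):

```jsx
// Sketch: collecting mount timings with React's <Profiler>.
import * as React from 'react';
import Box from '@material-ui/core/Box'; // hypothetical test subject

function handleRender(id, phase, actualDuration, baseDuration, startTime, commitTime) {
  // Expose the result so the puppeteer script could read it via page.evaluate().
  window.profilerResult = { id, phase, actualDuration, baseDuration, startTime, commitTime };
}

export default function ProfiledBoxes() {
  return (
    <React.Profiler id="box-emotion" onRender={handleRender}>
      {Array.from({ length: 100 }, (_, index) => (
        <Box key={index} p={2} />
      ))}
    </React.Profiler>
  );
}
```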

Member Author (@mnajdova):

Refactored using

    const perf = await page.evaluate(_ => {
      const { loadEventEnd, navigationStart } = performance.timing;
      return loadEventEnd - navigationStart;
    });
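For what it's worth, the same number can also be read from the newer Navigation Timing Level 2 entries; a sketch, not what the PR uses:

```js
// Sketch: equivalent reading via the PerformanceNavigationTiming entry.
const perf = await page.evaluate(() => {
  const [navigation] = performance.getEntriesByType('navigation');
  // startTime is 0 for the navigation entry, so this is effectively loadEventEnd.
  return navigation.loadEventEnd - navigation.startTime;
});
```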

Member Author (@mnajdova):

Let me now compare the numbers with React's Profiler component :)

Member Author (@mnajdova):

I've added the Profiler to the box-emotion example. Profiler result:

actualDuration: 19.274999969638884
baseDuration: 15.385000151582062
commitTime: 119.98000007588416
id: "box-emotion"
interactions: Set(0) {}
phase: "mount"
startTime: 98.64500002004206

perf result (10 different runs; this includes the component as well):

Box emotion:

69.00ms
68.00ms
90.00ms
78.00ms
44.00ms
44.00ms
57.00ms
56.00ms
56.00ms
56.00ms
-------------
Avg: 61.80ms

Let me try this on more examples to see if the relative differences are similar.

Member Author (@mnajdova):

Here are more numbers:

  1. box-emotion
Box emotion:

69.00ms
67.00ms
77.00ms
93.00ms
43.00ms
57.00ms
56.00ms
56.00ms
57.00ms
43.00ms
-------------
Avg: 61.80ms
actualDuration: 18.609998980537057
baseDuration: 15.669998829253018
commitTime: 110.84500001743436
id: "box-emotion"
interactions: Set(0) {}
phase: "mount"
startTime: 90.1650000596419
  2. box-styled-components
Box styled-components:

67.00ms
63.00ms
64.00ms
80.00ms
97.00ms
61.00ms
61.00ms
47.00ms
48.00ms
48.00ms
-------------
Avg: 63.60ms
actualDuration: 15.089999069459736
baseDuration: 12.439999729394913
commitTime: 96.09999996609986
id: "naked-styled-components"
interactions: Set(0) {}
phase: "mount"
startTime: 79.29499994497746
  3. box @material-ui/styles
Box @material-ui/styles:

123.00ms
122.00ms
89.00ms
103.00ms
103.00ms
85.00ms
121.00ms
123.00ms
84.00ms
84.00ms
-------------
Avg: 103.70ms
actualDuration: 44.76000100839883
baseDuration: 41.65500064846128
commitTime: 130.44999993871897
id: "box-material-ui-system"
interactions: Set(0) {}
phase: "mount"
startTime: 83.49999994970858
  4. naked-styled-components
Box styled-components:

67.00ms
63.00ms
64.00ms
80.00ms
97.00ms
61.00ms
61.00ms
47.00ms
48.00ms
48.00ms
-------------
Avg: 63.60ms
actualDuration: 15.089999069459736
baseDuration: 12.439999729394913
commitTime: 96.09999996609986
id: "naked-styled-components"
interactions: Set(0) {}
phase: "mount"
startTime: 79.29499994497746

I am not sure how to compare the two sets of numbers, as they are quite different... From both measurements I can conclude, for example, that @material-ui/styles' Box is slower than the rest, but not much else... We could also try rendering more instances of the components; currently I am rendering 100 boxes (a sketch follows below). @eps1lon what do you think?
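A test case that renders a configurable number of instances could look roughly like this (the component import and the count are placeholders, not the benchmark's actual files):

```jsx
// Sketch: a test case rendering N instances of the component under test.
import * as React from 'react';
import Box from '@material-ui/core/Box'; // placeholder for the benchmarked Box variant

const COUNT = 100; // later bumped to 1000 in this PR

export default function BoxesCase() {
  return (
    <React.Fragment>
      {Array.from({ length: COUNT }, (_, index) => (
        <Box key={index} p={1}>
          {index}
        </Box>
      ))}
    </React.Fragment>
  );
}
```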

Member:

I'd keep both: Browser metrics and React metrics.

Member Author (@mnajdova):

Added with 02fc972

@mui-pr-bot commented Oct 7, 2020

No bundle size changes comparing 72ceb2d...2df44b7

Generated by 🚫 dangerJS against 2df44b7

@oliviertassinari (Member) left a comment:

The solution looks simple and seems to do the job. Nice. Regarding the folder location: does it mean that we should move /packages/material-ui-benchmark outside of the packages folder in the future?

@eps1lon (Member) commented Oct 8, 2020

> I have mixed up "mean" with "median" which are both two different types of averages.

I understood what you meant. I meant median as well (hence percentile). Median is not a type of average.

@oliviertassinari (Member)

@eps1lon I guess it depends on the definition. In https://en.wikipedia.org/wiki/Average, they define an average as "a single number taken as representative of a list of numbers" 🤷‍♂️ 🙃

@eps1lon (Member) commented Oct 9, 2020

> @eps1lon I guess it depends on the definition.

You conveniently skipped the first part of that quote: "colloquial", which is not the same as a definition. With "mean" I was being colloquial and meant the "arithmetic mean". But I'm not sure why you deflected to "definitions of average".

We want percentiles in benchmarks, not a discussion of what an average is.

@mnajdova (Member Author)

> I was confused to find the benchmark concern not self-contained. It took me a bit of time to understand that the logic was spread between two different folders. I was expecting everything inside /benchmark.

Moved the script to the /benchmark/scripts folder. In the future, if we want to add more tests like this, we can follow the pattern of adding, for example, a benchmark:system command that runs the system scripts.

> If you create a scenario that does nothing:
>
>     export default function Noop() {
>       return null;
>     }
>
> It reports Avg: 35.00ms. What are we testing? There is an upfront bias that isn't fully stable:
>
> 49.00ms
> 34.00ms
> 33.00ms
> 37.00ms
> 33.00ms
> 34.00ms
> 33.00ms
> 32.00ms
> 32.00ms
> 33.00ms

To be honest, I am mostly interested in the relative difference between the scenarios we are running. If we have some 30-40ms of noise on each scenario, it won't affect the relative difference. Should we add this noop scenario as a reference, or run it and subtract its time from all examples? I wouldn't change anything here at the moment, and we can experiment with adding the Profiler numbers later on.

> It would be great to log the results of each test as they are running. During the first run, I thought that something was wrong. Turns out, it was running but I needed to wait longer.

Done 👍

> It seems that for pure JavaScript object manipulations, relying on Node.js + Benchmark as done in /packages/material-ui-benchmark/, or in https://github.com/smooth-code/xstyled/blob/master/benchmarks/system.js and https://github.com/styled-system/styled-system/blob/c3cf5829f43749e688ca667263a51a5c8875d6f9/benchmarks/test.js, is more reliable. However, it doesn't tell the whole story once integrated inside styled-components/emotion.

Agreed, we should use the browser benchmark for scenarios that run some React code in the browser. I will do another iteration to clean up the examples and leave only the relevant ones in each of the benchmarks we currently have.

> Maybe we could consider using a trace, https://github.com/emotion-js/emotion/blob/c85378a204613885a356eaba1480c5151838c458/scripts/benchmarks/run.js#L62 (but the results might be hard to interpret).

I would go with the simplest thing at the start, and if we need to we can alter it in the future. What do you think?

> I also wonder about the tradeoff of having the user click on buttons in the UI to run the tests, like done in https://necolas.github.io/react-native-web/benchmarks/, or what we have done so far to measure Unstyled vs. JSS vs. emotion, or even running a page of the docs: https://material-ui.com/performance/table-component/.

If we see problems with this, that would be the way to go. However, until we run into them, I would keep the command output, as it is simpler and does not require any interaction.

> I have run the test suite 3 times to see how much reproducibility we can get.

Again, if we use this for the relative difference between the scenarios within one run, the numbers should be stable, in my opinion. I wouldn't use it to compare against a scenario running in a different environment anyway.

> Should we consider the average or the median? It seems that the mean is most frequently used.

Added the median as well as the average; I think those two numbers should be enough for a start. We print all the individual results when running a scenario anyway (a sketch of the calculation follows below).

In addition, I've changed the component scenarios to render 1000 instances of the component instead of 100, so that the numbers show a bigger difference (I tried 10000 as well, but the run time was too long and the relative values were similar, so I decided to stay with 1000).
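A minimal sketch of that calculation (the function name is illustrative, not the script's actual API):

```js
// Sketch: mean and median of the collected run times.
function summarize(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, value) => sum + value, 0) / sorted.length;
  const middle = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 0 ? (sorted[middle - 1] + sorted[middle]) / 2 : sorted[middle];
  return { mean, median };
}

// e.g. summarize([69, 67, 77, 93, 43, 57, 56, 56, 57, 43])
// -> { mean: 61.8, median: 57 }
```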

@oliviertassinari (Member) left a comment:

> I am interested in the relative difference between the scenarios we are running. If we have some 30-40ms of noise on each scenario, it won't affect the relative difference.

Definitely, the relative difference works great, with one limit (more in the next comment). I guess the next person who looks at the results needs to be aware that they can't make any absolute comparison between the results.

> Should we add this noop scenario as a reference?

I think the main value of including the noop scenario is to raise awareness of the above (don't make absolute comparisons) and to know whether a test case is slow enough. If the first test case runs for 5ms ± 1ms and the second runs for 4ms ± 1ms, but the noop takes 30ms ± 5ms, the uncertainty will likely be too high to make any comparison: 35ms ± 6ms vs. 34ms ± 6ms.

> I would go with the simplest thing at the start, and if we need to we can alter it in the future. What do you think?

Agreed, definitely for later (or even never).

@oliviertassinari (Member) commented Oct 11, 2020

I have tried to improve the precision of the measurement in mnajdova#10, to reduce the uncertainty. If we take the noop case, we go from 34ms ± 9ms to 4.63ms ± 0.25ms (I have used the mean for the baseline value, and the difference between the slowest and fastest run divided by two for the ± uncertainty). This was achieved using the following levers:

  • use performance.now(), which gives sub-ms precision.
  • bypass the JavaScript loading, parsing, and evaluation steps. We arguably already track that aspect of performance with the bundle size.
  • measure the time spent only up to the point where layout is finished (roughly sketched after this comment).

@eps1lon do you know how we can measure up to the composite stage? https://developers.google.com/web/fundamentals/performance/rendering But maybe we don't want to do that, as it would only introduce uncertainty without giving us a measurement of something we can optimize.
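A rough sketch of the "measure up to layout" idea, timed with performance.now() inside the page (the render() entry point is an assumption about how the test page might be wired, not the actual code of mnajdova#10):

```js
// Sketch: time the mount inside the page and stop after a forced synchronous layout.
const renderMs = await page.evaluate(() => {
  const start = performance.now();
  // render() is assumed to synchronously mount the test case into the page.
  window.render();
  // Reading a layout-dependent property forces the browser to finish layout.
  document.body.getBoundingClientRect();
  return performance.now() - start;
});
```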

@mnajdova (Member Author)

I've added a noop scenario and a README.md. I think we should merge the changes from mnajdova#10. Thanks @oliviertassinari for trying this alternative approach.

@mnajdova (Member Author)

@eps1lon @oliviertassinari merging this one; I had to rebase twice this morning because of conflicts in yarn.lock. If there are other suggestions, I'll do a second iteration.
