Showing JS-only, Browser-only, and JS+Browser measurements #1233

Closed · fabiospampinato opened this issue Apr 8, 2023 · 21 comments

@fabiospampinato
Contributor

As I understand it, tests like "create many rows" measure the entire change: not just how long the JavaScript took to execute, but also how long style recalculation, layout recalculation, and painting took.

If that's the case, it'd be interesting to see results under three different filters: browser-only measurements (to check whether an implementation causes abnormal layout recalculations, for example), JavaScript-only measurements (to check how much work the framework itself is doing, which, assuming everything else is normal, is largely what actually matters in the benchmark), and a combined view where everything is taken into account.

The more practical motivation is that I think I might know how to halve the amount of JavaScript work needed in some unkeyed tests (maybe some keyed tests too, depending on the exact definition one is using), but if the displayed number is dwarfed by style/layout/paint work the improvement will look minor, even though it's actually pretty significant.

@leeoniya
Contributor

leeoniya commented Apr 8, 2023

i've been championing this for a while, along with reducing the 16x slowdown to maybe 4x and reducing the DOM size. so far there's been little enthusiasm.

even very heavy apps like Slack consistently stay well below 10k DOM nodes, while this bench generates 88k for the "create 10k rows + append 1k" metric.

see #403 (comment), #403 (comment)

@fabiospampinato
Contributor Author

If the purpose of the slowdown is to make some tests weigh more than they otherwise would in the average, the slowdown could potentially be removed and the result multiplied by some constant instead. That should achieve the same effect, and perhaps also reduce the time it takes to run the benchmark. I'm also not exactly sure how the slowdown is implemented in Chrome, so multiplying the unthrottled result by a constant might even be more reliable.

80k+ nodes is kind of absurd, though I think the point of that test is more to see how the framework scales, and perhaps to magnify small problems that exist at lower row counts but go unnoticed. We could probably switch to a 100-row base case and a 1,000-row "worst" case without losing too much information. That'd be worthwhile if it makes the benchmark significantly cheaper to run.
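(For reference on the slowdown discussed above: as far as I know it is Chrome's CPU throttling emulation, driven over the DevTools protocol. A minimal sketch of how a driver might apply it with puppeteer — the 16x factor and the surrounding setup are assumptions, not the benchmark's actual driver code:)

```ts
import puppeteer from "puppeteer";

// Sketch: apply Chrome's CPU throttling (CDP Emulation.setCPUThrottlingRate),
// which puppeteer exposes as page.emulateCPUThrottling(factor).
async function runThrottledStep(factor: number) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.emulateCPUThrottling(factor); // e.g. 16 today, or 4 as proposed above
  // ... navigate to the implementation and run the benchmark step here ...
  await page.emulateCPUThrottling(1);      // disable throttling again
  await browser.close();
}
```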


@krausest
Owner

krausest commented May 5, 2023

I investigated measuring JS script duration only. The idea is to compute it as the delta of page.metrics().ScriptDuration for puppeteer (or of the corresponding CDP command Performance.getMetrics), since I haven't found a way to extract this directly from the traces.
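A minimal sketch of that delta approach, assuming puppeteer (the helper name and the step callback are illustrative):

```ts
import type { Page } from "puppeteer";

// Sketch: ScriptDuration from page.metrics() is reported in seconds, so the
// before/after delta around a benchmark step gives the JS time in msecs.
async function scriptDurationMs(page: Page, runStep: () => Promise<void>): Promise<number> {
  const before = await page.metrics();
  await runStep(); // e.g. click "create 1,000 rows" and wait for the paint to finish
  const after = await page.metrics();
  return ((after.ScriptDuration ?? 0) - (before.ScriptDuration ?? 0)) * 1000;
}
```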

Here's an example for alpine and create rows. The total duration is 99 msecs and the JS script duration 59 msecs. The test driver reports 104 msecs and 62.5 msecs respectively, which sounds plausible.
(screenshot: Chrome performance trace for alpine)
The same for voby: a total duration of 43.15 msecs and a JS script duration of 5 msecs. The test driver reports 41.4 msecs for the total duration and 4.9 msecs for the script duration, which also sounds good.
(screenshot: Chrome performance trace for voby)
Here's a comparison for some frameworks that seem plausible.

(screenshot: comparison table of total vs. script durations for several frameworks)

I did this for all frameworks during the chrome 113 run, but some results are just too good to believe and must be investigated:
(screenshot: table with implausibly low script durations for some frameworks)
(Values shown as 0.0 are zero due to rounding, not because the duration is actually 0.)

@krausest
Owner

krausest commented May 5, 2023

For ember I'd actually expect ~28 msecs:
(screenshot: Chrome profiler summary for ember showing ~28 msecs of scripting)

Hmmm. Even if I add a large sleep after running the benchmark, both puppeteer and playwright report something like ScriptDuration = 0.027602 before runBenchmark and 0.028274 after it, which yields a duration of 0.672 msecs. The timestamps show that the values are about 1 sec apart (due to the wait). Not good.

@leeoniya
Contributor

leeoniya commented May 5, 2023

great to see some progress!

i would also make sure to include any gc cost in this, if it's not already part of "script". i've seen it under "system" in chrome's profiler summary (especially the forced gc at the end of a run).

@ClassicOldSong
Contributor

One opinion: how the script manipulates the DOM still counts. One framework may take much less time scripting but perform many more duplicated DOM operations, while another may take somewhat more time scripting but spend significantly less on DOM operations. These values can be measured separately, but they have to be evaluated together.

@fabiospampinato
Contributor Author

fabiospampinato commented May 6, 2023

To give a higher cost to inefficient DOM operations one could just add 10 mutation observers to the page, or something like that; those things are slow.

The problem is that nearly everybody uses .appendChild because, measured in isolation, it's faster for no good reason, while .append is faster if there are any mutation observers on the page, potentially by a huge amount.
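A hypothetical sketch of that idea — a test page could register a handful of no-op observers so every DOM mutation carries extra bookkeeping cost (the observer count and options are arbitrary):

```ts
// Hypothetical: attach several no-op MutationObservers so that duplicated or
// inefficient DOM operations become measurably more expensive.
for (let i = 0; i < 10; i++) {
  new MutationObserver(() => {}).observe(document.body, {
    childList: true,
    subtree: true,
    attributes: true,
    characterData: true,
  });
}
```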

@ClassicOldSong
Contributor

That's still not a very good way to simulate slow repaints/reflows. Moving empty text nodes around costs almost nothing if there are no mutation observers, but with multiple observers added it imposes unrealistic costs on these originally cheap operations.

@leeoniya
Contributor

leeoniya commented May 6, 2023

One opinion: how the script manipulates the DOM still counts.

agreed. this whole js/gc-only measurement exercise only makes sense for frameworks that already do near-optimal DOM ops with identical repaint/reflow costs.

however, there could be a different approach here.

instead of trying to measure script execution directly, why don't we measure the fastest possible restyle+reflow+paint and simply subtract this from the totals? that would hopefully cover all cases, including frameworks that do duplicate/inefficient DOM ops.
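A rough sketch of that subtraction, assuming per-benchmark results that already separate render time from the total (the type and field names are illustrative):

```ts
interface StepResult {
  framework: string;
  totalMs: number;  // click start to end of paint
  renderMs: number; // restyle + reflow + paint portion
}

// Hypothetical: for one benchmark step, take the fastest render time observed
// across all implementations as the baseline, and charge everything above it
// (script, GC, duplicated DOM work, excess layout) to the framework.
function baselineAdjusted(results: StepResult[]): Map<string, number> {
  const baseline = Math.min(...results.map((r) => r.renderMs));
  return new Map(results.map((r) => [r.framework, r.totalMs - baseline]));
}
```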

@leeoniya
Contributor

leeoniya commented May 6, 2023

here's another interesting project: https://github.com/yamiteru/isitfast

@ClassicOldSong
Contributor

ClassicOldSong commented May 6, 2023

instead of trying to measure script execution directly, why don't we measure the fastest possible restyle+reflow+paint and simply subtract this from the totals? that would hopefully cover all cases, including frameworks that do duplicate/inefficient DOM ops.

That's an interesting idea, but it's hard to determine what the fastest possible time is. We have multiple different implementations with different pros and cons, and different approaches may have different limitations. It's very hard to find an even ground for these measurements.

@leeoniya
Contributor

leeoniya commented May 6, 2023

We have multiple different implementations with different pros and cons

this benchmark does not attempt to pick the fastest framework in all categories. there will likely never be an implementation that has 1.00 across the board. each metric is ranked in isolation, which makes the proposed approach consistent with how it already works.

@krausest
Owner

krausest commented May 8, 2023

I implemented a first version that tries to compute script duration from the trace files, since I couldn't get reasonable values from performance.getMetrics:

It seems to work for some nasty cases like ui5-webcomponents:
(screenshot: Chrome performance trace for ui5-webcomponents)
The script duration is the sum of the two yellow top-level boxes and yields 13.082 msecs, which corresponds to Chrome's script duration.
In the preview you can choose the duration measurement mode in the dropdown: total duration, which is the time elapsed between the start of the click event and the end of repainting, and script duration, which should roughly equal the sum of all yellow boxes (except garbage collection) at the top level: https://krausest.github.io/js-framework-benchmark/current.html (you may have to clear the cache to get the latest version).

Looking forward to your feedback. I haven't checked enough values yet to be confident that all values are correct.

@leeoniya
Contributor

leeoniya commented May 8, 2023

i'm not sure if script-only is a good metric. i do think that "everything except baseline reflow+restyle+repaint" is what we need. for swap rows, the js-only view does not capture the extra ~110ms of DOM/GC inefficiency that shows up in the total:

total:

(screenshot: trace for swap rows, total duration)

js-only:

(screenshot: trace for swap rows, js-only duration)

@ClassicOldSong
Contributor

As the author of ef.js, I lost my first place in swap rows under js-only; that's not good 😈

Kidding, but I agree that GC time should be taken into account as part of scripting.

And still, what baseline should we take for reflow+restyle+repaint?

@leeoniya
Contributor

leeoniya commented May 8, 2023

And still, what baseline should we take for reflow+restyle+repaint?

whatever implementation is fastest in reflow+restyle+repaint for each metric.

@krausest
Owner

I came back to this issue. We have an established way to compute total duration (end of paint - start of click). I created a way to measure JS duration as the sum of the durations of all events named "EventDispatch", "EvaluateScript", "v8.evaluateModule", "FunctionCall", "TimerFire", "FireIdleCallback", "FireAnimationFrame", "RunMicrotasks", or "V8.Execute". This gives the table above, and it seems to be close to what Chrome displays as scripting.
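Roughly, that computation amounts to summing the durations of complete trace events whose names fall into that set. A sketch over raw Chrome trace JSON — the field names follow the trace event format, but the helper itself is illustrative and, unlike the actual driver, does not merge overlapping or nested intervals:

```ts
interface TraceEvent {
  name: string;
  ph: string;   // phase; "X" marks complete events that carry a duration
  ts: number;   // start timestamp in microseconds
  dur?: number; // duration in microseconds
}

const SCRIPT_EVENTS = new Set([
  "EventDispatch", "EvaluateScript", "v8.evaluateModule", "FunctionCall",
  "TimerFire", "FireIdleCallback", "FireAnimationFrame", "RunMicrotasks", "V8.Execute",
]);

// Naive sum of event durations (microseconds -> msecs) for a given set of names.
// Note: nested events (e.g. a FunctionCall inside an EventDispatch) would be
// double-counted here; proper interval handling is needed for exact numbers.
function sumEventDurationsMs(events: TraceEvent[], names: Set<string>): number {
  return (
    events
      .filter((e) => e.ph === "X" && names.has(e.name) && e.dur !== undefined)
      .reduce((total, e) => total + (e.dur as number), 0) / 1000
  );
}
```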

If browser-only meant total duration minus JS duration, the results would get odd. Miso is fastest for create 1k with 56 msecs total duration and 23 msecs JS duration (giving 33 msecs browser-only), whilst vanillajs has 39 msecs total duration and 2 msecs JS duration, which gives 37 msecs browser-only. Sorted by create 1k the table looks like this:
(screenshot: results table sorted by create 1k in browser-only mode)

It seems to make little sense for create 1,000 rows. Some benchmarks are a little more interesting, like remove row:
(screenshot: results table for remove row in browser-only mode)

Maybe it makes more sense to compute the lengths of all painting and layout intervals as a third factor?

The current results table lets one play with the three options.

@krausest
Owner

I added an "only render duration" selection. It computes the duration as the sum of all intervals for the "UpdateLayoutTree", "Layout", "Commit", "Paint", "Layerize", and "PrePaint" events.
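With the same kind of summation as in the script-duration sketch above, the render side would look something like this (again illustrative, reusing the assumed sumEventDurationsMs helper):

```ts
const RENDER_EVENTS = new Set([
  "UpdateLayoutTree", "Layout", "Commit", "Paint", "Layerize", "PrePaint",
]);

// Render-only duration for one trace, using the helper sketched earlier.
const renderDurationMs = (events: TraceEvent[]) => sumEventDurationsMs(events, RENDER_EVENTS);
```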

I haven't had the time to do a big quality check yet, so please take the results with a grain of salt.

@krausest
Owner

krausest commented Nov 26, 2023

I ran a check that total time >= script time + paint time. That assertion holds for most runs, except for 21 traces (openui causes 15 of those for replace rows; all other violations are below 1 msec, so they don't matter much). The trace looks like this:
(screenshot: Chrome trace for openui replace rows, with script and style recalculation overlapping)
My interval logic computes:
total duration = 45.898
script duration = 23.182
paint duration = 32.722
Those numbers look right, but the sum of script + paint is misleading: it exceeds the total. The issue seems to be that script evaluation and style recalculation happen in parallel (or the style recalculation is triggered from within script evaluation), so the two intervals overlap.
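A hedged illustration of why summing the two categories can overshoot: if script and style-recalculation intervals overlap, merging the intervals before summing keeps the combined duration from exceeding the total (the interval shape is made up for the example):

```ts
interface Interval {
  start: number; // ms
  end: number;   // ms
}

// Merge overlapping intervals before summing, so time spent in a style recalc
// that runs inside (or in parallel with) script execution is counted only once.
function mergedDurationMs(intervals: Interval[]): number {
  const sorted = [...intervals].sort((a, b) => a.start - b.start);
  let total = 0;
  let current: Interval | null = null;
  for (const iv of sorted) {
    if (current && iv.start <= current.end) {
      current.end = Math.max(current.end, iv.end); // extend the open interval
    } else {
      if (current) total += current.end - current.start;
      current = { start: iv.start, end: iv.end };
    }
  }
  if (current) total += current.end - current.start;
  return total;
}
```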

@krausest
Owner

The results table now lets you choose between total duration, JS-only duration, and render duration.
