Showing JS-only, Browser-only, and JS+Browser measurements #1233

Closed · fabiospampinato opened this issue Apr 8, 2023 · 21 comments

@fabiospampinato
Contributor

As I understand it, tests like "create many rows" measure the entire change: not just how long the JavaScript took to execute, but also how long style recalculation, layout recalculation, and painting took.

If that's the case, it'd be interesting to see results under three different filters: browser-only measurements (to check whether an implementation causes abnormal layout recalculations, for example), JavaScript-only measurements (to check how much work the framework itself is doing, which, assuming everything else is normal, is largely what actually matters in the benchmark), and a combined view where everything is taken into account.

The more practical motivation is that I think I might know how to halve the amount of JavaScript work needed in some unkeyed tests (maybe some keyed tests too, depending on the exact definition one is using), but if the displayed number is dwarfed by style/layout/paint work the improvement will look minor, even though it's actually pretty significant.

@leeoniya
Contributor

leeoniya commented Apr 8, 2023

i've been championing this for a while, along with reducing the 16x slowdown to maybe 4x and reducing the DOM size. so far there's been little enthusiasm.

even very heavy apps like Slack consistently stay well below 10k DOM nodes, while this bench generates 88k for the "create 10k rows + append 1k" metric.

see #403 (comment), #403 (comment)

@fabiospampinato
Contributor Author

If the purpose of the slowdown is to make some tests weigh more than they otherwise would in the average, the slowdown could potentially be removed and the result multiplied by some constant instead. That should achieve the same effect, and perhaps also reduce the time it takes to run the benchmark. I'm also not exactly sure how the slowdown is implemented in Chrome, so multiplying the unthrottled result by a constant might even be more reliable.

80k+ nodes is kind of absurd, though I think the point of that test is more to see how the framework scales, and perhaps to magnify small problems that exist at lower row counts but go unnoticed. We could probably switch to a 100-row base case and a 1,000-row "worst" case without losing too much information. That'd be worthwhile if it makes the benchmark significantly cheaper to run.
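(For reference on the slowdown discussed above: as far as I know it is Chrome's CPU throttling emulation, driven over the DevTools protocol. A minimal sketch of how a driver might apply it with puppeteer — the 16x factor and the surrounding setup are assumptions, not the benchmark's actual driver code:)

```ts
import puppeteer from "puppeteer";

// Sketch: apply Chrome's CPU throttling (CDP Emulation.setCPUThrottlingRate),
// which puppeteer exposes as page.emulateCPUThrottling(factor).
async function runThrottledStep(factor: number) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.emulateCPUThrottling(factor); // e.g. 16 today, or 4 as proposed above
  // ... navigate to the implementation and run the benchmark step here ...
  await page.emulateCPUThrottling(1);      // disable throttling again
  await browser.close();
}
```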


@krausest
Owner

krausest commented May 5, 2023

I investigated measuring JS script duration only. The idea is to compute it as the delta of page.metrics().ScriptDuration for puppeteer (or of the corresponding CDP command Performance.getMetrics), since I haven't found a way to extract this directly from the traces.
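A minimal sketch of that delta approach, assuming puppeteer (the helper name and the step callback are illustrative):

```ts
import type { Page } from "puppeteer";

// Sketch: ScriptDuration from page.metrics() is reported in seconds, so the
// before/after delta around a benchmark step gives the JS time in msecs.
async function scriptDurationMs(page: Page, runStep: () => Promise<void>): Promise<number> {
  const before = await page.metrics();
  await runStep(); // e.g. click "create 1,000 rows" and wait for the paint to finish
  const after = await page.metrics();
  return ((after.ScriptDuration ?? 0) - (before.ScriptDuration ?? 0)) * 1000;
}
```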

Here's an example for alpine and create rows. The total duration is 99 msecs and the JS script duration 59 msecs. The test driver reports 104 msecs and 62.5 msecs respectively, which sounds plausible.
(screenshot: Chrome performance trace for alpine)
The same for voby: a total duration of 43.15 msecs and a JS script duration of 5 msecs. The test driver reports 41.4 msecs for the total duration and 4.9 msecs for the script duration, which also sounds good.
(screenshot: Chrome performance trace for voby)
Here's a comparison for some frameworks that seem plausible.

(screenshot: comparison table of total vs. script durations for several frameworks)

I did this for all frameworks during the chrome 113 run, but some results are just too good to believe and must be investigated:
(screenshot: table with implausibly low script durations for some frameworks)
(Values shown as 0.0 are zero due to rounding, not because the duration is actually 0.)

@krausest
Owner

krausest commented May 5, 2023

For ember I'd actually expect ~28 msecs:
(screenshot: Chrome profiler summary for ember showing ~28 msecs of scripting)

Hmmm. Even if I add a large sleep after running the benchmark, both puppeteer and playwright report something like ScriptDuration = 0.027602 before runBenchmark and 0.028274 after it, which yields a duration of 0.672 msecs. The timestamps show that the values are about 1 sec apart (due to the wait). Not good.

@leeoniya
Contributor

leeoniya commented May 5, 2023

great to see some progress!

i would also make sure to include any gc cost in this, if it's not already part of "script". i've seen it under "system" in chrome's profiler summary (especially the forced gc at the end of a run).

@ClassicOldSong
Contributor

One opinion: how the script manipulates the DOM still counts. One framework may take much less time scripting but perform many more duplicated DOM operations, while another may take somewhat more time scripting but spend significantly less on DOM operations. These values can be measured separately, but they have to be evaluated together.

@fabiospampinato
Contributor Author

fabiospampinato commented May 6, 2023

To give a higher cost to inefficient DOM operations one could just add 10 mutation observers to the page, or something like that; those things are slow.

The problem is that nearly everybody uses .appendChild because, measured in isolation, it's faster for no good reason, while .append is faster if there are any mutation observers on the page, potentially by a huge amount.
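A hypothetical sketch of that idea — a test page could register a handful of no-op observers so every DOM mutation carries extra bookkeeping cost (the observer count and options are arbitrary):

```ts
// Hypothetical: attach several no-op MutationObservers so that duplicated or
// inefficient DOM operations become measurably more expensive.
for (let i = 0; i < 10; i++) {
  new MutationObserver(() => {}).observe(document.body, {
    childList: true,
    subtree: true,
    attributes: true,
    characterData: true,
  });
}
```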

@ClassicOldSong
Contributor

That's still not a very good way to simulate slow repaints/reflows. Moving empty text nodes around costs almost nothing if there are no mutation observers, but with multiple observers added it imposes unrealistic costs on these originally cheap operations.

@leeoniya
Contributor

leeoniya commented May 6, 2023

One opinion: how the script manipulates the DOM still counts.

agreed. this whole js/gc-only measurement exercise only makes sense for frameworks that already do near-optimal DOM ops with identical repaint/reflow costs.

however, there could be a different approach here.

instead of trying to measure script execution directly, why don't we measure the fastest possible restyle+reflow+paint and simply subtract this from the totals? that would hopefully cover all cases, including frameworks that do duplicate/inefficient DOM ops.
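A rough sketch of that subtraction, assuming per-benchmark results that already separate render time from the total (the type and field names are illustrative):

```ts
interface StepResult {
  framework: string;
  totalMs: number;  // click start to end of paint
  renderMs: number; // restyle + reflow + paint portion
}

// Hypothetical: for one benchmark step, take the fastest render time observed
// across all implementations as the baseline, and charge everything above it
// (script, GC, duplicated DOM work, excess layout) to the framework.
function baselineAdjusted(results: StepResult[]): Map<string, number> {
  const baseline = Math.min(...results.map((r) => r.renderMs));
  return new Map(results.map((r) => [r.framework, r.totalMs - baseline]));
}
```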

@leeoniya
Contributor

leeoniya commented May 6, 2023

here's another interesting project: https://github.com/yamiteru/isitfast

@ClassicOldSong
Contributor

ClassicOldSong commented May 6, 2023

instead of trying to measure script execution directly, why don't we measure the fastest possible restyle+reflow+paint and simply subtract this from the totals? that would hopefully cover all cases, including frameworks that do duplicate/inefficient DOM ops.

That's an interesting idea, but it's hard to determine what the fastest possible time is. We have multiple different implementations with different pros and cons, and different approaches may have different limitations. It's very hard to find an even ground for these measurements.

@leeoniya
Contributor

leeoniya commented May 6, 2023

We have multiple different implementations with different pros and cons

this benchmark does not attempt to pick the fastest framework in all categories. there will likely never be an implementation that has 1.00 across the board. each metric is ranked in isolation, which makes the proposed approach consistent with how it already works.

@krausest
Owner

krausest commented May 8, 2023

I implemented a first version that tries to compute script duration from the trace files, since I couldn't get reasonable values from performance.getMetrics:

It seems to work for some nasty cases like ui5-webcomponents:
(screenshot: Chrome performance trace for ui5-webcomponents)
The script duration is the sum of the two yellow top-level boxes and yields 13.082 msecs, which corresponds to Chrome's script duration.
In the preview you can choose the duration measurement mode in the dropdown: total duration, which is the time elapsed between the start of the click event and the end of repainting, and script duration, which should roughly equal the sum of all yellow boxes (except garbage collection) at the top level: https://krausest.github.io/js-framework-benchmark/current.html (you may have to clear the cache to get the latest version).

Looking forward to your feedback. I haven't checked enough values yet to be confident that all values are correct.

@leeoniya
Contributor

leeoniya commented May 8, 2023

i'm not sure if script-only is a good metric. i do think that "everything except baseline reflow+restyle+repaint" is what we need. for swap rows, the js-only view does not capture the extra ~110ms of DOM/GC inefficiency that shows up in the total:

total:

(screenshot: trace for swap rows, total duration)

js-only:

(screenshot: trace for swap rows, js-only duration)

@ClassicOldSong
Contributor

As the author of ef.js, I lost my first place in swap rows under js-only; that's not good 😈

Kidding, but I agree that GC time should be taken into account as part of scripting.

And still, what baseline should we take for reflow+restyle+repaint?

@leeoniya
Contributor

leeoniya commented May 8, 2023

And still, what baseline should we take for reflow+restyle+repaint?

whatever implementation is fastest in reflow+restyle+repaint for each metric.

@krausest
Owner

I came back to this issue. We have an established way to compute total duration (end of paint - start of click). I created a way to measure JS duration as the sum of the durations of all events named "EventDispatch", "EvaluateScript", "v8.evaluateModule", "FunctionCall", "TimerFire", "FireIdleCallback", "FireAnimationFrame", "RunMicrotasks", or "V8.Execute". This gives the table above, and it seems to be close to what Chrome displays as scripting.
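Roughly, that computation amounts to summing the durations of complete trace events whose names fall into that set. A sketch over raw Chrome trace JSON — the field names follow the trace event format, but the helper itself is illustrative and, unlike the actual driver, does not merge overlapping or nested intervals:

```ts
interface TraceEvent {
  name: string;
  ph: string;   // phase; "X" marks complete events that carry a duration
  ts: number;   // start timestamp in microseconds
  dur?: number; // duration in microseconds
}

const SCRIPT_EVENTS = new Set([
  "EventDispatch", "EvaluateScript", "v8.evaluateModule", "FunctionCall",
  "TimerFire", "FireIdleCallback", "FireAnimationFrame", "RunMicrotasks", "V8.Execute",
]);

// Naive sum of event durations (microseconds -> msecs) for a given set of names.
// Note: nested events (e.g. a FunctionCall inside an EventDispatch) would be
// double-counted here; proper interval handling is needed for exact numbers.
function sumEventDurationsMs(events: TraceEvent[], names: Set<string>): number {
  return (
    events
      .filter((e) => e.ph === "X" && names.has(e.name) && e.dur !== undefined)
      .reduce((total, e) => total + (e.dur as number), 0) / 1000
  );
}
```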

If browser-only meant total duration minus JS duration, the results would get odd. Miso is fastest for create 1k with 56 msecs total duration and 23 msecs JS duration (giving 33 msecs browser-only), whilst vanillajs has 39 msecs total duration and 2 msecs JS duration, which gives 37 msecs browser-only. Sorted by create 1k the table looks like this:
(screenshot: results table sorted by create 1k in browser-only mode)

It seems to make little sense for create 1,000 rows. Some benchmarks are a little more interesting, like remove row:
(screenshot: results table for remove row in browser-only mode)

Maybe it makes more sense to compute the lengths of all painting and layout intervals as a third factor?

The current results table lets one play with the three options.

@krausest
Owner

I added an "only render duration" selection. It computes the duration as the sum of all intervals for the "UpdateLayoutTree", "Layout", "Commit", "Paint", "Layerize", and "PrePaint" events.
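With the same kind of summation as in the script-duration sketch above, the render side would look something like this (again illustrative, reusing the assumed sumEventDurationsMs helper):

```ts
const RENDER_EVENTS = new Set([
  "UpdateLayoutTree", "Layout", "Commit", "Paint", "Layerize", "PrePaint",
]);

// Render-only duration for one trace, using the helper sketched earlier.
const renderDurationMs = (events: TraceEvent[]) => sumEventDurationsMs(events, RENDER_EVENTS);
```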

I haven't had the time to do a big quality check yet, so please take the results with a grain of salt.

@krausest
Owner

krausest commented Nov 26, 2023

I ran a check that total time >= script time + paint time. That assertion holds for most runs, except for 21 traces (openui causes 15 of those for replace rows; all other violations are below 1 msec, so they don't matter much). The trace looks like this:
(screenshot: Chrome trace for openui replace rows, with script and style recalculation overlapping)
My interval logic computes:
total duration = 45.898
script duration = 23.182
paint duration = 32.722
Those numbers look right, but the sum of script + paint is misleading: it exceeds the total. The issue seems to be that script evaluation and style recalculation happen in parallel (or the style recalculation is triggered from within script evaluation), so the two intervals overlap.
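A hedged illustration of why summing the two categories can overshoot: if script and style-recalculation intervals overlap, merging the intervals before summing keeps the combined duration from exceeding the total (the interval shape is made up for the example):

```ts
interface Interval {
  start: number; // ms
  end: number;   // ms
}

// Merge overlapping intervals before summing, so time spent in a style recalc
// that runs inside (or in parallel with) script execution is counted only once.
function mergedDurationMs(intervals: Interval[]): number {
  const sorted = [...intervals].sort((a, b) => a.start - b.start);
  let total = 0;
  let current: Interval | null = null;
  for (const iv of sorted) {
    if (current && iv.start <= current.end) {
      current.end = Math.max(current.end, iv.end); // extend the open interval
    } else {
      if (current) total += current.end - current.start;
      current = { start: iv.start, end: iv.end };
    }
  }
  if (current) total += current.end - current.start;
  return total;
}
```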

@krausest
Owner

The results table now lets you choose between total duration, JS-only duration, and render duration.
