
Performance: RSC rendering pipeline performance analysis #36143

@switz

Description


React version: 19.3.0-canary-c0d218f0-20260324 (plus a custom fork for the fused-pipeline experiments)

Steps To Reproduce

  1. The first set of benchmarks is in this repo: https://github.com/switz/rsc-benchmarks
  2. The second set is based on my fork implementing a fused RSC pipeline renderer (one pass, with Fizz taking on more responsibility; preview here)

Some Background

I am working on a new RSC-based framework. There's been a lot of recent discussion around React-based frameworks' performance, driven by benchmarks that are generally not representative of real-world performance (aren't all web benchmarks?), but they did uncover some serious gaps. RSC performance is honestly sufficient for most use cases, but it could be much more efficient, and it gets throttled very quickly.

I spent a lot of time digging into RSC rendering throughput today and yesterday with Claude, both inside Next and outside of it (my new framework, some pure React, etc.). I found two small perf wins in Fizz and Flight (sent a PR for the Fizz one), but they are minimal compared to the two issues below. I spent most of the day debugging real-world scenarios: code running on the same k8s cluster across the same apps in different contexts (Next, my framework, etc.), all running through real-world networks. I then dropped down to baseline benchmarks to try to isolate the problems, and those benchmarks reflected my real-world testing.

This is all based on single-core, single-threaded rendering. If I got anything wrong here, if I shoved my foot in my mouth, or if I over-dramatized the situation, please tell me. I'm not an expert in web throughput engineering, CPU architectures, or JavaScript/React internals. I'm just a long-time software engineer who's having way too much fun building my own framework on what I consider to be the most complete web architecture.

The current behavior

These benchmarks are run in the simplest cases I could define, on my own M1 Max, on a single core at a time. To isolate performance, I ran each test in a container so I could control the CPU limits and get somewhat consistent results (is this bad? you tell me; I'm sure there's some issue with it). On the average web server, perf will be worse than here, which only exacerbates the issue. None of this is meant to be a perfect or clean-room benchmark, but I think you'll find fairly consistent results, as I did.

It's important to note that CPU-bound tasks just suffer in JavaScript environments. This isn't an I/O issue. You'll see below how the current RSC infrastructure compounds the CPU problem.

Node Streams vs Web Streams

The first issue is fairly well documented and sets the first performance ceiling. Next runs on Web Streams (renderToReadableStream), which are implemented in JavaScript and are much slower than Node streams.

If you write a barebones RSC rendering test, you'll see that this is the first limit you hit.

| Metric | Node Streams | Web Streams | Difference |
|---|---|---|---|
| req/s | 1,004 | 743 | Node 35% faster |
| Median latency | 43ms | 58ms | Node 26% faster |
| P99 latency | 134ms | 139ms | ~same |

You get a 35% win in this particular case. Without it you'll eventually throttle, so for anyone deploying to Node environments, switching to Node streams should be the first priority.

But this is only the first win, and it is largely negated in most real-world scenarios, because it brings us to the second major issue.

| Test | req/s | Median | Size | What it measures |
|---|---|---|---|---|
| [0] renderToString | 376 | 2.4ms | 116KB | Sync SSR baseline, no streams |
| [0a] Direct SSR (Node pipe) | 273 | 3.6ms | 116KB | Streaming SSR, no RSC |
| [0b] Direct SSR (Web stream) | 197 | 4.5ms | 116KB | Streaming SSR, Web streams |
| [1a] Flight serialize | 110 | 7.6ms | 235KB | RSC → Flight wire format |
| [1b] SSR from Flight (Node) | 100 | 6.8ms | 116KB | Pre-rendered Flight → HTML |
| [1c] SSR from Flight (Web) | 92 | 7.7ms | 116KB | Same, Web streams |
| [2a] Full RSC → Node | 44 | 22.7ms | 398KB | Flight + SSR + inject |
| [2b] Full RSC → Web | 36 | 25.6ms | 398KB | Same, Web streams |
| [3a] Full RSC → Node + gzip | 40 | 25.0ms | 22KB | Full pipeline + gzip |
| [3b] Full RSC → Web + gzip | 34 | 27.3ms | 22KB | Same, Web + gzip |

These results align with what we should expect:

  • The full RSC pipeline is close to 10x worse than renderToString
  • Node streams are 20-30% faster than Web Streams, but in RSC that only buys a small amount of extra throughput
  • The tee coupling (Flight + SSR) fights itself on the event loop and cuts performance roughly in half right there
  • Raw SSR (not RSC) gets decent performance at 273 req/s

Now, measuring req/s with one concurrent request isn't the best way to test real-world performance. But if you dig in, you'll find that with more concurrency, CPU usage throttles even harder and memory usage balloons. So I think it's enough to show the drop-off.

So that brings us to…

Flight Serialization + SSR + Compression

When preparing an initial HTML response, React first kicks off a Flight serialization of the server component tree, because Fizz has no knowledge of the server-client boundary.

After this serialization, the output gets tee'd into two streams for frameworks to consume:

  • The Flight serialization is fed into a Fizz stream, which turns it back into a React tree; that tree is then SSR'd/serialized to HTML
  • The framework generates Flight-based hydration <script> tags for injection

On a single thread, these parallel streams back-pressure and compete for CPU. By serializing to Flight, then deserializing back for Fizz, then rendering to HTML, we throttle the single thread with a ton of unnecessary work.
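To make the three passes concrete, here is a toy synchronous model of the current shape. `toFlight`, `fromFlight`, and `toHtml` are hypothetical stand-ins for the real Flight encoder, Flight client, and Fizz; the real implementations are streaming and vastly more involved, but the tree gets walked and re-materialized the same number of times:

```javascript
// Toy model of the current pipeline: three passes over the same content.

// Pass 1: walk the tree and encode it to a Flight-like wire format.
function toFlight(node) {
  return JSON.stringify(node);
}

// Pass 2: parse the wire format back into an element tree.
function fromFlight(wire) {
  return JSON.parse(wire);
}

// Pass 3: walk the reconstructed tree again and emit HTML.
function toHtml(node) {
  if (typeof node === "string") return node;
  const children = (node.children || []).map(toHtml).join("");
  return `<${node.tag}>${children}</${node.tag}>`;
}

const tree = { tag: "ul", children: [{ tag: "li", children: ["hello"] }] };

// The wire format is an intermediate buffer that exists only to be
// parsed again on the same thread a moment later.
const wire = toFlight(tree);
const html = toHtml(fromFlight(wire));
```

In the real pipeline these passes run as interleaved streams rather than sequential function calls, which is exactly why they contend for the same core.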

So what's the solve? I mean, that's up to you guys. But on a single-thread, there's only one real pathway to improved performance: do less work.

So rather than the three-step serialize/deserialize/render pipeline, it would be better if there were a "fused" pipeline that handles RSC → SSR in one pass. This would require some architectural changes to Fizz so it can identify client components and serialize the prop boundaries. My guess is that Fizz and Flight were given separate responsibilities because in theory you may want to run them in different places. But in practice, those of us shipping RSC servers run them together anyway.
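As a sketch of what "do less work" could look like: a single walk that emits HTML directly and only serializes props when it crosses a client boundary. Everything here (`renderFused`, `isClientRef`, `moduleId`) is illustrative, not React API:

```javascript
// Hypothetical fused renderer: one tree walk that emits HTML and,
// at each client boundary, records that component's serialized props
// for the hydration payload instead of round-tripping the entire
// tree through a wire format.
function renderFused(node, boundaries) {
  if (typeof node === "string") return node;
  if (node.isClientRef) {
    // Serialize props only at the boundary; the framework emits
    // these alongside the HTML as hydration data.
    boundaries.push({ id: node.moduleId, props: node.props });
    return `<div data-boundary="${node.moduleId}"></div>`;
  }
  const children = (node.children || [])
    .map((c) => renderFused(c, boundaries))
    .join("");
  return `<${node.tag}>${children}</${node.tag}>`;
}

const tree = {
  tag: "main",
  children: [
    { tag: "h1", children: ["Products"] },
    { isClientRef: true, moduleId: "Cart", props: { count: 2 } },
  ],
};

const boundaries = [];
const html = renderFused(tree, boundaries);
const hydration = JSON.stringify(boundaries);
```

Note that a pure server-component tree pays nothing here beyond plain SSR, which matches the "fused (server-only) matches plain Fizz" result below.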

I built a proof of concept with Claude (it's not wholly complete, perhaps the benchmarks are misleading, and the props serialization is clearly incomplete, but it's a worthy exploration) and saw real gains in performance and memory, plus more consistent throughput under concurrent load, especially for the pure server-component path (sans client components).

Per-Request Breakdown (226-product PLP)

| Phase | Full RSC Pipeline | Fused Renderer |
|---|---|---|
| Flight tree walk + encoding | 2.36ms | |
| Props serialization (JSON) | 0.88ms | 0.88ms |
| Flight deserialize | 0.32ms | |
| Fizz render | 3.53ms | 1.47ms ¹ |
| Hydration data to client | 0.65ms | 0.20ms |
| Fused boundary overhead | | 1.73ms ² |
| TOTAL | 7.74ms | 4.28ms |

¹ Fused Fizz render matches plain renderToPipeableStream (1.42ms) — no Flight element overhead
² Markers, module resolution, props serialization, chunk output for hydration data

Our fused pipeline performs much closer to raw SSR when there are no client components, but we do see a large drop-off once we serialize props into client components.
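That per-boundary cost is essentially JSON serialization plus safe inlining into a script tag. A minimal sketch of the mechanics (the `__HYDRATE__` global and the escaping strategy are assumptions for illustration, not what React actually emits):

```javascript
// Serialize props and inline them into a hydration <script> tag.
// Escaping "<" prevents a props string containing "</script>" from
// terminating the inline script early; this stringify + escape work
// runs once per client boundary on the rendering thread.
function hydrationScript(props) {
  const json = JSON.stringify(props).replace(/</g, "\\u003c");
  return `<script>self.__HYDRATE__=${json}</script>`;
}

// A hostile props value cannot break out of the script tag.
const tag = hydrationScript({ name: "</script>alert(1)" });
```

With many boundaries (or large props), this is the part of the fused path that still scales with data size, which matches the drop-off above.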

Throughput Comparison

| Mode | ms/req | req/s | Output | Description |
|---|---|---|---|---|
| Plain Fizz (no RSC) | 1.42ms | 702 | 102 KB | Theoretical ceiling |
| Fused (server-only) | 1.47ms | 680 | 102 KB | Matches plain Fizz |
| Fused (w/ client boundaries) | 4.28ms | 234 | 433 KB | 1.8x faster than full pipeline |
| Full RSC pipeline + hydration | 7.74ms | 129 | 411 KB | Current |

Where the 1.8x Comes From

| Eliminated | Saved |
|---|---|
| Flight tree walk + wire format encoding | 2.36ms |
| Flight wire format parsing | 0.32ms |
| Flight element reconstruction overhead in Fizz | 2.11ms |
| Flight payload inlining (JSON.stringify) | 0.45ms |
| Total eliminated | 5.24ms |

| Added | Cost |
|---|---|
| Props serialization at boundaries | 0.88ms |
| Hydration script emission | 0.20ms |
| Boundary markers + module resolution + queue management | 0.65ms |
| Total added | 1.73ms |

This works out to roughly 1.8x less CPU time than the current path. It's possible this is not a great benchmark, but it's only intended as a proof of concept.

Key Properties

| Property | Full Pipeline | Fused |
|---|---|---|
| Tree walks | 3 (Flight + Flight Client + Fizz) | 1 (Fizz only) |
| Serialization passes | 2 (Flight wire + inline payload) | 1 (props at boundaries) |
| Intermediate buffers | ~291 KB Flight wire format | None |
| Output to client | ~411 KB (HTML + hydration data) | ~433 KB (HTML + hydration data) |
| Peak heap (c=50) | 297 MB | 60 MB |
| Flight server modified | | No (zero changes) |
| Reconciler modified | | No (zero changes) |

Client components still suffer from expensive props serialization into hydration tags. But at the very least, memory usage goes way down and throughput becomes more consistent. Because the ergonomics of serializing data from server to client are so clean, it is very easy to worsen this issue without the end user understanding why.

I know there's been some past discussion of handling the duplication of hydration content. This duplication was somewhat undersold as not a big deal because of compression, but it turns out that running that compression inline on the larger payload leads to more CPU contention. You can offload compression to an external host (e.g. Cloudflare) or server, but then you're paying the transfer cost and relying on external processing power.

Why haven't these issues surfaced earlier?

Well, I don't know. At the end of the day, a full SSR rendering pipeline will almost always be delayed more by I/O (database, API requests, etc.) than by ~10-20ms of CPU time. The concurrency issue is fairly easily papered over with a few extra pods or cores. People get fewer req/s than they think.

On top of which, the places where RSCs have most visibly been deployed are serverless platforms, where each request often gets its own thread, so you wouldn't notice the concurrency issues unless you're running a traditional Node server or really digging in. The problem here isn't really the wall-clock CPU time (imo); it's the bottlenecking and throughput, and JS just plain suffers here. So this is all easy to miss, or perhaps easy to dismiss as unnecessary optimization entirely.

But I think we'd all agree that higher throughput, better concurrency, lower memory, and fewer bytes would be a net win if possible. And it might bring RSC performance back in line with more traditional alternatives, while maintaining its architectural advantages.

After doing some research, I found an issue opened by @WIVSW in October that essentially identified much of this. They also saw a 10x drop in req/s when switching to the RSC pipeline.

The expected behavior

Ultimately, the desire is that RSCs render faster and more concurrently across a variety of scenarios (pure server rendering, many client boundaries / props serialization, and so on), with reduced memory usage and thrashing.

Hope this is useful, I spent some time trying to understand the internals of React so if I got anything wrong, please reorient me in the right direction – thanks for reading.
