Add highWaterMark option to ReactDOMNodeStream. #10036
Conversation
Sigh. This PR failed prettier, although I've run it locally. I think this might be because of the many files in this PR. ETA: Looks like it was a prettier version problem. I reformatted with prettier 1.2.2, and it passed.
// Calls the stream.Readable(options) constructor. Consider exposing built-in
// features like highWaterMark in the future.
super({});
constructor(element, makeStaticMarkup, {highWaterMark} = {}) {
Any specific reason we're destructuring in these constructors? It doesn't look like highWaterMark is used here, so maybe we can just do:
constructor(element, makeStaticMarkup, options = {}) {
super(options);
// ...
}
Great question!

The reason I'm destructuring highWaterMark is that there are several other options that can be passed to the Readable constructor (documented here), and those options will make the stream behave incorrectly. For example, if you pass an encoding argument to the Readable constructor, the stream will output gibberish. Passing a read argument will make the stream not render anything.

If we pass the options object through unfiltered, the client of this class could accidentally make it malfunction, whereas if we only pass highWaterMark, we will be OK. Does that make sense?
Current strategies (unfortunately) use pre-fetching of data and then synchronously render the entire result. I'd expect the data fetching needed to power a whole document to yield memory usage of the same order regardless of whether the output buffer is larger or not. So, is this really about saving CPU and not memory usage?

The alternative algorithm I had in mind, as mentioned on Twitter, wouldn't necessarily be able to support a high water mark. The high water mark requires a manual stack to pause at arbitrary points. The strategy I had in mind was instead to stop pushing buffers only at async boundaries, with the assumption that you'd want to insert enough async points to split up the work sufficiently.

For large documents you'll probably want to split the work out to an out-of-order renderer that can fetch its data for each piece and then stream the pieces in the order their data becomes available. In that renderer, I was planning on using custom data structures only for the minimal things we need to hook up the async code. Maybe if you hit the high water mark we could make it an async boundary on the fly, but it wouldn't be as natural.
@sebmarkbage thanks for the response!

You're right that data-driven webpages will probably use the same O(N) level of memory with or without highWaterMark, but they could easily use 2x or 3x more memory with a large output buffer, which seems significant to me. Just to make sure this is true, I did some basic memory benchmarking of rendering a giant page with low and high highWaterMark values.
I'm very interested in your ideas around this (and I really want async rendering in some form), but I'm a bit concerned that it sounds like you're suggesting implementing this as a push stream rather than a pull stream, which would mean that the stream wouldn't be responsive to backpressure. If I understand you correctly, this kind of stream would not necessarily be a particularly good citizen in the stream ecosystem. It's obviously a spectrum, but streams that are more on the push side of the spectrum get fewer of the benefits of streaming (efficient use of IO and data arriving earlier for the user). I am, however, a strong supporter of async rendering; I guess I just don't understand yet why async rendering can't be combined with an interruptible stack like the current implementation.
It could be combined, but the interruptible stack has some overhead: in complexity, since everything has to be able to deal with interruption, and in performance. Another added complexity when you add async is where you allow these interruptions to be resumed. E.g., if in the meantime we get async data satisfied elsewhere, do we resume there, or where we previously left off? That means possibly multiple stacks. That's true for async points too, but since they're fewer and associated with I/O anyway, I don't expect the total overhead to be as significant. Already-satisfied dependencies don't bear this overhead.

I'm also concerned that the strategy of waiting will defer data fetching on async components. By eagerly rendering them you get to the next async point faster. My assumption is that the whole pipeline will largely be I/O bound.

For in-order rendering, like spitting out plain HTML as done today, there is another complexity with async. It'll have to buffer work for future siblings while a previous sibling is still waiting for data. If we don't, then it's just sequential. We can't even push anything to the network buffers yet, so the backpressure mechanism doesn't even work in this case. It's not the output stream that drives the need to hold work; it's CPU and memory usage. So maybe the mechanism for throttling any particular request needs to be elsewhere?
So, I'm having a really hard time following the argument here. I'm not sure what it is you are proposing to build, and I think it might help me to back up a few steps. It sounds like you are proposing building a server renderer that:

Is this right? Are there other critical features I'm missing? And is this something that you want to build in the 16 timeframe, meant to replace the current renderer? If this new renderer is meant to be post-16, is it possible to separate the discussion of this new renderer (in which, to be clear, I'm interested!) from this PR, which seems to me a pretty small incremental upgrade to the existing stream renderer?
So the awkward part is that we want to start adding async APIs in 16, or likely in a minor. In this architecture we can't do that; it'll fork the capabilities between server and client. The current renderer also doesn't support fragments. We'd have to test another implementation twice and maybe introduce a small breaking change in a minor. So it becomes awkward. I'd like to take a pass at an async approach ASAP. I don't want to hold up 16, but I'd like to race it to the release. :) Regardless, the discussion is important because it is surfacing some interesting details about how backpressure is supposed to work and what drives the heuristic, which might also be relevant to this PR.
This is a small follow-on to the addition of ReactDOMNodeStream, allowing users of the API to specify how much memory the stream should use for buffering.

As explained in the Node stream buffering documentation, users of Readable streams can pass in a highWaterMark option, which indicates how many bytes the stream should buffer before it pauses reading from the underlying source. This allows users of the stream to tune how much memory they use, depending on their particular use case. highWaterMark defaults to 16K but can be set to any number.

The code part of this is pretty simple: renderToStream and renderToStaticStream now take an optional options object with a highWaterMark property, and that value is passed along to the Readable constructor. All the code for that is in ReactDOMNodeStreamRenderer.js. The rest of the PR is documentation and a few simple tests to make sure highWaterMark is being used.