
Add highWaterMark option to ReactDOMNodeStream. #10036

Closed. aickin wants to merge 4 commits.

Conversation

@aickin (Contributor) commented Jun 25, 2017

This is a small follow-on to the addition of ReactDOMNodeStream, allowing users of the API to specify how much memory the stream should use to buffer.

As explained in the Node stream buffering documentation, users of Readable streams can pass in a highWaterMark which indicates how many bytes the stream should buffer before pausing the stream. This allows users of the stream to tune how much memory they use, depending on their particular use case. highWaterMark defaults to 16K but can be set to any number.

The code part of this is pretty simple; renderToStream and renderToStaticStream now just take in an optional options object with a highWaterMark property, and that value is passed along to the Readable constructor. All the code for that is in ReactDOMNodeStreamRenderer.js.

The rest of the PR is documentation and a few simple tests to make sure highWaterMark is being used.

@aickin (Contributor, author) commented Jun 25, 2017

Sigh. This PR failed prettier, although I've run `npm run prettier` and `yarn prettier` many, many times on the branch.

I think this might be because of the many files in master that are currently incorrectly formatted, which I just submitted as #10037. I'm hopeful but not particularly certain that #10037 will make this branch pass.

ETA: Looks like it was a prettier version problem. I reformatted with prettier 1.2.2, and it passed.

The review comment was anchored on this diff in `ReactDOMNodeStreamRenderer.js`:

```diff
-    // Calls the stream.Readable(options) constructor. Consider exposing built-in
-    // features like highWaterMark in the future.
-    super({});
+  constructor(element, makeStaticMarkup, {highWaterMark} = {}) {
```
A reviewer (Contributor) commented:

Any specific reason we're destructuring in these constructors? It doesn't look like `highWaterMark` is used here, so maybe we can just do:

```js
constructor(element, makeStaticMarkup, options = {}) {
  super(options);
  // ...
}
```

@aickin (Contributor, author) replied:

Great question!

The reason I'm destructuring `highWaterMark` is that there are several other options that can be passed to the `Readable` constructor (documented here), and those options will make the stream behave incorrectly. For example, if you pass an `encoding` argument to the `Readable` constructor, the stream will output gibberish. Passing a `read` argument will make the stream not render anything.

If we pass the options object through unfiltered, the client of this class could accidentally make it malfunction, whereas if we only pass highWaterMark, we will be OK. Does that make sense?

@sebmarkbage (Collaborator) commented:

Current strategies (unfortunately) pre-fetch data and are then able to render the entire result synchronously. I'd expect the data fetching needed to power a whole document to yield memory usage of the same order regardless of whether the output buffer is larger or not.

So, is this really about saving on CPU and not memory usage?

The alternative algorithm I had in mind, as mentioned on Twitter, wouldn't necessarily be able to support a high water mark. A high water mark requires a manual stack so rendering can pause at arbitrary points.

The strategy I had in mind was to instead only stop pushing buffers at async boundaries. With the assumption that you'd want to insert enough async points to split up the work sufficiently. For large documents you'll probably want to split it out to an out-of-order renderer that can fetch its data for each piece and then stream them in the order data becomes available.

In that renderer, I was planning on only using custom data structures for the minimal things we need to hook up the async code. Maybe if you hit the high water mark we could make it an async boundary on the fly but it wouldn't be as natural.

@aickin (Contributor, author) commented Jun 26, 2017

@sebmarkbage thanks for the response!

> I'd expect the data fetching needed to power a whole document to yield memory usage of the same order regardless of whether the output buffer is larger or not.

You're right that data-driven webpages will probably use the same O(N) memory with or without highWaterMark, but they could easily use 2x or 3x more memory with a large output buffer, which seems significant to me. Just to make sure this is true, I did some basic memory benchmarking of rendering a giant page with a low and a high highWaterMark, and I confirmed that the Node process needs significantly more heap space with a high highWaterMark. (It was really quick and dirty, but a 16M webpage needed 20M of heap to render with a 17K highWaterMark and 42M of heap with a 17M highWaterMark.)

> The strategy I had in mind was to instead only stop pushing buffers at async boundaries. With the assumption that you'd want to insert enough async points to split up the work sufficiently.

I'm very interested in your ideas around this (and I really want async rendering in some form), but I'm a bit concerned that you seem to be suggesting implementing this as a push stream rather than a pull stream, which would mean the stream wouldn't respond to backpressure. If I understand you correctly, that kind of stream wouldn't be a particularly good citizen in the stream ecosystem. It's obviously a spectrum, but streams closer to the push end of the spectrum get fewer of the benefits of streaming (efficient use of I/O and data arriving earlier for the user).

I am, however, a strong supporter of async rendering; I guess I just don't understand yet why async rendering can't be combined with an interruptible stack like the current implementation.

@sebmarkbage (Collaborator) replied:

It could be combined, but the interruptible stack has some overhead: in complexity, since everything has to be able to deal with interruption, and in performance.

Another added complexity when you add async is where you allow these interruptions to be resumed. E.g., if async data is satisfied elsewhere in the meantime, do we resume there, or where we previously left off? That means possibly multiple stacks. That's true for async points too, but since they're fewer and associated with I/O anyway, I don't expect the total overhead to be as significant. Already-satisfied dependencies don't bear this overhead.

I'm also concerned that the strategy of waiting will defer data fetching on async components. By eagerly rendering them, you get to the next async point faster. My assumption is that the whole pipeline will largely be I/O bound.

For in-order rendering, like spitting out plain HTML as done today, there is another complexity with async: it'll have to buffer work for future siblings while a previous sibling is still waiting for data. If we don't, then it's just sequential. We can't even push anything to the network buffers yet, so the backpressure mechanism doesn't even work in this case. It's not the output stream that drives the need to hold work; it's the CPU and memory usage.

So maybe the mechanism for throttling any particular request needs to live elsewhere?

@aickin (Contributor, author) commented Jun 27, 2017

So, I'm having a really hard time following the argument here. I'm not sure what it is you are proposing to build, and I think it might help me to back up a few steps.

It sounds like you are proposing building a server renderer that:

  1. renders components asynchronously, presumably via componentWillMount or render returning a promise.
  2. potentially renders the same input element differently (i.e. with a different order of sub-elements) depending on I/O.
  3. rehydrates (maybe in a streaming manner?) on the client and can deal with the fact that the existing server markup might be in a different order.

Is this right? Are there other critical features I'm missing? And is this something you want to build in the 16 timeframe to replace ReactDOMNodeStream, or something for a later version?

If this new renderer is meant to be post-16, is it possible to separate the discussion of this new renderer (in which, to be clear, I'm interested!) from this PR, which seems to me a pretty small incremental upgrade to ReactDOMNodeStream?

@sebmarkbage (Collaborator) replied:

So the awkward part is that we want to start adding async APIs in 16, or likely in a minor. In this architecture we can't do that; it'll fork the capabilities between server and client. The current renderer also doesn't support fragments.

We'd have to test another implementation twice and maybe introduce a small breaking change in a minor, so it becomes awkward. I'd like to take a pass at an async approach ASAP. I don't want to hold up 16, but I'd like to race it to the release. :)

Regardless, the discussion is important because it's surfacing some interesting details about how backpressure is supposed to work and what drives the heuristic, which might also be relevant to this PR.

@gaearon gaearon closed this Oct 4, 2017