This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Improve performance scaling #50

Merged
merged 6 commits into master from improve-performance-scaling on Dec 11, 2019

Conversation

@jaxry (Contributor) commented Dec 6, 2019

Brief Description

This PR improves the algorithm that determines each draw call's tile size. This ensures the renderer remains responsive while also rendering as quickly as possible.

Pull Request Guidelines

  • I have added pull requests labels which describe my contribution.
  • All existing tests passed.
    • I have added tests to cover my changes, all of which pass.
  • I have compared the render output of my branch to master.
  • My code has passed the ESLint configuration for this project.
  • My change requires modifications to the documentation.
    • I have updated the documentation accordingly.

@jaxry added the bug (Something isn't working) label on Dec 6, 2019
@@ -106,17 +104,6 @@ export function RayTracingRenderer(params = {}) {

module.getPixelRatio = () => pixelRatio;

module.setRenderTime = (time) => {
@jaxry (Contributor Author) commented Dec 6, 2019

I think we can simply remove this method from the API. There seems to be an ideal render time (~50 fps) that balances responsiveness and render speed, and it's the same across all the devices I tested on, so there's no reason to make it configurable.

Contributor

@jaxry which devices have you tested on?

@jaxry (Contributor Author) commented Dec 10, 2019

My MacBook with an integrated Intel GPU (low-to-mid end), and my desktop with an RTX 2080 (very high end), on both Linux and Windows (different drivers, different results). Both devices had the performance issues we've noted, and this PR fixes them on both. I think the most useful device to test at this point would be a low-end one.

Contributor

Could you briefly explain how this works compared to before? It looks like expAvg is controlling convergence rate, similarly to the old convergenceStrength variable. Why did you choose 0.5?

> Based on the current frame's performance, we calculate how long each pixel takes to compute, and from that derive the ideal resolution that takes the desired amount of time to sample (20 ms in this case).

This^ is basically the same thing that was happening before right?

> The exponential averaging is not necessary, but what it does is smooth out performance differences between frames. The higher the expAvg, the more stable the fps becomes, but the slower it responds to changes in the scene or camera. 0.5 was a stable number for my hardware that still adjusted the framerate in about 1 second whenever I moved the camera to a more complex angle.

makes sense, how different is this than what was happening before?

> When your camera stays still, does your FPS do a good job hovering around 50 fps? If not, this number should be increased.

bounces between 47 and 53

Contributor Author

> This^ is basically the same thing that was happening before right?

I actually tried this current method months ago, but I didn't think to use an exponential average, so the framerate varied too much with that approach. The method before this PR instead steered pixelsPerTile toward the right number using a smaller, arbitrary step that scaled with the error. That was more stable, but it took longer to converge.

But now this current technique is more exact AND stable with the help of averaging.
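
For illustration, here is a rough sketch of the two approaches; the function names and the 0.1 step size are assumptions, not the exact code in either version:

// Pre-PR idea: nudge pixelsPerTile by a small step proportional to the error.
function steerPixelsPerTile(pixelsPerTile, timePerTile, desiredTimePerTile) {
  const convergenceStrength = 0.1;                    // small, arbitrary step size
  const error = desiredTimePerTile / timePerTile - 1; // relative miss this frame
  return pixelsPerTile * (1 + convergenceStrength * error);
}

// This PR: solve directly for the tile size that would have hit the target time,
// then smooth it with an exponential average.
function solvePixelsPerTile(pixelsPerTile, timePerTile, desiredTimePerTile, expAvg = 0.5) {
  const ideal = pixelsPerTile * desiredTimePerTile / timePerTile;
  return expAvg * pixelsPerTile + (1 - expAvg) * ideal;
}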

Contributor

great, nice going!

Collaborator

It would be great to avoid breaking changes when possible.
Calling this function would now cause a crash.

const previewHeight = Math.round(clamp(previewWidth / aspectRatio, 1, hdrBuffer.height));

const diff = Math.abs(previewWidth - hdrPreviewBuffer.width) / previewWidth;
if (diff > 0.05) { // don't bother resizing if the buffer size is only slightly different
@jaxry (Contributor Author) commented Dec 6, 2019

Resizing the buffer takes time, and we can avoid it if the change in buffer size is small. This approach is a definite improvement, but I'm not happy with it altogether.

In a separate PR, I will overhaul the buffer system so that the preview buffer actually uses a small viewport of the main buffer, instead of constantly resizing an independent buffer.
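
For reference, a minimal sketch of the skip-resize idea; the helper name, buffer shape, and resize callback are illustrative, not the project's actual API:

const RESIZE_THRESHOLD = 0.05; // ignore size changes smaller than 5%

function maybeResizePreview(buffer, requestedWidth, requestedHeight, resize) {
  // Relative difference between the requested and current width.
  const diff = Math.abs(requestedWidth - buffer.width) / requestedWidth;
  if (diff > RESIZE_THRESHOLD) {
    resize(requestedWidth, requestedHeight); // the expensive reallocation we want to avoid
  }
}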

const expAvg = 0.5;

const newPixelsPerTile = pixelsPerTile * desiredTimePerTile / timePerTile;
pixelsPerTile = expAvg * pixelsPerTile + (1 - expAvg) * newPixelsPerTile;
Contributor Author

This calculates the ideal pixels per tile (to match desiredTimePerTile) in a different way. The new method converges to the ideal time much faster and also remains more stable once it reaches that time.

Contributor

Could you briefly explain how this works compared to before? It looks like expAvg is controlling convergence rate, similarly to the old convergenceStrength variable. Why did you choose 0.5?

} else if (maxRenderbufferSize === 16384) {
return 400000;
} else if (maxRenderbufferSize >= 32768) {
return 600000;
Contributor Author

Significantly increase the initial estimates for all devices. The previous numbers were too low in practice and took a while to converge to the actual numbers for each device.

If these numbers turn out to be too high, the new convergence method quickly brings them into the right range (only 1 or 2 draw calls needed).

Contributor

nice!

Collaborator

It might be good to extract these magic numbers into named const variables.
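
A sketch of that suggestion; the constant names are made up here, and only the two tiers shown in the diff above are included:

const INITIAL_PIXELS_PER_TILE_16K = 400000; // maxRenderbufferSize === 16384
const INITIAL_PIXELS_PER_TILE_32K = 600000; // maxRenderbufferSize >= 32768

function initialPixelsPerTile(maxRenderbufferSize) {
  if (maxRenderbufferSize === 16384) {
    return INITIAL_PIXELS_PER_TILE_16K;
  } else if (maxRenderbufferSize >= 32768) {
    return INITIAL_PIXELS_PER_TILE_32K;
  }
  // ...the remaining tiers would follow the same pattern
}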

@jaxry (Contributor Author) commented Dec 6, 2019

This PR brings a significant improvement for my laptop and desktop. We should test these changes on a few other devices too.

@lyonsno (Contributor) commented Dec 10, 2019 via email

@jaxry (Contributor Author) commented Dec 10, 2019

I think so. We should pay attention to whether the preview window lags when moving the camera, and whether the renderer remains responsive when the camera stays still.

@lyonsno (Contributor) left a comment

I'd like to test this on another device before approving, but on my machine this is certainly an improvement! Responsiveness has increased, and convergence speed has also increased!

@@ -80,12 +80,15 @@ export function makeRenderingPipeline({
}

function setPreviewBufferDimensions() {
const aspectRatio = hdrBuffer.width / hdrBuffer.height;
-  const desiredTimeForPreview = 16; // 60 fps
+  const desiredTimeForPreview = 10;
Contributor

I notice my preview buffer starts at a lower resolution than before, but it does not adjust upwards. Is this because the lower resolution is all that my computer can handle?

@jaxry (Contributor Author) commented Dec 11, 2019

What is your framerate while moving the camera?

Contributor

56 - 60, settling on 60 after a few seconds. On master, which has a higher res preview buffer, framerate bounces around between 32 and 51. Maybe the buffer on this branch just happens to start with the correct settings for my machine?

Contributor Author

The ideal FPS for the preview buffer would be something just below 60, but not actually 60. That would mean we're making maximum use of the hardware while also keeping things responsive. Maybe we can increase this number a little, but in complex scenes it would then be too high, according to my testing. That's because we're currently estimating the preview size based on the running tile size, and this estimation isn't exact since it doesn't account for ray coherence across the entire scene (vs. just a tile), among other issues.

I think our best bet, in the future, would be a separate benchmark that just controls the preview window's size. Do you have other ideas? I'd like to address this during the buffer management refactor I'll be starting in a week or two.
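
For reference, a rough sketch of the estimation being described; the function and parameter names are illustrative rather than the exact pipeline code:

// The tile benchmark tells us how many pixels we can trace per millisecond;
// scaling that by the preview's frame-time budget gives a pixel budget,
// which is then split into width and height using the aspect ratio.
function estimatePreviewSize(pixelsPerTile, timePerTile, desiredTimeForPreview, aspectRatio) {
  const pixelsPerMs = pixelsPerTile / timePerTile;
  const previewPixels = pixelsPerMs * desiredTimeForPreview; // e.g. a 10 ms budget
  const width = Math.round(Math.sqrt(previewPixels * aspectRatio));
  const height = Math.round(width / aspectRatio);
  return { width, height };
}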

@@ -31,31 +31,17 @@ export function makeTileRender(gl) {

let pixelsPerTileQuantized = pixelsPerTile;

-  let desiredTimePerTile = 22; // 45 fps
+  let desiredTimePerTile = 20;
Contributor

Did you arrive at this number through the testing you did? If so, should we test on one more, even lower-spec'd device to make sure it is low enough?

@jaxry (Contributor Author) commented Dec 11, 2019

The lower bound for this number is ~17 ms, which corresponds to 60 fps. At that point the hardware is being underworked, since the animation frame loop caps at 60. 20 ms puts it at ~50 fps, which still provided responsiveness on the devices I tested while sitting about 10 fps below the limit. That buffer lets the framerate fluctuate around the range you reported without hitting 60, which would kill the sampling speed.

In any case, testing this number on a lower-spec device is important to see whether this reasoning still applies there. But since this number is already lower than it was before this PR, it should run better regardless.
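
The arithmetic behind those numbers, for reference:

const msToFps = (ms) => 1000 / ms;
msToFps(16.7); // ~60 fps, the cap imposed by the animation frame loop
msToFps(20);   // 50 fps, the target chosen here, ~10 fps below the cap
msToFps(22);   // ~45 fps, the previous target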


@jaxry (Contributor Author) commented Dec 11, 2019

> Could you briefly explain how this works compared to before? It looks like expAvg is controlling convergence rate, similarly to the old convergenceStrength variable. Why did you choose 0.5?

Based on the current frame's performance, we calculate how long each pixel takes to compute, and from that derive the ideal resolution that takes the desired amount of time to sample (20 ms in this case).

The exponential averaging is not necessary, but what it does is smooth out performance differences between frames. The higher the expAvg, the more stable the fps becomes, but the slower it responds to changes in the scene or camera. 0.5 was a stable number for my hardware that still adjusted the framerate in about 1 second whenever I moved the camera to a more complex angle.

When your camera stays still, does your FPS do a good job hovering around 50 fps? If not, this number should be increased.
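
To make that concrete, here is a hedged sketch of the update; the wrapper function is illustrative, while the two expressions inside mirror the diff:

function nextPixelsPerTile(pixelsPerTile, timePerTile) {
  const desiredTimePerTile = 20; // ms per draw call, ~50 fps
  const expAvg = 0.5;            // smoothing factor

  // If timePerTile ms bought us pixelsPerTile pixels, then hitting exactly
  // desiredTimePerTile would have required this many pixels.
  const newPixelsPerTile = pixelsPerTile * desiredTimePerTile / timePerTile;

  // Exponential average: weight the running value and the new estimate equally,
  // smoothing frame-to-frame noise while still converging within a few frames.
  return expAvg * pixelsPerTile + (1 - expAvg) * newPixelsPerTile;
}

Higher expAvg values keep more of the running value each frame, which is why raising it trades responsiveness for stability.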

@lyonsno (Contributor) left a comment

OK, device testing is still in progress/slightly blocked, but like you said, I don't see how this could possibly be a regression. I vote we merge, and we can always make another PR if things can be improved further for low-end devices.

@jaxry merged commit 71f778f into master on Dec 11, 2019
@jaxry deleted the improve-performance-scaling branch on Dec 11, 2019 at 19:21