8217472: Add attenuation for PointLight #43

Open
wants to merge 11 commits into
base: master
from

Conversation

@nlisker
Contributor

nlisker commented Nov 17, 2019

CSR: https://bugs.openjdk.java.net/browse/JDK-8218264


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Download

$ git fetch https://git.openjdk.java.net/jfx pull/43/head:pull/43
$ git checkout pull/43

@bridgekeeper

bridgekeeper bot commented Nov 17, 2019

👋 Welcome back nlisker! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request (refresh this page to view it).

@nlisker nlisker changed the title from "8217472 add attenuation for point light" to "8217472: Add attenuation for PointLight" Nov 17, 2019
@nlisker
Contributor Author

nlisker commented Nov 18, 2019

Tested D3D on Win 10 and GL on Ubuntu 18.04.

There is still a bug (at least on Win) for certain combinations of values, as shown in this picture:
[image]

This sample program can be used to test the changes.
Controls: mouse wheel zooms, RMB drag rotates, LMB pans.

sample.zip

@nlisker nlisker marked this pull request as ready for review Nov 18, 2019
@nlisker
Contributor Author

nlisker commented Nov 18, 2019

Kevin, Ambarish,

You can start the review, especially the API. I will hunt down the bug with those specific values this week.

I'll need to know what kind of tests are needed in terms of functionality and performance.

@kevinrushforth kevinrushforth self-requested a review Nov 18, 2019
@openjdk openjdk bot added the rfr label Nov 18, 2019
@mlbridge

mlbridge bot commented Nov 18, 2019

Webrevs

@nlisker
Contributor Author

nlisker commented Nov 20, 2019

The bug I mentioned above is not actually a bug. There are large changes over a small distance that make it look like a jump in the lighting values, but when working with a finer scale the lighting dynamics seem correct.

@kevinrushforth
Member

kevinrushforth commented Dec 20, 2019

I think this is on the right track. The API looks like it is in good shape.

This will need a fair bit of testing to ensure that there are no regressions either in functionality or (especially) performance, in addition to tests for the new functionality. On the performance aspect, the inner loop of the lighting calculation now has an additional if test for the max range and additional arithmetic calculations for the attenuation. What we will need is a test program that we can run on Mac and Windows to measure the performance of rendering in a fill-rate-limited case. Ideally, we would not pay much of a performance hit in the default case where ca == 1, la == 0, qa == 0, but we first need to be able to measure the drop in performance before we can say whether it is acceptable.
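The extra per-pixel work described above can be sketched on the CPU side as follows. This is only an illustration, not the shader code from the patch: the class and method names are hypothetical, and the formula assumes the conventional attenuation form 1 / (ca + la·d + qa·d²), which this thread does not spell out.

```java
/** Hypothetical CPU-side sketch of per-pixel point-light attenuation; not the actual shader code. */
final class AttenuationSketch {
    private AttenuationSketch() {}

    /**
     * Returns the factor by which a light's contribution is scaled at distance dist.
     * ca, la and qa are the constant, linear and quadratic coefficients;
     * maxRange is the cutoff distance beyond which the light has no effect.
     * The formula is an assumption (the conventional attenuation form).
     */
    static double attenuation(double dist, double ca, double la, double qa, double maxRange) {
        if (dist > maxRange) {
            return 0.0; // the additional "if" test: pixels beyond the range receive no light
        }
        // the additional arithmetic: one multiply-add chain and one division per light
        return 1.0 / (ca + la * dist + qa * dist * dist);
    }
}
```

With the defaults mentioned above (ca == 1, la == 0, qa == 0) and an infinite range, the factor is exactly 1 at every distance, so the visual result is unchanged and only the cost of the extra test and arithmetic remains.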

Speaking of testing, I took the current patch for a test drive on Mac and Windows. I get the following system test failures on Mac, and also the same failure using fx83dfeatures/LightMotion in toys.

Shader compile log: ERROR: 0:308: Use of undeclared identifier 'range'
ERROR: 0:316: Regular non-array variable 'dist' may not be redeclared

test.robot.test3d.MeshCompareTest > testSnapshot3D[3] STANDARD_ERROR
    java.lang.RuntimeException: Error creating fragment shader
    	at javafx.graphics/com.sun.prism.es2.ES2Shader.createFromSource(ES2Shader.java:141)
    	at javafx.graphics/com.sun.prism.es2.ES2PhongShader.getShader(ES2PhongShader.java:177)
        ...
test.robot.test3d.MeshCompareTest > testSnapshot3D[3] FAILED
    java.lang.IllegalArgumentException: Unrecognized image loader: null
        at javafx.graphics/javafx.scene.image.WritableImage.loadTkImage(WritableImage.java:278)
        at javafx.graphics/javafx.scene.image.WritableImage$1.loadTkImage(WritableImage.java:53)
        at javafx.graphics/javafx.scene.Scene.doSnapshot(Scene.java:1340)
        at javafx.graphics/javafx.scene.Scene.doSnapshot(Scene.java:1372)
        at javafx.graphics/javafx.scene.Scene.snapshot(Scene.java:1462)
        at test.robot.test3d.MeshCompareTest.lambda$testSnapshot3D$0(MeshCompareTest.java:315)


test.robot.test3d.Snapshot3DTest > testSnapshot3D[3] FAILED
(same failure as above)


test.robot.test3d.Snapshot3DTest > testSnapshot3D[7] FAILED
(same failure as above)
@nlisker
Contributor Author

nlisker commented Jan 2, 2020

I get the following system test failures on Mac

Shader compile log: ERROR: 0:308: Use of undeclared identifier 'range'
ERROR: 0:316: Regular non-array variable 'dist' may not be redeclared

I don't have a Mac to test on, but on Ubuntu system tests pass (I ran the test command for systemTests). Does the sample app I attached also fail on Mac? They both use the same shaders, so I wonder where the issue could be.
Moreover, the error messages are strange: dist is not redeclared and range is not undeclared in the shader. The error message seems to originate from the native function Java_com_sun_prism_es2_GLContext_nCompileShader in GLContext.c failing to compile the shader, but I can't tell why.

@kevinrushforth
Member

kevinrushforth commented Jan 3, 2020

I get the same error on Ubuntu 16.04 as on Mac. Did you run the system tests with -PFULL_TEST=true -PUSE_ROBOT=true? Also, you can try running the fx83dfeatures/LightMotion toy and you should see the same error.

I still need to test your sample app on Mac.

arapte left a comment

I have added a few comments, but have not run the tests and sample yet.

* A light source that radiates light equally in all directions away from itself. The location of the light
* source is a single point in space. Any pixel within the range of the light will be illuminated by it,
* unless it belongs to a {@code Shape3D} outside of its {@code scope}.
* <p>

@arapte

arapte Jan 3, 2020

Can the behavior be explained in terms of Shape or Node instead of pixel?
Maybe something like this:

Any node within the range of the light will be illuminated by this light, except the nodes that are added to the exclusion scope of this light.

@nlisker

nlisker Jan 3, 2020

Author Contributor

The issue is that range and attenuation work on the pixel scale, not on the node/shape scale. A node can be partially illuminated if only part of it is within the range of the light. See the image in the comment above.

@kevinrushforth

kevinrushforth Jan 3, 2020

Member

Right. This needs to talk about pixels. Perhaps there is a way to make it more clear that we are talking about pixels that are part of a rendered Shape3D, but I don't have a good suggestion right now.

@nlisker

nlisker Jan 4, 2020

Author Contributor

Maybe

A light source that radiates light equally in all directions away from itself. The location of the light
source is a single point in space. The light affects {@code Shape3D}s in its {@code scope}. Any pixels in
the light's {@code range} that belong to a {@code Shape3D} will be illuminated by it according to the
computation specified in {@link PhongMaterial}.

The docs of PhongMaterial will need to be updated too.

@kevinrushforth

kevinrushforth Jan 10, 2020

Member

Yes, I think that change to the docs looks good.

@kevinrushforth
Member

kevinrushforth commented Jan 3, 2020

I still need to test your sample app on Mac.

I get the error with your sample app. It fails on Mac or Linux (Ubuntu 16.04) with the same error as I reported above.

@nlisker
Contributor Author

nlisker commented Jan 3, 2020

The error was for the cases of 2 and 3 lights (I was testing 1) and should be fixed now. My fault with copy-paste... that's why we use loops, but I guess this is some optimization for the es2 pipeline. I wonder if it's really worth it over a single shader looping over the number of lights like d3d does.

@nlisker
Contributor Author

nlisker commented Jan 9, 2020

This will need a fair bit of testing to ensure that there are no regressions either in functionality or (especially) performance, in addition to tests for the new functionality.

Which tests for the new functionality should I write? Up to the shader itself it's mostly just passing on variables, and the API is standard DoublePropertys. I can test the dirty bits / redraw logic.

On the performance aspect, the inner loop of the lighting calculation now has an additional if test for the max range and additional arithmetic calculations for the attenuation. What we will need is a test program that we can run on Mac and Windows to measure the performance of rendering in a fill-rate-limited case. Ideally, we would not pay much of a performance hit in the default case where ca == 1, la == 0, qa == 0, but we first need to be able to measure the drop in performance before we can say whether it is acceptable.

Can you point me to a similar performance test? I didn't see a way to easily measure FPS.
I don't know how to avoid the if test for the maxRange; I'm open to suggestions. The only thing I can think of is that if maxRange is infinite (the default), we swap the shader for one that doesn't make that check. However, swapping shaders also costs performance.

@kevinrushforth
Member

kevinrushforth commented Jan 10, 2020

We have a few performance tests in apps/performance, but I don't know how up to date they are. They might give you a starting point on how to measure FPS, but really the harder part is going to be coming up with a workload -- a scene with a number of Shape3Ds with large triangles (so that they are fill-rate limited) and 1-3 lights, etc -- that you can use to measure rendering performance without measuring overhead. Typically you want a scene that is rendering continuously in the < 30 fps range, and more like 10 fps to minimize the overhead even more.
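As a starting point for the FPS measurement itself, a frame-rate meter can be as small as the sketch below. It is a hypothetical helper, not code from apps/performance: it only assumes that something hands it one nanosecond timestamp per rendered frame.

```java
/** Hypothetical frame-rate meter fed with one nanosecond timestamp per rendered frame. */
final class FpsMeter {
    private boolean started = false;
    private long firstNanos;
    private long lastNanos;
    private long intervals = 0; // number of frame-to-frame intervals seen so far

    /** Records one frame; nowNanos must be a monotonically increasing timestamp in nanoseconds. */
    void frame(long nowNanos) {
        if (!started) {
            started = true;
            firstNanos = nowNanos;
        } else {
            intervals++;
        }
        lastNanos = nowNanos;
    }

    /** Average frames per second since the first recorded frame, or 0 if fewer than two frames were seen. */
    double averageFps() {
        if (intervals == 0 || lastNanos == firstNanos) {
            return 0.0;
        }
        return intervals * 1_000_000_000.0 / (lastNanos - firstNanos);
    }
}
```

In a JavaFX test program, one way to feed it would be to call frame(now) from AnimationTimer.handle(long now), which receives a nanosecond timestamp on each pulse. Averaging over many frames of a continuously animating, slow (around 10 fps) scene keeps the measurement overhead small relative to the rendering cost.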

Before we figure out whether we need to double the number of shaders (which was one of the ideas that I had as well), we need to know how much of a problem it is. Is it a < 2% performance drop on typical cases on a variety of machines, or is it a 25% drop (or, more likely, somewhere in between)? If the perf drop is negligible, then it isn't worth doubling the shaders. If it is significant, then we probably need to.

If we do need to double the shaders, I wouldn't do it based on the maxRange being infinite, rather I would do it based on whether attenuation is being used. That way the existing shaders would be unchanged, while the new shader would deal with both attenuation and the maxRange test.
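The runtime choice described here could look roughly like the sketch below. The names are hypothetical and nothing like this exists in the patch; treating a finite maxRange as also requiring the new variant is an interpretation of "the new shader would deal with both attenuation and the maxRange test", not something the thread states outright.

```java
/** Hypothetical selection between an unchanged shader set and an attenuation-aware one. */
final class ShaderVariantSketch {
    enum Variant { BASIC, ATTENUATED }

    private ShaderVariantSketch() {}

    /**
     * Picks the existing (BASIC) shaders only when the light uses the default
     * coefficients and has no range cutoff; anything else needs the new shaders.
     */
    static Variant choose(double ca, double la, double qa, double maxRange) {
        boolean defaults = ca == 1.0 && la == 0.0 && qa == 0.0
                && maxRange == Double.POSITIVE_INFINITY; // assumption: finite range also needs the new variant
        return defaults ? Variant.BASIC : Variant.ATTENUATED;
    }
}
```

With several lights in scope, the same test would presumably be applied to each light, with the attenuated variant chosen if any one of them needs it.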

Hopefully, there won't be enough of a perf hit to require doubling the shaders, but we need to be able to measure it.

For functional testing, in addition to the simple API tests, we want to make sure that the basic rendering is working with and without attenuation. Some sort of visual test where you verify that attenuation is / isn't happening as well as testing the cutoff. I wouldn't get too fancy with these...just a sanity test.

@nlisker nlisker changed the base branch from master to jfx14 Jan 10, 2020
@openjdk openjdk bot added the csr label Feb 5, 2020
@openjdk

openjdk bot commented Feb 5, 2020

@kevinrushforth this pull request will not be integrated until the CSR request JDK-8218264 for issue JDK-8217472 has been approved.

@openjdk openjdk bot removed the rfr label Feb 5, 2020
@kevinrushforth
Member

kevinrushforth commented Feb 5, 2020

Looks like the jcheck bot removed the rfr label because the CSR isn't complete. An incomplete CSR should be treated the same way as an insufficient number of reviewers. I filed SKARA-262 to track this.

@nlisker
Contributor Author

nlisker commented Feb 17, 2020

I've taken the sample app and enlarged the box to fill the whole range of the lights in an attempt to have many pixels rendered for few vertices. I measured 90-115 fps during continuous animation with this patch, and 100-120 before it. I measured the fps using an external app called BandiCam on Win 10.

Will do more tests this week.

@nlisker
Contributor Author

nlisker commented Feb 22, 2020

Attaching an attempt at testing performance with and without attenuation. Launch LightingSample without the patch or AttenLightingSample with the patch. There are 2 modes, one with a single large box and another with many small boxes. Not only is there no difference in performance with and without attenuation, if the lights are turned off (so there is no shader calculation even), there is still no difference. The "single" mode gives ~120 fps, while the "multiple" mode gives ~42 fps and seems to be limited by the number of nodes in the scene rather than anything related to lighting.

Either the calculations for 3 lights are negligible, or the testing is flawed. I tested on Win10 with an RX 470 4GB. Used BandiCam to measure the fps.

attenTest.zip

@openjdk openjdk bot added the rfr label Mar 9, 2020
@nlisker
Contributor Author

nlisker commented Mar 10, 2020

@kevinrushforth Can you have a look at the test app? I would like to get this moving so we would have time to get the rest of the lighting enhancements into 15.

@kevinrushforth
Member

kevinrushforth commented Mar 10, 2020

I'll take a look. My quick thought is that we need some sort of test with a reasonable number of large boxes (so that it is fill-limited). If there isn't such a case, and the 3D rendering is always node-limited, then the shader performance doesn't really matter all that much. I suspect we should be able to find a case where it does, but we'll see.

@kevinrushforth
Member

kevinrushforth commented Mar 11, 2020

I did some limited testing today with a modification to the test program you attached to create a MeshView with 200 large quads (400 triangles) in a single node. This will eliminate the node overhead. I can confirm that it is fill rate limited, because when I send the exact same amount of data, but make the triangles small, the frame rate goes up as expected.

It still looks like it isn't shader limited, though, at least on my Windows 10 machine, which has an Intel UHD Graphics 630. More testing is needed on other platforms. I'll share the mods to the test program when I have time, but it's basically just creating a set of quads on top of each other by reusing the same 4 points in each pair of faces.
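The stacked-quads workload can be sketched as plain array construction. This is not the modified test program, just an illustration with hypothetical names; the face layout assumes the point-index/texcoord-index pairs used by the default TriangleMesh vertex format, with a single texture coordinate at index 0.

```java
/** Hypothetical builder for mesh data made of many coincident quads sharing the same four points. */
final class StackedQuadsSketch {
    private StackedQuadsSketch() {}

    /** The four corner points (x, y, z) that every quad reuses. */
    static float[] points(float size) {
        return new float[] {
            0,    0,    0,
            size, 0,    0,
            size, size, 0,
            0,    size, 0
        };
    }

    /**
     * Face indices for numQuads quads stacked on top of each other.
     * Each quad is two triangles; each triangle is p0, t0, p1, t1, p2, t2
     * (assumed layout, with every texcoord index set to 0).
     */
    static int[] faces(int numQuads) {
        int[] quad = {
            0, 0, 1, 0, 2, 0,
            0, 0, 2, 0, 3, 0
        };
        int[] faces = new int[numQuads * quad.length];
        for (int q = 0; q < numQuads; q++) {
            System.arraycopy(quad, 0, faces, q * quad.length, quad.length);
        }
        return faces;
    }
}
```

For 200 quads this yields 400 triangles from only four points, so vertex processing stays trivial while every quad still has to be rasterized and shaded. Depending on the camera setup, the winding order may need to be flipped (or face culling disabled) for the quads to be visible.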

@kevinrushforth
Member

kevinrushforth commented Mar 14, 2020

I'll attach the above modified testcase that I ran. I ran it on a relatively new Windows 10 laptop and a rather ancient MacBook Pro. I had to drastically reduce the number of quads on the Mac, but the results are similar: no significant difference between the current code and the proposed patch for point lights (without attenuation).

I'd like to see results on a recent machine with a graphics accelerator (either NVIDIA or AMD/ATI) to see if the new shader hurts performance there, but I suspect it will be fine.

@kevinrushforth
Member

kevinrushforth commented Mar 14, 2020

Updated test case: attenTest2.zip

@nlisker
Contributor Author

nlisker commented Mar 17, 2020

On Win 10 with an AMD RX 470 4GB I get the following (ran the test twice):

Without the patch:
200 quads average 111.5, 113 fps
5000 quads average 11.5, 11.5 fps

With the patch:
200 quads average 106, 111 fps
5000 quads average 8.5, 8.5 fps

Will test on Ubuntu later.

@nlisker
Contributor Author

nlisker commented Mar 18, 2020

On Ubuntu 18 with an AMD RX 470 4GB I get the following:

Without the patch:
200 quads average 107 fps
5000 quads average 11.5 fps

With the patch:
200 quads average 107 fps
5000 quads average 11 fps

@nlisker
Contributor Author

nlisker commented Apr 2, 2020

@arapte Can you please test the performance changes too?

@kevinrushforth
Member

kevinrushforth commented Apr 7, 2020

I think @arapte has a similar MacBookPro model to mine.

I think @prrace might be able to test it (I'll sync with him offline).

@kevinrushforth
Member

kevinrushforth commented Apr 15, 2020

Here are the results on Phil's machine, which is a MacBook Pro with a graphics accelerator (Nvidia, I think).

Without the patch:
2000 quads average 8.805 fps

With the patch:
2000 quads average 4.719 fps

Almost a 2x performance hit.
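For reference, the size of the hit can be worked out from the two averages above. The helper below is a hypothetical sketch for comparing before/after frame rates, not part of any test program in this thread.

```java
/** Hypothetical helpers for comparing frame rates measured before and after a change. */
final class PerfDropSketch {
    private PerfDropSketch() {}

    /** How many times slower the "after" measurement is (2.0 would be a 2x hit). */
    static double slowdownFactor(double fpsBefore, double fpsAfter) {
        return fpsBefore / fpsAfter;
    }

    /** Frame-rate drop as a percentage of the "before" measurement. */
    static double percentDrop(double fpsBefore, double fpsAfter) {
        return 100.0 * (fpsBefore - fpsAfter) / fpsBefore;
    }
}
```

Applied to 8.805 fps and 4.719 fps, this gives a slowdown factor of about 1.87 and a drop of roughly 46%.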

@kevinrushforth
Member

kevinrushforth commented Apr 15, 2020

Conclusion: The new shaders that support attenuation don't seem to have much of a performance impact on machines with an Intel HD, but on systems with a graphics accelerator, it is a significant slowdown.

So we are left with the two choices of doubling the number of shaders (that is, a set of shaders with attenuation and a set without) or living with the performance hit (which will only be a problem on machines with a dedicated graphics accelerator for highly fill-limited scenes). The only way we can justify a 2x drop in performance is if we are fairly certain that this is a corner case, and thus unlikely to hit real applications.

If we do end up deciding to replicate the shaders, I don't think it is all that much work. I'm more worried about how well it would scale to subsequent improvements, although we could easily decide that for, say, spotlights attenuation is so common that you wouldn't create a version that doesn't do that.

In the D3D HLSL shaders, ifdefs are used, so the work would be to restore the original code and add the new code under an ifdef. Then double the number of lines of gradle (at that point, I'd do it in a for-each loop), then modify the logic that loads the shaders to pick the right one.

For GLSL, the different parts of the shader are in different files, so it's a matter of creating new versions of each of the three lighting shaders that handle attenuation and choosing the right one at runtime.

@nlisker
Contributor Author

nlisker commented Apr 17, 2020

I discussed this with a graphics engineer. He said that a couple of branches do not have any real performance impact even on modern mobile devices, and that, e.g., on iOS 7 using half floats instead of floats was improving shader execution dramatically. Desktops with NVIDIA or AMD and even Intel modern cards can process dozens of branches with no significant performance degradation.

He actually suggested having all the light types in a single shader file (looking ahead here). He also suggested not permuting shaders based on the number of lights, and instead passing in a uniform for that number and looping over it. The permutations on the bump, specular and self-illumination components are correct (not sure we are not doing that for the diffuse component). If we later add shadows, which is not on my near to-do list, then we should permute there.
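The "uniform plus loop" idea can be illustrated with a CPU-side analogue. This is a hypothetical sketch, not shader code: the arrays stand in for per-light values a shader would compute, and the cap of three lights is an assumption based on the "1-3 lights" mentioned earlier in this conversation.

```java
/** Hypothetical CPU-side analogue of one shader looping over a light count instead of one shader per count. */
final class LightLoopSketch {
    static final int MAX_LIGHTS = 3; // assumed upper bound on lights per shape

    private LightLoopSketch() {}

    /**
     * Sums the diffuse contribution of the first lightCount lights.
     * nDotL[i] is the cosine between the surface normal and the direction to light i;
     * attenuation[i] is that light's attenuation factor at the pixel.
     */
    static double totalDiffuse(double[] nDotL, double[] attenuation, int lightCount) {
        double sum = 0.0;
        for (int i = 0; i < lightCount && i < MAX_LIGHTS; i++) {
            sum += Math.max(0.0, nDotL[i]) * attenuation[i]; // back-facing lights contribute nothing
        }
        return sum;
    }
}
```

The same body serves one, two or three lights; only the lightCount argument changes, which is the role the uniform would play in a single looping shader.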

It also depends on our target hardware. If we take into account hardware from, say, 2005 then maybe branching will cause significant performance loss, but that hinders our ability to increase performance for newer hardware. What is the policy here?

I have a Win10 laptop with a GeForce 610M that I will test this weekend to see if the mobile NVidia cards have some issue.

@kevinrushforth
Member

kevinrushforth commented Apr 18, 2020

I think most of those are good suggestions going forward. As for the performance drop, the only place we've seen it so far is on graphics accelerators that are a few years old by now. Integrated graphics chipsets (such as Intel HD) either old or new seem largely unaffected by the shader changes. What we are missing is performance metrics from newer graphics accelerators on Mac and Windows.

Even with the performance drop on older graphics devices, I'm leaning towards not doubling the shaders, since this is an artificial stress test with huge quads. If we could get performance data from a couple more recent graphics accelerators, that would be best.

@kevinrushforth
Member

kevinrushforth commented Apr 24, 2020

Here is a slightly modified test program. It fixes a compilation error in the previous one and also adds a system property to set the number of quads.

It creates 200 quads by default. If you need to increase this or decrease it to get something in the ~ 10 fps range you can do that with -DnumQuads=NNNN.
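Reading such a property with a default of 200 takes one standard-library call. The class below is a hypothetical sketch of the pattern, not the code in the attached program.

```java
/** Hypothetical reader for the -DnumQuads=NNNN system property, defaulting to 200. */
final class NumQuadsOption {
    static final int DEFAULT_NUM_QUADS = 200;

    private NumQuadsOption() {}

    /** Returns the value of the numQuads system property, or the default if it is unset or not an integer. */
    static int numQuads() {
        return Integer.getInteger("numQuads", DEFAULT_NUM_QUADS);
    }
}
```

Launching with, for example, -DnumQuads=5000 on the java command line would then make numQuads() return 5000.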

pointlighttest.zip

@prrace
Contributor

prrace commented Apr 24, 2020

Quoting kevinrushforth (commented Apr 18, 2020):

I think most of those are good suggestions going forward. As for the performance drop, the only place we've seen it so far is on graphics accelerators that are a few years old by now.

So a 50% drop on a 2015 MacBook Pro is OK? Do we have numbers on recent MacBook Pros?

@kevinrushforth
Member

kevinrushforth commented Apr 25, 2020

If this were an even remotely representative use case, then no, the performance hit would not be OK. The test was designed as an artificial "worst-case" stress test: a single mesh with a large number of very large (window-sized) quads stacked on top of each other. Any real-world use case won't do this.

We should make sure that we aren't seeing any significant performance drop when rendering spheres (at a couple different tessellation levels) or boxes.

@dsgrieve

dsgrieve commented Apr 27, 2020

Results with NVIDIA Quadro P400:

Without the fix, 1000 quads, average FPS ~7.4
With the fix, 1000 quads, average FPS ~6.1

@openjdk

openjdk bot commented May 8, 2020

@nlisker this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout 8217472_Add_attenuation_for_PointLight
git fetch https://git.openjdk.java.net/jfx master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
@openjdk openjdk bot added the merge-conflict label May 8, 2020
@openjdk openjdk bot removed the merge-conflict label May 8, 2020
@nlisker
Contributor Author

nlisker commented May 13, 2020

We should make sure that we aren't seeing any significant performance drop when rendering spheres (at a couple different tessellation levels) or boxes.

I missed this. Do you mean that the test should create a mesh of a sphere instead of a flat surface?

@kevinrushforth
Member

kevinrushforth commented May 13, 2020

I would say in addition to rather than instead of, since both are useful.

What might help is to add the sphere test plus the pathological test I put together into your test program so we can select between them. And then get a few of us to run that updated program and post results.

5 participants