
Weird Artifacts #1

Open · terhechte opened this issue May 9, 2019 · 9 comments

@terhechte commented May 9, 2019

Hey! You might already know about this, but I thought I'd still post it just in case it might be helpful. I'm running piet-metal on a 2016 MBP with a Radeon Pro 450. It works, but the SVG tiger has a lot of artefacts:

[Screenshot 2019-05-09 at 08:53: SVG tiger rendered with artifacts]

I thought it might be the integrated chip, so I used gfx-status to force the discrete GPU and ran it again. That didn't change anything, though, as piet already uses the discrete device automatically.

[Screenshot 2019-05-09 at 08:54: same artifacts on the discrete GPU]

Is this expected behaviour? Can I submit any other system info to help out? Great project, looking forward to where this is going!

@raphlinus (Contributor) commented May 9, 2019

Thanks for the report; it's definitely not expected behavior. It's almost certainly because I've made assumptions about SIMD group size, which can vary from card to card. PietRender.metal line 233 is the most likely culprit, and there might be another place.

I'll try to replace that with logic that uses the correct attributes for kernel function input arguments (threads_per_simdgroup and so on).
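Roughly, the idea looks like this (illustrative MSL sketch only; the kernel name and buffer layout are hypothetical, not the actual PietRender.metal code; only the `[[...]]` attributes are standard MSL):

```metal
#include <metal_stdlib>
using namespace metal;

// Read the SIMD width from a kernel input attribute instead of
// hard-coding a constant like 16 or 32.
kernel void adaptive_sketch(device atomic_uint *counts [[buffer(0)]],
                            uint tix       [[thread_position_in_grid]],
                            uint simd_size [[threads_per_simdgroup]],
                            uint lane      [[thread_index_in_simdgroup]])
{
    // Partition work by the device's actual SIMD width.
    uint group = tix / simd_size;
    if (lane == 0) {
        atomic_fetch_add_explicit(&counts[group], 1u, memory_order_relaxed);
    }
}
```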

@raphlinus (Contributor) commented May 9, 2019

Could you by any chance run the cardioid image (change comments in make_test_scene)? That's a very revealing test for simdgroup issues.

@torarnv commented May 9, 2019

Seeing the same artefacts on an iMac Pro with a Radeon Pro Vega 56 8 GB. Here's the cardioid image:

[Image: cardioid test rendering]

@torarnv commented May 9, 2019

> It's almost certainly because I've made assumptions about SIMD group size, which can vary from card to card. PietRender.metal line 233 is the most likely culprit, and there might be another place.

Not sure if that's the right approach, but changing the 16 constant to 4, 8, 32, or 64 made no difference.

@terhechte (Author) commented May 9, 2019

[Image: cardioid test rendering]

Here’s my image. I hope that helps!
@raphlinus

This comment has been minimized.

Copy link
Contributor

commented May 9, 2019

Thanks, it definitely helps. It suggests that the simdgroup size might be 64 on that hardware. I'm going to redo the grid logic so that it can handle just about any combination of grid and simd sizes, but I'm finding it quite tricky.
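As a rough illustration (not the actual grid logic), one way to make a fixed-width tile independent of the simdgroup size is to stride the per-lane loop by the runtime width; the buffer layout here is hypothetical:

```metal
#include <metal_stdlib>
using namespace metal;

// Sketch: a strided loop driven by the runtime SIMD width covers a tile
// whether the simdgroup is smaller or larger than the tile.
kernel void grid_sketch(device float *tile_out [[buffer(0)]],
                        uint simd_size [[threads_per_simdgroup]],
                        uint lane      [[thread_index_in_simdgroup]])
{
    const uint tile_width = 16;  // assumed tile width
    // simd_size == 8  -> each lane covers two columns;
    // simd_size == 64 -> lanes 16..63 simply skip the loop.
    for (uint x = lane; x < tile_width; x += simd_size) {
        tile_out[x] = 0.0f;  // placeholder per-column work
    }
}
```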

raphlinus added a commit that referenced this issue on May 9, 2019:
"More adaptive SIMD grid: Try to handle different subgroup sizes. Fixes #1"

raphlinus referenced a pull request that will close this issue on May 9, 2019:
More adaptive SIMD grid #2 (Open)

@raphlinus (Contributor) commented May 9, 2019

I just pushed a PR, but don't yet have confidence it's the right fix. For one, I'm seeing a performance regression on Intel 640, and it also doesn't seem to work with a tile width of 4 (you'd never do this, but in theory it should work with any power-of-2). I'd appreciate one of you trying it on the cardioid and seeing what happens, as that will tell me whether it's on the right track.

raphlinus added a commit that referenced this issue on May 10, 2019:
"Experiment with threadgroup instead of simd: Don't use simd, use atomics in threadgroups instead. This is a performance regression of around 10%, but might be more robust. Fixes #1 (I hope)"

raphlinus referenced a pull request that will close this issue on May 10, 2019:
Experiment with threadgroup instead of simd #3 (Open)

@raphlinus (Contributor) commented May 10, 2019

I tried another approach: getting rid of SIMD and using threadgroups. I see roughly a 10% perf regression, but it might be the way forward for a number of reasons. For one, it will be easier to port to older graphics cards because it doesn't rely on simdgroup operations. Also, for some reason, #2 was causing the simdgroup size to drop to 8, which seemed to be the cause of the perf regression; @kvark and I have been looking into that and haven't gotten any answers yet. In any case, my measurement with #3 is that the simdgroup size is 16.
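For context, a minimal sketch of the threadgroup-atomic pattern (hypothetical kernel, not the actual #3 code): the same kind of per-tile accumulation done with a threadgroup-shared atomic instead of simdgroup operations, so it works regardless of SIMD width.

```metal
#include <metal_stdlib>
using namespace metal;

kernel void tg_accumulate(device uint *out [[buffer(0)]],
                          uint tix [[thread_index_in_threadgroup]],
                          uint gix [[threadgroup_position_in_grid]])
{
    // Threadgroup-shared counter, initialized by one thread.
    threadgroup atomic_uint count;
    if (tix == 0) {
        atomic_store_explicit(&count, 0u, memory_order_relaxed);
    }
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // Every thread contributes via a threadgroup atomic; no simd_ballot
    // or simd_sum required.
    atomic_fetch_add_explicit(&count, 1u, memory_order_relaxed);
    threadgroup_barrier(mem_flags::mem_threadgroup);

    if (tix == 0) {
        out[gix] = atomic_load_explicit(&count, memory_order_relaxed);
    }
}
```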

I know #3 has some rough edges, but I'd love to see whether it fixes the problem, and also get performance numbers (from the shader profiler), if you can get those.

Thanks again for the reports!

@terhechte (Author) commented May 15, 2019

As mentioned on the other PR, #3 fixes the issue for me. Thanks for looking into this!
