High CPU overhead on very complex scenes #228
Comments
Thanks for trying it out; this feedback is very valuable! Yes, I think you described the biggest potential performance improvement: rendering strokes with a stroke shader instead of using fills. Another potential win will simply be to make tiling faster; @nical has a work-in-progress branch to improve things significantly. Finally, moving tiling to GPU would help a lot.
You may wish to try again. With @Veedrac's optimizations and the new tiling code that just landed, I've measured an 83% improvement in CPU time on some test cases.
Closing for now as there have been many CPU time improvements, including an entirely new backend, since this was filed. Feel free to reopen if more issues are found.
Right now I am working on a project where I need to render very complex lines (~2000 points per line, all straight segments per line, usually 3 lines) every frame and have been trying out Pathfinder for it. While I am aware this is probably not an intended use case, I think this may still be valuable for improving the performance of Pathfinder.
The code I am working with can be found here. (see line 296 for some of the main rendering code)
First of all, in order to even get realtime performance, I had to break each line into smaller PathObjects instead of one single PathObject per line. I ended up with rendering taking roughly 8-11ms per frame (using RayonExecutor, measured very non-scientifically), which doesn't really leave enough time for the other stuff I want to do. One of the things I found was by not using OutlineStrokeToFill, although that obviously renders incorrectly, the render time reduced to about 2-4ms. #119 mentions that the stroke algorithm could be done on the GPU, and that might increase performance greatly in this scenario.
Something else I noticed under this scenario was that RayonExecutor doesn't seem to scale well with added threads. Even though my CPU has 8 threads, using RayonExecutor was only about 70% faster than SequentialExecutor (again measured very roughly, about 16-17ms per frame using SequentialExecutor). Additionally, the CPU usage reported by htop was about double that of using SequentialExecutor.
Finally, I decided to compare performance of Pathfinder to Blend2D, a purely CPU based rasterizer. My code for testing it is on the blend2d branch of the same repo. Using it, only 3-5ms were spent drawing the lines. However, the caveat is that I have later steps that I want to perform on the GPU, and copying the rendered image from the CPU to the GPU takes a significant amount of time (at least with my naive implementation), bringing the total frame time back up to only a little faster than Pathfinder using RayonExecutor. Despite that, the CPU usage reported while using Blend2D is roughly half of Pathfinder using SequentialExecutor.
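For anyone wanting to reproduce the "break each line into smaller PathObjects" step mentioned above, here is a rough sketch of how the splitting could look. It is not the code from the linked repo and does not touch the Pathfinder API at all: `chunk_polyline` and `max_points` are hypothetical names, and points are plain `(f32, f32)` tuples. Each returned chunk would then be turned into its own path by whatever means the application already uses.

```rust
/// Split a polyline into chunks of at most `max_points` points each.
/// Adjacent chunks share one endpoint so the pieces stay connected.
/// (Hypothetical helper for illustration; not part of Pathfinder.)
fn chunk_polyline(points: &[(f32, f32)], max_points: usize) -> Vec<Vec<(f32, f32)>> {
    assert!(max_points >= 2, "a chunk needs at least two points to form a segment");
    let mut chunks = Vec::new();
    if points.len() < 2 {
        // Fewer than two points means there is no segment to draw.
        return chunks;
    }
    let mut start = 0;
    while start < points.len() - 1 {
        let end = (start + max_points).min(points.len());
        chunks.push(points[start..end].to_vec());
        // Reuse the last point of this chunk as the first point of the next.
        start = end - 1;
    }
    chunks
}
```

One caveat with this approach: where two chunks meet, the stroker sees two separate path ends rather than a join, so the rendering at those shared points may differ slightly from a single continuous path depending on the cap and join styles in use.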