
Integrate non-CSG speedup initiatives [Do not merge] #4654

Draft · wants to merge 38 commits into master
Conversation

@ochafik (Contributor) commented May 25, 2023

Now that 3D CSG operations are super fast (with the experimental manifold or fast-csg features), the rest of OpenSCAD (script evaluation, import, export) can become the bottleneck on more ambitious models. (Note: I'm only looking at CLI rendering times, not the UI, although this work will also impact preview times.)

This PR is an umbrella to list, discuss, and integrate some active efforts towards non-CSG speedups. Each has been tested separately and should provide a speedup of at least 5% on some model, sometimes on the script-evaluation phase only (e.g. tested with --export-format=echo):

Upcoming (not merged yet in this draft PR as not stable yet):

  • fast-linalg2: optimistically use an Eigen-backed MatrixType instead of VectorType when possible.

Results (WIP, benchmarks here) (all assuming --enable=manifold):

  • poly_attached_inlined.scad renders 1.37x faster to binstl / 1.44x faster to asciistl
  • sphere(10, $fn=1000);: 2.83x faster to binstl, 5.92x faster to asciistl
  • sphere($fn=1000); translate([0.5, 0, 0]) sphere($fn=1000);: 1.22x faster to binstl, 1.78x faster to asciistl
  • Test your own? Ideally pick models with few CSG operations and a high facet count, that import large assets, or that use BOSL2 to process lots of geometry in script

ochafik added 30 commits May 19, 2023 23:01
Can give up to a 10% rendering speedup on models like this:

  $fn=1000;
  sphere();
  translate([0.5, 0, 0])
    sphere();
This allows calling getArray repeatedly during construction of the reindexer without incurring quadratic costs from array rebuilding
The IndexedFace intermediate stage wasn't needed (possible now that Reindexer's vector is usable during construction)
- use strtof instead of istringstream!
- skip checks for polygons w/ duplicate vertices (guaranteed not to happen after tessellation)
- cache tostring / fromstring results for each unique vertex
- normals are now... normalized (-0 -> 0)
@ochafik ochafik changed the title [Do not merge] Umbrella PR for non-CSG speedup initiatives Umbrella PR for non-CSG speedup initiatives [Do not merge] May 25, 2023
@ochafik ochafik changed the title Umbrella PR for non-CSG speedup initiatives [Do not merge] Integrate non-CSG speedup initiatives [Do not merge] May 25, 2023
@pca006132 (Member)

I've thought about this for some time; I think it is hard to get decent performance without some kind of large refactor, as AST traversal and stack-frame allocations are slow in general. We also don't have a JIT, which can speed up numerical code by a lot.

I wonder if it would be a good idea to look into translation into Lua, for example, which has a good FFI and a pretty fast JIT. V8 would probably be faster, but I'm not sure the FFI is as simple. By writing the translation rules, we could also fix some unclear semantics issues...

@ochafik (Contributor, Author) commented Nov 20, 2023

> I thought about this for some time, I think it is hard to get decent performance without doing some kind of large refactor, as AST traversal and allocations for stack frames are slow in general. We also don't have jit, which can speed up numerical code by a lot.

Other possibly tricky avenues of optimization:

  • Processing evaluations in parallel. This would require making the stack-frame code threadsafe, and finding the right async / TBB graph approach to split computations with low overhead; my first forays in that direction haven't been successful (the overhead has been disappointing so far), but I have a few more things to try there.
  • Speeding up linear algebra by using SIMD-optimized Eigen matrices instead of vectors of vectors of generic values. I've spent quite a bit of time on this (e.g. fast-linalg3), but haven't stabilized it to the point of proving any benefit yet.
  • Optimising / caching frame referencing (as discussed in this issue), which is tricky given OpenSCAD's current behaviour (where the same code may run at different frame depths between the first and subsequent iterations of a loop, for instance).

> I wonder if it will be a good idea to look into translation into lua for example, which has good FFI and a pretty fast jit. V8 will probably be faster, but not sure if the FFI is as simple. By writing the translation rules, we can also fix some unclear semantics issues...

Yess!!

Tbh I've been toying with the idea of transpiling OpenSCAD to heavily templated C++ (which in turn would use parallel constructs, Eigen matrices, etc.). Advanced static analysis with type propagation / static tracing could take us quite far.

Definitely worth looking at the semantics of Lua, JS, and other options that have few dependencies (and work well with emscripten). I've done a fair bit of V8 native interfacing (codegen); it has a solid API.

My main concern right now is that I don't fully understand OpenSCAD's semantics; I'll need a deep dive into the possibly undocumented fine print that enables some of the magic leveraged by libraries like BOSL2.

@pca006132 (Member)

There are also issues regarding caching, e.g. #782. I think I can find some time later to hack up a simple bytecode interpreter and see how far we can go with something naive.

@jordanbrown0 (Contributor)

It seems like it should be possible to fully resolve non-$ variable, function, and module references at parse time. (But I haven't done any analysis on how much that would save.)

@pca006132 (Member)

Yes, those are just static scoping. Typically you'd save the time spent computing hashes and creating hash tables. These are small gains, but they can add up if they're in the hot path (which they are...).

@jordanbrown0 (Contributor)

I don't immediately understand why you would need hashes or hash tables for them. It seems like you could just have pointers directly to the underlying Values, allocated as part of context blocks.

$ variables are entirely different, of course.

@pca006132 (Member)

Ah, poor wording. What I meant is that you'd save the time spent computing those hashes. You can use pointers to the underlying values, or keep a stack of values stored inline (avoiding a per-value allocation) and use indices into it directly.

3 participants