
Integrate non-CSG speedup initiatives [Do not merge] #4654

Draft · wants to merge 38 commits into master
Conversation

@ochafik (Contributor) commented May 25, 2023

Now that 3D CSG operations are super fast (with the experimental manifold or fast-csg features), the rest of OpenSCAD (script evaluation, import, export) can become the bottleneck on more ambitious models. (Note: I'm only looking at CLI rendering times, not the UI, although this work will also impact preview times.)

This PR is an umbrella to list, discuss, and integrate some active efforts towards non-CSG speedups. Each has been tested separately and should provide a speedup of at least 5% on some model, sometimes on the script-evaluation phase only (e.g. tested with --export-format=echo):

Upcoming (not merged yet in this draft PR as not stable yet):

  • fast-linalg2: optimistically use an Eigen-backed MatrixType instead of VectorType when possible.

Results (WIP, benchmarks here) (all assuming --enable=manifold):

  • poly_attached_inlined.scad renders 1.37x faster to binstl / 1.44x faster to asciistl
  • sphere(10, $fn=1000);: 2.83x faster to binstl, 5.92x faster to asciistl
  • sphere($fn=1000); translate([0.5, 0, 0]) sphere($fn=1000);: 1.22x faster to binstl, 1.78x faster to asciistl
  • Test your own? Ideally pick models with few CSG operations and a high facet count, that import large assets, or that use BOSL2 to process lots of geometry in script

ochafik added 30 commits May 19, 2023 23:01
Can give up to a 10% rendering speedup on models like this:

  $fn=1000;
  sphere();
  translate([0.5, 0, 0])
    sphere();
This allows calling getArray repeatedly during construction of the reindexer without incurring quadratic costs from array rebuilding
The IndexedFace intermediate stage wasn't needed (possible now that Reindexer's vector is usable during construction)
- use strtof instead of istringstream!
- skip checks for polygons w/ duplicate vertices (guaranteed not to happen after tessellation)
- cache tostring / fromstring results for each unique vertex
- normals are now... normalized (-0 -> 0)
@ochafik ochafik changed the title [Do not merge] Umbrella PR for non-CSG speedup initiatives Umbrella PR for non-CSG speedup initiatives [Do not merge] May 25, 2023
@ochafik ochafik changed the title Umbrella PR for non-CSG speedup initiatives [Do not merge] Integrate non-CSG speedup initiatives [Do not merge] May 25, 2023
@pca006132 (Member)

I've thought about this for some time; I think it is hard to get decent performance without some kind of large refactor, as AST traversal and stack-frame allocations are slow in general. We also don't have a JIT, which can speed up numerical code by a lot.

I wonder if it would be a good idea to look into translation into Lua, for example, which has a good FFI and a pretty fast JIT. V8 would probably be faster, but I'm not sure the FFI is as simple. By writing the translation rules, we could also fix some unclear semantics issues...

@ochafik (Contributor, Author) commented Nov 20, 2023

> I thought about this for some time, I think it is hard to get decent performance without doing some kind of large refactor, as AST traversal and allocations for stack frames are slow in general. We also don't have jit, which can speed up numerical code by a lot.

Other possibly tricky avenues of optimization:

  • Processing evaluations in parallel. This would require making the stack-frame code threadsafe, and finding the right async / TBB graph approach to split computations with low overhead; my first forays in that direction haven't been successful (the overhead has been disappointing so far), but I have a few more things to try there.
  • Speeding up linear algebra by using SIMD-optimized Eigen matrices instead of vectors of vectors of generic values. I've spent quite a bit of time on this (e.g. fast-linalg3), but haven't stabilized it to the point of proving any benefit yet.
  • Optimising / caching frame referencing (as discussed in this issue), which is tricky given OpenSCAD's current behaviour (where the same code may run at different frame depths between the first and subsequent iterations of a loop, for instance).

> I wonder if it will be a good idea to look into translation into lua for example, which has good FFI and a pretty fast jit. V8 will probably be faster, but not sure if the FFI is as simple. By writing the translation rules, we can also fix some unclear semantics issues...

Yess!!

Tbh I've been toying with the idea of transpiling OpenSCAD to heavily templated C++ (which in turn would use parallel constructs, Eigen matrices, etc.). Advanced static analysis with type propagation / static tracing could take us quite far.

Definitely worth looking at the semantics of Lua, JS, and other options that have few dependencies (and work well with emscripten). I've done a fair bit of V8 native interfacing (codegen); it has a solid API.

My main concern right now is that I don't fully understand OpenSCAD's semantics; I'll need a deep dive into the possibly undocumented fine print that enables some of the magic leveraged by libraries like BOSL2.

@pca006132 (Member)

There are also issues regarding caching, e.g. #782. I think I can find some time later to hack up a simple bytecode interpreter and see how far we can go with something naive.

@jordanbrown0 (Contributor)

It seems like it should be possible to fully resolve non-$ variable, function, and module references at parse time. (But I haven't done any analysis on how much that would save.)

@pca006132 (Member)

Yes, those are just static scoping. Typically you'd save the time spent computing hashes and creating hash tables. These are small gains, but they can add up if they're in the hot path (which they are...).

@jordanbrown0 (Contributor)

I don't immediately understand why you would need hashes or hash tables for them. It seems like you could just have pointers directly to the underlying Values, allocated as part of context blocks.

$ variables are entirely different, of course.

@pca006132 (Member)

Ah, poor wording. What I meant is that you'd save the time spent computing those hashes. You can use pointers to the underlying values, or keep a stack of values stored inline (avoiding a per-value allocation) and use indices into it directly.

3 participants