Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More precise & up to 2-5x faster STL output #4643

Closed
wants to merge 16 commits into from

Conversation

ochafik
Copy link
Contributor

@ochafik ochafik commented May 19, 2023

Fixes #4316 (and the precision part of #2651 )

sphere(10, $fn=1000); now renders 5.3x faster in ASCII STL and 2.5x faster in binary STL (tested on M1 mac).

Notes:

  • Using wicked-fast double-conversion's ToShortestSingle to format floats (as suggested by @thehans in the thread below), which also guarantees that parsing gives back the exact same float value.
    • As a result, ASCII STL output now has the same precision as the binary format (reading it is guaranteed to yield the exact same float coordinates)
    • Also means we no longer need to parse strings after formatting, for further speedups
  • Uses an indexed mesh:
    • Added a tesselate_faces overload that outputs the intermediate indexed mesh instead of wasting time building a PolySet.
    • For ASCII export, caching the results of each vertex's toString
  • Both overloads of tesselate_faces are also faster because we now skip the indexed faces stage (made possible by making the Reindexer's points vector grow incrementally / available for lookups w/o overhead)
  • Skipping expensive superfluous tests in STL output (tesselation gives us guarantee of no duplicate vertices in each poly)
  • Batch vector outputs in binary STL mode for extra speedups
  • Preallocation of polygon vectors in the PolySet append_poly / append_vertex flow yields up to 10% render speedups on its own and was merged independently in Avoid excessive reallocations in PolySet #4642 (this PR's branch has now been rebased)

return v;
auto cstr = vertexString.c_str();
return {
strtof(cstr, (char**)&cstr),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that work in locales having something else than '.' as decimal separator? The man page says it's using locale settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Luckily there's a a setlocale(LC_NUMERIC, "C"); in the parent of all append_stl functions that's meant to force '.' as decimal separator for both ways of parsing in this context (weirdly tho, I tried to force the locale to "fr_FR.UTF-8" there and the output didn't budge - I was expecting commas 🧐)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that setLocale has been shown to be incredibly slow on windows if you enable localization in preferences. #2744

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thehans oh, does that setlocale currently make ASCII STL export slower on Windows? (sounds like something exciting to follow up on, although this PR shouldn't make things worse in that respect)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how much it is specifically affecting STL output speed.
IIRC the test case there was particularly slow during CSG generation (sample code had tons of polyhedron data, thousands of vertex coordinates that require double to string conversion).

That was the impetus for adding double-conversion library, and so any time we print a number through our Value variant class it now goes through that. But I don't think any of the export code was ever touched as part of those changes.

There has been some discussion more recently about switching that out for libfmt aka {fmt} which is supposed to be even faster than double-conversion.
I spent quite some time a few months ago attempting to swap out the libraries but didn't quite get it to a fully working/presentable state. I might revisit it soon if I find the energy.

Copy link
Member

@thehans thehans May 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might also try casting double to float and calling ToShortestSingle

edit: I'm surprised it wasn't faster than the standard lib functions, since that's kinda the whole point of it. Are you testing on Release build (i.e. with optimizations)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re/ doubleconv it’s weird indeed, maybe I’ve misused it (tried same Params as the Value code, then more naive params), or the mac stdlib is faster than average (might try this benchmark if I have a chance), or… the way I naively stripped trailing zeros negated its performance? (Hadn’t seen ToShortestSingle, will try next)

Copy link
Member

@thehans thehans May 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From your gist trimTrailingZeros doesn't appear to actually get called anywhere.

Anyways, simply stripping zeroes from the end is not even remotely correct behavior, in the event that:

  1. There is no decimal point: "1000" becomes "1" (also "0" would become empty string)
  2. There is an exponent
    a) "1.000000e6" would not get trimmed at all
    b) "1.000000e10" would trim the exponent "1.000000e1" echo() formating error #2950

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about it, I feel that using ToShortestSingle would make the most sense here, and would allow for simplifying the code much further:

ToShortestSingle creates the shortest exact representation of the input value (a "round trip" guarantee). So no need to worry about trailing zeros (and no directly specifying any precision)

Additionally, all of the code going from double to string and back, was for the purpose of normal calculations matching up after reading in the decimal ASCII converted vertex positions (#1853).
If you downcast double to float and calculate normals from that, then there's no need to perform the round trip (string back to float) ourselves, because no further precision will be lost(post-downcast) by ToShortestSingle.

Copy link
Contributor Author

@ochafik ochafik May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, a %.9g format would have had the same property of compact format & round trip guarantee (and simpler code).

But double-conversion is indeed much faster (earlier naive attempt was costly in terms of builder usage, not to mention the bogus trailing zeros logic - note: was inlined / not in the unused helper I'd copied over), so I've switched toString to it, PTAL!

@ochafik ochafik changed the title Up to 2-3x faster STL output [WIP] Up to 2-3x faster STL output May 21, 2023
@ochafik ochafik changed the title [WIP] Up to 2-3x faster STL output [WIP] More precise & up to 2-5x faster STL output May 22, 2023
@ochafik ochafik changed the title [WIP] More precise & up to 2-5x faster STL output More precise & up to 2-5x faster STL output May 22, 2023
@ochafik ochafik marked this pull request as ready for review May 22, 2023 17:13
std::string toString(const Vector3d& v)
/* Define values for double-conversion library. */
#define DC_BUFFER_SIZE (128)
#define DC_FLAGS (double_conversion::DoubleToStringConverter::UNIQUE_ZERO | double_conversion::DoubleToStringConverter::EMIT_POSITIVE_EXPONENT_SIGN)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is basically copied verbatim from Value.cc, but I would remove the EMIT_POSITIVE_EXPONENT_SIGN flag here. I was planning to remove that one from the original code too. Also meant to convert all these to constexpr instead of defines.

See this snippet from my unmerged PR for the converted values. I had also tweaked a couple of other values, but I don't know if those are relevant to ToShortestSingle (or ToShortest) without reviewing the double-conversion "docs" (header file comments).

I basically halted work on that PR to start implementing libfmt, but that ended up being much more work than anticipated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, cool PR! I've dropped EMIT_POSITIVE_EXPONENT_SIGN and will keep the defines consistent-ish for now (unless you'd prefer I constexprify both places in this PR).

Re/ libfmt, once all compilers fully support C++20 (clang 14 doesn't seem there yet), would we get enough of its features through std::format?

vertex 0 1.2345679 1.2345679
endloop
endfacet
facet normal 0 0 -1.471716e-7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not a unit vector

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, have now given isZero a zero tolerance to fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it wasn't 😅 (probably still had a bit of fever)

@thehans
Copy link
Member

thehans commented May 24, 2023

I guess I'm still a bit confused about what exactly the ASCII STL tests are checking for, since various discrepancies are still passing.
Should we be doing a full textual diff of actual vs expected stl file output, to better ensure correct output?

@ochafik
Copy link
Contributor Author

ochafik commented May 27, 2023

@thehans I've restructured the test so that stl-import-export-limits.scad both imports and expects the expectations of stl-export-limits.scad (modified the cmake and python test to accept an explicit --expected-file arg), to minimize the changes needed and still benefit from TEST_GENERATE=1

@pca006132 pca006132 mentioned this pull request Nov 29, 2023
15 tasks
pca006132 added a commit to pca006132/openscad that referenced this pull request Dec 1, 2023
@pca006132 pca006132 mentioned this pull request Dec 1, 2023
kintel pushed a commit that referenced this pull request Dec 1, 2023
* build PolySet directly from manifold mesh
* use small_vector for faces
* faster stl export (#4643)
@pca006132
Copy link
Member

some optimizations are already implemented, remaining should be done with polyset cleanup.

@pca006132 pca006132 closed this Dec 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ASCII STL export is slow
4 participants