Remove Thrust #809

Open
elalish opened this issue May 11, 2024 · 19 comments
@elalish
Owner

elalish commented May 11, 2024

Thrust is now deprecated, and we've been wanting to move off it for a while anyway since we're no longer using CUDA. Thrust is turning into CCCL and getting more integrated with CUDA, so we don't need that. Thrust gave birth to PSTL, which is pretty widely supported now (C++17). PSTL appears to be backed by TBB and/or OpenMP in most compilers' standard libraries.

I think the big question is: do we switch to PSTL or TBB? What's your opinion, @pca006132? Related: #520

My impression is PSTL might be easier to switch to since the API shape is close to Thrust.

On the other hand, TBB is lower-level and so may perform better, and we already have a little TBB code.

@fire @kintel thoughts on what would be easiest to consume as far as dependencies from a downstream perspective?

@fire
Contributor

fire commented May 11, 2024

Background notes

Building TBB as a static library is not recommended and is only supported because Intel has a "bigiron" business requirement. https://github.com/jckarter/tbb/blob/master/build/big_iron.inc

Godot Engine doesn't use OpenMP because that requires an "MSVC redistributable". https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170

Edited:

As far as I know, OpenMP degrades nicely, but it's also different from C++17 https://stackoverflow.com/questions/67848884/c-compiler-support-for-stdexecution-parallel-stl-algorithms

@pca006132
Collaborator

I think PSTL is easier to switch to. It lacks some special APIs, but we can probably implement our own. The main issue here is compiler support, e.g. we need GCC 13 or libc++ to properly use it with onetbb.

Using TBB directly will require a lot of work. Some algorithms are not that easy to implement efficiently.

OpenMP is probably not an option. We tried that before and the performance was not that good, at least with the Thrust implementation.

@kintel
Contributor

kintel commented May 11, 2024

I'm thoroughly confused about PSTL and TBB, so I cannot really comment here... but if PSTL is part of C++17, that will get my vote. We already package TBB, so that should be easy to keep supporting. But searching around gives me the feeling that PSTL and TBB are not particularly compatible? https://community.intel.com/t5/Intel-oneAPI-Threading-Building/Is-PSTL-still-supported-by-TBB/m-p/1487798

@pca006132
Collaborator

@kintel They are compatible, but it depends on the versions... oneapi-src/oneTBB#332

Basically:

  1. The old version (before onetbb) seems to be compatible with every version of PSTL. But this is no longer maintained, and I think distros are moving towards onetbb?
  2. When libstdc++ is used, onetbb is compatible with the libstdc++ in GCC 13+. Note that even when you compile with clang, it links against libstdc++ by default.
  3. When libc++ is used (e.g. on Mac or on Linux using clang with some additional parameters), it is fine with onetbb.
  4. For Windows, I haven't checked.

Note that for 3, I only tested a relatively new LLVM version. I'm not sure which version is the oldest one supported. It probably requires a relatively new version (https://reviews.llvm.org/D141779). And it seems that the PSTL support in libc++ is quite incomplete (https://libcxx.llvm.org/Status/PSTL.html), but that may be about PSTL support with other backends?

@kintel
Contributor

kintel commented May 11, 2024

This all sounds like a bit of a nightmare if targeting Linux distro packaging though, but perhaps that shouldn't be driving design decisions too much..

@fire
Contributor

fire commented May 11, 2024

Here's a decision table for compiler support godotengine/godot#91833

@t-paul

t-paul commented May 11, 2024

gcc-13
Expanding a bit on the Linux topic: requiring gcc-13 will not be much of an issue for official distro packaging, as that essentially only goes forward. Classic distros hardly ever backport packages to already released distro versions, and rolling distros also move along with recent versions of applications and tools.

It becomes a huge issue for providing recent application versions, a.k.a. dev builds though:

  • AppImages are by design built on older distros. OpenSCAD currently uses Ubuntu 20.04, but even upgrading to 22.04 would only bring gcc-11 as the default compiler
  • People trying to build applications themselves on their not-so-recent installations. Unfortunately it's pretty common for people to be 2 or more LTS releases behind, which amounts to about 5 years or so

Long story short, if gcc-13 will be a requirement, that will kill almost all OpenSCAD dev builds we currently provide for older distributions and will make AppImages impossible for a couple of years.

c++17
Things are a bit more relaxed on that: even Ubuntu 20.04 has some support for C++17 features. It would be nice to delay moving to the 22.04 level a bit more, but that's not a showstopper in my opinion.

@pca006132
Collaborator

I'm curious what other libraries use. Surely we are not the first ones to hit this compatibility issue?

And yeah, I don't think we want to make gcc-13 a requirement. I don't think we can rely solely on PSTL for now.

@fire
Contributor

fire commented May 12, 2024

@pca006132 do you have a listing of all the thrust apis we use? It'll help us select another option.

@pca006132
Collaborator

I think the slightly trickier ones to implement are things like copy_if and remove_if, which require the final result to have the same ordering as the input.
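
To illustrate why ordering is the tricky part, here is a minimal sketch of one common way to keep it: count matches per chunk, turn the counts into output offsets with a sequential exclusive scan, then let each chunk copy into its own slot. The name `chunked_copy_if` and the use of plain `std::thread` are assumptions made for illustration only; this is not the Thrust, PSTL, or TBB implementation, and it is not code from this project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (illustration only): order-preserving copy_if.
// Assumes T is default-constructible and pred is pure, since pred is
// evaluated twice per element and from several threads.
template <typename T, typename Pred>
std::vector<T> chunked_copy_if(const std::vector<T>& in, Pred pred,
                               std::size_t num_chunks) {
  assert(num_chunks > 0);
  const std::size_t n = in.size();
  const std::size_t chunk = (n + num_chunks - 1) / num_chunks;

  // Index range [first, second) of chunk c, clamped to the input size.
  auto range = [&](std::size_t c) {
    const std::size_t b = std::min(n, c * chunk);
    const std::size_t e = std::min(n, b + chunk);
    return std::pair<std::size_t, std::size_t>(b, e);
  };

  // Runs fn(c) for every chunk on its own thread and waits for all of them.
  auto for_each_chunk = [&](auto fn) {
    std::vector<std::thread> workers;
    workers.reserve(num_chunks);
    for (std::size_t c = 0; c < num_chunks; ++c) workers.emplace_back(fn, c);
    for (auto& w : workers) w.join();
  };

  // Phase 1 (parallel): count the matches in each chunk.
  std::vector<std::size_t> counts(num_chunks, 0);
  for_each_chunk([&](std::size_t c) {
    const auto r = range(c);
    counts[c] = static_cast<std::size_t>(
        std::count_if(in.begin() + r.first, in.begin() + r.second, pred));
  });

  // Phase 2 (sequential): exclusive scan of the counts gives each chunk
  // the position where its first match belongs in the output.
  std::vector<std::size_t> offsets(num_chunks, 0);
  std::size_t total = 0;
  for (std::size_t c = 0; c < num_chunks; ++c) {
    offsets[c] = total;
    total += counts[c];
  }

  // Phase 3 (parallel): each chunk writes to a disjoint output range,
  // so the input order is preserved without any locking.
  std::vector<T> out(total);
  for_each_chunk([&](std::size_t c) {
    const auto r = range(c);
    std::copy_if(in.begin() + r.first, in.begin() + r.second,
                 out.begin() + offsets[c], pred);
  });
  return out;
}
```

The cost of keeping the order is visible here: the predicate runs twice and there is a sequential step in the middle, which is part of why these algorithms are harder to write efficiently by hand than a plain for_each.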

@elalish
Owner Author

elalish commented May 12, 2024

Regarding compatibility - my impression is PSTL and TBB are related a little like Thrust and CUB. They have slightly different APIs and TBB and CUB are slightly lower-level. But mostly: PSTL/Thrust is really just APIs, while TBB and CUB have actual parallel algorithm implementations. So I think TBB using PSTL was probably a bootstrap to get some OpenMP support before TBB was finished or something. Nowadays it seems we're in a PSTL calls TBB (or OpenMP) under the hood kind of situation, which is much how we currently use Thrust.

@fire
Contributor

fire commented May 12, 2024

Gathered by chatgpt4 from par.h

THRUST_DYNAMIC_BACKEND(copy_if, void)
THRUST_DYNAMIC_BACKEND_VOID(exclusive_scan)
THRUST_DYNAMIC_BACKEND_VOID(for_each)
THRUST_DYNAMIC_BACKEND_VOID(for_each_n)
THRUST_DYNAMIC_BACKEND(gather_if, void)
THRUST_DYNAMIC_BACKEND_VOID(gather)
THRUST_DYNAMIC_BACKEND(reduce_by_key, void)
THRUST_DYNAMIC_BACKEND_VOID(scatter)
THRUST_DYNAMIC_BACKEND_VOID(sequence)
THRUST_DYNAMIC_BACKEND(transform_reduce, void)

STL_DYNAMIC_BACKEND(all_of, bool)
STL_DYNAMIC_BACKEND(count_if, int)
STL_DYNAMIC_BACKEND_VOID(copy)
STL_DYNAMIC_BACKEND_VOID(copy_n)
STL_DYNAMIC_BACKEND(find_if, void)
STL_DYNAMIC_BACKEND(find, void)
STL_DYNAMIC_BACKEND(fill, void)
STL_DYNAMIC_BACKEND(inclusive_scan, void)
STL_DYNAMIC_BACKEND(is_sorted, bool)
STL_DYNAMIC_BACKEND(remove_if, void)
STL_DYNAMIC_BACKEND(remove, void)
STL_DYNAMIC_BACKEND(reduce, void)
STL_DYNAMIC_BACKEND_VOID(stable_sort)
STL_DYNAMIC_BACKEND_VOID(transform)
STL_DYNAMIC_BACKEND_VOID(uninitialized_copy)
STL_DYNAMIC_BACKEND_VOID(uninitialized_fill)

Feel free to edit my list.
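
For the Thrust-only entries in the list, a rough sketch of how a few of them could be expressed with plain standard algorithms follows. The `demo_*` names are hypothetical and the versions shown are sequential; they only illustrate the semantics (sequence fills 0, 1, 2, ...; gather reads through an index map; scatter writes through one), not how this project would actually implement them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// sequence: fill with 0, 1, 2, ... (hypothetical stand-in).
inline void demo_sequence(std::vector<int>& v) {
  std::iota(v.begin(), v.end(), 0);
}

// gather: out[i] = in[map[i]] (hypothetical stand-in).
template <typename T>
std::vector<T> demo_gather(const std::vector<std::size_t>& map,
                           const std::vector<T>& in) {
  std::vector<T> out(map.size());
  std::transform(map.begin(), map.end(), out.begin(),
                 [&](std::size_t i) { return in[i]; });
  return out;
}

// scatter: out[map[i]] = in[i] (hypothetical stand-in). The indices in
// map would have to be unique before this loop could run in parallel.
template <typename T>
void demo_scatter(const std::vector<T>& in,
                  const std::vector<std::size_t>& map, std::vector<T>& out) {
  assert(map.size() == in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[map[i]] = in[i];
}
```

Because gather is just a transform over the index map, it would pick up whatever execution policy transform gets; scatter is a for_each over indices with the uniqueness caveat noted in the comment.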

@elalish
Owner Author

elalish commented May 12, 2024

I think regarding old compilers that have bugs or lack support for certain PSTL algorithms, we should just let those fall back to single-threaded. Then it should work everywhere, but it'll be fastest on the latest platforms. That feels like a reasonable compromise regarding maintainability and compatibility. I don't think we can afford to optimize performance heavily for every old platform.
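
One way such a fallback could look is a compile-time switch that only passes an execution policy when the toolchain is known to be good. This is a sketch under assumptions: `DEMO_PAR`, `DEMO_POLICY`, and `demo_double_all` are made-up names, and when `DEMO_PAR` is set to 1 the build would also need to link whatever backend the standard library uses (for example TBB with libstdc++).

```cpp
#include <algorithm>
#include <vector>

// DEMO_PAR is a hypothetical build flag: set it to 1 only on toolchains
// whose parallel algorithms are known to work. It defaults to serial.
#ifndef DEMO_PAR
#define DEMO_PAR 0
#endif

#if DEMO_PAR
#include <execution>
// Expands to the policy argument plus the trailing comma.
#define DEMO_POLICY std::execution::par,
#else
// Expands to nothing, selecting the ordinary sequential overload.
#define DEMO_POLICY
#endif

// Doubles every element: parallel when DEMO_PAR is 1, serial otherwise.
inline void demo_double_all(std::vector<int>& v) {
  std::for_each(DEMO_POLICY v.begin(), v.end(), [](int& x) { x *= 2; });
}
```

With this shape the call sites stay identical on every platform, and old compilers never even include `<execution>`.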

@elalish
Owner Author

elalish commented May 12, 2024

Besides, I feel like on average we only get ~2x speedup for parallel over single-threaded anyway. CPU pipelining is pretty good when your algorithms are parallelized!

@pca006132
Collaborator

I think there can be 4x speedup, and probably more if we can optimize mesh simplification better.

The major issue with old vs. new platforms is that people like to have a single binary, e.g. the AppImage for OpenSCAD, and that means they would need to use the single-threaded version for several years.

@pca006132
Collaborator

@elalish btw, by "So I think TBB using PSTL was probably a bootstrap to get some OpenMP support before TBB was finished or something." do you mean "So I think PSTL using TBB was probably a bootstrap to get some OpenMP support before PSTL was finished or something"? TBB does not depend on PSTL.
Also, I don't get the impression that PSTL wants to get rid of TBB later.

@elalish
Owner Author

elalish commented May 12, 2024

Maybe I misunderstood what you were saying earlier. Either way it's confusing enough we should probably chat about it sometime face-to-face.

@pca006132
Collaborator

For the record, our current goal is to get rid of Thrust and use PSTL for parallelization. Users with GCC 12 or older will hit #787. They can either disable multicore (it is slower, but typically not that slow) or accept the leak. Considering that other users, e.g. OpenSCAD, have not reported the leak causing an issue, this should be acceptable.

And if needed, we can have an intermediate option, where we use TBB for_each directly but no PSTL algorithms. This will be slower than using all the parallel APIs, but the user can still get some multicore performance improvement without having to live with the memory leak.

elalish added this to the v3.0 milestone Jun 16, 2024