What would be the main design trade-offs when re-implementing in clean modern C++? #354
Comments
This is a very ill-defined question, because people's ideas of what constitutes "clean" and "modern" differ. Personally, I wouldn't consider an abstraction that leads to a noticeable performance impact clean; in that case, you're using the wrong tool for the job. That still leaves a lot of things you can do with C++ that are "cleaner" than their C counterparts. Btw, there isn't really any relation between whether code is C or C++ and how close it is to "bare metal"; e.g., cutlass implements fast matrix multiplication on GPUs with a lot of template magic that often ends in inline assembly.
Hi @ngc92, thanks for your comment.
Hi Andrej, this implementation is fantastic!
In your view, what would be the main design trade-offs if one were to re-implement the C code that is intended to run on the CPU in modern C++? By modern C++, I generally mean code that favors `std` containers over raw C-style arrays, favors `std` ranges and similar tools over raw loops, etc. Obviously, doing so would deviate somewhat from the minimal as-close-to-the-metal-as-possible design philosophy of this repository. And obviously, some kinds of code stand to benefit from this kind of transformation more than others, so maybe there isn't much benefit in this case. On the other hand, modern C++ may result in fewer lines of code without sacrificing readability (or performance), which also appears to be one of the design goals of this repo.
With all of that being said, how much speed do you think one would gain/lose from a clean modern C++ implementation, and how much simplicity/safety/flexibility do you think you'd gain/lose? What about re-implementing the raw CUDA kernels as `thrust` kernels?
Cheers,
Mike