Switch to autodiff? #65
In implementing teqp (https://github.com/usnistgov/teqp), I found that automatic differentiation was MUCH faster than complex-step derivatives and multicomplex numbers. As you wrote the library in Rust, have you benchmarked your hyper-dual derivatives against autodiff? I used this package in C++: https://github.com/autodiff/autodiff. I think you could also use this in Rust, as Rust is C++-derived?
Thank you for your insights! We use the num-dual package (https://github.com/itt-ustutt/num-dual), which uses forward-mode automatic differentiation to calculate first- and higher-order derivatives, similar to what autodiff does. While Rust fills a somewhat similar niche to C++ and uses a comparable syntax, there is no further connection: Rust was bootstrapped from OCaml and compiles via LLVM. External C or C++ libraries can be linked, but in our case we would lose a lot of flexibility. Have you specifically compared the performance of forward- and reverse-mode AD? The Helmholtz energy, as a single-output function, appears to be the ideal case for reverse mode, but I always wonder whether the overhead of maintaining an expression tree (as autodiff calls it) is worth it.
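For illustration, a minimal sketch of forward-mode AD with num-dual, following the style of the crate's README (the function being differentiated is a made-up example, not FeOs code):

```rust
use num_dual::*;

fn main() {
    // x as a dual number with the derivative part seeded to 1.
    let x = Dual64::from(2.0).derivative();
    // Evaluate f(x) = x^2 * exp(x); the derivative comes along for free.
    let f = x.powi(2) * x.exp();
    println!("f(2)  = {}", f.re);  // value
    println!("f'(2) = {}", f.eps); // (2x + x^2) e^x at x = 2
}
```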
I did some tests on that in Julia (https://github.com/ypaul21/Clapeyron.jl), and at least there it is not worth it for anything other than the chemical potential; and even then, forward mode has less overhead in that regard. But then again, I haven't checked recently: what AD mode from autodiff does teqp use?
So, autodiff uses a code-rewriting forward-mode AD, whereas num-dual uses an operator-overloading type of AD. This also implies that for autodiff to work, it requires full access to a C++ codebase. For Rust, there is Enzyme, which performs code rewriting at the LLVM level and has Rust bindings.
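To make the operator-overloading approach concrete, here is a deliberately minimal toy dual number in Rust (a hypothetical illustration, not num-dual's actual implementation):

```rust
use std::ops::{Add, Mul};

// Toy first-order dual number: re carries the value, eps the derivative.
#[derive(Clone, Copy, Debug)]
struct Dual {
    re: f64,
    eps: f64,
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, rhs: Dual) -> Dual {
        Dual { re: self.re + rhs.re, eps: self.eps + rhs.eps }
    }
}

impl Mul for Dual {
    type Output = Dual;
    // Product rule: (u v)' = u' v + u v'
    fn mul(self, rhs: Dual) -> Dual {
        Dual {
            re: self.re * rhs.re,
            eps: self.eps * rhs.re + self.re * rhs.eps,
        }
    }
}

fn main() {
    // Differentiate f(x) = x * x + x at x = 3 by seeding eps = 1.
    let x = Dual { re: 3.0, eps: 1.0 };
    let f = x * x + x;
    println!("f(3) = {}, f'(3) = {}", f.re, f.eps); // 12, 7
}
```

Every arithmetic operation on `Dual` propagates the derivative alongside the value, which is all that operator-overloading AD does; no access to or rewriting of the source code is required.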
@prehner I guess I misunderstood. Didn't you previously use some complex-math derivatives in FeOs? I recall citing something along those lines in our paper about teqp. More generally, I would be curious to see a speed comparison between FeOs and teqp. I think the derivatives in Clapeyron were a fair bit slower than in C++, which is no great surprise; at least that was the case when @pw0908 and I looked at it. In my experience, forward mode (what I use) seems to be the "right" answer.
Then there is also this: https://github.com/feos-org/feos/blob/main/examples/core_dual_numbers.ipynb
@longemen3000 good to know that forward AD worked well for you; that suggests we're on a good path. A comment on the operator overloading in Rust: Rust uses compile-time polymorphism, i.e., the different variants (floats, first derivatives, higher derivatives, ...) are all compiled separately, similar to templates in C++. Therefore, the generated code can leverage all low-level compiler optimizations and becomes very efficient.
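A sketch of that compile-time polymorphism, assuming num-dual's `DualNum` trait (the residual function here is made up):

```rust
use num_dual::*;

// One generic definition; the compiler monomorphizes separate, fully
// optimized versions for f64, Dual64, HyperDual64, ... at compile time.
fn a_residual<D: DualNum<f64>>(density: D) -> D {
    density.powi(2) - density.ln()
}

fn main() {
    // Compiled as a plain f64 function: no run-time overhead.
    let a = a_residual(0.8_f64);
    // Compiled as a Dual64 function: value and derivative in one call.
    let da = a_residual(Dual64::from(0.8).derivative());
    println!("a = {}, a = {}, da/drho = {}", a, da.re, da.eps);
}
```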
Currently we do multiple evaluations for multiple derivatives (e.g. chemical potentials) because it is more efficient than having dynamically sized types and heap allocations in every operation. In general, however, it is also possible to calculate gradients in a single pass using forward mode AD.
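To illustrate the single-pass alternative: a toy dual number whose derivative part is an array carries all partial derivatives through one evaluation (a hypothetical sketch, not how FeOs currently does it):

```rust
use std::ops::{Add, Mul};

// Toy dual number with an N-dimensional derivative part: one evaluation
// of f yields the value and the full gradient.
#[derive(Clone, Copy, Debug)]
struct DualN<const N: usize> {
    re: f64,
    eps: [f64; N],
}

impl<const N: usize> Add for DualN<N> {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let mut eps = self.eps;
        for i in 0..N {
            eps[i] += rhs.eps[i];
        }
        Self { re: self.re + rhs.re, eps }
    }
}

impl<const N: usize> Mul for DualN<N> {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        // Product rule, applied component-wise to the derivative part.
        let mut eps = [0.0; N];
        for i in 0..N {
            eps[i] = self.eps[i] * rhs.re + self.re * rhs.eps[i];
        }
        Self { re: self.re * rhs.re, eps }
    }
}

fn main() {
    // f(x, y) = x * y + x; the gradient is computed in a single pass.
    let x = DualN::<2> { re: 3.0, eps: [1.0, 0.0] };
    let y = DualN::<2> { re: 4.0, eps: [0.0, 1.0] };
    let f = x * y + x;
    println!("f = {}, grad = {:?}", f.re, f.eps); // 15, [5.0, 3.0]
}
```

The trade-off mentioned above is visible here: for a dynamically sized gradient the array would become a heap-allocated vector, and every operation would pay for that allocation.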
This example indeed shows how (generalized) dual numbers are used in FeOs. The notation and the algebra are similar to complex numbers, but we never used (multi)complex numbers in FeOs. Note that in that example the EOS is defined in Python, which is useful for demonstration and prototyping but certainly not optimal for performance. The derivatives, however, are calculated using the same dual-number implementation as for an EOS implemented in Rust. A comparison between FeOs and teqp is certainly interesting. I would expect comparable performance, but we should certainly confirm that.
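The relationship to complex numbers can be made concrete: the complex-step method approximates f'(x) as Im f(x + ih)/h because i² = -1, while a dual number gives the derivative exactly because ε² = 0 by construction. A sketch using the num-complex and num-dual crates:

```rust
use num_complex::Complex64;
use num_dual::*;

fn main() {
    let x = 1.5_f64; // differentiate f(x) = x * sin(x)

    // Complex-step: f'(x) ≈ Im(f(x + ih)) / h for tiny h.
    let h = 1e-100;
    let z = Complex64::new(x, h);
    let fz = z * z.sin();
    println!("complex step: {}", fz.im / h);

    // Dual number: f'(x) = sin(x) + x cos(x), exact to machine precision.
    let d = Dual64::from(x).derivative();
    let fd = d * d.sin();
    println!("dual number:  {}", fd.eps);
}
```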
Hi Pierre, this is unfortunate and I am not sure where this confusion stems from. We thought it was clearly presented in our readme and the documentation that we use generalized (hyper-)dual numbers via num-dual.

Regarding the benchmarks: if I recall correctly, these are the same results you showed in your presentation in June this year? As you mentioned, a fair comparison/benchmark between packages is very difficult because there are differences due to the programming languages, the algorithms (and their implementations), and the starting values used. I did some comparisons between Clapeyron and FeOs (because I was intrigued by this table) and got these results:
Note, though, that my experience with Julia is limited, so I might have done something wrong or non-optimal. The results also depend on the thermodynamic conditions I used. I also played around with the parameters for water (I created a new substance with modified parameters). As stated before, however, I am not sure how helpful these benchmarks actually are. All the work on equations of state and making them publicly available is exciting, and I really appreciate the discussions. As far as comparisons or benchmarks are concerned, I am personally somewhat reluctant because they are difficult to do in a fair manner, and it is very difficult to extract insights for projects as complex as ours.
I agree with the assessment that we need to be careful with benchmarks. Nevertheless, I did another one comparing FeOs and teqp. To assess the performance of the AD rather than the algorithms, I specifically looked at the calculation of properties and not phase equilibria. This is for the three-component example that is used in the teqp introduction.
So the speed is comparable, with teqp winning out in general. For these extremely fast calculations, though, the overhead from the Python calls can be meaningful.
That's about what I would have expected; good check. Call overhead should be roughly 1 µs I think, at least it's about that in my testing; it is going to depend on precisely what calculations you do.
I did, I think, the same test in C++ with teqp, and I get:
I think I need to revisit how the array passing is done in the C++ interface; there are opportunities to avoid copies and bring the Python and C++ values closer together.
I added some tests to the Jupyter notebook of the docs: https://teqp.readthedocs.io/en/latest/models/PCSAFT.html#PC-SAFT
We conclude that the compiled languages produce comparably efficient code, with differences arising mainly from usability aspects (caching, contributions, units, Python). As the original topic of this issue arose from a misunderstanding about the nature of dual numbers and automatic differentiation, I'm closing this issue.
For my curiosity, how long do these calls take in Rust?
I can push benchmarks if you want to run them on your machine and compare; it is difficult to compare numbers across different machines. A call to a property in our code involves several steps (given a thermodynamic state, i.e. a `State` object):
The Helmholtz energy itself contains loops over trait objects (dynamic dispatch) to evaluate the contributions. That's what the code does when a method is called on a `State`. Taking Python as the baseline (e.g. a property call through the Python interface):
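The loop over trait objects might look like the following hypothetical sketch (the trait and struct names are made up, not FeOs's actual types):

```rust
// Each contribution to the Helmholtz energy implements a common trait.
trait HelmholtzContribution {
    fn helmholtz_energy(&self, temperature: f64, density: f64) -> f64;
}

struct IdealGas;
struct HardSphere;

impl HelmholtzContribution for IdealGas {
    fn helmholtz_energy(&self, t: f64, rho: f64) -> f64 {
        rho * (rho.ln() - 1.0) * t // placeholder expression
    }
}

impl HelmholtzContribution for HardSphere {
    fn helmholtz_energy(&self, t: f64, rho: f64) -> f64 {
        t * rho * rho // placeholder expression
    }
}

fn total_energy(contributions: &[Box<dyn HelmholtzContribution>], t: f64, rho: f64) -> f64 {
    // Each call goes through a vtable (dynamic dispatch).
    contributions.iter().map(|c| c.helmholtz_energy(t, rho)).sum()
}

fn main() {
    let contributions: Vec<Box<dyn HelmholtzContribution>> =
        vec![Box::new(IdealGas), Box::new(HardSphere)];
    println!("{}", total_energy(&contributions, 300.0, 0.01));
}
```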
Regarding the actual issue raised: I compared the ratio of the time for the Helmholtz energy evaluation to the time for its derivative evaluations. If we attribute the difference to the implementation of dual numbers alone, autodiff is approximately 5 to 10% faster (I expected worse, to be honest, so I think our implementation is quite good here).
Thanks for raising the issue. It's an interesting topic to look into.
Thanks for digging into this. I spent a lot of time thinking about optimizing these calls since they are at the heart of everything. I agree that 10-15% worse in speed is pretty impressive, and you should be proud of that. My experience was that even complex-step derivatives were much slower than autodiff, and in any case they are only useful for first derivatives. Our implementation of multicomplex numbers doesn't scale well at all, so the only option remaining was autodiff. But I didn't have your hyper-dual implementation to compare against; that would have been very interesting. There are some rather severe downsides to autodiff, especially that the compile time is long and the amount of memory needed to compile the library is quite crazy (12 GB!). So if we can find a method that is only 10% slower but reduces the compile time and memory use, I'm interested. I noticed your wheels are pretty huge (at least compared with teqp). Anything you can do to reduce that? Not super crucial, though, as computers have lots of memory nowadays; it's just a pain if you need to package them into larger programs, as I have learned the hard way.
The wheel size can be reduced with a compiler flag, i.e. by enabling link-time optimization (LTO) in the release profile. Just rerunning my benchmarks showed a performance improvement of about 50% across all functions that use dual numbers (the Helmholtz energy itself was about 20% faster). The ratios of derivative vs. residual Helmholtz energy evaluations decreased drastically. I think we can let this issue rest now, as the actual topic is resolved. I opened #77 to track the implementation of proper benchmarks that should make it easier to do these kinds of investigations in the future.
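For reference, link-time optimization is enabled through the release profile in Cargo.toml (standard Cargo configuration, not FeOs-specific):

```toml
# Cargo.toml: enable link-time optimization for release builds.
[profile.release]
lto = true          # or "thin" for a faster, slightly weaker variant
codegen-units = 1   # optional: more optimization opportunities, slower compile
```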
That's great news; LTO is key in C++ too for reducing wheel sizes. It is surprising (in a nice way) that the speed also increased so much, but if you were moving across shared-library boundaries, I could imagine that might prevent further optimizations. It would be best to build a Docker container for benchmarking so we can run everything with the same setup. I can start that on my side.
FYI, the current main branch contains benchmarks. You can find an overview and information about how to run the benchmarks here.
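As an illustration of how such a benchmark is typically set up in Rust (assuming the criterion crate; the benchmarked function is a stand-in, not the actual FeOs benchmark):

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Stand-in for a property evaluation; a real benchmark would call into
// the equation of state instead.
fn dummy_property(density: f64) -> f64 {
    density * density - density.ln()
}

fn bench_property(c: &mut Criterion) {
    // black_box prevents the compiler from constant-folding the input.
    c.bench_function("dummy_property", |b| {
        b.iter(|| dummy_property(black_box(0.8)))
    });
}

criterion_group!(benches, bench_property);
criterion_main!(benches);
```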
Cool, that's great; I'll have a go. Do you have a Docker image for the benchmarks, by any chance? If not, I can put one together, as I'll be testing in a Docker container. Maybe this is not the right place for the question, but do you have any interest in releasing a port of your dual library for C++? I would like to have a go with that and see if it can really compete with autodiff. If so, I'll move to it, because autodiff makes compilation very slow due to all the deeply nested template evaluations.
We have very limited experience with Docker, so no, we don't have an image. I am not sure a port of our library to C++ would be useful. In Rust, we are able to define functions that are generic over the dual number types because we defined and implemented a trait (shared behavior, or an interface, in Rust terms). As Philipp mentioned above, the compiler creates a type-specific version of each function at compile time, so there is no run-time cost (other than the more complex arithmetic operations) to using dual numbers. That's what makes it very efficient. I have no experience with templates in C++, but I don't think our implementation would have an advantage over dual numbers defined in C++ libraries such as cppduals.
OK, I'll look into a Docker image on my side. That sounds very similar to how autodiff works in C++ with templates, and I think the computational penalty would therefore be roughly similar in an apples-to-apples comparison. I wonder if both the Rust and C++ approaches would compile to the same assembly for a simple derivative (of x+1 with respect to x, for instance). cppduals is new to me; thanks very much for making me aware of it!