Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler compatibility/std::vector/performance testing. #32

Open
objarni opened this issue May 30, 2022 · 16 comments
Open

Compiler compatibility/std::vector/performance testing. #32

objarni opened this issue May 30, 2022 · 16 comments
Labels
enhancement New feature or request. medium priority This should be made working as soon as time permits.

Comments

@objarni
Copy link
Collaborator

objarni commented May 30, 2022

I'm having trouble running make on my Mac machine, and after squashing a lot of small issues, I started reading on the va_args behaviour/specification in the standard, and while googling found this interesting article:

https://www.linkedin.com/pulse/modern-c-variadic-functions-how-shoot-yourself-foot-avoid-zinin/

(The actual build error was squashed by adding -Wno-non-pod-varargs to CFLAGS in makefile, but it feels wrong to sweep a semantic issue/concern like this under the carpet)

Is this function essential to the code base? I notice there are at least 1 more version of it?

BTW - My "long-term" intention with getting up regression/automatic tests is to enable refactoring to std::vector instead of so much pointers/arrays, i.e. using modern C++ constructs instead (it was one of the first things I read about podock, and felt I could contribute to). It would mean passing a variable number of objects/values to a function would be much simplified - just pass a (reference) to a vector object - so while I like va_args it seems a bit over-the-top.

@electronicsbyjulie
Copy link
Contributor

electronicsbyjulie commented May 30, 2022

I'll remove va_args from the code. Fortunately, it is only being used in one section of code (lines 216 and 232 of main() in metal.cpp) so it will be easy to live without. I intend to keep podock's dependencies as nonspecific as possible so that it can be compiled on the widest range of systems and platforms, and it sounds like va_args might not be good for that.

@electronicsbyjulie
Copy link
Contributor

Since automated tests do not have to run on my web host, it is certainly okay to set them up, with CMake and everything, in some way disconnected from the makefile. Not sure if you'd want to do that before converting the code to std::vector, but...

I looked at std::vector a while back and it looks promising, I just couldn't figure it out. If it will solve the problems that come from using arrays of pointers, it'd certainly be great to have valgrind not complain about mismatched deletes anymore. :D It would be best to do the std::vector in a separate branch so that the main branch stays a stable version, then once podock itself and the various tests check out okay in the std::vector branch, it can be merged.

By the way, podock itself intentionally has randomness in it, so that could interfere with using text diff tests on it. Mathematically the randomness allows better access to a very large set of possible molecular conformations that would take a very long time to compute thoroughly, so by randomizing the attempted conformations, each run is less likely to fail to find good configurations, and if a run doesn't produce any good output, it can be repeated for another try. I think the other dockers do this as well.

It's easy to see visually whether podock is working, but programmatically would be more of a challenge. There would ideally be a module that makes sure the ligand is mostly within the binding pocket, that its parts are near features of the protein that they would bind to, and that the breakdown of energies correlates to these binding features and the numbers add up to the total. It would be a pass/fail based on each metric falling within a defined threshold.

Good catch on the va_args!

electronicsbyjulie pushed a commit that referenced this issue May 30, 2022
@objarni
Copy link
Collaborator Author

objarni commented Jun 1, 2022

Thanks for thourough answers!

I gave up on compiling on MacBook Air M1 - was too many errors for now. I did spot some odd things though - I believe it's clang that is more picky than g++ (which I'm guessing you're using?) - and I might fix them even if it compiles with g++. E.g. initializing arrays with ={}, a construct which was new to me. I don't think it's legal/standard C++ but rather some g++ wizardry.

So I'm back at my Ubuntu machine, which is g++.

W.r.t vector, yes it's possible to pass around pointers, but even better is using shared_ptr which makes for a simple 'reference counting' system, with high performance. No more new/delete/alloc/free calls in the code base, so valgrind should be quiet after that!

While I agree we need a stable version (preferrably at all times), I'm not so keen on doing a separate branch, since it would have to be merged/rebased continously as it's a large refactoring.

I'd much rather make small, iterative changes to the code, switching one array for vector at a time.

But before those small-refactoring-steps can be made, I want to have:

  1. automatic tests that I trust (the diff regressions / reports we have now might not be enough; in fact what you've written above about the metrics sounds much more robust! Visual testing is harder, but doable - e.g rendering to ASCII art)
  2. code coverage close to 100%, to increase trust

The code coverage part is dead simple with CMake based project in CLion (my IDE of choice for C/C++ code bases), so that would be one way to do it. But it is still possible with simple make and gcov. So I see it as a bit of a challenge/fun to get coverage without adding CMake. :)

It is good to know, however, that adding a CMakeLists.txt to the repo wouldn't interrupt your web host / deployment environment! In case I give up ;)

@objarni
Copy link
Collaborator Author

objarni commented Jun 1, 2022

Oh, the ={} syntax is actually valid C++ since C23:

int a[3] = {}; // valid C++ way to zero-out a block-scope array; valid in C since C23

Found here: https://en.cppreference.com/w/c/language/array_initialization

Pardon my ignorance!

@electronicsbyjulie
Copy link
Contributor

C23... interesting. I'm not sure what to do. In the interest of compatibility it probably should not depend on the C23 standard since the makefile specifies C++11.

It's definitely not good that the app won't compile on a Mac. :(

Would it work to use vectors of shared_ptrs? Since the code currently uses types like Atom** and AminoAcid***.

@objarni
Copy link
Collaborator Author

objarni commented Jun 1, 2022

std::vector<std::shared_ptr<Point>> is a valid type in C++, yes! And for some small structs/classes, like Point, it makes sense to simplify to std::vector<Point>, since the copying is cheaper than allocation/deallocation of Points (which still happens under the surface, it's just that new/delete/malloc/free isn't visible in the code base anymore). BUT: I would wait with making assumptions until I can measure the performance of a big simulation. Then, with the data from real perf.measures, try both and see.

@electronicsbyjulie
Copy link
Contributor

electronicsbyjulie commented Jun 1, 2022

A docking run would be a good simulation to use for measuring performance. Here's an archive file containing a PDB, an SDF, and a podock config file to run a docker test:

dock_sample.tar.gz

After extracting the contents to the podock folder, you can use the command ./podock test_TAAR8.config to run a calculation that should take somewhere between 5-30 minutes and should report the total run time at the end. (If it takes too long, you can decrease the "POSE" value in the .config file.) It also creates an output TAAR8_test.txt file that can be opened in the 3D viewer.

If we change the code to use std::vector, would it make sense to rename the Vector class? It's really a spherical coordinate construct, so maybe it could be something like SphericalVector or abbreviated to SphVec?

@objarni
Copy link
Collaborator Author

objarni commented Jun 1, 2022

On the C23 vs C++11 regard, I think at least C++14 could be "assumed" nowadays. It is 2022 after all - 11 years since C++11 was released. But I'm fine with focusing on C++11 - it's a large leap compared to C++03!

@objarni
Copy link
Collaborator Author

objarni commented Jun 1, 2022

I think it makes sense to rename Vector to something else, yes, and would suggest something similar to what wikipedia and your note above says - SphericalCoordinate or perhaps SCoord? I'm not a big fan of abbreviations in general, but for really basic/common types I'm OK with it...

@objarni
Copy link
Collaborator Author

objarni commented Jun 1, 2022

I've managed to run the simulation, however don't know how to use the visualization web page. It would be fun to see result of simulation!

It takes about 69 seconds on my machine, with pose = 1. This is great material for perf.measure, not too long, but not too short either!

@objarni
Copy link
Collaborator Author

objarni commented Jun 1, 2022

I added a testdata/ folder with the content of the archive, and added a performance test target to run it. Since the simulation outputs number of seconds, I think this is well enough to get started optimizing!

However, still stuck with the code-coverage issue; will hack on gcov solution for awhile, and if it is too much effort will switch to CMakeLists.txt based approach.

@electronicsbyjulie
Copy link
Contributor

C++14 sounds good, I suppose it would be highly unusual to run anything as CPU intensive as a molecular docker on a system older than 5-10 years.

SCoord works for me. Even though it erases the distinction that it refers to relative coordinates while a Point refers to absolute coordinates, however sometimes the code uses a Point as a relative coordinate so I guess the distinction is not so important. The LocatedVector class can also be renamed, maybe it could be either LSCoord or similar.

The viewer.htm page can be opened in any browser directly as a local file (e.g. file:///home/path/to/podock/viewer.htm) or it is also available on the website at http://primaryodors.org/viewer.htm. There is a file control at top left, the kind used for an uploader although nothing is actually uploaded anywhere. It lets you select a local file which the page will then render in 3D.

Great to hear about optimization! I know podock is not as fast as AutoDock or some of the more streamlined dockers, e.g. LeDock. Any speed improvement would be super awesome and much appreciated!

@electronicsbyjulie electronicsbyjulie changed the title Concerns w.r.t va_args version of average_of_points Compiler compatibility/std::vector/performance testing. Jun 1, 2022
@electronicsbyjulie electronicsbyjulie added enhancement New feature or request. medium priority This should be made working as soon as time permits. labels Jun 1, 2022
@electronicsbyjulie
Copy link
Contributor

Hope it's okay I renamed the issue.

@objarni
Copy link
Collaborator Author

objarni commented Jun 1, 2022

Cool! 👯‍♂️

image

@objarni
Copy link
Collaborator Author

objarni commented Jun 1, 2022

As per above discussion, switched to C++14. Also, ignoring gcov output files.

I hope it's OK to 'push commits' like this? I have a habit of doing rebase very often, so I don't need the extra hassle of pull-requests...

@electronicsbyjulie
Copy link
Contributor

Sure! Long as nothing breaks. I also frequently push commits directly and, to be honest, sometimes I do break something, so it's not much of a worry. 😄

I just pulled the latest and rebuilt it from clean. It looks really good. There used to be a lot of compiler warnings and now there's just a few, it's great! Thank you for solving those!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request. medium priority This should be made working as soon as time permits.
Projects
None yet
Development

No branches or pull requests

2 participants