Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MolSSI Driver Interface Support #2611

Merged
merged 44 commits into from May 14, 2021
Merged

Add MolSSI Driver Interface Support #2611

merged 44 commits into from May 14, 2021

Conversation

taylor-a-barnes
Copy link
Contributor

@taylor-a-barnes taylor-a-barnes commented Feb 19, 2021

Summary

This PR adds support for the MolSSI Driver Interface (MDI) to LAMMPS. MDI is an effort of The MolSSI to improve and standardize the process of interoperating codes within the computational molecular sciences domain.

In particular, MDI supports interoperability within a driver-engine paradigm, in which a driver code controls the high-level operations of one or more engine codes, orchestrating complex calculations like QM/MM or advanced sampling. Some of the low-level functionality of MDI is analogous to (and partially inspired by) the CSlib and its implementation within LAMMPS. Building off of this and other efforts, MDI seeks to establish a standardized API for tight integration between chemistry codes.

This PR enables LAMMPS to act as an MDI engine, responding to commands from external drivers for a large number of potential use cases, some of which were explored in a recently published MDI paper. Additionally, it is planned that this PR will include support for LAMMPS to run ab initio MD simulations as an MDI driver.

This contribution is still in a state of development, and is not yet ready to be considered for inclusion in LAMMPS. Nonetheless, comments and observations about the implementation approach used here are very much welcome.

Related Issue(s)

None.

Author(s)

Taylor Barnes - taylor.a.barnes@gmail.com

Licensing

By submitting this pull request, I agree, that my contribution will be included in LAMMPS and redistributed under either the GNU General Public License version 2 (GPL v2) or the GNU Lesser General Public License version 2.1 (LGPL v2.1).

Backward Compatibility

This PR does not break backwards compatibility.

Implementation Notes

This PR currently does the following:

  • Adds a lib/mdi directory, which contains scripts that enable LAMMPS to download, compile, and link against the MDI Library.
  • Adds a USER-MDI package, which contains support for MDI-related features.
  • Adds an mdi_engine command, which enables LAMMPS to function as an MDI Engine. When LAMMPS encounters this command, it enters a server-like mode in which it listens for commands from an external driver. Execution of this command terminates only when LAMMPS receives an "EXIT" command from the driver.
  • Adds an mdi/engine fix, which enables LAMMPS to function as an engine at specific points within a larger simulation. For example, this fix allows LAMMPS to listen for commands from a driver during the post_integrate, pre-reverse, and post_force steps of an MD simulation.

The following features are additionally planned for inclusion in this PR:

  • Enable LAMMPS to communicate via the MDI Standard when LAMMPS is compiled as a library.
  • Add support for an mdi/aimd fix, which will enable LAMMPS to run ab inito MD simulations using MDI. More specifically, this fix will cause LAMMPS to function as an MDI driver, commanding external quantum chemistry codes to compute forces and energies as needed.
  • Add proper documentation and testing for the features introduced by this PR.

Properly testing the functionality of an interoperability interface is not entirely trivial. It is currently planned that this PR will include at least one test in which LAMMPS is run as both an ab initio MD driver and as the (pseudo-) quantum chemistry engine. Beyond this, the MDI project has an in-development code called MDI Mechanic which performs automated testing of MDI codes and generates reports regarding the degree to which they support the MDI API. The MDI Mechanic results for this PR can be seen in the README.md file here.

Post Submission Checklist

  • The feature or features in this pull request is complete
  • Licensing information is complete
  • Corresponding author information is complete
  • The source code follows the LAMMPS formatting guidelines
  • Suitable new documentation files and/or updates to the existing docs are included
  • The added/updated documentation is integrated and tested with the documentation build system
  • The feature has been verified to work with the conventional build system
  • The feature has been verified to work with the CMake based build system
  • Suitable tests have been added to the unittest tree.
  • A package specific README file has been included or updated
  • One or more example input decks are included

Further Information, Files, and Links

src/main.cpp Show resolved Hide resolved
src/lammps.cpp Outdated Show resolved Hide resolved
src/USER-MDI/mdi_init.h Outdated Show resolved Hide resolved
lib/mdi/Makefile.mpi Outdated Show resolved Hide resolved
lib/mdi/Makefile.icc Outdated Show resolved Hide resolved
@akohlmey
Copy link
Member

@taylor-a-barnes thanks for the submission. I did a quick read through the changes and added some comments for issues that immediately caught my eye. This will likely need more work. E.g. there doesn't seem to be any documentation and no update for the added command line flag (why is this needed at all? All I am seeing is that it passes the communicator that is also available as universe->uworld. Perhaps I am missing something.).

Don't hesitate to ask questions about the comments. If this feels to cumbersome and you need more direct interaction, you can ask for an invite to the LAMMPS slack channer.

Because you stated that this is still work in progress, I've added the corresponding label and also converted the request to a draft to signal this, so you can click on "Ready for review" when you feel it is ready for a more thorough review.

src/lammps.cpp Outdated Show resolved Hide resolved
@taylor-a-barnes
Copy link
Contributor Author

@akohlmey Thanks very much for your prompt replies. I'll try to clarify a few details of how the MDI interface works, which will hopefully make more sense of the motivation behind some of the modifications you have taken particular note of, like the edits to lammps.cpp and main.cpp.

MDI enables codes to communicate via several mechanisms, including TCP/IP sockets, MPI, or as linked libraries. The details of how this communication happens are set by the end-user at runtime. For example, suppose the user wants to launch a calculation involving a driver and a single engine (LAMMPS, in this case). To have the codes communicate via TCP/IP sockets, the end-user can just launch each code independently, with a -mdi command-line option allowing them to set any required information about the driver's hostname, port, etc.:

<driver_executable> <driver_options> -mdi "-role DRIVER -name driver -method TCP -port 8021" &
lmp -mdi "-role ENGINE -name lammps -method TCP -port 8021 -hostname localhost" -in lammps.in &

If the end-user instead wishes to have the codes communicate via MPI, both codes should be launched via a single mpiexec call:

mpiexec -n 4 <driver_executable>  -mdi "-role DRIVER -name driver -method MPI"  : \
        -n 8 lmp -mdi "-role ENGINE -name lammps -method MPI" -in lammps.in

The size of MPI_COMM_WORLD in the above calculation will be 12, but the number of ranks actually associated with running LAMMPS is only 8. In this situation, LAMMPS should only ever perform operations within an MPI sub-communicator that corresponds to the subset of ranks that are running the LAMMPS executable. To make it as simple as possible for codes to implement MDI support, the MDI Library handles the problem of splitting the correct sub-communicator automatically. When a call to MDI_Init(mdi_options, &communicator) returns, communicator is this sub-communicator.

Generally speaking, a code that is running an MDI calculation should treat communicator as a replacement for MPI_COMM_WORLD. All of the edits in main.cpp and lammps.cpp are intended for this purpose. My assumption in writing the lines MPI_Comm world_comm = lammps->world in main.cpp was that they would result in world_comm being the same communicator that was split by MDI_Init, but as you point out, that assumption is incorrect. I think I need to take a page from the -mpicolor implementation and define an mdicomm class variable for tracking this sub-communicator.

@akohlmey
Copy link
Member

akohlmey commented Feb 20, 2021

@akohlmey Thanks very much for your prompt replies. I'll try to clarify a few details of how the MDI interface works, which will hopefully make more sense of the motivation behind some of the modifications you have taken particular note of, like the edits to lammps.cpp and main.cpp.

Thanks for the explanation, but what you are doing feels conceptually wrong to me. Please keep in mind that there are many ways to instantiate LAMMPS and what you are describing and the modifications are doing make assumptions about a specific mode of operation. I am not overly happy with what -mpicolor does but at least it does an explicit MPI_Comm_split() and updates the universe communicator. Thus it implicitly assumes that what the other side of the client/server code communicating via MPI does is the same and compatible.

So if your library initialization code will do the MPI_Comm_split() and provide a new communicator that LAMMPS should be using as a world communicator, that call would have to be even earlier, i.e. in main.cpp and not in lammps.cpp.
The function call you have should then be similar to MPI_Init(), i.e remove all MDI specific command line arguments and initialize the MDI library (and MPI) as needed based on those. So something like this (untested):

diff --git a/src/main.cpp b/src/main.cpp
index 582de6999c..21fe882aab 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -33,8 +33,25 @@ using namespace LAMMPS_NS;
 
 int main(int argc, char **argv)
 {
+  MPI_Comm world_comm = MPI_COMM_WORLD;
   MPI_Init(&argc,&argv);
 
+#ifdef LMP_USER_MDI
+  int found_mdi_flag = -1;
+  for (int i = 0; i < argc; ++i) {
+    if ((strcmp(argv[i],"-mdi") == 0) && (i+1 < argc)) {
+      found_mdi_flag = i;
+    }
+  }
+  if (found_mdi_flag > 0) {
+    MDI_init(argv[found_mdi_flag+1],&world_comm);
+    argc -= 2;
+    for (int i = found_mdi_flag; i < argc+2; ++i) {
+      argv[i] = argv[i+2];
+    }
+  }
+#endif
+
 // enable trapping selected floating point exceptions.
 // this uses GNU extensions and is only tested on Linux
 // therefore we make it depend on -D_GNU_SOURCE, too.
@@ -49,21 +66,21 @@ int main(int argc, char **argv)
 
 #ifdef LAMMPS_EXCEPTIONS
   try {
-    LAMMPS *lammps = new LAMMPS(argc,argv,MPI_COMM_WORLD);
+    LAMMPS *lammps = new LAMMPS(argc,argv,world_comm);
     lammps->input->file();
     delete lammps;
   } catch(LAMMPSAbortException &ae) {
     MPI_Abort(ae.universe, 1);
   } catch(LAMMPSException &e) {
-    MPI_Barrier(MPI_COMM_WORLD);
+    MPI_Barrier(world_comm);
     MPI_Finalize();
     exit(1);
   }
 #else
-  LAMMPS *lammps = new LAMMPS(argc,argv,MPI_COMM_WORLD);
+  LAMMPS *lammps = new LAMMPS(argc,argv,world_comm);
   lammps->input->file();
   delete lammps;
 #endif
-  MPI_Barrier(MPI_COMM_WORLD);
+  MPI_Barrier(world_comm);
   MPI_Finalize();
 }

But that is rather clumsy and essentially suggests an opportunity to improve the API of your library. Since the MDI_Init() function of your library has a rather similar function to MPI_Init() you should set it up like that, e.g. it would be cleaner if the calls would be something like this and hide the messy finding and removing of the MDI specific arguments in the library (which should be needed in other codes as well):

#ifdef LMP_USER_MDI
  MDI_Init(&argc, &argv);
  MDI_Get_world_comm(&world_comm);
#endif

That would be far more obvious in what it does and be consistent with other libraries that add library specific command line flags. For example we have the same thing with googletest which we use for unit tests. One could even go so far to say that this could include the call to MPI_init(), but then it would probably be better to name this call MDI_MPI_Init(). Passing &world_comm as argument to the MDI_Init() call is problematic, since it looks to the casual observer as if you are passing MPI_COMM_WORLD to the library and not that the library call will return an updated communicator. Having a "getter function" makes this far more obvious.

@akohlmey
Copy link
Member

akohlmey commented Feb 20, 2021

@taylor-a-barnes here is another thought about how to more cleanly provide an interface to a facility like yours that is in a package (i.e. optional) and rather invasive (i.e. needs to be present in many places). In general, we have a while ago started to PIMPL-ify interfaces to external libraries instead of having direct access to the data types and functions provided by it. That means everything is wrapped in a struct or class or namespace (hence PIMPL) that hold references to the internal data and is a dummy version without the package. Or there are dummy versions of the classes otherwise required and the package replaces or derives from them. See for example accelerator_kokkos.h or accelerator_omp.h or lmppython.h.
My personal favorite would be a file mdi_interface.h that would define an MDI namespace with the necessary wrapper functions (similar to what I previously suggested) that then either have dummy responses or would be calling the actual functions from the library, which could be then imported from a file mdi_interface_impl.h. mdi_interface.h would be in the src folder and thus could be included everywhere and the (dummy) functions could be called where needed without worrying about ifdefs and without making code less readable or writing code in multiple variants.

@taylor-a-barnes
Copy link
Contributor Author

@akohlmey Thanks for these insights. I agree with everything you suggest regarding updates to the MDI API. Actually, we've added a function analogous to the MDI_Get_world_comm that you describe here, and plan to transition to a more MPI-like initialization call. I'll go ahead and make this change to the MDI Library before finishing this PR.

The PIMPL approach does sound like the correct solution, and I'll take a look at how this has previously been implemented in LAMMPS.

Thanks again for the assistance. It'll probably be next week before I can do serious work on this, but I'll incorporate your other suggested revisions then.

@akohlmey
Copy link
Member

The PIMPL approach does sound like the correct solution, and I'll take a look at how this has previously been implemented in LAMMPS.

When looking at the existing code, you have to be aware of LAMMPS' history and that there is plenty of legacy code in the package that would not be included in its current form today. LAMMPS was originally written and Fortran and then "translated" to a "C with classes" style of C++ to simplify in the inclusion of contributions. That step was done in such a way as to keep the code structure and data structures similar to what it was in Fortran (sometimes classes are used more like common blocks or Fortran 90 modules). In recent years we have gradually tried to consolidate the code and make it more maintainable. The sheer number and growing complexity of contributions made it necessary to be more strict with what style of code to add. This got accelerated over the last year with adopting C++11 (plus the fmt library from C++20) and some refactoring of internal facilities. By replacing C style code with C++ constructs, often the code is more readable and obvious and shorter. So while we try to make each new addition as good and "clean" as can be managed, we also are aware that parts of the existing code need to be updated to follow the same standards. This is a learning process and is being adapted as we learn more and run into problems. It also depends on how deep a new feature needs to dig into internal data structures, and that is where you have a bit of a disadvantage, since it cannot get any deeper than modifying the "world" communicator.

@taylor-a-barnes
Copy link
Contributor Author

It also depends on how deep a new feature needs to dig into internal data structures, and that is where you have a bit of a disadvantage, since it cannot get any deeper than modifying the "world" communicator.

No problem; this is entirely understandable. I'm happy to make whatever modifications are necessary to conform to current LAMMPS standards. Part of the reason I'm making this PR at a somewhat earlier stage than I otherwise might is because I'm aware that touching world and main.cpp needs to be handled delicately, to say the least.

@akohlmey
Copy link
Member

Part of the reason I'm making this PR at a somewhat earlier stage than I otherwise might is because I'm aware that touching world and main.cpp needs to be handled delicately, to say the least.

Exactly, this is why we encourage people to submit draft pull requests early before they make choices that are difficult to revert and difficult to integrate for us. As time progresses, contributions to LAMMPS have become more involved and complex, so reviewing those and making them fit in is becoming more effort. Seeing code early and having an opportunity to provide guidance and point out pitfalls is a big help for both sides.

@sjplimp
Copy link
Contributor

sjplimp commented Feb 22, 2021

@taylor-a-barnes @akohlmey So I want to add a few points to this discussion.

(1) As Taylor noted, at a low-level the MDI lib is functionally similar to the CSlib that has already been added to LAMMPS. Thus I think its implementation in LAMMPS can be done similarly. Which overlaps with some of the comments above. I think once MDI is fully integrated into LAMMPS, CSlib will likely be obsolete. On this page: https://lammps.sandia.gov/doc/Howto_client_server.html
message args <--> -mdi args, server <--> mdi_engine, fix client/md <--> a driver app.
There should be a similar Howto_mdi.html page.

(2) -mdi command line. For MDI I think it makes more sense for this to be command line args, rather than an inside-script "message" command (CSlib), since other MDI-supporting codes will use -mdi. Thus it will be simpler for users to use MDI the same for LAMMPS. Esp if they are writing an mpirun command with multiple MDI exes in it.

(3) The -mpicolor command line arg was added to LAMMPS for CSlib for the same purpose MDI might want to use it (multiple mpirun exes). It could become part of the -mdi arg syntax or kept independent. Note that CSlib defines 3 variables in lammps.h: clientserver flag (driver/engine in MDI), cslib (instantiation of MDI lib), cscomm (the dual-app communicator). So MDI could define 3 similar variables in lammps.h if that's useful. There's a useful comment about mpicolor and all this near the top of lammps.cpp

(4) I can't actually think of a use case for using multi-partition mode with MDI (or CSlib). So it would be fine to check if both -mdi and -partition were defined and throw an error. If we allow partition mode, then all MDI calls in the code (server or driver) will need to check if partition mode is enabled and do something slightly different.

(5) Ideally I'd like to see a single "mdi" command added to the LAMMPS input script syntax.
E.g. "mdi engine" where engine is an arg, rather than "mdi_engine". Does there also need to be a command to shut down MDI driver mode (a LAMMPS script could go on to do other non-MDI stuff). This is what "message quit" does for CSlib. If so, then it could be "mdi quit".

@akohlmey
Copy link
Member

(4) I can't actually think of a use case for using multi-partition mode with MDI (or CSlib). So it would be fine to check if both -mdi and -partition were defined and throw an error. If we allow partition mode, then all MDI calls in the code (server or driver) will need to check if partition mode is enabled and do something slightly different.

How about using run style verlet/split? Things like parallel tempering or NEB with QM/MM would be conceivable, but may be more difficult to realize unless you don't use MPI for inter-code communication but sockets or shared memory or files.
Specifically NEB with QM/MM seems like something people are interested in. I recall seeing inquiries from different people about that.

@sjplimp
Copy link
Contributor

sjplimp commented Feb 22, 2021

How about using run style verlet/split? Things like parallel tempering or NEB with QM/MM would be conceivable, but may be more difficult to realize unless you don't use MPI for inter-code communication but sockets or shared memory or files.
Specifically NEB with QM/MM seems like something people are interested in. I recall seeing inquiries from different people about that.

Verlet/split might be a use case for this, hadn't thought about that one. I think the MDI-way of doing tempering or NEB is to have the driver (not LAMMPS) implement the tempering or NEB algorithm and instantiate multiple engines (MD or QM), one per replica.

I believe the CSlib code in lammps.cpp already allows for -partition and uses the universe comm rather than world comm to do the coloring. So it would be OK to leave that in and allow -partition. My comment was more that the engine and driver code will not support it unless we make extra effort. For now I'd say just throw errors for -partition mode in the server and/or driver apps if we don't think it makes sense. E.g. a request for "COORDS" made to the server is problematic. Does that mean return multiple sets of coords? And all the info has to be accumulated to a single universe proc.

@akohlmey
Copy link
Member

@akohlmey @sjplimp Thanks for the assistance with this PR. I believe it is now ready to be considered for acceptance.

@taylor-a-barnes I have just started to look over the changes and have created a draft pull request with proposed changes.
However, I have already reached an issue point where we need some discussion. Do you want to do those here, or would a tool for more immediate communication be better? For example, I could send you an invite to the LAMMPS slack.

I haven't added any unit tests, as this can be a little tricky with MDI calculations that intrinsically require the participation of multiple codes. Probably the simplest test of USER-MDI that can be performed would be to run lammps/examples/USER/mdi/Script.sh and confirm that it runs to completion. Is it possible to do that within the LAMMPS unit test framework?

It is possible see for example the tests here which check whether you can run LAMMPS by itself and whether it produces certain output in the process.
https://github.com/lammps/lammps/blob/master/unittest/CMakeLists.txt#L6

We would have to make sure that the MDI python module is available when downloading and compiling the MDI library as part of the LAMMPS compilation, though.

@taylor-a-barnes
Copy link
Contributor Author

@akohlmey Thanks for these updates. I would be happy to discuss over slack.

@akohlmey
Copy link
Member

@akohlmey Thanks for these updates. I would be happy to discuss over slack.

Invite has been sent. Next bunch of updates (mostly for the manual) are pushed to the PR.

MDI formatting updates and MPI STUBS handling
@akohlmey akohlmey merged commit 6740959 into lammps:master May 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants