- Description
AutomaDeD (Automata-based Debugging for Dissimilar parallel tasks) is a tool for automatic diagnosis of performance and correctness problems in MPI applications. It intercepts MPI calls (using wrappers) to build a control-flow model of each MPI process. When a failure occurs, it uses these models to find the origin of the problem automatically. When an MPI application hangs, AutomaDeD creates a progress-dependence graph that helps find the process (or group of processes) that caused the hang. See [1, 2] for more details.
This version of AutomaDeD implements the diagnosis algorithm of the Prodometer technique, which performs loop-aware progress-dependence analysis. For more information, see the Prodometer paper [1].
- Building
On Unix-based machines with CMake, run the following from the top of the source tree:
$ cmake -DCMAKE_INSTALL_PREFIX=<install_path> .
To use the callpath library (which normalizes the library loading order), run instead:
$ cmake -DCMAKE_INSTALL_PREFIX=<install_path> -DSTATE_TRACKER_WITH_CALLPATH=ON .
This requires two additional libraries, callpath and adept_utils. Both are available from:
https://github.com/scalability-llnl
Then:
$ make
$ make install
Building requires a C++ MPI compiler wrapper (such as mpic++). CMake should detect your MPI compiler installation automatically. To select a particular compiler, use the standard CMake techniques.
Boost must be installed on your system. CMake will try to detect it automatically. To tell CMake where to find Boost, use: -D BOOST_ROOT=.
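The configure options above can be combined in one command line. The following is a minimal sketch, not part of AutomaDeD: the function name automaded_cmake_cmd and the variables INSTALL_PREFIX, WITH_CALLPATH and BOOST_DIR are hypothetical names chosen for this example, and the trailing "." assumes the command is run from the top of the source tree. The script only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch: assemble the CMake configure command from the options in this README.
# INSTALL_PREFIX, WITH_CALLPATH and BOOST_DIR are hypothetical variable names
# used only in this example; AutomaDeD itself does not read them.

automaded_cmake_cmd() {
    prefix=$1          # install path, passed as CMAKE_INSTALL_PREFIX
    with_callpath=$2   # "ON" enables the callpath library, anything else skips it
    boost_dir=$3       # optional Boost location, passed as BOOST_ROOT

    cmd="cmake -DCMAKE_INSTALL_PREFIX=$prefix"
    if [ "$with_callpath" = "ON" ]; then
        cmd="$cmd -DSTATE_TRACKER_WITH_CALLPATH=ON"
    fi
    if [ -n "$boost_dir" ]; then
        cmd="$cmd -DBOOST_ROOT=$boost_dir"
    fi
    # "." assumes the command is run from the top of the source tree.
    printf '%s .\n' "$cmd"
}

# Print the command instead of running it, so it can be reviewed first.
automaded_cmake_cmd "${INSTALL_PREFIX:-$HOME/automaded}" "${WITH_CALLPATH:-OFF}" "${BOOST_DIR:-}"
```

Once the printed command looks right, run it and continue with make and make install.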
- Running
You have to link your MPI application against AutomaDeD's library, using either the static or the shared library. Once this is done, you can run your buggy application. For example, the following preloads the shared library and runs the test application:
$ LD_PRELOAD=/lib/libstracker.so srun -n 16 -ppdebug ./test
Take a look at the './example' directory to see some use cases.
To run with the callpath library, set the environment variable: AUT_USE_CALL_PATH=TRUE
To stop the tool from dumping its output file, use: export AUT_DO_NOT_DUMP=TRUE
If you want to attach another debugger to the LP (least-progressed) process identified by the tool, use: export AUT_DO_NOT_EXIT=TRUE, so that the tool does not exit.
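The run-time settings above can be combined with the preload command on a single launch line. The following is a minimal sketch under stated assumptions: aut_launch_line is a hypothetical helper written for this example, and the library path is copied as it appears in this README, so adjust it to wherever libstracker.so is actually installed. The helper only prints the launch line and does not start the application.

```shell
#!/bin/sh
# Sketch: build the launch line for a preloaded run with optional settings.
# aut_launch_line is a hypothetical helper name used only in this example.

aut_launch_line() {
    lib=$1   # path to libstracker.so; adjust to your installation
    shift
    env_part="LD_PRELOAD=$lib"

    # Add each AutomaDeD setting that is set to TRUE in the caller's environment.
    if [ "${AUT_USE_CALL_PATH:-}" = "TRUE" ]; then
        env_part="$env_part AUT_USE_CALL_PATH=TRUE"
    fi
    if [ "${AUT_DO_NOT_DUMP:-}" = "TRUE" ]; then
        env_part="$env_part AUT_DO_NOT_DUMP=TRUE"
    fi
    if [ "${AUT_DO_NOT_EXIT:-}" = "TRUE" ]; then
        env_part="$env_part AUT_DO_NOT_EXIT=TRUE"
    fi

    # The remaining arguments are the launcher command, e.g. srun ... ./test
    printf '%s %s\n' "$env_part" "$*"
}

# Print the launch line for the test application from this README.
aut_launch_line /lib/libstracker.so srun -n 16 -ppdebug ./test
```

Placing the variables in front of the command sets them for that run only, whereas export keeps them set for the rest of the shell session.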
- About BG/Q systems
On BG/Q systems, you need to specify a toolchain file for CMake: -D CMAKE_TOOLCHAIN_FILE=cmakemodules/Toolchain/BlueGeneQ-gnu.cmake
- Using the GUI
AutomaDeD comes with a GUI that can read the AUT* file generated by the tool. The GUI has its own documentation file, which explains how to use it.
- Known issues
When callpath is used, the output file currently does not contain the full file name and line number information, so the GUI cannot be used. This support will be added soon.
- References
[1] Subrata Mitra, Ignacio Laguna, Dong H. Ahn, Saurabh Bagchi, Martin Schulz, Todd Gamblin, Accurate application progress analysis for large-scale parallel debugging, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2014.
[2] Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Saurabh Bagchi, Todd Gamblin, Probabilistic Diagnosis of Performance Faults in Large-Scale Parallel Applications, International Conference on Parallel Architectures and Compilation Techniques (PACT), 2012.
[3] Ignacio Laguna, Todd Gamblin, Bronis R. de Supinski, Saurabh Bagchi, Greg Bronevetsky, Dong H. Ahn, Martin Schulz, Barry Rountree, Large Scale Debugging of Parallel Tasks with AutomaDeD, ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis (SC), Seattle, WA, Nov 2011.
[4] Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi, Bronis R. de Supinski, Dong H. Ahn, Martin Schulz, AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Chicago Illinois, Jun-Jul, 2010.
[5] Science & Technology Review, Supercomputing Tools Speed Simulations, July, 2014.
The main code infrastructure of AutomaDeD was written by: Ignacio Laguna (ilaguna@llnl.gov), LLNL
The code that implements the Prodometer algorithm was written by: Subrata Mitra (mitra4@purdue.edu), Purdue University
Project contributors:
Dong H. Ahn (LLNL)
Saurabh Bagchi (Purdue University)
Bronis R. de Supinski (LLNL)
Todd Gamblin (LLNL)
Martin Schulz (LLNL)
Greg Bronevetsky