Welcome! If you are looking for a lightweight tool to analyze the critical path costs of your distributed-memory MPI program, you have come to the right place. critter
seeks to understand the critical paths of your MPI program, and decomposes critical paths defined by the following metrics:
- execution time
- communication time
- computation cost
- synchronization cost (in the alpha-beta or Bulk-synchronous-parallel model)
- communication cost (in the alpha-beta or Bulk-synchronous-parallel model)
For example, the communication-time critical path is the schedule path that incurs the maximum communication time. This path will not necessarily incur the maximum execution time.
critter
also provides both maximum-per-process and volumetric times and costs of the measures above.
critter
decomposes parallel schedule paths into contributions from MPI routines and user-defined kernels. User-defined kernels are encapsulated within preprocessor directives CRITTER_START(kernel_name) and CRITTER_STOP(kernel_name), which must be added manually inside source code.
See the lists below for an accurate depiction of our current support.
Modify compiler and flags in config/config.mk
(MPI installation and C++11 are required). Run make
in the main directory to generate the library files ./lib/libcritter.a
. Include critter.h
in all files that use MPI in your application (i.e. replace include mpi.h
), and link to ./lib/libcritter.a
. Shared library ./lib/libcritter.so
is currently not generated.
critter
provides the following C routines to the user:
void critter_register_timer(const char* timer_name)
: register a user-defined timer for a kernel with name timer_name to fix ordering of intercepted kernels (each of which must be associated with a distinct name). If processes start and stop timers in more than one ordering, all intercepted and profiled kernels must register their timers using this routinevoid critter_start_timer(const char* timer_name, bool propagate_within = true, MPI_Comm cm = MPI_COMM_NULL)
: start timer for a kernel with name timer_name. Optionally, specify whether critical path information should be propagated within this kernel and/or whether to synchronize and propagate critical path information at the start of the kernelvoid critter_stop_timer(const char* timer_name, MPI_Comm cm = MPI_COMM_NULL)
: stop timer for a kernel with name timer_name. Optionally, specify whether to synchronize and propagate critical path information at the end of the kernelint critter_get_critical_path_costs()
: get the size of the critical path information (so that critter_get_critical_path_costs(...) can be invoked properly)void critter_get_critical_path_costs(float* costs)
: set critical path information to passed buffer costsvoid critter_start(MPI_Comm cm=MPI_COMM_WORLD)
: initiates the window within which all MPI routines and computation kernels are intercepted and profiledvoid critter_stop(MPI_Comm cm=MPI_COMM_WORLD)
: closes the window within which all MPI routines and computation kernels are intercepted and profiledvoid critter_record(int variantID=-1)
: print critter's analysis
Note that one can set the environment variable CRITTER_AUTO_PROFILE=1
to enable critter to start profiling immediately following invocation MPI_Init
or MPI_Init_thread
and ending with invocation of MPI_Finalize
(and thus avoid explicitly calling critter_start(...)
and critter_stop(...)
).
See the other environment variables below for all customization options.
Env variable | description | default value |
---|---|---|
CRITTER_MODE | Switch to enable critter ; set to 0 to instead use a primitive timer with no user code interception |
1 |
CRITTER_AUTO_PROFILE | Switch to activate critter inside MPI initialization; prevents need for manually inserting critter::start() and critter::stop() inside user code; set to 1 to activate |
0 |
CRITTER_EAGER_LIMIT | Specify maximum message size (in bytes) that can utilize eager protocol | 32768 |
CRITTER_COST_MODEL | Specify cost model: Bulk-Synchronous-Parallel (0) or Alpha-Beta (1) | 0 |
CRITTER_PATH_PROFILE | Specify how critical paths are decomposed: by MPI routine (1), user-defined kernels (2), or avoid altogether (0) | 0 |
CRITTER_PATH_SELECT | Specify which critical paths are decomposed (via a 5-digit string according to the following order: Synchronization cost, Communication cost, Computation cost, Communication time, Execution time); as an example, specify 00001 to decompose the execution-time critical path | 00000 |
CRITTER_PATH_MEASURE_SELECT | Specify which metrics to profile along each critical path (via a 5-digit string according to the following order: Synchronization cost, Communication cost, Computation cost, Communication time, Execution time); as an example, specify 00010 to measure the communication time attributed to each MPI routine, or specify 00001 to measure the execution time attributed to each user-defined kernel | 00000 |
CRITTER_PROFILE_EXCLUSIVE_TIME_ONLY | Specify whether to profile each kernel's exclusive time (1) and additionally inclusive time (0) | 0 |
CRITTER_PROFILE_MAX_NUM_KERNELS | Specify maximum number of user-defined kernels to intercept, profile, and propagate during program runtime | 20 |
CRITTER_PROFILE_P2P | Specify whether to profile point-to-point communications invoked during program runtime | 1 |
CRITTER_PROFILE_COLLECTIVE | Specify whether to profile collective communications invoked during program runtime | 1 |
CRITTER_PROPAGATE_P2P | Specify whether to propagate critical-path profiles during interception of point-to-point communications invoked during program runtime | 1 |
CRITTER_PROPAGATE_COLLECTIVE | Specify whether to propagate critical-path profiles during interception of collective communications invoked during program runtime | 1 |
CRITTER_EXECUTE_KERNELS | Specify whether to execute intercepted communication routines invoked during program runtime (subject to a maximum message size CRITTER_EXECUTE_KERNELS_MAX_MESSAGE_SIZE) | 1 |
CRITTER_EXECUTE_KERNELS_MAX_MESSAGE_SIZE | Specify the maximum message size of an intercepted communication routine that should not be avoided if CRITTER_EXECUTE_KERNELS=1 | 1 |
MPI routine | profiled |
---|---|
MPI_Barrier | yes |
MPI_Bcast | yes |
MPI_Reduce | yes |
MPI_Allreduce | yes |
MPI_Gather | yes |
MPI_Gatherv | yes |
MPI_Allgather | yes |
MPI_Allgatherv | yes |
MPI_Scatter | yes |
MPI_Scatterv | yes |
MPI_Reduce_Scatter | yes |
MPI_Alltoall | yes |
MPI_Alltoallv | yes |
MPI_Ibcast | yes |
MPI_Ireduce | yes |
MPI_Iallreduce | yes |
MPI_Igather | yes |
MPI_Igatherv | yes |
MPI_Iallgather | yes |
MPI_Iallgatherv | yes |
MPI_Iscatter | yes |
MPI_Iscatterv | yes |
MPI_Ireduce_Scatter | yes |
MPI_Ialltoall | yes |
MPI_Ialltoallv | yes |
MPI_Send | yes |
MPI_Ssend | yes |
MPI_Bsend | yes |
MPI_Rsend | no |
MPI_Isend | yes |
MPI_Issend | no |
MPI_Ibsend | no |
MPI_Irsend | no |
MPI_Recv | yes |
MPI_Irecv | yes |
MPI_Sendrecv | yes |
MPI_Sendrecv_replace | yes |
MPI_Test | yes |
MPI_Testany | yes |
MPI_Testsome | yes |
MPI_Testall | yes |
critter
cannot profile libraries that use any thread mechanism other thanMPI_THREAD_SINGLE
.critter
does not profile late receivers for point-to-point communication.