User Guide
==========

Hatchet is a Python tool that simplifies the process of analyzing hierarchical
performance data such as calling context trees. Hatchet uses pandas dataframes
to store the data on each node of the hierarchy and keeps the graph
relationships between the nodes in a separate data structure that is kept
consistent with the dataframe.

Supported Input File Formats
----------------------------

Currently, hatchet supports two file formats as input:

* `HPCToolkit <http://hpctoolkit.org/index.html>`_ database: This is generated
  by using ``hpcprof-mpi`` to post-process the raw measurements directory
  output by HPCToolkit.
* Caliper `Json-split
  <http://llnl.github.io/Caliper/OutputFormats.html#json-split>`_ file: This is
  generated either by running ``cali-query`` on the raw Caliper data or by
  enabling the ``mpireport`` service when using Caliper.

Graphframe
----------

``Graphframe`` is the main data structure in hatchet that stores the
performance data read in from an HPCToolkit database or a Caliper JSON file.
The raw input data is typically a tree. However, because subsequent operations
on the tree can create new edges that turn it into a graph, the input data is
stored as a directed graph. A graphframe consists of a graph object that
stores the edge relationships between nodes and a dataframe that stores the
metrics (numerical data) and categorical data associated with each node.
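
The split between a graph object and a per-node table can be illustrated in
plain Python. This is a minimal sketch, not hatchet's implementation: the
``Node`` and ``GraphFrameSketch`` classes are hypothetical, and a dictionary
keyed by node stands in for the pandas dataframe.

.. code-block:: python

    from dataclasses import dataclass, field


    @dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
    class Node:
        """Hypothetical graph node: a name plus edges to parents and children."""
        name: str
        parents: list = field(default_factory=list, repr=False)
        children: list = field(default_factory=list)

        def add_child(self, child):
            # Record the edge in both directions.
            self.children.append(child)
            child.parents.append(self)


    @dataclass
    class GraphFrameSketch:
        """Hypothetical pairing of a graph (its roots) with per-node rows."""
        roots: list                                    # more than one root is allowed
        dataframe: dict = field(default_factory=dict)  # node -> {column: value}

        def traverse(self):
            # Depth-first walk that yields each reachable node exactly once,
            # even if a node has several parents.
            seen, stack = set(), list(reversed(self.roots))
            while stack:
                node = stack.pop()
                if node in seen:
                    continue
                seen.add(node)
                yield node
                stack.extend(reversed(node.children))

        def is_consistent(self):
            # Every node in the graph should have a row in the table.
            return all(node in self.dataframe for node in self.traverse())


    main, solve, io = Node("main"), Node("solve"), Node("io")
    main.add_child(solve)
    main.add_child(io)
    gf = GraphFrameSketch(roots=[main])
    gf.dataframe = {main: {"time": 1.0}, solve: {"time": 7.5}, io: {"time": 1.5}}
    print([n.name for n in gf.traverse()])  # ['main', 'solve', 'io']
    print(gf.is_consistent())               # True

Because the table is keyed by the same node objects that the graph links
together, a check such as ``is_consistent`` can confirm that every node
reachable in the graph has a row in the table.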

Graph
^^^^^

The graph can be connected or disconnected (that is, it can have multiple
roots), and a node can have multiple parents as well as multiple children (a
root has no parents). Each node stores its callpath, a tuple of the node names
from the root to that node, which is used as one of the indices in the
dataframe.
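
A callpath can be computed as each node is attached to its parent. The
hypothetical ``Node`` class below is a sketch that assumes a tree-shaped input
(every non-root node has exactly one parent); it is not hatchet's own node
class.

.. code-block:: python

    class Node:
        """Hypothetical tree node that records its callpath when it is created."""

        def __init__(self, name, parent=None):
            self.name = name
            self.parents = [] if parent is None else [parent]
            self.children = []
            # Callpath: node names from the root down to and including this node.
            if parent is None:
                self.callpath = (name,)
            else:
                self.callpath = parent.callpath + (name,)
                parent.children.append(self)


    main = Node("main")
    solve = Node("solve", parent=main)
    mpi_wait = Node("MPI_Wait", parent=solve)
    worker = Node("worker_thread")  # a second root, so the graph is disconnected
    roots = [main, worker]

    # Callpaths are tuples, so they are hashable and can serve as index keys.
    index = {n.callpath: n for n in (main, solve, mpi_wait, worker)}
    print(mpi_wait.callpath)  # ('main', 'solve', 'MPI_Wait')

If later operations give a node several parents, it becomes reachable along
more than one path; this simple sketch does not handle that case.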

Dataframe
^^^^^^^^^

The dataframe holds all the numerical and categorical data associated with each
node. Since call tree data is typically collected per process, the dataframe is
indexed by a multiindex composed of the node and the MPI rank.
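
The effect of a (node, MPI rank) multiindex can be sketched with a dictionary
whose keys are ``(node, rank)`` tuples. In this hypothetical example a
callpath tuple stands in for the node object, and ``select_rank`` and
``mean_over_ranks`` are illustrative helpers, not hatchet functions.

.. code-block:: python

    from collections import defaultdict

    # Hypothetical rows, one per (node, MPI rank) pair. A callpath tuple stands
    # in for the node object.
    rows = {
        (("main",), 0): {"time": 10.0},
        (("main",), 1): {"time": 12.0},
        (("main", "solve"), 0): {"time": 7.0},
        (("main", "solve"), 1): {"time": 9.0},
    }


    def select_rank(rows, rank):
        """Return {node: row} for one rank, dropping the rank from the key."""
        return {node: row for (node, r), row in rows.items() if r == rank}


    def mean_over_ranks(rows, column):
        """Average one metric per node across all ranks."""
        totals, counts = defaultdict(float), defaultdict(int)
        for (node, _rank), row in rows.items():
            totals[node] += row[column]
            counts[node] += 1
        return {node: totals[node] / counts[node] for node in totals}


    print(select_rank(rows, 1))
    print(mean_over_ranks(rows, "time"))  # {('main',): 11.0, ('main', 'solve'): 8.0}

With a pandas dataframe indexed by a ``MultiIndex`` whose levels are named,
say, ``node`` and ``rank`` (assumed names), the equivalent queries would be
``df.xs(1, level="rank")`` and ``df.groupby(level="node").mean()``.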

Graph-centric Operations
------------------------

**Prune**: The ``prune`` operation is always performed after a ``filter`` on
the dataframe. ``prune`` removes from the graph the nodes that were filtered
out by the preceding ``filter`` operation. When one or more nodes on a path are
removed from the graph, the nearest remaining ancestor is connected by an edge
to the nearest remaining descendant on that path.
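
The reconnection rule can be implemented as a memoized walk that collects, for
each surviving node, the nearest surviving descendant on every path below it.
This is a sketch under stated assumptions, not hatchet's code: the graph is
acyclic and is given as a hypothetical ``children`` dictionary with an entry
(possibly an empty list) for every node, and ``keep`` is the set of nodes that
survived the filter.

.. code-block:: python

    def prune(children, roots, keep):
        """Drop nodes not in `keep`, reconnecting across the removed ones.

        children: dict mapping every node to a list of its children
        roots:    list of root nodes of the original graph
        keep:     set of nodes that survived the preceding filter
        Returns (new_children, new_roots) for the pruned graph.
        """
        memo = {}

        def nearest_kept(node):
            # Kept nodes reachable from `node` without passing through another
            # kept node: the nearest surviving descendant on each path.
            if node not in memo:
                found = []
                for child in children[node]:
                    candidates = [child] if child in keep else nearest_kept(child)
                    for c in candidates:
                        if c not in found:
                            found.append(c)
                memo[node] = found
            return memo[node]

        new_children = {n: list(nearest_kept(n)) for n in children if n in keep}
        new_roots = []
        for root in roots:
            # A removed root is replaced by its nearest surviving descendants.
            for r in ([root] if root in keep else nearest_kept(root)):
                if r not in new_roots:
                    new_roots.append(r)
        return new_children, new_roots


    children = {
        "main": ["solve", "io"],
        "solve": ["mpi_wait", "compute"],
        "io": [], "mpi_wait": [], "compute": [],
    }
    # Suppose a filter removed every row belonging to "solve".
    new_children, new_roots = prune(children, ["main"], {"main", "io", "mpi_wait", "compute"})
    print(new_children["main"])  # ['mpi_wait', 'compute', 'io']
    print(new_roots)             # ['main']

Removed roots are replaced by their nearest surviving descendants, which may
leave the pruned graph with several roots.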

Dataframe-centric Operations
----------------------------

**Filter**: ``filter`` takes a user-supplied function and applies it to every
row in the dataframe. The resulting boolean Series or dataframe is used as a
mask, so only the rows for which it is True are returned. The returned
graphframe keeps the graph of the input graphframe unchanged.
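
A minimal sketch of these semantics, assuming rows are stored in a dictionary
keyed by ``(node, rank)`` and the graph is an opaque object; ``filter_rows`` is
a hypothetical name, not hatchet's API.

.. code-block:: python

    def filter_rows(graph, rows, predicate):
        """Keep only the rows for which predicate(row) is True.

        The graph is returned unchanged; removing nodes that no longer have
        rows is left to a separate prune step.
        """
        kept = {index: row for index, row in rows.items() if predicate(row)}
        return graph, kept


    graph = {"main": ["solve"], "solve": []}  # hypothetical adjacency dict
    rows = {
        ("main", 0): {"time": 2.0},
        ("main", 1): {"time": 3.0},
        ("solve", 0): {"time": 8.0},
        ("solve", 1): {"time": 9.5},
    }
    new_graph, new_rows = filter_rows(graph, rows, lambda row: row["time"] > 5.0)
    print(sorted(new_rows))     # [('solve', 0), ('solve', 1)]
    print(new_graph is graph)   # True

In pandas terms this corresponds roughly to ``df[df.apply(func, axis=1)]``:
``apply`` with ``axis=1`` calls ``func`` once per row, and the resulting boolean
Series selects the rows to keep.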

**Fill**: