partial user guide
bhatele committed Jul 24, 2018
1 parent 0029f34 commit 8617f13
Showing 3 changed files with 59 additions and 7 deletions.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -20,7 +20,7 @@
# -- Project information -----------------------------------------------------

project = u'hatchet'
-copyright = u'2018, Abhinav Bhatele'
+copyright = u'2017-2018, Lawrence Livermore National Security, LLC'
author = u'Abhinav Bhatele'

# The short X.Y version
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -8,7 +8,7 @@ Hatchet Documentation

Hatchet is a Python-based library to analyze performance data that has a
hierarchy (derived from calling context trees, call graphs, callpath traces,
-nested regions timers, etc.). Hatchet implements various operations to analyze
+nested regions' timers, etc.). Hatchet implements various operations to analyze
a single hierarchical data set or compare multiple data sets.

.. toctree::
62 changes: 57 additions & 5 deletions docs/userguide.rst
@@ -1,19 +1,71 @@
User Guide
==========

Hatchet is a Python tool that simplifies the analysis of hierarchical
performance data such as calling context trees. Hatchet stores the data for
each node of the hierarchy in a pandas dataframe and keeps the graph
relationships between the nodes in a separate data structure that is kept
consistent with the dataframe.

Supported Input File Formats
----------------------------

Currently, Hatchet supports two file formats as input:

* `HPCToolkit <http://hpctoolkit.org/index.html>`_ database: This is generated
  by using ``hpcprof-mpi`` to post-process the raw measurements directory
  output by HPCToolkit.
* Caliper `Json-split
  <http://llnl.github.io/Caliper/OutputFormats.html#json-split>`_ file: This is
  generated either by running ``cali-query`` on the raw Caliper data or by
  enabling the ``mpireport`` service when using Caliper.

-Graph Frame
+Graphframe
-----------

``Graphframe`` is the main data structure in Hatchet. It stores the
performance data read in from an HPCToolkit database or a Caliper JSON
file. Typically, the raw input data is in the form of a tree. However,
subsequent operations on the tree can create new edges that turn the tree
into a graph, so we store the input data as a directed graph. The
graphframe consists of a graph object that stores the edge relationships
between nodes and a dataframe that stores the metrics (numerical data)
and categorical data associated with each node.

Graph
^^^^^

The graph can be connected or disconnected (multiple roots), and each node in
the graph can have one or more parents and children. Each node stores its
callpath, which is a tuple of the node names from the root to that node. The
callpath is used as one of the indices in the dataframe.
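
As a rough illustration of the callpath idea, the following minimal sketch
builds a small graph by hand. The ``Node`` class and its attribute names are
hypothetical and are not Hatchet's actual classes; the node names are made up.

```python
class Node:
    """Hypothetical node type for illustration; not Hatchet's real class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parents = []
        self.children = []
        # The callpath is the tuple of node names from the root to this node.
        self.callpath = (parent.callpath if parent else ()) + (name,)
        if parent is not None:
            self.add_parent(parent)

    def add_parent(self, parent):
        # Record the edge in both directions.
        self.parents.append(parent)
        parent.children.append(self)


main = Node("main")
solve = Node("solve", parent=main)
mpi_send = Node("MPI_Send", parent=solve)
io = Node("io")  # a second root, so the graph is disconnected

print(mpi_send.callpath)  # ('main', 'solve', 'MPI_Send')
```

Because a callpath is a tuple, it is hashable and can serve as an index key.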

Dataframe
^^^^^^^^^

-Graph Operations
-----------------
The dataframe holds all the numerical and categorical data associated with each
node. Since call tree data is typically per process, a multiindex composed
of the node and MPI rank is used to index into the dataframe.
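
A (node, rank) multiindex of this kind can be sketched with plain pandas. This
is an illustration only: the column names ``time`` and ``module``, the values,
and the use of strings for the node level are assumptions made for the example,
not Hatchet's actual schema.

```python
import pandas as pd

# Two nodes measured on two MPI ranks each (made-up values).
index = pd.MultiIndex.from_product(
    [["main", "solve"], [0, 1]], names=["node", "rank"]
)
df = pd.DataFrame(
    {
        "time": [10.0, 11.0, 6.0, 7.5],  # numerical metric
        "module": ["app", "app", "libsolver", "libsolver"],  # categorical data
    },
    index=index,
)

# Look up a single (node, rank) entry.
solve_rank1_time = df.loc[("solve", 1), "time"]

# Select the rows for all ranks of one node.
solve_all_ranks = df.xs("solve", level="node")
```

Here ``solve_all_ranks`` is indexed by rank alone, which is convenient for
per-process comparisons of a single node.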

Graph-centric Operations
------------------------
**Prune**: The ``prune`` operation is always performed after a ``filter`` is
done on the dataframe. ``prune`` removes from the graph the nodes that were
filtered out by the previous ``filter`` operation. When one or more nodes on
a path are removed from the graph, the nearest remaining ancestor is connected
by an edge to the nearest remaining descendant on the path.
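
The reconnection rule can be sketched in plain Python. This is not Hatchet's
implementation: the function ``prune``, its arguments, and the dictionary
representation of the graph are hypothetical, and the sketch assumes the graph
is acyclic.

```python
def prune(children, roots, keep):
    """Return (new_children, new_roots) containing only the nodes in `keep`.

    `children` maps a node name to the list of its child names, `roots` lists
    the root names, and `keep` is the set of nodes that survived filtering.
    """
    new_children = {name: [] for name in keep}
    new_roots = []

    def visit(node, ancestor):
        # `ancestor` is the nearest kept ancestor of `node`, or None.
        if node in keep:
            if ancestor is None:
                if node not in new_roots:
                    new_roots.append(node)
            elif node not in new_children[ancestor]:
                # Bridge over any removed nodes between ancestor and node.
                new_children[ancestor].append(node)
            ancestor = node
        for child in children.get(node, []):
            visit(child, ancestor)

    for root in roots:
        visit(root, None)
    return new_children, new_roots


children = {
    "main": ["solve", "io"],
    "solve": ["kernel"],
    "kernel": ["MPI_Send"],
}
pruned, pruned_roots = prune(children, ["main"], {"main", "io", "MPI_Send"})
print(pruned["main"])  # ['MPI_Send', 'io']
```

In this example ``solve`` and ``kernel`` are removed, so ``main`` gains a
direct edge to ``MPI_Send``.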

-Dataframe Operations
---------------------
**Union**:


**Diff**:


Dataframe-centric Operations
----------------------------
**Filter**: ``filter`` takes a user-supplied function and applies it to all
rows in the dataframe. The resulting Series or dataframe is used as a boolean
mask, so that only rows that evaluate to True are returned. The returned
graphframe preserves the graph provided as input.
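
The row-wise filtering described here can be approximated with plain pandas.
The sketch below does not call Hatchet's own ``filter``; the predicate
``is_expensive``, the ``time`` column, and the threshold are assumptions chosen
for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"time": [10.0, 0.2, 6.0, 0.1]},
    index=pd.Index(["main", "io", "solve", "MPI_Send"], name="node"),
)


def is_expensive(row):
    # User-supplied predicate: keep rows with more than 1.0 units of time.
    return row["time"] > 1.0


# Apply the predicate to every row to get a boolean Series,
# then use it as a mask to keep only the rows that are True.
mask = df.apply(is_expensive, axis=1)
filtered = df[mask]

print(list(filtered.index))  # ['main', 'solve']
```

The graph itself is not touched at this stage, which is why a later ``prune``
is needed to drop the filtered-out nodes from it.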

**Fill**:
