partial user guide

hatchet · Jul 24, 2018 · 8617f13
1 parent 0029f34
Showing 3 changed files with 59 additions and 7 deletions.
diff --git a/docs/conf.py b/docs/conf.py
@@ -20,7 +20,7 @@
 # -- Project information -----------------------------------------------------
 
 project = u'hatchet'
-copyright = u'2018, Abhinav Bhatele'
+copyright = u'2017-2018, Lawrence Livermore National Security, LLC'
 author = u'Abhinav Bhatele'
 
 # The short X.Y version

diff --git a/docs/index.rst b/docs/index.rst
@@ -8,7 +8,7 @@ Hatchet Documentation
 
 Hatchet is a Python-based library to analyze performance data that has a
 hierarchy (derived from calling context trees, call graphs, callpath traces,
-nested regions’ timers, etc.). Hatchet implements various operations to analyze
+nested regions' timers, etc.). Hatchet implements various operations to analyze
 a single hierarchical data set or compare multiple data sets.
 
 .. toctree::

diff --git a/docs/userguide.rst b/docs/userguide.rst
@@ -1,19 +1,71 @@
 User Guide
 ==========
 
+Hatchet is a Python tool that simplifies the analysis of hierarchical
+performance data such as calling context trees. Hatchet uses pandas dataframes
+to store the data on each node of the hierarchy and keeps the graph
+relationships between the nodes in a separate data structure that is kept
+consistent with the dataframe.
+
 Supported Input File Formats
 ----------------------------
 
+Currently, hatchet supports two file formats as input:
+
+* `HPCToolkit <http://hpctoolkit.org/index.html>`_ database: This is generated
+  by using ``hpcprof-mpi`` to post-process the raw measurements directory
+  output by HPCToolkit.
+* Caliper `Json-split
+  <http://llnl.github.io/Caliper/OutputFormats.html#json-split>`_ file: This is
+  generated either by running ``cali-query`` on the raw Caliper data or by
+  enabling the ``mpireport`` service when running with Caliper.
 
-Graph Frame
+Graphframe
 -----------
 
+``Graphframe`` is the main data structure in hatchet. It stores the
+performance data read in from an HPCToolkit database or a Caliper Json-split
+file. The raw input data is typically a tree. However, subsequent operations
+on the tree can create new edges (so that, for example, a node ends up with
+more than one parent), which turns the tree into a graph. We therefore store
+the input data as a directed graph. A graphframe consists of a graph object,
+which stores the edge relationships between nodes, and a dataframe, which
+stores the metrics (numerical data) and categorical data associated with
+each node.
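The pairing of a graph with a dataframe can be sketched as follows. This is a minimal illustration under stated assumptions, not hatchet's implementation: the ``Node`` and ``MiniGraphFrame`` classes are hypothetical, and pandas is the only library used. It builds a three-node call tree with one dataframe row per node, then adds an edge that gives one node a second parent, turning the tree into a directed acyclic graph while the dataframe stays consistent with it.

```python
from dataclasses import dataclass, field

import pandas as pd


# Hypothetical node type. eq=False keeps identity-based hashing, so each Node
# is a distinct graph vertex and can be used as a dataframe index label.
@dataclass(eq=False)
class Node:
    name: str
    parents: list = field(default_factory=list, repr=False)
    children: list = field(default_factory=list, repr=False)

    def add_child(self, child):
        # Record the directed edge self -> child on both endpoints.
        self.children.append(child)
        child.parents.append(self)


@dataclass
class MiniGraphFrame:
    roots: list              # graph: entry points into the linked Node objects
    dataframe: pd.DataFrame  # data: one row per node, indexed by Node

    def nodes(self):
        """Return every node reachable from the roots, each exactly once."""
        seen, order, stack = set(), [], list(self.roots)
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # reached again, e.g. through a second parent
            seen.add(node)
            order.append(node)
            stack.extend(node.children)
        return order

    def is_consistent(self):
        # The graph and the dataframe must describe the same set of nodes.
        return set(self.nodes()) == set(self.dataframe.index)


main, solve, send = Node("main"), Node("solve"), Node("MPI_Send")
main.add_child(solve)
main.add_child(send)

gf = MiniGraphFrame(
    roots=[main],
    dataframe=pd.DataFrame({"time": [10.0, 7.5, 2.0]}, index=[main, solve, send]),
)
print(gf.is_consistent())  # True

# A later operation adds the edge solve -> MPI_Send: the tree becomes a DAG,
# but the dataframe still holds exactly one row per node.
solve.add_child(send)
print([p.name for p in send.parents])  # ['main', 'solve']
print(gf.is_consistent())  # True
```

Keeping edges on the node objects and metrics in the dataframe means a structural change such as the new edge above does not require touching any dataframe rows.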
+
+Graph
+^^^^^
+
+The graph can be connected or disconnected (with multiple roots), and each node
+in the graph can have multiple parents and children. Each node stores its
+callpath, a tuple of the node names on the path from the root to that node.
+The callpath is used as one of the indices in the dataframe.
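A minimal sketch of how callpaths could be recorded while the graph is built from tree-shaped input. The ``Node`` class and the ``build`` and ``walk`` helpers are hypothetical, and nested ``(name, children)`` tuples stand in for parsed input; the function names are illustrative. Two roots are built to show a disconnected graph, and each callpath is fixed when its node is created from the original tree, which makes it usable as a lookup key.

```python
class Node:
    """Hypothetical graph node that records its callpath when it is created."""

    def __init__(self, callpath, parent=None):
        self.callpath = callpath  # tuple of names from the root down to this node
        self.parents = [parent] if parent is not None else []
        self.children = []

    @property
    def name(self):
        return self.callpath[-1]


def build(spec, parent=None):
    """Build a subtree from a (name, [child specs]) pair, extending callpaths."""
    name, child_specs = spec
    prefix = parent.callpath if parent is not None else ()
    node = Node(prefix + (name,), parent)
    for child_spec in child_specs:
        node.children.append(build(child_spec, node))
    return node


def walk(node):
    """Pre-order traversal (sufficient here because the input is a tree)."""
    yield node
    for child in node.children:
        yield from walk(child)


# Two roots, so the graph is disconnected. Function names are illustrative.
roots = [
    build(("main", [("solve", [("MPI_Allreduce", [])]), ("write_output", [])])),
    build(("helper_thread", [("worker", [])])),
]

# Each node of the input tree has a unique callpath, so it can act as a key.
by_callpath = {n.callpath: n for root in roots for n in walk(root)}
for path in by_callpath:
    print(path)
# ('main',)
# ('main', 'solve')
# ('main', 'solve', 'MPI_Allreduce')
# ('main', 'write_output')
# ('helper_thread',)
# ('helper_thread', 'worker')
```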
+
+Dataframe
+^^^^^^^^^
 
-Graph Operations
-----------------
+The dataframe holds all the numerical and categorical data associated with each
+node. Since call tree data is typically collected per process, a MultiIndex
+composed of the node and the MPI rank is used to index the dataframe.
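The (node, rank) indexing can be illustrated with plain pandas. In this sketch, node names (strings) stand in for node objects, and the ``time`` column and its values are made up. Only standard pandas calls are used: selecting one node across all ranks, one rank across all nodes, a single cell, and an aggregation across ranks.

```python
import pandas as pd

# Made-up per-process measurements: one row per (node, MPI rank) pair.
# Strings stand in for node objects to keep the example short.
records = [
    ("main", 0, 10.0),
    ("main", 1, 12.0),
    ("solve", 0, 7.0),
    ("solve", 1, 9.0),
]
df = pd.DataFrame(records, columns=["node", "rank", "time"])
df = df.set_index(["node", "rank"])

solve_all_ranks = df.loc["solve"]           # one node, every rank (indexed by rank)
rank0_all_nodes = df.xs(0, level="rank")    # one rank, every node (indexed by node)
solve_rank1 = df.loc[("solve", 1), "time"]  # a single (node, rank) cell

# Aggregate over ranks to get one value per node.
mean_time = df.groupby(level="node")["time"].mean()  # main -> 11.0, solve -> 8.0
print(mean_time)
```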
 
+Graph-centric Operations
+------------------------
+
+**Prune**: The ``prune`` operation is always performed after a ``filter`` on
+the dataframe. It removes from the graph the nodes that the preceding
+``filter`` removed from the dataframe. When one or more nodes on a path are
+removed from the graph, the nearest alive ancestor is connected by an edge to
+the nearest alive descendant on that path.
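The reconnection rule can be sketched on a plain adjacency dictionary. This is a minimal illustration, not hatchet's ``prune``: the ``prune_graph`` function and its inputs are hypothetical, node names are strings, and the set of alive nodes is assumed to come from a preceding filter. For each alive node, the search walks down through removed nodes until it reaches alive ones, which become its new children; if a root is removed, its nearest alive descendants become roots, so the result may have several roots.

```python
def prune_graph(children, roots, alive):
    """Remove dead nodes and reconnect the graph around them (hypothetical).

    children: dict mapping every node (leaves included) to a list of its children
    roots:    list of root nodes
    alive:    set of nodes that survived the preceding filter
    Returns (new_children, new_roots).
    """

    def nearest_alive(node, seen):
        # Walk down through removed nodes; stop at the first alive node on each path.
        found = []
        for child in children[node]:
            if child in seen:
                continue  # already reached through another path in the DAG
            seen.add(child)
            if child in alive:
                found.append(child)
            else:
                found.extend(nearest_alive(child, seen))
        return found

    new_children = {n: nearest_alive(n, set()) for n in children if n in alive}
    new_roots, seen = [], set()
    for root in roots:
        if root in alive:
            new_roots.append(root)
        else:
            # A removed root is replaced by its nearest alive descendants.
            new_roots.extend(nearest_alive(root, seen))
    return new_children, new_roots


children = {
    "main": ["solve", "io"],
    "solve": ["loop", "MPI_Wait"],
    "loop": ["MPI_Send"],
    "MPI_Send": [],
    "MPI_Wait": [],
    "io": [],
}

# Suppose a filter kept only these nodes: main is reconnected past solve and loop.
print(prune_graph(children, ["main"], {"main", "MPI_Send", "MPI_Wait"}))
# ({'main': ['MPI_Send', 'MPI_Wait'], 'MPI_Send': [], 'MPI_Wait': []}, ['main'])

# If the root itself is removed, the result can have several roots.
print(prune_graph(children, ["main"], {"solve", "io"}))
# ({'solve': [], 'io': []}, ['solve', 'io'])
```

The ``seen`` set keeps a descendant that is reachable along two removed paths from being attached twice to the same ancestor.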
 
-Dataframe Operations
---------------------
+**Union**:
 
 
+**Diff**:
+
+
+Dataframe-centric Operations
+----------------------------
+
+**Filter**: ``filter`` takes a user-supplied function and applies it to every
+row of the dataframe. The resulting boolean Series or dataframe is used as a
+mask, so that only the rows for which the function returned True are kept. The
+returned graphframe keeps the input graph unchanged; the filtered-out nodes are
+removed from the graph only by a subsequent ``prune``.
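A row-wise filter of this kind can be sketched with standard pandas calls. ``filter_rows`` is a hypothetical stand-in for the dataframe half of ``filter`` (no graph is involved here), and the column names, values, and threshold are made up. ``DataFrame.apply`` with ``axis=1`` calls the function once per row and returns a boolean Series, which then selects the matching rows.

```python
import pandas as pd


def filter_rows(dataframe, func):
    """Keep only the rows for which func(row) is True (hypothetical helper)."""
    mask = dataframe.apply(func, axis=1)  # boolean Series aligned with the rows
    return dataframe[mask]


df = pd.DataFrame(
    {
        "name": ["main", "solve", "MPI_Send", "io"],
        "time": [10.0, 7.5, 2.0, 0.1],
    }
)

# Keep nodes whose time exceeds an illustrative threshold of 1.0.
hot = filter_rows(df, lambda row: row["time"] > 1.0)
print(hot["name"].tolist())  # ['main', 'solve', 'MPI_Send']
```

Only the dataframe shrinks; in a graphframe, the dropped node (``io`` here) would stay in the graph until a ``prune`` removes it.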
+
+**Fill**: