Define some basic terminology around loops in our documentation

I've noticed a lot of confusion around this area recently, with key terms being misused in a number of threads. To help rein that in, let's go ahead and document the current terminology and its meaning. My hope is to grow this over time into a broader discussion of canonical loop forms (yes, there is more than one; many more than one), but for the moment, simply having the key terminology is a good stopping place.

Note: I am landing this *without* an LGTM. All feedback so far has been positive, and trying to apply all of the suggested changes/extensions would cause the review to never end. Instead, I decided to land it with the obvious fixes made based on reviewer comments, then iterate from there.

Differential Revision: https://reviews.llvm.org/D65164

llvm-svn: 366960
llvm · Jul 24, 2019 · 58b4787
1 parent 728b18f
commit 58b4787

Showing 2 changed files with 113 additions and 0 deletions.
diff --git a/llvm/docs/LoopTerminology.rst b/llvm/docs/LoopTerminology.rst
@@ -0,0 +1,110 @@
+===========================================
+LLVM Loop Terminology (and Canonical Forms)
+===========================================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+Loops are a core concept in any optimizer.  This page spells out some
+of the common terminology used within LLVM code to describe loop
+structures.
+
+First, let's start with the basics.  In LLVM, a Loop is a cycle within
+the control flow graph (CFG) where there exists one block (the loop
+header block) which dominates all other blocks within the cycle.
+
+Note that there are some important implications of this definition:
+
+* Not all cycles are loops.  There exist cycles that do not meet the
+  dominance requirement, and these are not considered loops.  LoopInfo
+  does not include such cycles.
+
+* Loops can contain non-loop cycles and non-loop cycles may contain
+  loops.  Loops may also contain sub-loops.
+
+* Given the use of dominance in the definition, all loops are
+  statically reachable from the entry of the function.  Loops which
+  become statically unreachable during optimization *must* be removed
+  from LoopInfo. 
+
+* Every loop must have a header block, and some set of predecessors
+  outside the loop.  A loop is allowed to be statically infinite, so
+  there need not be any exiting edges.
+
+* Any two loops are either fully disjoint (no intersecting blocks), or
+  one must be a sub-loop of the other.
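
To make the dominance requirement above concrete, here is a minimal, self-contained C++ sketch. It is not LLVM code: `CFG`, `computeDominators`, and `findHeader` are hypothetical names introduced for illustration, the sketch assumes every block is reachable from the entry, and it uses a simple quadratic dominator computation rather than whatever LLVM's DominatorTree does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A tiny CFG model (an assumption of this sketch, not an LLVM type):
// blocks are numbered 0..N-1 and succs[b] lists the successors of block b.
using CFG = std::vector<std::vector<int>>;

// dom[b][d] is true when block d dominates block b.  Uses the classic
// iterative formulation:
//   Dom(entry) = {entry}
//   Dom(b)     = {b} plus the intersection of Dom(p) over all predecessors p
// Assumes every block is reachable from `entry`.
std::vector<std::vector<bool>> computeDominators(const CFG &succs, int entry) {
  const std::size_t n = succs.size();
  assert(entry >= 0 && static_cast<std::size_t>(entry) < n);

  std::vector<std::vector<int>> preds(n);
  for (std::size_t b = 0; b < n; ++b)
    for (int s : succs[b])
      preds[s].push_back(static_cast<int>(b));

  std::vector<std::vector<bool>> dom(n, std::vector<bool>(n, true));
  dom[entry].assign(n, false);
  dom[entry][entry] = true;

  for (bool changed = true; changed;) {
    changed = false;
    for (std::size_t b = 0; b < n; ++b) {
      if (static_cast<int>(b) == entry)
        continue;
      std::vector<bool> next(n, true);
      for (int p : preds[b])
        for (std::size_t d = 0; d < n; ++d)
          next[d] = next[d] && dom[p][d];
      next[b] = true; // every block dominates itself
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}

// Returns the block of `cycle` that dominates every block in the cycle,
// or -1 if no such block exists (the cycle is then not a loop).
int findHeader(const CFG &succs, int entry, const std::vector<int> &cycle) {
  const auto dom = computeDominators(succs, entry);
  for (int h : cycle) {
    bool dominatesAll = true;
    for (int b : cycle)
      dominatesAll = dominatesAll && dom[b][h];
    if (dominatesAll)
      return h;
  }
  return -1;
}
```

For the CFG `0 -> 1 -> 2 -> 1`, the cycle `{1, 2}` has header 1. If the entry can jump to either block of the cycle (`0 -> 1`, `0 -> 2`, `1 -> 2`, `2 -> 1`), neither block dominates the other, so `findHeader` returns -1 and the cycle is not a loop.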
+
+A loop may have an arbitrary number of exits, both explicit (via
+control flow) and implicit (via throwing calls which transfer control
+out of the containing function).  There is no special requirement on
+the form or structure of exit blocks (the blocks outside the loop which
+are branched to).  They may have multiple predecessors, phis, etc.
+
+Key Terminology
+===============
+
+Header Block - The basic block which dominates all other blocks
+contained within the loop.  As such, it is the first one executed if
+the loop executes at all.  Note that a block can be the header of
+two separate loops at the same time, but only if one is a sub-loop
+of the other.
+
+Exiting Block - A basic block contained within a given loop which has
+at least one successor outside of the loop and at least one successor
+inside the loop.  (The latter is required for the block to be contained
+within the cycle which makes up the loop.)  That is, it has a successor
+which is an Exit Block.
+
+Exit Block - A basic block outside of the associated loop which has a
+predecessor inside the loop.  That is, it has a predecessor which is
+an Exiting Block.
+
+Latch Block - A basic block within the loop whose successors include
+the header block of the loop.  Thus, a latch is the source of a backedge.
+A loop may have multiple latch blocks.  A latch block may end in either
+a conditional or an unconditional branch.
+
+Backedge(s) - The edge(s) in the CFG from latch blocks to the header
+block.  Note that there can be multiple such edges, and even multiple
+such edges leaving a single latch block.  
+
+Loop Predecessor - The predecessor blocks of the loop header which
+are not contained by the loop itself.  These are the only blocks
+through which execution can enter the loop.  When used in the singular
+form, the term implies that there is exactly one such block.
+
+Preheader Block - A preheader is a (singular) loop predecessor which
+ends in an unconditional transfer of control to the loop header.  Note
+that not all loops have such blocks.
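
The block and edge terms defined so far can be sketched over a toy CFG. This is an illustrative, self-contained C++ model, not the LoopInfo API: `CFG`, `LoopShape`, and `describeLoop` are hypothetical names, the loop's block set and header are assumed to be given, and "ends in an unconditional transfer of control" is approximated as "has exactly one successor edge".

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy CFG (an assumption of this sketch): succs[b] lists the successor
// blocks of block b; a block may list the same successor more than once.
using CFG = std::vector<std::vector<int>>;

struct LoopShape {
  std::set<int> exiting;      // in-loop blocks with a successor outside the loop
  std::set<int> exits;        // out-of-loop blocks with a predecessor inside it
  std::set<int> latches;      // in-loop blocks with the header as a successor
  std::set<int> predecessors; // out-of-loop predecessors of the header
  int backedges = 0;          // number of latch -> header edges
  int preheader = -1;         // -1 when the loop has no preheader
};

LoopShape describeLoop(const CFG &succs, const std::set<int> &loop, int header) {
  assert(loop.count(header) == 1 && "header must belong to the loop");
  LoopShape shape;
  for (int b = 0; b < static_cast<int>(succs.size()); ++b) {
    const bool inLoop = loop.count(b) != 0;
    for (int s : succs[b]) {
      const bool succInLoop = loop.count(s) != 0;
      if (inLoop && !succInLoop) {
        shape.exiting.insert(b);
        shape.exits.insert(s);
      }
      if (inLoop && s == header) {
        shape.latches.insert(b);
        ++shape.backedges; // one latch may contribute several backedges
      }
      if (!inLoop && s == header)
        shape.predecessors.insert(b);
    }
  }
  // A preheader is the single loop predecessor, provided its only
  // successor edge goes to the header.
  if (shape.predecessors.size() == 1) {
    const int p = *shape.predecessors.begin();
    if (succs[p].size() == 1)
      shape.preheader = p;
  }
  return shape;
}
```

With `0 -> 1`, `1 -> {2, 3}`, `2 -> 1` and the loop `{1, 2}` headed by block 1, block 1 is the exiting block, 3 the exit block, 2 the latch, and 0 both the loop predecessor and the preheader. If block 0 also branched to block 3, it would still be the loop predecessor but no longer a preheader.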
+
+Backedge Taken Count - The number of times the backedge will have
+executed before some interesting event happens.  Commonly used without
+qualification of the event as shorthand for the count at which some
+exiting block branches to some exit block.  May be zero, or not
+statically computable.
+
+Iteration Count - The number of times the header has executed before
+some interesting event happens.  Commonly used without qualification to
+refer to the iteration count at which the loop exits.  Will always be
+one greater than the backedge taken count.  (Warning: The preceding
+statement is true in the *integer domain*; if you're dealing with
+fixed-width integers (such as LLVM Values or SCEVs), you need to be
+cautious of overflow when converting one to the other.)
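
The relationship between the two counts, and the overflow hazard, can be illustrated with a small standalone C++ sketch. The helper names (`runLoop`, `naiveTripCount`, `wideTripCount`) are hypothetical, and the 8-bit width is chosen only to make the wraparound easy to see; this does not use LLVM's ScalarEvolution.

```cpp
#include <cassert>
#include <cstdint>

// Simulates a loop whose backedge is taken `btc` times and returns how
// many times the header executed.
unsigned runLoop(unsigned btc) {
  unsigned headerExecutions = 0;
  unsigned backedgesTaken = 0;
  for (;;) {
    ++headerExecutions;        // the header runs on every iteration
    if (backedgesTaken == btc) // the exiting block leaves the loop
      break;
    ++backedgesTaken;          // the latch branches back to the header
  }
  // Iteration count is one greater than the backedge taken count.
  assert(headerExecutions == backedgesTaken + 1);
  return headerExecutions;
}

// Adding one in the same fixed width wraps: 255 + 1 becomes 0 in 8 bits.
std::uint8_t naiveTripCount(std::uint8_t btc) {
  return static_cast<std::uint8_t>(btc + 1);
}

// Widening before adding one keeps the result exact.
std::uint16_t wideTripCount(std::uint8_t btc) {
  return static_cast<std::uint16_t>(btc + 1);
}
```

A backedge taken count of 255 stored in 8 bits corresponds to 256 iterations, which `wideTripCount` reports correctly while `naiveTripCount` yields 0.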
+
+Loop Simplify Form
+==================
+
+TBD
+
+
+Loop Closed SSA (LCSSA)
+=======================
+
+TBD
+
+"More Canonical" Loops
+======================
+
+TBD
diff --git a/llvm/include/llvm/Analysis/LoopInfo.h b/llvm/include/llvm/Analysis/LoopInfo.h
@@ -30,6 +30,9 @@
 // instance.  In particular, a Loop might be inside such a non-loop SCC, or a
 // non-loop SCC might contain a sub-SCC which is a Loop.
 //
+// For an overview of terminology used in this API (and thus all of our loop
+// analyses or transforms), see docs/LoopTerminology.rst.
+//
 //===----------------------------------------------------------------------===//
 
 #ifndef LLVM_ANALYSIS_LOOPINFO_H