Skip to content

Commit

Permalink
[clang-tidy] new check: bugprone-branch-clone
Browse files Browse the repository at this point in the history
Implement a check for detecting if/else if/else chains where two or more
branches are Type I clones of each other (that is, they contain identical code)
and for detecting switch statements where two or more consecutive branches are
Type I clones of each other.

Patch by Donát Nagy!

Differential Revision: https://reviews.llvm.org/D54757

llvm-svn: 360779
  • Loading branch information
Kristof Umann committed May 15, 2019
1 parent 29257eb commit 7f7dd09
Show file tree
Hide file tree
Showing 8 changed files with 1,393 additions and 0 deletions.
226 changes: 226 additions & 0 deletions clang-tools-extra/clang-tidy/bugprone/BranchCloneCheck.cpp
@@ -0,0 +1,226 @@
//===--- BranchCloneCheck.cpp - clang-tidy --------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#include "BranchCloneCheck.h"
#include "clang/AST/ASTContext.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/Analysis/CloneDetection.h"
#include "llvm/Support/Casting.h"

using namespace clang;
using namespace clang::ast_matchers;

/// Returns true when the statements are Type I clones of each other.
static bool areStatementsIdentical(const Stmt *LHS, const Stmt *RHS,
const ASTContext &Context) {
llvm::FoldingSetNodeID DataLHS, DataRHS;
LHS->Profile(DataLHS, Context, false);
RHS->Profile(DataRHS, Context, false);
return (DataLHS == DataRHS);
}

namespace {
/// A branch in a switch may consist of several statements; while a branch in
/// an if/else if/else chain is one statement (which may be a CompoundStmt).
using SwitchBranch = llvm::SmallVector<const Stmt *, 2>;
} // anonymous namespace

/// Determines if the bodies of two branches in a switch statements are Type I
/// clones of each other. This function only examines the body of the branch
/// and ignores the `case X:` or `default:` at the start of the branch.
static bool areSwitchBranchesIdentical(const SwitchBranch LHS,
const SwitchBranch RHS,
const ASTContext &Context) {
if (LHS.size() != RHS.size())
return false;

for (size_t i = 0, Size = LHS.size(); i < Size; i++) {
// NOTE: We strip goto labels and annotations in addition to stripping
// the `case X:` or `default:` labels, but it is very unlikely that this
// would casue false positives in real-world code.
if (!areStatementsIdentical(LHS[i]->stripLabelLikeStatements(),
RHS[i]->stripLabelLikeStatements(), Context)) {
return false;
}
}

return true;
}

namespace clang {
namespace tidy {
namespace bugprone {

void BranchCloneCheck::registerMatchers(MatchFinder *Finder) {
Finder->addMatcher(
ifStmt(stmt().bind("if"),
hasParent(stmt(unless(ifStmt(hasElse(equalsBoundNode("if")))))),
hasElse(stmt().bind("else"))),
this);
Finder->addMatcher(switchStmt().bind("switch"), this);
Finder->addMatcher(conditionalOperator().bind("condOp"), this);
}

void BranchCloneCheck::check(const MatchFinder::MatchResult &Result) {
const ASTContext &Context = *Result.Context;

if (const auto *IS = Result.Nodes.getNodeAs<IfStmt>("if")) {
const Stmt *Then = IS->getThen();
assert(Then && "An IfStmt must have a `then` branch!");

const Stmt *Else = Result.Nodes.getNodeAs<Stmt>("else");
assert(Else && "We only look for `if` statements with an `else` branch!");

if (!isa<IfStmt>(Else)) {
// Just a simple if with no `else if` branch.
if (areStatementsIdentical(Then->IgnoreContainers(),
Else->IgnoreContainers(), Context)) {
diag(IS->getBeginLoc(), "if with identical then and else branches");
diag(IS->getElseLoc(), "else branch starts here", DiagnosticIDs::Note);
}
return;
}

// This is the complicated case when we start an if/else if/else chain.
// To find all the duplicates, we collect all the branches into a vector.
llvm::SmallVector<const Stmt *, 4> Branches;
const IfStmt *Cur = IS;
while (true) {
// Store the `then` branch.
Branches.push_back(Cur->getThen());

Else = Cur->getElse();
// The chain ends if there is no `else` branch.
if (!Else)
break;

// Check if there is another `else if`...
Cur = dyn_cast<IfStmt>(Else);
if (!Cur) {
// ...this is just a plain `else` branch at the end of the chain.
Branches.push_back(Else);
break;
}
}

size_t N = Branches.size();
llvm::BitVector KnownAsClone(N);

for (size_t i = 0; i + 1 < N; i++) {
// We have already seen Branches[i] as a clone of an earlier branch.
if (KnownAsClone[i])
continue;

int NumCopies = 1;

for (size_t j = i + 1; j < N; j++) {
if (KnownAsClone[j] ||
!areStatementsIdentical(Branches[i]->IgnoreContainers(),
Branches[j]->IgnoreContainers(), Context))
continue;

NumCopies++;
KnownAsClone[j] = true;

if (NumCopies == 2) {
// We report the first occurence only when we find the second one.
diag(Branches[i]->getBeginLoc(),
"repeated branch in conditional chain");
diag(Lexer::getLocForEndOfToken(Branches[i]->getEndLoc(), 0,
*Result.SourceManager, getLangOpts()),
"end of the original", DiagnosticIDs::Note);
}

diag(Branches[j]->getBeginLoc(), "clone %0 starts here",
DiagnosticIDs::Note)
<< (NumCopies - 1);
}
}
return;
}

if (const auto *CO = Result.Nodes.getNodeAs<ConditionalOperator>("condOp")) {
// We do not try to detect chains of ?: operators.
if (areStatementsIdentical(CO->getTrueExpr(), CO->getFalseExpr(), Context))
diag(CO->getQuestionLoc(),
"conditional operator with identical true and false expressions");

return;
}

if (const auto *SS = Result.Nodes.getNodeAs<SwitchStmt>("switch")) {
const CompoundStmt *Body = dyn_cast_or_null<CompoundStmt>(SS->getBody());

// Code like
// switch (x) case 0: case 1: foobar();
// is legal and calls foobar() if and only if x is either 0 or 1;
// but we do not try to distinguish branches in such code.
if (!Body)
return;

// We will first collect the branches of the switch statements. For the
// sake of simplicity we say that branches are delimited by the SwitchCase
// (`case:` or `default:`) children of Body; that is, we ignore `case:` or
// `default:` labels embedded inside other statements and we do not follow
// the effects of `break` and other manipulation of the control-flow.
llvm::SmallVector<SwitchBranch, 4> Branches;
for (const Stmt *S : Body->body()) {
// If this is a `case` or `default`, we start a new, empty branch.
if (isa<SwitchCase>(S))
Branches.emplace_back();

// There may be code before the first branch (which can be dead code
// and can be code reached either through goto or through case labels
// that are embedded inside e.g. inner compound statements); we do not
// store those statements in branches.
if (!Branches.empty())
Branches.back().push_back(S);
}

auto End = Branches.end();
auto BeginCurrent = Branches.begin();
while (BeginCurrent < End) {
auto EndCurrent = BeginCurrent + 1;
while (EndCurrent < End &&
areSwitchBranchesIdentical(*BeginCurrent, *EndCurrent, Context)) {
++EndCurrent;
}
// At this point the iterator range {BeginCurrent, EndCurrent} contains a
// complete family of consecutive identical branches.
if (EndCurrent > BeginCurrent + 1) {
diag(BeginCurrent->front()->getBeginLoc(),
"switch has %0 consecutive identical branches")
<< static_cast<int>(std::distance(BeginCurrent, EndCurrent));

SourceLocation EndLoc = (EndCurrent - 1)->back()->getEndLoc();
// If the case statement is generated from a macro, it's SourceLocation
// may be invalid, resuling in an assertation failure down the line.
// While not optimal, try the begin location in this case, it's still
// better then nothing.
if (EndLoc.isInvalid())
EndLoc = (EndCurrent - 1)->back()->getBeginLoc();

if (EndLoc.isMacroID())
EndLoc = Context.getSourceManager().getExpansionLoc(EndLoc);

diag(Lexer::getLocForEndOfToken(EndLoc, 0, *Result.SourceManager,
getLangOpts()),
"last of these clones ends here", DiagnosticIDs::Note);
}
BeginCurrent = EndCurrent;
}
return;
}

llvm_unreachable("No if statement and no switch statement.");
}

} // namespace bugprone
} // namespace tidy
} // namespace clang
39 changes: 39 additions & 0 deletions clang-tools-extra/clang-tidy/bugprone/BranchCloneCheck.h
@@ -0,0 +1,39 @@
//===--- BranchCloneCheck.h - clang-tidy ------------------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_BUGPRONE_BRANCHCLONECHECK_H
#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_BUGPRONE_BRANCHCLONECHECK_H

#include "../ClangTidy.h"

namespace clang {
namespace tidy {
namespace bugprone {

/// A check for detecting if/else if/else chains where two or more branches are
/// Type I clones of each other (that is, they contain identical code), for
/// detecting switch statements where two or more consecutive branches are
/// Type I clones of each other, and for detecting conditional operators where
/// the true and false expressions are Type I clones of each other.
///
/// For the user-facing documentation see:
/// http://clang.llvm.org/extra/clang-tidy/checks/bugprone-branch-clone.html
class BranchCloneCheck : public ClangTidyCheck {
public:
BranchCloneCheck(StringRef Name, ClangTidyContext *Context)
: ClangTidyCheck(Name, Context) {}
void registerMatchers(ast_matchers::MatchFinder *Finder) override;
void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
};

} // namespace bugprone
} // namespace tidy
} // namespace clang

#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_BUGPRONE_BRANCHCLONECHECK_H
3 changes: 3 additions & 0 deletions clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
Expand Up @@ -13,6 +13,7 @@
#include "ArgumentCommentCheck.h"
#include "AssertSideEffectCheck.h"
#include "BoolPointerImplicitConversionCheck.h"
#include "BranchCloneCheck.h"
#include "CopyConstructorInitCheck.h"
#include "DanglingHandleCheck.h"
#include "ExceptionEscapeCheck.h"
Expand Down Expand Up @@ -65,6 +66,8 @@ class BugproneModule : public ClangTidyModule {
"bugprone-assert-side-effect");
CheckFactories.registerCheck<BoolPointerImplicitConversionCheck>(
"bugprone-bool-pointer-implicit-conversion");
CheckFactories.registerCheck<BranchCloneCheck>(
"bugprone-branch-clone");
CheckFactories.registerCheck<CopyConstructorInitCheck>(
"bugprone-copy-constructor-init");
CheckFactories.registerCheck<DanglingHandleCheck>(
Expand Down
1 change: 1 addition & 0 deletions clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
Expand Up @@ -4,6 +4,7 @@ add_clang_library(clangTidyBugproneModule
ArgumentCommentCheck.cpp
AssertSideEffectCheck.cpp
BoolPointerImplicitConversionCheck.cpp
BranchCloneCheck.cpp
BugproneTidyModule.cpp
CopyConstructorInitCheck.cpp
DanglingHandleCheck.cpp
Expand Down
7 changes: 7 additions & 0 deletions clang-tools-extra/docs/ReleaseNotes.rst
Expand Up @@ -108,6 +108,13 @@ Improvements to clang-tidy
against self-assignment either by checking self-assignment explicitly or
using the copy-and-swap or the copy-and-move method.

- New :doc:`bugprone-branch-clone
<clang-tidy/checks/bugprone-branch-clone>` check.

Checks for repeated branches in ``if/else if/else`` chains, consecutive
repeated branches in ``switch`` statements and indentical true and false
branches in conditional operators.

- New :doc:`google-readability-avoid-underscore-in-googletest-name
<clang-tidy/checks/google-readability-avoid-underscore-in-googletest-name>`
check.
Expand Down
90 changes: 90 additions & 0 deletions clang-tools-extra/docs/clang-tidy/checks/bugprone-branch-clone.rst
@@ -0,0 +1,90 @@
.. title:: clang-tidy - bugprone-branch-clone

bugprone-branch-clone
=====================

Checks for repeated branches in ``if/else if/else`` chains, consecutive
repeated branches in ``switch`` statements and indentical true and false
branches in conditional operators.

.. code-block:: c++

if (test_value(x)) {
y++;
do_something(x, y);
} else {
y++;
do_something(x, y);
}

In this simple example (which could arise e.g. as a copy-paste error) the
``then`` and ``else`` branches are identical and the code is equivalent the
following shorter and cleaner code:

.. code-block:: c++

test_value(x); // can be omitted unless it has side effects
y++;
do_something(x, y);


If this is the inteded behavior, then there is no reason to use a conditional
statement; otherwise the issue can be solved by fixing the branch that is
handled incorrectly.

The check also detects repeated branches in longer ``if/else if/else`` chains
where it would be even harder to notice the problem.

In ``switch`` statements the check only reports repeated branches when they are
consecutive, because it is relatively common that the ``case:`` labels have
some natural ordering and rearranging them would decrease the readability of
the code. For example:

.. code-block:: c++

switch (ch) {
case 'a':
return 10;
case 'A':
return 10;
case 'b':
return 11;
case 'B':
return 11;
default:
return 10;
}

Here the check reports that the ``'a'`` and ``'A'`` branches are identical
(and that the ``'b'`` and ``'B'`` branches are also identical), but does not
report that the ``default:`` branch is also idenical to the first two branches.
If this is indeed the correct behavior, then it could be implemented as:

.. code-block:: c++

switch (ch) {
case 'a':
case 'A':
return 10;
case 'b':
case 'B':
return 11;
default:
return 10;
}

Here the check does not warn for the repeated ``return 10;``, which is good if
we want to preserve that ``'a'`` is before ``'b'`` and ``default:`` is the last
branch.

Finally, the check also examines conditional operators and reports code like:

.. code-block:: c++

return test_value(x) ? x : x;

Unlike if statements, the check does not detect chains of conditional
operators.

Note: This check also reports situations where branches become identical only
after preprocession.

0 comments on commit 7f7dd09

Please sign in to comment.