[FileCheck] Add precision to format specifier

Add a printf-style precision specifier to pad numbers to a given number of digits when matching them, if the value has fewer digits than the given precision. This works both for an empty numeric expression (e.g. a variable definition capturing a value from the input) and when matching a numeric expression. The syntax is as follows:

[[#%.<precision><format specifier>, ...]]

where <format specifier> is optional, and ... may or may not contain a variable definition and may or may not contain an expression.

In the absence of a precision specifier, a variable definition will accept leading zeros.

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D81667
llvm · Aug 30, 2020 · 998709b
1 parent 719548d

Showing 5 changed files with 329 additions and 116 deletions.
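The format-specifier grammar described in the commit message (`%`, an optional `.<precision>`, an optional conversion specifier) can be illustrated with a small stand-alone parser. This is a simplified sketch, not FileCheck's code: `FormatSpec`, `parseFormatSpec`, `trimSpaces` and `isDigit` are hypothetical names, the input is assumed to be only the text before the first `,`, and each error is reduced to `std::nullopt` where FileCheck would emit the diagnostic quoted in the comment.

```cpp
#include <cassert>
#include <cctype>
#include <climits>
#include <optional>
#include <string_view>

// Hypothetical, simplified model of the "%<fmtspec>" part of a numeric
// substitution block. Conv is '\0' when no conversion specifier is given.
struct FormatSpec {
  char Conv = '\0';
  unsigned Precision = 0;
};

// Strips spaces and tabs from both ends (assumed stand-in for SpaceChars).
std::string_view trimSpaces(std::string_view S) {
  while (!S.empty() && (S.front() == ' ' || S.front() == '\t'))
    S.remove_prefix(1);
  while (!S.empty() && (S.back() == ' ' || S.back() == '\t'))
    S.remove_suffix(1);
  return S;
}

bool isDigit(char C) { return std::isdigit(static_cast<unsigned char>(C)) != 0; }

// Parses the text before the first ',' (e.g. "%.8X" in "[[#%.8X,ADDR:]]").
std::optional<FormatSpec> parseFormatSpec(std::string_view FormatExpr) {
  FormatSpec Spec;
  FormatExpr = trimSpaces(FormatExpr);
  if (FormatExpr.empty() || FormatExpr.front() != '%')
    return std::nullopt; // invalid matching format specification
  FormatExpr.remove_prefix(1);

  // Optional ".<precision>": the dot must be followed by decimal digits.
  if (!FormatExpr.empty() && FormatExpr.front() == '.') {
    FormatExpr.remove_prefix(1);
    if (FormatExpr.empty() || !isDigit(FormatExpr.front()))
      return std::nullopt; // invalid precision in format specifier
    unsigned long long Value = 0;
    while (!FormatExpr.empty() && isDigit(FormatExpr.front())) {
      Value = Value * 10 + static_cast<unsigned>(FormatExpr.front() - '0');
      if (Value > UINT_MAX)
        return std::nullopt; // precision does not fit in an unsigned
      FormatExpr.remove_prefix(1);
    }
    Spec.Precision = static_cast<unsigned>(Value);
  }

  // Optional conversion specifier; only u, d, x and X are accepted.
  if (!FormatExpr.empty()) {
    char C = FormatExpr.front();
    if (C != 'u' && C != 'd' && C != 'x' && C != 'X')
      return std::nullopt; // invalid format specifier in expression
    Spec.Conv = C;
    FormatExpr.remove_prefix(1);
  }

  // Anything left over (the input was already trimmed) is an error.
  if (!FormatExpr.empty())
    return std::nullopt; // invalid matching format specification
  return Spec;
}
```

For instance, `"%.8X"` yields conversion `X` with precision 8, while `"%.3"` yields precision 3 with no conversion specifier, leaving the format to be inferred or defaulted to `%u`.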
diff --git a/llvm/docs/CommandGuide/FileCheck.rst b/llvm/docs/CommandGuide/FileCheck.rst
@@ -730,35 +730,60 @@ numeric expression constraint based on those variables via a numeric
 substitution. This allows ``CHECK:`` directives to verify a numeric relation
 between two numbers, such as the need for consecutive registers to be used.
 
-The syntax to define a numeric variable is ``[[#%<fmtspec>,<NUMVAR>:]]`` where:
+The syntax to capture a numeric value is
+``[[#%<fmtspec>,<NUMVAR>:]]`` where:
 
-* ``%<fmtspec>`` is an optional scanf-style matching format specifier to
-  indicate what number format to match (e.g. hex number).  Currently accepted
-  format specifiers are ``%u``, ``%d``, ``%x`` and ``%X``.  If absent, the
-  format specifier defaults to ``%u``.
+* ``%<fmtspec>,`` is an optional format specifier to indicate what number
+  format to match and the minimum number of digits to expect.
+
+* ``<NUMVAR>:`` is an optional definition of variable ``<NUMVAR>`` from the
+  captured value.
+
+The syntax of ``<fmtspec>`` is: ``.<precision><conversion specifier>`` where:
+
+* ``.<precision>`` is an optional printf-style precision specifier in which
+ ``<precision>`` indicates the minimum number of digits that the value matched
+ must have, expecting leading zeros if needed.
+
+*  ``<conversion specifier>`` is an optional scanf-style conversion specifier
+  to indicate what number format to match (e.g. hex number).  Currently
+  accepted format specifiers are ``%u``, ``%d``, ``%x`` and ``%X``.  If absent,
+  the format specifier defaults to ``%u``.
 
-* ``<NUMVAR>`` is the name of the numeric variable to define to the matching
-  value.
 
 For example:
 
 .. code-block:: llvm
 
-    ; CHECK: mov r[[#REG:]], 0x[[#%X,IMM:]]
+    ; CHECK: mov r[[#REG:]], 0x[[#%.8X,ADDR:]]
 
-would match ``mov r5, 0xF0F0`` and set ``REG`` to the value ``5`` and ``IMM``
-to the value ``0xF0F0``.
+would match ``mov r5, 0x0000FEFE`` and set ``REG`` to the value ``5`` and
+``ADDR`` to the value ``0xFEFE``. Note that due to the precision it would fail
+to match ``mov r5, 0xFEFE``.
 
-The syntax of a numeric substitution is
-``[[#%<fmtspec>: <constraint> <expr>]]`` where:
+As a result of the numeric variable definition being optional, it is possible
+to only check that a numeric value is present in a given format. This can be
+useful when the value itself is not useful, for instance:
 
-* ``%<fmtspec>`` is the same matching format specifier as for defining numeric
-  variables but acting as a printf-style format to indicate how a numeric
-  expression value should be matched against.  If absent, the format specifier
-  is inferred from the matching format of the numeric variable(s) used by the
-  expression constraint if any, and defaults to ``%u`` if no numeric variable
-  is used.  In case of conflict between matching formats of several numeric
-  variables the format specifier is mandatory.
+.. code-block:: gas
+
+    ; CHECK-NOT: mov r0, r[[#]]
+
+to check that a value is synthesized rather than moved around.
+
+
+The syntax of a numeric substitution is
+``[[#%<fmtspec>, <constraint> <expr>]]`` where:
+
+* ``<fmtspec>`` is the same format specifier as for defining a variable but
+  in this context indicating how a numeric expression value should be matched
+  against. If absent, both components of the format specifier are inferred from
+  the matching format of the numeric variable(s) used by the expression
+  constraint if any, and defaults to ``%u`` if no numeric variable is used,
+  denoting that the value should be unsigned with no leading zeros. In case of
+  conflict between format specifiers of several numeric variables, the
+  conversion specifier becomes mandatory but the precision specifier remains
+  optional.
 
 * ``<constraint>`` is the constraint describing how the value to match must
   relate to the value of the numeric expression. The only currently accepted
@@ -824,20 +849,11 @@ but would not match the text:
 Due to ``7`` being unequal to ``5 + 1`` and ``a0463443`` being unequal to
 ``a0463440 + 7``.
 
-The syntax also supports an empty expression, equivalent to writing {{[0-9]+}},
-for cases where the input must contain a numeric value but the value itself
-does not matter:
-
-.. code-block:: gas
-
-    ; CHECK-NOT: mov r0, r[[#]]
-
-to check that a value is synthesized rather than moved around.
 
 A numeric variable can also be defined to the result of a numeric expression,
 in which case the numeric expression constraint is checked and if verified the
-variable is assigned to the value. The unified syntax for both defining numeric
-variables and checking a numeric expression is thus
+variable is assigned to the value. The unified syntax for both checking a
+numeric expression and capturing its value into a numeric variable is thus
 ``[[#%<fmtspec>,<NUMVAR>: <constraint> <expr>]]`` with each element as
 described previously. One can use this syntax to make a testcase more
 self-describing by using variables instead of values:

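Why `%.8X` accepts `0000FEFE` but rejects `FEFE`, and why a definition without a precision accepts leading zeros, follows from the regular expressions built by the new `getWildcardRegex()`. The sketch below reproduces those strings outside FileCheck under stated assumptions: `wildcardRegex` and `matchesWildcard` are hypothetical helpers, and `std::regex` in POSIX extended mode stands in for FileCheck's own regex engine.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical mirror of the regexes built by the new getWildcardRegex().
// Conv is 'u', 'd', 'x' or 'X'; Precision 0 means "no precision given".
std::string wildcardRegex(char Conv, unsigned Precision) {
  std::string Digit = "[0-9]", NonZeroDigit = "[1-9]";
  if (Conv == 'x') {
    Digit = "[0-9a-f]";
    NonZeroDigit = "[1-9a-f]";
  } else if (Conv == 'X') {
    Digit = "[0-9A-F]";
    NonZeroDigit = "[1-9A-F]";
  }
  std::string Sign = Conv == 'd' ? "-?" : "";
  if (Precision == 0)
    return Sign + Digit + "+"; // any digit string, leading zeros included
  // Exactly Precision trailing digits, optionally preceded by more digits
  // that start with a non-zero digit.
  return Sign + "(" + NonZeroDigit + Digit + "*)?" + Digit + "{" +
         std::to_string(Precision) + "}";
}

// Whole-string match in POSIX extended syntax. FileCheck searches within a
// line instead, so this only approximates what a CHECK directive matches.
bool matchesWildcard(const std::string &Text, char Conv, unsigned Precision) {
  std::regex Re(wildcardRegex(Conv, Precision), std::regex::extended);
  return std::regex_match(Text, Re);
}
```

With a precision of 8, `matchesWildcard("0000FEFE", 'X', 8)` holds and `matchesWildcard("FEFE", 'X', 8)` does not, while with no precision both strings match.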
diff --git a/llvm/lib/Support/FileCheck.cpp b/llvm/lib/Support/FileCheck.cpp
@@ -43,16 +43,28 @@ StringRef ExpressionFormat::toString() const {
   llvm_unreachable("unknown expression format");
 }
 
-Expected<StringRef> ExpressionFormat::getWildcardRegex() const {
+Expected<std::string> ExpressionFormat::getWildcardRegex() const {
+  auto CreatePrecisionRegex = [this](StringRef S) {
+    return (S + Twine('{') + Twine(Precision) + "}").str();
+  };
+
   switch (Value) {
   case Kind::Unsigned:
-    return StringRef("[0-9]+");
+    if (Precision)
+      return CreatePrecisionRegex("([1-9][0-9]*)?[0-9]");
+    return std::string("[0-9]+");
   case Kind::Signed:
-    return StringRef("-?[0-9]+");
+    if (Precision)
+      return CreatePrecisionRegex("-?([1-9][0-9]*)?[0-9]");
+    return std::string("-?[0-9]+");
   case Kind::HexUpper:
-    return StringRef("[0-9A-F]+");
+    if (Precision)
+      return CreatePrecisionRegex("([1-9A-F][0-9A-F]*)?[0-9A-F]");
+    return std::string("[0-9A-F]+");
   case Kind::HexLower:
-    return StringRef("[0-9a-f]+");
+    if (Precision)
+      return CreatePrecisionRegex("([1-9a-f][0-9a-f]*)?[0-9a-f]");
+    return std::string("[0-9a-f]+");
   default:
     return createStringError(std::errc::invalid_argument,
                              "trying to match value with invalid format");
@@ -61,27 +73,47 @@ Expected<StringRef> ExpressionFormat::getWildcardRegex() const {
 
 Expected<std::string>
 ExpressionFormat::getMatchingString(ExpressionValue IntegerValue) const {
+  uint64_t AbsoluteValue;
+  StringRef SignPrefix = IntegerValue.isNegative() ? "-" : "";
+
   if (Value == Kind::Signed) {
     Expected<int64_t> SignedValue = IntegerValue.getSignedValue();
     if (!SignedValue)
       return SignedValue.takeError();
-    return itostr(*SignedValue);
+    if (*SignedValue < 0)
+      AbsoluteValue = cantFail(IntegerValue.getAbsolute().getUnsignedValue());
+    else
+      AbsoluteValue = *SignedValue;
+  } else {
+    Expected<uint64_t> UnsignedValue = IntegerValue.getUnsignedValue();
+    if (!UnsignedValue)
+      return UnsignedValue.takeError();
+    AbsoluteValue = *UnsignedValue;
   }
 
-  Expected<uint64_t> UnsignedValue = IntegerValue.getUnsignedValue();
-  if (!UnsignedValue)
-    return UnsignedValue.takeError();
+  std::string AbsoluteValueStr;
   switch (Value) {
   case Kind::Unsigned:
-    return utostr(*UnsignedValue);
+  case Kind::Signed:
+    AbsoluteValueStr = utostr(AbsoluteValue);
+    break;
   case Kind::HexUpper:
-    return utohexstr(*UnsignedValue, /*LowerCase=*/false);
   case Kind::HexLower:
-    return utohexstr(*UnsignedValue, /*LowerCase=*/true);
+    AbsoluteValueStr = utohexstr(AbsoluteValue, Value == Kind::HexLower);
+    break;
   default:
     return createStringError(std::errc::invalid_argument,
                              "trying to match value with invalid format");
   }
+
+  if (Precision > AbsoluteValueStr.size()) {
+    unsigned LeadingZeros = Precision - AbsoluteValueStr.size();
+    return (Twine(SignPrefix) + std::string(LeadingZeros, '0') +
+            AbsoluteValueStr)
+        .str();
+  }
+
+  return (Twine(SignPrefix) + AbsoluteValueStr).str();
 }
 
 Expected<ExpressionValue>
@@ -720,41 +752,59 @@ Expected<std::unique_ptr<Expression>> Pattern::parseNumericSubstitutionBlock(
   StringRef DefExpr = StringRef();
   DefinedNumericVariable = None;
   ExpressionFormat ExplicitFormat = ExpressionFormat();
+  unsigned Precision = 0;
 
   // Parse format specifier (NOTE: ',' is also an argument seperator).
   size_t FormatSpecEnd = Expr.find(',');
   size_t FunctionStart = Expr.find('(');
   if (FormatSpecEnd != StringRef::npos && FormatSpecEnd < FunctionStart) {
-    Expr = Expr.ltrim(SpaceChars);
-    if (!Expr.consume_front("%"))
+    StringRef FormatExpr = Expr.take_front(FormatSpecEnd);
+    Expr = Expr.drop_front(FormatSpecEnd + 1);
+    FormatExpr = FormatExpr.trim(SpaceChars);
+    if (!FormatExpr.consume_front("%"))
       return ErrorDiagnostic::get(
-          SM, Expr, "invalid matching format specification in expression");
-
-    // Check for unknown matching format specifier and set matching format in
-    // class instance representing this expression.
-    SMLoc fmtloc = SMLoc::getFromPointer(Expr.data());
-    switch (popFront(Expr)) {
-    case 'u':
-      ExplicitFormat = ExpressionFormat(ExpressionFormat::Kind::Unsigned);
-      break;
-    case 'd':
-      ExplicitFormat = ExpressionFormat(ExpressionFormat::Kind::Signed);
-      break;
-    case 'x':
-      ExplicitFormat = ExpressionFormat(ExpressionFormat::Kind::HexLower);
-      break;
-    case 'X':
-      ExplicitFormat = ExpressionFormat(ExpressionFormat::Kind::HexUpper);
-      break;
-    default:
-      return ErrorDiagnostic::get(SM, fmtloc,
-                                  "invalid format specifier in expression");
+          SM, FormatExpr,
+          "invalid matching format specification in expression");
+
+    // Parse precision.
+    if (FormatExpr.consume_front(".")) {
+      if (FormatExpr.consumeInteger(10, Precision))
+        return ErrorDiagnostic::get(SM, FormatExpr,
+                                    "invalid precision in format specifier");
     }
 
-    Expr = Expr.ltrim(SpaceChars);
-    if (!Expr.consume_front(","))
+    if (!FormatExpr.empty()) {
+      // Check for unknown matching format specifier and set matching format in
+      // class instance representing this expression.
+      SMLoc FmtLoc = SMLoc::getFromPointer(FormatExpr.data());
+      switch (popFront(FormatExpr)) {
+      case 'u':
+        ExplicitFormat =
+            ExpressionFormat(ExpressionFormat::Kind::Unsigned, Precision);
+        break;
+      case 'd':
+        ExplicitFormat =
+            ExpressionFormat(ExpressionFormat::Kind::Signed, Precision);
+        break;
+      case 'x':
+        ExplicitFormat =
+            ExpressionFormat(ExpressionFormat::Kind::HexLower, Precision);
+        break;
+      case 'X':
+        ExplicitFormat =
+            ExpressionFormat(ExpressionFormat::Kind::HexUpper, Precision);
+        break;
+      default:
+        return ErrorDiagnostic::get(SM, FmtLoc,
+                                    "invalid format specifier in expression");
+      }
+    }
+
+    FormatExpr = FormatExpr.ltrim(SpaceChars);
+    if (!FormatExpr.empty())
       return ErrorDiagnostic::get(
-          SM, Expr, "invalid matching format specification in expression");
+          SM, FormatExpr,
+          "invalid matching format specification in expression");
   }
 
   // Save variable definition expression if any.
@@ -814,7 +864,7 @@ Expected<std::unique_ptr<Expression>> Pattern::parseNumericSubstitutionBlock(
     Format = *ImplicitFormat;
   }
   if (!Format)
-    Format = ExpressionFormat(ExpressionFormat::Kind::Unsigned);
+    Format = ExpressionFormat(ExpressionFormat::Kind::Unsigned, Precision);
 
   std::unique_ptr<Expression> ExpressionPointer =
       std::make_unique<Expression>(std::move(ExpressionASTPointer), Format);
@@ -948,7 +998,7 @@ bool Pattern::parsePattern(StringRef PatternStr, StringRef Prefix,
       bool IsLegacyLineExpr = false;
       StringRef DefName;
       StringRef SubstStr;
-      StringRef MatchRegexp;
+      std::string MatchRegexp;
       size_t SubstInsertIdx = RegExStr.size();
 
       // Parse string variable or legacy @LINE expression.
@@ -992,7 +1042,7 @@ bool Pattern::parsePattern(StringRef PatternStr, StringRef Prefix,
             return true;
           }
           DefName = Name;
-          MatchRegexp = MatchStr;
+          MatchRegexp = MatchStr.str();
         } else {
           if (IsPseudo) {
             MatchStr = OrigMatchStr;

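The padding rule in the new `getMatchingString()` (render the absolute value, add leading zeros until the digit count reaches the precision, then prepend the sign) can be sketched as a free function. `matchingString` and `toDigits` are hypothetical; a `bool`/`uint64_t` pair stands in for `ExpressionValue`, and returning `std::nullopt` for a negative value with a format other than `%d` is an assumption modelled on `getUnsignedValue()` failing in that case.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Renders Value in decimal ('u', 'd') or hexadecimal ('x', 'X'), no padding.
std::string toDigits(uint64_t Value, char Conv) {
  const bool Hex = Conv == 'x' || Conv == 'X';
  const char *Alphabet = Conv == 'x' ? "0123456789abcdef" : "0123456789ABCDEF";
  const uint64_t Base = Hex ? 16 : 10;
  std::string Digits;
  do {
    Digits.insert(Digits.begin(), Alphabet[Value % Base]);
    Value /= Base;
  } while (Value != 0);
  return Digits;
}

// Hypothetical mirror of the padding rule in getMatchingString(): the
// magnitude is zero-padded to Precision digits and the sign goes in front.
std::optional<std::string> matchingString(char Conv, unsigned Precision,
                                          bool Negative,
                                          uint64_t AbsoluteValue) {
  if (Negative && Conv != 'd')
    return std::nullopt; // assumed: only %d can represent a negative value
  std::string Digits = toDigits(AbsoluteValue, Conv);
  if (Precision > Digits.size())
    Digits.insert(0, Precision - Digits.size(), '0');
  return (Negative ? "-" : "") + Digits;
}
```

Placing the zeros after the sign follows printf, where `%.3d` applied to -5 yields `-005`: the precision counts digits only, not the sign.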
diff --git a/llvm/lib/Support/FileCheckImpl.h b/llvm/lib/Support/FileCheckImpl.h
@@ -53,15 +53,17 @@ struct ExpressionFormat {
 
 private:
   Kind Value;
+  unsigned Precision = 0;
 
 public:
   /// Evaluates a format to true if it can be used in a match.
   explicit operator bool() const { return Value != Kind::NoFormat; }
 
   /// Define format equality: formats are equal if neither is NoFormat and
-  /// their kinds are the same.
+  /// their kinds and precision are the same.
   bool operator==(const ExpressionFormat &Other) const {
-    return Value != Kind::NoFormat && Value == Other.Value;
+    return Value != Kind::NoFormat && Value == Other.Value &&
+           Precision == Other.Precision;
   }
 
   bool operator!=(const ExpressionFormat &Other) const {
@@ -76,12 +78,14 @@ struct ExpressionFormat {
   StringRef toString() const;
 
   ExpressionFormat() : Value(Kind::NoFormat){};
-  explicit ExpressionFormat(Kind Value) : Value(Value){};
-
-  /// \returns a wildcard regular expression StringRef that matches any value
-  /// in the format represented by this instance, or an error if the format is
-  /// NoFormat.
-  Expected<StringRef> getWildcardRegex() const;
+  explicit ExpressionFormat(Kind Value) : Value(Value), Precision(0){};
+  explicit ExpressionFormat(Kind Value, unsigned Precision)
+      : Value(Value), Precision(Precision){};
+
+  /// \returns a wildcard regular expression string that matches any value in
+  /// the format represented by this instance and no other value, or an error
+  /// if the format is NoFormat.
+  Expected<std::string> getWildcardRegex() const;
 
   /// \returns the string representation of \p Value in the format represented
   /// by this instance, or an error if conversion to this format failed or the