update_test_checks: match IR basic block labels #88979

nhaehnle · 2024-04-16T20:12:11Z

Note: This change is split into three commits. The big refactor to GeneralizerInfo objects stands on its own, but the overall PR gives the context to justify why the refactor is done.

Labels are matched using a regexp of the form '^(pattern):', which requires the addition of a "suffix" concept to NamelessValue.

Aside from that, the key challenge is that block labels are values, and we typically capture values including the prefix '%'. However, when labels appear at the start of a basic block, the prefix '%' is not included, so we must capture block label values without the prefix '%'.

We don't know ahead of time whether an IR value is a label or not. In most cases, they are prefixed by the word "label" (their type), but this isn't the case in phi nodes. We solve this issue by leveraging the two-phase nature of variable generalization: the first pass finds all occurences of a variable and determines whether the '%' prefix can be included or not. The second pass does the actual substitution.

This change also unifies the generalization path for assembly with that for IR and analysis, in the hope that any future changes avoid diverging those cases future.

I also considered the alternative of trying to detect the phi node case using more regular expression special cases but ultimately decided against that because it seemed more fragile, and perhaps the approach of keeping a tentative prefix that may later be discarded could also be eventually applied to some metadata and attribute cases.

Note that an early version of this change was reviewed as https://reviews.llvm.org/D142452, before version numbers were
introduced. This is a substantially updated version of that change.

llvmbot · 2024-04-16T20:12:43Z

@llvm/pr-subscribers-testing-tools

Author: Nicolai Hähnle (nhaehnle)

Changes

Note: This change is split into three commits. The big refactor to GeneralizerInfo objects stands on its own, but the overall PR gives the context to justify why the refactor is done.

Labels are matched using a regexp of the form '^(pattern):', which requires the addition of a "suffix" concept to NamelessValue.

Aside from that, the key challenge is that block labels are values, and we typically capture values including the prefix '%'. However, when labels appear at the start of a basic block, the prefix '%' is not included, so we must capture block label values without the prefix '%'.

We don't know ahead of time whether an IR value is a label or not. In most cases, they are prefixed by the word "label" (their type), but this isn't the case in phi nodes. We solve this issue by leveraging the two-phase nature of variable generalization: the first pass finds all occurences of a variable and determines whether the '%' prefix can be included or not. The second pass does the actual substitution.

This change also unifies the generalization path for assembly with that for IR and analysis, in the hope that any future changes avoid diverging those cases future.

I also considered the alternative of trying to detect the phi node case using more regular expression special cases but ultimately decided against that because it seemed more fragile, and perhaps the approach of keeping a tentative prefix that may later be discarded could also be eventually applied to some metadata and attribute cases.

Note that an early version of this change was reviewed as https://reviews.llvm.org/D142452, before version numbers were
introduced. This is a substantially updated version of that change.

Patch is 63.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88979.diff

9 Files Affected:

(modified) llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/phi-labels.ll.expected (+18-18)
(modified) llvm/test/tools/UpdateTestChecks/update_test_checks/phi-labels.test (+1-1)
(modified) llvm/utils/UpdateTestChecks/asm.py (+2-3)
(modified) llvm/utils/UpdateTestChecks/common.py (+469-456)
(modified) llvm/utils/UpdateTestChecks/isel.py (+2-3)
(modified) llvm/utils/update_analyze_test_checks.py (+3-2)
(modified) llvm/utils/update_cc_test_checks.py (+23-26)
(modified) llvm/utils/update_llc_test_checks.py (+5-3)
(modified) llvm/utils/update_test_checks.py (+11-4)

diff --git a/llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/phi-labels.ll.expected b/llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/phi-labels.ll.expected
index 1d21ebe547f689..5e70a6c89d3276 100644
--- a/llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/phi-labels.ll.expected
+++ b/llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/phi-labels.ll.expected
@@ -1,15 +1,15 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
 ; RUN: opt < %s -S | FileCheck %s
 
 define i32 @phi_after_label(i1 %cc) {
 ; CHECK-LABEL: define i32 @phi_after_label(
 ; CHECK-SAME: i1 [[CC:%.*]]) {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    br i1 [[CC]], label [[THEN:%.*]], label [[END:%.*]]
-; CHECK:       then:
-; CHECK-NEXT:    br label [[END]]
-; CHECK:       end:
-; CHECK-NEXT:    [[R:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 1, [[THEN]] ]
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br i1 [[CC]], label %[[THEN:.*]], label %[[END:.*]]
+; CHECK:       [[THEN]]:
+; CHECK-NEXT:    br label %[[END]]
+; CHECK:       [[END]]:
+; CHECK-NEXT:    [[R:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ 1, %[[THEN]] ]
 ; CHECK-NEXT:    ret i32 [[R]]
 ;
 entry:
@@ -26,14 +26,14 @@ end:
 define void @phi_before_label(i32 %bound) {
 ; CHECK-LABEL: define void @phi_before_label(
 ; CHECK-SAME: i32 [[BOUND:%.*]]) {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    br label [[LOOP:%.*]]
-; CHECK:       loop:
-; CHECK-NEXT:    [[CTR:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[CTR_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[CTR:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[CTR_NEXT:%.*]], %[[LOOP]] ]
 ; CHECK-NEXT:    [[CTR_NEXT]] = add i32 [[CTR]], 1
 ; CHECK-NEXT:    [[CC:%.*]] = icmp ult i32 [[CTR_NEXT]], [[BOUND]]
-; CHECK-NEXT:    br i1 [[CC]], label [[LOOP]], label [[END:%.*]]
-; CHECK:       end:
+; CHECK-NEXT:    br i1 [[CC]], label %[[LOOP]], label %[[END:.*]]
+; CHECK:       [[END]]:
 ; CHECK-NEXT:    ret void
 ;
 entry:
@@ -52,11 +52,11 @@ end:
 define i32 @phi_after_label_unnamed(i1 %cc) {
 ; CHECK-LABEL: define i32 @phi_after_label_unnamed(
 ; CHECK-SAME: i1 [[CC:%.*]]) {
-; CHECK-NEXT:    br i1 [[CC]], label [[TMP1:%.*]], label [[TMP2:%.*]]
-; CHECK:       1:
-; CHECK-NEXT:    br label [[TMP2]]
-; CHECK:       2:
-; CHECK-NEXT:    [[R:%.*]] = phi i32 [ 0, [[TMP0:%.*]] ], [ 1, [[TMP1]] ]
+; CHECK-NEXT:    br i1 [[CC]], label %[[BB1:.*]], label %[[BB2:.*]]
+; CHECK:       [[BB1]]:
+; CHECK-NEXT:    br label %[[BB2]]
+; CHECK:       [[BB2]]:
+; CHECK-NEXT:    [[R:%.*]] = phi i32 [ 0, [[TMP0:%.*]] ], [ 1, %[[BB1]] ]
 ; CHECK-NEXT:    ret i32 [[R]]
 ;
 0:
diff --git a/llvm/test/tools/UpdateTestChecks/update_test_checks/phi-labels.test b/llvm/test/tools/UpdateTestChecks/update_test_checks/phi-labels.test
index 411c84de1dcba5..2b0d0cb7f54ba8 100644
--- a/llvm/test/tools/UpdateTestChecks/update_test_checks/phi-labels.test
+++ b/llvm/test/tools/UpdateTestChecks/update_test_checks/phi-labels.test
@@ -1,4 +1,4 @@
-# RUN: cp -f %S/Inputs/phi-labels.ll %t.ll && %update_test_checks --version 4 %t.ll
+# RUN: cp -f %S/Inputs/phi-labels.ll %t.ll && %update_test_checks --version 5 %t.ll
 # RUN: diff -u %t.ll %S/Inputs/phi-labels.ll.expected
 ## Check that running the script again does not change the result:
 # RUN: %update_test_checks %t.ll
diff --git a/llvm/utils/UpdateTestChecks/asm.py b/llvm/utils/UpdateTestChecks/asm.py
index f0c456a1648df4..33ede81a416017 100644
--- a/llvm/utils/UpdateTestChecks/asm.py
+++ b/llvm/utils/UpdateTestChecks/asm.py
@@ -605,6 +605,7 @@ def add_checks(
     prefix_list,
     func_dict,
     func_name,
+    ginfo: common.GeneralizerInfo,
     global_vars_seen_dict,
     is_filtered,
 ):
@@ -617,9 +618,7 @@ def add_checks(
         func_dict,
         func_name,
         check_label_format,
-        True,
-        False,
-        1,
+        ginfo,
         global_vars_seen_dict,
         is_filtered=is_filtered,
     )
diff --git a/llvm/utils/UpdateTestChecks/common.py b/llvm/utils/UpdateTestChecks/common.py
index eed36a0cdd73fd..2e3c3a1621cef2 100644
--- a/llvm/utils/UpdateTestChecks/common.py
+++ b/llvm/utils/UpdateTestChecks/common.py
@@ -30,8 +30,9 @@
    in case arguments are split to a separate SAME line.
 4: --check-globals now has a third option ('smart'). The others are now called
    'none' and 'all'. 'smart' is the default.
+5: Basic block labels are matched by FileCheck expressions
 """
-DEFAULT_VERSION = 4
+DEFAULT_VERSION = 5
 
 
 SUPPORTED_ANALYSES = {
@@ -687,6 +688,7 @@ def __init__(
         args_and_sig,
         attrs,
         func_name_separator,
+        ginfo,
     ):
         self.scrub = string
         self.extrascrub = extra
@@ -694,24 +696,27 @@ def __init__(
         self.args_and_sig = args_and_sig
         self.attrs = attrs
         self.func_name_separator = func_name_separator
+        self._ginfo = ginfo
 
     def is_same_except_arg_names(
-        self, extrascrub, funcdef_attrs_and_ret, args_and_sig, attrs, is_backend
+        self, extrascrub, funcdef_attrs_and_ret, args_and_sig, attrs
     ):
         arg_names = set()
 
         def drop_arg_names(match):
-            arg_names.add(match.group(variable_group_in_ir_value_match))
-            if match.group(attribute_group_in_ir_value_match):
-                attr = match.group(attribute_group_in_ir_value_match)
+            nameless_value = self._ginfo.get_nameless_value_from_match(match)
+            if nameless_value.check_key == "%":
+                arg_names.add(self._ginfo.get_name_from_match(match))
+                substitute = ""
             else:
-                attr = ""
-            return match.group(1) + attr + match.group(match.lastindex)
+                substitute = match.group(2)
+            return match.group(1) + substitute + match.group(match.lastindex)
 
         def repl_arg_names(match):
+            nameless_value = self._ginfo.get_nameless_value_from_match(match)
             if (
-                match.group(variable_group_in_ir_value_match) is not None
-                and match.group(variable_group_in_ir_value_match) in arg_names
+                nameless_value.check_key == "%"
+                and self._ginfo.get_name_from_match(match) in arg_names
             ):
                 return match.group(1) + match.group(match.lastindex)
             return match.group(1) + match.group(2) + match.group(match.lastindex)
@@ -720,17 +725,19 @@ def repl_arg_names(match):
             return False
         if self.attrs != attrs:
             return False
-        ans0 = IR_VALUE_RE.sub(drop_arg_names, self.args_and_sig)
-        ans1 = IR_VALUE_RE.sub(drop_arg_names, args_and_sig)
+
+        regexp = self._ginfo.get_regexp()
+        ans0 = regexp.sub(drop_arg_names, self.args_and_sig)
+        ans1 = regexp.sub(drop_arg_names, args_and_sig)
         if ans0 != ans1:
             return False
-        if is_backend:
+        if self._ginfo.is_asm():
             # Check without replacements, the replacements are not applied to the
             # body for backend checks.
             return self.extrascrub == extrascrub
 
-        es0 = IR_VALUE_RE.sub(repl_arg_names, self.extrascrub)
-        es1 = IR_VALUE_RE.sub(repl_arg_names, extrascrub)
+        es0 = regexp.sub(repl_arg_names, self.extrascrub)
+        es1 = regexp.sub(repl_arg_names, extrascrub)
         es0 = SCRUB_IR_COMMENT_RE.sub(r"", es0)
         es1 = SCRUB_IR_COMMENT_RE.sub(r"", es1)
         return es0 == es1
@@ -740,7 +747,7 @@ def __str__(self):
 
 
 class FunctionTestBuilder:
-    def __init__(self, run_list, flags, scrubber_args, path):
+    def __init__(self, run_list, flags, scrubber_args, path, ginfo):
         self._verbose = flags.verbose
         self._record_args = flags.function_signature
         self._check_attributes = flags.check_attributes
@@ -759,6 +766,7 @@ def __init__(self, run_list, flags, scrubber_args, path):
         )
         self._scrubber_args = scrubber_args
         self._path = path
+        self._ginfo = ginfo
         # Strip double-quotes if input was read by UTC_ARGS
         self._replace_value_regex = list(
             map(lambda x: x.strip('"'), flags.replace_value_regex)
@@ -793,10 +801,10 @@ def global_var_dict(self):
     def is_filtered(self):
         return bool(self._filters)
 
-    def process_run_line(
-        self, function_re, scrubber, raw_tool_output, prefixes, is_backend
-    ):
-        build_global_values_dictionary(self._global_var_dict, raw_tool_output, prefixes)
+    def process_run_line(self, function_re, scrubber, raw_tool_output, prefixes):
+        build_global_values_dictionary(
+            self._global_var_dict, raw_tool_output, prefixes, self._ginfo
+        )
         for m in function_re.finditer(raw_tool_output):
             if not m:
                 continue
@@ -806,7 +814,7 @@ def process_run_line(
             # beginning of assembly function definition. In most assemblies, that is just a
             # colon: `foo:`. But, for example, in nvptx it is a brace: `foo(`. If is_backend is
             # False, just assume that separator is an empty string.
-            if is_backend:
+            if self._ginfo.is_asm():
                 # Use ':' as default separator.
                 func_name_separator = (
                     m.group("func_name_separator")
@@ -889,7 +897,6 @@ def process_run_line(
                             funcdef_attrs_and_ret,
                             args_and_sig,
                             attrs,
-                            is_backend,
                         ):
                             self._func_dict[prefix][func].scrub = scrubbed_extra
                             self._func_dict[prefix][func].args_and_sig = args_and_sig
@@ -908,6 +915,7 @@ def process_run_line(
                             args_and_sig,
                             attrs,
                             func_name_separator,
+                            self._ginfo,
                         )
                         self._func_order[prefix].append(func)
                     else:
@@ -948,6 +956,12 @@ def get_failed_prefixes(self):
 
 
 class NamelessValue:
+    """
+    A NamelessValue object represents a type of value in the IR whose "name" we
+    generalize in the generated check lines; where the "name" could be an actual
+    name (as in e.g. `@some_global` or `%x`) or just a number (as in e.g. `%12`
+    or `!4`).
+    """
     def __init__(
         self,
         check_prefix,
@@ -960,12 +974,14 @@ def __init__(
         is_number=False,
         replace_number_with_counter=False,
         match_literally=False,
-        interlaced_with_previous=False
+        interlaced_with_previous=False,
+        ir_suffix=r"",
     ):
         self.check_prefix = check_prefix
         self.check_key = check_key
         self.ir_prefix = ir_prefix
         self.ir_regexp = ir_regexp
+        self.ir_suffix = ir_suffix
         self.global_ir_rhs_regexp = global_ir_rhs_regexp
         self.is_before_functions = is_before_functions
         self.is_number = is_number
@@ -976,15 +992,10 @@ def __init__(
         self.interlaced_with_previous = interlaced_with_previous
         self.variable_mapping = {}
 
-    # Return true if this kind of IR value is "local", basically if it matches '%{{.*}}'.
+    # Return true if this kind of IR value is defined "locally" to functions,
+    # which we assume is only the case precisely for LLVM IR local values.
     def is_local_def_ir_value(self):
-        return self.ir_prefix == "%"
-
-    # Return the IR prefix and check prefix we use for this kind or IR value,
-    # e.g., (%, TMP) for locals. If the IR prefix is a regex, return the prefix
-    # used in the IR output
-    def get_ir_prefix_from_ir_value_match(self, match):
-        return re.search(self.ir_prefix, match[0])[0], self.check_prefix
+        return self.check_key == "%"
 
     # Return the IR regexp we use for this kind or IR value, e.g., [\w.-]+? for locals
     def get_ir_regex(self):
@@ -1019,205 +1030,216 @@ def get_value_name(self, var: str, check_prefix: str):
         var = var.replace("-", "_")
         return var.upper()
 
-    # Create a FileCheck variable from regex.
-    def get_value_definition(self, var, match):
-        # for backwards compatibility we check locals with '.*'
-        varname = self.get_value_name(var, self.check_prefix)
-        prefix = self.get_ir_prefix_from_ir_value_match(match)[0]
-        if self.is_number:
-            regex = ""  # always capture a number in the default format
-            capture_start = "[[#"
-        else:
-            regex = self.get_ir_regex()
-            capture_start = "[["
-        if self.is_local_def_ir_value():
-            return capture_start + varname + ":" + prefix + regex + "]]"
-        return prefix + capture_start + varname + ":" + regex + "]]"
-
-    # Use a FileCheck variable.
-    def get_value_use(self, var, match, var_prefix=None):
-        if var_prefix is None:
-            var_prefix = self.check_prefix
-        capture_start = "[[#" if self.is_number else "[["
-        if self.is_local_def_ir_value():
-            return capture_start + self.get_value_name(var, var_prefix) + "]]"
-        prefix = self.get_ir_prefix_from_ir_value_match(match)[0]
-        return prefix + capture_start + self.get_value_name(var, var_prefix) + "]]"
-
-
-# Description of the different "unnamed" values we match in the IR, e.g.,
-# (local) ssa values, (debug) metadata, etc.
-ir_nameless_values = [
-    #            check_prefix   check_key  ir_prefix           ir_regexp                global_ir_rhs_regexp
-    NamelessValue(r"TMP", "%", r"%", r"[\w$.-]+?", None),
-    NamelessValue(r"ATTR", "#", r"#", r"[0-9]+", None),
-    NamelessValue(r"ATTR", "#", r"attributes #", r"[0-9]+", r"{[^}]*}"),
-    NamelessValue(r"GLOB", "@", r"@", r"[0-9]+", None),
-    NamelessValue(r"GLOB", "@", r"@", r"[0-9]+", r".+", is_before_functions=True),
-    NamelessValue(
-        r"GLOBNAMED",
-        "@",
-        r"@",
-        r"[a-zA-Z0-9_$\"\\.-]*[a-zA-Z_$\"\\.-][a-zA-Z0-9_$\"\\.-]*",
-        r".+",
-        is_before_functions=True,
-        match_literally=True,
-        interlaced_with_previous=True,
-    ),
-    NamelessValue(r"DBG", "!", r"!dbg ", r"![0-9]+", None),
-    NamelessValue(r"DIASSIGNID", "!", r"!DIAssignID ", r"![0-9]+", None),
-    NamelessValue(r"PROF", "!", r"!prof ", r"![0-9]+", None),
-    NamelessValue(r"TBAA", "!", r"!tbaa ", r"![0-9]+", None),
-    NamelessValue(r"TBAA_STRUCT", "!", r"!tbaa.struct ", r"![0-9]+", None),
-    NamelessValue(r"RNG", "!", r"!range ", r"![0-9]+", None),
-    NamelessValue(r"LOOP", "!", r"!llvm.loop ", r"![0-9]+", None),
-    NamelessValue(r"META", "!", r"metadata ", r"![0-9]+", None),
-    NamelessValue(r"META", "!", r"", r"![0-9]+", r"(?:distinct |)!.*"),
-    NamelessValue(r"ACC_GRP", "!", r"!llvm.access.group ", r"![0-9]+", None),
-    NamelessValue(r"META", "!", r"![a-z.]+ ", r"![0-9]+", None),
-]
+    def get_affixes_from_match(self, match: re.Match):
+        prefix = re.match(self.ir_prefix, match.group(2)).group(0)
+        suffix = re.search(self.ir_suffix + "$", match.group(2)).group(0)
+        return prefix, suffix
 
-global_nameless_values = [
-    nameless_value
-    for nameless_value in ir_nameless_values
-    if nameless_value.global_ir_rhs_regexp is not None
-]
-# global variable names should be matched literally
-global_nameless_values_w_unstable_ids = [
-    nameless_value
-    for nameless_value in global_nameless_values
-    if not nameless_value.match_literally
-]
 
-asm_nameless_values = [
-    NamelessValue(
-        r"MCINST",
-        "Inst#",
-        "<MCInst #",
-        r"\d+",
-        r".+",
-        is_number=True,
-        replace_number_with_counter=True,
-    ),
-    NamelessValue(
-        r"MCREG",
-        "Reg:",
-        "<MCOperand Reg:",
-        r"\d+",
-        r".+",
-        is_number=True,
-        replace_number_with_counter=True,
-    ),
-]
+class GeneralizerInfo:
+    """
+    A GeneralizerInfo object holds information about how check lines should be generalized
+    (e.g., variable names replaced by FileCheck meta variables) as well as per-test-file
+    state (e.g. information about IR global variables).
+    """
 
-analyze_nameless_values = [
-    NamelessValue(
-        r"GRP",
-        "#",
-        r"",
-        r"0x[0-9a-f]+",
-        None,
-        replace_number_with_counter=True,
-    ),
-]
+    MODE_IR = 0
+    MODE_ASM = 1
+    MODE_ANALYZE = 2
 
+    def __init__(
+        self,
+        version,
+        mode,
+        nameless_values: List[NamelessValue],
+        regexp_prefix,
+        regexp_suffix,
+    ):
+        self._version = version
+        self._mode = mode
+        self._nameless_values = nameless_values
+
+        self._regexp_prefix = regexp_prefix
+        self._regexp_suffix = regexp_suffix
+
+        self._regexp, _ = self._build_regexp(False, False)
+        (
+            self._unstable_globals_regexp,
+            self._unstable_globals_values,
+        ) = self._build_regexp(True, True)
+
+    def _build_regexp(self, globals_only, unstable_only):
+        matches = []
+        values = []
+        for nameless_value in self._nameless_values:
+            is_global = nameless_value.global_ir_rhs_regexp is not None
+            if globals_only and not is_global:
+                continue
+            if unstable_only and nameless_value.match_literally:
+                continue
 
-def createOrRegexp(old, new):
-    if not old:
-        return new
-    if not new:
-        return old
-    return old + "|" + new
-
-
-def createPrefixMatch(prefix_str, prefix_re):
-    return "(?:" + prefix_str + "(" + prefix_re + "))"
-
-
-# Build the regexp that matches an "IR value". This can be a local variable,
-# argument, global, or metadata, anything that is "named". It is important that
-# the PREFIX and SUFFIX below only contain a single group, if that changes
-# other locations will need adjustment as well.
-IR_VALUE_REGEXP_PREFIX = r"(\s*)"
-IR_VALUE_REGEXP_STRING = r""
-for nameless_value in ir_nameless_values:
-    match = createPrefixMatch(nameless_value.ir_prefix, nameless_value.ir_regexp)
-    if nameless_value.global_ir_rhs_regexp is not None:
-        match = "^" + match
-    IR_VALUE_REGEXP_STRING = createOrRegexp(IR_VALUE_REGEXP_STRING, match)
-IR_VALUE_REGEXP_SUFFIX = r"([,\s\(\)\}]|\Z)"
-IR_VALUE_RE = re.compile(
-    IR_VALUE_REGEXP_PREFIX
-    + r"("
-    + IR_VALUE_REGEXP_STRING
-    + r")"
-    + IR_VALUE_REGEXP_SUFFIX
-)
+            match = f"(?:{nameless_value.ir_prefix}({nameless_value.ir_regexp}){nameless_value.ir_suffix})"
+            if self.is_ir() and not globals_only and is_global:
+                match = "^" + match
+            matches.append(match)
+            values.append(nameless_value)
 
-GLOBAL_VALUE_REGEXP_STRING = r""
-for nameless_value in global_nameless_values_w_unstable_ids:
-    match = createPrefixMatch(nameless_value.ir_prefix, nameless_value.ir_regexp)
-    GLOBAL_VALUE_REGEXP_STRING = createOrRegexp(GLOBAL_VALUE_REGEXP_STRING, match)
-GLOBAL_VALUE_RE = re.compile(
-    IR_VALUE_REGEXP_PREFIX
-    + r"("
-    + GLOBAL_VALUE_REGEXP_STRING
-    + r")"
-    + IR_VALUE_REGEXP_SUFFIX
-)
+        regexp_string = r"|".join(matches)
 
-# Build the regexp that matches an "ASM value" (currently only for --asm-show-inst comments).
-ASM_VALUE_REGEXP_STRING = ""
-for nameless_value in asm_nameless_values:
-    match = createPrefixMatch(nameless_value.ir_prefix, nameless_value.ir_regexp)
-    ASM_VALUE_REGEXP_STRING = createOrRegexp(ASM_VALUE_REGEXP_STRING, match)
-ASM_VALUE_REGEXP_SUFFIX = r"([>\s]|\Z)"
-ASM_VALUE_RE = re.compile(
-    r"((?:#|//)\s*)" + "(" + ASM_VALUE_REGEXP_STRING + ")" + ASM_VALUE_REGEXP_SUFFIX
-)
+        return (
+            re.compile(
+                self._regexp_prefix + r"(" + regexp_string + r")" + self._regexp_suffix
+            ),
+            values,
+        )
 
-ANALYZE_VALUE_REGEXP_PREFIX = r"(\s*)"
-ANALYZE_VALUE_REGEXP_STRING = r""
-for nameless_value in analyze_nameless_values:
-    match = createPrefixMatch(nameless_value.ir_prefix, nameless_value.ir_regexp)
-    ANALYZE_VALUE_REGEXP_STRING = createOrRegexp(ANALYZE_VALUE_REGEXP_STRING, match)
-ANALYZE_VALU...
[truncated]

nikic · 2024-04-17T02:59:45Z

Aside from that, the key challenge is that block labels are values, and we typically capture values including the prefix '%'. However, when labels appear at the start of a basic block, the prefix '%' is not included, so we must capture block label values without the prefix '%'.

Just to throw it out there: Might it make sense to generally not include the prefix % in version 5? That is, use %[[FOO:.*]] instead of [[FOO:%.*]]?

nhaehnle · 2024-04-17T10:28:34Z

Aside from that, the key challenge is that block labels are values, and we typically capture values including the prefix '%'. However, when labels appear at the start of a basic block, the prefix '%' is not included, so we must capture block label values without the prefix '%'.

Just to throw it out there: Might it make sense to generally not include the prefix % in version 5? That is, use %[[FOO:.*]] instead of [[FOO:%.*]]?

I vaguely recall thinking about this way back when I wrote the first version of this change. It will lead to check lines like:

  %[[TMP10:.*]] = add i32 %[[TMP4]], %[[TMP5]]

... where today we have ...

  [[TMP10:%.*]] = add i32 [[TMP4]], [[TMP5]]

I don't feel very strongly about it, but I prefer the second version (i.e., keep it as-is) because the right-hand side has less visual noise.

The tool verifies that the clang command line contains "-emit-llvm", and the asm module is never imported, so this code must be dead.

This change is motivated primarily by the desire to change the list of "NamelessValue" (not a great name) objects based on the --version passed to update_test_checks. In order to be able to use the version number e.g. when building up regular expressions, but also to just reduce the number of surprises caused by global state, this change introduces the GeneralizerInfo class which is meant to hold a bunch of information about how to generalize check lines. The GeneralizerInfo must be instantiated by the "driver" script (update_*_test_checks.py) and is passed around various common functions. Beware that the GeneralizerInfo objects are *not* stateless. The NamelessValue objects currently hold mapping information for "global" values.

Labels are matched using a regexp of the form '^(pattern):', which requires the addition of a "suffix" concept to NamelessValue. Aside from that, the key challenge is that block labels are values, and we typically capture values including the prefix '%'. However, when labels appear at the start of a basic block, the prefix '%' is not included, so we must capture block label values *without* the prefix '%'. We don't know ahead of time whether an IR value is a label or not. In most cases, they are prefixed by the word "label" (their type), but this isn't the case in phi nodes. We solve this issue by leveraging the two-phase nature of variable generalization: the first pass finds all occurences of a variable and determines whether the '%' prefix can be included or not. The second pass does the actual substitution. This change also unifies the generalization path for assembly with that for IR and analysis, in the hope that any future changes avoid diverging those cases future. I also considered the alternative of trying to detect the phi node case using more regular expression special cases but ultimately decided against that because it seemed more fragile, and perhaps the approach of keeping a tentative prefix that may later be discarded could also be eventually applied to some metadata and attribute cases. Note that an early version of this change was reviewed as https://reviews.llvm.org/D142452, before version numbers were introduced. This is a substantially updated version of that change.

nhaehnle · 2024-04-23T14:44:36Z

ping

nhaehnle · 2024-05-16T15:29:09Z

Another ping... it's been a couple of weeks.

hnrklssn

Sorry for the delay, I thought I'd reviewed already! LGTM

hnrklssn · 2024-05-16T22:51:33Z

We solve this issue by leveraging the two-phase nature of variable generalization: the first pass finds all occurences of a variable and determines whether the '%' prefix can be included or not.

I'd be fine with simply never including the '%' prefix in the match, but it's a pretty breaking change so I'm not opposed to keeping the existing behaviour when possible.

…a339769fe Local branch amd-gfx e36a339 Merged main:4c98f5b439ddd204d8ff1e423104215ebd0e1720 into amd-gfx:bfb7a657b78d Remote branch main 597ac47 update_test_checks: match IR basic block labels (llvm#88979)

nhaehnle requested review from nikic, mtrofin, hnrklssn and jasilvanus April 16, 2024 20:12

llvmbot added the testing-tools label Apr 16, 2024

hnrklssn mentioned this pull request Apr 19, 2024

[RemoveDIs] Update update_test_checks script to recognize dbg_records #87388

Merged

nhaehnle added 3 commits April 22, 2024 18:57

update_cc_test_checks: remove dead code

55f7c86

The tool verifies that the clang command line contains "-emit-llvm", and the asm module is never imported, so this code must be dead.

nhaehnle force-pushed the pub-utc-block-labels branch from 976bece to 4535065 Compare April 23, 2024 14:44

hnrklssn approved these changes May 16, 2024

View reviewed changes

nhaehnle merged commit 597ac47 into llvm:main May 18, 2024
4 checks passed

nhaehnle deleted the pub-utc-block-labels branch May 18, 2024 23:39

PeddleSpam mentioned this pull request May 20, 2024

[AMDGPU] Fix error in #88512. #92770

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

update_test_checks: match IR basic block labels #88979

update_test_checks: match IR basic block labels #88979

nhaehnle commented Apr 16, 2024

llvmbot commented Apr 16, 2024

nikic commented Apr 17, 2024

nhaehnle commented Apr 17, 2024

nhaehnle commented Apr 23, 2024

nhaehnle commented May 16, 2024

hnrklssn left a comment

hnrklssn commented May 16, 2024

update_test_checks: match IR basic block labels #88979

update_test_checks: match IR basic block labels #88979

Conversation

nhaehnle commented Apr 16, 2024

llvmbot commented Apr 16, 2024

nikic commented Apr 17, 2024

nhaehnle commented Apr 17, 2024

nhaehnle commented Apr 23, 2024

nhaehnle commented May 16, 2024

hnrklssn left a comment

Choose a reason for hiding this comment

hnrklssn commented May 16, 2024