Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update_test_checks: match IR basic block labels #88979

Merged
merged 3 commits into from
May 18, 2024

Conversation

nhaehnle
Copy link
Collaborator

Note: This change is split into three commits. The big refactor to GeneralizerInfo objects stands on its own, but the overall PR gives the context to justify why the refactor is done.

Labels are matched using a regexp of the form '^(pattern):', which requires the addition of a "suffix" concept to NamelessValue.

Aside from that, the key challenge is that block labels are values, and we typically capture values including the prefix '%'. However, when labels appear at the start of a basic block, the prefix '%' is not included, so we must capture block label values without the prefix '%'.

We don't know ahead of time whether an IR value is a label or not. In most cases, they are prefixed by the word "label" (their type), but this isn't the case in phi nodes. We solve this issue by leveraging the two-phase nature of variable generalization: the first pass finds all occurences of a variable and determines whether the '%' prefix can be included or not. The second pass does the actual substitution.

This change also unifies the generalization path for assembly with that for IR and analysis, in the hope that any future changes avoid diverging those cases future.

I also considered the alternative of trying to detect the phi node case using more regular expression special cases but ultimately decided against that because it seemed more fragile, and perhaps the approach of keeping a tentative prefix that may later be discarded could also be eventually applied to some metadata and attribute cases.

Note that an early version of this change was reviewed as https://reviews.llvm.org/D142452, before version numbers were
introduced. This is a substantially updated version of that change.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 16, 2024

@llvm/pr-subscribers-testing-tools

Author: Nicolai Hähnle (nhaehnle)

Changes

Note: This change is split into three commits. The big refactor to GeneralizerInfo objects stands on its own, but the overall PR gives the context to justify why the refactor is done.

Labels are matched using a regexp of the form '^(pattern):', which requires the addition of a "suffix" concept to NamelessValue.

Aside from that, the key challenge is that block labels are values, and we typically capture values including the prefix '%'. However, when labels appear at the start of a basic block, the prefix '%' is not included, so we must capture block label values without the prefix '%'.

We don't know ahead of time whether an IR value is a label or not. In most cases, they are prefixed by the word "label" (their type), but this isn't the case in phi nodes. We solve this issue by leveraging the two-phase nature of variable generalization: the first pass finds all occurences of a variable and determines whether the '%' prefix can be included or not. The second pass does the actual substitution.

This change also unifies the generalization path for assembly with that for IR and analysis, in the hope that any future changes avoid diverging those cases future.

I also considered the alternative of trying to detect the phi node case using more regular expression special cases but ultimately decided against that because it seemed more fragile, and perhaps the approach of keeping a tentative prefix that may later be discarded could also be eventually applied to some metadata and attribute cases.

Note that an early version of this change was reviewed as https://reviews.llvm.org/D142452, before version numbers were
introduced. This is a substantially updated version of that change.


Patch is 63.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88979.diff

9 Files Affected:

  • (modified) llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/phi-labels.ll.expected (+18-18)
  • (modified) llvm/test/tools/UpdateTestChecks/update_test_checks/phi-labels.test (+1-1)
  • (modified) llvm/utils/UpdateTestChecks/asm.py (+2-3)
  • (modified) llvm/utils/UpdateTestChecks/common.py (+469-456)
  • (modified) llvm/utils/UpdateTestChecks/isel.py (+2-3)
  • (modified) llvm/utils/update_analyze_test_checks.py (+3-2)
  • (modified) llvm/utils/update_cc_test_checks.py (+23-26)
  • (modified) llvm/utils/update_llc_test_checks.py (+5-3)
  • (modified) llvm/utils/update_test_checks.py (+11-4)
diff --git a/llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/phi-labels.ll.expected b/llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/phi-labels.ll.expected
index 1d21ebe547f689..5e70a6c89d3276 100644
--- a/llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/phi-labels.ll.expected
+++ b/llvm/test/tools/UpdateTestChecks/update_test_checks/Inputs/phi-labels.ll.expected
@@ -1,15 +1,15 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
 ; RUN: opt < %s -S | FileCheck %s
 
 define i32 @phi_after_label(i1 %cc) {
 ; CHECK-LABEL: define i32 @phi_after_label(
 ; CHECK-SAME: i1 [[CC:%.*]]) {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    br i1 [[CC]], label [[THEN:%.*]], label [[END:%.*]]
-; CHECK:       then:
-; CHECK-NEXT:    br label [[END]]
-; CHECK:       end:
-; CHECK-NEXT:    [[R:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 1, [[THEN]] ]
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br i1 [[CC]], label %[[THEN:.*]], label %[[END:.*]]
+; CHECK:       [[THEN]]:
+; CHECK-NEXT:    br label %[[END]]
+; CHECK:       [[END]]:
+; CHECK-NEXT:    [[R:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ 1, %[[THEN]] ]
 ; CHECK-NEXT:    ret i32 [[R]]
 ;
 entry:
@@ -26,14 +26,14 @@ end:
 define void @phi_before_label(i32 %bound) {
 ; CHECK-LABEL: define void @phi_before_label(
 ; CHECK-SAME: i32 [[BOUND:%.*]]) {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    br label [[LOOP:%.*]]
-; CHECK:       loop:
-; CHECK-NEXT:    [[CTR:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[CTR_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[CTR:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[CTR_NEXT:%.*]], %[[LOOP]] ]
 ; CHECK-NEXT:    [[CTR_NEXT]] = add i32 [[CTR]], 1
 ; CHECK-NEXT:    [[CC:%.*]] = icmp ult i32 [[CTR_NEXT]], [[BOUND]]
-; CHECK-NEXT:    br i1 [[CC]], label [[LOOP]], label [[END:%.*]]
-; CHECK:       end:
+; CHECK-NEXT:    br i1 [[CC]], label %[[LOOP]], label %[[END:.*]]
+; CHECK:       [[END]]:
 ; CHECK-NEXT:    ret void
 ;
 entry:
@@ -52,11 +52,11 @@ end:
 define i32 @phi_after_label_unnamed(i1 %cc) {
 ; CHECK-LABEL: define i32 @phi_after_label_unnamed(
 ; CHECK-SAME: i1 [[CC:%.*]]) {
-; CHECK-NEXT:    br i1 [[CC]], label [[TMP1:%.*]], label [[TMP2:%.*]]
-; CHECK:       1:
-; CHECK-NEXT:    br label [[TMP2]]
-; CHECK:       2:
-; CHECK-NEXT:    [[R:%.*]] = phi i32 [ 0, [[TMP0:%.*]] ], [ 1, [[TMP1]] ]
+; CHECK-NEXT:    br i1 [[CC]], label %[[BB1:.*]], label %[[BB2:.*]]
+; CHECK:       [[BB1]]:
+; CHECK-NEXT:    br label %[[BB2]]
+; CHECK:       [[BB2]]:
+; CHECK-NEXT:    [[R:%.*]] = phi i32 [ 0, [[TMP0:%.*]] ], [ 1, %[[BB1]] ]
 ; CHECK-NEXT:    ret i32 [[R]]
 ;
 0:
diff --git a/llvm/test/tools/UpdateTestChecks/update_test_checks/phi-labels.test b/llvm/test/tools/UpdateTestChecks/update_test_checks/phi-labels.test
index 411c84de1dcba5..2b0d0cb7f54ba8 100644
--- a/llvm/test/tools/UpdateTestChecks/update_test_checks/phi-labels.test
+++ b/llvm/test/tools/UpdateTestChecks/update_test_checks/phi-labels.test
@@ -1,4 +1,4 @@
-# RUN: cp -f %S/Inputs/phi-labels.ll %t.ll && %update_test_checks --version 4 %t.ll
+# RUN: cp -f %S/Inputs/phi-labels.ll %t.ll && %update_test_checks --version 5 %t.ll
 # RUN: diff -u %t.ll %S/Inputs/phi-labels.ll.expected
 ## Check that running the script again does not change the result:
 # RUN: %update_test_checks %t.ll
diff --git a/llvm/utils/UpdateTestChecks/asm.py b/llvm/utils/UpdateTestChecks/asm.py
index f0c456a1648df4..33ede81a416017 100644
--- a/llvm/utils/UpdateTestChecks/asm.py
+++ b/llvm/utils/UpdateTestChecks/asm.py
@@ -605,6 +605,7 @@ def add_checks(
     prefix_list,
     func_dict,
     func_name,
+    ginfo: common.GeneralizerInfo,
     global_vars_seen_dict,
     is_filtered,
 ):
@@ -617,9 +618,7 @@ def add_checks(
         func_dict,
         func_name,
         check_label_format,
-        True,
-        False,
-        1,
+        ginfo,
         global_vars_seen_dict,
         is_filtered=is_filtered,
     )
diff --git a/llvm/utils/UpdateTestChecks/common.py b/llvm/utils/UpdateTestChecks/common.py
index eed36a0cdd73fd..2e3c3a1621cef2 100644
--- a/llvm/utils/UpdateTestChecks/common.py
+++ b/llvm/utils/UpdateTestChecks/common.py
@@ -30,8 +30,9 @@
    in case arguments are split to a separate SAME line.
 4: --check-globals now has a third option ('smart'). The others are now called
    'none' and 'all'. 'smart' is the default.
+5: Basic block labels are matched by FileCheck expressions
 """
-DEFAULT_VERSION = 4
+DEFAULT_VERSION = 5
 
 
 SUPPORTED_ANALYSES = {
@@ -687,6 +688,7 @@ def __init__(
         args_and_sig,
         attrs,
         func_name_separator,
+        ginfo,
     ):
         self.scrub = string
         self.extrascrub = extra
@@ -694,24 +696,27 @@ def __init__(
         self.args_and_sig = args_and_sig
         self.attrs = attrs
         self.func_name_separator = func_name_separator
+        self._ginfo = ginfo
 
     def is_same_except_arg_names(
-        self, extrascrub, funcdef_attrs_and_ret, args_and_sig, attrs, is_backend
+        self, extrascrub, funcdef_attrs_and_ret, args_and_sig, attrs
     ):
         arg_names = set()
 
         def drop_arg_names(match):
-            arg_names.add(match.group(variable_group_in_ir_value_match))
-            if match.group(attribute_group_in_ir_value_match):
-                attr = match.group(attribute_group_in_ir_value_match)
+            nameless_value = self._ginfo.get_nameless_value_from_match(match)
+            if nameless_value.check_key == "%":
+                arg_names.add(self._ginfo.get_name_from_match(match))
+                substitute = ""
             else:
-                attr = ""
-            return match.group(1) + attr + match.group(match.lastindex)
+                substitute = match.group(2)
+            return match.group(1) + substitute + match.group(match.lastindex)
 
         def repl_arg_names(match):
+            nameless_value = self._ginfo.get_nameless_value_from_match(match)
             if (
-                match.group(variable_group_in_ir_value_match) is not None
-                and match.group(variable_group_in_ir_value_match) in arg_names
+                nameless_value.check_key == "%"
+                and self._ginfo.get_name_from_match(match) in arg_names
             ):
                 return match.group(1) + match.group(match.lastindex)
             return match.group(1) + match.group(2) + match.group(match.lastindex)
@@ -720,17 +725,19 @@ def repl_arg_names(match):
             return False
         if self.attrs != attrs:
             return False
-        ans0 = IR_VALUE_RE.sub(drop_arg_names, self.args_and_sig)
-        ans1 = IR_VALUE_RE.sub(drop_arg_names, args_and_sig)
+
+        regexp = self._ginfo.get_regexp()
+        ans0 = regexp.sub(drop_arg_names, self.args_and_sig)
+        ans1 = regexp.sub(drop_arg_names, args_and_sig)
         if ans0 != ans1:
             return False
-        if is_backend:
+        if self._ginfo.is_asm():
             # Check without replacements, the replacements are not applied to the
             # body for backend checks.
             return self.extrascrub == extrascrub
 
-        es0 = IR_VALUE_RE.sub(repl_arg_names, self.extrascrub)
-        es1 = IR_VALUE_RE.sub(repl_arg_names, extrascrub)
+        es0 = regexp.sub(repl_arg_names, self.extrascrub)
+        es1 = regexp.sub(repl_arg_names, extrascrub)
         es0 = SCRUB_IR_COMMENT_RE.sub(r"", es0)
         es1 = SCRUB_IR_COMMENT_RE.sub(r"", es1)
         return es0 == es1
@@ -740,7 +747,7 @@ def __str__(self):
 
 
 class FunctionTestBuilder:
-    def __init__(self, run_list, flags, scrubber_args, path):
+    def __init__(self, run_list, flags, scrubber_args, path, ginfo):
         self._verbose = flags.verbose
         self._record_args = flags.function_signature
         self._check_attributes = flags.check_attributes
@@ -759,6 +766,7 @@ def __init__(self, run_list, flags, scrubber_args, path):
         )
         self._scrubber_args = scrubber_args
         self._path = path
+        self._ginfo = ginfo
         # Strip double-quotes if input was read by UTC_ARGS
         self._replace_value_regex = list(
             map(lambda x: x.strip('"'), flags.replace_value_regex)
@@ -793,10 +801,10 @@ def global_var_dict(self):
     def is_filtered(self):
         return bool(self._filters)
 
-    def process_run_line(
-        self, function_re, scrubber, raw_tool_output, prefixes, is_backend
-    ):
-        build_global_values_dictionary(self._global_var_dict, raw_tool_output, prefixes)
+    def process_run_line(self, function_re, scrubber, raw_tool_output, prefixes):
+        build_global_values_dictionary(
+            self._global_var_dict, raw_tool_output, prefixes, self._ginfo
+        )
         for m in function_re.finditer(raw_tool_output):
             if not m:
                 continue
@@ -806,7 +814,7 @@ def process_run_line(
             # beginning of assembly function definition. In most assemblies, that is just a
             # colon: `foo:`. But, for example, in nvptx it is a brace: `foo(`. If is_backend is
             # False, just assume that separator is an empty string.
-            if is_backend:
+            if self._ginfo.is_asm():
                 # Use ':' as default separator.
                 func_name_separator = (
                     m.group("func_name_separator")
@@ -889,7 +897,6 @@ def process_run_line(
                             funcdef_attrs_and_ret,
                             args_and_sig,
                             attrs,
-                            is_backend,
                         ):
                             self._func_dict[prefix][func].scrub = scrubbed_extra
                             self._func_dict[prefix][func].args_and_sig = args_and_sig
@@ -908,6 +915,7 @@ def process_run_line(
                             args_and_sig,
                             attrs,
                             func_name_separator,
+                            self._ginfo,
                         )
                         self._func_order[prefix].append(func)
                     else:
@@ -948,6 +956,12 @@ def get_failed_prefixes(self):
 
 
 class NamelessValue:
+    """
+    A NamelessValue object represents a type of value in the IR whose "name" we
+    generalize in the generated check lines; where the "name" could be an actual
+    name (as in e.g. `@some_global` or `%x`) or just a number (as in e.g. `%12`
+    or `!4`).
+    """
     def __init__(
         self,
         check_prefix,
@@ -960,12 +974,14 @@ def __init__(
         is_number=False,
         replace_number_with_counter=False,
         match_literally=False,
-        interlaced_with_previous=False
+        interlaced_with_previous=False,
+        ir_suffix=r"",
     ):
         self.check_prefix = check_prefix
         self.check_key = check_key
         self.ir_prefix = ir_prefix
         self.ir_regexp = ir_regexp
+        self.ir_suffix = ir_suffix
         self.global_ir_rhs_regexp = global_ir_rhs_regexp
         self.is_before_functions = is_before_functions
         self.is_number = is_number
@@ -976,15 +992,10 @@ def __init__(
         self.interlaced_with_previous = interlaced_with_previous
         self.variable_mapping = {}
 
-    # Return true if this kind of IR value is "local", basically if it matches '%{{.*}}'.
+    # Return true if this kind of IR value is defined "locally" to functions,
+    # which we assume is only the case precisely for LLVM IR local values.
     def is_local_def_ir_value(self):
-        return self.ir_prefix == "%"
-
-    # Return the IR prefix and check prefix we use for this kind or IR value,
-    # e.g., (%, TMP) for locals. If the IR prefix is a regex, return the prefix
-    # used in the IR output
-    def get_ir_prefix_from_ir_value_match(self, match):
-        return re.search(self.ir_prefix, match[0])[0], self.check_prefix
+        return self.check_key == "%"
 
     # Return the IR regexp we use for this kind or IR value, e.g., [\w.-]+? for locals
     def get_ir_regex(self):
@@ -1019,205 +1030,216 @@ def get_value_name(self, var: str, check_prefix: str):
         var = var.replace("-", "_")
         return var.upper()
 
-    # Create a FileCheck variable from regex.
-    def get_value_definition(self, var, match):
-        # for backwards compatibility we check locals with '.*'
-        varname = self.get_value_name(var, self.check_prefix)
-        prefix = self.get_ir_prefix_from_ir_value_match(match)[0]
-        if self.is_number:
-            regex = ""  # always capture a number in the default format
-            capture_start = "[[#"
-        else:
-            regex = self.get_ir_regex()
-            capture_start = "[["
-        if self.is_local_def_ir_value():
-            return capture_start + varname + ":" + prefix + regex + "]]"
-        return prefix + capture_start + varname + ":" + regex + "]]"
-
-    # Use a FileCheck variable.
-    def get_value_use(self, var, match, var_prefix=None):
-        if var_prefix is None:
-            var_prefix = self.check_prefix
-        capture_start = "[[#" if self.is_number else "[["
-        if self.is_local_def_ir_value():
-            return capture_start + self.get_value_name(var, var_prefix) + "]]"
-        prefix = self.get_ir_prefix_from_ir_value_match(match)[0]
-        return prefix + capture_start + self.get_value_name(var, var_prefix) + "]]"
-
-
-# Description of the different "unnamed" values we match in the IR, e.g.,
-# (local) ssa values, (debug) metadata, etc.
-ir_nameless_values = [
-    #            check_prefix   check_key  ir_prefix           ir_regexp                global_ir_rhs_regexp
-    NamelessValue(r"TMP", "%", r"%", r"[\w$.-]+?", None),
-    NamelessValue(r"ATTR", "#", r"#", r"[0-9]+", None),
-    NamelessValue(r"ATTR", "#", r"attributes #", r"[0-9]+", r"{[^}]*}"),
-    NamelessValue(r"GLOB", "@", r"@", r"[0-9]+", None),
-    NamelessValue(r"GLOB", "@", r"@", r"[0-9]+", r".+", is_before_functions=True),
-    NamelessValue(
-        r"GLOBNAMED",
-        "@",
-        r"@",
-        r"[a-zA-Z0-9_$\"\\.-]*[a-zA-Z_$\"\\.-][a-zA-Z0-9_$\"\\.-]*",
-        r".+",
-        is_before_functions=True,
-        match_literally=True,
-        interlaced_with_previous=True,
-    ),
-    NamelessValue(r"DBG", "!", r"!dbg ", r"![0-9]+", None),
-    NamelessValue(r"DIASSIGNID", "!", r"!DIAssignID ", r"![0-9]+", None),
-    NamelessValue(r"PROF", "!", r"!prof ", r"![0-9]+", None),
-    NamelessValue(r"TBAA", "!", r"!tbaa ", r"![0-9]+", None),
-    NamelessValue(r"TBAA_STRUCT", "!", r"!tbaa.struct ", r"![0-9]+", None),
-    NamelessValue(r"RNG", "!", r"!range ", r"![0-9]+", None),
-    NamelessValue(r"LOOP", "!", r"!llvm.loop ", r"![0-9]+", None),
-    NamelessValue(r"META", "!", r"metadata ", r"![0-9]+", None),
-    NamelessValue(r"META", "!", r"", r"![0-9]+", r"(?:distinct |)!.*"),
-    NamelessValue(r"ACC_GRP", "!", r"!llvm.access.group ", r"![0-9]+", None),
-    NamelessValue(r"META", "!", r"![a-z.]+ ", r"![0-9]+", None),
-]
+    def get_affixes_from_match(self, match: re.Match):
+        prefix = re.match(self.ir_prefix, match.group(2)).group(0)
+        suffix = re.search(self.ir_suffix + "$", match.group(2)).group(0)
+        return prefix, suffix
 
-global_nameless_values = [
-    nameless_value
-    for nameless_value in ir_nameless_values
-    if nameless_value.global_ir_rhs_regexp is not None
-]
-# global variable names should be matched literally
-global_nameless_values_w_unstable_ids = [
-    nameless_value
-    for nameless_value in global_nameless_values
-    if not nameless_value.match_literally
-]
 
-asm_nameless_values = [
-    NamelessValue(
-        r"MCINST",
-        "Inst#",
-        "<MCInst #",
-        r"\d+",
-        r".+",
-        is_number=True,
-        replace_number_with_counter=True,
-    ),
-    NamelessValue(
-        r"MCREG",
-        "Reg:",
-        "<MCOperand Reg:",
-        r"\d+",
-        r".+",
-        is_number=True,
-        replace_number_with_counter=True,
-    ),
-]
+class GeneralizerInfo:
+    """
+    A GeneralizerInfo object holds information about how check lines should be generalized
+    (e.g., variable names replaced by FileCheck meta variables) as well as per-test-file
+    state (e.g. information about IR global variables).
+    """
 
-analyze_nameless_values = [
-    NamelessValue(
-        r"GRP",
-        "#",
-        r"",
-        r"0x[0-9a-f]+",
-        None,
-        replace_number_with_counter=True,
-    ),
-]
+    MODE_IR = 0
+    MODE_ASM = 1
+    MODE_ANALYZE = 2
 
+    def __init__(
+        self,
+        version,
+        mode,
+        nameless_values: List[NamelessValue],
+        regexp_prefix,
+        regexp_suffix,
+    ):
+        self._version = version
+        self._mode = mode
+        self._nameless_values = nameless_values
+
+        self._regexp_prefix = regexp_prefix
+        self._regexp_suffix = regexp_suffix
+
+        self._regexp, _ = self._build_regexp(False, False)
+        (
+            self._unstable_globals_regexp,
+            self._unstable_globals_values,
+        ) = self._build_regexp(True, True)
+
+    def _build_regexp(self, globals_only, unstable_only):
+        matches = []
+        values = []
+        for nameless_value in self._nameless_values:
+            is_global = nameless_value.global_ir_rhs_regexp is not None
+            if globals_only and not is_global:
+                continue
+            if unstable_only and nameless_value.match_literally:
+                continue
 
-def createOrRegexp(old, new):
-    if not old:
-        return new
-    if not new:
-        return old
-    return old + "|" + new
-
-
-def createPrefixMatch(prefix_str, prefix_re):
-    return "(?:" + prefix_str + "(" + prefix_re + "))"
-
-
-# Build the regexp that matches an "IR value". This can be a local variable,
-# argument, global, or metadata, anything that is "named". It is important that
-# the PREFIX and SUFFIX below only contain a single group, if that changes
-# other locations will need adjustment as well.
-IR_VALUE_REGEXP_PREFIX = r"(\s*)"
-IR_VALUE_REGEXP_STRING = r""
-for nameless_value in ir_nameless_values:
-    match = createPrefixMatch(nameless_value.ir_prefix, nameless_value.ir_regexp)
-    if nameless_value.global_ir_rhs_regexp is not None:
-        match = "^" + match
-    IR_VALUE_REGEXP_STRING = createOrRegexp(IR_VALUE_REGEXP_STRING, match)
-IR_VALUE_REGEXP_SUFFIX = r"([,\s\(\)\}]|\Z)"
-IR_VALUE_RE = re.compile(
-    IR_VALUE_REGEXP_PREFIX
-    + r"("
-    + IR_VALUE_REGEXP_STRING
-    + r")"
-    + IR_VALUE_REGEXP_SUFFIX
-)
+            match = f"(?:{nameless_value.ir_prefix}({nameless_value.ir_regexp}){nameless_value.ir_suffix})"
+            if self.is_ir() and not globals_only and is_global:
+                match = "^" + match
+            matches.append(match)
+            values.append(nameless_value)
 
-GLOBAL_VALUE_REGEXP_STRING = r""
-for nameless_value in global_nameless_values_w_unstable_ids:
-    match = createPrefixMatch(nameless_value.ir_prefix, nameless_value.ir_regexp)
-    GLOBAL_VALUE_REGEXP_STRING = createOrRegexp(GLOBAL_VALUE_REGEXP_STRING, match)
-GLOBAL_VALUE_RE = re.compile(
-    IR_VALUE_REGEXP_PREFIX
-    + r"("
-    + GLOBAL_VALUE_REGEXP_STRING
-    + r")"
-    + IR_VALUE_REGEXP_SUFFIX
-)
+        regexp_string = r"|".join(matches)
 
-# Build the regexp that matches an "ASM value" (currently only for --asm-show-inst comments).
-ASM_VALUE_REGEXP_STRING = ""
-for nameless_value in asm_nameless_values:
-    match = createPrefixMatch(nameless_value.ir_prefix, nameless_value.ir_regexp)
-    ASM_VALUE_REGEXP_STRING = createOrRegexp(ASM_VALUE_REGEXP_STRING, match)
-ASM_VALUE_REGEXP_SUFFIX = r"([>\s]|\Z)"
-ASM_VALUE_RE = re.compile(
-    r"((?:#|//)\s*)" + "(" + ASM_VALUE_REGEXP_STRING + ")" + ASM_VALUE_REGEXP_SUFFIX
-)
+        return (
+            re.compile(
+                self._regexp_prefix + r"(" + regexp_string + r")" + self._regexp_suffix
+            ),
+            values,
+        )
 
-ANALYZE_VALUE_REGEXP_PREFIX = r"(\s*)"
-ANALYZE_VALUE_REGEXP_STRING = r""
-for nameless_value in analyze_nameless_values:
-    match = createPrefixMatch(nameless_value.ir_prefix, nameless_value.ir_regexp)
-    ANALYZE_VALUE_REGEXP_STRING = createOrRegexp(ANALYZE_VALUE_REGEXP_STRING, match)
-ANALYZE_VALU...
[truncated]

@nikic
Copy link
Contributor

nikic commented Apr 17, 2024

Aside from that, the key challenge is that block labels are values, and we typically capture values including the prefix '%'. However, when labels appear at the start of a basic block, the prefix '%' is not included, so we must capture block label values without the prefix '%'.

Just to throw it out there: Might it make sense to generally not include the prefix % in version 5? That is, use %[[FOO:.*]] instead of [[FOO:%.*]]?

@nhaehnle
Copy link
Collaborator Author

Aside from that, the key challenge is that block labels are values, and we typically capture values including the prefix '%'. However, when labels appear at the start of a basic block, the prefix '%' is not included, so we must capture block label values without the prefix '%'.

Just to throw it out there: Might it make sense to generally not include the prefix % in version 5? That is, use %[[FOO:.*]] instead of [[FOO:%.*]]?

I vaguely recall thinking about this way back when I wrote the first version of this change. It will lead to check lines like:

  %[[TMP10:.*]] = add i32 %[[TMP4]], %[[TMP5]]

... where today we have ...

  [[TMP10:%.*]] = add i32 [[TMP4]], [[TMP5]]

I don't feel very strongly about it, but I prefer the second version (i.e., keep it as-is) because the right-hand side has less visual noise.

The tool verifies that the clang command line contains "-emit-llvm", and
the asm module is never imported, so this code must be dead.
This change is motivated primarily by the desire to change the list of
"NamelessValue" (not a great name) objects based on the --version passed
to update_test_checks.

In order to be able to use the version number e.g. when building up
regular expressions, but also to just reduce the number of surprises
caused by global state, this change introduces the GeneralizerInfo class
which is meant to hold a bunch of information about how to generalize
check lines.

The GeneralizerInfo must be instantiated by the "driver" script
(update_*_test_checks.py) and is passed around various common functions.

Beware that the GeneralizerInfo objects are *not* stateless. The
NamelessValue objects currently hold mapping information for "global"
values.
Labels are matched using a regexp of the form '^(pattern):', which
requires the addition of a "suffix" concept to NamelessValue.

Aside from that, the key challenge is that block labels are values, and
we typically capture values including the prefix '%'. However, when
labels appear at the start of a basic block, the prefix '%' is not
included, so we must capture block label values *without* the prefix
'%'.

We don't know ahead of time whether an IR value is a label or not. In
most cases, they are prefixed by the word "label" (their type), but this
isn't the case in phi nodes. We solve this issue by leveraging the
two-phase nature of variable generalization: the first pass finds
all occurences of a variable and determines whether the '%' prefix can
be included or not. The second pass does the actual substitution.

This change also unifies the generalization path for assembly with that
for IR and analysis, in the hope that any future changes avoid diverging
those cases future.

I also considered the alternative of trying to detect the phi node case
using more regular expression special cases but ultimately decided
against that because it seemed more fragile, and perhaps the approach
of keeping a tentative prefix that may later be discarded could also be
eventually applied to some metadata and attribute cases.

Note that an early version of this change was reviewed
as https://reviews.llvm.org/D142452, before version numbers were
introduced. This is a substantially updated version of that change.
@nhaehnle
Copy link
Collaborator Author

ping

@nhaehnle
Copy link
Collaborator Author

Another ping... it's been a couple of weeks.

Copy link
Member

@hnrklssn hnrklssn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I thought I'd reviewed already! LGTM

@hnrklssn
Copy link
Member

We solve this issue by leveraging the two-phase nature of variable generalization: the first pass finds all occurences of a variable and determines whether the '%' prefix can be included or not.

I'd be fine with simply never including the '%' prefix in the match, but it's a pretty breaking change so I'm not opposed to keeping the existing behaviour when possible.

@nhaehnle nhaehnle merged commit 597ac47 into llvm:main May 18, 2024
4 checks passed
@nhaehnle nhaehnle deleted the pub-utc-block-labels branch May 18, 2024 23:39
qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Jun 5, 2024
…a339769fe

Local branch amd-gfx e36a339 Merged main:4c98f5b439ddd204d8ff1e423104215ebd0e1720 into amd-gfx:bfb7a657b78d
Remote branch main 597ac47 update_test_checks: match IR basic block labels (llvm#88979)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants