Skip to content

Conversation

@kosarev
Copy link
Collaborator

@kosarev kosarev commented Oct 21, 2025

Refine the code a bit to make it easier to comprehend the logic.

Copy link
Collaborator Author

kosarev commented Oct 21, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@llvmbot
Copy link
Member

llvmbot commented Oct 21, 2025

@llvm/pr-subscribers-testing-tools

Author: Ivan Kosarev (kosarev)

Changes

Refine the code a bit to make it easier to comprehend the logic.


Full diff: https://github.com/llvm/llvm-project/pull/164454.diff

1 Files Affected:

  • (modified) llvm/utils/update_mc_test_checks.py (+20-36)
diff --git a/llvm/utils/update_mc_test_checks.py b/llvm/utils/update_mc_test_checks.py
index c830833a11a14..fb2f7e0b78730 100755
--- a/llvm/utils/update_mc_test_checks.py
+++ b/llvm/utils/update_mc_test_checks.py
@@ -223,9 +223,6 @@ def update_test(ti: common.TestInfo):
     testlines = list(dict.fromkeys(testlines))
     common.debug("Valid test line found: ", len(testlines))
 
-    run_list_size = len(run_list)
-    testnum = len(testlines)
-
     raw_output = []
     raw_prefixes = []
     for (
@@ -267,14 +264,12 @@ def update_test(ti: common.TestInfo):
     prefix_set = set([prefix for p in run_list for prefix in p[0]])
     common.debug("Rewriting FileCheck prefixes:", str(prefix_set))
 
-    for test_id in range(testnum):
-        input_line = testlines[test_id]
-
+    for test_id, input_line in enumerate(testlines):
         # a {prefix : output, [runid] } dict
         # insert output to a prefix-key dict, and do a max sorting
         # to select the most-used prefix which share the same output string
         p_dict = {}
-        for run_id in range(run_list_size):
+        for run_id in range(len(run_list)):
             out = raw_output[run_id][test_id]
 
             if hasErr(out):
@@ -282,45 +277,34 @@ def update_test(ti: common.TestInfo):
             else:
                 o = getOutputString(out)
 
-            prefixes = raw_prefixes[run_id]
-
-            for p in prefixes:
+            for p in raw_prefixes[run_id]:
                 if p not in p_dict:
                     p_dict[p] = o, [run_id]
-                else:
-                    if p_dict[p] == (None, []):
-                        continue
+                    continue
 
-                    prev_o, run_ids = p_dict[p]
-                    if o == prev_o:
-                        run_ids.append(run_id)
-                        p_dict[p] = o, run_ids
-                    else:
-                        # conflict, discard
-                        p_dict[p] = None, []
+                if p_dict[p] == (None, []):
+                    continue
 
-        p_dict_sorted = dict(sorted(p_dict.items(), key=lambda item: -len(item[1][1])))
+                prev_o, run_ids = p_dict[p]
+                if o == prev_o:
+                    run_ids.append(run_id)
+                    p_dict[p] = o, run_ids
+                else:
+                    # conflict, discard
+                    p_dict[p] = None, []
 
         # prefix is selected and generated with most shared output lines
         # each run_id can only be used once
-        used_runid = set()
-
+        used_run_ids = set()
         selected_prefixes = set()
-        for prefix, tup in p_dict_sorted.items():
-            o, run_ids = tup
-
-            if len(run_ids) == 0:
-                continue
-
-            skip = False
-            for i in run_ids:
-                if i in used_runid:
-                    skip = True
-                else:
-                    used_runid.add(i)
-            if not skip:
+        get_num_runs = lambda item: len(item[1][1])
+        p_dict_sorted = sorted(p_dict.items(), key=get_num_runs, reverse=True)
+        for prefix, (o, run_ids) in p_dict_sorted:
+            if run_ids and used_run_ids.isdisjoint(run_ids):
                 selected_prefixes.add(prefix)
 
+            used_run_ids.update(run_ids)
+
         # Generate check lines in alphabetical order.
         check_lines = []
         for prefix in sorted(selected_prefixes):

if run_ids and used_run_ids.isdisjoint(run_ids):
selected_prefixes.add(prefix)

used_run_ids.update(run_ids)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Don't know whether its more efficient to put used_run_ids.add(run_ids) inside the if or .update outside it. Probably doesn't matter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that would be a functional change. The original code adds all the ids regardless of whether there are common ones, though I'm not convinced it's perfectly correct. Maybe subject to further work.

Performance-wise, I don't see how this would make any noticeable difference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm misunderstanding. used_run_ids.update(run_ids) does not add items in the argument to the object if they are already present (no duplicates in used_run_ids).
This seems to achieve the same thing as the following code, which is to not add duplicates.

                if i in used_runid:
                    skip = True
                else:
                    used_runid.add(i)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this is relevant.

What I'm saying is, putting the used_run_ids.update(run_ids) under the if run_ids and used_run_ids.isdisjoint(run_ids) as you suggest would be a functional change, because the original code updates used_run_ids with run_ids regardless of whether they were disjoint.

So given we have something like p_dict = {'GFX8': (..., [1, 2, 3]), 'GFX9': (..., [3, 4, 5]), 'GFX10': (..., [4, 5])}, we currently should end up having GFX8 as the only member of selected_prefixes, whereas with the update() under the if it would contain GFX8 and GFX10.

Copy link
Contributor

@broxigarchen broxigarchen Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be case that the current run_ids list and the used_run_ids is not completely disjoint. In this case we will not generate the prefix check string, but still we want to mark these run_id as used.

This is actually an error case. Noted that one run_id can only generate one output, when duplicated run_id is saw in the prefix iteration, it means there are other run_id shares the same output, but with different prefix.

An example be like:

RUNLINE1 -check-prefix=A
RUNLINE2 -check-prefix=A,B
RUNLINE3 -check-prefix=B

runline1,2,3 are the same runline and generate the same output. The dictionary be like:

A:  output, [RUNLINE1, RUNLINE2]
B:  output, [RUNLINE2, RUNLINE3]

RUNLINE2 will be a duplicated run_id. If you generate check-string for both A and B prefix, the llvm-lit will fail on RUNLINE2, since it will expects two check-string from RUNLINE2, but one runline only generates one output.

The current code will not generate check-string for B. But I agree we should throw an error here like what the other update_script does

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should throw an error here like what the other update_script does

Yep.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the explanation. 'run_ids list and the used_run_ids is not completely disjoint' This was the confusion I was having. Still LGTM

@kosarev kosarev force-pushed the users/kosarev/support-updating-roundtrip-tests branch from a649818 to 6f52ae6 Compare October 22, 2025 11:01
@kosarev kosarev force-pushed the users/kosarev/cleanup-mc-script branch 2 times, most recently from 3959b3b to 42b3eda Compare October 23, 2025 10:23
@kosarev kosarev force-pushed the users/kosarev/support-updating-roundtrip-tests branch from 6f52ae6 to 08f1cd2 Compare October 23, 2025 10:23
Refine the code a bit to make it easier to comprehend the logic.
@kosarev kosarev force-pushed the users/kosarev/cleanup-mc-script branch from 42b3eda to 548424c Compare October 24, 2025 13:15
@kosarev kosarev force-pushed the users/kosarev/support-updating-roundtrip-tests branch from 08f1cd2 to 61a3a85 Compare October 24, 2025 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants