56 changes: 20 additions & 36 deletions llvm/utils/update_mc_test_checks.py
@@ -212,9 +212,6 @@ def update_test(ti: common.TestInfo):
     testlines = list(dict.fromkeys(testlines))
     common.debug("Valid test line found: ", len(testlines))
 
-    run_list_size = len(run_list)
-    testnum = len(testlines)
-
     raw_output = []
     raw_prefixes = []
     for (
@@ -256,60 +253,47 @@ def update_test(ti: common.TestInfo):
     prefix_set = set([prefix for p in run_list for prefix in p[0]])
     common.debug("Rewriting FileCheck prefixes:", str(prefix_set))
 
-    for test_id in range(testnum):
-        input_line = testlines[test_id]
-
+    for test_id, input_line in enumerate(testlines):
         # a {prefix : output, [runid] } dict
         # insert output to a prefix-key dict, and do a max sorting
         # to select the most-used prefix which share the same output string
         p_dict = {}
-        for run_id in range(run_list_size):
+        for run_id in range(len(run_list)):
             out = raw_output[run_id][test_id]
 
             if hasErr(out):
                 o = getErrString(out)
             else:
                 o = getOutputString(out)
 
-            prefixes = raw_prefixes[run_id]
-
-            for p in prefixes:
+            for p in raw_prefixes[run_id]:
                 if p not in p_dict:
                     p_dict[p] = o, [run_id]
-                else:
-                    if p_dict[p] == (None, []):
-                        continue
+                    continue
 
-                    prev_o, run_ids = p_dict[p]
-                    if o == prev_o:
-                        run_ids.append(run_id)
-                        p_dict[p] = o, run_ids
-                    else:
-                        # conflict, discard
-                        p_dict[p] = None, []
+                if p_dict[p] == (None, []):
+                    continue
 
-        p_dict_sorted = dict(sorted(p_dict.items(), key=lambda item: -len(item[1][1])))
+                prev_o, run_ids = p_dict[p]
+                if o == prev_o:
+                    run_ids.append(run_id)
+                    p_dict[p] = o, run_ids
+                else:
+                    # conflict, discard
+                    p_dict[p] = None, []
 
         # prefix is selected and generated with most shared output lines
         # each run_id can only be used once
-        used_runid = set()
-
+        used_run_ids = set()
         selected_prefixes = set()
-        for prefix, tup in p_dict_sorted.items():
-            o, run_ids = tup
-
-            if len(run_ids) == 0:
-                continue
-
-            skip = False
-            for i in run_ids:
-                if i in used_runid:
-                    skip = True
-                else:
-                    used_runid.add(i)
-            if not skip:
+        get_num_runs = lambda item: len(item[1][1])
+        p_dict_sorted = sorted(p_dict.items(), key=get_num_runs, reverse=True)
+        for prefix, (o, run_ids) in p_dict_sorted:
+            if run_ids and used_run_ids.isdisjoint(run_ids):
                 selected_prefixes.add(prefix)
 
+            used_run_ids.update(run_ids)
Contributor:

Nit: I don't know whether it's more efficient to put used_run_ids.update(run_ids) inside the if or leave it outside. Probably doesn't matter.

Collaborator Author:

I believe that would be a functional change. The original code adds all the ids regardless of whether there are common ones, though I'm not convinced it's perfectly correct. Maybe subject to further work.

Performance-wise, I don't see how this would make any noticeable difference.

Contributor:

Maybe I'm misunderstanding. used_run_ids.update(run_ids) does not add items from the argument that are already present in the set (there are no duplicates in used_run_ids).
This seems to achieve the same thing as the following code, which also avoids adding duplicates:

                if i in used_runid:
                    skip = True
                else:
                    used_runid.add(i)

Collaborator Author:

I don't see how this is relevant.

What I'm saying is, putting the used_run_ids.update(run_ids) under the if run_ids and used_run_ids.isdisjoint(run_ids) as you suggest would be a functional change, because the original code updates used_run_ids with run_ids regardless of whether they were disjoint.

So given something like p_dict = {'GFX8': (..., [1, 2, 3]), 'GFX9': (..., [3, 4, 5]), 'GFX10': (..., [4, 5])}, we should currently end up with GFX8 as the only member of selected_prefixes, whereas with the update() under the if it would contain GFX8 and GFX10.
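
For illustration, here is a minimal standalone Python sketch (not part of the patch) of that scenario. The output values are placeholders (None), only the run-id lists matter, and the update_inside_if flag is a hypothetical switch between the two variants discussed above.

def select(p_dict, update_inside_if):
    # Sort prefixes by descending number of run ids, as the new code does.
    p_dict_sorted = sorted(p_dict.items(), key=lambda item: len(item[1][1]), reverse=True)
    used_run_ids = set()
    selected_prefixes = set()
    for prefix, (_o, run_ids) in p_dict_sorted:
        if run_ids and used_run_ids.isdisjoint(run_ids):
            selected_prefixes.add(prefix)
            if update_inside_if:
                used_run_ids.update(run_ids)
        if not update_inside_if:
            # current behaviour: mark the ids as used even when the prefix is skipped
            used_run_ids.update(run_ids)
    return selected_prefixes

p_dict = {"GFX8": (None, [1, 2, 3]), "GFX9": (None, [3, 4, 5]), "GFX10": (None, [4, 5])}
print(select(p_dict, update_inside_if=False))  # {'GFX8'}: update outside the if (this patch)
print(select(p_dict, update_inside_if=True))   # {'GFX8', 'GFX10'}: update moved under the if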

Contributor (@broxigarchen, Oct 22, 2025):

There can be a case where the current run_ids list and used_run_ids are not completely disjoint. In that case we will not generate the prefix check string, but we still want to mark these run_ids as used.

This is actually an error case. Note that one run_id can only generate one output, so when a duplicated run_id is seen in the prefix iteration, it means another run_id shares the same output but under a different prefix.

An example:

RUNLINE1 -check-prefix=A
RUNLINE2 -check-prefix=A,B
RUNLINE3 -check-prefix=B

RUNLINE1, 2 and 3 are the same RUN line and generate the same output. The dictionary looks like:

A:  output, [RUNLINE1, RUNLINE2]
B:  output, [RUNLINE2, RUNLINE3]

RUNLINE2 will be a duplicated run_id. If you generate check strings for both the A and B prefixes, llvm-lit will fail on RUNLINE2, since it would expect two check strings from RUNLINE2, but one RUN line only generates one output.

The current code will not generate a check string for B. But I agree we should throw an error here, like the other update scripts do.
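
A small sketch of this example (again not part of the patch), assuming run ids 0, 1 and 2 for RUNLINE1-3 and a placeholder output string, shows that only A is selected and that RUNLINE2's id is still marked as used, so no check string is generated for B:

p_dict = {
    "A": ("output", [0, 1]),  # RUNLINE1, RUNLINE2
    "B": ("output", [1, 2]),  # RUNLINE2, RUNLINE3
}

used_run_ids = set()
selected_prefixes = set()
# Same selection loop as in the patch, applied to a prebuilt p_dict.
for prefix, (o, run_ids) in sorted(p_dict.items(), key=lambda item: len(item[1][1]), reverse=True):
    if run_ids and used_run_ids.isdisjoint(run_ids):
        selected_prefixes.add(prefix)
    used_run_ids.update(run_ids)

print(selected_prefixes)  # {'A'}: B shares run id 1 (RUNLINE2) with A and is skipped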

Collaborator Author:

> we should throw an error here, like the other update scripts do

Yep.

Contributor:

I see, thanks for the explanation. The case where run_ids and used_run_ids are not completely disjoint was the confusion I was having. Still LGTM.


         # Generate check lines in alphabetical order.
         check_lines = []
         for prefix in sorted(selected_prefixes):