-
Notifications
You must be signed in to change notification settings - Fork 15k
[Utils][NFC] Clean up update_mc_test_checks.py. #164454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: users/kosarev/support-updating-roundtrip-tests
Are you sure you want to change the base?
[Utils][NFC] Clean up update_mc_test_checks.py. #164454
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
@llvm/pr-subscribers-testing-tools Author: Ivan Kosarev (kosarev) ChangesRefine the code a bit to make it easier to comprehend the logic. Full diff: https://github.com/llvm/llvm-project/pull/164454.diff 1 Files Affected:
diff --git a/llvm/utils/update_mc_test_checks.py b/llvm/utils/update_mc_test_checks.py
index c830833a11a14..fb2f7e0b78730 100755
--- a/llvm/utils/update_mc_test_checks.py
+++ b/llvm/utils/update_mc_test_checks.py
@@ -223,9 +223,6 @@ def update_test(ti: common.TestInfo):
testlines = list(dict.fromkeys(testlines))
common.debug("Valid test line found: ", len(testlines))
- run_list_size = len(run_list)
- testnum = len(testlines)
-
raw_output = []
raw_prefixes = []
for (
@@ -267,14 +264,12 @@ def update_test(ti: common.TestInfo):
prefix_set = set([prefix for p in run_list for prefix in p[0]])
common.debug("Rewriting FileCheck prefixes:", str(prefix_set))
- for test_id in range(testnum):
- input_line = testlines[test_id]
-
+ for test_id, input_line in enumerate(testlines):
# a {prefix : output, [runid] } dict
# insert output to a prefix-key dict, and do a max sorting
# to select the most-used prefix which share the same output string
p_dict = {}
- for run_id in range(run_list_size):
+ for run_id in range(len(run_list)):
out = raw_output[run_id][test_id]
if hasErr(out):
@@ -282,45 +277,34 @@ def update_test(ti: common.TestInfo):
else:
o = getOutputString(out)
- prefixes = raw_prefixes[run_id]
-
- for p in prefixes:
+ for p in raw_prefixes[run_id]:
if p not in p_dict:
p_dict[p] = o, [run_id]
- else:
- if p_dict[p] == (None, []):
- continue
+ continue
- prev_o, run_ids = p_dict[p]
- if o == prev_o:
- run_ids.append(run_id)
- p_dict[p] = o, run_ids
- else:
- # conflict, discard
- p_dict[p] = None, []
+ if p_dict[p] == (None, []):
+ continue
- p_dict_sorted = dict(sorted(p_dict.items(), key=lambda item: -len(item[1][1])))
+ prev_o, run_ids = p_dict[p]
+ if o == prev_o:
+ run_ids.append(run_id)
+ p_dict[p] = o, run_ids
+ else:
+ # conflict, discard
+ p_dict[p] = None, []
# prefix is selected and generated with most shared output lines
# each run_id can only be used once
- used_runid = set()
-
+ used_run_ids = set()
selected_prefixes = set()
- for prefix, tup in p_dict_sorted.items():
- o, run_ids = tup
-
- if len(run_ids) == 0:
- continue
-
- skip = False
- for i in run_ids:
- if i in used_runid:
- skip = True
- else:
- used_runid.add(i)
- if not skip:
+ get_num_runs = lambda item: len(item[1][1])
+ p_dict_sorted = sorted(p_dict.items(), key=get_num_runs, reverse=True)
+ for prefix, (o, run_ids) in p_dict_sorted:
+ if run_ids and used_run_ids.isdisjoint(run_ids):
selected_prefixes.add(prefix)
+ used_run_ids.update(run_ids)
+
# Generate check lines in alphabetical order.
check_lines = []
for prefix in sorted(selected_prefixes):
|
| if run_ids and used_run_ids.isdisjoint(run_ids): | ||
| selected_prefixes.add(prefix) | ||
|
|
||
| used_run_ids.update(run_ids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Don't know whether its more efficient to put used_run_ids.add(run_ids) inside the if or .update outside it. Probably doesn't matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that would be a functional change. The original code adds all the ids regardless of whether there are common ones, though I'm not convinced it's perfectly correct. Maybe subject to further work.
Performance-wise, I don't see how this would make any noticeable difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm misunderstanding. used_run_ids.update(run_ids) does not add items in the argument to the object if they are already present (no duplicates in used_run_ids).
This seems to achieve the same thing as the following code, which is to not add duplicates.
if i in used_runid:
skip = True
else:
used_runid.add(i)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how this is relevant.
What I'm saying is, putting the used_run_ids.update(run_ids) under the if run_ids and used_run_ids.isdisjoint(run_ids) as you suggest would be a functional change, because the original code updates used_run_ids with run_ids regardless of whether they were disjoint.
So given we have something like p_dict = {'GFX8': (..., [1, 2, 3]), 'GFX9': (..., [3, 4, 5]), 'GFX10': (..., [4, 5])}, we currently should end up having GFX8 as the only member of selected_prefixes, whereas with the update() under the if it would contain GFX8 and GFX10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be case that the current run_ids list and the used_run_ids is not completely disjoint. In this case we will not generate the prefix check string, but still we want to mark these run_id as used.
This is actually an error case. Noted that one run_id can only generate one output, when duplicated run_id is saw in the prefix iteration, it means there are other run_id shares the same output, but with different prefix.
An example be like:
RUNLINE1 -check-prefix=A
RUNLINE2 -check-prefix=A,B
RUNLINE3 -check-prefix=B
runline1,2,3 are the same runline and generate the same output. The dictionary be like:
A: output, [RUNLINE1, RUNLINE2]
B: output, [RUNLINE2, RUNLINE3]
RUNLINE2 will be a duplicated run_id. If you generate check-string for both A and B prefix, the llvm-lit will fail on RUNLINE2, since it will expects two check-string from RUNLINE2, but one runline only generates one output.
The current code will not generate check-string for B. But I agree we should throw an error here like what the other update_script does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should throw an error here like what the other update_script does
Yep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for the explanation. 'run_ids list and the used_run_ids is not completely disjoint' This was the confusion I was having. Still LGTM
a649818 to
6f52ae6
Compare
3959b3b to
42b3eda
Compare
6f52ae6 to
08f1cd2
Compare
Refine the code a bit to make it easier to comprehend the logic.
42b3eda to
548424c
Compare
08f1cd2 to
61a3a85
Compare

Refine the code a bit to make it easier to comprehend the logic.