Skip to content

gh-119352 Add UOp Pair Compatibility to PyStats#119354

Closed
JeffersGlass wants to merge 33 commits intopython:mainfrom
JeffersGlass:supernodes-input-compat
Closed

gh-119352 Add UOp Pair Compatibility to PyStats#119354
JeffersGlass wants to merge 33 commits intopython:mainfrom
JeffersGlass:supernodes-input-compat

Conversation

@JeffersGlass
Copy link
Contributor

@JeffersGlass JeffersGlass commented May 21, 2024

This PR adds functionality to PyStats-enabled builds, specifically to the UOp pair-count section emitted when the interpreter is build with the Tier 2 Interpreter. It adds another column to the pair-count table, indicating whether the two instructions have overlapping uses of oparg, operand, and target. This makes it easy to tell which pairs can be immediately adapted to become superinstructions, and which would require a modified instruction format.

See below for an example of this output - the Compatibility column in the far right is the added column.

Closes #119352

Pair Count Self Cumulative Compatibility
_LOAD_FAST_0 _GUARD_TYPE_VERSION 62,941,340 7.6% 7.6% No Conflicts
_CHECK_MANAGED_OBJECT_HAS_VALUES _LOAD_ATTR_INSTANCE_VALUE_0 37,435,120 4.5% 12.2% Conflict. Both Use: Target
_GUARD_TYPE_VERSION _CHECK_MANAGED_OBJECT_HAS_VALUES 37,435,120 4.5% 16.7% Conflict. Both Use: Target
_COPY _TO_BOOL_BOOL 25,148,840 3.0% 19.8% No Conflicts
_GUARD_TYPE_VERSION _EXIT_TRACE 24,884,600 3.0% 22.8% No Conflicts
_PUSH_FRAME _RESUME_CHECK 23,560,800 2.9% 25.6% No Conflicts
_SAVE_RETURN_OFFSET _PUSH_FRAME 23,560,800 2.9% 28.5% No Conflicts
_START_EXECUTOR _SET_IP 21,741,760 2.6% 31.1% Conflict. Both Use: Operand
_SET_IP _LOAD_ATTR 21,680,480 2.6% 33.7% No Conflicts
_RESUME_CHECK _LOAD_FAST_0 21,058,400 2.6% 36.3% No Conflicts
_CHECK_FUNCTION_EXACT_ARGS _CHECK_STACK_SPACE_OPERAND 19,557,660 2.4% 38.7% Conflict. Both Use: Operand,Target
_INIT_CALL_PY_EXACT_ARGS_0 _SAVE_RETURN_OFFSET 19,042,580 2.3% 41.0% No Conflicts
... ... ... ... ...

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started reviewing this but realized I don't know the summarize_stats.py script well enough to review that -- I think it's probably up to @mdboom to review that.

The cases_generator changes and the new generated output LGTM.

@gvanrossum gvanrossum requested a review from mdboom May 21, 2024 20:13
Copy link
Contributor

@mdboom mdboom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, and thanks for working on this. Just some stylistic changes mostly. The example table in the PR was helpful to understand how this looks -- it's going to be super helpful.

line = line[len(start) :]
name, val = line.split()
defines[int(val.strip())].append(name.strip())
defines[int(val.strip().strip("()"))].append(name.strip())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defines[int(val.strip().strip("()"))].append(name.strip())
defines[int(val.strip().removesuffix("()"))].append(name.strip())

This is probably safer, unless there is a need to remove things like (()) that I'm missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defines for the flags look like #define HAS_PASSTHROUGH_FLAG (4096) - this line is to strip the parens around the number. We could do resmovesuffix(")") and removeprefix("(")?

Could also do the whole thing with a regex if that's preferable, but it would be a little unweildy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I misunderstood what this needed to do. I think this is fine.

flag_names: tuple[str] = None,
filepath: str | Path = "Include/internal/pycore_uop_metadata.h",
) -> dict[str, list[str]]:
flags = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I found this function a bit hard to follow (but then again, parsing C with regexes is always that way). It might be helpful to have a header comment with an example of the pattern it's looking for and what the output would look like for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I've added a comment to clarify. In general, I wonder if having something like opcode_metadata_generator.py for the UOp metadata, and spitting out that metadata as Python originally, would be more robust than parsing the C headers, but that might be a significant chunk of work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea, but can/should be a follow-on to this PR. I think parts of this predate having the generator -- it would be a lot cleaner to generate some JSON or something that would be easily parseable.

@JeffersGlass
Copy link
Contributor Author

Thanks for all the comments @mdboom! I’m taking a few days of post-conference R&R with my wife, but I will return to this PR presently and get it into shape.

@mdboom
Copy link
Contributor

mdboom commented May 24, 2024

@JeffersGlass: No worries. I fully support post-conference R&R!

@JeffersGlass
Copy link
Contributor Author

Does this want a NEWS.d entry? No worries either way, just wasn't sure whether Pystats updates in general (or this in particular) merits one.

@JeffersGlass JeffersGlass requested a review from markshannon as a code owner May 28, 2024 15:04
@JeffersGlass JeffersGlass requested a review from mdboom May 28, 2024 15:11
line = line[len(start) :]
name, val = line.split()
defines[int(val.strip())].append(name.strip())
defines[int(val.strip().strip("()"))].append(name.strip())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I misunderstood what this needed to do. I think this is fine.

@markshannon
Copy link
Member

One minor request.
Can you shorten "Conflict. Both Use:" to just "Conflict:"?
The "both" is redundant as we are talking about pairs, and it should result in less line wrapping.

@brandtbucher
Copy link
Member

brandtbucher commented May 30, 2024

I think the ideal conflict calculation for target might be a bit trickier... our uop format is pretty subtle and confusing, since it's sort of grown organically. I think it deserves to be cleaned up at some point (but that's a separate issue).

The target member of a uop is defined as:

union {
    uint32_t target;
    struct {
        union {
            uint16_t exit_index;
            uint16_t jump_target;
        };
        uint16_t error_target;
    };
};
  • target: Only used by _DEOPT in the interpreter. No flag exists for checking this.
  • exit_index: Only used by _EXIT_TRACE in the interpreter. No flag exists for checking this.
  • jump_target: Used by instructions flagged as either HAS_DEOPT_FLAG or HAS_EXIT_FLAG
  • error_target: Used by instructions flagged as either HAS_ERROR_FLAG or HAS_ERROR_NO_POP_FLAG

So I think it would be more like the following:

targets = names.count("_DEOPT")
exit_indexes = names.count("_EXIT_TRACE")
jump_targets = combined_flags.count("HAS_DEOPT_FLAG") + combined_flags.count("HAS_EXIT_FLAG")
error_targets = combined_flags.count("HAS_ERROR_FLAG") + combined_flags.count("HAS_ERROR_NO_POP_FLAG")

valid = (
    # UOP_FORMAT_TARGET:
    (targets <= 1 and exit_indexes == 0 and jump_targets == 0 and error_targets == 0) or
    # UOP_FORMAT_EXIT:
    (targets == 0 and exit_indexes <= 1 and jump_targets == 0 and error_targets <= 1) or
    # UOP_FORMAT_JUMP:
    (targets == 0 and exit_indexes == 0 and jump_targets <= 1 and error_targets <= 1)
)

@JeffersGlass
Copy link
Contributor Author

JeffersGlass commented May 30, 2024

Thanks @brandtbucher, I've updated the logic for detecting conflicting uses of of target.

Additionally - I had been detecting the use of an oparg based on solely HAS_OPARG_AND_1_FLAG. But it should it also include the use of HAS_ARG_FLAG, yes?

On a pyperformance pystats run, none of the top 5000 pairs show a conflict for target... is that to be expected? It seems fishy to me, but I don't think I have enough intuition about the use of Target to be sure.

Here's a sample from the middle of those results:

Pair Count Self Cumulative Compatibility
_CHECK_VALIDITY _LOAD_FAST 9,082,739 0.5% 40.7% No Conflict
_LOAD_FAST_4 _SET_IP 8,849,139 0.5% 41.1% No Conflict
_LOAD_CONST_INLINE_WITH_NULL _LOAD_FAST_5 8,551,561 0.5% 41.6% No Conflict
_CHECK_FUNCTION_EXACT_ARGS _CHECK_STACK_SPACE_OPERAND 8,470,871 0.5% 42.1% Conflict: Operand
_LOAD_FAST _LOAD_FAST 8,465,883 0.5% 42.5% Conflict: Oparg
_SET_IP _LOAD_ATTR 8,337,465 0.5% 43.0% No Conflict
_LOAD_FAST_3 _LOAD_FAST_4 8,328,587 0.5% 43.4% No Conflict
_GUARD_NOT_EXHAUSTED_LIST _ITER_NEXT_LIST 8,311,545 0.5% 43.9% No Conflict
_SET_IP _BINARY_OP 8,137,465 0.4% 44.3% No Conflict
_STORE_FAST _LOAD_FAST 8,122,957 0.4% 44.8% Conflict: Oparg
_LOAD_ATTR _CHECK_VALIDITY 8,113,661 0.4% 45.2% No Conflict
_RESUME_CHECK _LOAD_FAST_0 8,094,103 0.4% 45.6% No Conflict
_LOAD_CONST_INLINE_BORROW _LOAD_CONST_INLINE_BORROW 8,012,851 0.4% 46.1% Conflict: Operand
_GUARD_TYPE_VERSION _LOAD_ATTR_SLOT_0 7,876,796 0.4% 46.5% Conflict: Operand
... ... ... ... ...

@JeffersGlass
Copy link
Contributor Author

Update: I do think the lack of target conflicts is a bug... I am troubleshooting/adding tests.

@JeffersGlass
Copy link
Contributor Author

Indeed, I had an error in how I was combining the flags of two uops. Now corrected, the output looks something like this:

Pair Count Self Cumulative Compatibility
... ... ... ... ...
_LOAD_FAST_3 _LOAD_FAST_4 8,328,587 0.5% 43.4% No Conflict
_GUARD_NOT_EXHAUSTED_LIST _ITER_NEXT_LIST 8,311,545 0.5% 43.9% No Conflict
_SET_IP _BINARY_OP 8,137,465 0.4% 44.3% No Conflict
_STORE_FAST _LOAD_FAST 8,122,957 0.4% 44.8% Conflict: Oparg
_LOAD_ATTR _CHECK_VALIDITY 8,113,661 0.4% 45.2% No Conflict
_RESUME_CHECK _LOAD_FAST_0 8,094,103 0.4% 45.6% No Conflict
_LOAD_CONST_INLINE_BORROW _LOAD_CONST_INLINE_BORROW 8,012,851 0.4% 46.1% Conflict: Operand
_GUARD_TYPE_VERSION _LOAD_ATTR_SLOT_0 7,876,796 0.4% 46.5% Conflict: Operand,Target
_CHECK_VALIDITY _LOAD_CONST_INLINE_BORROW 7,720,714 0.4% 46.9% No Conflict
_LOAD_FAST_2 _LOAD_FAST_3 7,646,188 0.4% 47.3% No Conflict
_TO_BOOL_BOOL _GUARD_IS_TRUE_POP 7,517,743 0.4% 47.7% Conflict: Target
_LOAD_FAST_2 _SET_IP 7,374,642 0.4% 48.1% No Conflict
_GUARD_TYPE_VERSION _LOAD_ATTR_METHOD_NO_DICT 7,202,967 0.4% 48.5% Conflict: Operand
... ... ... ... ...

@brandtbucher
Copy link
Member

Additionally - I had been detecting the use of an oparg based on solely HAS_OPARG_AND_1_FLAG. But it should it also include the use of HAS_ARG_FLAG, yes?

I believe HAS_OPARG_AND_1_FLAG implies HAS_ARG_FLAG, so only the latter should be necessary (in fact, I'm pretty sure that the versions of instructions with HAS_OPARG_AND_1_FLAG don't actually make it into the tier two code that's actually run, since they're replaced with other versions by the optimizer).

@JeffersGlass
Copy link
Contributor Author

JeffersGlass commented May 30, 2024

I believe HAS_OPARG_AND_1_FLAG implies HAS_ARG_FLAG, so only the latter should be necessary (in fact, I'm pretty sure that the versions of instructions with HAS_OPARG_AND_1_FLAG don't actually make it into the tier two code that's actually run, since they're replaced with other versions by the optimizer).

It does look like both of those things are true - I've adjusted the conflict-checking code to only account for HAS_ARG_FLAG.

@JeffersGlass
Copy link
Contributor Author

Give that we're no longer pursuing superinstructions as a high-priority optimization, this is no longer going to be particularly helpful. I'll close this, though it could be revived if it turns out to be useful later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add UOp Pair Compatibility to PyStats

5 participants