generate_loop_schedule_v2 #350

Open
wants to merge 4 commits into base: main

Conversation

kaushikcfd
Collaborator

@kaushikcfd kaushikcfd commented Apr 28, 2021

Implements the computation of the loop-nest-around map in O(N·k) time, where N is the number of inames and k is the maximum loop depth.

For comparison, consider the kernel in #288: on main this map is computed in 5 minutes, while this branch takes ~~30~~ 0.4 seconds.
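To make the O(N·k) bound concrete, here is a minimal sketch; it is not the code in this branch. It assumes the loop nest tree is available as a plain child-to-parent dict whose nodes are frozensets of inames (a hypothetical representation), and the function name `loop_nest_around_map` and the inames are illustrative only. Each of the at most N nodes walks up at most k ancestors.

```python
from typing import Dict, FrozenSet, Optional

Node = FrozenSet[str]  # assumption: a tree node is a frozenset of inames


def loop_nest_around_map(
        child_to_parent: Dict[Node, Optional[Node]]) -> Dict[str, FrozenSet[str]]:
    """Map each iname to the inames found in strictly outer tree nodes."""
    result: Dict[str, FrozenSet[str]] = {}
    for node in child_to_parent:            # at most N nodes
        around = set()
        parent = child_to_parent[node]
        while parent is not None:           # at most k ancestors
            around |= parent
            parent = child_to_parent[parent]
        for iname in node:
            result[iname] = frozenset(around)
    return result


# Illustrative tree: i encloses {j, k}, which encloses l.
i, jk, l = frozenset({"i"}), frozenset({"j", "k"}), frozenset({"l"})
around = loop_nest_around_map({i: None, jk: i, l: jk})
```

In this sketch, inames that share a node are not reported as nesting around each other, since their relative order is not yet decided.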

@kaushikcfd kaushikcfd requested a review from inducer April 28, 2021 23:20
setup.py Outdated (resolved)
Owner

@inducer inducer left a comment

Some thoughts from a quick look here.

loopy/schedule/tools.py Outdated (resolved)

iname_to_insns = kernel.iname_to_insns()
loop_nest_around_map = defaultdict(frozenset)
Owner

In my view, the loop_nest_around_map should go. The loop nest tree is a strictly more informative data structure, and it is not of size O(n^2). I think we should get rid of it. I looked through the uses, and it shouldn't be hard to replace, and it'll fix another bottleneck in linearization.

Collaborator Author

@kaushikcfd kaushikcfd Jun 10, 2021

In the new version, I didn't touch the old implementation (v1-scheduler) to keep things simpler in this PR. So, for such cases $O(n^2)$ stays.

Collaborator Author

I can handle it as a separate PR.

Updates *tree* to make *inames_to_pull_out* a loop nesting level in
*loop_nests*

:returns: a :class:`tuple` ``(outer_loop_nest, inner_loop_nest)``, where
Owner

Describe the types of the arguments.

Collaborator Author

Thanks, done.

Comment on lines 119 to 120
Updates *tree* to make *inames_to_pull_out* a loop nesting level in
*loop_nests*
Owner

Describe which is nested "inside" and which is "outside".

@kaushikcfd kaushikcfd marked this pull request as draft May 4, 2021 15:08
@kaushikcfd kaushikcfd changed the title from "Better loop around nest map" to "generate_loop_schedule_v2" May 4, 2021
@kaushikcfd kaushikcfd force-pushed the better_loop_around_nest_map branch 2 times, most recently from 96856f8 to 0203d09 Compare May 11, 2021 13:44
@inducer
Owner

inducer commented May 26, 2021

Does #372 supersede this?

@kaushikcfd
Collaborator Author

@inducer: Not really. #372 includes commits from this branch, so that I could do some tests on my test problems. But I propose we merge them separately. (I've updated the description of #372 to record that)

@kaushikcfd kaushikcfd force-pushed the better_loop_around_nest_map branch from e14a03d to 2d526dc Compare June 8, 2021 18:51
@kaushikcfd kaushikcfd marked this pull request as ready for review June 10, 2021 06:31
@kaushikcfd kaushikcfd requested a review from inducer June 11, 2021 22:00
@kaushikcfd kaushikcfd force-pushed the better_loop_around_nest_map branch 5 times, most recently from e20c6ca to 931c6e9 Compare December 10, 2021 04:50
@kaushikcfd kaushikcfd force-pushed the better_loop_around_nest_map branch from 15cea20 to 5b55bcd Compare May 6, 2022 17:30
@inducer
Owner

inducer commented Jun 9, 2022

I was just rebasing this. You beat me to it! :)

Owner

@inducer inducer left a comment

Here's a first look. I haven't really gotten very far into the core algorithm, because I'm not sure I understood _pull_out_loop_nest_tree, which seems like it's a core operation.

@@ -930,6 +933,225 @@ def _get_persistent_hashable_arg(arg):

return wrapper


# {{{ tree data structure
Owner

This is big enough to be its own file.

Collaborator

#694 puts the Tree class into its own file (loopy/schedule/tree.py).

@@ -930,6 +933,225 @@ def _get_persistent_hashable_arg(arg):

return wrapper


# {{{ tree data structure
Owner

Given that this has type annotations, mypy should like them (and there should be CI saying as much).

Collaborator

As far as I can see, mypy checks and passes the annotations.

@dataclass(frozen=True)
class Tree(Generic[T]):
    """
    An immutable tree implementation.
Owner

Describe the role of the type variable T. Specifically describe that there's one Tree object, but many nodes T. Maybe rename T to NodeT?

Collaborator

c53a9af hopefully clarifies this (and renames T to NodeT).
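For readers following along, a minimal sketch of how the role of `NodeT` could be spelled out. It assumes plain dicts in place of the `PMap` fields and uses hypothetical method names (`from_root`, `add_node`); it is not the class from this PR. The point it illustrates: there is one `Tree` object, and every value of type `NodeT` stored in it is a node.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Generic, Hashable, Optional, TypeVar

# NodeT is the type of a single node; one Tree holds many NodeT values.
NodeT = TypeVar("NodeT", bound=Hashable)


@dataclass(frozen=True)
class Tree(Generic[NodeT]):
    """One immutable tree whose nodes are values of type NodeT."""
    _parent_to_children: Dict[NodeT, FrozenSet[NodeT]]
    _child_to_parent: Dict[NodeT, Optional[NodeT]]

    @staticmethod
    def from_root(root: NodeT) -> "Tree[NodeT]":
        return Tree({root: frozenset()}, {root: None})

    def add_node(self, node: NodeT, parent: NodeT) -> "Tree[NodeT]":
        """Return a new tree; *self* is left untouched."""
        if node in self._child_to_parent:
            raise ValueError(f"{node!r} is already in the tree")
        if parent not in self._child_to_parent:
            raise ValueError(f"{parent!r} is not in the tree")
        p2c = dict(self._parent_to_children)
        p2c[parent] = p2c[parent] | {node}
        p2c[node] = frozenset()
        c2p = dict(self._child_to_parent)
        c2p[node] = parent
        return Tree(p2c, c2p)

    def children(self, node: NodeT) -> FrozenSet[NodeT]:
        return self._parent_to_children[node]

    def parent(self, node: NodeT) -> Optional[NodeT]:
        return self._child_to_parent[node]

    def depth(self, node: NodeT) -> int:
        """Number of edges between *node* and the root."""
        d = 0
        parent = self._child_to_parent[node]
        while parent is not None:
            d += 1
            parent = self._child_to_parent[parent]
        return d


t0 = Tree.from_root("")
t1 = t0.add_node("i", parent="")
t2 = t1.add_node("j", parent="i")
```

Here `NodeT` is `str`; the same class would work unchanged with `frozenset` nodes.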



@dataclass(frozen=True)
class Tree(Generic[T]):
Owner

Is this mature enough to be in pytools?

loopy/tools.py Outdated
Comment on lines 960 to 974
_parent_to_children: "PMap[T, FrozenSet[T]]"
_child_to_parent: "PMap[T, Optional[T]]"
Owner

In terms of data structure, this is a forest (i.e. it allows multiple trees). It wouldn't take much to allow that fully... YAGNI though.

Comment on lines +386 to +575
:returns: a :class:`tuple` ``(new_tree, outer_loop_nest, inner_loop_nest)``,
where outer_loop_nest is the identifier for the new outer and inner
loop nests so that *inames_to_pull_out* is a valid nesting.
Owner

Make a dataclass for the result?
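One possible shape for such a result type, as a sketch only. The class name `PullOutResult` is hypothetical; the field names are taken from the tuple in the docstring above, and `new_tree` is typed loosely because the tree class is not reproduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PullOutResult:  # hypothetical name, not from the PR
    new_tree: object                  # the updated loop nest tree
    outer_loop_nest: FrozenSet[str]   # identifier of the new outer node
    inner_loop_nest: FrozenSet[str]   # identifier of the new inner node


result = PullOutResult(
    new_tree={},  # placeholder standing in for a real tree
    outer_loop_nest=frozenset({"i"}),
    inner_loop_nest=frozenset({"j"}),
)
```

Callers would then write `result.outer_loop_nest` instead of relying on the position within a 3-tuple.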

:class:`frozenset` of inames representing a loop nest. For example a
tree might look like:

:arg loop_nests: A collection of nodes in *tree* that cover
Owner

collection isn't an ABC. Do you mean sequence?


loop_nests = sorted(loop_nests, key=lambda nest: tree.depth(nest))

for outer, inner in zip(loop_nests[:-1], loop_nests[1:]):
Owner

It seems that all you'd need to pass here is the innermost node in the loop nest tree, and tree.ancestors (at least the proposed changed version) would cheaply give you this without the need to enforce these invariants.
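A sketch of that idea, assuming a plain child-to-parent dict rather than the PR's `Tree` class; the helper name `nest_chain` is hypothetical. Starting from the innermost node alone, the outer-to-inner chain falls out of the parent links, so no sorting by depth or adjacency check is needed.

```python
from typing import Dict, Hashable, List, Optional


def nest_chain(child_to_parent: Dict[Hashable, Optional[Hashable]],
               innermost: Hashable) -> List[Hashable]:
    """Return the nodes from the root down to *innermost*, in that order."""
    chain = [innermost]
    parent = child_to_parent[innermost]
    while parent is not None:
        chain.append(parent)
        parent = child_to_parent[parent]
    chain.reverse()
    return chain


# Illustrative tree: root -> a -> b
c2p = {"root": None, "a": "root", "b": "a"}
chain = nest_chain(c2p, "b")
# Consecutive (outer, inner) pairs, as in the zip over loop_nests above.
pairs = list(zip(chain[:-1], chain[1:]))
```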

obtained after constraining *loop_nest_tree* with the constraints enforced
by *priorities*.

:arg strict_priorities: Expresses strict nesting constraints similar to
Owner

@inducer inducer Jun 9, 2022

"similar to" -> "using the same data structure as"?

strictly i.e. if these conditions cannot be met a
:class:`loopy.diagnostic.LoopyError` is raised.

:arg relaxed_priorities: Expresses strict nesting constraints similar to
Owner

@inducer inducer Jun 9, 2022

"similar to" -> "using the same data structure as"?

Owner

@inducer inducer left a comment

Thanks! Finally made it all the way through. There's a bit to digest here, but as far as I remember, it's mostly superficial. I didn't find any flaws in the thinking. Nice job!

Comment on lines +376 to +564
Returns a copy of *tree* that realizes *inames_to_pull_out* as loop
nesting.
Owner

Returns a version of the loop nest tree *tree* so that every node in the tree is either a subset of *outermost_inames* or has an empty intersection with *outermost_inames*.

This routine modifies at most one node of the tree. All its ancestors must satisfy ``ancestor <= outermost_inames``. For the first node not satisfying this relationship, if ``node & outermost_inames`` is empty, no modification is made.
Otherwise, if ``node & outermost_inames < node``, that node is split so as to separate *outermost_inames* into their own node.
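The split step in this description can be sketched on a single node as follows. This only illustrates the set arithmetic, assuming nodes are frozensets of inames; the helper name `split_node` is hypothetical, and updating the tree itself is left out.

```python
from typing import FrozenSet, Optional, Tuple


def split_node(node: FrozenSet[str], outermost_inames: FrozenSet[str]
               ) -> Optional[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Return ``(outer, inner)`` if *node* must be split, else ``None``."""
    common = node & outermost_inames
    if not common:
        # Empty intersection: no modification is needed.
        return None
    if common == node:
        # node <= outermost_inames: already a valid outer node.
        return None
    # Proper, non-empty subset: separate the shared inames into their own node.
    return common, node - common


split = split_node(frozenset({"i", "j", "k"}), frozenset({"i", "j"}))
```

In the tree, the first element would become the parent of the second, and the second would inherit the children of the original node.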

# }}}

innermost_loop_nest = loop_nests[-1]
new_outer_loop_nest = inames_to_pull_out - reduce(frozenset.union,
Owner

new_outermost_node

new_outer_loop_nest = inames_to_pull_out - reduce(frozenset.union,
                                                  loop_nests[:-1],
                                                  frozenset())
new_inner_loop_nest = innermost_loop_nest - inames_to_pull_out
Owner

new_inner_node


# }}}

innermost_loop_nest = loop_nests[-1]
Owner

innermost_node

# inner iname and outer iname are indirect family members
# => must be realized via dependencies in the linearization
# phase, not implemented in v2-scheduler yet.
from loopy.schedule import V2SchedulerNotImplementedException
Owner

s/scheduler/linearizer/? (here and throughout)

for iname in node:
    iname_to_tree_node_id[iname] = node

return pmap(iname_to_tree_node_id)
Owner

This might be better as an immutables.Map.
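For illustration, the same iname-to-node lookup built with only the standard library. This sketch uses `types.MappingProxyType` as a read-only stand-in; it is neither `pmap` nor `immutables.Map`, and the function name `build_iname_to_node` is hypothetical.

```python
from types import MappingProxyType
from typing import Dict, FrozenSet, Iterable, Mapping


def build_iname_to_node(
        nodes: Iterable[FrozenSet[str]]) -> Mapping[str, FrozenSet[str]]:
    """Map every iname to the tree node (a frozenset of inames) holding it."""
    iname_to_node: Dict[str, FrozenSet[str]] = {}
    for node in nodes:
        for iname in node:
            iname_to_node[iname] = node
    # Read-only view over the dict; writes through the view raise TypeError.
    return MappingProxyType(iname_to_node)


lookup = build_iname_to_node([frozenset({"i"}), frozenset({"j", "k"})])
```

Unlike a persistent map, this view does not offer cheap "updated copy" operations; it only prevents accidental mutation.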

if kernel.iname_tags_of_type(iname, VectorizeTag)}
parallel_inames = (concurrent_inames - ilp_inames - vec_inames)

# {{{ can v2 scheduler handle??
Owner

...handle this kernel?

pass


def generate_loop_schedules_v2(kernel):
Owner

This is nice! I like this quite a bit.

if outer_loop == "":
    continue

for child in loop_nest_tree.children(outer_loop):
Owner

Add a comment that this only works because loop_nest_tree contains a total order of loops. Otherwise it might wind up with mismatched enter/leaves.
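A small sketch of why a fully ordered tree yields matched enter/leave pairs. It assumes every tree node is a single loop and that children are given in a fixed order; the function name `emit_loops` and the `"for"`/`"end"` strings are illustrative, not the schedule items used in loopy. The root `""` mirrors the `outer_loop == ""` check in the snippet above.

```python
from typing import Dict, List, Tuple


def emit_loops(children: Dict[str, Tuple[str, ...]], root: str = "") -> List[str]:
    """Depth-first walk that opens a loop, visits its children, then closes it."""
    out: List[str] = []

    def visit(loop: str) -> None:
        if loop != root:
            out.append(f"for {loop}")
        for child in children.get(loop, ()):
            visit(child)
        if loop != root:
            out.append(f"end {loop}")

    visit(root)
    return out


# Illustrative tree: i encloses j and k.
schedule = emit_loops({"": ("i",), "i": ("j", "k")})
```

If a node instead held several inames whose relative order was undecided, a walk like this could not tell which loop to close first, which is the mismatch the comment warns about.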

# for i
# insn
# end i
# 'insn' *must* come b/w 'for i' and 'end i'
Owner

b/w = between? If so, spell out.

@inducer
Owner

inducer commented Jun 10, 2022

Unsubscribing... @-mention or request review once it's ready for a look or needs attention.

But I'm eager to get this in soon!

@kaushikcfd kaushikcfd force-pushed the better_loop_around_nest_map branch 2 times, most recently from 65a6b62 to c21c092 Compare July 6, 2022 17:49
@inducer inducer force-pushed the better_loop_around_nest_map branch from c21c092 to fadb652 Compare June 2, 2023 01:23
@inducer inducer mentioned this pull request Feb 16, 2024