Coffee migration #98

tj-sun · 2017-02-19T04:01:43Z

This commit migrates the loop optimisations in COFFEE to GEM.

Test whether all functionality has been migrated.

The results of the flop comparison are at the link below.

The results show that, by and large, gem.optimise() matches or beats COFFEE level 2. They also show that COFFEE level 2 cannot further optimise ASTs generated by gem.optimise().

There are some cases in which gem.optimise() is slightly worse, e.g. mixed_poisson with dim=2, p=1, q=1, nf=0.

tsfc output
coffee o2
this PR

Note that COFFEE can merge the temporaries t24 and t23; this optimisation is not implemented here yet.

Compile time is generally acceptable; e.g. holzapfel takes 4.02 s with this PR, versus 0.91 s with plain tsfc and 13.80 s with COFFEE O2.

Minor problems:

Several subtle cases still need refining; e.g. Power nodes are not analysed yet (expanding the product might be beneficial).
LaTeX printing is not complete yet.

TODOs:

Factorise non-argument factors at the end. (This will likely require expanding the products in rest, which is probably expensive; perhaps this should be a separate optimisation level.)

Results link

wence- · 2017-04-27T10:28:08Z

tsfc/coffee_mode.py

+            if set(argument_indices) & set(new_node.free_indices):
+                new_atomics.append(new_node)
+            else:
+                new_rest = new_node


This branching looks icky to me. Is it because monomial_sum_to_expression does the wrong thing if there is only one monomial in the monomial sum? If so, we should fix that. It feels like the "generic" logic here should work for a "single monomial" case correctly too.

This branch becomes:

    node = monomial_sum_to_expression(sub_monomial_sum)
    if set(argument_indices) & set(node.free_indices):
        new_monomial_sum.add(sum_indices, (oa, node), one)
    else:
        new_monomial_sum.add(sum_indices, (oa, ), node)

wence- · 2017-04-27T10:51:54Z

tsfc/coffee_mode.py

+        # Create new MonomialSum for the factorised out terms
+        sub_monomial_sum = MonomialSum()
+        for _atomics, _rest in zip(all_atomics, all_rest):
+            sub_monomial_sum.add((), _atomics, _rest)


    sub_monomial_sum = MonomialSum()
    for monomial in monomials:
        atomics = list(monomial.atomics)
        atomics.remove(oa)
        sub_monomial_sum.add((), atomics, monomial.rest)
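A minimal, self-contained sketch of what the suggested loop does, assuming a hypothetical Monomial named tuple with sum_indices, atomics and rest fields and plain strings in place of GEM nodes (the real MonomialSum API is not reproduced). Each monomial in the group loses one occurrence of the common factor oa, and the resulting sub-monomials carry no sum indices of their own.

```python
from collections import namedtuple

# Hypothetical stand-in for the Monomial type discussed in this PR.
Monomial = namedtuple("Monomial", ["sum_indices", "atomics", "rest"])


def strip_common_factor(monomials, oa):
    """Return new monomials with one occurrence of `oa` removed from each."""
    sub_monomials = []
    for monomial in monomials:
        atomics = list(monomial.atomics)
        atomics.remove(oa)  # raises ValueError if oa is not present
        # The factored-out sub-expression has no sum indices of its own.
        sub_monomials.append(Monomial((), tuple(atomics), monomial.rest))
    return sub_monomials


group = [Monomial(("i",), ("A", "B"), "c1"), Monomial(("i",), ("A", "C"), "c2")]
print(strip_common_factor(group, "A"))
```

list.remove drops only the first match, so a factor that appears twice in one monomial is pulled out once per factorisation step.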

wence- · 2017-04-27T10:53:09Z

tsfc/coffee_mode.py

+            # Just one monomial with this group, add to new MonomialSum straightaway
+            monomial, = monomials
+            new_monomial_sum.add(*monomial)
+            continue


This branch can go, we think.

wence- · 2017-04-27T10:58:19Z

tsfc/coffee_mode.py

+            new_monomial_sum.add(*monomial)
+    # We should not drop monomials
+    assert sum(map(len, itervalues(factor_group))) + len(new_monomial_sum) == len(monomial_sum)
+


sum_indices = next(iter(monomial_sum)).sum_indices

Should probably assert that all the provided monomials have the same sum_indices.
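One way to express the suggested assertion, sketched with a hypothetical Monomial named tuple rather than the tsfc class. The helper name common_sum_indices is illustrative only; it materialises the input first so that any iterable of monomials is consumed just once.

```python
from collections import namedtuple

# Hypothetical stand-in for the Monomial type; not the tsfc class itself.
Monomial = namedtuple("Monomial", ["sum_indices", "atomics", "rest"])


def common_sum_indices(monomials):
    """Return the shared sum_indices, asserting that every monomial agrees."""
    monomials = list(monomials)  # allow any iterable, consumed only once
    assert monomials, "expected at least one monomial"
    sum_indices = monomials[0].sum_indices
    assert all(m.sum_indices == sum_indices for m in monomials), \
        "all monomials must have the same sum_indices"
    return sum_indices
```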

wence- · 2017-04-27T11:01:12Z

tsfc/coffee_mode.py

+            sub_monomial, = sub_monomial_sum
+            new_atomics = sub_monomial.atomics
+            new_atomics += (oa,)  # add back common factor
+            new_rest = sub_monomial.rest


Here we can just do:

new_monomial_sum.add(sum_indices, sub_monomial.atomics + (oa, ), sub_monomial.rest)

wence- · 2017-04-27T11:01:27Z

tsfc/coffee_mode.py

+                new_rest = new_node
+        # Pick sum indices from the first monomial
+        sum_indices = monomials[0].sum_indices
+        new_monomial_sum.add(sum_indices, new_atomics, new_rest)


These lines disappear.

wence- · 2017-04-27T11:13:30Z

tsfc/coffee_mode.py

+        sub_monomial_sum = MonomialSum()
+        for _atomics, _rest in zip(all_atomics, all_rest):
+            sub_monomial_sum.add((), _atomics, _rest)
+        sub_monomial_sum = optimise_monomial_sum(sub_monomial_sum, argument_indices)


This call does too much work: it groups on sum_indices, but we know sub_monomial_sum only has sum_indices (). So factor out the core part and call that instead.

Now calling optimise_monomials, which doesn't do the grouping. (It should probably return an iterable of monomials rather than a MonomialSum in the next version; we'll see.)

wence- · 2017-04-27T11:15:52Z

tsfc/coffee_mode.py

+    if len(atomic_index) == 0:
+        return ((), ())
+    if len(atomic_index) == 1:
+        return ((next(iterkeys(atomic_index)), ), ())


Fix this bug; it should do:

    optimal_atomic, = atomic_index.keys()
    return (optimal_atomic, )

and the empty branch should just return ().
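A sketch of the corrected early returns as a stand-alone helper; trivial_optimal_atomics is a hypothetical name (in the PR this logic sits at the top of the function that sets up the ILP). The point of the fix is that both trivial branches return a flat tuple of atomics, whereas the buggy single-candidate branch returned a pair of tuples.

```python
def trivial_optimal_atomics(atomic_index):
    """Handle the cases that need no ILP.

    `atomic_index` maps each candidate atomic to an index (a dict here).
    Returns a flat tuple of chosen atomics, or None when the general
    optimisation is still required.
    """
    if len(atomic_index) == 0:
        return ()
    if len(atomic_index) == 1:
        # Unpacking fails loudly if there is not exactly one key.
        optimal_atomic, = atomic_index.keys()
        return (optimal_atomic,)
    return None  # two or more candidates: fall through to the ILP
```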

miklos1 · 2017-04-27T13:20:07Z

tsfc/coffee_mode.py

+                new_monomial_sum.add(sum_indices, (oa, node), one)
+            else:
+                new_monomial_sum.add(sum_indices, (oa, ), node)
+    return new_monomial_sum


I'm thinking, is this function returning a MonomialSum because this function is returning a MonomialSum? Or is there a deeper reason?

wence-

I think I am basically happy. The comments in this round are very minor (mostly stylistic issues); what do you think of them?

wence- · 2017-04-28T08:55:41Z

tsfc/coffee_mode.py

+            atomics = list(monomial.atomics)
+            atomics.remove(oa)  # remove common factor
+            sub_monomials.append(Monomial((), tuple(atomics), monomial.rest))
+        sub_monomials = optimise_monomials(sub_monomials, argument_indices)


OK, so we pull out the common factor, then call back into optimise_monomials to potentially exploit further factorisation opportunities in the now remaining expression.

Agreed. Added a line comment.

wence- · 2017-04-28T08:56:16Z

tsfc/coffee_mode.py

+            atomics.remove(oa)  # remove common factor
+            sub_monomials.append(Monomial((), tuple(atomics), monomial.rest))
+        sub_monomials = optimise_monomials(sub_monomials, argument_indices)
+        assert len(sub_monomials) > 0


We should get at least one monomial back. Otherwise presumably the entire expression was zero or similar?

Hmm, there might be a problem here: I might get no monomials back when factorising a*(stuff) + a*(-stuff). I guess I should change this to return [], and add a base case to make_sum to handle empty input?

Actually, it's probably okay, since we don't simplify stuff + (-1)*stuff. If we do (later), we will just get Zero out of monomial_sum_to_expression, so we still get a monomial with Zero as its rest, which propagates upwards.
So I think I just need to remove this assertion. Will test it.
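A minimal sketch of the empty-input base case discussed above, using tiny hypothetical Zero and Sum stand-ins rather than GEM's classes (make_sum_sketch is likewise an illustrative name, not the real make_sum). Summing an empty collection yields Zero(), so a factorisation that cancels completely still produces a well-defined expression.

```python
from collections import namedtuple
from functools import reduce

# Hypothetical stand-ins for GEM nodes, for illustration only.
Sum = namedtuple("Sum", ["a", "b"])


class Zero:
    def __eq__(self, other):
        return isinstance(other, Zero)

    def __hash__(self):
        return hash(Zero)

    def __repr__(self):
        return "Zero()"


def make_sum_sketch(summands):
    """Fold summands into nested Sums; an empty input gives Zero()."""
    summands = list(summands)
    if not summands:
        return Zero()  # base case: nothing left after cancellation
    return reduce(Sum, summands)
```

Without the base case, reduce would raise TypeError on an empty list.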

wence- · 2017-04-28T08:57:50Z

tsfc/coffee_mode.py

+            # new MonomialSum directly
+            sub_monomial, = sub_monomials
+            new_monomials.append(
+                Monomial(sum_indices, sub_monomial.atomics + (oa,), sub_monomial.rest))


Special case the "single monomial" case, because IIUC, there may be some structure here that you want to keep. Question: are the "atomics" in a monomial ordered? (I only ask because here you append the common factor to the new monomial, whereas below you prepend it).

I think it will (potentially) change the generated code when we group nodes with the same free indices. Changed to prepend.

wence- · 2017-04-28T09:05:31Z

tsfc/coffee_mode.py

+                Monomial(sum_indices, sub_monomial.atomics + (oa,), sub_monomial.rest))
+        else:
+            # result is a sum, we need to create a new node
+            node = monomial_sum_to_expression(sub_monomials)


We've got a sum of monomials, so convert to a gem node, so that we can create a new Monomial that multiplies by the common factor?

This is an optimised (for flop minimisation?) version of:

reduce(Sum, IndexSum(reduce(Product, m.atomics, m.rest), m.sum_indices) for m in sub_monomials)

I think.

I think we discussed having a short comment explanation in the code of why you choose to split the two cases apart (because the else branch would work on a singleton list of monomials, no?).

Added some notes.
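For reference, the unoptimised expression wence- wrote out can be run directly on mock nodes. This sketch assumes hypothetical frozen-dataclass stand-ins for Monomial, Sum, Product and IndexSum (not GEM's classes) and evaluates reduce(Sum, IndexSum(reduce(Product, m.atomics, m.rest), m.sum_indices) for m in monomials), the baseline that monomial_sum_to_expression presumably improves on.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Tuple


# Hypothetical stand-ins for GEM node types; frozen dataclasses compare
# by type and fields, so a Sum never equals a Product with equal children.
@dataclass(frozen=True)
class Monomial:
    sum_indices: Tuple[str, ...]
    atomics: Tuple[Any, ...]
    rest: Any


@dataclass(frozen=True)
class Sum:
    a: Any
    b: Any


@dataclass(frozen=True)
class Product:
    a: Any
    b: Any


@dataclass(frozen=True)
class IndexSum:
    summand: Any
    indices: Tuple[str, ...]


def naive_monomial_sum_to_expression(monomials):
    """Multiply each monomial out, wrap it in an IndexSum, then add them up."""
    return reduce(Sum, (IndexSum(reduce(Product, m.atomics, m.rest), m.sum_indices)
                        for m in monomials))


ms = [Monomial(("i",), ("A", "B"), "r1"), Monomial(("i",), ("C",), "r2")]
expr = naive_monomial_sum_to_expression(ms)
```

The inner reduce seeds the product with rest, so ("A", "B") with rest "r1" becomes Product(Product("r1", "A"), "B"). Because it does no factorisation, every monomial keeps its own IndexSum and full product chain.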

wence- · 2017-04-28T09:06:29Z

tsfc/coffee_mode.py

+            # result is a sum, we need to create a new node
+            node = monomial_sum_to_expression(sub_monomials)
+            if set(argument_indices) & set(node.free_indices):
+                new_monomials.append(Monomial(sum_indices, (oa, node), one))


If the new node's free indices intersect with the argument indices, then there might be opportunity for further refactorisation later (so the new node is a possible new ATOMIC), otherwise, it's a COMPOUND?

In the terminology collect_monomials uses with @tj-sun's classifier: if a node's free indices intersect the argument indices, then it's either ATOMIC (only one argument index) or COMPOUND (more than one argument index); otherwise it's OTHER.
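A sketch of that classification rule, assuming argument indices are given as one flat collection of index names; the actual classifier used with collect_monomials may receive them differently (e.g. grouped per argument), and classify is a hypothetical name.

```python
ATOMIC, COMPOUND, OTHER = "ATOMIC", "COMPOUND", "OTHER"


def classify(free_indices, argument_indices):
    """Classify a node by how many argument indices it carries (sketch)."""
    shared = set(free_indices) & set(argument_indices)
    if not shared:
        return OTHER  # independent of the arguments
    if len(shared) == 1:
        return ATOMIC  # exactly one argument index
    return COMPOUND  # more than one argument index
```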

wence- · 2017-04-28T10:03:03Z

tests/test_coffee_optimise.py

+                   Product(A3i, Product(Bj, Ek))),
+               Product(Z, Product(A1i, Product(Bj, Fk))))
+    result, = optimise_expressions([expr], ((j,), (k,)))
+    assert count_flop(result) == 2680


I think these tests should also assert that the resulting expression is the expected one.

All changed. Removed count_flop as well (might need it in tempoary_graph branch later)

wence- · 2017-04-28T10:04:15Z

tests/test_coffee_optimise.py

+    # A * B[i]
+    assert result.children[1].children[1] == Product(A, Bi)
+    # t * P[i]
+    assert result.children[0].children[1].children[1] == Product(t, Pi)


Would it be better to just assert that the result is equal to the expected expression?

wence- · 2017-04-28T10:04:51Z

tests/test_coffee_optimise.py

+    result = replace_division([d])[0]
+
+    assert isinstance(result, Product)
+    assert isinstance(result.children[1], Division)


Again, just test that the resulting expression is the expected one?
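A sketch of what a single whole-expression assertion could look like, assuming the pass rewrites a / b as a * (1 / b), which is what the isinstance checks above imply. The node classes and replace_division_sketch are hypothetical stand-ins, not GEM's; frozen dataclasses are used so that == compares both the node type and its children.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Product:
    a: Any
    b: Any


@dataclass(frozen=True)
class Division:
    a: Any
    b: Any


ONE = 1  # stand-in for a literal one


def replace_division_sketch(node):
    """Rewrite a / b as a * (1 / b), recursing into children."""
    if isinstance(node, Division):
        a = replace_division_sketch(node.a)
        b = replace_division_sketch(node.b)
        return Product(a, Division(ONE, b))
    if isinstance(node, Product):
        return Product(replace_division_sketch(node.a),
                       replace_division_sketch(node.b))
    return node  # terminals are left untouched


def test_replace_division():
    result = replace_division_sketch(Division("x", "y"))
    # One assertion on the whole expected expression.
    assert result == Product("x", Division(ONE, "y"))
```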

wence- · 2017-04-28T10:06:41Z

gem/optimise.py

+
+
+def make_product(factors, sum_indices=()):
+    """Create a Product from collection of factors. Calls sum_factorise to


Could link here, e.g. "Uses :func:`sum_factorise` to ..."?

wence- · 2017-04-28T10:16:43Z

gem/optimise.py

+            return True
+        return False
+    mapper = Memoizer(_reassociate_product)
+    mapper.stop_at = stop_at


As discussed, this added code is not used anywhere except in the test suite. Should it be pulled out into a separate changeset?

miklos1 · 2017-04-28T12:50:38Z

Okay, this may be tricky. It is clear that Python can represent int(1e20), but it's not clear that GLPK can do as well.

On 28/04/17 13:39, tj-sun wrote (in tsfc/coffee_mode.py):

+
+    if len(atomic_index) == 0:
+        return ()
+    if len(atomic_index) == 1:
+        optimal_atomics, = iterkeys(atomic_index)
+        return (optimal_atomics, )
+
+    # set up the ILP
+    import pulp as ilp
+    ilp_prob = ilp.LpProblem('gem factorise', ilp.LpMinimize)
+    ilp_var = ilp.LpVariable.dicts('node', range(len(atomic_index)), 0, 1, ilp.LpBinary)
+
+    # Objective function
+    # Minimise number of factors to pull. If same number, favour factor with larger extent
+    big = 1e20  # some arbitrary big number
+    ilp_prob += ilp.lpSum(ilp_var[index] * (big - index_extent(atomic, argument_indices))

Ah, I didn't notice this underflow! It's my error, from trying to distinguish between different solutions. I think COFFEE just picks whichever solution PuLP returns. I'll probably go with int(1e20), as that will rank the solutions first by the number of 1s, then by their extent.
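The "underflow" here is really absorption in floating point: 1e20 is a float, and at that magnitude adjacent doubles are far further apart than any index extent, so subtracting an extent leaves the value unchanged and the tie-break is silently lost. Python's int is exact, which is why int(1e20) helps on the Python side; whether the solver keeps that precision is the open question in miklos1's comment. This sketch only assumes a solver that works in IEEE double precision, where integers are exact only up to 2**53; the extent value is made up.

```python
big_float = 1e20      # the original constant: a float
big_int = int(1e20)   # the proposed replacement: an exact Python int
extent = 3            # a small, hypothetical index extent

# Float arithmetic: the extent is absorbed, so different extents tie.
print(big_float - extent == big_float)

# Python ints are arbitrary precision, so the difference survives.
print(big_int - extent == big_int)

# Converting back to a double (as a solver working in doubles would)
# loses the difference again.
print(float(big_int - extent) == float(big_int))

# Doubles represent every integer exactly only up to 2**53.
print(float(2**53 + 1) == float(2**53))
```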

miklos1 · 2017-04-28T13:03:45Z

I can hardly believe that that will actually happen. Do you have an example?

On 28/04/17 13:55, tj-sun wrote (in tsfc/coffee_mode.py):

+    factor_group = groupby(monomials, key=group_key)
+
+    # We should not drop monomials
+    assert sum(len(ms) for _, ms in factor_group) == len(monomials)
+
+    sum_indices = next(iter(monomials)).sum_indices
+    new_monomials = []
+    for oa, monomials in factor_group:
+        # Create new MonomialSum for the factorised out terms
+        sub_monomials = []
+        for monomial in monomials:
+            atomics = list(monomial.atomics)
+            atomics.remove(oa)  # remove common factor
+            sub_monomials.append(Monomial((), tuple(atomics), monomial.rest))
+        sub_monomials = optimise_monomials(sub_monomials, argument_indices)
+        assert len(sub_monomials) > 0

Hmm, there might be a problem here. I might get no monomials back when factorising a*(stuff) + a*(-stuff). I guess I should change this to return Zero(), and also add a line to monomial_sum_to_expression to handle empty input?

miklos1 · 2017-04-28T13:22:08Z

I suggest making this a squash merge when we get there, since the overall lines-per-commit rate is very low.

miklos1 · 2017-04-28T13:41:21Z

I would argue with this. As far as I understand, the number of 1s is the number of operations in the innermost loop, and the negative weights then push towards fewer operations in the outer loops among those solutions that have the same number of operations in the innermost loop. This sounds fair; however, the innermost loop may not be executed many more times than the second innermost, so for a low-order problem one might theoretically (not sure about practice) imagine a situation where one more operation in the innermost loop results in fewer operations overall, because it allows fewer operations in the outer loops. I believe we should be able to quantify this exactly, but I'm not sure it is worth doing in practice.

On 28/04/17 14:34, tj-sun wrote (in tsfc/coffee_mode.py, on the same ILP objective):

+    # Objective function
+    # Minimise number of factors to pull. If same number, favour factor with larger extent
+    big = 1e20  # some arbitrary big number
+    ilp_prob += ilp.lpSum(ilp_var[index] * (big - index_extent(atomic, argument_indices))

So it needs to be large enough that solutions with fewer 1s are always preferred. Among solutions with an equal number of 1s, we prioritise the combination with larger extents.
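A brute-force sketch of the weighting argument, under stated assumptions: the atomics, extents and monomials are made up, and the coverage rule (every monomial must contain at least one chosen atomic) is an assumption about what the ILP constraints encode, since only the objective is quoted above. With positive extents, a selection with k atomics costs k*big minus its total extent, so whenever big exceeds the sum of all extents the objective ranks selections first by count and then by total extent; in this formulation a modest constant is enough.

```python
from itertools import combinations

# Hypothetical example data: candidate atomics with their index extents,
# and monomials given as the set of atomics each one contains.
extents = {"A": 3, "B": 6, "C": 4}
monomials = [{"A", "B"}, {"A", "C"}, {"B", "C"}]


def covers(selection):
    """Assumed constraint: every monomial contains a selected atomic."""
    return all(m & set(selection) for m in monomials)


def cost(selection, big):
    """Objective from the PR: sum of (big - extent) over selected atomics."""
    return sum(big - extents[a] for a in selection)


feasible = [s for r in range(1, len(extents) + 1)
            for s in combinations(sorted(extents), r) if covers(s)]

big = sum(extents.values()) + 1  # exceeds any possible extent difference
best = min(feasible, key=lambda s: cost(s, big))
# Lexicographic reference: fewest atomics first, then largest total extent.
reference = min(feasible, key=lambda s: (len(s), -sum(extents[a] for a in s)))
# With a too-small constant, extra atomics "pay for themselves".
too_small = min(feasible, key=lambda s: cost(s, 1))
print(best, reference, too_small)
```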

miklos1 · 2017-04-29T10:23:16Z

Pushed a little cleanup to gem.optimise. Among others, fixed the docstrings of make_sum and make_product.

miklos1 · 2017-04-29T10:24:24Z

tsfc/parameters.py

@@ -14,7 +14,8 @@
    "quadrature_degree": "auto",

    # Default mode
-    "mode": "vanilla",
+    # "mode": "vanilla",
+    "mode": "coffee",


I believe we aren't ready to make this switch yet.

So we do make the switch. In that case, just remove the commented-out line.

miklos1 · 2017-04-29T10:38:05Z

tsfc/coffee_mode.py

+
+    :returns: an iterable of factorised :class:`Monomials`s
+    """
+    if not optimal_atomics or len(monomials) < 2:


<= 1 would be slightly better style: what is special about 2?

miklos1 · 2017-04-29T10:41:41Z

Okay, I think we can merge this next business day.

tj-sun and others added 30 commits December 16, 2016 12:35

migrate functionality: division to multiplication conversion

d58222e

migrated functionality: re-association of products

8c61eee

will add tests and PR

add tests for replace_div and reassociation

1068bea

really added the test file this time

84dfd91

fix bug in division (wrong variable name)

153e954

- use reduce()

d891df9

- remove Terminal if statement (because all node have children field) - singledispatch pattern for reassociation - Memoizer for reassociation

Some changes to reassociation

95924bd

- use the default reuse_if_untouched() instead of rewriting it - pop queue from front to make it FIFO

migrate factorisation

520051d

- idempotent test fails, haven't figured out why - new helper function to collect terms

fix idempotent error

b582986

- turns out it's problem with python set being not stable

factorise pass alltest but seems a bit slower than before, will add t…

22d3299

…ests

add two tests for factorise()

19a9c57

- probably need to write a pass to rearrange operands in a specific sequence (e.g. first by rank, then alphabetically, so that A+B+C == B+C+A) to make testing easier

prototype of optimal factorisation

5aad6ef

- only partially done, still a lot to do - some basic latex formatting for Gem nodes, to help debugging - return gem IR as well from compile_integral - is_equal() and get_hash() for Sum, so that A+B == B+A

refactoring latex logic

d7a6df0

refactoring latex printing

603df11

tidying up factorisation

e838cc3

- more latex printing - some examples

add count_flop() function

fb99acd

- new attempt to do factorise() - fix bug in latex() for Literal

New algorithm to do factorisation()

a78aee8

- Most method memoized

add examples

754860b

tidying up factorise()

92c3c35

- remove dev.py, move functionality to optimise.py - remove previous factorise function - update collect_term function - bug fix: expand child after trying factorisation with index i

PEP8 fixes

421079f

singledispatch for factorise

95daa0b

DAG-awareness for factorise()

b67ee57

- still have some problems with Memoizer - recursive call times out sometimes

Passing all TSFC tests now

cefa5c7

- fix bug in expanding products - still need to Memoize count_flop()

PEP8

ddb6f9a

Memoizer for more functions used in factorise()

1b1dec4

- expand_all_product() - count_flop() - collect_terms()

Using MemoizerArg correctly

ed31ce5

Add linear index as an argument for factorise()

91d3f66

- still quite slow, need to think about heuristics, such as avoid unnecessary expansion of products

avoid expansion of factors if node does not depend on argument (linea…

3fb21ee

…r) index

new expand_all_product() function seems to break idempotency

913fa00

- still investigating why it happens

Fix idempotency issues

f903040

- Need to sort free indices before factorising because sometimes they appear in different order - update on tests - streamlining the algorithm slightly, more to do there

wence- reviewed Apr 27, 2017

updates based on feedbacks on PR

d1e80ff

- new optimise_monomials method which does group by sum indices - better grouping by optimal atomics in factorise_atomics - remove unnecessary IF statements

miklos1 reviewed Apr 27, 2017

tj-sun added 2 commits April 27, 2017 16:01

changes to API

563bd14

- factorise_atomics and optimise_monomials now take in and also return iterable of monomials, instead of MonomialSum - add assertion to check all monomials have same sum indices in optimise_monomials - remove logic in early return from factorise_atomics

updates based on feedbacks on PR

a226f18

- better assertion for checking sum indices - docstring - typos

wence- approved these changes Apr 28, 2017

tj-sun and others added 2 commits April 28, 2017 16:20

update based on feedbacks on PR

ad573d9

- remove count_flops - remove reassociate and test - optimisation testing for expression rather than flops - update penalty constant for ILP - use defaultdict to map atomics in find_optimal_atomics - rewrite some comments

slightly cleanup gem/optimise.py diff

5290c6b

miklos1 reviewed Apr 29, 2017

Fix several comments.

d84bfe6

miklos1 mentioned this pull request May 2, 2017

Switch COFFEE optimisation level to 'Ov' firedrakeproject/firedrake#1060

Closed

Remove commented out code

a9cb4f5

miklos1 approved these changes May 3, 2017

tj-sun merged commit b7251e1 into master May 3, 2017

miklos1 mentioned this pull request Aug 2, 2017

Delta cancellation and amalgamate spectral and coffee modes #142

Merged

tj-sun deleted the coffee-migration branch December 15, 2017 12:07


