New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coffee migration #98
Coffee migration #98
Conversation
will add tests and PR
- remove Terminal if statement (because all node have children field) - singledispatch pattern for reassociation - Memoizer for reassociation
- use the default reuse_if_untouched() instead of rewriting it - pop queue from front to make it FIFO
- idempotent test fails, haven't figured out why - new helper function to collect terms
- turns out it's problem with python set being not stable
- probably need to write a pass to rearrange operands in a specific sequence (e.g. first by rank, then alphabetically, so that A+B+C == B+C+A) to make testing easier
- only partially done, still a lot to do - some basic latex formatting for Gem nodes, to help debugging - return gem IR as well from compile_integral - is_equal() and get_hash() for Sum, so that A+B == B+A
- more latex printing - some examples
- new attempt to do factorise() - fix bug in latex() for Literal
- Most method memoized
- remove dev.py, move functionality to optimise.py - remove previous factorise function - update collect_term function - bug fix: expand child after trying factorisation with index i
- still have some problems with Memoizer - recursive call times out sometimes
- fix bug in expanding products - still need to Memoize count_flop()
- expand_all_product() - count_flop() - collect_terms()
- still quite slow, need to think about heuristics, such as avoid unnecessary expansion of products
- still investigating why it happens
- Need to sort free indices before factorising because sometimes they appear in different order - update on tests - streamlining the algorithm slightly, more to do there
tsfc/coffee_mode.py
Outdated
if set(argument_indices) & set(new_node.free_indices): | ||
new_atomics.append(new_node) | ||
else: | ||
new_rest = new_node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branching looks icky to me. Is it because monomial_sum_to_expression
does the wrong thing if there is only one monomial in the monomial sum? If so, we should fix that. It feels like the "generic" logic here should work for a "single monomial" case correctly too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch becomes:
node = monomial_sum_to_expression(sub_monomial_sum)
if set(argument_indices) & set(new_node.free_indices):
new_monomial_sum.add(sum_indices, (oa, node), one)
else:
new_monomial_sum.add(sum_indices, (oa, ), node)
tsfc/coffee_mode.py
Outdated
# Create new MonomialSum for the factorised out terms | ||
sub_monomial_sum = MonomialSum() | ||
for _atomics, _rest in zip(all_atomics, all_rest): | ||
sub_monomial_sum.add((), _atomics, _rest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sub_monomial_sum = MonomialSum()
for monomial in monomials:
atomics = list(monomial.atomics)
atomics.remove(oa)
sub_monomial_sum.add((), atomics, monomial.rest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amended.
tsfc/coffee_mode.py
Outdated
# Just one monomial with this group, add to new MonomialSum straightaway | ||
monomial, = monomials | ||
new_monomial_sum.add(*monomial) | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch can go, we think.
tsfc/coffee_mode.py
Outdated
new_monomial_sum.add(*monomial) | ||
# We should not drop monomials | ||
assert sum(map(len, itervalues(factor_group))) + len(new_monomial_sum) == len(monomial_sum) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sum_indices = next(iter(monomial_sum)).sum_indices
Should probably assert that all the provided monomials have the same sum_indices.
tsfc/coffee_mode.py
Outdated
sub_monomial, = sub_monomial_sum | ||
new_atomics = sub_monomial.atomics | ||
new_atomics += (oa,) # add back common factor | ||
new_rest = sub_monomial.rest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we can just do:
new_monomial_sum.add(sum_indices, sub_monomial.atomics + (oa, ), sub_monomial.rest)
tsfc/coffee_mode.py
Outdated
new_rest = new_node | ||
# Pick sum indices from the first monomial | ||
sum_indices = monomials[0].sum_indices | ||
new_monomial_sum.add(sum_indices, new_atomics, new_rest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines disappear.
tsfc/coffee_mode.py
Outdated
sub_monomial_sum = MonomialSum() | ||
for _atomics, _rest in zip(all_atomics, all_rest): | ||
sub_monomial_sum.add((), _atomics, _rest) | ||
sub_monomial_sum = optimise_monomial_sum(sub_monomial_sum, argument_indices) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call does too much work, because it groups on sum_indices, but we know sub_monomial_sum only has sum_indices ()
. So factor out core part and call that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calling optimise_mononials
now, which doesn't do grouping (probably it should return an iterable of monomials rather than MonomialSum in the next version, will see)
tsfc/coffee_mode.py
Outdated
if len(atomic_index) == 0: | ||
return ((), ()) | ||
if len(atomic_index) == 1: | ||
return ((next(iterkeys(atomic_index)), ), ()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix this bug, should do:
optimal_atomic, = atomic_index.keys()
return (optimal_atomic, )
and the empty branch should just return ()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
- new optimise_monomials method which does group by sum indices - better grouping by optimal atomics in factorise_atomics - remove unnecessary IF statements
tsfc/coffee_mode.py
Outdated
new_monomial_sum.add(sum_indices, (oa, node), one) | ||
else: | ||
new_monomial_sum.add(sum_indices, (oa, ), node) | ||
return new_monomial_sum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking, is this function returning a MonomialSum
because this function is returning a MonomialSum
? Or is there a deeper reason?
- factorise_atomics and optimise_monomials now take in and also reutrn iterable of monomials, instead of MonomialSum - add assertion to check all monomials have same sum indices in optimise_monomials - remove logic in early return from factorise_atomics
- better assertion for checking sum indices - docstring - typos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am basically happy. The comments in this round a very minor (mostly more stylistic issues), what do you think of them?
tsfc/coffee_mode.py
Outdated
atomics = list(monomial.atomics) | ||
atomics.remove(oa) # remove common factor | ||
sub_monomials.append(Monomial((), tuple(atomics), monomial.rest)) | ||
sub_monomials = optimise_monomials(sub_monomials, argument_indices) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so we pull out the common factor, then call back into optimise_monomials
to potentially exploit further factorisation opportunities in the now remaining expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Added a line comment.
tsfc/coffee_mode.py
Outdated
atomics.remove(oa) # remove common factor | ||
sub_monomials.append(Monomial((), tuple(atomics), monomial.rest)) | ||
sub_monomials = optimise_monomials(sub_monomials, argument_indices) | ||
assert len(sub_monomials) > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should get at least one monomial back. Otherwise presumably the entire expression was zero or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm there might be a problem here. I might get no monomials back when factorising a*(stuff) + a*(-stuff)
, I guess I should change this to return [], and add base case to make_sum
to handle empty input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it's probably okay since we don't simplify stuff + (-1)*stuff
, and if we do (later), we will just get Zero out of monomial_sum_to_expression
, so we still get a monomial with Zero as rest
, and propagate upwards.
So I think I just need to remove this assertion. Will test it.
tsfc/coffee_mode.py
Outdated
# new MonomialSum directly | ||
sub_monomial, = sub_monomials | ||
new_monomials.append( | ||
Monomial(sum_indices, sub_monomial.atomics + (oa,), sub_monomial.rest)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Special case the "single monomial" case, because IIUC, there may be some structure here that you want to keep. Question: are the "atomics" in a monomial ordered? (I only ask because here you append the common factor to the new monomial, whereas below you prepend it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will (potentially) change the generated code when we do grouping of nodes with same free indices. Changed to prepend.
tsfc/coffee_mode.py
Outdated
Monomial(sum_indices, sub_monomial.atomics + (oa,), sub_monomial.rest)) | ||
else: | ||
# result is a sum, we need to create a new node | ||
node = monomial_sum_to_expression(sub_monomials) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've got a sum of monomials, so convert to a gem node, so that we can create a new Monomial that multiplies by the common factor?
This does an optimised for flop minimisation (?) version of:
reduce(Sum, IndexSum(reduce(Product, m.atomics, m.rest), m.sum_indices) for m in sub_monomials)
I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we discussed having a short comment explanation in the code of why you choose to split the two cases apart (because the else branch would work on a singleton list of monomials, no?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some notes.
tsfc/coffee_mode.py
Outdated
# result is a sum, we need to create a new node | ||
node = monomial_sum_to_expression(sub_monomials) | ||
if set(argument_indices) & set(node.free_indices): | ||
new_monomials.append(Monomial(sum_indices, (oa, node), one)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the new node's free indices intersect with the argument indices, then there might be opportunity for further refactorisation later (so the new node is a possible new ATOMIC), otherwise, it's a COMPOUND?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The terminology collect_monomials
uses with @tj-sun's classifier
: if node's free indices have intersection with the argument indices, then it's either ATOMIC (only one argument index) or COMPOUND (more than one argument index), otherwise it's OTHER.
tests/test_coffee_optimise.py
Outdated
Product(A3i, Product(Bj, Ek))), | ||
Product(Z, Product(A1i, Product(Bj, Fk)))) | ||
result, = optimise_expressions([expr], ((j,), (k,))) | ||
assert count_flop(result) == 2680 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these tests should also assert that the resulting expression is the expected one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changed. Removed count_flop
as well (might need it in tempoary_graph
branch later)
tests/test_coffee_optimise.py
Outdated
# A * B[i] | ||
assert result.children[1].children[1] == Product(A, Bi) | ||
# t * P[i] | ||
assert result.children[0].children[1].children[1] == Product(t, Pi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to just assert that the result is equal to the expected expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
tests/test_coffee_optimise.py
Outdated
result = replace_division([d])[0] | ||
|
||
assert isinstance(result, Product) | ||
assert isinstance(result.children[1], Division) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, just test that the resulting expression is the expected one?
gem/optimise.py
Outdated
|
||
|
||
def make_product(factors, sum_indices=()): | ||
"""Create a Product from collection of factors. Calls sum_factorise to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can link here, say Uses :func:`sum_factorise` to ...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amended.
gem/optimise.py
Outdated
return True | ||
return False | ||
mapper = Memoizer(_reassociate_product) | ||
mapper.stop_at = stop_at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As dicussed, this added code is not used anywhere except in the test suite. Should it be pulled out into a separate changeset?
Okay, this may be tricky. It is clear that Python can represent
int(1e20), but it's not clear that GLPK can do as well.
…On 28/04/17 13:39, tj-sun wrote:
***@***.**** commented on this pull request.
------------------------------------------------------------------------
In tsfc/coffee_mode.py
<#98 (comment)>:
> +
+ if len(atomic_index) == 0:
+ return ()
+ if len(atomic_index) == 1:
+ optimal_atomics, = iterkeys(atomic_index)
+ return (optimal_atomics, )
+
+ # set up the ILP
+ import pulp as ilp
+ ilp_prob = ilp.LpProblem('gem factorise', ilp.LpMinimize)
+ ilp_var = ilp.LpVariable.dicts('node', range(len(atomic_index)), 0, 1, ilp.LpBinary)
+
+ # Objective function
+ # Minimise number of factors to pull. If same number, favour factor with larger extent
+ big = 1e20 # some arbitrary big number
+ ilp_prob += ilp.lpSum(ilp_var[index] * (big - index_extent(atomic, argument_indices))
Ah didn't notice this underflow!
It's my error when I try to distinguish between different solutions. I
think COFFEE just pick whichever solution that Pulp returns. I
probably go with |int(1e20)| as that will rank the solutions first by
number of |1|s, then by their extent.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#98 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEBAmXkhN6GevD9S6KyxdtOebGUAtkpKks5r0d3-gaJpZM4MFWbM>.
|
I can hardly believe that that will actually happen. Do you have an example?
…On 28/04/17 13:55, tj-sun wrote:
***@***.**** commented on this pull request.
------------------------------------------------------------------------
In tsfc/coffee_mode.py
<#98 (comment)>:
> + factor_group = groupby(monomials, key=group_key)
+
+ # We should not drop monomials
+ assert sum(len(ms) for _, ms in factor_group) == len(monomials)
+
+ sum_indices = next(iter(monomials)).sum_indices
+ new_monomials = []
+ for oa, monomials in factor_group:
+ # Create new MonomialSum for the factorised out terms
+ sub_monomials = []
+ for monomial in monomials:
+ atomics = list(monomial.atomics)
+ atomics.remove(oa) # remove common factor
+ sub_monomials.append(Monomial((), tuple(atomics), monomial.rest))
+ sub_monomials = optimise_monomials(sub_monomials, argument_indices)
+ assert len(sub_monomials) > 0
hmm there might be a problem here. I might get no monomials back when
factorising |a*(stuff) + a*(-stuff)|, I guess I should change this to
return |Zero()|, and also add a line to |monomial_sum_to_expression|
to handle empty input?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#98 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEBAmdwV-G-kF-yKknS52DWEM0R5QvfUks5r0eHPgaJpZM4MFWbM>.
|
I suggest to make this a squash merge when we get there, since the overall lines/commit rate is very low. |
I would argue with this, because as far as I understand, the number of
1s is the number of operations in the innermost loop, and then the
negative weights are pushing towards fewer operations in the outer loops
amongst those solutions that have the same number of operations in the
innermost one. This sounds fair, however, the innermost loop may not be
many more times executed than the second innermost, so for a low-order
problem, one might theoretically (not sure about practice) imagine a
situation where one more operation in the innermost loop results in a
fewer number of overall operations because that could allow fewer
operations in outer loops. I believe we should be able to quantify this
exactly, but not sure if it is actually worth doing in practice.
…On 28/04/17 14:34, tj-sun wrote:
***@***.**** commented on this pull request.
------------------------------------------------------------------------
In tsfc/coffee_mode.py
<#98 (comment)>:
> +
+ if len(atomic_index) == 0:
+ return ()
+ if len(atomic_index) == 1:
+ optimal_atomics, = iterkeys(atomic_index)
+ return (optimal_atomics, )
+
+ # set up the ILP
+ import pulp as ilp
+ ilp_prob = ilp.LpProblem('gem factorise', ilp.LpMinimize)
+ ilp_var = ilp.LpVariable.dicts('node', range(len(atomic_index)), 0, 1, ilp.LpBinary)
+
+ # Objective function
+ # Minimise number of factors to pull. If same number, favour factor with larger extent
+ big = 1e20 # some arbitrary big number
+ ilp_prob += ilp.lpSum(ilp_var[index] * (big - index_extent(atomic, argument_indices))
So it needs to be large enough such that the solution with less number
of |1|s are always preferred. And for solutions with equal number of
|1|s, we prioritise the combination with larger extents.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#98 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AEBAmbsNanQzJ2J6kv_AZEfZ10OfPNNrks5r0eragaJpZM4MFWbM>.
|
- remove count_flops - remove reassociate and test - optimisation testing for expression rather than flops - update penality constant for ILP - use defaultdict to map atomics in find_optimal_atomics - rewrite some comments
Pushed a little cleanup to |
tsfc/parameters.py
Outdated
@@ -14,7 +14,8 @@ | |||
"quadrature_degree": "auto", | |||
|
|||
# Default mode | |||
"mode": "vanilla", | |||
# "mode": "vanilla", | |||
"mode": "coffee", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we aren't ready to make this switch yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we do make the switch. In this case, just remove the commented line.
tsfc/coffee_mode.py
Outdated
|
||
:returns: an iterable of factorised :class:`Monomials`s | ||
""" | ||
if not optimal_atomics or len(monomials) < 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<= 1
would be slightly better style: what is special about 2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amended.
Okay, I think we can merge this next business day. |
This commit migrates loop optimisations in coffee to gem.
Test if functionalities are migrated
Results of flops comparison are in the link below
The result shows that gem.optimise() matches or betters coffee level 2 by and large. It also demonstrates that coffee level 2 cannot further optimise ASTs generated by gem.optimise().
There are some cases for which gem.optimise() are slightly worse. One example is mixed_poisson, dim=2, p=1, q=1, nf=0.
tsfc output
coffee o2
this PR
Notice COFFEE can merge temporaries t24 and t23. Such optimisation is not implemented here yet.
Compile time is generally OK, e.g. holzapfel takes 4.02s, vs tsfc 0.91s, COFFEE O2 13.80s.
Minor problems:
TODOs:
rest
which is probably expensive, maybe this is another optimisation level)Results link