
Bug/789 pow binary op performance #793

Closed
wants to merge 21 commits into main from bug/789-pow-binary-op-performance

Conversation

@coquelin77 (Member) commented Jun 10, 2021

Description

Optimizations for pow

Issue/s resolved: #789 (although "resolved" is a strong word; more work is required)

Changes proposed:

  • Avoid calling _binary_op unless absolutely necessary; use a simpler switch instead (see the sketch below)
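
A minimal sketch of the idea (a hypothetical helper, not the actual PR code; the DNDarray attributes and the __binary_op call used here are assumptions based on heat.core):

import torch
from heat.core import _operations, factories

def pow_fast_path(t1, t2):
    # t1 is assumed to be a DNDarray (as in the snippet reviewed below).
    # Scalar exponent: skip the general __binary_op machinery and apply
    # torch.pow directly to the process-local tensor.
    if isinstance(t2, (int, float)):
        result = factories.zeros_like(t1)
        result.larray = torch.pow(t1.larray, t2)
        # a real implementation also has to reconcile the output dtype
        # (see the dtype note in the checklist below)
        return result
    # Anything else (DNDarray, torch tensor, ...) may need broadcasting,
    # dtype promotion and redistribution, so defer to the general path.
    return _operations.__binary_op(torch.pow, t1, t2)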

Type of change

Optimization

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
    • This may affect the resulting dtypes; however, the tests ran clean with multiple different operations in var, which requires this
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

YES! comm.chunk now returns 4 values instead of 3!

codecov bot commented Jun 10, 2021

Codecov Report

Merging #793 (b9f833d) into main (05a2acd) will decrease coverage by 0.18%.
The diff coverage is 85.89%.

@@            Coverage Diff             @@
##             main     #793      +/-   ##
==========================================
- Coverage   91.12%   90.93%   -0.19%     
==========================================
  Files          65       65              
  Lines        9976    10143     +167     
==========================================
+ Hits         9091     9224     +133     
- Misses        885      919      +34     
Flag Coverage Δ
gpu ?
unit 90.93% <85.89%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
heat/core/_operations.py 96.04% <ø> (ø)
heat/core/stride_tricks.py 81.53% <50.00%> (ø)
heat/core/arithmetics.py 90.86% <81.35%> (-7.74%) ⬇️
heat/core/communication.py 89.96% <100.00%> (+0.01%) ⬆️
heat/core/dndarray.py 96.66% <100.00%> (+<0.01%) ⬆️
heat/core/factories.py 99.23% <100.00%> (-0.01%) ⬇️
heat/core/indexing.py 100.00% <100.00%> (ø)
heat/core/io.py 89.39% <100.00%> (-0.05%) ⬇️
heat/core/linalg/basics.py 94.22% <100.00%> (ø)
heat/core/manipulations.py 98.63% <100.00%> (ø)
... and 6 more

@coquelin77 (Member Author)

rerun tests

if isinstance(t1, DNDarray):
    ret = factories.zeros_like(t1)
    try:
        t2 = manipulations.resplit(t2, t1.split)

Contributor

Hey I'm thinking this might backfire. Example: two DNDarrays t1 and t2.
t1.shape = (100, 8) and t1.split=1.
t2.shape = (8,).

The shapes are broadcastable and ht.pow(t1, t2) is legitimate. t2 cannot be split along axis 1, though, so the follow-up code assumes t2 is not a DNDarray, and line 786 will fail.

I'm wondering if we need so many checks at all. All we need to know is whether t1 or t2 is a scalar (with the other one a DNDarray). How about

# t1 is a verified DNDarray
if not hasattr(t2, "dtype"):
    ret = torch.pow(t1.larray, t2)
else:
    ...  # call __binary_op

or something like that? What am I missing?
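
For illustration, a small standalone reproduction of the broadcast case described in this comment (assumes a standard heat installation; not part of the PR):

import heat as ht

t1 = ht.random.random((100, 8), split=1)  # 2-D, split along axis 1
t2 = ht.arange(8, dtype=ht.float32)       # 1-D, has no axis 1 to be split along

# The shapes broadcast, so this call is legitimate and must keep working:
res = ht.pow(t1, t2)

# Forcing t2 onto t1's split axis, as the try block above attempts, cannot work here:
# manipulations.resplit(t2, 1) would fail because t2 only has axis 0.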

Member Author

You are correct, that was an issue with the assumption I made. I have updated the logic to correct for this; it should now be able to handle anything you can throw at it (I hope).

@coquelin77 (Member Author)

rerun tests

1 similar comment
@coquelin77 (Member Author)

rerun tests

@ghost commented Jun 27, 2022


Review these changes using an interactive CodeSee Map


@coquelin77 (Member Author)

New numbers:

local tests: 4 procs, a = ht.random.random((10000, 10000), dtype=ht.float32, split=split)
numbers are the average of 10 runs (a reproduction sketch follows the list below)

  • pow (new / old)
    • a ** a: 0.2157 / 0.2207
    • 2.5 ** a: 0.2099 / 0.2051
    • a ** 2.5: 0.1986 / 0.2002
    • a ** 2: 0.0709 / 0.2390
  • add (new / old)
    • a + a: 0.0531 / 0.0697
    • 2.5 + a: 0.0571 / 0.0603
    • a + 2.5: 0.0597 / 0.0615
    • a + 2: 0.0578 / 0.0592
  • sub (new / old)
    • a - a: 0.0590 / 0.0689*
    • 2.5 - a: 0.0584 / 0.0570
    • a - 2.5: 0.0571 / 0.0583
    • a - 2: 0.0587 / 0.0593
  • div (new / old)
    • a / a: 0.0582 / 0.0580
    • 2.5 / a: 0.0582 / 0.0574
    • a / 2.5: 0.0615 / 0.0596
    • a / 2: 0.0540 / 0.0601
  • floordiv (new / old)
    • a // a: 0.1291 / 0.1374
    • 2.5 // a: 0.1342 / 0.1371
    • a // 2.5: 0.1339 / 0.1406
    • a // 2: 0.1886 / 0.1922
  • mul (new / old)
    • a * a: 0.0557 / 0.0579
    • 2.5 * a: 0.0580 / 0.0572
    • a * 2.5: 0.0600 / 0.0595
    • a * 2: 0.0600 / 0.0581
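
A minimal timing harness along these lines reproduces the setup above (a hypothetical script, not part of the PR; run with e.g. mpirun -n 4 python bench_pow.py):

import time
import heat as ht

def avg_time(fn, runs=10):
    fn()  # warm-up run, excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

split = 0  # repeat for the other split configurations
a = ht.random.random((10000, 10000), dtype=ht.float32, split=split)

cases = {
    "a ** a": lambda: a ** a,
    "2.5 ** a": lambda: 2.5 ** a,
    "a ** 2.5": lambda: a ** 2.5,
    "a ** 2": lambda: a ** 2,
}
for label, fn in cases.items():
    t = avg_time(fn)
    if a.comm.rank == 0:
        print(f"{label}: {t:.4f}")

The same loop, swapped to the other operators, yields the remaining rows.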

@coquelin77 (Member Author)

@mtar can you have a look at why this is failing? I don't know why gpu isn't recognized. Maybe it's a bug somewhere that I'm missing.

@ClaudiaComito (Contributor)

Some of this PR has been superseded by the latest implementation of _operations.__binary_op with distribution sanitation (#902), but a lot of it is still relevant. I will make the necessary changes this week; @coquelin77, please scream if you would rather have a review and introduce the changes yourself.

@ClaudiaComito ClaudiaComito self-assigned this Feb 13, 2023
@ClaudiaComito ClaudiaComito added this to the 1.3.0 milestone Mar 29, 2023
@ClaudiaComito (Contributor)

Closing this as too stale to update; the changes were implemented in #1141.

@mtar mtar deleted the bug/789-pow-binary-op-performance branch February 28, 2024 10:57
Successfully merging this pull request may close these issues.

performance issues on a single MPI process