
Bug/789 pow binary op performance #793

Closed
wants to merge 21 commits into main from bug/789-pow-binary-op-performance

Conversation

@coquelin77 (Member) commented Jun 10, 2021

Description

Optimizations for pow

Issue/s resolved: #789 (although "resolved" is a strong word; more work is required)

Changes proposed:

  • Avoid calling _binary_op unless absolutely necessary; use a simpler switch instead (see the sketch below)
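
A minimal sketch of the idea (a hypothetical helper, not the actual PR code; the DNDarray attributes and the __binary_op call used here are assumptions based on heat.core):

import torch
from heat.core import _operations, factories

def pow_fast_path(t1, t2):
    # t1 is assumed to be a DNDarray (as in the snippet reviewed below).
    # Scalar exponent: skip the general __binary_op machinery and apply
    # torch.pow directly to the process-local tensor.
    if isinstance(t2, (int, float)):
        result = factories.zeros_like(t1)
        result.larray = torch.pow(t1.larray, t2)
        # a real implementation also has to reconcile the output dtype
        # (see the dtype note in the checklist below)
        return result
    # Anything else (DNDarray, torch tensor, ...) may need broadcasting,
    # dtype promotion and redistribution, so defer to the general path.
    return _operations.__binary_op(torch.pow, t1, t2)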

Type of change

Optimization

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
    • This may affect the resulting dtypes; however, the tests ran clean with multiple different operations in var, which requires this
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

YES! comm.chunk now returns 4 values instead of 3!

codecov bot commented Jun 10, 2021

Codecov Report

Merging #793 (b9f833d) into main (05a2acd) will decrease coverage by 0.18%.
The diff coverage is 85.89%.

@@            Coverage Diff             @@
##             main     #793      +/-   ##
==========================================
- Coverage   91.12%   90.93%   -0.19%     
==========================================
  Files          65       65              
  Lines        9976    10143     +167     
==========================================
+ Hits         9091     9224     +133     
- Misses        885      919      +34     
Flag Coverage Δ
gpu ?
unit 90.93% <85.89%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
heat/core/_operations.py 96.04% <ø> (ø)
heat/core/stride_tricks.py 81.53% <50.00%> (ø)
heat/core/arithmetics.py 90.86% <81.35%> (-7.74%) ⬇️
heat/core/communication.py 89.96% <100.00%> (+0.01%) ⬆️
heat/core/dndarray.py 96.66% <100.00%> (+<0.01%) ⬆️
heat/core/factories.py 99.23% <100.00%> (-0.01%) ⬇️
heat/core/indexing.py 100.00% <100.00%> (ø)
heat/core/io.py 89.39% <100.00%> (-0.05%) ⬇️
heat/core/linalg/basics.py 94.22% <100.00%> (ø)
heat/core/manipulations.py 98.63% <100.00%> (ø)
... and 6 more

@coquelin77 (Member Author)

rerun tests

if isinstance(t1, DNDarray):
    ret = factories.zeros_like(t1)
    try:
        t2 = manipulations.resplit(t2, t1.split)

Contributor

Hey I'm thinking this might backfire. Example: two DNDarrays t1 and t2.
t1.shape = (100, 8) and t1.split=1.
t2.shape = (8,).

The shapes are broadcastable and ht.pow(t1, t2) is legitimate. t2 cannot be split along axis 1, though, so the follow-up code assumes t2 is not a DNDarray, and line 786 will fail.

I'm wondering if we need so many checks at all. All we need to know is whether t1 or t2 is a scalar (with the other one a DNDarray). How about

# t1 is a verified DNDarray
if not hasattr(t2, "dtype"):
    ret = torch.pow(t1.larray, t2)
else:
    ...  # call __binary_op

or something like that? What am I missing?
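
For illustration, a small standalone reproduction of the broadcast case described in this comment (assumes a standard heat installation; not part of the PR):

import heat as ht

t1 = ht.random.random((100, 8), split=1)  # 2-D, split along axis 1
t2 = ht.arange(8, dtype=ht.float32)       # 1-D, has no axis 1 to be split along

# The shapes broadcast, so this call is legitimate and must keep working:
res = ht.pow(t1, t2)

# Forcing t2 onto t1's split axis, as the try block above attempts, cannot work here:
# manipulations.resplit(t2, 1) would fail because t2 only has axis 0.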

Member Author

You are correct, that was an issue with the assumption I made. I have updated the logic to correct for this; it should now be able to handle anything you can throw at it (I hope).

@coquelin77 (Member Author)

rerun tests

1 similar comment
@coquelin77 (Member Author)

rerun tests

@ghost commented Jun 27, 2022


Review these changes using an interactive CodeSee Map


@coquelin77 (Member Author)

New numbers:

local tests: 4 procs, a = ht.random.random((10000, 10000), dtype=ht.float32, split=split)
numbers are the average of 10 runs (a reproduction sketch follows the list below)

  • pow (new / old)
    • a ** a: 0.2157 / 0.2207
    • 2.5 ** a: 0.2099 / 0.2051
    • a ** 2.5: 0.1986 / 0.2002
    • a ** 2: 0.0709 / 0.2390
  • add (new / old)
    • a + a: 0.0531 / 0.0697
    • 2.5 + a: 0.0571 / 0.0603
    • a + 2.5: 0.0597 / 0.0615
    • a + 2: 0.0578 / 0.0592
  • sub (new / old)
    • a - a: 0.0590 / 0.0689*
    • 2.5 - a: 0.0584 / 0.0570
    • a - 2.5: 0.0571 / 0.0583
    • a - 2: 0.0587 / 0.0593
  • div (new / old)
    • a / a: 0.0582 / 0.0580
    • 2.5 / a: 0.0582 / 0.0574
    • a / 2.5: 0.0615 / 0.0596
    • a / 2: 0.0540 / 0.0601
  • floordiv (new / old)
    • a // a: 0.1291 / 0.1374
    • 2.5 // a: 0.1342 / 0.1371
    • a // 2.5: 0.1339 / 0.1406
    • a // 2: 0.1886 / 0.1922
  • mul (new / old)
    • a * a: 0.0557 / 0.0579
    • 2.5 * a: 0.0580 / 0.0572
    • a * 2.5: 0.0600 / 0.0595
    • a * 2: 0.0600 / 0.0581
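
A minimal timing harness along these lines reproduces the setup above (a hypothetical script, not part of the PR; run with e.g. mpirun -n 4 python bench_pow.py):

import time
import heat as ht

def avg_time(fn, runs=10):
    fn()  # warm-up run, excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

split = 0  # repeat for the other split configurations
a = ht.random.random((10000, 10000), dtype=ht.float32, split=split)

cases = {
    "a ** a": lambda: a ** a,
    "2.5 ** a": lambda: 2.5 ** a,
    "a ** 2.5": lambda: a ** 2.5,
    "a ** 2": lambda: a ** 2,
}
for label, fn in cases.items():
    t = avg_time(fn)
    if a.comm.rank == 0:
        print(f"{label}: {t:.4f}")

The same loop, swapped to the other operators, yields the remaining rows.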

@coquelin77 (Member Author)

@mtar can you have a look at why this is failing? I don't know why gpu isn't recognized. Maybe it's a bug somewhere that I'm missing.

@ClaudiaComito (Contributor)

Some of this PR has been superseded by the latest implementation of _operations.__binary_op with distribution sanitation (#902), but a lot of it is still relevant. I will make the necessary changes this week; @coquelin77, please scream if you would rather have a review and introduce the changes yourself.

@ClaudiaComito ClaudiaComito self-assigned this Feb 13, 2023
@ClaudiaComito ClaudiaComito added this to the 1.3.0 milestone Mar 29, 2023
@ClaudiaComito (Contributor)

Closing this as too stale to update; the changes were implemented in #1141.

@mtar mtar deleted the bug/789-pow-binary-op-performance branch February 28, 2024 10:57
Successfully merging this pull request may close these issues.

performance issues on a single MPI process