Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

construct only necessary elements in OffsetCalculator #55107

Closed
wants to merge 1 commit into from

Conversation

ngimel
Copy link
Collaborator

@ngimel ngimel commented Mar 31, 2021

Per title. Elements beyond dim are never accessed because

if (dim == dims) {
break;
}
.
On addmm instruction count per 30 repetitions 1467813 -> 1452261
add 651522 -> 633462
add_ 529331 -> 511271

add benchmarking snippet:

 timer = Timer("m1.add_(b);", setup="at::Tensor m1=torch::empty({2,2},device(at::kCUDA) ); at::Tensor b = torch::empty({2}, device(at::kCUDA));", language="c++", timer=timeit.default_timer)
 stats=timer.collect_callgrind(number=30)
 print(stats.as_standardized().stats(inclusive=False))

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Mar 31, 2021

💊 CI failures summary and remediations

As of commit 646510f (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-scanned failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@ngimel ngimel requested review from swolchok and ezyang March 31, 2021 21:31
@codecov
Copy link

codecov bot commented Apr 1, 2021

Codecov Report

Merging #55107 (646510f) into master (4865195) will increase coverage by 0.34%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #55107      +/-   ##
==========================================
+ Coverage   77.08%   77.42%   +0.34%     
==========================================
  Files        1893     1893              
  Lines      186444   186444              
==========================================
+ Hits       143715   144362     +647     
+ Misses      42729    42082     -647     

Copy link
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch!

@facebook-github-bot
Copy link
Contributor

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Comment on lines 33 to 36
for (int arg = 0; arg < NARGS; arg++) {
int64_t element_size = (element_sizes == nullptr ? 1LL : element_sizes[arg]);
strides_[i][arg] = i < dims ? strides[arg][i] / element_size : 0;
strides_[i][arg] = strides[arg][i] / element_size;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, might be worth emitting two loops here depending on whether element_sizes == nullptr to avoid a bunch of idiv instructions just to divide by 1.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll do that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried it, it actually increases instruction count, and in most cases element_sizes is not nullptr.
It is silly that TI build multiplies by element sizes and stores stride in bytes, and here we copy element_sizes into array (true story,

element_sizes[i] = iter.element_size(i);
) and then divide by them, but that's outside of this fix scope.

@facebook-github-bot
Copy link
Contributor

@ngimel merged this pull request in 0d47374.

@ngimel ngimel deleted the ngimel/offset_calc branch December 26, 2021 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants