Implement core.checkedint in terms of LLVM intrinsics. #772

Merged (1 commit) on Nov 1, 2014

Conversation

@redstar (Member) commented on Nov 1, 2014

No description provided.

redstar added a commit that referenced this pull request on Nov 1, 2014:
Implement core.checkedint in terms of LLVM intrinsics.
@redstar merged commit 8cdfab4 into ldc-developers:master on Nov 1, 2014
@dnadlinger (Member) commented:

I'm not sure whether |= is the best choice here. Whether a branch or a load/or sequence is worse pretty much depends on whether the overflow flag is already in a register after inlining. See Walter's original PR for a discussion on this.
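
For context, here is a minimal sketch of what an intrinsic-based adds() with the |= in question could look like. The wrapper struct, the concrete pragma(LDC_intrinsic) declaration and the names are illustrative assumptions, not the code from this PR (LDC ships its actual bindings in ldc.intrinsics):

// Hypothetical mapping of the {i32, i1} aggregate returned by
// llvm.sadd.with.overflow.i32; struct and field names are assumptions.
struct AddWithOverflow
{
    int  result;    // wrapped sum
    bool overflow;  // true iff the signed addition overflowed
}

// Declare the LLVM intrinsic via LDC's intrinsic pragma (illustrative only).
pragma(LDC_intrinsic, "llvm.sadd.with.overflow.i32")
AddWithOverflow llvm_sadd_with_overflow_i32(int lhs, int rhs);

// Same signature as core.checkedint.adds for int.
int adds(int x, int y, ref bool overflow)
{
    auto r = llvm_sadd_with_overflow_i32(x, y);
    overflow |= r.overflow;   // the |= under discussion: branch-free, sticky flag
    return r.result;
}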

@redstar (Member, Author) commented on Nov 3, 2014

Thanks for the hint - I'll take a look at Walter's PR. I think the code generated by LLVM is pretty bad, but it is difficult to create a good branch-free version. With the parameters in rax and rbx, my spontaneous approach is:

mov    rdx, 1          ; value to store if the add overflows
add    rax, rbx        ; sets the carry flag on overflow
cmovnc rdx, overflow   ; no carry: keep the old flag loaded from memory
mov    overflow, rdx   ; write the flag back unconditionally

In the non-overflow case this creates a stall because of memory latency. I can avoid this stall by always reading memory:

mov    rcx, 1          ; value to store if the add overflows
mov    rdx, overflow   ; load the old flag up front
add    rax, rbx        ; sets the carry flag on overflow
cmovc  rdx, rcx        ; carry: replace the flag with 1
mov    overflow, rdx   ; write the flag back unconditionally

In contrast, the code with a branch is obvious:

add    rax, rbx
jnc    .L1
mov    overflow, 1
.L1:

@dnadlinger (Member) commented:

Yeah, and since overflow is probably rare in most applications, the branch should predict really well. All in all, I favored going with a branch-y version (that might be cleaned up by instruction selection anyway?) as the code is more straightforward and it seemed to be preferred by some microbenchmarks. In any case, I guess we'll see how it turns out once somebody actually uses these in a hot loop and we have some real use case to optimize for.
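
For comparison, the branchy variant of the same sketch (again assuming the hypothetical llvm_sadd_with_overflow_i32 declaration from above) only stores to the flag when an overflow actually happened, which is the pattern the jnc sequence above corresponds to:

// Branchy variant: the store to `overflow` sits on the cold path, so with
// rare overflows the branch predicts almost perfectly.
int adds(int x, int y, ref bool overflow)
{
    auto r = llvm_sadd_with_overflow_i32(x, y);
    if (r.overflow)
        overflow = true;
    return r.result;
}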
