cmd/compile: slicing can be improved on ARM #23006

Open
rasky opened this Issue Dec 5, 2017 · 4 comments

rasky commented Dec 5, 2017

This code:

func slice(s []int64, idx int) []int64 {
    return s[idx:]
}

generates this code on ARM:

CMP R0, R7           // bound check
B.GE panic
SUB R7, R0, R0       // len
SUB R7, R4, R4       // cap
RSB $0, R4, R9       // mask(cap)
ASR $31, R9, R9      // mask(cap)
AND R7<<$3, R9, R9   // delta&mask(cap)
ADD R5, R9, R5       // ptr+delta&mask(cap)

but it could be optimized this way:

SUBS R7, R0, R0
B.LS panic
SUBS R7, R4, R4
ADD.NE R7<<$3, R9, R5

which is half the length (four instructions instead of eight) and should be faster.
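
For readers less familiar with the masking sequence, here is a minimal Go sketch of what the RSB/ASR/AND instructions in the first listing compute, next to the conditional form that ADD.NE would express. It assumes 32-bit registers, 8-byte elements (the `<<$3` shift), and a new capacity that is already known to be non-negative after the bounds check. The names `maskedDelta` and `branchyDelta` are hypothetical and are not taken from the compiler.

```go
package main

import "fmt"

// maskedDelta mirrors the RSB/ASR/AND sequence using 32-bit arithmetic:
// the byte offset is kept only when the new capacity is nonzero, so the
// resulting pointer never moves past the end of the backing array.
// Assumes newCap >= 0.
func maskedDelta(newCap, idx, elemSize int32) int32 {
	mask := (-newCap) >> 31 // arithmetic shift: all ones if newCap > 0, else 0
	return (idx * elemSize) & mask
}

// branchyDelta is the conditional form that a predicated add would express:
// add the offset only if the new capacity is not zero.
func branchyDelta(newCap, idx, elemSize int32) int32 {
	if newCap != 0 {
		return idx * elemSize
	}
	return 0
}

func main() {
	cases := []struct{ capacity, idx int32 }{{8, 3}, {3, 3}, {5, 0}}
	for _, c := range cases {
		newCap := c.capacity - c.idx
		fmt.Println(newCap, maskedDelta(newCap, c.idx, 8), branchyDelta(newCap, c.idx, 8))
	}
}
```

Both functions return the same offset for every non-negative new capacity, which is why the three masking instructions could in principle be folded into a single conditional add.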

rasky commented Dec 5, 2017

Moving the bounds check into the slice operation sounds hard to me, but the other part should be easier. I'm not sure whether it's better to special-case the code generation within (*state).slice() or to do it as a rewrite rule.

/cc @cherrymui @benshi001

benshi001 commented Dec 6, 2017

The code is currently frozen, and only bug fixes are allowed. We can try the optimization in Go 1.11.

@ALTree ALTree added the Performance label Dec 6, 2017

@tklauser tklauser added this to the Go1.11 milestone Dec 12, 2017

@bradfitz bradfitz modified the milestones: Go1.11, Unplanned May 18, 2018

rasky commented Sep 4, 2018

@benshi001 did you try this optimization? I think it could be beneficial for slices on ARM.

benshi001 commented Oct 17, 2018

@rasky

I have tried optimizing the ARM code with ADD.S/SUB.S when both the flags and the result are used afterwards. Unfortunately, the go1 benchmark shows some regressions, so it needs more tuning work.

@benshi001 benshi001 self-assigned this Oct 17, 2018

@benshi001 benshi001 modified the milestones: Unplanned, Go1.13 Oct 17, 2018