Open
Labels
NeedsFix (The path to resolution is known, but the work has not been done.), Performance
Description
Suggestions from Torbjörn Granlund (personal e-mail):
"
The multiply primitives, in particular addMulVVW surely deserves more
attention:
1. Offset the pointers so that you can index with a counter register
   which goes from -n to 0, saving the CMPQ.
2. Unroll. You can save most of the ADCQ $0, R that way. Basically,
   do one run with just MULQ where you sum the old highpart (DX) with
   the new lowpart (AX). You will need some MOVQ to move DX
   out-of-the-way too. Then do a new run over these sums where you
   bring in the memory addend. This should double the speed on some
   newer CPUs.
A good addMulVVW is probably really the first thing to write in
assembly; addition and subtraction is much less important, usually.
"
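The two-run idea in the suggestion can be sketched in portable Go, as an illustration rather than the assembly the e-mail has in mind. The sketch assumes that addMulVVW computes z += x*y and returns the final carry word. The function names `addMulVVWRef` and `addMulVVWTwoPass` are hypothetical, and the temporary slice `p` stands in for the registers that an unrolled assembly loop would use inside each block. The first run chains each old high part into the next low part with a single carry chain; the second run brings in the memory addend.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMulVVWRef is a straightforward reference: z[i] += x[i]*y with the
// carry folded in per element (two carry fix-ups per word).
// Assumes len(x) >= len(z).
func addMulVVWRef(z, x []uint, y uint) (c uint) {
	for i := range z {
		hi, lo := bits.Mul(x[i], y)
		lo, cc := bits.Add(lo, z[i], 0)
		hi += cc // cannot overflow: x*y + z + c always fits in two words
		lo, cc = bits.Add(lo, c, 0)
		hi += cc
		z[i] = lo
		c = hi
	}
	return c
}

// addMulVVWTwoPass is a hypothetical sketch of the two-run scheme.
// Run 1 forms the words of x*y by adding each old high part to the next
// low part along one carry chain. Run 2 adds those sums into z.
// The slice p is for clarity only; assembly would keep these sums in
// registers within an unrolled block.
func addMulVVWTwoPass(z, x []uint, y uint) (c uint) {
	n := len(z)
	p := make([]uint, n)
	var prevHi, carry uint
	for i := 0; i < n; i++ {
		hi, lo := bits.Mul(x[i], y)
		p[i], carry = bits.Add(lo, prevHi, carry)
		prevHi = hi
	}
	top := prevHi + carry // top word of x*y; cannot overflow

	carry = 0
	for i := 0; i < n; i++ {
		z[i], carry = bits.Add(z[i], p[i], carry)
	}
	return top + carry // x*y + z still fits in n+1 words
}

func main() {
	x := []uint{^uint(0), ^uint(0)}
	z1 := []uint{1, 0}
	z2 := []uint{1, 0}
	c1 := addMulVVWRef(z1, x, 2)
	c2 := addMulVVWTwoPass(z2, x, 2)
	// Prints whether the two versions agree on the carry and on each word.
	fmt.Println(c1 == c2, z1[0] == z2[0], z1[1] == z2[1])
}
```

The counter-from--n-to-0 trick in the first suggestion has no direct Go equivalent; it applies to the assembly loop, where the loop-ending add sets the zero flag and the separate compare becomes unnecessary.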