Add sha/asm/keccak1600-c64x.pl #3708
[skip ci]
if ($rot&1) {
$code.=<<___;
$p ROTL B$src,$rot/2+1,A$dst
|| ROTL A$src,$rot/2, B$dst
Nice idea :-)
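The paired ROTL instructions above look like the odd-rotation case of a bit-interleaved 64-bit rotate. The following is only an illustrative sketch in Python, not the PR's code. It assumes that the A register holds the even-numbered bits of a 64-bit lane and the B register the odd-numbered bits; that layout is inferred from the snippet, and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n (taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def rotl64(x, n):
    """Reference 64-bit rotate left, used only for comparison."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64

def interleave_split(x):
    """Split a 64-bit lane into (even bits, odd bits), 32 bits each.
    Assumption: 'even' plays the role of the A register, 'odd' of B."""
    even = odd = 0
    for i in range(32):
        even |= ((x >> (2 * i)) & 1) << i
        odd |= ((x >> (2 * i + 1)) & 1) << i
    return even, odd

def interleave_join(even, odd):
    """Inverse of interleave_split."""
    x = 0
    for i in range(32):
        x |= ((even >> i) & 1) << (2 * i)
        x |= ((odd >> i) & 1) << (2 * i + 1)
    return x

def rotl64_interleaved(even, odd, rot):
    """Rotate an interleaved lane left by rot using two 32-bit rotates."""
    rot %= 64
    if rot & 1:
        # Odd rotation: the halves swap roles, as in the snippet:
        #   ROTL B$src,$rot/2+1,A$dst  ||  ROTL A$src,$rot/2,B$dst
        return rotl32(odd, rot // 2 + 1), rotl32(even, rot // 2)
    # Even rotation: each half is rotated in place by rot/2.
    return rotl32(even, rot // 2), rotl32(odd, rot // 2)
```

Under that assumed layout, any 64-bit rotate costs exactly two 32-bit rotates, and the two can be issued in parallel, one per register file.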
CMPLTU LEN,BSZ,A0 ; len < bsz?
|| SHRU BSZ,3,BSZ
[A0] BNOP ret?
||[A0] ZERO BSZ
The || indicates parallel execution, and [A0] means "execute if A0 != 0", right?
Spot on. Just in case, it's worth remembering that you can't pair whatever and however you like. There are limitations of various kinds. One should be obvious: only one pair of rotations per execution packet, one on the A-file and one on the B-file, possibly cross-wise... Another counter-intuitive thing is that the branch is actually taken five cycles later. I mean, instructions past this branch up to and including NOP 4 still execute...
instructions past this branch up to and including NOP 4 still execute...
Well, the non-NOPs are predicated with [BSZ], which is zero if the branch is taken. So the corresponding instructions are not executed in the sense that they don't affect processor state. But they are executed in the sense that the processor does decode them, goes through all the intricate steps, and then just does nothing, as prescribed by the current value of the predicate register.
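To make the two ideas from this thread concrete (parallel issue with `||` and predication with `[reg]`), here is a toy Python model. It is a sketch under stated assumptions, not a C64x simulator: it assumes that every instruction in a packet reads register values from the start of the cycle, it leaves out the load and the branch itself, and the helper name `run_packet` and the register values are made up for illustration.

```python
def run_packet(regs, packet):
    """Run one packet of parallel instructions on a register dict.

    Each instruction is (predicate_register_or_None, destination, function).
    Assumption: all instructions read a snapshot taken at the start of the
    cycle, and all writes land together at the end of it.
    """
    snapshot = dict(regs)
    writes = {}
    for pred, dst, fn in packet:
        if pred is not None and snapshot[pred] == 0:
            # Predicate is zero: the instruction is issued but changes nothing.
            continue
        writes[dst] = fn(snapshot)
    regs.update(writes)
    return regs

def absorb_prologue(length, block_size):
    """Model of the quoted snippets, minus the load and the branch."""
    regs = {"LEN": length, "BSZ": block_size, "A0": 0}
    # CMPLTU LEN,BSZ,A0 || SHRU BSZ,3,BSZ  (the compare sees the old BSZ)
    run_packet(regs, [
        (None, "A0", lambda r: int(r["LEN"] < r["BSZ"])),
        (None, "BSZ", lambda r: r["BSZ"] >> 3),
    ])
    # [A0] ZERO BSZ  (issued in parallel with the predicated branch)
    run_packet(regs, [("A0", "BSZ", lambda r: 0)])
    # [BSZ] SUB LEN,8,LEN || [BSZ] SUB BSZ,1,BSZ  (in the branch delay slots)
    run_packet(regs, [
        ("BSZ", "LEN", lambda r: r["LEN"] - 8),
        ("BSZ", "BSZ", lambda r: r["BSZ"] - 1),
    ])
    return regs
```

With a hypothetical block size of 136 bytes, `absorb_prologue(100, 136)` leaves LEN untouched because BSZ was zeroed before the predicated subtractions ran, while `absorb_prologue(200, 136)` consumes 8 bytes and one 64-bit word.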
crypto/sha/asm/keccak1600-c64x.pl
Outdated
[BSZ] LDNDW *INP++,A1:A0
||[BSZ] SUB LEN,8,LEN
||[BSZ] SUB BSZ,1,BSZ
NOP 4
Ah, interesting. Initially I thought the NOP 4 here was there to allow the LDNDW to hit A1:A0,
but the BNOP above can abort the NOP one cycle earlier, then?
Yes, NOP 4 is there to allow the data to show up in A1:A0. It just so happens that it coincides with the cycle in which the branch is actually taken. Load latency is 4 cycles, and the load is executed one cycle after BNOP, so they are kind of "aligned".
they are kind of "aligned"
"They" are the moments when the branch is taken and when the data becomes available in the registers.
Ah, I see.
I misused the term "latency" here. It is better to think in terms of "delay slots": a branch has 5 delay slots, a load has 4, while an addition, for example, has 0. Latency is normally a non-zero value, so it is rather the number of delay slots plus 1.
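The timing argument in this thread can be checked with a few lines of arithmetic. The sketch below uses only the delay-slot counts quoted above (branch 5, load 4, addition 0); the cycle numbering, with BNOP issued in cycle 0 and LDNDW in cycle 1, and the function names are assumptions made for illustration.

```python
# Delay-slot counts as stated in the comment above.
DELAY_SLOTS = {"branch": 5, "load": 4, "add": 0}

def latency(kind):
    """Latency in cycles: the number of delay slots plus 1."""
    return DELAY_SLOTS[kind] + 1

def effect_cycle(issue_cycle, kind):
    """First cycle in which the instruction's effect is visible:
    the branch target starts executing, or the loaded data can be used."""
    return issue_cycle + latency(kind)

# Hypothetical numbering: BNOP in cycle 0, LDNDW in cycle 1,
# then NOP 4 fills cycles 2 through 5.
branch_lands = effect_cycle(0, "branch")
load_lands = effect_cycle(1, "load")
```

Both values come out as the same cycle, which is the "alignment" described above: the NOP 4 that waits for the load also happens to fill the remaining branch delay slots.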
crypto/sha/asm/keccak1600-c64x.pl
Outdated
||[A0] LDW *SP[1],A2 ; pull A[][]
[BSZ] LDNDW *INP++,A1:A0
||[BSZ] SUB LEN,8,LEN
||[BSZ] SUB BSZ,1,BSZ
And what's the reason for the large difference in indentation here?
This is a typo: spaces vs. tabs. Note that if you look at it as a file and not as a diff, there won't be any irregularities. I've spotted one more such typo. The fix is pushed.
I suppose this is a kind of reference to the jagged alignment in #3705. The reply there was "if you see it, look for something special." But that doesn't mean that all special things are marked with jagged alignment.
OK, thanks!
Yes, I'm learning...
[skip ci] Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de> (Merged from #3708)
Merged. Thanks.
[skip ci]