Switch branches/tags
Nothing to show
Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


Some cool MMIX code

Here are some things I wrote for Knuth’s MMIX computer.

Greatest Common Divisor

What a great algorithm! Here are two versions, roughly following the text. The first is one using integer division, the second uses the binary GCD algorithm to avoid these costly divisions.


The file gcd.mms.m4 is generated from the file you’re reading. It uses a m4 macro so you need that. Here’s a short preamble containing an error message for the tests later.

        LOC   Data_Segment
        GREG  @
ErrMsg  BYTE  "Error!",10,0

        LOC   #100

Simple version

This is plain vanilla GCD using integer divide. The divide instruction takes 60 cycles on MMIX, so below we’ll try to do better.

a       IS $0
b       IS $1
u       IS $0
v       IS $1
k       IS $2
t       IS $3

% Gcd using division
Gcd     PUT   rD,0
1H      DIVU  t,a,b
        SET   a,b
        GET   b,rR
        PBNZ  b,1B
        POP   1,0

Binary GCD, my version

This is binary GCD using as few branches as possible and no predicated instructions at all. Such instructions introduce dependencies that can make modern processors less efficient, and they seem to be on their way out in any case. (Newer ARM processors have no predicated instructions.)

This version appears to be a bit faster than the one below published in the MMIX supplement.

p       IS $4
q       IS $5
r       IS $6
GcdRsb  OR    t,u,v
        SUBU  k,t,1
        SADD  k,k,t
        SRU   u,u,k
        SRU   v,v,k

D1      ODIF  p,u,v
        ODIF  q,v,u
        ADDU  q,p,q  % q = abs(u-v)
        BZ    q,D2
        SUBU  v,u,p  % v = min(u,v)
        SUBU  p,q,1  % p = q - 1
        SADD  p,p,q  % p = k, 2^k | p
        SRU   u,q,p  % u = q >> k
        JMP   D1
D2      SLU   u,u,k
        POP   1,0

Martin Ruckert’s Binary GCD

This is from the MMIX supplement to TAoCP. The commented lines correspond to an interesting exercise in the book, and I used them in the program above.

GcdBin  SET   k,0
0H      OR    t,u,v
        % either this:
        PBOD  t,B2
        SR    u,u,1
        SR    v,v,1
        ADD   k,k,1
        JMP   0B
        % or this:
        % SUBU  k,t,1
        % SADD  k,k,t
        % SR    u,u,k
        % SR    v,v,k

B2      NEG   t,v
        PBOD  u,B4
        SET   t,u
B3      SR    t,t,1
B4      BEV   t,B3
        CSP   u,t,t
        NEG   t,t
        CSNN  v,t,t
        SUB   t,u,v
        PBNZ  t,B3
        SL    u,u,k
        POP   1,0

GCD Main program

This is a real simple duo, one subroutine to test that things work correctly and another to test performance.

gcd     IS $1
n       IS $2
m       IS $3
expected IS $0
tst     IS $5
ret     IS $6
arg1    IS $7
arg2    IS $8

nb      IS $0
consts  GREG @
        OCTA  #100  % nmax
        OCTA  #fe  % nb

Main    LDOU  n,consts,8
        LDOU  m,consts,8
        LDOU  nb,consts,16

oloop   SUBU  n,n,1
iloop   SUBU  m,m,1
        SET   arg1,n
        ADDU  arg1,arg1,nb
        SET   arg2,m
        ADDU  arg2,arg2,nb
        PUSHJ ret,GcdRsb
        BNZ   m,iloop
        LDOU  m,consts,8
        BNZ   n,oloop
        TRAP  0,Halt,0

% an m4 macro for computing gcd from constants
define(TST,`SETL  n,(($1)&#ffff)
        INCML n,(($1)&#ffff0000)>>16
        INCMH n,(($1)&#ffff00000000)>>32
        INCH  n,(($1)&#ffff000000000000)>>48
        SETL  m,(($2)&#ffff)
        INCML m,(($2)&#ffff0000)>>16
        INCMH m,(($2)&#ffff00000000)>>32
        INCH  m,(($2)&#ffff000000000000)>>48
        SETL  expected,(($3)&#ffff)
        INCML expected,(($3)&#ffff0000)>>16
        INCMH expected,(($3)&#ffff00000000)>>32
        PUSHJ gcd,GcdRsb
        CMPU  tst,gcd,expected
        BNZ   tst,Error')

        % do some actual tests for correctness
Main2    TST(12, 18, 6)
        TST(13, 13, 13)
        TST(2772, 3883, 11)
        TST(624129, 2061517, 18913)
        TST(96495, 54221, 919)
        TST(42792, 21396, 21396)
        TST(34153, 56252, 2009)
        TST(54068, 63723, 1931)
        TST(90820, 54970, 2390)
        TST(97450, 23388, 3898)
        TST(136091, 8603318, 5917)
        TST(6564742, 5941507, 41549)
        TST(3002720, 6082040, 15320)
        TST(2454387, 3590236, 7943)
        TST(6782760, 9312675, 3405)
        TST(2635261, 9158612, 43201)

        TRAP  0,Halt,0

Error   LDA   $255,ErrMsg
        TRAP  0,Fputs,StdOut
        TRAP  0,Halt,0


Here are some odds and ends.

Count trailing zeros

This counts the number of trailing zeros in a 64 bit register x. This method is not great on MMIX, where a combination of SUBU and SADD does the same thing. However, it might make sense on some other architectures and it’s interesting, so here it is.

SLU   x,shift,1
OR    shift,shift,x
SLU   x,shift,2
OR    shift,shift,x
SLU   x,shift,4
OR    shift,shift,x
SLU   x,shift,8
OR    shift,shift,x
SLU   x,shift,16
OR    shift,shift,x
SLU   x,shift,32
NOR   shift,shift,x
SADD  shift,shift,0