fp: montgomery reduction #47

mmcloughlin · 2019-07-12T02:01:35Z

Refactors the interface into a "two-phase" initialization. A field can be constructed as an object first, and an avo context can be provided later (creating a Builder). The avo context is now a struct field instead of an argument to every function. The motivation is that this allows additional state to be stored alongside the context, for example initialized global data sections. This will be useful for the implementation of Montgomery fields. Updates #47
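
A minimal sketch of the two-phase pattern described above, in Go. Every name here (Field, Builder, Context, Build, global, Mul, Square) is hypothetical and stands in for the real ec3/avo types. In particular, Context only records emitted lines rather than wrapping avo's build context. The point it illustrates is that once the context lives in a struct field, the builder can also remember which global data sections it has already emitted.

```go
package main

import "fmt"

// Context is a hypothetical stand-in for an avo build context: it only
// records the lines of assembly that would be emitted.
type Context struct {
	Lines []string
}

// Emit appends one line of output.
func (c *Context) Emit(line string) { c.Lines = append(c.Lines, line) }

// Field is phase one: a description of the field that can be constructed
// without any code generation context.
type Field struct {
	Name  string
	Limbs int
}

// Builder is phase two: a Field bound to a context. Because the context is a
// struct field rather than an argument, extra state can live alongside it.
type Builder struct {
	Field
	ctx     *Context
	globals map[string]bool // global data sections already emitted
}

// Build binds the field to a context, producing a Builder.
func (f Field) Build(ctx *Context) *Builder {
	return &Builder{Field: f, ctx: ctx, globals: map[string]bool{}}
}

// global emits a named data section the first time it is requested and
// returns its name on every call.
func (b *Builder) global(name string) string {
	if !b.globals[name] {
		b.ctx.Emit("DATA " + name)
		b.globals[name] = true
	}
	return name
}

// Mul emits a placeholder multiply function that references the modulus.
func (b *Builder) Mul() {
	p := b.global("modulus")
	b.ctx.Emit("TEXT " + b.Name + "Mul")
	b.ctx.Emit("  uses " + p)
}

// Square emits a placeholder squaring function sharing the same global.
func (b *Builder) Square() {
	p := b.global("modulus")
	b.ctx.Emit("TEXT " + b.Name + "Sqr")
	b.ctx.Emit("  uses " + p)
}

func main() {
	f := Field{Name: "p256", Limbs: 4} // phase one: no context needed
	ctx := &Context{}
	b := f.Build(ctx) // phase two: attach the context
	b.Mul()
	b.Square()
	for _, line := range ctx.Lines {
		fmt.Println(line)
	}
}
```

Running it emits `DATA modulus` once, followed by the two `TEXT` entries, even though both functions request the modulus.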

mmcloughlin · 2019-07-13T00:50:38Z

Implementation

Addition

Add word-by-word with carries to get x
Compute x-p
CMOV to select x or x-p depending on whether x-p >= 0
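
The three addition steps can be sketched in pure Go with math/bits, assuming four 64-bit limbs in little-endian order and inputs already reduced below p. This is an illustrative model, not the generated assembly. The mask-based select at the end is a branch-free stand-in for CMOV. Since x is really 257 bits wide, the carry out of the addition has to be taken into account when deciding whether x - p >= 0.

```go
package main

import (
	"fmt"
	"math/bits"
)

// p256 is the NIST P-256 prime as four little-endian 64-bit words.
var p256 = [4]uint64{
	0xffffffffffffffff,
	0x00000000ffffffff,
	0x0000000000000000,
	0xffffffff00000001,
}

// addMod returns (x + y) mod p for x, y < p.
func addMod(x, y, p *[4]uint64) [4]uint64 {
	// Add word-by-word with carries: carry:s is the 257-bit sum.
	var s [4]uint64
	var carry uint64
	for i := 0; i < 4; i++ {
		s[i], carry = bits.Add64(x[i], y[i], carry)
	}
	// Compute d = s - p, tracking the borrow.
	var d [4]uint64
	var borrow uint64
	for i := 0; i < 4; i++ {
		d[i], borrow = bits.Sub64(s[i], p[i], borrow)
	}
	// keep = 1 exactly when carry:s < p, i.e. no carry out but a borrow.
	_, keep := bits.Sub64(carry, borrow, 0)
	mask := -keep // all ones keeps s, all zeros takes d (CMOV stand-in)
	var z [4]uint64
	for i := 0; i < 4; i++ {
		z[i] = (s[i] & mask) | (d[i] &^ mask)
	}
	return z
}

func main() {
	pm1 := p256
	pm1[0]-- // p - 1
	one := [4]uint64{1}
	fmt.Println(addMod(&pm1, &one, &p256)) // [0 0 0 0], since (p-1)+1 = p
}
```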

https://github.com/golang/go/blob/05e77d41914d247a1e7caf37d7125ccaa5a53505/src/crypto/elliptic/p256_asm_amd64.s#L1685-L1704
https://github.com/cloudflare/circl/blob/a03c5a147111a46165b047f49053ec510d5582b4/ecc/p384/arith_amd64.s#L21

Multiplication

As a first pass we will implement this with:

Full multi-precision multiply producing a double-width result.
ReduceDouble, applying multi-word Montgomery reduction to that double-width result.
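
For the first step, a schoolbook multiply over four 64-bit limbs produces the eight-limb double-width result. The sketch below assumes little-endian limb order and uses math/bits.Mul64 for the 64x64 to 128-bit products. The function name mulWide is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulWide computes the full 512-bit product of two 256-bit values, each given
// as four little-endian 64-bit words, using schoolbook multiplication.
func mulWide(x, y *[4]uint64) [8]uint64 {
	var z [8]uint64
	for i := 0; i < 4; i++ {
		var carry uint64
		for j := 0; j < 4; j++ {
			// x[i]*y[j] + z[i+j] + carry always fits in 128 bits.
			hi, lo := bits.Mul64(x[i], y[j])
			var c uint64
			lo, c = bits.Add64(lo, z[i+j], 0)
			hi += c
			lo, c = bits.Add64(lo, carry, 0)
			hi += c
			z[i+j] = lo
			carry = hi
		}
		z[i+4] = carry // row i has not touched this word before
	}
	return z
}

func main() {
	a := [4]uint64{3}
	b := [4]uint64{5}
	fmt.Println(mulWide(&a, &b)) // [15 0 0 0 0 0 0 0]
}
```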

The main loop of Montgomery reduction is as follows:

for i = 0 ... n-1 {
    (1) u_i = a_i * m' (mod b)
    (2) A += u_i * m * b^i
}

Here A = (a_i) is the number to be reduced, b = 2^64 is the base, m is the n-word modulus and m' satisfies m' = -1/m (mod b). Each iteration therefore requires:

64-bit multiply a_i * m'. This can be omitted in the "Montgomery friendly" case, where m' = 1 (mod b). This holds for some of the NIST primes, for example P-256 and P-521, whose lowest 64-bit word is 2^64 - 1 so that m = -1 (mod b). It does not hold for P-384, whose lowest word is 2^32 - 1.
Multiply the modulus by the 64-bit value u_i and accumulate into A. Since m is constant, there are opportunities to optimize this when it has special structure.
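
The constant m' depends only on the lowest word of m, and can be computed with Newton's iteration for inverses modulo 2^64. The sketch below (montInv is a hypothetical name) shows that P-256 gives m' = 1, while P-384 does not.

```go
package main

import "fmt"

// montInv returns m' = -1/m0 (mod 2^64), where m0 is the lowest 64-bit word
// of an odd modulus m. It uses Newton's iteration for the inverse mod 2^64.
func montInv(m0 uint64) uint64 {
	inv := m0 // correct to 3 bits, since m0*m0 = 1 (mod 8) for odd m0
	for i := 0; i < 5; i++ {
		inv *= 2 - m0*inv // each step doubles the number of correct bits
	}
	return -inv
}

func main() {
	// P-256: lowest word is 2^64 - 1, so m = -1 (mod b) and m' = 1.
	fmt.Printf("P-256 m' = %#x\n", montInv(0xffffffffffffffff)) // 0x1
	// P-384: lowest word is 2^32 - 1, so the multiply by m' is needed.
	fmt.Printf("P-384 m' = %#x\n", montInv(0x00000000ffffffff)) // 0x100000001
}
```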

An improvement to come later is interleaving multiplication and reduction steps. One benefit of this simpler implementation for now is that the same reduction code can be used for squaring too.

Updates #47

mmcloughlin · 2019-07-13T21:06:23Z

Currently battling with register allocation failures. Just to make sure it doesn't get lost, here's an ad hoc program I'm using to help debug the issue.

package main

import (
	"fmt"
	"log"

	"github.com/mmcloughlin/avo/ir"
	"github.com/mmcloughlin/avo/operand"
	"github.com/mmcloughlin/avo/pass"
	"github.com/mmcloughlin/avo/reg"
	"github.com/mmcloughlin/ec3/asm/fp/mont"
	"github.com/mmcloughlin/ec3/gen/fp"
	"github.com/mmcloughlin/ec3/prime"
)

func main() {
	cfg := fp.Config{
		Field:        mont.New(prime.NISTP256),
		InverseChain: nil,

		PackageName:     "p256",
		ElementTypeName: "Elt",
	}

	// Build asm functions.
	asm := fp.NewAsm(cfg)
	asm.Mul()
	ctx := asm.Context()

	f, err := ctx.Result()
	if err != nil {
		log.Fatal(err)
	}

	// Run compilation passes up to liveness.
	passes := pass.Concat(
		pass.FunctionPass(pass.LabelTarget),
		pass.FunctionPass(pass.CFG),
		pass.FunctionPass(pass.Liveness),
	)
	if err := passes.Execute(f); err != nil {
		log.Fatal(err)
	}

	// Inspect.
	debug(f)
}

func debug(f *ir.File) {
	for _, fn := range f.Functions() {
		function(fn)
	}
}

func function(fn *ir.Function) {
	fmt.Printf("function:\t%s\n", fn.Name)
	for _, inst := range fn.Instructions() {
		fmt.Printf("%s\n", inst.Opcode)

		fmt.Printf("\tinputs:")
		operands(inst.Inputs)
		fmt.Printf("\n")

		fmt.Printf("\toutputs:")
		operands(inst.Outputs)
		fmt.Printf("\n")

		fmt.Printf("\tlivein:")
		registers(inst.LiveIn)
		fmt.Printf("\n")

		fmt.Printf("\tliveout:")
		registers(inst.LiveOut)
		fmt.Printf("\n")
	}
}

func operands(ops []operand.Op) {
	for _, op := range ops {
		fmt.Printf(" %s", op.Asm())
	}
}

func registers(rs reg.Set) {
	for r := range rs {
		fmt.Printf(" %s", r.Asm())
	}
}

Related mmcloughlin/avo#6

mmcloughlin · 2019-07-13T21:16:28Z

At first glance, I'm thinking we have a liveness analysis bug. The first instruction in Mul has an impossible number of live variables.

function:	Mul
MOVQ
	inputs: x+8(FP)
	outputs: <virtual:16:1:8>
	livein: 13
		 <virtual:56:1:8> SP <virtual:18:1:8> SB <virtual:60:1:8> <virtual:57:1:8> <virtual:61:1:8> <virtual:58:1:8> <virtual:59:1:8> <virtual:88:1:8> <virtual:79:1:8> <virtual:70:1:8> FP
	liveout: 14
		 <virtual:61:1:8> SP <virtual:79:1:8> <virtual:58:1:8> FP <virtual:57:1:8> <virtual:70:1:8> SB <virtual:18:1:8> <virtual:59:1:8> <virtual:60:1:8> <virtual:16:1:8> <virtual:56:1:8> <virtual:88:1:8>
...

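Digging some more, it seems one of these variables, <virtual:61:1:8>, is a zero register. It is set with the XORQ zeroing idiom, which lists the register as an input as well as an output: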

...
071: XORQ
	inputs: <virtual:61:1:8> <virtual:61:1:8>
	outputs: <virtual:61:1:8>
...

The others? Well, that was me being dumb and reading from uninitialized registers. Therefore liveness analysis (correctly) marks them as live all the way to the beginning of the function.
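
A minimal sketch of backward liveness over straight-line code shows why both effects appear. A register that is read before it is written stays live all the way back to the entry point, and the XORQ zeroing idiom counts as such a read because the register is listed as an input. Inst and liveIn are hypothetical types for illustration, not avo's ir or pass API, and avo itself runs liveness over a control-flow graph rather than a single block.

```go
package main

import "fmt"

// Inst is a hypothetical straight-line instruction: the registers it reads
// and the registers it writes.
type Inst struct {
	Op      string
	Inputs  []string
	Outputs []string
}

// liveIn computes the registers live on entry to each instruction with one
// backward pass: livein = (liveout - outputs) + inputs.
func liveIn(insts []Inst) []map[string]bool {
	res := make([]map[string]bool, len(insts))
	live := map[string]bool{} // nothing is live after the last instruction
	for i := len(insts) - 1; i >= 0; i-- {
		next := map[string]bool{}
		for r := range live {
			next[r] = true
		}
		for _, r := range insts[i].Outputs {
			delete(next, r)
		}
		for _, r := range insts[i].Inputs {
			next[r] = true
		}
		res[i] = next
		live = next
	}
	return res
}

func main() {
	prog := []Inst{
		{Op: "MOVQ", Inputs: []string{"x"}, Outputs: []string{"a"}},
		{Op: "XORQ", Inputs: []string{"z", "z"}, Outputs: []string{"z"}}, // zeroing idiom still reads z
		{Op: "ADDQ", Inputs: []string{"a", "u"}, Outputs: []string{"a"}}, // u is never written: a bug
		{Op: "ADDQ", Inputs: []string{"a", "z"}, Outputs: []string{"a"}},
		{Op: "RET", Inputs: []string{"a"}},
	}
	in := liveIn(prog)
	fmt.Println(in[0]["z"], in[0]["u"]) // true true: both look live at entry
}
```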

Updates #47


mmcloughlin · 2019-08-12T04:41:51Z

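import (
	"math/big"
	"testing"
)

// p is assumed to be the NIST P-256 prime. For P-256, m' = -1/p (mod 2^64)
// equals 1, which is why step 2.1 below has no multiplication by m'.
var p, _ = new(big.Int).SetString("ffffffff00000001000000000000000000000000ffffffffffffffffffffffff", 16)

// mask64 is 2^64 - 1, used to extract one 64-bit word.
var mask64 = new(big.Int).SetUint64(^uint64(0))

// DebugMontMul logs each step of the word-by-word Montgomery reduction of
// x*y modulo p, ending with x*y*2^-256 (mod p).
func DebugMontMul(t *testing.T, x, y *big.Int) {
	acc := new(big.Int).Mul(x, y)
	for i := uint(0); i < 256; i += 64 {
		t.Logf("step %3d: acc = %0129x", i, acc)

		// Step 2.1: u_i = a_i * m' (mod b), where a_i is word i of acc.
		// Since m' = 1 for P-256, u_i is just that word.
		u := new(big.Int).Rsh(acc, i)
		u.And(u, mask64)

		// Step 2.2: A += u_i * m * b^i
		u.Mul(u, p)
		u.Lsh(u, i)
		acc.Add(acc, u)
	}

	// The low 256 bits are now zero; dividing by b^4 = 2^256 is a shift.
	t.Logf("  reduced acc = %0129x", acc)

	acc.Rsh(acc, 256)
	t.Logf("    shift acc = %0129x", acc)

	// The shifted value is below 2p, so one conditional subtraction suffices.
	t.Logf("        cmp p = %d", acc.Cmp(p))
	if acc.Cmp(p) >= 0 {
		acc.Sub(acc, p)
	}

	t.Logf("    final acc = %0129x", acc)
}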

Updates #47

Modifies the field interface to automatically Montgomery encode/decode at the API boundary. This somewhat simplifies the layers above. Also generates "raw" variants of these functions. Updates #86 #47


mmcloughlin added this to Todo in P-256 via automation Jul 12, 2019

mmcloughlin added a commit that referenced this issue Jul 13, 2019

asm/fp/mont: addition generation

2199c15

Updates #47

mmcloughlin added a commit that referenced this issue Jul 14, 2019

asm/fp: montgomery multiply

7687e1b

Updates #47

mmcloughlin mentioned this issue Jul 14, 2019

ec: curve implementations #29

Open

17 tasks

mmcloughlin added a commit that referenced this issue Jul 14, 2019

asm/fp: subtraction for montgomery fields

c5883e4

Updates #47

mmcloughlin mentioned this issue Jul 15, 2019

gen/ec: code generation for efd formulae #57

Open

mmcloughlin moved this from Todo to In Progress in P-256 Jul 15, 2019

mmcloughlin added a commit that referenced this issue Aug 12, 2019

asm/fp/mont: fix carry chain

bf77dfd

Previously this was not correctly propagating carries, causing errors in some edge cases. Updates #55 #47

mmcloughlin added a commit that referenced this issue Aug 23, 2019

gen/fp,asm/mp: cmov

9a0b4c2

Updates #47

mmcloughlin added a commit that referenced this issue Aug 24, 2019

gen/fp: negation

6fbf925

Updates #47

mmcloughlin closed this as completed Sep 6, 2019

P-256 automation moved this from In Progress to Done Sep 6, 2019

mmcloughlin reopened this Sep 6, 2019

P-256 automation moved this from Done to Todo Sep 6, 2019

mmcloughlin moved this from Todo to In Progress in P-256 Sep 6, 2019

mmcloughlin added a commit that referenced this issue Sep 6, 2019

gen/fp: SetIntEncode

298e8f0

Updates #47

mmcloughlin mentioned this issue Sep 14, 2019

gen: handling of montgomery encode/decode #86

Open

mmcloughlin added a commit to mmcloughlin/addchain that referenced this issue Jan 31, 2020

asm/fp: subtraction for montgomery fields

3634744

Updates mmcloughlin/ec3#47 Extracted-from: mmcloughlin/ec3@c5883e4

mmcloughlin added a commit to mmcloughlin/addchain that referenced this issue Jan 31, 2020

gen/fp: negation

7dc0fa9

Updates mmcloughlin/ec3#47 Extracted-from: mmcloughlin/ec3@6fbf925
