Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fp: montgomery reduction #47

Open
8 of 11 tasks
mmcloughlin opened this issue Jul 12, 2019 · 4 comments
Open
8 of 11 tasks

fp: montgomery reduction #47

mmcloughlin opened this issue Jul 12, 2019 · 4 comments
Projects

Comments

@mmcloughlin
Copy link
Owner

mmcloughlin commented Jul 12, 2019

Code generator for modular arithmetic using Montgomery reduction.

  • mont encode
  • mont decode
  • cmov
  • add
  • sub
  • neg
  • mul
  • sqr
  • inv
  • invsqrt
  • modp

Example: crypto/elliptic P-256.

@mmcloughlin mmcloughlin added this to Todo in P-256 via automation Jul 12, 2019
mmcloughlin added a commit that referenced this issue Jul 13, 2019
Refactors the interface into a "two phase" initialization. A field can
be constructed as an object first, and then an avo context can be
provided later (creating a Builder). Now the avo context is a struct
field instead of provided as an argument to every function. The
motivation here is it will allow addtional state to be stored alongside
the context, for example initialized global data sections. This will be
useful for the implementation of Montgomery fields.

Updates #47
@mmcloughlin
Copy link
Owner Author

mmcloughlin commented Jul 13, 2019

Implementation

Addition

  • Add word-by-word with carries to get x
  • Compute x-p
  • CMOV to select x or x-p depending on whether x-p >= 0

https://github.com/golang/go/blob/05e77d41914d247a1e7caf37d7125ccaa5a53505/src/crypto/elliptic/p256_asm_amd64.s#L1685-L1704
https://github.com/cloudflare/circl/blob/a03c5a147111a46165b047f49053ec510d5582b4/ecc/p384/arith_amd64.s#L21

Multiplication

As a first pass we will implement this with

  • Full multi-precision multiply producing a "double" result.
  • ReduceDouble using multi-word Montgomery reduction.

The main loop of Montgomery reduction is as follows:

for i = 0 ... n-1 {
    (1) u_i = a_i * m' (mod b)
    (2) A += u_i * m * b^i
}

Here A = (a_i) is the number to be reduced, b = 2^64 is the base, m is the modulus and m' satisfies m' = -1/m (mod b). Therefore the reduction requires

  1. 64-bit multiply a_i * m'. This can be omitted in the "Montgomery friendly" case, where m' = 1 (mod b). This is true of the NIST primes for example.
  2. Multiply the modulus by a 64-bit value and accumulate into A. Since m is constant, there are opportunities to optimize this when it has special structure.

An improvement to come later is interleaving multiplication and reduction steps. One benefit of this simpler implementation for now is that the same reduction code can be used for squaring too.

mmcloughlin added a commit that referenced this issue Jul 13, 2019
@mmcloughlin
Copy link
Owner Author

Currently battling with register allocation failures. Just to make sure it doesn't get lost, here's an adhoc program I'm using to help debug the issue.

package main

import (
	"fmt"
	"log"

	"github.com/mmcloughlin/avo/ir"
	"github.com/mmcloughlin/avo/operand"
	"github.com/mmcloughlin/avo/pass"
	"github.com/mmcloughlin/avo/reg"
	"github.com/mmcloughlin/ec3/asm/fp/mont"
	"github.com/mmcloughlin/ec3/gen/fp"
	"github.com/mmcloughlin/ec3/prime"
)

func main() {
	cfg := fp.Config{
		Field:        mont.New(prime.NISTP256),
		InverseChain: nil,

		PackageName:     "p256",
		ElementTypeName: "Elt",
	}

	// Build asm functions.
	asm := fp.NewAsm(cfg)
	asm.Mul()
	ctx := asm.Context()

	f, err := ctx.Result()
	if err != nil {
		log.Fatal(err)
	}

	// Run compilation passes up to liveness.
	passes := pass.Concat(
		pass.FunctionPass(pass.LabelTarget),
		pass.FunctionPass(pass.CFG),
		pass.FunctionPass(pass.Liveness),
	)
	if err := passes.Execute(f); err != nil {
		log.Fatal(err)
	}

	// Inspect.
	debug(f)
}

func debug(f *ir.File) {
	for _, fn := range f.Functions() {
		function(fn)
	}
}

func function(fn *ir.Function) {
	fmt.Printf("function:\t%s\n", fn.Name)
	for _, inst := range fn.Instructions() {
		fmt.Printf("%s\n", inst.Opcode)

		fmt.Printf("\tinputs:")
		operands(inst.Inputs)
		fmt.Printf("\n")

		fmt.Printf("\toutputs:")
		operands(inst.Outputs)
		fmt.Printf("\n")

		fmt.Printf("\tlivein:")
		registers(inst.LiveIn)
		fmt.Printf("\n")

		fmt.Printf("\tliveout:")
		registers(inst.LiveIn)
		fmt.Printf("\n")
	}
}

func operands(ops []operand.Op) {
	for _, op := range ops {
		fmt.Printf(" %s", op.Asm())
	}
}

func registers(rs reg.Set) {
	for r := range rs {
		fmt.Printf(" %s", r.Asm())
	}
}

Related mmcloughlin/avo#6

@mmcloughlin
Copy link
Owner Author

mmcloughlin commented Jul 13, 2019

At first glance, I'm thinking we have a liveness analysis bug. The first instruction in Mul has an impossible number of live variables.

function:	Mul
MOVQ
	inputs: x+8(FP)
	outputs: <virtual:16:1:8>
	livein: 13
		 <virtual:56:1:8> SP <virtual:18:1:8> SB <virtual:60:1:8> <virtual:57:1:8> <virtual:61:1:8> <virtual:58:1:8> <virtual:59:1:8> <virtual:88:1:8> <virtual:79:1:8> <virtual:70:1:8> FP
	liveout: 14
		 <virtual:61:1:8> SP <virtual:79:1:8> <virtual:58:1:8> FP <virtual:57:1:8> <virtual:70:1:8> SB <virtual:18:1:8> <virtual:59:1:8> <virtual:60:1:8> <virtual:16:1:8> <virtual:56:1:8> <virtual:88:1:8>
...

Digging some more, it seems this variable is a zero:

...
071: XORQ
	inputs: <virtual:61:1:8> <virtual:61:1:8>
	outputs: <virtual:61:1:8>
...

The others? Well, that was me being dumb and reading from uninitialized registers. Therefore liveness analysis (correctly) marks them as live all the way to the beginning of the function.

mmcloughlin added a commit that referenced this issue Jul 14, 2019
mmcloughlin added a commit that referenced this issue Jul 14, 2019
@mmcloughlin mmcloughlin moved this from Todo to In Progress in P-256 Jul 15, 2019
mmcloughlin added a commit that referenced this issue Aug 12, 2019
Previously this was not correctly propogating carries, causing errors in
some edge cases.

Updates #55 #47
@mmcloughlin
Copy link
Owner Author

func DebugMontMul(t *testing.T, x, y *big.Int) {
	m := new(big.Int).Mul(x, y)
	acc := m
	for i := uint(0); i < 256; i += 64 {
		t.Logf("step %3d: acc = %0129x", i, acc)

		// Step 2.1: u_i = x_i * m' (mod b)
		u := new(big.Int).Rsh(acc, i)
		u.And(u, bigint.Ones(64))

		// Step 2.2: x += u_i * m * b^i
		u.Mul(u, p)
		u.Lsh(u, i)
		acc.Add(acc, u)
	}

	t.Logf("  reduced acc = %0129x", acc)

	acc.Rsh(acc, 256)
	t.Logf("    shift acc = %0129x", acc)

	t.Logf("        cmp p = %d", acc.Cmp(p))
	if acc.Cmp(p) >= 0 {
		acc.Sub(acc, p)
	}

	t.Logf("    final acc = %0129x", acc)
}

mmcloughlin added a commit that referenced this issue Aug 23, 2019
mmcloughlin added a commit that referenced this issue Aug 24, 2019
P-256 automation moved this from In Progress to Done Sep 6, 2019
@mmcloughlin mmcloughlin reopened this Sep 6, 2019
P-256 automation moved this from Done to Todo Sep 6, 2019
@mmcloughlin mmcloughlin moved this from Todo to In Progress in P-256 Sep 6, 2019
mmcloughlin added a commit that referenced this issue Sep 6, 2019
mmcloughlin added a commit that referenced this issue Sep 14, 2019
Modifies the field interface to automatically montgomery encode/decode
on the api boundary. This somewhat simplifies the layers above. Also
generates "raw" variants of these functions.

Updates #86 #47
mmcloughlin added a commit to mmcloughlin/addchain that referenced this issue Jan 31, 2020
mmcloughlin added a commit to mmcloughlin/addchain that referenced this issue Jan 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
P-256
  
In Progress
Development

No branches or pull requests

1 participant