bedrock2 extraction tracker issue #524

Closed · 3 tasks done

andres-erbsen opened this issue Feb 18, 2019 · 26 comments

Comments

@andres-erbsen
Contributor

andres-erbsen commented Feb 18, 2019

It would be cool to generate bedrock2 code from fiat-crypto. I think it might go roughly as follows:

  • generate fiat-crypto low-level AST with only word sizes 32 and 64
  • break each 64-bit word into two 32-bit words (e.g., using mul_split; see the sketch after this list)
  • translate that AST to bedrock2.Syntax.cmd
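
As an illustration (not part of the original plan), here is a minimal C sketch of the arithmetic behind the second step, assuming a mul_split-style primitive that returns the low and high halves of a product; the names split64 and mul32_split are hypothetical:

#include <stdint.h>

/* Sketch only: the "break each 64-bit word into two 32-bit words" step,
 * expressed as plain C arithmetic. */
static void split64(uint64_t x, uint32_t *lo, uint32_t *hi) {
  *lo = (uint32_t)(x & 0xffffffffu);   /* x mod 2^32 */
  *hi = (uint32_t)(x >> 32);           /* x / 2^32   */
}

/* 32x32 -> two 32-bit halves of the full product, the shape a
 * mul_split-style primitive would provide. */
static void mul32_split(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi) {
  uint64_t p = (uint64_t)a * (uint64_t)b;  /* full product, then split */
  split64(p, lo, hi);
}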
@JasonGross
Collaborator

The second pass should be possible as a rewriter pass. The third pass probably wants something like CStringification / the translation to fancy

@andres-erbsen
Contributor Author

I totally forgot about this today, but I'd like to brainstorm how to do the second pass (perhaps using the new and glorious lemma reification interface once it is working well enough).

@JasonGross
Collaborator

We probably need a special identity identifier which is inline, and then we have definitions and lemmas like

Definition assemble (x y : Z) : Z := ((x << 32) + y).
Definition break (x : Z) : Z * Z := (ident.cast word32 (x >> 32), ident.cast word32 (x mod 2^32)).

ident.cast word64 x = dlet yz := break x in inline (assemble (ident.cast word32 (fst yz)) (ident.cast word32 (snd yz)))
break (inline (assemble (ident.cast word32 x) (ident.cast word32 y))) = (ident.cast word32 x, ident.cast word32 y)
inline (assemble (ident.cast word32 x1) (ident.cast word32 x2)) + inline (assemble (ident.cast word32 y1) (ident.cast word32 y2)) = stuff involving addc

Basically, the idea I have is that we first replace every value x known to be a word64 with assemble (break x), and then we push break inwards until it hits an assemble, which it then cancels with.
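
For illustration, a hedged C sketch of the assemble/break algebra above, assuming assemble takes the high word first (as in the definition) and that the addition lemma bottoms out in a 32-bit add-with-carry chain; add_assembled is a hypothetical name, not anything in the codebase:

#include <stdint.h>

/* assemble hi lo = (hi << 32) + lo; break x = (x >> 32, x mod 2^32). */
static uint64_t assemble(uint32_t hi, uint32_t lo) {
  return ((uint64_t)hi << 32) + lo;
}

/* break (assemble hi lo) = (hi, lo): pushing break into assemble cancels it.
 * The addition lemma: adding two assembled values can be re-expressed as a
 * 32-bit add-with-carry chain on the pieces, so no 64-bit add remains. */
static uint64_t add_assembled(uint32_t x1, uint32_t x2,   /* x = assemble(x1, x2) */
                              uint32_t y1, uint32_t y2) { /* y = assemble(y1, y2) */
  uint32_t lo = x2 + y2;
  uint32_t carry = lo < x2;        /* carry out of the low half */
  uint32_t hi = x1 + y1 + carry;   /* addc on the high half */
  return assemble(hi, lo);         /* equals assemble(x1,x2) + assemble(y1,y2) mod 2^64 */
}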

@JasonGross
Collaborator

Note that there is some work-in-progress on this at master...JasonGross:split-mul

It is mostly untested, and I probably won't get much of a chance to work on it before the end of the summer.

@JasonGross
Collaborator

I have now put a bit more work into it and tested it, and it seems to mostly work well. For 64-bit 25519 mulmod, we get code like

(* /*
* Input Bounds:
* arg1: [[0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664]]
* arg2: [[0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664]]
* Output Bounds:
* out1: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
*/
static void mul(uint64_t out1[5], const uint64_t arg1[5], const uint64_t arg2[5]) {
uint64_t x1;
uint64_t x2;
mulx_u64(&x1, &x2, (arg1[4]), ((arg2[4]) * (uint64_t)UINT8_C(0x13)));
uint64_t x3;
uint64_t x4;
mulx_u64(&x3, &x4, (arg1[4]), ((arg2[3]) * (uint64_t)UINT8_C(0x13)));
uint64_t x5;
uint64_t x6;
mulx_u64(&x5, &x6, (arg1[4]), ((arg2[2]) * (uint64_t)UINT8_C(0x13)));
uint64_t x7;
uint64_t x8;
mulx_u64(&x7, &x8, (arg1[4]), ((arg2[1]) * (uint64_t)UINT8_C(0x13)));
uint64_t x9;
uint64_t x10;
mulx_u64(&x9, &x10, (arg1[3]), ((arg2[4]) * (uint64_t)UINT8_C(0x13)));
uint64_t x11;
uint64_t x12;
mulx_u64(&x11, &x12, (arg1[3]), ((arg2[3]) * (uint64_t)UINT8_C(0x13)));
uint64_t x13;
uint64_t x14;
mulx_u64(&x13, &x14, (arg1[3]), ((arg2[2]) * (uint64_t)UINT8_C(0x13)));
uint64_t x15;
uint64_t x16;
mulx_u64(&x15, &x16, (arg1[2]), ((arg2[4]) * (uint64_t)UINT8_C(0x13)));
uint64_t x17;
uint64_t x18;
mulx_u64(&x17, &x18, (arg1[2]), ((arg2[3]) * (uint64_t)UINT8_C(0x13)));
uint64_t x19;
uint64_t x20;
mulx_u64(&x19, &x20, (arg1[1]), ((arg2[4]) * (uint64_t)UINT8_C(0x13)));
uint64_t x21;
uint64_t x22;
mulx_u64(&x21, &x22, (arg1[4]), (arg2[0]));
uint64_t x23;
uint64_t x24;
mulx_u64(&x23, &x24, (arg1[3]), (arg2[1]));
uint64_t x25;
uint64_t x26;
mulx_u64(&x25, &x26, (arg1[3]), (arg2[0]));
uint64_t x27;
uint64_t x28;
mulx_u64(&x27, &x28, (arg1[2]), (arg2[2]));
uint64_t x29;
uint64_t x30;
mulx_u64(&x29, &x30, (arg1[2]), (arg2[1]));
uint64_t x31;
uint64_t x32;
mulx_u64(&x31, &x32, (arg1[2]), (arg2[0]));
uint64_t x33;
uint64_t x34;
mulx_u64(&x33, &x34, (arg1[1]), (arg2[3]));
uint64_t x35;
uint64_t x36;
mulx_u64(&x35, &x36, (arg1[1]), (arg2[2]));
uint64_t x37;
uint64_t x38;
mulx_u64(&x37, &x38, (arg1[1]), (arg2[1]));
uint64_t x39;
uint64_t x40;
mulx_u64(&x39, &x40, (arg1[1]), (arg2[0]));
uint64_t x41;
uint64_t x42;
mulx_u64(&x41, &x42, (arg1[0]), (arg2[4]));
uint64_t x43;
uint64_t x44;
mulx_u64(&x43, &x44, (arg1[0]), (arg2[3]));
uint64_t x45;
uint64_t x46;
mulx_u64(&x45, &x46, (arg1[0]), (arg2[2]));
uint64_t x47;
uint64_t x48;
mulx_u64(&x47, &x48, (arg1[0]), (arg2[1]));
uint64_t x49;
uint64_t x50;
mulx_u64(&x49, &x50, (arg1[0]), (arg2[0]));
uint64_t x51;
uint1 x52;
addcarryx_u64(&x51, &x52, 0x0, x13, x7);
uint64_t x53;
uint1 x54;
addcarryx_u64(&x53, &x54, x52, x14, x8);
uint64_t x55;
uint1 x56;
addcarryx_u64(&x55, &x56, 0x0, x17, x51);
uint64_t x57;
uint1 x58;
addcarryx_u64(&x57, &x58, x56, x18, x53);
uint64_t x59;
uint1 x60;
addcarryx_u64(&x59, &x60, 0x0, x19, x55);
uint64_t x61;
uint1 x62;
addcarryx_u64(&x61, &x62, x60, x20, x57);
uint64_t x63;
uint1 x64;
addcarryx_u64(&x63, &x64, 0x0, x49, x59);
uint64_t x65;
uint1 x66;
addcarryx_u64(&x65, &x66, x64, x50, x61);
uint64_t x67 = ((x63 >> 51) | ((uint64_t)((uint128)x65 << 13) & UINT64_C(0xffffffffffffffff)));
uint64_t x68 = (x63 & UINT64_C(0x7ffffffffffff));
uint64_t x69;
uint1 x70;
addcarryx_u64(&x69, &x70, 0x0, x23, x21);
uint64_t x71;
uint1 x72;
addcarryx_u64(&x71, &x72, x70, x24, x22);
uint64_t x73;
uint1 x74;
addcarryx_u64(&x73, &x74, 0x0, x27, x69);
uint64_t x75;
uint1 x76;
addcarryx_u64(&x75, &x76, x74, x28, x71);
uint64_t x77;
uint1 x78;
addcarryx_u64(&x77, &x78, 0x0, x33, x73);
uint64_t x79;
uint1 x80;
addcarryx_u64(&x79, &x80, x78, x34, x75);
uint64_t x81;
uint1 x82;
addcarryx_u64(&x81, &x82, 0x0, x41, x77);
uint64_t x83;
uint1 x84;
addcarryx_u64(&x83, &x84, x82, x42, x79);
uint64_t x85;
uint1 x86;
addcarryx_u64(&x85, &x86, 0x0, x25, x1);
uint64_t x87;
uint1 x88;
addcarryx_u64(&x87, &x88, x86, x26, x2);
uint64_t x89;
uint1 x90;
addcarryx_u64(&x89, &x90, 0x0, x29, x85);
uint64_t x91;
uint1 x92;
addcarryx_u64(&x91, &x92, x90, x30, x87);
uint64_t x93;
uint1 x94;
addcarryx_u64(&x93, &x94, 0x0, x35, x89);
uint64_t x95;
uint1 x96;
addcarryx_u64(&x95, &x96, x94, x36, x91);
uint64_t x97;
uint1 x98;
addcarryx_u64(&x97, &x98, 0x0, x43, x93);
uint64_t x99;
uint1 x100;
addcarryx_u64(&x99, &x100, x98, x44, x95);
uint64_t x101;
uint1 x102;
addcarryx_u64(&x101, &x102, 0x0, x9, x3);
uint64_t x103;
uint1 x104;
addcarryx_u64(&x103, &x104, x102, x10, x4);
uint64_t x105;
uint1 x106;
addcarryx_u64(&x105, &x106, 0x0, x31, x101);
uint64_t x107;
uint1 x108;
addcarryx_u64(&x107, &x108, x106, x32, x103);
uint64_t x109;
uint1 x110;
addcarryx_u64(&x109, &x110, 0x0, x37, x105);
uint64_t x111;
uint1 x112;
addcarryx_u64(&x111, &x112, x110, x38, x107);
uint64_t x113;
uint1 x114;
addcarryx_u64(&x113, &x114, 0x0, x45, x109);
uint64_t x115;
uint1 x116;
addcarryx_u64(&x115, &x116, x114, x46, x111);
uint64_t x117;
uint1 x118;
addcarryx_u64(&x117, &x118, 0x0, x11, x5);
uint64_t x119;
uint1 x120;
addcarryx_u64(&x119, &x120, x118, x12, x6);
uint64_t x121;
uint1 x122;
addcarryx_u64(&x121, &x122, 0x0, x15, x117);
uint64_t x123;
uint1 x124;
addcarryx_u64(&x123, &x124, x122, x16, x119);
uint64_t x125;
uint1 x126;
addcarryx_u64(&x125, &x126, 0x0, x39, x121);
uint64_t x127;
uint1 x128;
addcarryx_u64(&x127, &x128, x126, x40, x123);
uint64_t x129;
uint1 x130;
addcarryx_u64(&x129, &x130, 0x0, x47, x125);
uint64_t x131;
uint1 x132;
addcarryx_u64(&x131, &x132, x130, x48, x127);
uint64_t x133;
uint1 x134;
addcarryx_u64(&x133, &x134, 0x0, x67, x129);
uint64_t x135 = (x134 + x131);
uint64_t x136 = ((x133 >> 51) | ((uint64_t)((uint128)x135 << 13) & UINT64_C(0xffffffffffffffff)));
uint64_t x137 = (x133 & UINT64_C(0x7ffffffffffff));
uint64_t x138;
uint1 x139;
addcarryx_u64(&x138, &x139, 0x0, x136, x113);
uint64_t x140 = (x139 + x115);
uint64_t x141 = ((x138 >> 51) | ((uint64_t)((uint128)x140 << 13) & UINT64_C(0xffffffffffffffff)));
uint64_t x142 = (x138 & UINT64_C(0x7ffffffffffff));
uint64_t x143;
uint1 x144;
addcarryx_u64(&x143, &x144, 0x0, x141, x97);
uint64_t x145 = (x144 + x99);
uint64_t x146 = ((x143 >> 51) | ((uint64_t)((uint128)x145 << 13) & UINT64_C(0xffffffffffffffff)));
uint64_t x147 = (x143 & UINT64_C(0x7ffffffffffff));
uint64_t x148;
uint1 x149;
addcarryx_u64(&x148, &x149, 0x0, x146, x81);
uint64_t x150 = (x149 + x83);
uint64_t x151 = ((x148 >> 51) | ((uint64_t)((uint128)x150 << 13) & UINT64_C(0xffffffffffffffff)));
uint64_t x152 = (x148 & UINT64_C(0x7ffffffffffff));
uint64_t x153 = (x151 * (uint64_t)UINT8_C(0x13));
uint64_t x154 = (x68 + x153);
uint64_t x155 = (x154 >> 51);
uint64_t x156 = (x154 & UINT64_C(0x7ffffffffffff));
uint64_t x157 = (x155 + x137);
uint64_t x158 = (x157 >> 51);
uint64_t x159 = (x157 & UINT64_C(0x7ffffffffffff));
uint64_t x160 = (x158 + x142);
out1[0] = x156;
out1[1] = x159;
out1[2] = x160;
out1[3] = x147;
out1[4] = x152;
}

The only remaining uint128 casts are ones that are immediately followed by a truncating & to 64-bits; we could hard-code an operation which is "unsigned truncating left shift" and use that instead. So I think this pass is basically a success.

@andres-erbsen
Contributor Author

this looks awesome. could we just add a rewrite rule that moves the uint128->uint64 conversion inside the &, and then inside the <<, and then cancel the uint64->uint128 conversion?

@JasonGross
Collaborator

could we just add a rewrite rule that moves the uint128->uint64 conversion

You're asking for something like (uint64_t)((uint64_t)x150 << 13)? That's unsound, because casts don't truncate in the source language; they produce unspecified behavior when the number is out of bounds.

Note that the >> | << & construct comes directly from the rewrite rule, so we can make it whatever we want as long as it's sound.
We could add an explicitly truncating cast and use that, or use mod and recognize it when translating to bedrock2?

@andres-erbsen
Contributor Author

andres-erbsen commented May 18, 2019 via email

@JasonGross
Collaborator

This entire pass already runs after bounds analysis (which doesn't guarantee boundedness, only that the overall behavior is independent of what you choose to do when something is out of bounds; presumably we should fix this oversight some day).

Do you want to see an explicit truncating cast, or are you fine with just recognizing mod 2^k?

@andres-erbsen
Contributor Author

just to state it explicitly: either option is fine by me

@JasonGross
Collaborator

JasonGross commented May 23, 2019

Let's do mod 2^k, then, so I don't need to add more identifiers. The current CStringification doesn't know how to print that; do you want it to be able to? (Are you planning to run after the existing C common type inference pass, or to split off before it?)

@andres-erbsen
Contributor Author

the common type pass is not verified, right? I think I should branch off before it. also whatever I do will only work if there is only one integer type remaining in the code.

@JasonGross
Collaborator

the common type pass is not verified, right?

That's correct

also whatever I do will only work if there is only one integer type remaining in the code.

So you want the carry bits to be stored in uint64_t? We can just tell bounds relaxation to not support 1-bit values. Then we can expect code to end up with only one integer type.
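
For illustration, a hedged C sketch of what an add-with-carry looks like when the carry is kept in a uint64_t rather than a uint1; this is not the fiat-crypto addcarryx primitive, just its arithmetic, and the function name is made up:

#include <stdint.h>

/* Add-with-carry where the carry flag lives in a full-width word, so only
 * one integer type appears.  Assumes carry_in is 0 or 1. */
static void addcarryx_u64_fullwidth_carry(uint64_t *out, uint64_t *carry_out,
                                          uint64_t carry_in, uint64_t a, uint64_t b) {
  uint64_t s = a + carry_in;
  uint64_t c1 = s < a;        /* carry from adding carry_in (0 or 1) */
  uint64_t r = s + b;
  uint64_t c2 = r < s;        /* carry from adding b (0 or 1) */
  *out = r;
  *carry_out = c1 | c2;       /* still 0 or 1, but stored full-width */
}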

@JasonGross
Collaborator

Actually, hm, I think we might want to add an explicitly truncating left-shift, which we can just define as truncating_shiftl bw x n := (x << n) mod 2^bw. Otherwise I don't see how to get rid of the 128-bit cast (you could drop it and prove that you're allowed to in your pass, of course, but that seems needlessly painful).
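
A hedged C sketch of truncating_shiftl bw x n := (x << n) mod 2^bw, assuming 0 <= n < 64 and 0 < bw <= 64; the function name is illustrative only:

#include <stdint.h>

static uint64_t truncating_shiftl(unsigned bw, uint64_t x, unsigned n) {
  uint64_t shifted = x << n;                   /* well-defined for n < 64 */
  if (bw >= 64) return shifted;                /* mod 2^64 is the type's own wraparound */
  return shifted & ((UINT64_C(1) << bw) - 1);  /* mod 2^bw */
}

For bw = 64 on a uint64_t the mod is implicit, which is why this operation lets the generated code drop the 128-bit cast seen in the output above.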

@andres-erbsen
Contributor Author

I'm currently looking at a 32-bit target, but yes, even carry bits in the same type as other integers. Truncating shift sounds fine as well.

@JasonGross
Collaborator

I will soon have a PR that adds truncating_shiftl, and shortly after that plan to rebase the branch I mentioned above and integrate the truncating_shiftl operation.

JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Sep 5, 2019
See mit-plv#524 for discussion and justification.
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Sep 5, 2019
See mit-plv#524 for discussion and justification.
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Sep 5, 2019
See mit-plv#524 for discussion and justification.
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Sep 5, 2019
See mit-plv#524 for discussion and justification.
JasonGross added a commit that referenced this issue Sep 6, 2019
See #524 for discussion and justification.
@JasonGross
Collaborator

I have mostly gotten the rewrite to truncating_shiftl to work. There is still a strange issue with uint1 showing up, and a potentially related issue, which I haven't managed to track down, where making the carry bits uint64 breaks extraction of the output bounds.

Anyway, the current code (master...JasonGross:split-mul) is

(* /*
* Input Bounds:
* arg1: [[0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664]]
* arg2: [[0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664], [0x0 ~> 0x1a666666666664]]
* Output Bounds:
* out1: [[0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc], [0x0 ~> 0x8cccccccccccc]]
*/
static void mul(uint64_t out1[5], const uint64_t arg1[5], const uint64_t arg2[5]) {
uint64_t x1;
uint64_t x2;
mulx_u64(&x1, &x2, (arg1[4]), ((arg2[4]) * (uint64_t)UINT8_C(0x13)));
uint64_t x3;
uint64_t x4;
mulx_u64(&x3, &x4, (arg1[4]), ((arg2[3]) * (uint64_t)UINT8_C(0x13)));
uint64_t x5;
uint64_t x6;
mulx_u64(&x5, &x6, (arg1[4]), ((arg2[2]) * (uint64_t)UINT8_C(0x13)));
uint64_t x7;
uint64_t x8;
mulx_u64(&x7, &x8, (arg1[4]), ((arg2[1]) * (uint64_t)UINT8_C(0x13)));
uint64_t x9;
uint64_t x10;
mulx_u64(&x9, &x10, (arg1[3]), ((arg2[4]) * (uint64_t)UINT8_C(0x13)));
uint64_t x11;
uint64_t x12;
mulx_u64(&x11, &x12, (arg1[3]), ((arg2[3]) * (uint64_t)UINT8_C(0x13)));
uint64_t x13;
uint64_t x14;
mulx_u64(&x13, &x14, (arg1[3]), ((arg2[2]) * (uint64_t)UINT8_C(0x13)));
uint64_t x15;
uint64_t x16;
mulx_u64(&x15, &x16, (arg1[2]), ((arg2[4]) * (uint64_t)UINT8_C(0x13)));
uint64_t x17;
uint64_t x18;
mulx_u64(&x17, &x18, (arg1[2]), ((arg2[3]) * (uint64_t)UINT8_C(0x13)));
uint64_t x19;
uint64_t x20;
mulx_u64(&x19, &x20, (arg1[1]), ((arg2[4]) * (uint64_t)UINT8_C(0x13)));
uint64_t x21;
uint64_t x22;
mulx_u64(&x21, &x22, (arg1[4]), (arg2[0]));
uint64_t x23;
uint64_t x24;
mulx_u64(&x23, &x24, (arg1[3]), (arg2[1]));
uint64_t x25;
uint64_t x26;
mulx_u64(&x25, &x26, (arg1[3]), (arg2[0]));
uint64_t x27;
uint64_t x28;
mulx_u64(&x27, &x28, (arg1[2]), (arg2[2]));
uint64_t x29;
uint64_t x30;
mulx_u64(&x29, &x30, (arg1[2]), (arg2[1]));
uint64_t x31;
uint64_t x32;
mulx_u64(&x31, &x32, (arg1[2]), (arg2[0]));
uint64_t x33;
uint64_t x34;
mulx_u64(&x33, &x34, (arg1[1]), (arg2[3]));
uint64_t x35;
uint64_t x36;
mulx_u64(&x35, &x36, (arg1[1]), (arg2[2]));
uint64_t x37;
uint64_t x38;
mulx_u64(&x37, &x38, (arg1[1]), (arg2[1]));
uint64_t x39;
uint64_t x40;
mulx_u64(&x39, &x40, (arg1[1]), (arg2[0]));
uint64_t x41;
uint64_t x42;
mulx_u64(&x41, &x42, (arg1[0]), (arg2[4]));
uint64_t x43;
uint64_t x44;
mulx_u64(&x43, &x44, (arg1[0]), (arg2[3]));
uint64_t x45;
uint64_t x46;
mulx_u64(&x45, &x46, (arg1[0]), (arg2[2]));
uint64_t x47;
uint64_t x48;
mulx_u64(&x47, &x48, (arg1[0]), (arg2[1]));
uint64_t x49;
uint64_t x50;
mulx_u64(&x49, &x50, (arg1[0]), (arg2[0]));
uint64_t x51;
uint1 x52;
addcarryx_u64(&x51, &x52, 0x0, x13, x7);
uint64_t x53;
uint1 x54;
addcarryx_u64(&x53, &x54, x52, x14, x8);
uint64_t x55;
uint1 x56;
addcarryx_u64(&x55, &x56, 0x0, x17, x51);
uint64_t x57;
uint1 x58;
addcarryx_u64(&x57, &x58, x56, x18, x53);
uint64_t x59;
uint1 x60;
addcarryx_u64(&x59, &x60, 0x0, x19, x55);
uint64_t x61;
uint1 x62;
addcarryx_u64(&x61, &x62, x60, x20, x57);
uint64_t x63;
uint1 x64;
addcarryx_u64(&x63, &x64, 0x0, x49, x59);
uint64_t x65;
uint1 x66;
addcarryx_u64(&x65, &x66, x64, x50, x61);
uint64_t x67 = ((x63 >> 51) | ((x65 << 13) & UINT64_C(0xffffffffffffffff)));
uint64_t x68 = (x63 & UINT64_C(0x7ffffffffffff));
uint64_t x69;
uint1 x70;
addcarryx_u64(&x69, &x70, 0x0, x23, x21);
uint64_t x71;
uint1 x72;
addcarryx_u64(&x71, &x72, x70, x24, x22);
uint64_t x73;
uint1 x74;
addcarryx_u64(&x73, &x74, 0x0, x27, x69);
uint64_t x75;
uint1 x76;
addcarryx_u64(&x75, &x76, x74, x28, x71);
uint64_t x77;
uint1 x78;
addcarryx_u64(&x77, &x78, 0x0, x33, x73);
uint64_t x79;
uint1 x80;
addcarryx_u64(&x79, &x80, x78, x34, x75);
uint64_t x81;
uint1 x82;
addcarryx_u64(&x81, &x82, 0x0, x41, x77);
uint64_t x83;
uint1 x84;
addcarryx_u64(&x83, &x84, x82, x42, x79);
uint64_t x85;
uint1 x86;
addcarryx_u64(&x85, &x86, 0x0, x25, x1);
uint64_t x87;
uint1 x88;
addcarryx_u64(&x87, &x88, x86, x26, x2);
uint64_t x89;
uint1 x90;
addcarryx_u64(&x89, &x90, 0x0, x29, x85);
uint64_t x91;
uint1 x92;
addcarryx_u64(&x91, &x92, x90, x30, x87);
uint64_t x93;
uint1 x94;
addcarryx_u64(&x93, &x94, 0x0, x35, x89);
uint64_t x95;
uint1 x96;
addcarryx_u64(&x95, &x96, x94, x36, x91);
uint64_t x97;
uint1 x98;
addcarryx_u64(&x97, &x98, 0x0, x43, x93);
uint64_t x99;
uint1 x100;
addcarryx_u64(&x99, &x100, x98, x44, x95);
uint64_t x101;
uint1 x102;
addcarryx_u64(&x101, &x102, 0x0, x9, x3);
uint64_t x103;
uint1 x104;
addcarryx_u64(&x103, &x104, x102, x10, x4);
uint64_t x105;
uint1 x106;
addcarryx_u64(&x105, &x106, 0x0, x31, x101);
uint64_t x107;
uint1 x108;
addcarryx_u64(&x107, &x108, x106, x32, x103);
uint64_t x109;
uint1 x110;
addcarryx_u64(&x109, &x110, 0x0, x37, x105);
uint64_t x111;
uint1 x112;
addcarryx_u64(&x111, &x112, x110, x38, x107);
uint64_t x113;
uint1 x114;
addcarryx_u64(&x113, &x114, 0x0, x45, x109);
uint64_t x115;
uint1 x116;
addcarryx_u64(&x115, &x116, x114, x46, x111);
uint64_t x117;
uint1 x118;
addcarryx_u64(&x117, &x118, 0x0, x11, x5);
uint64_t x119;
uint1 x120;
addcarryx_u64(&x119, &x120, x118, x12, x6);
uint64_t x121;
uint1 x122;
addcarryx_u64(&x121, &x122, 0x0, x15, x117);
uint64_t x123;
uint1 x124;
addcarryx_u64(&x123, &x124, x122, x16, x119);
uint64_t x125;
uint1 x126;
addcarryx_u64(&x125, &x126, 0x0, x39, x121);
uint64_t x127;
uint1 x128;
addcarryx_u64(&x127, &x128, x126, x40, x123);
uint64_t x129;
uint1 x130;
addcarryx_u64(&x129, &x130, 0x0, x47, x125);
uint64_t x131;
uint1 x132;
addcarryx_u64(&x131, &x132, x130, x48, x127);
uint64_t x133;
uint1 x134;
addcarryx_u64(&x133, &x134, 0x0, x67, x129);
uint64_t x135 = (x134 + x131);
uint64_t x136 = ((x133 >> 51) | ((x135 << 13) & UINT64_C(0xffffffffffffffff)));
uint64_t x137 = (x133 & UINT64_C(0x7ffffffffffff));
uint64_t x138;
uint1 x139;
addcarryx_u64(&x138, &x139, 0x0, x136, x113);
uint64_t x140 = (x139 + x115);
uint64_t x141 = ((x138 >> 51) | ((x140 << 13) & UINT64_C(0xffffffffffffffff)));
uint64_t x142 = (x138 & UINT64_C(0x7ffffffffffff));
uint64_t x143;
uint1 x144;
addcarryx_u64(&x143, &x144, 0x0, x141, x97);
uint64_t x145 = (x144 + x99);
uint64_t x146 = ((x143 >> 51) | ((x145 << 13) & UINT64_C(0xffffffffffffffff)));
uint64_t x147 = (x143 & UINT64_C(0x7ffffffffffff));
uint64_t x148;
uint1 x149;
addcarryx_u64(&x148, &x149, 0x0, x146, x81);
uint64_t x150 = (x149 + x83);
uint64_t x151 = ((x148 >> 51) | ((x150 << 13) & UINT64_C(0xffffffffffffffff)));
uint64_t x152 = (x148 & UINT64_C(0x7ffffffffffff));
uint64_t x153 = (x151 * (uint64_t)UINT8_C(0x13));
uint64_t x154 = (x68 + x153);
uint64_t x155 = (x154 >> 51);
uint64_t x156 = (x154 & UINT64_C(0x7ffffffffffff));
uint64_t x157 = (x155 + x137);
uint64_t x158 = (x157 >> 51);
uint64_t x159 = (x157 & UINT64_C(0x7ffffffffffff));
uint64_t x160 = (x158 + x142);
out1[0] = x156;
out1[1] = x159;
out1[2] = x160;
out1[3] = x147;
out1[4] = x152;
}

Modulo the uint1s, how does this look to you @andres-erbsen ?

@JasonGross
Collaborator

I have tracked down the issue with uint1, but fixing it seems to require some design that I'd like your input on @andres-erbsen . There is a bounds check that ensures that we are passing the correct integer size to our adc/sbb functions, which happens during conversion to C:

bounds_check do_bounds_check "second return value of" idc 1 (* boolean carry/borrow *) e2v (snd rout);

bounds_check do_bounds_check "second (carry) return value of" idc 1 (* boolean carry/borrow *) e2v (snd rout));

Currently, we error if you try to store the carry in anything other than a uint1. I can parameterize this part of the pipeline on an argument (I'm tempted to make it a typeclass) controlling the width of the carry argument, and then pipe that through from the higher levels. But it's starting to feel a lot like spaghetti code, and I'd appreciate sitting down with you sometime and figuring out how we should structure the options related to this pass.

Anyway, other than this one issue, I believe the code is ready to have work done on translating from the IR we target (for C and Rust) to Bedrock.

JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Sep 7, 2019
See
mit-plv#524 (comment)
for discussion of remaining issues
JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Sep 9, 2019
See
mit-plv#524 (comment)
for discussion of remaining issues
@andres-erbsen
Contributor Author

We should be bottoming out in phoas, not the C "IR".

JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Sep 10, 2019
This adds a rewriting pass (disabled by default) which can split apart
multiplications to emit code in a single integer type.

Note that we can emit PHOAS code with only a single integer type, but to
emit C code, we must also allow uint1; see
mit-plv#524 (comment)
for more details.

We leave it like this because the bedrock2 translation will be coming
from PHOAS.

Note that we must still permit wider integer types during bounds
analysis, because otherwise we'll fail to assign bounds before we even
get to the point of splitting multiplications.

Note that this rewriting pass is still admitted, so this commit adds a
dependency on an axiom to the overall pass.  (We could create a separate
pipeline if we wanted the code we're extracting to not depend on this
admitted proof.)

The interesting files to look at here are `src/RewriterRules.v`
(defining the rewriting rules) and `src/SlowPrimeSynthesisExamples.v`
giving examples.  The admitted proof is in `src/RewriterRulesProofs.v`.
JasonGross added a commit that referenced this issue Sep 13, 2019
This adds a rewriting pass (disabled by default) which can split apart
multiplications to emit code in a single integer type.

Note that we can emit PHOAS code with only a single integer type, but to
emit C code, we must also allow uint1; see
#524 (comment)
for more details.

We leave it like this because the bedrock2 translation will be coming
from PHOAS.

Note that we must still permit wider integer types during bounds
analysis, because otherwise we'll fail to assign bounds before we even
get to the point of splitting multiplications.

Note that this rewriting pass is still admitted, so this commit adds a
dependency on an axiom to the overall pass.  (We could create a separate
pipeline if we wanted the code we're extracting to not depend on this
admitted proof.)

The interesting files to look at here are `src/RewriterRules.v`
(defining the rewriting rules) and `src/SlowPrimeSynthesisExamples.v`
giving examples.  The admitted proof is in `src/RewriterRulesProofs.v`.
@jadephilipoom
Collaborator

Some work on translating to bedrock2 is in the bedrock-integration1 branch. I got it to work on Curve25519 64-bit, but 32-bit fails because in some places the code uses combine_at_bitwidth to make two 32-bit words into a single 64-bit word [0]. Can we not do that? Am I instantiating something incorrectly here?

[0] https://github.com/mit-plv/fiat-crypto/blob/bedrock-integration1/src/Bedrock/CompilerTest.v#L1702
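
For reference, a hedged C sketch of the arithmetic combine_at_bitwidth performs at bitwidth 32, assuming the low limb comes first (this is an illustration, not the Coq definition):

#include <stdint.h>

/* Recombine low and high 32-bit limbs into one 64-bit value (lo + 2^32 * hi),
 * which is exactly what a 32-bit-only bedrock2 target cannot represent directly. */
static uint64_t combine_at_bitwidth_32(uint32_t lo, uint32_t hi) {
  return (uint64_t)lo | ((uint64_t)hi << 32);
}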

@JasonGross
Collaborator

The issue is that for some reason we are not picking up the pattern that is supposed to remove combine_at_bitwidth for right-shift. I suspect this is something to do with an inconsistency in where we prefix literals with casts and where we don't; I'll look into it when I get a chance. Feel free to poke me again if I haven't said anything else here by the end of Monday.

@jadephilipoom
Collaborator

I thought some of the more recent changes were supposed to fix this issue, but it's still there, so poke @JasonGross

@JasonGross
Collaborator

Ah, we did make more progress this time, now there is a different issue. The issue is that inlining is not happening correctly, because the rewriter can't inline complex combine_at_bitwidth statements, only simple ones. The solution is to add let-bindings in the final >> rewrite rule as in the addition ones, i.e., change

             ; (forall xl xh offset,
                    0 < offset < bitwidth
                    -> doublewidth (cstZsingle_to_double xl xh >> singlewidth ('offset))
                       = cstZsingle_to_double
                           (Z.lor (singlewidth (singlewidth xl >> singlewidth ('offset)))
                                  (singlewidth
                                     (Z.truncating_shiftl
                                        (singlewidth ('bitwidth))
                                        (singlewidth xh)
                                        (singlewidth ('(bitwidth - offset))))))
                           (singlewidth xh >> singlewidth ('offset)))

to

             ; (forall xl xh offset,
                    0 < offset < bitwidth
                    -> doublewidth (cstZsingle_to_double xl xh >> singlewidth ('offset))
                       = dlet outl := singlewidth
                                        (Z.lor (singlewidth (singlewidth xl >> singlewidth ('offset)))
                                               (singlewidth
                                                  (Z.truncating_shiftl
                                                     (singlewidth ('bitwidth))
                                                     (singlewidth xh)
                                                     (singlewidth ('(bitwidth - offset)))))) in
                         dlet outh := singlewidth (singlewidth xh >> singlewidth ('offset)) in
                         cstZsingle_to_double outl outh)

(Sorry for the atrocious indentation; I typed this on my phone)
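
For intuition, here is a hedged C sketch of the arithmetic the >> rule above encodes, at bitwidth 64 and assuming 0 < offset < 64; shr_double is an illustrative name, not something from the codebase:

#include <stdint.h>

/* Right-shift a double-width value given as limbs (xl, xh) by `offset`,
 * producing two single-width limbs again.  The low limb mixes in the bits
 * shifted down from xh via a truncating left shift, matching the
 * (x >> 51) | truncating-<< pattern in the generated code. */
static void shr_double(uint64_t xl, uint64_t xh, unsigned offset,
                       uint64_t *outl, uint64_t *outh) {
  *outl = (xl >> offset) | (xh << (64 - offset));  /* truncating shift: wraps mod 2^64 */
  *outh = xh >> offset;
}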

@JasonGross
Collaborator

#614 / #615 do in fact fix the issue you pointed at, but there is another missing rewrite rule: multiplying a double width by a single width and casting it to a double width has no rule (for doublewidth * 19). I'll see if I can add a rule for that, too
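
For concreteness, a hedged C sketch of the arithmetic such a rule would need to express: a double-width value given as limbs (xl, xh) multiplied by a single-width constant c, using only 64-bit operations. mul64_halves and mul_double_by_single are illustrative stand-ins (mul64_halves plays the role of a mulx-style 64x64 -> (low, high) primitive), and the high-limb product is assumed not to overflow, as bounds analysis would guarantee for c = 0x13 here:

#include <stdint.h>

/* Portable 64x64 -> (low, high) multiply, stand-in for a mulx-style primitive. */
static void mul64_halves(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi) {
  uint64_t a0 = (uint32_t)a, a1 = a >> 32, b0 = (uint32_t)b, b1 = b >> 32;
  uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
  uint64_t mid = p01 + (p00 >> 32);
  uint64_t mid2 = p10 + (uint32_t)mid;
  *lo = (mid2 << 32) | (uint32_t)p00;
  *hi = p11 + (mid >> 32) + (mid2 >> 32);
}

/* (xl + 2^64 * xh) * c, returned as double-width limbs. */
static void mul_double_by_single(uint64_t xl, uint64_t xh, uint64_t c,
                                 uint64_t *outl, uint64_t *outh) {
  uint64_t lo, hi;
  mul64_halves(xl, c, &lo, &hi);  /* low limb and carry into the high limb */
  *outl = lo;
  *outh = hi + xh * c;            /* assumes xh * c + hi fits in 64 bits */
}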

@JasonGross
Collaborator

I've updated #614 / #615 @jadephilipoom and now all the ERRORs are gone and it seems to compute correctly. (I'll leave it to you to update the pasted output from the computation.)

@jadephilipoom
Collaborator

Should we close this issue? There are still a couple of kinks to work out, like #750, but at this point it's more polishing than feature-adding. If there are no objections, I'll consider myself free to close it.

Bedrock2 Output automation moved this from To do to Done May 11, 2020