Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mem extention #1134

Merged
merged 3 commits into from Mar 23, 2022
Merged

Mem extention #1134

merged 3 commits into from Mar 23, 2022

Conversation

dderjoel
Copy link
Contributor

@dderjoel dderjoel commented Feb 17, 2022

this is the PR split from #1123.
I do get an Error in Parse.v

File "./src/Assembly/Parse.v", line 190, characters 129-130:
Error: Syntax error: [term level 70] expected after ';;R' (in [term]).

which I don't understand. Generally, I think I need some feedback on what I am trying to do. I just can't get the syntax right with (Z * REG)

Edit: I skipped the Bedrock files for now, because I want to get the others compile correct first.

src/Assembly/Parse.v Outdated Show resolved Hide resolved
@dderjoel
Copy link
Contributor Author

dderjoel commented Feb 17, 2022

Using *, I was trying to give it what I defined in the Record, which there is (Z * REG) and I thought it'll then match those three tokens. But yeah, works now. I removed that later on in the print function, too, and it is at least compiling now.

Now, what is that Symbolic.v:1634

 index <- match a.(mem_scale_reg) with
           | Some (z, r) => App (mulZ, [(GetReg r); z])
           | None => App ((const 0), nil)
           end;

trying to do?
What are we calculating, or what does Address mean? Are we calculating the result of whatever is calculated in the mem [...]?
I don't get, what that App is and how those (I think different types, reg and Z) can be added.
Also I think I should use some multiplication instead of add to utilize the z.
But it is now failing with:

File "./src/Assembly/Symbolic.v", line 1634, characters 39-52:
Error:
In environment
sa : AddressSize
a : MEM
base : idx
p : Z * REG
z : Z
r : REG
The term "[z]" has type "list Z" while it is expected to have type
"list (M idx)".

@JasonGross
Copy link
Collaborator

Are we calculating the result of whatever is calculated in the mem [...]?

Yes, it's the memory address.

@dderjoel
Copy link
Contributor Author

Okay, In that case, how can I multiply two numbers, or more, the 'value on r times 'z' | Some (z, r) => App (mulZ, [(GetReg r); z])
and why is there a list z ?
I also tried to get the first element from apparent 'list z' with
| Some (z, r) => App (mulZ, [(GetReg r); (fst z)]) but that error'ed even worse.

@JasonGross
Copy link
Collaborator

JasonGross commented Feb 17, 2022

App is about internalizing the result of operations in the e-graph dag. You want to write something like

base_reg <- GetReg base_reg;
other_reg <- GetReg other_reg;
scale <- App (zconst sa scale, [])
offset <- App (zconst sa offset, []);
scaled_reg <- App (mul sa, [scale; other_reg]);
App (add sa, [base_reg; scaled_reg; offset])

except with more matches to deal with options

src/Assembly/Symbolic.v Outdated Show resolved Hide resolved
@JasonGross
Copy link
Collaborator

why is there a list z ?

To explain the error message, when you write [GetReg r; z] this desugars to GetReg r :: [z]. Since GetReg r has type M idx, Coq expects the tail to be a list of these things, but instead [z] is a list Z. The "meaning" of this error message is that GetReg r and z have different types, so you can't put them in the same list.

src/Assembly/Parse.v Outdated Show resolved Hide resolved
src/Assembly/Parse.v Outdated Show resolved Hide resolved
@dderjoel
Copy link
Contributor Author

so the current parser can parse
[reg] and [reg+reg] but cannot parse [reg + Z], [reg + Z * reg], [Z*reg], (regardless of spaces). Is there any --debug -- verbose mode that I could use to debug, on what Coq tries to parse and where it fails to find a match ?

@JasonGross
Copy link
Collaborator

JasonGross commented Feb 17, 2022

so the current parser can parse
[reg] and [reg+reg] but cannot parse [reg + Z], [reg + Z * reg], [Z*reg], (regardless of spaces).

Strange

Is there any --debug -- verbose mode that I could use to debug, on what Coq tries to parse and where it fails to find a match ?

No, but you can do Goal False. pose (parse_mem "[RAX + 5]") as v. cbv [parse_mem] in v. and then keep going to see how it computes interactively. (If you want a very verbose mode, you can do something like:

Ltac reduce_first_thing_in body v :=
  lazymatch body with
  | ?f ?x => first [ reduce_first_thing_in f v | reduce_first_thing_in x v ]
  | match ?f with _ => _ end => reduce_first_thing_in f v
  | ?f => progress unfold f in (value of v) at 1
  end.
Ltac reduce_first_in v :=
  first [ progress cbv beta in v
        | progress cbv iota in v
        | progress cbv zeta in v
        | let body := (eval cbv delta [v] in v) in progress reduce_first_thing_in body v ].
Goal False. pose (parse_mem "[RAX + 5]") as v. Set Ltac Debug. Set Ltac Batch Debug. repeat reduce_first_in v.

)

@dderjoel
Copy link
Contributor Author

dderjoel commented Feb 17, 2022

adding the Goal False. pose (parse_MEM "[RAX + 5]") as v. cbv [parse_MEM] in v. after the Definition of parse_MEM (I changed, parse_mem to parse_MEM) errors in

File "./src/Assembly/Parse.v", line 459, characters 0-93:
Error: Nested proofs are discouraged and not allowed by default. This error
probably means that you forgot to close the last "Proof." with "Qed." or
"Defined.". If you really intended to use nested proofs, you can do so by
turning the "Nested Proofs Allowed" flag on.

Assuming its a syntax error, adding Qed. at the end fixes the syntax, but then it is an incomplete proof.

Edit: using that longer version I get much More output... Which I cant parse myself - where would I start to read, it keeps printing the function but I cant read any differences. (also I don't know what the code does)

@JasonGross
Copy link
Collaborator

adding the Goal False. pose (parse_MEM "[RAX + 5]") as v. cbv [parse_MEM] in v. after the Definition of parse_MEM (I changed, parse_mem to parse_MEM) errors in

This is meant for interactive exploration in either CoqIDE or emacs/PG or vscoq, etc. You'll have to add a bunch more cbv commands to manually unfold things. I also missed a case in reduce_first_thing_in, I'll go back and edit my comment to add it.

Edit: using that longer version I get much More output... Which I cant parse myself - where would I start to read, it keeps printing the function but I cant read any differences. (also I don't know what the code does)

It should print a context including v := ... and every couple of steps the body should be updated.

@dderjoel
Copy link
Contributor Author

dderjoel commented Feb 18, 2022

currently creating a logfile with that v := blocks... 5GB 25GB (Edit: 52GB; I think I' cancel now...), and counting. I'll try to dig through that once it's done, otherwise I'll get myself an ide and try the interactive approach.

@JasonGross
Copy link
Collaborator

JasonGross commented Feb 18, 2022

5GB 25GB (Edit: 52GB; I think I' cancel now...),

:-( Here's a better verbose output mode that is not quite as verbose. It should show more reasonable steps rather than every single intermediate step.

Ltac head v := lazymatch v with ?f _ => head f | _ => v end.
Ltac print_good do_print v :=
  lazymatch do_print with
  | false => idtac
  | true
    => (* work around https://github.com/coq/coq/issues/15709 *)
      let h := head v in
      match h with
      | String => idtac
      | Ascii.Ascii => idtac
      | _ => idtac v
      end
  end.
Ltac step_reduce do_cbv do_print v :=
  let go := step_reduce do_cbv false in
  lazymatch (eval cbv beta iota in v) with
  | v
    => lazymatch v with
       | ?f ?x
         => lazymatch do_cbv with
            | true
              => let x' := go x in
                 let f' := lazymatch x' with
                           | x => go f
                           | _ => f
                           end in
                 let v := constr:(f' x') in
                 let __ := print_good do_print v in
                 v
            | false
              => let f' := go f in
                 let x' := lazymatch f' with
                           | f => go x
                           | _ => x
                           end in
                 let v := constr:(f' x') in
                 let __ := print_good do_print v in
                 v
            end
       | match ?f with _ => _ end
         => lazymatch (eval cbv iota in v) with
            | v
              => lazymatch v with
                 | context v'[f]
                   => let f' := go f in
                      let v := lazymatch f' with
                               | f => (eval hnf in v)
                               | _ => context v'[f']
                               end in
                      let __ := print_good do_print v in
                      v
                 end
            | ?v' => v'
            end
       | _ => let v := match v with
                       | _ => (eval cbv delta [v] in v)
                       | _ => v
                       end in
              let __ := print_good do_print v in
              v
       end
  | ?v
    => let __ := print_good do_print v in
       v
  end.
Ltac reduce do_cbv do_print v :=
  let v' := step_reduce do_cbv do_print v in
  lazymatch v' with
  | v => v
  | _ => reduce do_cbv do_print v'
  end.
Goal False.
  pose (parse_MEM "[RAX + 5]") as v.
  let v' := reduce true true v in idtac.

@JasonGross
Copy link
Collaborator

Here's a manual debugging session that led me to realize that we can parse rax but not RAX:

Goal False.
  pose (parse_MEM "[rax + 5]") as v.
  cbv [parse_MEM] in v.
  Set Printing Coercions.
  unfold parse_map at 1 in (value of v).
  unfold ";;" in (value of v).
  set (v' := (strip_whitespace_after (parse_str "byte ")?) "[rax + 5]") in (value of v).
  vm_compute in v'; subst v'.
  cbn [List.map List.flat_map List.app] in v.
  Ltac tac v :=
    let v' := fresh "v'" in
    match (eval cbv delta [v] in v) with
    | context[?e]
      => lazymatch e with
         | strip_whitespace_after ?f (String ?h ?t) => idtac
         | parse_map ?f ?p (String ?h ?t) => idtac
         end;
         idtac e;
         set (v' := e) in (value of v);
         vm_compute in v';
         let v'' := (eval cbv delta [v'] in v') in
         idtac v'';
         subst v'
    end;
    cbn [List.map List.flat_map List.app] in v.
  tac v.
  unfold "||" in v.
  unfold parse_alt_gen in v.
  tac v.
  set (v' := parse_map _ _ (String _ _)) in (value of v) at 1.
  unfold parse_map at 1 in (value of v').
  clear v.
  Compute strip_whitespace_after parse_REG "RAX".

@JasonGross
Copy link
Collaborator

A bit more debugging:

Compute finalize parse_MEM "[rax]". (* Some ... *)
Compute finalize parse_MEM "[rax           + 0x10]". (* None *)
Compute finalize parse_MEM "[rax +     rbx]". (* Some ... *)
Compute finalize parse_MEM "[rax + 2 * rbx]". (* None *)
Compute finalize parse_MEM "[      2 * rax]". (* None *)
Compute parse_MEM "[rax]". (* one result *)
Compute parse_MEM "[rax           + 0x10]". (* two results *)
Compute parse_MEM "[rax +     rbx]". (* one result *)
Compute parse_MEM "[rax + 2 * rbx]". (* four results *)
Compute parse_MEM "[      2 * rax]". (* many results *)

shows that what's going wrong is that we have multiple ways to parse the expression.

@JasonGross
Copy link
Collaborator

Ah, the issue is that parse_Z_arith_strict permits whitespace both before and after, so we were overlapping whitespace patterns.

I've pushed a commit just now that fixes this

@dderjoel
Copy link
Contributor Author

dderjoel commented Feb 18, 2022

despite that Bedrock build failing, (works on my machine with SKIP_BEDROCK) It still fails to proof correct.
with ./ExtractionOCaml/unsaturated_solinas --no-primitives --no-wide-int --shiftr-avoid-uint1 --tight-bounds-mul-by 1.000001 -o /dev/null --output-asm /dev/null --hints-file ../fiat-amd64/10_ratio11074_seed2040113663254706_square_poly1305.asm 'poly1305' '64' '3' '2^130 - 5' carry_square
it fails with

check_args
/* Autogenerated: './ExtractionOCaml/unsaturated_solinas' --no-primitives --no-wide-int --shiftr-avoid-uint1 --tight-bounds-mul-by 1.000001 -o '/dev/null' --output-asm '/dev/null' --hints-file '../fiat-amd64/10_ratio11074_seed2040113663254706_square_poly1305.asm' poly1305 64 3 '2^130 - 5' */
/* curve description: poly1305 */
/* machine_wordsize = 64 (from "64") */
/* requested operations: (all) */
/* n = 3 (from "3") */
/* s-c = 2^130 - [(1, 5)] (from "2^130 - 5") */
/* tight_bounds_multiplier = 1.000001 (from "1.000001") */
/*  */
/* Computed values: */
/*   carry_chain = [0, 1, 2, 0, 1] */
/*   eval z = z[0] + (z[1] << 44) + (z[2] << 87) */
/*   bytes_eval z = z[0] + (z[1] << 8) + (z[2] << 16) + (z[3] << 24) + (z[4] << 32) + (z[5] << 40) + (z[6] << 48) + (z[7] << 56) + (z[8] << 64) + (z[9] << 72) + (z[10] << 80) + (z[11] << 88) + (z[12] << 96) + (z[13] << 104) + (z[14] << 112) + (z[15] << 120) + (z[16] << 128) */
/*   balance = [0x1ffffffffff6, 0xffffffffffe, 0xffffffffffe] */

#include <stdint.h>
typedef unsigned char fiat_poly1305_uint1;
typedef signed char fiat_poly1305_int1;
#ifdef __GNUC__
#  define FIAT_POLY1305_FIAT_EXTENSION __extension__
#  define FIAT_POLY1305_FIAT_INLINE __inline__
#else
#  define FIAT_POLY1305_FIAT_EXTENSION
#  define FIAT_POLY1305_FIAT_INLINE
#endif

FIAT_POLY1305_FIAT_EXTENSION typedef signed __int128 fiat_poly1305_int128;
FIAT_POLY1305_FIAT_EXTENSION typedef unsigned __int128 fiat_poly1305_uint128;

/* The type fiat_poly1305_loose_field_element is a field element with loose bounds. */
/* Bounds: [[0x0 ~> 0x300003254e71], [0x0 ~> 0x18000192a73a], [0x0 ~> 0x18000192a73a]] */
typedef uint64_t fiat_poly1305_loose_field_element[3];

/* The type fiat_poly1305_tight_field_element is a field element with tight bounds. */
/* Bounds: [[0x0 ~> 0x1000010c6f7b], [0x0 ~> 0x800008637be], [0x0 ~> 0x800008637be]] */
typedef uint64_t fiat_poly1305_tight_field_element[3];

#if (-1 & 3) != 3
#error "This code only works on a two's complement system"
#endif



In parsing:
Error while parsing assembly:
Error while parsing assembly:
mov [ + rsp], rbx; saving to stack
mov [ + rsp + 0x08 * 1], rbp; saving to stack
mov [ + rsp + 0x08 * 2], r12; saving to stack
mov [ + rsp + 0x08 * 3], r13; saving to stack
mov [ + rsp + 0x08 * 4], r14; saving to stack
mov [ + rsp + 0x08 * 5], r15; saving to stack
mov rax, [ + rsi + 0x08 * 2]
lea rax, [ + rax + 4 * rax]
lea r10, [ + 2 * rax]
mov rdx, [ + rsi + 0x08 * 0]; arg1[0] to rdx
mulx r11, rbx, [ + rsi + 0x08 * 0]; x16, x15<- arg1[0] * arg1[0]
lea r10, [ + r10 + r10]
mov rdx, [ + rsi + 0x08 * 1]; arg1[1] to rdx
mov rdx, [ + rsi + 0x08 * 2]; arg1[2] to rdx
imul rdx, [ + rsi + 0x08 * 1], 2; x4 <- arg1[1] * 0x2
mulx rdx, rcx, [ + rsi + 0x08 * 0]; x14, x13<- arg1[0] * x4
imul r12, [ + rsi + 0x08 * 1], 2; x10001 <- arg1[1] * 0x2
imul rbx, [ + rsi + 0x08 * 2], 2; x3 <- arg1[2] * 0x2
mov rdx, [ + rsi + 0x08 * 1]; arg1[1] to rdx
mov rdx, [ + rsi + 0x08 * 0]; arg1[0] to rdx
lea r15, [ + r15 + r8]
lea r15, [ + r15 + r13]
mov [ + rdi + 0x08 * 1], rbp; out1[1] = x47
lea r15, [ + r15 + r10]
mov [ + rdi + 0x08 * 0], rcx; out1[0] = x44
mov [ + rdi + 0x08 * 2], r15; out1[2] = x48
mov rbx, [ + rsp + 0x08 * 0]; restoring from stack
mov rbp, [ + rsp + 0x08 * 1]; restoring from stack
mov r12, [ + rsp + 0x08 * 2]; restoring from stack
mov r13, [ + rsp + 0x08 * 3]; restoring from stack
mov r14, [ + rsp + 0x08 * 4]; restoring from stack
mov r15, [ + rsp + 0x08 * 5]; restoring from stack

New assembly being parsed:
SECTION .text
	GLOBAL square_poly1305
square_poly1305:
sub rsp, 48 ; last 0x30 (6) for Caller - save regs 

;instead of 
; mov [ rsp + 0x0 ], rbx; saving to stack

; we also want to use
mov [ + rsp], rbx; saving to stack

mov [ + rsp + 0x08 * 1], rbp; saving to stack
mov [ + rsp + 0x08 * 2], r12; saving to stack
mov [ + rsp + 0x08 * 3], r13; saving to stack
mov [ + rsp + 0x08 * 4], r14; saving to stack
mov [ + rsp + 0x08 * 5], r15; saving to stack

; instead of 
;imul rax, [ rsi + 0x10 ], 0x5; x1 <- arg1[2] * 0x5

; we want also to use 
mov rax, [ + rsi + 0x08 * 2]
lea rax, [ + rax + 4 * rax]

; instead of 
;imul r10, rax, 0x2; x2 <- x1 * 0x2

; we also want to use
lea r10, [ + 2 * rax]

mov rdx, [ + rsi + 0x08 * 0]; arg1[0] to rdx
mulx r11, rbx, [ + rsi + 0x08 * 0]; x16, x15<- arg1[0] * arg1[0]

; instead of 
; imul r10, r10, 0x2; x10000 <- x2 * 0x2

;we want to use 
lea r10, [ + r10 + r10]

; we also still need to support basic addressing with REG+off
mov rdx, [ + rsi + 0x08 * 1]; arg1[1] to rdx

mulx r10, rbp, r10; x8, x7<- arg1[1] * x10000
xor r12, r12
adox rbp, rbx
adox r11, r10
mov rdx, [ + rsi + 0x08 * 2]; arg1[2] to rdx
mulx rax, r13, rax; x6, x5<- arg1[2] * x1
mov r14, 17592186044415 ; moving imm to reg
mov r15, rbp; x22, copying x17 here, cause x17 is needed in a reg for other than x22, namely all: , x21, x22, size: 2
and r15, r14; x22 <- x17&0xfffffffffff
imul rdx, [ + rsi + 0x08 * 1], 2; x4 <- arg1[1] * 0x2
mulx rdx, rcx, [ + rsi + 0x08 * 0]; x14, x13<- arg1[0] * x4
shrd rbp, r11, 44; x21 <- x19||x17 >> 44
xor r8, r8
adox r13, rcx
adox rdx, rax
adcx r13, rbp
adc rdx, 0
imul r12, [ + rsi + 0x08 * 1], 2; x10001 <- arg1[1] * 0x2
mov r9, r13; x34, copying x31 here, cause x31 is needed in a reg for other than x34, namely all: , x34, x35, size: 2
shrd r9, rdx, 43; x34 <- x33||x31 >> 43
imul rbx, [ + rsi + 0x08 * 2], 2; x3 <- arg1[2] * 0x2
mov rdx, [ + rsi + 0x08 * 1]; arg1[1] to rdx
mulx r12, r10, r12; x10, x9<- arg1[1] * x10001
mov rdx, [ + rsi + 0x08 * 0]; arg1[0] to rdx
mulx rbx, r11, rbx; x12, x11<- arg1[0] * x3
xor rdx, rdx
adox r10, r11
adox rbx, r12
adcx r10, r9
adc rbx, 0
mov r8, r10; x39, copying x36 here, cause x36 is needed in a reg for other than x39, namely all: , x39, x40, size: 2
shrd r8, rbx, 43; x39 <- x38||x36 >> 43
imul r8, r8, 5; x41 <- x39 * 0x5
mov rax, 8796093022207 ; moving imm to reg
and r13, rax; x35 <- x31&0x7ffffffffff
lea r15, [ + r15 + r8]
mov rcx, r15; x44, copying x42 here, cause x42 is needed in a reg for other than x44, namely all: , x44, x43, size: 2
and rcx, r14; x44 <- x42&0xfffffffffff
shr r15, 44; x43 <- x42>> 44
lea r15, [ + r15 + r13]
mov rbp, r15; x47, copying x45 here, cause x45 is needed in a reg for other than x47, namely all: , x47, x46, size: 2
and rbp, rax; x47 <- x45&0x7ffffffffff
shr r15, 43; x46 <- x45>> 43
and r10, rax; x40 <- x36&0x7ffffffffff
mov [ + rdi + 0x08 * 1], rbp; out1[1] = x47
lea r15, [ + r15 + r10]
mov [ + rdi + 0x08 * 0], rcx; out1[0] = x44
mov [ + rdi + 0x08 * 2], r15; out1[2] = x48
mov rbx, [ + rsp + 0x08 * 0]; restoring from stack
mov rbp, [ + rsp + 0x08 * 1]; restoring from stack
mov r12, [ + rsp + 0x08 * 2]; restoring from stack
mov r13, [ + rsp + 0x08 * 3]; restoring from stack
mov r14, [ + rsp + 0x08 * 4]; restoring from stack
mov r15, [ + rsp + 0x08 * 5]; restoring from stack
add rsp, 48
ret
; cpu Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
; clocked at 1600 MHz
; first cyclecount 15.38, best 10.657863145258103, lastGood 10.66546329723225
; seed 2040113663254706 
; CC / CFLAGS clang / -march=native -mtune=native -O3 
; time needed: 15590 ms / 500 runs=> 31.18ms/run
; Time spent for assembling and measureing (initial batch_size=833, initial num_batches=101): 13489 ms
; Ratio (time for assembling + measure)/(total runtime for 500runs): 0.8652341244387428
; number reverted permutation/ tried permutation: 221 / 270 =81.852%
; number reverted decision/ tried decision: 169 / 231 =73.160%


In fiat_poly1305_from_bytes:
Computed bounds (Some [Some [0x0 ~> 0xfffffffffff], Some [0x0 ~> 0x7ffffffffff], Some [0x0 ~> 0x17ffffffffff]]) are not tight enough (expected bounds not looser than (Some [Some [0x0 ~> 0x1000010c6f7b], Some [0x0 ~> 0x800008637be], Some [0x0 ~> 0x800008637be]])).
The bounds [0x0 ~> 0x17ffffffffff] are looser than the expected bounds [0x0 ~> 0x800008637be]
The syntax tree:
(λ x1,
  let x2 := x1[16] << 41 (* : [0x0 ~> 0x60000000000] *) in
  let x3 := x1[15] << 33 (* : [0x0 ~> 0x1fe00000000] *) in
  let x4 := x1[14] << 25 (* : [0x0 ~> 0x1fe000000] *) in
  let x5 := x1[13] << 17 (* : [0x0 ~> 0x1fe0000] *) in
  let x6 := x1[12] << 9 (* : [0x0 ~> 0x1fe00] *) in
  let x7 := x1[11] * 2 (* : [0x0 ~> 0x1fe] *) in
  let x8 := x1[10] << 36 (* : [0x0 ~> 0xff000000000] *) in
  let x9 := x1[9] << 28 (* : [0x0 ~> 0xff0000000] *) in
  let x10 := x1[8] << 20 (* : [0x0 ~> 0xff00000] *) in
  let x11 := x1[7] << 12 (* : [0x0 ~> 0xff000] *) in
  let x12 := x1[6] << 4 (* : [0x0 ~> 0xff0] *) in
  let x13 := x1[5] << 40 (* : [0x0 ~> 0xff0000000000] *) in
  let x14 := x1[4] << 32 (* : [0x0 ~> 0xff00000000] *) in
  let x15 := x1[3] << 24 (* : [0x0 ~> 0xff000000] *) in
  let x16 := x1[2] << 16 (* : [0x0 ~> 0xff0000] *) in
  let x17 := x1[1] << 8 (* : [0x0 ~> 0xff00] *) in
  let x18 := x1[0] (* : uint8_t *) in
  let x19 := x17 + x18 (* : uint16_t *) in
  let x20 := x16 + x19 (* : uint24_t *) in
  let x21 := x15 + x20 (* : uint32_t *) in
  let x22 := x14 + x21 (* : uint40_t *) in
  let x23 := x13 + x22 (* : uint48_t *) in
  let x24 := x23 & (2^44-1) (* : uint44_t *) in
  let x25 := x23 >> 44 (* : uint4_t *) in
  let x26 := x12 + x25 (* : uint12_t *) in
  let x27 := x11 + x26 (* : uint20_t *) in
  let x28 := x10 + x27 (* : uint28_t *) in
  let x29 := x9 + x28 (* : uint36_t *) in
  let x30 := x8 + x29 (* : uint44_t *) in
  let x31 := x30 & (2^43-1) (* : uint43_t *) in
  let x32 := x30 >> 43 (* : [0x0 ~> 0x2] *) in
  let x33 := x7 + x32 (* : [0x0 ~> 0x200] *) in
  let x34 := x6 + x33 (* : [0x0 ~> 0x20000] *) in
  let x35 := x5 + x34 (* : [0x0 ~> 0x2000000] *) in
  let x36 := x4 + x35 (* : [0x0 ~> 0x200000000] *) in
  let x37 := x3 + x36 (* : [0x0 ~> 0x20000000000] *) in
  let x38 := x2 + x37 (* : [0x0 ~> 0x80000000000] *) in
  let x39 := x38 & (2^43-1) (* : uint43_t *) in
  let x40 := x38 >> 43 (* : [0x0 ~> 0x2] *) in
  let x41 := (x40 << 43) + x39 (* : [0x0 ~> 0x17ffffffffff] *) in
  x24 :: x31 :: x41 :: []
)

which can be pretty-printed as:
/*
 * Input Bounds:
 *   arg1: [[0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0xff], [0x0 ~> 0x3]]
 * Output Bounds:
 *   out1: None
 */
void f(uint64_t out1[3], const uint8_t arg1[17]) {
  uint64_t x1;
  uint64_t x2;
  uint64_t x3;
  uint32_t x4;
  uint32_t x5;
  uint16_t x6;
  uint64_t x7;
  uint64_t x8;
  uint32_t x9;
  uint32_t x10;
  uint8_t x11;
  uint64_t x12;
  uint64_t x13;
  uint32_t x14;
  uint32_t x15;
  uint8_t x16;
  uint8_t x17;
  uint16_t x18;
  uint32_t x19;
  uint32_t x20;
  uint64_t x21;
  uint64_t x22;
  uint64_t x23;
  uint4 x24;
  uint16_t x25;
  uint32_t x26;
  uint32_t x27;
  uint64_t x28;
  uint64_t x29;
  uint64_t x30;
  uint2 x31;
  uint16_t x32;
  uint32_t x33;
  uint32_t x34;
  uint64_t x35;
  uint64_t x36;
  uint64_t x37;
  uint64_t x38;
  uint2 x39;
  uint64_t x40;
  x1 = ((uint64_t)(uint2)(arg1[16]) << 41);
  x2 = ((uint64_t)(arg1[15]) << 33);
  x3 = ((uint64_t)(arg1[14]) << 25);
  x4 = ((uint32_t)(arg1[13]) << 17);
  x5 = ((uint32_t)(arg1[12]) << 9);
  x6 = (uint16_t)((arg1[11]) * 0x2);
  x7 = ((uint64_t)(arg1[10]) << 36);
  x8 = ((uint64_t)(arg1[9]) << 28);
  x9 = ((uint32_t)(arg1[8]) << 20);
  x10 = ((uint32_t)(arg1[7]) << 12);
  x11 = ((arg1[6]) << 4);
  x12 = ((uint64_t)(arg1[5]) << 40);
  x13 = ((uint64_t)(arg1[4]) << 32);
  x14 = ((uint32_t)(arg1[3]) << 24);
  x15 = ((uint32_t)(arg1[2]) << 16);
  x16 = ((arg1[1]) << 8);
  x17 = (arg1[0]);
  x18 = (uint16_t)(x16 + x17);
  x19 = (x15 + (uint32_t)x18);
  x20 = (x14 + x19);
  x21 = (x13 + x20);
  x22 = (x12 + x21);
  x23 = (x22 & UINT64_C(0xfffffffffff));
  x24 = (uint4)(x22 >> 44);
  x25 = (uint16_t)(x11 + x24);
  x26 = (x10 + (uint32_t)x25);
  x27 = (x9 + x26);
  x28 = (x8 + x27);
  x29 = (x7 + x28);
  x30 = (x29 & UINT64_C(0x7ffffffffff));
  x31 = (uint2)(x29 >> 43);
  x32 = (uint16_t)(x6 + x31);
  x33 = (x5 + (uint32_t)x32);
  x34 = (x4 + x33);
  x35 = (x3 + x34);
  x36 = (x2 + x35);
  x37 = (x1 + x36);
  x38 = (x37 & UINT64_C(0x7ffffffffff));
  x39 = (uint2)(x37 >> 43);
  x40 = (((uint64_t)x39 << 43) + x38);
  out1[0] = x23;
  out1[1] = x30;
  out1[2] = x40;
}

with input bounds (Some [Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0xff], Some [0x0 ~> 0x3]]).


Fatal error: exception Failure("Synthesis failed")

which I don't understand. Where is the problem now, It says Error while parsing assembly: and then prints a bunch of instructions with memory operands, but then prints them correctly. Different ([ + rsp + 0x08 * 1]) , but correctly (as far as I can see)

Edit: how can I provide long code in github-markdown in that lovely scroll-pane, or does that only work for referencing files?

@JasonGross
Copy link
Collaborator

Edit: how can I provide long code in github-markdown in that lovely scroll-pane, or does that only work for referencing files?

I think if you tag it with a language (like ```coq) it does syntax highlighting and scroll?

The issue here is that the pretty-printed assembly needs to be reparsable as the initial code, and the parsing code doesn't support leading + in memory addressing. If you make the show code a bit more clever, so that it never prints a leading +, it should work fine.

src/Assembly/Parse.v Outdated Show resolved Hide resolved
@dderjoel
Copy link
Contributor Author

dderjoel commented Feb 20, 2022

I think if you tag it with a language (like ```coq) it does syntax highlighting and scroll?
It does not scrollify, only the syntax.

The parsing/printing seems to work now. I can compile (yey!).
Now it Is unable to unify

;
  symbolic_reg_state := [(rax, 82), (rcx, 378), (rdx, 4), (rbx, 112), (rsp, 184), (rbp, 114), (rsi, 133), (rdi, 128), (r8, 375), (r9, 364), (r10, 383), (r11, 33), (r12, 121), (r13, 122), (r14, 123), (r15, 124)];
  symbolic_flag_state := (*flag_state*)(CF=Some 385 PF=None AF=None ZF=None SF=None ZF=None OF=Some 386);
  symbolic_mem_state :=
[(324, 124), (321, 123), (318, 122), (315, 121), (312, 114), (309, 112), (306, 142), (303, 143), (300, 144), (297, 145), (294, 146), (291, 147), (288, 148), (285, 149), (282, 150), (279, 151), (276, 152), (273, 153), (270, 154), (267, 155), (264, 156), (261, 157), (258, 158), (255, 159), (252, 160), (249, 161), (246, 162), (243, 163), (240, 164), (237, 165), (234, 166), (231, 167), (228, 168), (225, 169), (222, 170), (219, 171), (216, 172), (213, 173), (210, 174), (207, 175), (205, 176), (202, 177), (199, 178), (196, 179), (193, 180), (190, 181), (188, 182), (186, 183), (135, 0), (134, 1), (133, 2), (132, 384), (130, 381), (128, 378)]
;
|}
Unable to unify: [inr [378, 381, 384]] == [inr [101, 106, 108]]
Could not unify the values at index 0: [378, 381, 384] != [101, 106, 108]
index 0: 378 != 101
(slice 0 44, [377]) != (slice 0 44, [98])
index 0: 377 != 98
(add 64, [345, 375]) != (add 64, [57, 96])
index 0: 345 != 57
(slice 0 44, [337]) != (slice 0 44, [44])
index 0: 337 != 44
(add 64, [41, 334]) != (add 64, [25, 41])
index 1: 334 != 25
(mul 64, [1, 331]) != (mul 64, [0, 1, 22])
index 1: 331 != 0
(add 64, [329, 329]) != (old 64 0, [])
(add 64, [(mul 64, [7, 328]), (mul 64, [7, 328])]) != (old 64 0, [])
Operation mismatch: add 64 != old 64 0
0 is a special value no longer present in the symbolic machine state at the end of execution.

Fatal error: exception Failure("Synthesis failed")

Full output here: poly1305_sq.txt

Edit: I wonder what 0 it is trying to match against

@dderjoel
Copy link
Contributor Author

dderjoel commented Feb 21, 2022

also the bedrock-build fails with

 File "./src/Assembly/WithBedrock/Semantics.v", line 110, characters 46-92:
command time -f "src/Assembly/WithBedrock/Semantics.vo (real: %e, user: %U, sys: %S, mem: %M ko)" "coqc"  -time -q '-w' '+implicit-core-hint-db,+implicits-in-term,+non-reversible-notation,+deprecated-intros-until-0,+deprecated-focus,+unused-intro-pattern,+variable-collision,+omega-is-deprecated,+deprecated-instantiate-syntax,+non-recursive' -w -notation-overridden,-undeclared-scope,-deprecated-hint-rewrite-without-locality,-deprecated-hint-constr,-fragile-hint-constr,-native-compiler-disabled,-ambiguous-paths  "-w" "-deprecated-native-compiler-option" "-native-compiler" "ondemand" -R src Crypto src/Assembly/WithBedrock/Semantics.v > src/Assembly/WithBedrock/Semantics.v.timing
Error: Unknown interpretation for notation "_ <- _ ; _".

Edit: here; https://github.com/mit-plv/fiat-crypto/runs/5267476251?check_suite_focus=true#step:7:2360
What does that mean there ? Error: Unknown interpretation for notation "_ <- _ ; _".

@JasonGross
Copy link
Collaborator

(mul 64, [1, 331]) != (mul 64, [0, 1, 22])

This is the really interesting line. I should probably make the behavior slightly different when either list is longer than length 1. Maybe printing out successive inlinings is the right thing to do here?

We need to figure out why 331 is supposed to be the same as 0 * 22

@dderjoel
Copy link
Contributor Author

is 0 the number zero, or it that the variable #0?

@dderjoel
Copy link
Contributor Author

dderjoel commented Feb 21, 2022

(*16*) (const 64, []);
(*17*) (mulZ, [0, 6]);
(*18*) (mul 64, [0, 0, 3]);
(*19*) (shrZ, [17, 16]);
(*20*) (shr 64, [17, 16]);
(*22*) (const 20, []);

hmm... so maybe it has to do with my shl <- mul change, which is not in that branch at the moment.

Edit: no, there is no such things in the asm-file.

@JasonGross
Copy link
Collaborator

is 0 the number zero, or it that the variable #0?

The variable #0

There is also

(*326*) (const 4, []);
(*327*) (mul 64, [0, 326]);
(*328*) (add 64, [0, 327]);
(*329*) (mul 64, [7, 328]);

@JasonGross
Copy link
Collaborator

It seems to want 2 * (#0 + #0 * 4) + 2 * (#0 + #0 * 4) == #0 * 20. So I think we need a rewrite rule that counts the occurrences of all the terms in a sum, where it will look two levels deeper to see if any terms are products of constants with a given term. That is, we say occurences(x * const v) := occurences(const v * x) := (x, v) and otherwise occurences(x) := (x, 1), and then we do something like groupBy fst (sort (map occurences summands)) and then sum all the occurence counts and then recreate the sum. @andres-erbsen @dderjoel does this sound like the right rewrite rule?

@dderjoel
Copy link
Contributor Author

dderjoel commented Feb 21, 2022

yes, sounds right. So, how I generate the mul (const) is: If there is a variable * const * const, I reduce that by evaluating the const * const at compile time. (but this is addressed already in 59600ae, see here, in the other issue. After evaluating the constant const' <- const * const, I check in lookup map for a couple known constants (like 5) I have map to break up constants (5=1+4) to limbs, which can be computed differently (e.g. with lea [4*x + x] rather than imul x, 5).

I don't exactly know, why this problem here (5 = 1 + 4) cannot be solved like that (4 = 2 * 2), too. Is it because I 'break the constant up' myself here, rather than 'combing them' ?

Edit: wording

@JasonGross
Copy link
Collaborator

Roughly the issue is that app_consts leaves over spurious x * 1 that don't get removed by subsequent passes.

JasonGross added a commit to dderjoel/fiat-crypto that referenced this pull request Mar 22, 2022
Roughly the issue is that app_consts leaves over spurious x * 1 that
don't get removed by subsequent passes.  c.f.
mit-plv#1134 (comment)
@JasonGross
Copy link
Collaborator

Issue fixed, all assembly tests now pass on my machine

JasonGross added a commit to JasonGross/fiat-crypto that referenced this pull request Mar 22, 2022
Roughly the issue is that app_consts leaves over spurious x * 1 that
don't get removed by subsequent passes.  c.f.
mit-plv#1134 (comment)
@dderjoel
Copy link
Contributor Author

dderjoel commented Mar 22, 2022

nice, Thank you!
I can also build and 'many' files that I generate get proven correct.
But there is some interesting behavior. I've added another file for curve25519;
It has to do with the 'multiply by 19' in with leas

There is actually two occurrences oft that. Lines 17-20 and lines 131-136

The first mul can be split up into leas but for the second one it fails with
../src/ExtractionOCaml/unsaturated_solinas --no-primitives --no-wide-int --shiftr-avoid-uint1 --output /dev/null --output-asm /dev/null --tight-bounds-mul-by 1.000001 'curve25519' '64' '5' '2^255 - 19' carry_square --hints-file ./9789_ratio10114_seed1785685356_square_curve25519.asm

Unable to unify: [inr [890, 895, 894, 189, 200]] == [inr [207, 212, 214, 189, 200]]
Could not unify the values at index 0: [#890, #895, #894, #189, #200] != [#207, #212, #214, #189, #200]
index 0: #890 != #207
(slice 0 51, [#889]) != (slice 0 51, [#204])
index 0: #889 != #204
(add 64, [#108, #199, #887]) != (add 64, [#108, #202])
(add 64, [#199, #887]) != (add 64, [#202])
(add 64, [#199, (mul 64, [#9, #886])]) != (add 64, [(mul 64, [#5, #199])])
(add 64, [#199, (mul 64, [#9, (add 64, [#199, #885])])]) != (add 64, [(mul 64, [#5, #199])])
(add 64, [#199, (mul 64, [#9, (add 64, [#199, (mul 64, [#199, #237])])])]) != (add 64, [(mul 64, [#5, #199])])
(add 64, [#199, (mul 64, [#9, (add 64, [#199, (mul 64, [#199, (const 8, [])])])])]) != (add 64, [(mul 64, [#5, #199])])
(add 64, [(or 64, [#195, #197]), (mul 64, [(const 2, []), (add 64, [(or 64, [#195, #197]), (mul 64, [(or 64, [#195, #197]), (const 8, [])])])])]) != (add 64, [(mul 64, [(const 19, []), (or 64, [#195, #197])])])

the last line would 'expand' to

(add 64, [
    (or 64, [#195, #197]),
    (mul 64, [
        (const 2, []),
        (add 64, [
            (or 64, [#195, #197]),
            (mul 64, [
                (or 64, [#195, #197]),
                (const 8, [])
            ])
        ])
    ])
]) !=
(add 64, [
    (mul 64, [
        (const 19, []),
        (or 64, [#195, #197])
    ])
])

and if (or 64 ,[#195, #197]) is the value for x41, it would mean
x43 = x41 + ( 2 * ( x41 + (x41 * 8 ) ) ) != x43 = 19 * x41 +

full output in attached file.
err.txt

@dderjoel
Copy link
Contributor Author

dderjoel commented Mar 22, 2022

another test case in the poly1305 case.
this one is unable to unify

Could not unify the values at index 0: [#351, #349, #350] != [#103, #108, #110]
index 0: #351 != #103
(slice 0 44, [#345]) != (slice 0 44, [#100])
index 0: #345 != #100
(add 64, [#58, #95, #343]) != (add 64, [#58, #98])
(add 64, [#95, #343]) != (add 64, [#98])
(add 64, [#95, (mul 64, [#95, #331])]) != (add 64, [(mul 64, [#3, #95])])
(add 64, [#95, (mul 64, [#95, (const 4, [])])]) != (add 64, [(mul 64, [#3, #95])])
(add 64, [(or 64, [#91, #93]), (mul 64, [(or 64, [#91, #93]), (const 4, [])])]) != (add 64, [(mul 64, [(const 5, []), (or 64, [#91, #93])])])

where I guess the problem is the mul * 5 in line 56

(add 64, [
    (or 64, [#91, #93]), 
    (mul 64, [
        (or 64, [#91, #93]), 
        (const 4, [])
    ])
])
!=
(add 64, [
    (mul 64, [
       (const 5, []), 
       (or 64, [#91, #93])
    ])
])

which, If I understand correctly, means
if (or 64, [#91, #93]) would be x20:
x22 = x20 + x20 * 4 != x22 = 5 * x20 +

Is an empty summand treated as 0? Or another way of asking: Why is there an addition in the first place?

@dderjoel
Copy link
Contributor Author

dderjoel commented Mar 22, 2022

I think both of the cases have the same cause; but I wonder, why it works in lines 19-20. The only difference I see, is that there the argument is an argument from the input (x1 <- arg1[4]*19: arg1[4] + 2 * TMP = arg1[4] + 2 * 9 * arg1[4])

@JasonGross
Copy link
Collaborator

Unfortunately rewriting is hard to debug. Could you restore this PR to it's previous (succeeding) state, and open a new one adding the failing files? (Alternatively you could make a new branch and a new PR, and I can force-push to this branch to restore it to passing.) I'd like to merge this tomorrow to avoid merge conflicts with my other work in progress.

As far as debugging the issue, could you try (in the new PR) adding a second copy of the combine_consts pass at the end of the list in Definition expr in Symbolic.v? If this changes nothing, it means there is an issue with the rewriting pass; if it changes something, it means that one of the later passes is exposing an opportunity that gets passed up.

Is an empty summand treated as 0? Or another way of asking: Why is there an addition in the first place?

Where do you see an empty summand? But, yes, (add 64, []) is 0, and (add 64, [x]) is x mod 2^64. The reason we see add 64 with a single argument here is an artifact of the explanation process; I strip out common indices from the top-level addition. (This is what happens in going from (add 64, [#58, #95, #343]) != (add 64, [#58, #98]) to
(add 64, [#95, #343]) != (add 64, [#98]).) Maybe this is too confusing and I should instead move them to the end or something?

At a glance, your analysis seems correct to me, btw.

JasonGross added a commit to JasonGross/fiat-crypto that referenced this pull request Mar 22, 2022
Roughly the issue is that app_consts leaves over spurious x * 1 that
don't get removed by subsequent passes.  c.f.
mit-plv#1134 (comment)
JasonGross added a commit that referenced this pull request Mar 22, 2022
Roughly the issue is that app_consts leaves over spurious x * 1 that
don't get removed by subsequent passes.  c.f.
#1134 (comment)
@JasonGross
Copy link
Collaborator

I've merged #1166, which contains all but the new examples of this PR

@JasonGross
Copy link
Collaborator

My current guess is that the code turns add 64, [#95, #343] into add 64, [(or 64, [#91, #93]), (mul 64, [#95, #331])] and thereafter misses the fact that (or 64, [#91, #93]) and #95 are the same. Instead I guess it should only reveal as needed, or else it should reveal an extra level when it finds multiplications.

JasonGross added a commit to JasonGross/fiat-crypto that referenced this pull request Mar 22, 2022
Depth determines which indices get expanded, but all references to the
same index get expanded if they appear in the output.

This is because Joel's latest examples require a rewriting pass that
needs either uneven revealing or the ability to check expression
equality modulo the dag in the middle of rewriting.
mit-plv#1134 (comment)
Roughly the issue is that if we want to turn `a + 4 * a` into `5 * a`,
we need to reveal enough structure to see `4 * a`, but we need to see
that the two instances of `a` are the same (e.g., if `a` is an ExprRef
pointing to `b|c`, and we reveal uniformly, then we need to recognize
`b|c + 4 * a`)
@JasonGross
Copy link
Collaborator

JasonGross commented Mar 22, 2022

I've implemented this in #1167, which makes all the tests here pass, and I've pushed this commit atop this branch.

JasonGross added a commit to dderjoel/fiat-crypto that referenced this pull request Mar 22, 2022
Depth determines which indices get expanded, but all references to the
same index get expanded if they appear in the output.

This is because Joel's latest examples require a rewriting pass that
needs either uneven revealing or the ability to check expression
equality modulo the dag in the middle of rewriting.
mit-plv#1134 (comment)
Roughly the issue is that if we want to turn `a + 4 * a` into `5 * a`,
we need to reveal enough structure to see `4 * a`, but we need to see
that the two instances of `a` are the same (e.g., if `a` is an ExprRef
pointing to `b|c`, and we reveal uniformly, then we need to recognize
`b|c + 4 * a`)
@dderjoel
Copy link
Contributor Author

dderjoel commented Mar 22, 2022

So this now contains all the changes, right? I'm building from this then and check from my side.
Edit: works. I'm happy to merge this (or the part two, #1167).

Thanks so much for your work!

JasonGross added a commit that referenced this pull request Mar 22, 2022
Depth determines which indices get expanded, but all references to the
same index get expanded if they appear in the output.

This is because Joel's latest examples require a rewriting pass that
needs either uneven revealing or the ability to check expression
equality modulo the dag in the middle of rewriting.
#1134 (comment)
Roughly the issue is that if we want to turn `a + 4 * a` into `5 * a`,
we need to reveal enough structure to see `4 * a`, but we need to see
that the two instances of `a` are the same (e.g., if `a` is an ExprRef
pointing to `b|c`, and we reveal uniformly, then we need to recognize
`b|c + 4 * a`)
Unable to unify: [inr [351, 349, 350]] == [inr [103, 108, 110]]
Could not unify the values at index 0: [mit-plv#351, mit-plv#349, mit-plv#350] != [mit-plv#103, mit-plv#108, mit-plv#110]
index 0: mit-plv#351 != mit-plv#103
(slice 0 44, [mit-plv#345]) != (slice 0 44, [mit-plv#100])
index 0: mit-plv#345 != mit-plv#100
(add 64, [mit-plv#58, mit-plv#95, mit-plv#343]) != (add 64, [mit-plv#58, mit-plv#98])
(add 64, [mit-plv#95, mit-plv#343]) != (add 64, [mit-plv#98])
(add 64, [mit-plv#95, (mul 64, [mit-plv#95, mit-plv#331])]) != (add 64, [(mul 64, [#3, mit-plv#95])])
(add 64, [mit-plv#95, (mul 64, [mit-plv#95, (const 4, [])])]) != (add 64, [(mul 64, [#3, mit-plv#95])])
(add 64, [(or 64, [mit-plv#91, mit-plv#93]), (mul 64, [(or 64, [mit-plv#91, mit-plv#93]), (const 4, [])])]) != (add 64, [(mul 64, [(const 5, []), (or 64, [mit-plv#91, mit-plv#93])])])
@JasonGross
Copy link
Collaborator

I've rebased and force-pushed atop the now-merged #1167, and I will merge this once the CI passes

@JasonGross
Copy link
Collaborator

All non-windows tests build, and the windows one seems to build other than maybe hanging at the very end, so I'm going to merge this.

@JasonGross JasonGross merged commit cc89488 into mit-plv:master Mar 23, 2022
@dderjoel dderjoel deleted the mem-extention branch March 23, 2022 07:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants