
Implement Keccak sponge #1291

Merged: 14 commits merged into main on Dec 12, 2023
Conversation

@jackryanservia (Contributor) commented Dec 4, 2023:

Implementation of the Keccak sponge and the preNist, ethereum, and SHA-3 hash functions.

This PR currently uses a workaround in checkBytes because lookup arguments are not working in Pickles, but that will be updated as soon as a fix is available.
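For orientation, a purely hypothetical usage sketch; the import, namespace, and function names below are assumed for illustration and are not taken from this PR's diff (message bytes are represented as Field elements in [0, 255], as discussed in the review):

import { Field } from 'o1js';

// Hypothetical sketch only: exact export names and signatures may differ from the PR.
let message: Field[] = [...'abc'].map((c) => Field.from(BigInt(c.charCodeAt(0))));

// e.g. an Ethereum-style Keccak-256 digest, returned as 32 byte-sized Fields
// (assuming an `ethereum`-named variant, per the description above):
// let digest: Field[] = Keccak.ethereum(message);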

@jackryanservia marked this pull request as draft on December 4, 2023, 18:17
@jackryanservia marked this pull request as ready for review on December 6, 2023, 15:40
@@ -1,106 +1,136 @@
import { Field } from './field.js';
Member:

Now that we have a full end-to-end functional Keccak, could you also add that (and some of the other variants, where you see fit) to the verification key regression tests?

Member:

+1

@querolita (Member) left a comment:

Looks good to me. I left a few comments asking for clarification about the exact constraints that are being added behind the scenes, but other than that it all looks good. Will be glad to approve as soon as we resolve those concerns.

// AUXILARY FUNCTIONS

// Auxiliary function to check composition of 8 bytes into a 64-bit word
function checkBytesToWord(word: Field, wordBytes: Field[]): void {
Member:

I must assume wordBytes is expected in little endian order then?

Member:

I guess this is to behave as in getKeccakStateOfBytes for each of the words?


// Return a keccak state where all lanes are equal to 0
const getKeccakStateZeros = (): Field[][] =>
Array.from(Array(KECCAK_DIM), (_) => Array(KECCAK_DIM).fill(Field.from(0)));

// Converts a list of bytes to a matrix of Field elements
Member:

This one does not create constraints for the byte length? Asking because keccakStateToBytes does. I guess this is because getKeccakStateOfBytes is only used at the beginning of Keccak, where the inputs are already constrained, but keccakStateToBytes is called at the end, where you might want to check the fresh bytes of the hash output?

return state;
}

// Converts a state of Fields to a list of bytes as Fields and creates constraints for it
Member:

Okay, after reading further I can see that the constraints referred to here correspond to the decomposition of each word, but not to the individual bytes composing each word. Wouldn't this need further calls to rangeCheck8 to constrain that this is indeed the case? Or is this handled somewhere else in the code?

}

// TODO(jackryanservia): Use lookup argument once issue is resolved
// Checks in the circuit that a list of Fields are at most 8 bits each
Member:

Oh, okay so the code indeed still needs some calls to checkBytes here and there

Member:

Where exactly? I've looked carefully, and I think all words that are split up by keccakStateToBytes, and whose bytes are actually used, are constrained by the final checkBytes call on the output in the hash() function.

But I fully agree that not having the creation of bytes and the checks on them co-located is confusing and error-prone.

Member:

in the final checkBytes call on the output in the hash() function

Yeah, I saw that when I reached the end of the review.


// Check each Field input is 8 bits at most if it was not done before at creation time
if (byteChecks) {
checkBytes(messageFormatted);
Member:

nice

Member:

I'm against having the checkBytes option. We never put constraints like these on inputs to a gadget. Instead, message bytes should be range-checked by whatever function creates them.

The original PR by @MartinMinkov added a UInt8 class and made that the input type: #999

That was a nice solution, because the UInt8s would already be constrained when passed to a circuit or witnessed, so there would be no (easy) way to even create a UInt8 that isn't range-checked (or constant, and thus verifiably within range).
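For illustration, a rough sketch of the idea behind that approach (not the #999 implementation; UInt8Sketch is a made-up name, and Gadgets.rangeCheck8 is the 8-bit range check referenced elsewhere in this review):

import { Field, Gadgets } from 'o1js';

// Illustration: a byte wrapper that is range-checked when it is created, so gadgets
// consuming it never need a separate checkBytes pass over their inputs.
class UInt8Sketch {
  private constructor(readonly value: Field) {}

  static from(value: Field): UInt8Sketch {
    Gadgets.rangeCheck8(value); // constrain 0 <= value < 2^8 at creation time
    return new UInt8Sketch(value);
  }
}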

)(
(x) => {
let thing = x.map(Number);
let result = sha3_256(new Uint8Array(thing));
Member:

Oh, so this is using another implementation of Keccak to compare against what the gadget produces? Nice.

I wonder if it could make sense to create tests using some of the test vectors proposed by the Keccak Team. Maybe not run inside CI all the time, but at least run once before shipping to production.

@mitschabaude (Member) left a comment:

Got about halfway through; will continue the review tomorrow morning.

Comment on lines +166 to +168
return byte;
})
);
Member:

Generally we constrain values where they are created, so I would've expected a rangeCheck8 on the contents of bytestring here.
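Something along these lines, i.e. constraining the bytes right where they are created (a sketch under that assumption, not a suggested diff; rangeCheck8 is the gadget mentioned earlier in the review):

import { Field, Gadgets } from 'o1js';

// Sketch: range-check each freshly created byte to 8 bits at creation time,
// instead of relying on a later checkBytes pass over the whole array.
function constrainBytes(bytestring: Field[]) {
  bytestring.forEach((byte) => Gadgets.rangeCheck8(byte));
}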

Comment on lines 138 to 142
for (let z = 0; z < BYTES_PER_WORD; z++) {
// Field element containing value 2^(8*z)
const shift = Field.from(2n ** BigInt(8 * z));
state[x][y] = state[x][y].add(shift.mul(wordBytes[z]));
}
Member:

I find it annoying how this duplicates the checkBytesToWord logic, and even with a different programming style (loop vs reduce)

Proposal: change checkBytesToWord(word: Field, wordBytes: Field[]) to bytesToWord(wordBytes: Field[]): Field and make it return the composed value. use it here:

state[x][y] = bytesToWord(wordBytes);

and add the extra assertion inline wherever you used checkBytesToWord before
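A sketch of that refactor (an illustration of the proposal, not a diff): the helper composes 8 little-endian bytes into a 64-bit word without asserting anything, and callers add an assertion only where they need one.

import { Field } from 'o1js';

// Compose 8 little-endian byte-sized Fields into one 64-bit word (no extra constraints).
function bytesToWord(wordBytes: Field[]): Field {
  return wordBytes.reduce(
    (word, byte, i) => word.add(byte.mul(Field.from(2n ** BigInt(8 * i)))),
    Field.from(0)
  );
}

// In getKeccakStateOfBytes:
//   state[x][y] = bytesToWord(wordBytes);
// Wherever checkBytesToWord was used before:
//   state[x][y].assertEquals(bytesToWord(wordBytes));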

@mitschabaude (Member) left a comment:

Still not done reviewing, but sending this now because I found a vulnerability.

Besides that, I wanted to call out that this Keccak implementation handles fixed-length input only: for example, the padding and the number of sponge absorb calls are hard-coded based on the message length.

I assume many applications will want to hash variable-length messages. That needs a non-trivial amount of extra work, so we might want to schedule it as a follow-up task. (cc @nicc)

// Create an array containing the 8 bytes starting on idx that correspond to the word in [x,y]
const word_bytes = bytestring.slice(idx, idx + BYTES_PER_WORD);
// Assert correct decomposition of bytes from state
checkBytesToWord(state[x][y], word_bytes);
Member:

Suggested change
checkBytesToWord(state[x][y], word_bytes);
state[x][y].assertEquals(bytesToWord(word_bytes));

Contributor:

Thanks @mitschabaude. Agreed that we should follow up with this. Have created a new issue here.

Comment on lines 513 to 514
inpEndian: 'Big' | 'Little' = 'Big',
outEndian: 'Big' | 'Little' = 'Big',
Member:

I dislike the use of additional config options to support something as simple as reversing the input/output bytes. If there is a default endianness that is typically used with Keccak, I would prefer to just use that and remove the parameter (and document the endianness used at the interface). Users who need the nonstandard endianness can just reverse the bytes themselves.

By the way: little-endian is good, big-endian is bad. But if Keccak usually takes big-endian, then we should go with it.


@mitschabaude (Member) left a comment:

I'm done with the review, and this last round is mostly about efficiency.

There's a higher-level design question which also affects efficiency: do all use cases need to pass the inputs as bytes? If someone wanted to pass in data which already comes in 64-bit chunks, that would obviously be much more efficient: we could save all the bytes-to-words conversion and the external range checking of the input bytes.

The same observation applies to the output, and there's definitely a use case which is more efficient when returning the output as 64-bit chunks: ECDSA.

For ECDSA, the output of this needs to become a 256-bit field element which is then represented as 3 88-bit limbs. So, if we return 4 64-bit words directly, we can convert them into limbs as follows:

input: words w0, w1, w2, w3
output: limbs l0, l1, l2

l0 = w0 | slice(w1, 0, 24)
l1 = slice(w1, 24, 64) | slice(w2, 0, 48)
l2 = slice(w2, 48, 64) | w3

So we can even use two of the output words directly without cutting them apart, and only need to cut the middle two words into larger pieces, which is much cheaper than cutting them into bytes.

therefore, my ask is to definitely make some version of this which returns full 64-bit words. for consistency, then, taking 64-bit words as input also makes sense.

maybe the following would be a useful collection of interfaces:

  • a low-level Keccak interface which takes and returns 64-bit fields
  • a high-level ECDSA interface which takes a byte string, does both Keccak and the EC part, and handles the word-to-limb conversion internally
  • helper functions which split an array of 64-bit fields into 8-bit fields and vice versa

this is just a suggestion - pick something you like with the constraint that it doesn't force us to go to intermediate bytes in ECDSA
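As an off-circuit illustration of the repacking described above, here is the same arithmetic in plain bigints (the in-circuit version would use constrained slices instead; little-endian significance order is assumed for both words and limbs):

// Repack four 64-bit words (w0 least significant) into three 88-bit limbs (l0 least significant).
function wordsToLimbs([w0, w1, w2, w3]: bigint[]): [bigint, bigint, bigint] {
  const mask = (bits: number) => (1n << BigInt(bits)) - 1n;
  let l0 = w0 | ((w1 & mask(24)) << 64n); // all of w0, plus the low 24 bits of w1
  let l1 = (w1 >> 24n) | ((w2 & mask(48)) << 40n); // high 40 bits of w1, low 48 bits of w2
  let l2 = (w2 >> 48n) | (w3 << 16n); // high 16 bits of w2, plus all of w3
  return [l0, l1, l2];
}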

);
const blockState = getKeccakStateOfBytes(paddedBlock);
// xor the state with the padded block
const stateXor = keccakStateXor(state, blockState);
Member:

This uses more constraints than necessary, correct me if I'm wrong:

blockState always consists of

  • rate / 64 words that come from the input message, followed by
  • capacity / 64 words which are all zero

For example, for length 256 we have 17 message words followed by 8 zero words.

XORing the 8 zero words with the current state will create a bunch of XOR gates, which are all no-ops, so they could just be left out.

This could be handled nicely by making Gadgets.xor() do nothing if one of its inputs is the zero constant!

Member:

XORing the 8 zero words with the current state will create a bunch of XOR gates, which are all no-ops, so they could just be left out.

Yes, this could be an optimization: leaving out those XORs avoids extra rows that do nothing to the inputs.

@querolita (Member) commented Dec 7, 2023:

by making Gadgets.xor() do nothing

if this is possible, then it sounds like a good idea to me
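For illustration, a sketch of the kind of short-circuit being suggested, written here as a wrapper for clarity (the actual change would presumably live inside Gadgets.xor itself; a 64-bit word size is assumed):

import { Field, Gadgets } from 'o1js';

// Skip the XOR gates entirely when one operand is the constant zero, since XOR with 0 is a no-op.
function xor64(a: Field, b: Field): Field {
  if (b.isConstant() && b.toBigInt() === 0n) return a;
  if (a.isConstant() && a.toBigInt() === 0n) return b;
  return Gadgets.xor(a, b, 64);
}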


}

// Obtain the hash selecting the first bitlength/8 bytes of the output array
const hashed = outputArray.slice(0, length / 8);
Member:

In hash(). But it should be done in this function already.

Comment on lines 451 to 452
const bytestring = keccakStateToBytes(state);
const outputBytes = bytestring.slice(0, bytesPerSqueeze);
Member:

You're converting the full state to bytes, and paying the constraints for doing that, but in the end you only use length / 8 of them.

For example, for length 256, you're converting 25 words to bytes but only using 4 of those words!

That's wasteful -- better to only convert the first length / 64 words of the state to bytes.

Member:

better to only convert the first length / 64 words of the state to bytes.

indeed, the new zkvm keccak only converts the needed 4 words at the end
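A sketch of what that could look like in the squeeze step (names such as wordToBytes are hypothetical and the lane order must match what keccakStateToBytes uses today; the point is only that just bytesPerSqueeze / 8 lanes need the word-to-bytes decomposition):

import { Field } from 'o1js';

// Sketch: only decompose the lanes that actually contribute to the output bytes.
function squeezeBytes(
  lanes: Field[], // state lanes, in output order
  bytesPerSqueeze: number,
  wordToBytes: (word: Field) => Field[] // hypothetical per-lane helper: 8 little-endian bytes
): Field[] {
  const BYTES_PER_WORD = 8;
  const wordsNeeded = bytesPerSqueeze / BYTES_PER_WORD;
  return lanes.slice(0, wordsNeeded).flatMap(wordToBytes);
}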

@nicc (Contributor) commented Dec 7, 2023:

Reminder to remove "coming soon" from the docs when this is merged

@mitschabaude (Member) left a comment:

I think this is good to merge as it stands. Constraint optimizations and the addition of a 64-bit API can be done in another PR.

@jackryanservia merged commit ffce6b3 into main on Dec 12, 2023; 13 checks passed.
@jackryanservia deleted the feature/keccak-sponge branch on December 12, 2023, 10:44.