Assembly instruction rationalization thoughts #171

bobbinth · 2022-04-12T21:54:01Z

Having had the benefit of writing some simple (and not so simple) programs in Miden assembly, I think there are some opportunities to simplify (and in some cases augment) current assembly instruction set. My thoughts are below - though, these are rather preliminary and many of them might not be a good idea.

Field operations

All looks good here. There is one slightly inconsistent thing in comparison operations: eq and neq operations allow for immediate parameters, while other comparison operations do not. The primary reason for allowing immediate parameters for eq is that eq.0 gets reduced to EQZ VM operation. And if we introduce eqz assembly instruction we could remove immediate parameters from eq (and neq) operation. But I'm not sure if this is a big deal.

u32 operations

I'd love to adopt the same naming convention as we used in u64 stdlib module. For example, we could do something like this:

u32add -> u32add.checked
u32add.unsafe -> u32add.overflowing
u32add.full could be eliminated

But, in this case, it is not clear how to handle immediate values. For example, u32add.checked.5 looks odd to me. And supplying immediate values to "checked" variants may be desirable because we can check whether the parameter is a valid u32 value at compile time pretty easily.

A few other thoughts in u32 category:

We could probably get rid of safe versions of u32addc and u32madd and have just u32addc.unchecked and u32madd.unchecked.
We could replace u32addc with u32add3 which would compute a sum of top 3 elements on the stack (and thus would be a more general version of u32addc). To do this, we'd need to think through how the AIR constraints would need to be changed.
We could replace u32div.full with u32divmod - this would be consistent with the approach used in u64 module.
Depending on what we do with eq and neq, we could also update u32eq and u32neq.

Stack manipulation

One thing that bothers me somewhat here is that some operations have different costs and these costs are not intuitive. For example:

dup.7 takes 1 VM cycles, but dup.8 takes 3, and then dup.9 takes 1 VM cycle again. Same goes for movup and movdn.
Some swap operations take 2 cycles while others take 4.

Thus, writing efficient code requires deep understanding of costs of each operations.

One solution to this could be to get rid of the operations which take multiple cycles. But this reduces future flexibility and I'm not sure it's worth it. We could also do some selective simplifications. For example:

Get rid of swap operations since they are always emulated using other ops anyway.
Refactor operations to have a more consistent cost model. For example, general rule could be at accessing the first 8 elements of the stack (e.g., via movup or movdn instructions) always take 1 VM cycle, and then accessing elements 8 - 16 is more expensive (e.g., 3 - 4 VM cycles).

Input operations

Similar to the above, the cost model for these operations requires deep knowledge. E.g., it is not obvious that storew.mem is 4x faster than popw.mem. But short of getting rid of these composite operations, I'm not sure what can be done.

The text was updated successfully, but these errors were encountered:

grjte · 2022-04-21T06:14:29Z

u32 operations

For example, u32add.checked.5 looks odd to me. And supplying immediate values to "checked" variants may be desirable because we can check whether the parameter is a valid u32 value at compile time pretty easily.

I agree that this looks a bit strange, and that it would be nice to have a naming convention similar to stdlib. One option is to get rid of checked operations altogether and provide a way to check which could be used before any of these ops as desired. In the parsers, most of these binary arithmetic ops go through the same helper function that checks both arguments and then either keeps their order or leaves it reversed (1 cycle cheaper). We could just provide an operation to check 2 values and then have all core operations be unchecked. Having 2 different ways to check of course introduces the same problem discussed with the stack ops etc, so I'm not sure that's a good idea, but having one might be worth thinking about.

Stack manipulation & inputs

Unfortunately, I don't see a great solution here either. I think there are good reasons for having the composite operations.

Additional points:

removing composite operations and leaving it to users to handle these manipulations could also lead to inefficiencies, as we have aimed to implement these composite ops as efficiently as possible.
refactoring movup and movdn (for example) to be more consistent would just be making some operations slower, which I don't think is worth doing just for consistency (unless you have a different thought about how we'd do this).

I think the best long term option is to just provide good information & warnings, such as showing cycle counts beside operations in an online reference similar to evm.codes or the Miden Assembly Playground from #175

In the immediate term, I think we should make sure the costs are highlighted clearly in the assembly reference, with some visual info such as color coding by cost or highlighting expensive ops, as well as listing recommended alternatives.

bobbinth · 2022-05-16T04:04:22Z

One possible solution for u32 operations is to just use underscores (as we do in stdlib for u64 module). So, for addition, we'd have 3 instructions like so:

Operation	Comments
`u32checked_add`	Same as the current `u32add`
`u32overflowing_add`	Same as the current `u32add.unsafe`
`u32wrapping_add`	Similar to `u32add` but wouldn't check for validity of inputs or outputs

The above would be consistent with u64 stdlib module where we have something like u64::checked_add etc.

We can also make it so all 3 variants accept immediate values. For example: u32overflowing_add.123.

grjte · 2022-05-16T04:27:05Z

I think the underscores approach is good 👍

bobbinth · 2022-05-23T07:09:50Z

One of the things I'm inclined to do with stack manipulation operations is as follows:

Introduce native SWAPDW operation (with identical semantics to the current swapdw assembly instruction).
Remove native operations MOVUP9, MOVUP11, MOVUP13, and MOVUP15, and introduce MOVUP8.
Remove native operations MOVDN9, MOVDN11, MOVDN13, and MOVDN15, and introduce MOVDN8.

movup.n and movdn.n operations for n > 8 can be emulated as follows:

For something like movup.10:

swapdw
movup.2
swapdw
movup.8

For something like movdn.11:

movdn.8
swapdw
movdn.3
swapdw

The benefits of doing this are:

We reduce native instruction count by 5.
We simplify stack constraints quite a bit.

The downsides are:

Accessing items deep in the stack becomes more expensive. Specifically, using movup and movdn on items beyond 8 will require 4 cycles vs. current 1 - 3 cycles.

bobbinth · 2022-06-20T04:57:08Z

Currently, the way we access memory in the VM is via LOADW and STOREW operations. These operations work with words (4 field elements). At the assembly level, we also have pop.mem and push.mem instructions which work with individual elements, and are currently emulated using LOADW and STOREW operations (this is quite expensive - I believe these require 10+ VM cycles).

We can introduce two more VM operations which can perform single element read/writes in just one VM cycles. These operations could be: MLOAD and MSTORE.

Stack transition for MLOAD operation could look as follows:

[addr, ...] -> [value, ...]

Where value is the first element in the word located at address specified by addr.

Stack transition for MSTORE operation could look as follows:

[addr, value, ...] -> [value, ...]

The result of this operation would be that the first element of the word at address addr would be updated to value. Note that this is different from the current emulation of this functionality where the entire word is overwritten with [value, 0, 0, 0].

This does add two more operations to VM operations - but I think these might be worth it.

Overcastan · 2022-07-08T20:24:20Z

A full list of planned to be changed u32 operations with underscores is presented below.
The changes are made in a similar way to the approach used in the u64 module of the stdlib.

Addition
- u32checked_add
- u32wrapping_add
- u32overflowing_add
- u32unchecked_add3
- u32checked_add.b
- u32wrapping_add.b
- u32overflowing_add.b
Subtraction
- u32checked_sub
- u32wrapping_sub
- u32overflowing_sub
- u32checked_sub.b
- u32wrapping_sub.b
- u32overflowing_sub.b
Multiplication
- u32checked_mul
- u32wrapping_mul
- u32overflowing_mul
- u32unchecked_madd
- u32checked_mul.b
- u32wrapping_mul.b
- u32overflowing_mul.b
Division
- u32checked_div - integer division, checks if the inputs are u32 values.
- u32unchecked_div - integer division, does not check if the inputs are u32 values.
- u32checked_mod - modulo operation, checks if the inputs are u32 values.
- u32unchecked_mod - modulo operation, does not check if the inputs are u32 values.
- u32checked_divmod - outputs both the quotient and the remainder, checks if the inputs are u32 values.
- u32unchecked_divmod - outputs both the quotient and the remainder, does not check if the inputs are u32 values.
- u32checked_div.b
- u32unchecked_div.b
- u32checked_mod.b
- u32unchecked_mod.b
- u32checked_divmod.b
- u32unchecked_divmod.b
Equality comparison
- u32checked_eq
- u32checked_neq
- u32checked_eq.b
- u32checked_neq.b
Ordered comparisons
- u32checked_lt
- u32checked_lte
- u32checked_gt
- u32checked_gte
- u32checked_min
- u32checked_max
- u32unchecked_lt
- u32unchecked_lte
- u32unchecked_gt
- u32unchecked_gte
- u32unchecked_min
- u32unchecked_max
Bit manipulation
- u32checked_and
- u32checked_or
- u32checked_xor
- u32checked_not
- u32checked_shl - checks if the input is a u32 value.
- u32checked_shr - checks if the input is a u32 value.
- u32checked_rotr - checks if the input is a u32 value.
- u32checked_rotl - checks if the input is a u32 value.
- u32unchecked_shl - does not check if the input is a u32 value.
- u32unchecked_shr - does not check if the input is a u32 value.
- u32unchecked_rotr - does not check if the input is a u32 value.
- u32unchecked_rotl - does not check if the input is a u32 value.
- u32overflowing_shl - keeps the "shifted out" bits in another u32, does not check if the input is a u32 value.
- u32overflowing_shr- keeps the "shifted out" bits in another u32, does not check if the input is a u32 value.
- u32checked_shl.b
- u32checked_shr.b
- u32checked_rotr.b
- u32checked_rotl.b
- u32unchecked_shl.b
- u32unchecked_shr.b
- u32unchecked_rotr.b
- u32unchecked_rotl.b
- u32overflowing_shl.b
- u32overflowing_shr.b

bobbinth · 2022-07-08T20:35:51Z

Thank you! A couple of thoughts:

I don't think we need u32checked_eqz as it is the same as u32checked_eq.0.
We probably need the unchecked version for lt, lte, gt, gte.

bobbinth · 2022-07-08T21:30:26Z

Oh - one other thing: the list is missing the following operations:

u32unchecked_add3.
u32unchecked_madd.

Overcastan · 2022-07-09T07:23:31Z

But aren't the regular (field) operators lt, lte, gt and gte work as unchecked versions of u32checked_lt, u32checked_lte, u32checked_gt and u32checked_gte?

bobbinth · 2022-07-09T07:30:23Z

But aren't the regular (field) operators lt, lte, gt and gte work as unchecked versions of u32checked_lt, u32checked_lte, u32checked_gt and u32checked_gte?

Regular field operations for comparisons are actually much less efficient than u32 versions (e.g. 17 - 18 cycles vs. 5 - 7 cycles). So, it would be rather wasteful to use regular field operations for unchecked versions.

bobbinth · 2022-07-26T02:54:40Z

Closed by #311, #326, #327 and others and remaining elements moved into #329

bobbinth added the assembly Related to Miden assembly label Apr 12, 2022

bobbinth closed this as completed Jul 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assembly instruction rationalization thoughts #171

Assembly instruction rationalization thoughts #171

bobbinth commented Apr 12, 2022 •

edited

grjte commented Apr 21, 2022 •

edited

bobbinth commented May 16, 2022

grjte commented May 16, 2022

bobbinth commented May 23, 2022 •

edited

bobbinth commented Jun 20, 2022

Overcastan commented Jul 8, 2022 •

edited

bobbinth commented Jul 8, 2022

bobbinth commented Jul 8, 2022

Overcastan commented Jul 9, 2022

bobbinth commented Jul 9, 2022

bobbinth commented Jul 26, 2022

Assembly instruction rationalization thoughts #171

Assembly instruction rationalization thoughts #171

Comments

bobbinth commented Apr 12, 2022 • edited

Field operations

u32 operations

Stack manipulation

Input operations

grjte commented Apr 21, 2022 • edited

u32 operations

Stack manipulation & inputs

bobbinth commented May 16, 2022

grjte commented May 16, 2022

bobbinth commented May 23, 2022 • edited

bobbinth commented Jun 20, 2022

Overcastan commented Jul 8, 2022 • edited

bobbinth commented Jul 8, 2022

bobbinth commented Jul 8, 2022

Overcastan commented Jul 9, 2022

bobbinth commented Jul 9, 2022

bobbinth commented Jul 26, 2022

bobbinth commented Apr 12, 2022 •

edited

grjte commented Apr 21, 2022 •

edited

bobbinth commented May 23, 2022 •

edited

Overcastan commented Jul 8, 2022 •

edited