Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assembly instruction rationalization thoughts #171

Closed
bobbinth opened this issue Apr 12, 2022 · 11 comments
Closed

Assembly instruction rationalization thoughts #171

bobbinth opened this issue Apr 12, 2022 · 11 comments
Labels
assembly Related to Miden assembly

Comments

@bobbinth
Copy link
Contributor

bobbinth commented Apr 12, 2022

Having had the benefit of writing some simple (and not so simple) programs in Miden assembly, I think there are some opportunities to simplify (and in some cases augment) current assembly instruction set. My thoughts are below - though, these are rather preliminary and many of them might not be a good idea.

Field operations

All looks good here. There is one slightly inconsistent thing in comparison operations: eq and neq operations allow for immediate parameters, while other comparison operations do not. The primary reason for allowing immediate parameters for eq is that eq.0 gets reduced to EQZ VM operation. And if we introduce eqz assembly instruction we could remove immediate parameters from eq (and neq) operation. But I'm not sure if this is a big deal.

u32 operations

I'd love to adopt the same naming convention as we used in u64 stdlib module. For example, we could do something like this:

  • u32add -> u32add.checked
  • u32add.unsafe -> u32add.overflowing
  • u32add.full could be eliminated

But, in this case, it is not clear how to handle immediate values. For example, u32add.checked.5 looks odd to me. And supplying immediate values to "checked" variants may be desirable because we can check whether the parameter is a valid u32 value at compile time pretty easily.

A few other thoughts in u32 category:

  • We could probably get rid of safe versions of u32addc and u32madd and have just u32addc.unchecked and u32madd.unchecked.
  • We could replace u32addc with u32add3 which would compute a sum of top 3 elements on the stack (and thus would be a more general version of u32addc). To do this, we'd need to think through how the AIR constraints would need to be changed.
  • We could replace u32div.full with u32divmod - this would be consistent with the approach used in u64 module.
  • Depending on what we do with eq and neq, we could also update u32eq and u32neq.

Stack manipulation

One thing that bothers me somewhat here is that some operations have different costs and these costs are not intuitive. For example:

  • dup.7 takes 1 VM cycles, but dup.8 takes 3, and then dup.9 takes 1 VM cycle again. Same goes for movup and movdn.
  • Some swap operations take 2 cycles while others take 4.

Thus, writing efficient code requires deep understanding of costs of each operations.

One solution to this could be to get rid of the operations which take multiple cycles. But this reduces future flexibility and I'm not sure it's worth it. We could also do some selective simplifications. For example:

  • Get rid of swap operations since they are always emulated using other ops anyway.
  • Refactor operations to have a more consistent cost model. For example, general rule could be at accessing the first 8 elements of the stack (e.g., via movup or movdn instructions) always take 1 VM cycle, and then accessing elements 8 - 16 is more expensive (e.g., 3 - 4 VM cycles).

Input operations

Similar to the above, the cost model for these operations requires deep knowledge. E.g., it is not obvious that storew.mem is 4x faster than popw.mem. But short of getting rid of these composite operations, I'm not sure what can be done.

@bobbinth bobbinth added the assembly Related to Miden assembly label Apr 12, 2022
@grjte
Copy link
Collaborator

grjte commented Apr 21, 2022

u32 operations

For example, u32add.checked.5 looks odd to me. And supplying immediate values to "checked" variants may be desirable because we can check whether the parameter is a valid u32 value at compile time pretty easily.

I agree that this looks a bit strange, and that it would be nice to have a naming convention similar to stdlib. One option is to get rid of checked operations altogether and provide a way to check which could be used before any of these ops as desired. In the parsers, most of these binary arithmetic ops go through the same helper function that checks both arguments and then either keeps their order or leaves it reversed (1 cycle cheaper). We could just provide an operation to check 2 values and then have all core operations be unchecked. Having 2 different ways to check of course introduces the same problem discussed with the stack ops etc, so I'm not sure that's a good idea, but having one might be worth thinking about.

Stack manipulation & inputs

Unfortunately, I don't see a great solution here either. I think there are good reasons for having the composite operations.

Additional points:

  • removing composite operations and leaving it to users to handle these manipulations could also lead to inefficiencies, as we have aimed to implement these composite ops as efficiently as possible.
  • refactoring movup and movdn (for example) to be more consistent would just be making some operations slower, which I don't think is worth doing just for consistency (unless you have a different thought about how we'd do this).

I think the best long term option is to just provide good information & warnings, such as showing cycle counts beside operations in an online reference similar to evm.codes or the Miden Assembly Playground from #175

In the immediate term, I think we should make sure the costs are highlighted clearly in the assembly reference, with some visual info such as color coding by cost or highlighting expensive ops, as well as listing recommended alternatives.

@bobbinth
Copy link
Contributor Author

One possible solution for u32 operations is to just use underscores (as we do in stdlib for u64 module). So, for addition, we'd have 3 instructions like so:

Operation Comments
u32checked_add Same as the current u32add
u32overflowing_add Same as the current u32add.unsafe
u32wrapping_add Similar to u32add but wouldn't check for validity of inputs or outputs

The above would be consistent with u64 stdlib module where we have something like u64::checked_add etc.

We can also make it so all 3 variants accept immediate values. For example: u32overflowing_add.123.

@grjte
Copy link
Collaborator

grjte commented May 16, 2022

I think the underscores approach is good 👍

@bobbinth
Copy link
Contributor Author

bobbinth commented May 23, 2022

One of the things I'm inclined to do with stack manipulation operations is as follows:

  • Introduce native SWAPDW operation (with identical semantics to the current swapdw assembly instruction).
  • Remove native operations MOVUP9, MOVUP11, MOVUP13, and MOVUP15, and introduce MOVUP8.
  • Remove native operations MOVDN9, MOVDN11, MOVDN13, and MOVDN15, and introduce MOVDN8.

movup.n and movdn.n operations for n > 8 can be emulated as follows:

For something like movup.10:

swapdw
movup.2
swapdw
movup.8

For something like movdn.11:

movdn.8
swapdw
movdn.3
swapdw

The benefits of doing this are:

  • We reduce native instruction count by 5.
  • We simplify stack constraints quite a bit.

The downsides are:

  • Accessing items deep in the stack becomes more expensive. Specifically, using movup and movdn on items beyond 8 will require 4 cycles vs. current 1 - 3 cycles.

@bobbinth
Copy link
Contributor Author

Currently, the way we access memory in the VM is via LOADW and STOREW operations. These operations work with words (4 field elements). At the assembly level, we also have pop.mem and push.mem instructions which work with individual elements, and are currently emulated using LOADW and STOREW operations (this is quite expensive - I believe these require 10+ VM cycles).

We can introduce two more VM operations which can perform single element read/writes in just one VM cycles. These operations could be: MLOAD and MSTORE.

Stack transition for MLOAD operation could look as follows:

[addr, ...] -> [value, ...]

Where value is the first element in the word located at address specified by addr.

Stack transition for MSTORE operation could look as follows:

[addr, value, ...] -> [value, ...]

The result of this operation would be that the first element of the word at address addr would be updated to value. Note that this is different from the current emulation of this functionality where the entire word is overwritten with [value, 0, 0, 0].

This does add two more operations to VM operations - but I think these might be worth it.

@Overcastan
Copy link
Contributor

Overcastan commented Jul 8, 2022

A full list of planned to be changed u32 operations with underscores is presented below.
The changes are made in a similar way to the approach used in the u64 module of the stdlib.

  • Addition

    • u32checked_add
    • u32wrapping_add
    • u32overflowing_add
    • u32unchecked_add3
    • u32checked_add.b
    • u32wrapping_add.b
    • u32overflowing_add.b
  • Subtraction

    • u32checked_sub
    • u32wrapping_sub
    • u32overflowing_sub
    • u32checked_sub.b
    • u32wrapping_sub.b
    • u32overflowing_sub.b
  • Multiplication

    • u32checked_mul
    • u32wrapping_mul
    • u32overflowing_mul
    • u32unchecked_madd
    • u32checked_mul.b
    • u32wrapping_mul.b
    • u32overflowing_mul.b
  • Division

    • u32checked_div - integer division, checks if the inputs are u32 values.
    • u32unchecked_div - integer division, does not check if the inputs are u32 values.
    • u32checked_mod - modulo operation, checks if the inputs are u32 values.
    • u32unchecked_mod - modulo operation, does not check if the inputs are u32 values.
    • u32checked_divmod - outputs both the quotient and the remainder, checks if the inputs are u32 values.
    • u32unchecked_divmod - outputs both the quotient and the remainder, does not check if the inputs are u32 values.
    • u32checked_div.b
    • u32unchecked_div.b
    • u32checked_mod.b
    • u32unchecked_mod.b
    • u32checked_divmod.b
    • u32unchecked_divmod.b
  • Equality comparison

    • u32checked_eq
    • u32checked_neq
    • u32checked_eq.b
    • u32checked_neq.b
  • Ordered comparisons

    • u32checked_lt
    • u32checked_lte
    • u32checked_gt
    • u32checked_gte
    • u32checked_min
    • u32checked_max
    • u32unchecked_lt
    • u32unchecked_lte
    • u32unchecked_gt
    • u32unchecked_gte
    • u32unchecked_min
    • u32unchecked_max
  • Bit manipulation

    • u32checked_and
    • u32checked_or
    • u32checked_xor
    • u32checked_not
    • u32checked_shl - checks if the input is a u32 value.
    • u32checked_shr - checks if the input is a u32 value.
    • u32checked_rotr - checks if the input is a u32 value.
    • u32checked_rotl - checks if the input is a u32 value.
    • u32unchecked_shl - does not check if the input is a u32 value.
    • u32unchecked_shr - does not check if the input is a u32 value.
    • u32unchecked_rotr - does not check if the input is a u32 value.
    • u32unchecked_rotl - does not check if the input is a u32 value.
    • u32overflowing_shl - keeps the "shifted out" bits in another u32, does not check if the input is a u32 value.
    • u32overflowing_shr- keeps the "shifted out" bits in another u32, does not check if the input is a u32 value.
    • u32checked_shl.b
    • u32checked_shr.b
    • u32checked_rotr.b
    • u32checked_rotl.b
    • u32unchecked_shl.b
    • u32unchecked_shr.b
    • u32unchecked_rotr.b
    • u32unchecked_rotl.b
    • u32overflowing_shl.b
    • u32overflowing_shr.b

@bobbinth
Copy link
Contributor Author

bobbinth commented Jul 8, 2022

Thank you! A couple of thoughts:

  • I don't think we need u32checked_eqz as it is the same as u32checked_eq.0.
  • We probably need the unchecked version for lt, lte, gt, gte.

@bobbinth
Copy link
Contributor Author

bobbinth commented Jul 8, 2022

Oh - one other thing: the list is missing the following operations:

  • u32unchecked_add3.
  • u32unchecked_madd.

@Overcastan
Copy link
Contributor

But aren't the regular (field) operators lt, lte, gt and gte work as unchecked versions of u32checked_lt, u32checked_lte, u32checked_gt and u32checked_gte?

@bobbinth
Copy link
Contributor Author

bobbinth commented Jul 9, 2022

But aren't the regular (field) operators lt, lte, gt and gte work as unchecked versions of u32checked_lt, u32checked_lte, u32checked_gt and u32checked_gte?

Regular field operations for comparisons are actually much less efficient than u32 versions (e.g. 17 - 18 cycles vs. 5 - 7 cycles). So, it would be rather wasteful to use regular field operations for unchecked versions.

@bobbinth
Copy link
Contributor Author

Closed by #311, #326, #327 and others and remaining elements moved into #329

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assembly Related to Miden assembly
Projects
None yet
Development

No branches or pull requests

3 participants