Make sure x0 is not modified in mops #403

mohanson · 2024-02-23T00:59:30Z

No description provided.

xxuejie · 2024-02-23T01:15:11Z

src/machine/asm/execute_aarch64.S

@@ -1875,6 +1875,7 @@ ckb_vm_x64_execute:
  adc TEMP1, TEMP1, TEMP1
  WRITE_RD(TEMP3)
  WRITE_RS3(TEMP1)
+  str ZERO_VALUE, ZERO_ADDRESS


I'm not sure about this solution, here we are basically introducing 2 patterns:

WRITE_RD clears x0 itself

For opcode that involves WRITE_RS3(and possibly WRITE_RS1, WRITE_RS2), a manual line to clear x0 is used here

That essentially mean we will have 2 ways to do the same thing, which is hard to enforce. For example: the impl for SBB, which is immediately above this code here, also has a WRITE_RS3(TEMP1), but we are not clearing the x0 value. Yes for now TEMP1 is not x0 so we don't need to do it here. But going into the future, this knowledge can be easily ignored, leading to more bugs.

Personally, I see 2 ways of fixing the code here:

WRITE_RS1, WRITE_RS2, WRITE_RS3, and WRITE_RD should all clear x0 at the end

We can move the line to clear x0 to NEXT_INST macro, which will ensure all writes to x0 gets cleared when an opcode finishes processing

Or we can distinguish between NEXT_INST, which does not clear x0, and could be used for insts like SD, and NEXT_INST_COMPLETING_REG_WRITES, which clears x0, and could be used for insts that writes to at least one register

I'll talk about the benefits of doing this. Because I want to limit the scope of code changes to the 5 MOP instructions introduced in VERSION2, so we can ensure that this won't introduce new bugs in VERSION0 and VERSION1.

You mentioned 3 possible solutions, the third one has too broad of a scope, I consider it more of a refactoring than a bug fix. Among the first two, I prefer the first one, as it aligns with the handling in the Rust version of CKB-VM https://github.com/nervosnetwork/ckb-vm/blob/develop/src/instructions/utils.rs#L77-L89.

I will rewrite the code according to the first solution. What do you think?

I'll talk about the benefits of doing this. Because I want to limit the scope of code changes to the 5 MOP instructions introduced in VERSION2, so we can ensure that this won't introduce new bugs in VERSION0 and VERSION1.

There is actually a second reason why I preferred a more complete solution: it's not just the 5 new MOP instructions, correct me if I'm wrong, I think WRITE_RS3 is already used for MOP instructions in VERSION1 at least. That brings a different question: does a similar bug exists in VERSION1 as well? The Rust interpreter, due to its use of utils::update_register utility, is free from this bug, should be correct. But can be be sure VERSION1's code is free from this bug? This is something I would suggest we go back and give it a check.

I do remember back in the days, when we first discovered bugs in CKB-VM, we introduced a version-based design, and changed the code to replicate the exact same bug in the Rust interpreter to behave the same as the assembly interpreter. Here I think we should do the same: if VERSION1 has the same bug, we should replicate the exact same behavior in the Rust interpreter of VERSION1, and really fix the bug in VERSION2.

So if you really ask my opinion, I would do a more comprehensive check and go with solution 3. It's a more future proof reason and also can deliver slightly better performance. However if you want to go with solution 1 I'm okay but still recommend checking if existing MOP instructions are affected by similar bug

Solution 3 is best applied to the develop branch first; do it on the release branch to fix this bug is too risky. That's my main concern. So, I'll use solution 1 as a short-term solution to this problem. In the long run, I agree with you; we need to carefully reassess this logic.

It's really the same thing to me: you could do solution 3, but only use NEXT_INST_COMPLETING_REG_WRITES on newer mops introduced in version 2. We should keep version 1's current behavior anyway.

By using solution 1, we are essentially changing the semantics of WRITE_RS3, etc. I can raise the same question: code in version 1 also uses WRITE_RS3, by changing the underlying code behind WRITE_RS3, are you essentially changing the behavior of version 1's implementation?

Still, the same additional question arises: is version 1 truly unaffected by the same bug?

The current assembly implementations for x86 and aarch64 are already semantically inconsistent:

https://github.com/nervosnetwork/ckb-vm/blob/develop/src/machine/asm/execute_x64.S#L341-L348

https://github.com/nervosnetwork/ckb-vm/blob/develop/src/machine/asm/execute_aarch64.S#L78-L88

...

And that means we already have a bug, to me a proper way of doing this is:

Recognizing the current x64 assembly implementation as the canonical one, even though there might be bugs in it, consensus defines that this implementation is the de-facto one

Changing the Rust & aarch64 implementation to replicate the exact same behavior(tho buggy it is) as the x64 one in version 1

In version 2, we fix the bug in all 3 implementations

* Make sure x0 is not modified in mops * Ensure aarch64 behavior is consistent with x86 * Refactor * Quick fix previous commit

Make sure x0 is not modified in mops

5970384

mohanson requested a review from xxuejie February 23, 2024 01:06

xxuejie reviewed Feb 23, 2024

View reviewed changes

mohanson added 3 commits February 23, 2024 10:37

Ensure aarch64 behavior is consistent with x86

edd4461

Refactor

fbbe26f

Quick fix previous commit

6b41a7e

xxuejie approved these changes Feb 23, 2024

View reviewed changes

mohanson merged commit b7da390 into nervosnetwork:develop Feb 23, 2024
11 checks passed

mohanson deleted the fix_mop branch February 23, 2024 05:36

mohanson added a commit to libraries/ckb-vm that referenced this pull request Feb 23, 2024

Make sure x0 is not modified in mops (nervosnetwork#403)

0a0b045

* Make sure x0 is not modified in mops * Ensure aarch64 behavior is consistent with x86 * Refactor * Quick fix previous commit

mohanson added a commit to libraries/ckb-vm that referenced this pull request Feb 23, 2024

Make sure x0 is not modified in mops (nervosnetwork#403)

d2c9b85

* Make sure x0 is not modified in mops * Ensure aarch64 behavior is consistent with x86 * Refactor * Quick fix previous commit

mohanson added a commit that referenced this pull request Feb 23, 2024

Backport #403

2f5dbc3

* Make sure x0 is not modified in mops * Ensure aarch64 behavior is consistent with x86 * Refactor * Quick fix previous commit

mohanson added a commit that referenced this pull request Feb 26, 2024

Backport #403

2046815

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make sure x0 is not modified in mops #403

Make sure x0 is not modified in mops #403

mohanson commented Feb 23, 2024

xxuejie Feb 23, 2024

xxuejie Feb 23, 2024

mohanson Feb 23, 2024

mohanson Feb 23, 2024

xxuejie Feb 23, 2024

mohanson Feb 23, 2024 •

edited

xxuejie Feb 23, 2024

mohanson Feb 23, 2024

xxuejie Feb 23, 2024

Make sure x0 is not modified in mops #403

Make sure x0 is not modified in mops #403

Conversation

mohanson commented Feb 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mohanson Feb 23, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mohanson Feb 23, 2024 •

edited