Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

obront - Fault game factory can be manipulated to DOS game type using malicious l2BlockNumber #90

Open
sherlock-admin3 opened this issue Apr 4, 2024 · 32 comments
Labels
Has Duplicates A valid issue with 1+ other issues describing the same vulnerability Medium A valid Medium severity issue Reward A payout will be made for this issue Sponsor Confirmed The sponsor acknowledged this issue is valid Will Fix The sponsor confirmed this issue will be fixed

Comments

@sherlock-admin3
Copy link

sherlock-admin3 commented Apr 4, 2024

obront

medium

Fault game factory can be manipulated to DOS game type using malicious l2BlockNumber

Summary

All new games are proven against the most recent L2 block number in the ANCHOR_STATE_REGISTRY. This includes requiring that the block number we are intending to prove is greater than the latest proven block number in the registry. Due to insufficient validations of the passed L2 block number, it is possible for a user to set the latest block to type(uint256).max, blocking all possible future games from being initialized.

Vulnerability Detail

New games are created for a given root claim and L2 block number using the factory, by cloning the implementation of the specified game type and passing these values as immutable args (where _extraData is the L2 block number).

proxy_ = IDisputeGame(address(impl).clone(abi.encodePacked(_rootClaim, parentHash, _extraData)));
proxy_.initialize{ value: msg.value }();

As a part of the initialize function, we pull the latest confirmed root and rootBlockNumber from the ANCHOR_STATE_REGISTRY. These will be used as the "starting points" for our proof. In order to confirm they are valid starting points, we require that the L2 block number we passed is greater than the last proven root block number.

(Hash root, uint256 rootBlockNumber) = ANCHOR_STATE_REGISTRY.anchors(GAME_TYPE);

// Should only happen if this is a new game type that hasn't been set up yet.
if (root.raw() == bytes32(0)) revert AnchorRootNotFound();

// Set the starting output root.
startingOutputRoot = OutputRoot({ l2BlockNumber: rootBlockNumber, root: root });

// Do not allow the game to be initialized if the root claim corresponds to a block at or before the
// configured starting block number.
if (l2BlockNumber() <= rootBlockNumber) revert UnexpectedRootClaim(rootClaim());

However, the L2 block number we pass does not appear to be sufficiently validated. If we look at the Fault Dispute Game, we can see that disputed L2 block number passed to the oracle is calculated using the _execLeafIdx and does not make any reference to the L2 block number passed via extraData:

uint256 l2Number = startingOutputRoot.l2BlockNumber + disputedPos.traceIndex(SPLIT_DEPTH) + 1;

oracle.loadLocalData(_ident, uuid.raw(), bytes32(l2Number << 0xC0), 8, _partOffset);

This allows us to pass an L2 block number that is disconnected from the proof being provided.

After the claim is resolved, we update the ANCHOR_STATE_REGISTRY to include our new root by calling tryUpdateAnchorState().

function tryUpdateAnchorState() external {
    // Grab the game and game data.
    IFaultDisputeGame game = IFaultDisputeGame(msg.sender);
    (GameType gameType, Claim rootClaim, bytes memory extraData) = game.gameData();

    // Grab the verified address of the game based on the game data.
    // slither-disable-next-line unused-return
    (IDisputeGame factoryRegisteredGame,) =
        DISPUTE_GAME_FACTORY.games({ _gameType: gameType, _rootClaim: rootClaim, _extraData: extraData });

    // Must be a valid game.
    require(
        address(factoryRegisteredGame) == address(game),
        "AnchorStateRegistry: fault dispute game not registered with factory"
    );

    // No need to update anything if the anchor state is already newer.
    if (game.l2BlockNumber() <= anchors[gameType].l2BlockNumber) {
        return;
    }

    // Must be a game that resolved in favor of the state.
    if (game.status() != GameStatus.DEFENDER_WINS) {
        return;
    }

    // Actually update the anchor state.
    anchors[gameType] = OutputRoot({ l2BlockNumber: game.l2BlockNumber(), root: Hash.wrap(game.rootClaim().raw()) });
}

As long as the L2 block number we passed is greater than the last proven one, we update it with our new root. This allows us to set the ANCHOR_STATE_REGISTRY to contain an arbitrarily high blockRootNumber.

If we were to pass type(uint256).max as this value, it would be set in the anchors mapping, and would cause all other games to fail to initialize, because there is no value they could pass for the L2 block number that would be greater, and would therefore fail the check described above.

Proof of Concept

The following test can be dropped into DisputeGameFactory.t.sol to demonstrate the vulnerability:

function testZach_DOSWithMaxBlockNumber(uint256 newBlockNum) public {
    // propose a root with a block number of max uint256
    bytes memory maxBlock = abi.encode(type(uint256).max);
    IDisputeGame game = disputeGameFactory.create(GameType.wrap(0), Claim.wrap(bytes32(uint(1))), maxBlock);
    assertEq(game.l2BlockNumber(), type(uint256).max);

    // when the game passes, it's saved in the anchor registry
    vm.warp(block.timestamp + 4 days);
    game.resolveClaim(0);
    game.resolve();

    // now we can fuzz newly proposed block numbers, and all fail for the same reason
    bytes memory maxBlock2 = abi.encode(newBlockNum);
    vm.expectRevert(abi.encodeWithSelector(UnexpectedRootClaim.selector, 1));
    disputeGameFactory.create(GameType.wrap(0), Claim.wrap(bytes32(uint(1))), maxBlock2);
}

Impact

For no cost, the factory can be DOS'd from creating new games of a given type.

Code Snippet

https://github.com/sherlock-audit/2024-02-optimism-2024/blob/main/optimism/packages/contracts-bedrock/src/dispute/FaultDisputeGame.sol#L528-L539

https://github.com/sherlock-audit/2024-02-optimism-2024/blob/main/optimism/packages/contracts-bedrock/src/dispute/AnchorStateRegistry.sol#L59-L87

Tool used

Manual Review

Recommendation

In order to ensure that ordering does not need to be preserved, ANCHOR_STATE_REGISTRY should store a mapping of claims to booleans. This would allow users to prove against any proven state, instead of being restricted to proving against the latest state, which could be manipulated.

@smartcontracts
Copy link

Factually valid although the impact here isn't different than having any game resolve incorrectly which would poison the AnchorStateRegistry and require the game type to be changed.

@nevillehuang
Copy link
Collaborator

nevillehuang commented Apr 23, 2024

Based on scoping details below, I believe this issue is valid and in-scope of the contest, as the root cause stems from the lack of a sanity check within the dispute game factory allowing large l2BlockNumber to be appended

https://docs.google.com/document/d/1xjvPwAzD2Zxtx8-P6UE69TuoBwtZPbpwf5zBHAvBJBw/edit

The potential to block the entire fault proofs system entirely by preventing further creation of new games is significant, so I believe it warrants high severity given the potential to block withdrawals from an OP bridge. Although the admin can temporarily resolve this by switching game type, I believe it is not a valid solution given the attack can be easily repeated.

@sherlock-admin3 sherlock-admin3 changed the title Noisy Pewter Woodpecker - Fault game factory can be manipulated to DOS game type using malicious l2BlockNumber obront - Fault game factory can be manipulated to DOS game type using malicious l2BlockNumber Apr 23, 2024
@sherlock-admin3 sherlock-admin3 added the Reward A payout will be made for this issue label Apr 23, 2024
@0xjuaan
Copy link

0xjuaan commented Apr 24, 2024

just a thought @nevillehuang:

I believe it is not a valid solution given the attack can be easily repeated

Is this actually true? If they change the game type, the new FaultDisputeGame implementation will be fixed and won't have this vulnerability so the attack can't be repeated. Because of that, the sponsor's comment seems to make the most sense and calling this a high severity issue is quite sus.

@Evert0x
Copy link

Evert0x commented Apr 25, 2024

Forwarding a comment from the protocol team

--

This issue isn't valid because the decoupling of the L2 block number that's determined during output bisection and the one on the root claim is intentional.
They claim that you can propose an output belonging to block n (so, right hash), but for the wrong block number in the future (i.e. n + 1).
The challenger would be able to challenge this, as they would see that the output at the claimed block is wrong (or that the block just doesn't exist)
The program, once ran, can either show:

  • The output at the given block number isn't correct (i.e. the proposed block number is part of the safe chain captured by the data available on L1 at the L1 head hash persisted when the game starts)
  • The given block number cannot be derived with the data available on L1 (i.e. the block number is super far in the future, and doesn't even exist)

Essentially a proposal of this form would be invalidated by the current fault proof system so the bug itself wouldn't be possible

@Haxatron
Copy link

This bug operates under assumption that the FP system can cause a invalid game to be resolved as valid, and there were multiple ways to do this in the contest (see #8). If this can occur then no more dispute games can be created for the same game type which will lead to DoS. Only possible solution is to update game type as pointed out by comments above.

@JeffCX
Copy link

JeffCX commented Apr 26, 2024

In this case, one single invalid game resolution with very large block number DOS the whole game type,

update game type does not seems to be a long term solution, there are not many game type to update.

The fix is still add proper validation for block number or the fix in this report can be used as well

In order to ensure that ordering does not need to be preserved, ANCHOR_STATE_REGISTRY should store a mapping of claims to booleans. This would allow users to prove against any proven state, instead of being restricted to proving against the latest state, which could be manipulated.

@lemonmon1984
Copy link

For no cost, the factory can be DOS'd from creating new games of a given type.

I want to note that the attacker is risking the bonds. They will likely to lose it if any honest party challenges them.

@JeffCX
Copy link

JeffCX commented Apr 30, 2024

Whether the attacker get challenged or not is not in-scope,

the audit and report is under the assumption that the game can be resolved incorrectly

FaultDisputeGame resolution logic is not included in the scope of this contest. Participants should assume that the FaultDisputeGame can resolve incorrectly (i.e.g, can resolve to DEFENDER_WINS when it should resolve to CHALLENGER_WINS or vice versa).

and in case the game resolved incorrectly, massive DOS for game type occurs as outlined in the report.

@zobront
Copy link
Collaborator

zobront commented May 2, 2024

To share my perspective here:

TLDR: This is a difficult case, because the issue should be in scope, but the outcome that it causes is no worse than the manual fixes that would happen when the safeguards work properly.

Severity: As much as I'd like to, I can't see a justification for High. The outcome does not seem bad enough.

Scope: This does seem to meet the definitions laid out in the scope document. The issue is in the in scope contracts, and the outcome (DOS of game type) should be sufficient for a Medium. However, it is a weird dynamic because when safeguards are used, it also causes a DOS of game type, so it seems strange that the same outcome could be a valid issue.

Conclusion: My assessment is that this should remain as a valid Medium, because the contest rules didn't rule out all game type DOS, only those caused by game contract logic. That being said, I recognize this is difficult to judge and respect whatever decision the judge makes.

@JeffCX
Copy link

JeffCX commented May 2, 2024

as the original well-written report highlights

As long as the L2 block number we passed is greater than the last proven one, we update it with our new root. This allows us to set the ANCHOR_STATE_REGISTRY to contain an arbitrarily high blockRootNumber.

If we were to pass type(uint256).max as this value, it would be set in the anchors mapping, and would cause all other games to fail to initialize, because there is no value they could pass for the L2 block number that would be greater, and would therefore fail the check described above.

the impact of DOS game creation and game type means no user can finalize their withdraw transaction / execution l2 -> l1 message, which is a leads to clearly loss of fund and lock of fund as multiple duplicates highlight such as #206

@nevillehuang
Copy link
Collaborator

This issue seems invalid per sponsor comments here.

Any reasons addressing sponsor comments why this should be a valid issue?

@zobront @JeffCX @Haxatron

@Haxatron
Copy link

Haxatron commented May 3, 2024

Hi,

Already given my reasoning above.

I will defer to @zobront and @JeffCX for any additional comments

Acknowledge this one is quite tricky to judge.

@zobront
Copy link
Collaborator

zobront commented May 3, 2024 via email

@JeffCX
Copy link

JeffCX commented May 3, 2024

Forwarding a comment from the protocol team

--

This issue isn't valid because the decoupling of the L2 block number that's determined during output bisection and the one on the root claim is intentional. They claim that you can propose an output belonging to block n (so, right hash), but for the wrong block number in the future (i.e. n + 1). The challenger would be able to challenge this, as they would see that the output at the claimed block is wrong (or that the block just doesn't exist) The program, once ran, can either show:

  • The output at the given block number isn't correct (i.e. the proposed block number is part of the safe chain captured by the data available on L1 at the L1 head hash persisted when the game starts)
  • The given block number cannot be derived with the data available on L1 (i.e. the block number is super far in the future, and doesn't even exist)

Essentially a proposal of this form would be invalidated by the current fault proof system so the bug itself wouldn't be possible

Emm seems like this is saying that the game cannot be resolved incorrectly....

but during judging, we mark the game resolution logic out of scope and use the argument to invalid many issue

It is contradictory to use the argument "incorrect game resolution out of scope" to invalid many other issue.

while use the argument "game cannot be resolved incorrectly" to invalid this issue.

the comments strongly contradicts the readme as well:

from read me

Participants should assume that the FaultDisputeGame can resolve incorrectly (i.e.g, can resolve to DEFENDER_WINS when it should resolve to CHALLENGER_WINS or vice versa).

the report is perfect derivation from the statement above without worrying about the game dispute logic...

Participants should assume that the FaultDisputeGame can resolve incorrectly (i.e.g, can resolve to DEFENDER_WINS when it should resolve to CHALLENGER_WINS or vice versa).

then I think the original judging decision still stands.

@guhu95
Copy link

guhu95 commented May 5, 2024

I actually don't understand why the sponsor's claim is correct. If it does make sense to anyone else, can someone please explain?

What I understand they're saying is that the game can't resolve correctly in this way. But I don't understand how that is possible if the extraData's l2Blocknumber is never actually used by the proof system? The only user of that value is AnchorStateRegistry and it doesn't validate it.

There is no check I can see that checks that extraData's l2Blocknumber is actually in the rootClaim in any way.

From the point of view of the proof system, the block number it is using is unrelated to the one later being passed to AnchorStateRegistry.

If so, isn't it the case that anyone can always frontrun the legitimate proposals and use the legitimate data, but pass in type(uint).max as l2BlockNumber? Isn't that a perpetual DoS of the system?

What am I missing here?

@Haxatron
Copy link

Haxatron commented May 5, 2024 via email

@guhu95
Copy link

guhu95 commented May 5, 2024

@Haxatron here?. It doesn't seem to be using the l2number value from extraData here (or anywhere else in the game itself).

It does use the l1head() from extraData above but not the L2 number.

@Haxatron
Copy link

Haxatron commented May 5, 2024

Apologies, you are correct, the L2 block number referenced on that line is the anchor root block number rather than the l2 block number passed via the extraData. Perhaps, this requires more clarification from protocol team...

@guhu95
Copy link

guhu95 commented May 7, 2024

@Evert0x @nevillehuang it looks like the sponsor's argument for this being invalid is not well understood. The finding is also marked "won't fix", so if it is valid, it is not mitigated. It would appear that a mitigation would be needed at both game level and at the registry level, as fixing one without the other would leave the other vulnerable.

Can @smartcontracts maybe have another look at this, and possibly address the questions in #90 (comment)?

@Evert0x
Copy link

Evert0x commented May 7, 2024

After a discussion with the protocol it's clear that this issue should be a valid Medium

@Evert0x Evert0x added Medium A valid Medium severity issue and removed High A valid High severity issue labels May 7, 2024
@trust1995
Copy link
Collaborator

So permissionless shutdown of withdrawals/messaging until redeploy (Freeze of Funds of 2 weeks) is considered a Medium on Sherlock? Was the magnitude of effect this would have on the Optimism ecosystem considered?

A 1-hour shutdown of Blast was the most talked about incident for months, how would a 2-week FoF be interpreted?

@guhu95
Copy link

guhu95 commented May 8, 2024

In addition to the prolonged DoS, the DoS appears to be repeatable, and require changes to the registry and portal, and not just updating the game.

In order for the DoS to NOT be repeatable, the fix must be possible at the game level, such that updating the game type is sufficient. But it doesn't appear to be the case:

Please see these (new) fix PRs in OP dealing with this issue: https://github.com/ethereum-optimism/optimism/pull/10431/files, https://github.com/ethereum-optimism/optimism/pull/10434/files.

They add extensive changes to both the ASR and the Portal to deal with the l2BlockNumber issue. The changes to the game are minimal, and it appears that this issue is NOT fixable by just updating the game implementation.

[I suspect this is because in the game, the l2blocknumber is PART of disputed L2 output state, so is not a mutually agreed on external input, unlike the L1 number, which comes directly from the L1 block.number itself. The dispute is about the rootClaim output state, not about the input l2blocknumber and something about the proving setup prevents it from being used this way (or from the game being able to distinguish which one of these inputs is incorrect). But this is just my limited and possibly wrong understanding]

Summary: because the issue is not fixable by updating ONLY the game, and an upgrade of the ASR and Portal are needed, the safety measures are inadequate and the DoS WITHOUT the full fix is repeatable.

@spearfish5609
Copy link

most talked about incident for months

this incident has not even happened 2 months ago and people cared about it for a few days at most

@nirohgo
Copy link

nirohgo commented May 9, 2024

After a discussion with the protocol it's clear that this issue should be a valid Medium

@Evert0x can you elaborate why? (for those of us who weren't on that discussion)

@Evert0x
Copy link

Evert0x commented May 13, 2024

The justification for Medium severity is as follows

The Proxy Admin Owner is a TRUSTED role that can:

  • Upgrade all smart contracts that sit behind a Proxy contract.
  • Set the implementation contract for any dispute game type within the DisputeGameFactory.
  • Modify the initial bond cost for any dispute game type within the DisputeGameFactory.
  • Remove ETH from the DelayedWETH contract.

The Proxy Admin Owner is assumed to be honest and responsive with an SLA of 72 hours.

As stated in the README, part of the security model is a honest and responsive admin that can recover from a DoS within 72 hours.

Are there any off-chain mechanisms or off-chain procedures for the protocol (keeper bots, arbitrage bots, etc.)?

Off-chain mechanisms exist as part of the system but are not in scope for this competition. Assume that comprehensive monitoring exists that will detect most obviously detectable malicious activity.

The supplied block being higher than the actual block number in the EVM is obviously detectable malicious activity.

In conclusion, once the proposal is detected, the Proxy Admin Owner is trusted to be responsive within 72 hours and is able to switch the game type to a permissioned implementation within a new AnchorStateRegistry to mitigate the DoS.

Note: It’s important to note that there’s a difference between switching the game type (which invalidates all withdrawals with that game type) and switching the implementation (which does not invalidate withdrawals).

@guhu95
Copy link

guhu95 commented May 14, 2024

@Evert0x but can switching to a permissioned implementation be considered final "recovery"?

If assumed permanent - it permanently breaks core functionality (no fraud proofs from that point).

If assumed temporary - it only postpones the switch of the game type and the withdrawals DoS.

@trust1995
Copy link
Collaborator

trust1995 commented May 14, 2024

This downplayed take is plagued with intellectual dishonesty.

The supplied block being higher than the actual block number in the EVM is obviously detectable malicious activity.

If it was obvious as something to look for, Opt would have validated the l2BlockNumber is the same as the VM block number. Detection is highly unprobable. Please provide the defender off-chain code to show awareness of this vulnerability. Once again the benefit of the doubt is given to an opaque statement by the sponsor and against honest Watsons. For fairness of discussion, it must be assumed Opt is aware only at the moment games cannot be created.

able to switch the game type to a permissioned implementation within a new AnchorStateRegistry to mitigate the DoS.

For the past month where Optimism had access to the repo, their suggested fix was moving to a new game type, confirming the 2 week DOS. Only couple of days ago came the idea of overriding the same game type to avoid the DOS. Using this to reduce severity is unacceptable. It essentially extended their 3 day SLA to 1 month, letting them theorize over best response over a tremendously long time and then argue the optimized response would be what they would be rolling with on day 1. Clear intellectual dishonesty.

Additionally, from an air-gap perspective (up to High according to the README), the resolution and updating of the anchor state registry is instantaneous, making new withdrawals impossible from day 0 and bypassing intended airgaps.
A combination of FoF impact (High impact) + airgap bypass (High focus of contest) + permissionless attacker (High likelihood) makes it verry clear high severity is in order.

I will also state that over the past week Optimism has catapulted a variety of arguments against the submission which were technically proven wrong, showing they have no problem misrepresenting an issue or its characteristics in order to reduce its severity.

@spearfish5609
Copy link

I dont know why the optimism team is trying to find some weird loopholes to argue for downgrade if they could just use official sherlock docs to justify it:

according to https://docs.sherlock.xyz/audits/judging/judging#v.-how-to-identify-a-medium-issue :
Breaks core contract functionality, rendering the contract useless or leading to loss of funds. is a medium

about DOS: https://docs.sherlock.xyz/audits/judging/judging#iii.-sherlocks-standards
requires both

  1. The issue causes locking of funds for users for more than a week
  2. The issue impacts the availability of time-sensitive functions

to be high severity

1 is true and 2 is questionable if we assume that admin deploys a new game type in time so funds can be recovered and users can just make new game instance, where these functions are available again.

I dont see any air-gap bypass unless we use different definitions. My understanding is that the air-gap is the delay before withdraw of funds can happen and its not possible for users to withdraw early

@Evert0x
Copy link

Evert0x commented May 14, 2024

@guhu95 It's not a final recovery, but safety mechanisms are put in place first to mitigate the DoS and, secondly, to remove the DoS factor.

@Evert0x
Copy link

Evert0x commented May 14, 2024

@trust1995 Forwarding from the protocol team the detection code for this case.


So in a nutshell the monitoring service is here: https://github.com/ethereum-optimism/optimism/blob/5137f3b74c6ebcac4f0f5a118b0f4909df03aec6/op-dispute-mon/mon/monitor.go#L87

This service calls out to a forecasting function which checks the L2 block number and the claim provided against the real output root for that block number: https://github.com/ethereum-optimism/optimism/blob/5137f3b74c6ebcac4f0f5a118b0f4909df03aec6/op-dispute-mon/mon/forecast.go#L69

Claimed L2 block number and output are pulled from the game’s metadata: https://github.com/ethereum-optimism/optimism/blob/5137f3b74c6ebcac4f0f5a118b0f4909df03aec6/op-dispute-mon/mon/extract/extractor.go#L54

So in the case of that bug, the service would try to get the block number for the future block that doesn’t exist yet, get the following error, disagree, and raise an alert: https://github.com/ethereum-optimism/optimism/blob/5137f3b74c6ebcac4f0f5a118b0f4909df03aec6/op-dispute-mon/mon/validator.go#L40

@Evert0x
Copy link

Evert0x commented May 14, 2024

@spearfish5609 I don't think I'm using weird loopholes to decide on the severity of this issue.

It's not always clear if the DoS should be judged as indefinite just because the admin can recover from it. However, in this case, the language in the README makes it clear.

@sherlock-audit sherlock-audit deleted a comment from guhu95 May 15, 2024
@sherlock-admin2
Copy link
Contributor

The protocol team fixed this issue in the following PRs/commits:
ethereum-optimism/optimism#10438

@sherlock-admin3 sherlock-admin3 added Will Fix The sponsor confirmed this issue will be fixed and removed Won't Fix The sponsor confirmed this issue will not be fixed labels May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Has Duplicates A valid issue with 1+ other issues describing the same vulnerability Medium A valid Medium severity issue Reward A payout will be made for this issue Sponsor Confirmed The sponsor acknowledged this issue is valid Will Fix The sponsor confirmed this issue will be fixed
Projects
None yet
Development

No branches or pull requests