Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak in latest foundry #8014

Open
2 tasks done
0xTimepunk opened this issue May 30, 2024 · 20 comments · Fixed by #8034
Open
2 tasks done

Memory leak in latest foundry #8014

0xTimepunk opened this issue May 30, 2024 · 20 comments · Fixed by #8034
Labels
T-bug Type: bug

Comments

@0xTimepunk
Copy link

0xTimepunk commented May 30, 2024

Component

Forge

Have you ensured that all of these are up to date?

  • Foundry
  • Foundryup

What version of Foundry are you on?

forge 0.2.0 (5494c33 2024-05-30T00:18:10.251370000Z)

What command(s) is the bug in?

forge test (make test)

Operating System

macOS (Apple Silicon)

Describe the bug

With the following PR as an example superform-xyz/superform-core#541

To reproduce

Part 1

  • Do foundryup (latest foundry)
  • forge install
  • Override RPCs in make file and run make ftest
  • Open Mac activity monitor and check memory usage. This results in a oom:
    telegram-cloud-document-4-5960961092661285381

code 137 in https://github.com/superform-xyz/superform-core/actions/runs/9289996043/job/25566018918

Please note that this is a powerful dedicated Github actions runner, so the memory limits shouldn't actually be reached

Part 2

  • Do foundryup -v nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970
  • forge install
  • Override RPCs in make file and run make ftest
  • Open Mac activity monitor and check memory usage
    telegram-cloud-photo-size-4-5963231644758425826-y

A whopping difference of 36gb.

@0xTimepunk 0xTimepunk added the T-bug Type: bug label May 30, 2024
@mattsse
Copy link
Member

mattsse commented May 30, 2024

@grandizzy I wonder if this is fuzzing related

@0xTimepunk were you able to narrow this down to a specific test?

@grandizzy
Copy link
Collaborator

grandizzy commented May 30, 2024

@grandizzy I wonder if this is fuzzing related

@0xTimepunk were you able to narrow this down to a specific test?

Invariants are excluded from project tests, so they're not the culprit. Going to check for regular fuzz tests, the build that does not reproduce it is the one right before cancun update (project is set to paris) but don't have any evidence this causes it.

@0xTimepunk
Copy link
Author

0xTimepunk commented May 30, 2024

I'm not sure if the nightly I posted is the last well operating nightly as I haven't tested. Memory usage increase seems independent of test type, however we do have fuzzying in some scenario files which are part of make ftest or make coverage

@grandizzy
Copy link
Collaborator

grandizzy commented May 30, 2024

I'm not sure if the nightly I posted is the last well operating nightly as I haven't tested. Memory usage increase seems independent of test type, however we do have fuzzying in some scenario files which are part of make ftest or make coverage

@0xTimepunk I was under the impression you're excluding invariant tests when you hit such (here's the CI failure I was pointing to check https://github.com/superform-xyz/superform-core/actions/runs/9289996043/job/25566018918 ), but looking at make ftest target it is running all of them, is there same issue when running make test-ci ? I am yet able to try it locally as the project requires several chain rpc urls which I don't have handy
The failure in CI action above relates to coverage only which has script = 'src' in FOUNDRY_PROFILE=coverage, is this desired?
Comparing with a success run (like https://github.com/superform-xyz/superform-core/actions/runs/9302286062/job/25603031010), in the failed one (https://github.com/superform-xyz/superform-core/actions/runs/9289996043/job/25566018918 ) there's no [FAIL. Reason: setup failed: Could not instantiate forked environment with fork url: ***] setUp() (gas: 0) entry while in the one that completed OK there are several such, which can indicate that the successful runs are not really executing the tests, hence passing

@0xTimepunk
Copy link
Author

0xTimepunk commented May 30, 2024

So, to give extra context. I thought using foundryup -v nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970 fixed things, but the memory leak is still there apparently

I think this is independent of invariant tests, because it happens also with make test-ci and make coverage, both exclude invariant tests.

I think it may have to do with fuzzing. Please note that our fuzz tests involve running actions repeatedly between 2 or more forks with different input parameters. The scenario / assertions are quite large ,but this didn't happen before.


on make ftest that was just an example command as this is also happening for make test-ci and make coverage


The failure in CI action above relates to coverage only which has script = 'src' in FOUNDRY_PROFILE=coverage, is this desired?

This is desired. This forces script folder not to be built as it's not needed for coverage run

@grandizzy

@0xTimepunk
Copy link
Author

0xTimepunk commented May 30, 2024

image I notice 1gb is added on each fuzz test we run (like, when the line appears as [PASS], I see 1gb +- added to the `forge` process in mac's activity monitor), when it should be stable. This is on `make test-ci` which has no invariant calls

@0xTimepunk
Copy link
Author

0xTimepunk commented May 30, 2024

@mattsse @grandizzy (also please read conclusions at the bottom)

Scenario 1 - the worst (with all latest versions)

Settings

On forge-std @52715a2 (latest)

latest foundry (foundryup)

Note: this checks out to a commit with latest forge-std and removed import of ds-test

git checkout 0769ff1 && make test-ci

image

Results: Forge hits around 11gb. More has been observed

git checkout 0769ff1 && make coverage

image

Results: Forge hits around +32gb. More has been observed Computer starts to freeze, 137 code sometimes both on local / ci

Scenario 2 - very bad

Settings

On forge-std @52715a2 (latest)

nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970

Note: this checks out to a commit with latest forge-std and removed import of ds-test

git checkout 0769ff1 && forge install && make test-ci

image

Results: Forge hits around 8gb usage. More has been observed in other runs which were not snapshoted.

git checkout 0769ff1 && forge install && make coverage

image

Results: Forge hits around 21gb of usage. More has been observed in other runs which were not snapshoted.

Scenario 3 - not possible

Settings

On forge-std @52715a2 (latest)

nightly-f625d0fa7c51e65b4bf1e8f7931cd1c6e2e285e9

Note: this checks out to a commit with latest forge-std and removed import of ds-test

git checkout 0769ff1 && forge install && make test-ci

Crashes (likely too old foundry version for most recent forge-std), so skipping analysis here.

image

Scenario 4 - not possible

Settings

On forge-std@2f11269 (6 months old, the one we had previously)

AND Using ds-test@e282159

latest foundry (foundryup)

Note: this checks out to a commit which has our previous forge-std version, the one from 6 months ago, together with ds-test, both still operational

git checkout e9826a51 && forge install && make test-ci

Crashes. Likely new forge version did something that made it stop working with forge-std (unsure why should this happen). Skipping test.

image

Scenario 5 - somewhat healthy ram usage

On forge-std@2f11269 (6 months old, the one we had previously)

AND Using ds-test@e282159

nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970

Note: this checks out to a commit which has our previous forge-std version, the one from 6 months ago, together with ds-test, both still operational

git checkout e9826a51 && forge install && make test-ci

image

Results: 2gb. looks healthy

git checkout e9826a51 && forge install && make test-coverage

image

Results: 15gb. Worse than scenario 6, 3 times more gb usage, definitely something started to happen with forge between version nightly-f625d0fa7c51e65b4bf1e8f7931cd1c6e2e285e9 (check below) and nightly-cafc2606a2187a42b236df4aa65f4e8cdfcea970(this one). Still better than with new forge-std in scenario 2.

Scenario 6 - the healthiest RAM usage

Settings

On forge-std@2f11269 (6 months old, the one we had previously)

AND Using ds-test@e282159

nightly-f625d0fa7c51e65b4bf1e8f7931cd1c6e2e285e9

Note: this checks out to a commit which has our previous forge-std version, the one from 6 months ago, together with ds-test, both still operational

git checkout e9826a51 && forge install && make test-ci

image

Results: Forge hits around 1.79gb usage

git checkout e9826a51 && forge install && make coverage

image

Results: Forge hits around 5gb usage


Conclusions

  • Pinning forge-std to an old commit together with using ds-test, plus reverting to an older foundry version is the best setup to stop the memory leak and continue operating with foundry for us for now as a temporary measure. The best scenario is 6.
  • Old forge-std versions work well up until nightly cafc260 forge. In newest forge it crashes
  • New forge-std version without ds-test import works well back to cafc260. In older forge like f625d0f it crashes
  • We believe new forge version should keep on supporting projects that wish to use older forge-std together with ds-test and not introduce crashes.
    - We only decided to update forge-std to the latest version because forge started crashing with new forge (basically scenario 4 and because we were used to update forge with foundryup regularly). Then we were forced to start taking the memory leak which is impeding our operations.

@0xTimepunk
Copy link
Author

0xTimepunk commented May 30, 2024

This issue is extra tricky to solve in a temporary fix because using latest forge allowed to overcome the problem (totally unrelated) in #7852 (which was fixed this month), and by having to use an old one we'll probably need to maintain two forge versions to continue operating. (Or perhaps try to use a87faf6 associated nightly which has the fix and the memory leak is not so bad, together with old forge-std + ds-test)

Edit: the nightly associated a87faf6 has the memory leak, so we will operate with two foundry versions as needed for now

@0xTimepunk
Copy link
Author

0xTimepunk commented May 30, 2024

Update:

While this works on our main/develop branches (scenario 6, forge-std 1.7.3 and forge f625d0f), this breaks something in an upstream branch where tests on mainnet are pinned to a more recent block compared to main/develop. In this block a USDC upgrade was already made, which only works with forge-std 1.8.0+ (see release notes of https://github.com/foundry-rs/forge-std/releases/tag/v1.8.0 and foundry-rs/forge-std#505)

Some testing

Testing forge-std 1.8.0 with f625d0f does not work due to the crash in scenario 3 above.

Then I upgraded again foundry to cafc260.

Testing once again (1.8.0 + cafc260) to see if the memory leak is there by running

image

Results: 13gb, the memory leak is back

Conclusion: the memory leak seems to be definitely have been brought up by forge-std in versions >=1.8.0 crossover with forge.

for awareness @mattsse @grandizzy and tagging @mds1

There is an urgency in solving this situation as this is almost completely halting our development. We need more advice how we can short term fix this to put down the memory leak and have the USDC fix brought up by 1.8.0 in place (is there a commit we can cherry pick for this)

@grandizzy
Copy link
Collaborator

grandizzy commented May 30, 2024

@0xTimepunk running with trace enabled I can see mem grows when test forks chains (there are many tests with many chains forked running at the same time so eventually OS kicks in and kills the task). What worked for me locally (and I think you can apply it for now as an workaround too to get you going) is to split the scenarios test in several steps, as in

test-scenarios :; forge test --match-path "test/fuzz/scenarios/scenarios-deposit-multiDst*/*.sol" --no-match-path "test/invariant/**/*.sol" \
&& forge test --match-path "test/fuzz/scenarios/scenarios-deposit-singleDirect*/*.sol" --no-match-path "test/invariant/**/*.sol" ...

so mem doesn't ever grow up to the point where process needs to be killed (this also helped me not to get rate limited by infura)

Looking more into this issue but I think you can use such for now?

@0xTimepunk
Copy link
Author

@0xTimepunk running with trace enabled I can see mem grows when test forks chains (there are many tests with many chains forked running at the same time so eventually OS kicks in and kills the task). What worked for me locally (and I think you can apply it for now as an workaround too to get you going) is to split the scenarios test in several steps, as in


test-scenarios :; forge test --match-path "test/fuzz/scenarios/scenarios-deposit-multiDst*/*.sol" --no-match-path "test/invariant/**/*.sol" \

&& forge test --match-path "test/fuzz/scenarios/scenarios-deposit-singleDirect*/*.sol" --no-match-path "test/invariant/**/*.sol" ...

so mem doesn't ever grow up to the point where process needs to be killed (this also helped me not to get rate limited by infura)

Looking more into this issue but I think you can use such for now?

Hey @grandizzy it's super cumbersome to run a large test suite like ours like that suggested way unfortunately.
Thinking on using pre 1.8.0 forge-std plus a temporary fix to avoid dealing USDC

is there any further help we can give on this problem?

Thanks

@grandizzy
Copy link
Collaborator

yeah, I'm actively looking into this, just provided some thoughts while debugging

@0xTimepunk
Copy link
Author

0xTimepunk commented Jun 21, 2024

@mattsse @grandizzy requesting to re-open the issue as something got merged around 2 days ago or more into foundry that made the issue appear once again

Ci exits once again with error 137 (out of memory)

Running make coverage on superform-xyz/superform-core#516 yield 40gb of ram

telegram-cloud-photo-size-4-6030658894684078297-y

This is blocking our test/coverage checks mainly on CI (because it auto-kills with 137) and impeding progress

@mattsse mattsse reopened this Jun 21, 2024
@mattsse
Copy link
Member

mattsse commented Jun 21, 2024

@klkvr @grandizzy does this ring a bell?

@0xTimepunk
Copy link
Author

0xTimepunk commented Jun 21, 2024

forge 0.2.0 (fb86e5d 2024-06-05T00:17:33.761343000Z) was one of the first versions which contained the initial fix to the memory issue which was working. However, given that the nightlies are removed after 3 days we can no longer access this nightly in the CI.

You can probably use this version to diff check against current one to see what changed

@grandizzy
Copy link
Collaborator

grandizzy commented Jun 21, 2024

@klkvr @grandizzy does this ring a bell?

nothing I could think at but will do some tests shortly and try figure out if something changed to affect this. @0xTimepunk to rule out the possibility of your PR 516 exposing a new issue (as it adds new code and tests), could you please check if you see same coverage problem running latest forge against your master (or v1) branch?

(Also, if I correctly understand the issue now is with forge coverage, initially the failure described in this issue was with forge test, so pretty sure that's not caused by the same)

@grandizzy
Copy link
Collaborator

I am not able to locally spot any difference between June 5 forge build mentioned as working OK (fb86e5d) and latest (fb86e5d) - when running make coverage on superform-xyz/superform-core@68320ff commit both runs cap around 25G of res mem and completes succcessfully

same when testing v1 branch with both forge versions so I am pretty sure there's no regression. @0xTimepunk maybe there's an issue with the RPC URLs used by your CI that could somehow slow down tests / make mem accumulate?

@Inphi
Copy link
Contributor

Inphi commented Jun 28, 2024

I'm seeing a variant of this when running forge test --debug. Though happy to create a separate issue if this is something else. Here's a small reproducer:

Foundry Version

forge 0.2.0 (c4a984f 2024-06-28T00:53:16.491646468Z)

Reproducer

// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.13;

import {Test, console} from "forge-std/Test.sol";
import {Counter} from "../src/Counter.sol";

contract CounterTest is Test {
    Counter public counter;
    address addr = address(uint160(uint256(keccak256("ekans"))));

    function setUp() public {
        bytes memory buffer = new bytes(2000000);
        vm.etch(address(addr), buffer);

        counter = new Counter();
        counter.setNumber(0);
    }

    function test_Increment() public {
        counter.increment();
        assertEq(counter.number(), 1);
    }

    function testFuzz_SetNumber(uint256 x) public {
        counter.setNumber(x);
        assertEq(counter.number(), x);
    }
}

Running forge test --debug test_Increment --mc CounterTest consumes over 32 GB on my machine before the kernel kills it.
I think the critical part is this line here:

vm.etch(address(addr), buffer)

Etching a really large runtime code seems to trigger the issue. I use an empty buffer for illustrative purposes, but the run still consumes alot of memory even when a valid runtime bytecode is used. Reducing the size of the runtime code seems to work, with minimal memory usage. I'm sure there's some buffer size threshold where memory usage spikes, but I haven't narrowed it down.

@grandizzy
Copy link
Collaborator

@Inphi yes, please open a new issue for etch, going to use this one to optimize coverage report creation. Thank you.

@Inphi
Copy link
Contributor

Inphi commented Jun 28, 2024

@Inphi yes, please open a new issue for etch, going to use this one to optimize coverage report creation. Thank you.

Created an issue here.

Seems related to this issue based on this comment. I imagine that the Debugger inspector is also used to track coverage which would explain the memory blowup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-bug Type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants