
[bug]: Failure to run TestTaprootAssetsDaemon/multi_signature_on_all_levels #958

Open
Nsandomeno opened this issue Jun 19, 2024 · 12 comments
Labels: bug (Something isn't working), needs triage


@Nsandomeno

Nsandomeno commented Jun 19, 2024

Background

I'm failing to debug through TestTaprootAssetsDaemon/multi_signature_on_all_levels, yet I see broad passes when debugging the whole TestTaprootAssetsDaemon suite with the same debugging configuration, and also when running make itest/make itest-only. Despite those passes, the issue I describe below does occur on other tests too, at least on TestTaprootAssetsDaemon/mint_fund_seal_assets.

The logs from all scenarios point to something wrong with the HTLC interceptor. I make sure routerrpc is present in the case where I am using VS Code's debugger. Another suspect is a context deadline being exceeded, but I'm not sure why that would be happening.

The error logs from running with the debug configuration are in the attached file; they align with the snippets below from other itest runs.
TaprootAssetsDaemon_multi_signature_on_all_levels_debug_logs.txt

The error logs when running with make itest or make build-itest look like this:

=== NAME  TestTaprootAssetsDaemon
    test_harness.go:139: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: receive failed: EOF
        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/tapd_harness.go:322 (0x3fb36fa)
                (*tapdHarness).start.func2: hs.ht.Fatalf("Error running server: %v", err)
        /Users/nicholassandomeno/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.darwin-amd64/src/runtime/asm_amd64.s:1695 (0x2a54aa1)
                goexit: BYTE    $0x90   // NOP
2024-06-19 08:55:37.934 [WRN] RFQS: Error received from stream handler wire message channel: receive failed: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 08:55:37.934 [WRN] RFQS: Raw peer messages channel closed unexpectedly
2024-06-19 08:55:37.934 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 08:55:37.934 [ERR] RFQS: Error setting up HTLC interception: unable to setup incoming HTLC interceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 08:55:40.300 [ERR] GRDN: Aborting main custodian event loop: receive failed: EOF
2024-06-19 08:55:40.300 [ERR] SRVR: Shutting down because error in main method: received critical error from subsystem: receive failed: EOF
2024-06-19 08:55:40.300 [INF] RPCS: Stopping RPC Server
2024-06-19 08:55:40.300 [INF] SRVR: Shutdown complete

    test_harness.go:139: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: receive failed: EOF
        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/tapd_harness.go:322 (0x3fb36fa)
                (*tapdHarness).start.func2: hs.ht.Fatalf("Error running server: %v", err)
        /Users/nicholassandomeno/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.darwin-amd64/src/runtime/asm_amd64.s:1695 (0x2a54aa1)
                goexit: BYTE    $0x90   // NOP
2024-06-19 08:55:40.379 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 08:55:40.379 [WRN] RFQS: Error received from stream handler wire message channel: receive failed: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 08:55:40.379 [WRN] RFQS: Raw peer messages channel closed unexpectedly
2024-06-19 08:55:40.379 [ERR] RFQS: Error setting up HTLC interception: unable to setup incoming HTLC interceptor: rpc error: code = Unavailable desc = error reading from server: EOF
--- FAIL: TestTaprootAssetsDaemon (287.09s)
    --- PASS: TestTaprootAssetsDaemon/mint_assets (52.10s)
    --- PASS: TestTaprootAssetsDaemon/mint_batch_resume (14.96s)
    --- PASS: TestTaprootAssetsDaemon/asset_meta_validation (14.85s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/opaque_asset (0.17s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/json_asset (0.10s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/invalid_json (0.00s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/custom_meta_type (0.06s)
    --- PASS: TestTaprootAssetsDaemon/asset_name_collision_raises_mint_error (21.72s)
    --- PASS: TestTaprootAssetsDaemon/mint_assets_with_tap_sibling (20.93s)
    --- FAIL: TestTaprootAssetsDaemon/mint_fund_seal_assets (50.83s)
FAIL
FAIL    github.com/lightninglabs/taproot-assets/itest   288.532s
FAIL
make: *** [itest-only] Error 1

The same from make itest-only:

2024-06-19 09:04:22.902 [INF] LNDC: Using network regtest
    mint_fund_seal_test.go:511: 
                Error Trace:    /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/mint_fund_seal_test.go:511
                                                        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/mint_fund_seal_test.go:89
                                                        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/test_harness.go:159
                                                        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/integration_test.go:94
                Error:          Received unexpected error:
                                unable to derive key ring: rpc error: code = DeadlineExceeded desc = context deadline exceeded
                Test:           TestTaprootAssetsDaemon/mint_fund_seal_assets
2024-06-19 09:04:52.904 [INF] SRVR: Stopping Main Server
2024-06-19 09:04:52.904 [INF] RPCS: Stopping RPC Server
2024-06-19 09:04:52.904 [INF] GRDN: Stopping ChainPlanter
2024-06-19 09:04:52.904 [INF] GRDN: Stopping re-org watcher
2024-06-19 09:04:52.904 [INF] RFQS: Stopping RFQ system
2024-06-19 09:04:52.904 [INF] RFQS: Stopping subsystem: order handler
2024-06-19 09:04:52.904 [INF] RFQS: Stopping subsystem: stream handler
2024-06-19 09:04:52.904 [INF] RFQS: Stopping subsystem: quote negotiator
2024-06-19 09:04:52.904 [INF] TCHN: Stopping aux leaf creator
2024-06-19 09:04:52.904 [INF] TCHN: Stopping aux leaf signer
2024-06-19 09:04:52.904 [INF] TCHN: Stopping aux traffic shaper
2024-06-19 09:04:52.904 [INF] TCHN: Stopping aux invoice manager
2024-06-19 09:04:52.904 [ERR] TCHN: Error setting up invoice acceptor: context canceled
2024-06-19 09:04:54.362 [ERR] GRDN: Aborting main custodian event loop: receive failed: EOF
2024-06-19 09:04:54.362 [ERR] SRVR: Shutting down because error in main method: received critical error from subsystem: receive failed: EOF
2024-06-19 09:04:54.362 [INF] RPCS: Stopping RPC Server
2024-06-19 09:04:54.362 [INF] SRVR: Shutdown complete

=== NAME  TestTaprootAssetsDaemon
    test_harness.go:139: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: receive failed: EOF
        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/tapd_harness.go:322 (0xa6c16fa)
                (*tapdHarness).start.func2: hs.ht.Fatalf("Error running server: %v", err)
        /Users/nicholassandomeno/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.darwin-amd64/src/runtime/asm_amd64.s:1695 (0x9162aa1)
                goexit: BYTE    $0x90   // NOP
2024-06-19 09:04:54.425 [WRN] RFQS: Error received from stream handler wire message channel: receive failed: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:04:54.425 [WRN] RFQS: Raw peer messages channel closed unexpectedly
2024-06-19 09:04:54.425 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:04:54.426 [ERR] RFQS: Error setting up HTLC interception: unable to setup incoming HTLC interceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:04:56.502 [ERR] GRDN: Aborting main custodian event loop: receive failed: EOF
2024-06-19 09:04:56.502 [ERR] SRVR: Shutting down because error in main method: received critical error from subsystem: receive failed: EOF
2024-06-19 09:04:56.502 [INF] RPCS: Stopping RPC Server
2024-06-19 09:04:56.503 [INF] SRVR: Shutdown complete

    test_harness.go:139: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: receive failed: EOF
        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/tapd_harness.go:322 (0xa6c16fa)
                (*tapdHarness).start.func2: hs.ht.Fatalf("Error running server: %v", err)
        /Users/nicholassandomeno/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.darwin-amd64/src/runtime/asm_amd64.s:1695 (0x9162aa1)
                goexit: BYTE    $0x90   // NOP
2024-06-19 09:04:56.602 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:04:56.602 [WRN] RFQS: Error received from stream handler wire message channel: receive failed: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:04:56.602 [WRN] RFQS: Error received from stream handler wire message channel: <nil>
2024-06-19 09:04:56.602 [WRN] RFQS: Error received from stream handler wire message channel: <nil>
2024-06-19 09:04:56.602 [WRN] RFQS: Error received from stream handler wire message channel: <nil>
2024-06-19 09:04:56.602 [WRN] RFQS: Raw peer messages channel closed unexpectedly
2024-06-19 09:04:56.602 [ERR] RFQS: Error setting up HTLC interception: unable to setup incoming HTLC interceptor: rpc error: code = Unavailable desc = error reading from server: EOF
--- FAIL: TestTaprootAssetsDaemon (311.43s)
    --- PASS: TestTaprootAssetsDaemon/mint_assets (43.24s)
    --- PASS: TestTaprootAssetsDaemon/mint_batch_resume (17.79s)
    --- PASS: TestTaprootAssetsDaemon/asset_meta_validation (13.49s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/opaque_asset (0.44s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/json_asset (0.26s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/invalid_json (0.00s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/custom_meta_type (0.23s)
    --- PASS: TestTaprootAssetsDaemon/asset_name_collision_raises_mint_error (22.49s)
    --- PASS: TestTaprootAssetsDaemon/mint_assets_with_tap_sibling (28.21s)
    --- FAIL: TestTaprootAssetsDaemon/mint_fund_seal_assets (49.01s)
FAIL
FAIL    github.com/lightninglabs/taproot-assets/itest   312.936s
FAIL
make: *** [itest-only] Error 1

And lastly, the error logs when running with make itest-only and commenting out all other test cases in test_list_on_test.go:

=== NAME  TestTaprootAssetsDaemon
    test_harness.go:139: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: receive failed: EOF
        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/tapd_harness.go:322 (0x5e6685a)
                (*tapdHarness).start.func2: hs.ht.Fatalf("Error running server: %v", err)
        /Users/nicholassandomeno/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.darwin-amd64/src/runtime/asm_amd64.s:1695 (0x4915aa1)
                goexit: BYTE    $0x90   // NOP
2024-06-19 09:17:39.695 [WRN] RFQS: Error received from stream handler wire message channel: receive failed: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:17:39.695 [WRN] RFQS: Raw peer messages channel closed unexpectedly
2024-06-19 09:17:39.695 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:17:39.695 [ERR] RFQS: Error setting up HTLC interception: unable to setup incoming HTLC interceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:17:41.721 [ERR] GRDN: Aborting main custodian event loop: receive failed: EOF
2024-06-19 09:17:41.721 [ERR] SRVR: Shutting down because error in main method: received critical error from subsystem: receive failed: EOF
2024-06-19 09:17:41.721 [INF] RPCS: Stopping RPC Server
2024-06-19 09:17:41.722 [INF] SRVR: Shutdown complete

    test_harness.go:139: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: receive failed: EOF
        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/tapd_harness.go:322 (0x5e6685a)
                (*tapdHarness).start.func2: hs.ht.Fatalf("Error running server: %v", err)
        /Users/nicholassandomeno/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.3.darwin-amd64/src/runtime/asm_amd64.s:1695 (0x4915aa1)
                goexit: BYTE    $0x90   // NOP
2024-06-19 09:17:41.897 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:17:41.897 [WRN] RFQS: Error received from stream handler wire message channel: receive failed: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:17:41.897 [WRN] RFQS: Raw peer messages channel closed unexpectedly
2024-06-19 09:17:41.897 [ERR] RFQS: Error setting up HTLC interception: unable to setup incoming HTLC interceptor: rpc error: code = Unavailable desc = error reading from server: EOF
--- FAIL: TestTaprootAssetsDaemon (181.91s)
    --- FAIL: TestTaprootAssetsDaemon/multi_signature_on_all_levels (29.68s)

The debugging config is specific to VS Code; however, it intends to capture the flags set by the Makefile and make/testing_flags.mk:

{
    "configurations": [
        {
            "name": "Launch test function",
            "type": "go",
            "request": "launch",
            "mode": "test",
            "env": {
                "CGO_ENABLED": "0"
            },
            "buildFlags": [
                "-v",
                "-tags=dev,monitoring,autopilotrpc,chainrpc,invoicesrpc,peersrpc,routerrpc,signrpc,verrpc,walletrpc,watchtowerrpc,wtclientrpc,tapdevrpc,mintrpc,assetwalletrpc,taprpc,universerpc,integration,itest,btcd"
            ],
            "program": "${workspaceFolder}/itest",
            "args": [
                "-btcdexec=./btcd-itest",
                "-logdir=regtest",
                "-test.run=TestTaprootAssetsDaemon/multi_signature_on_all_levels",
                "-test.timeout=20m",
                "-logoutput",
                "-optional=false"
            ]
        }
    ]
}

Environment

The results above are from the latest commit on main. Please see the debug config above as well.

nicholassandomeno@Nicholass-MacBook-Pro taproot-assets % git pull
Already up to date.
nicholassandomeno@Nicholass-MacBook-Pro taproot-assets % date
Wed Jun 19 08:41:23 EDT 2024
nicholassandomeno@Nicholass-MacBook-Pro taproot-assets % git status
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        itest/__debug_bin3991762036

nothing added to commit but untracked files present (use "git add" to track)
nicholassandomeno@Nicholass-MacBook-Pro taproot-assets % 

Steps to reproduce

  • make install
  • make build-itest
  • Set up the config in VS Code, or proceed with the make steps only
  • Start the VS Code debugger
  • Alternatively, run make itest and examine the output for TestTaprootAssetsDaemon/
  • Alternatively, run make itest-only and examine the output for TestTaprootAssetsDaemon/

Expected behavior

Passing test.

Actual behavior

Failing test.

@Nsandomeno Nsandomeno added the bug (Something isn't working) and needs triage labels on Jun 19, 2024
@Nsandomeno
Author

I do see a similar error in the logs of the run itests CI job on PR #933 here

@guggero
Member

guggero commented Jun 20, 2024

The actual error you're getting seems to be: unable to derive key ring: rpc error: code = DeadlineExceeded desc = context deadline exceeded; all the other (EOF-related) ones are follow-up errors.
So it sounds like your lnd backend node isn't responding in time. Can you upload all the logs of a failed run, please?
They should be in itest/regtest.

@Nsandomeno
Author

Nsandomeno commented Jun 20, 2024

This includes 0--Alice, 1--Bob, 2--uni-server-lnd, output_btcd_chainbackend, and output_btcd_miner - is that everything you would be looking for?
regtest.zip

After quickly sifting through it now, the only thing that jumps out immediately is the many failed to connect messages in both of the btcd-related files.

@guggero
Member

guggero commented Jun 20, 2024

Thanks! Yeah, there's nothing that jumps out. Can you please upload the full output of the integration test itself when you run only the multi signature test? You can trigger it alone by just running make itest icase=multi_signature_on_all_levels, by the way; there's no need to comment things out.

@Nsandomeno
Author

Nsandomeno commented Jun 20, 2024

Nice to finally get that single-run invocation down here - thanks! Attached is that output, dumped to a file.

If something is taking too long to respond, it seems to be the second taprpc.NewAddrRequest, which withdraws back to Alice. This feels odd since the same call is made earlier in the itest, but it looks like this call could have to collect additional data, since auth material is not passed via the request.

I'm happy to help but would need a push in the right direction. I can also provide additional details about my computer if you think it could be OS-specific.

The original file wasn't saved; here is the dump:
output-from-make-icase.txt

@guggero
Member

guggero commented Jun 20, 2024

Hmm, are you running things on a very old or otherwise slow machine? It looks like it's just timing out because things take too long; just starting up the node takes almost 10 seconds...

@Nsandomeno
Author

It's a 2020 MacBook Pro with 16 GB of memory; I don't typically experience issues like that, and I currently have much less running than is typical. I also have plenty of disk space.

If it is simply a matter of a deadline, do you see any way for me to configure it to be longer?

@guggero
Member

guggero commented Jun 20, 2024

You can bump this value, which is causing the new address creation to time out:

defaultTimeout = time.Second * 10
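
For illustration, a minimal sketch of what the bump might look like (the 30-second value is an arbitrary example, not a project recommendation; the constant is the one embedded above, in the itest package):

package itest

import "time"

// defaultTimeout is the harness's per-call deadline. Raising it, e.g. to
// an arbitrary 30s, gives slow disk syncs more headroom before RPCs like
// the new-address call hit DeadlineExceeded.
const defaultTimeout = time.Second * 30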

But I think something's quite wrong here.
Looking at this:

2024-06-20 07:29:59.409 [INF] TADB: 1/u macaroons (184.185699ms)
2024-06-20 07:29:59.696 [INF] TADB: 2/u assets (470.972884ms)
2024-06-20 07:29:59.997 [INF] TADB: 3/u addrs (771.805208ms)
2024-06-20 07:30:00.278 [INF] TADB: 4/u mssmt (1.053011165s)
2024-06-20 07:30:00.660 [INF] TADB: 5/u transfers (1.434461231s)
2024-06-20 07:30:01.055 [INF] TADB: 6/u addr_event (1.830356951s)
2024-06-20 07:30:01.292 [INF] TADB: 7/u universe (2.06696322s)
2024-06-20 07:30:01.448 [INF] TADB: 8/u universe_events_timestamp (2.222675209s)
2024-06-20 07:30:01.721 [INF] TADB: 9/u universe_configs (2.496170662s)
2024-06-20 07:30:01.962 [INF] TADB: 10/u universe_stats (2.736820453s)
2024-06-20 07:30:02.226 [INF] TADB: 11/u transfers (3.001343023s)
2024-06-20 07:30:02.505 [INF] TADB: 12/u anchor_txid (3.095945268s)
2024-06-20 07:30:02.770 [INF] TADB: 13/u universe_fed_proof_sync_log (3.073679374s)
2024-06-20 07:30:03.224 [INF] TADB: 14/u multiverse_tree (3.226987341s)
2024-06-20 07:30:03.466 [INF] TADB: 15/u asset_witnesses (3.187647965s)
2024-06-20 07:30:03.942 [INF] TADB: 16/u tapscript_trees (3.281604757s)
2024-06-20 07:30:04.223 [INF] TADB: 17/u seedling_script_group_keys (3.167564985s)
2024-06-20 07:30:04.478 [INF] TADB: 18/u script_key_declared_known (3.185782426s)
2024-06-20 07:30:04.829 [INF] TADB: 19/u managed_utxo_commitment_version (3.38130933s)

It's very abnormal for these migrations to take multiple seconds each on an empty database; they each take between 2 and 30 milliseconds on my machine.
So this could be the macOS fsync issue (see https://bonsaidb.io/blog/acid-on-apple/ for more details; the tl;dr is that macOS lies about syncing the file system, and data isn't actually safe until you do a full fsync, which takes much longer).

Just out of curiosity: if you set fullfsync to false (and the synchronous pragma above it to normal), do you see the problem go away?
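
For anyone following along, here is a minimal standalone sketch of the two toggles being suggested, assuming the modernc.org/sqlite driver (the database path is a placeholder; tapd applies its pragmas through a similar loop, as the paste in the next comment shows):

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "modernc.org/sqlite" // registers the "sqlite" driver
)

func main() {
	// Placeholder path; point this at the actual tapd sqlite database.
	db, err := sql.Open("sqlite", "tapd.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// The two values suggested above: "normal" skips the extra
	// per-transaction WAL sync, and fullfsync=false disables the
	// expensive F_FULLFSYNC behavior on macOS.
	pragmas := []struct{ name, value string }{
		{"synchronous", "normal"},
		{"fullfsync", "false"},
	}
	for _, p := range pragmas {
		stmt := fmt.Sprintf("PRAGMA %s=%s;", p.name, p.value)
		if _, err := db.Exec(stmt); err != nil {
			log.Fatalf("applying pragma %s: %v", p.name, err)
		}
	}
}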

@Nsandomeno
Author

Nsandomeno commented Jun 20, 2024

It did not, no, unfortunately. It looks to be the same output, which is attached. And here is the pragmaOptions content for your confirmation:

	pragmaOptions := []struct {
		name  string
		value string
	}{
		{
			name:  "foreign_keys",
			value: "on",
		},
		{
			name:  "journal_mode",
			value: "WAL",
		},
		{
			name:  "busy_timeout",
			value: "5000",
		},
		{
			// Changed from "full" per the suggestion above. With
			// WAL mode, "full" does an extra WAL sync after each
			// transaction; "normal" skips it for better
			// performance at some durability risk.
			name:  "synchronous",
			value: "normal",
		},
		{
			// Changed from "true" per the suggestion above. When
			// enabled, this uses the correct fsync system call on
			// Mac OS to ensure items are fully flushed to disk.
			name:  "fullfsync",
			value: "false",
		},
	}

icase-output-sqlite-settings-change.txt

For our purposes we can move past this for now, assuming a fix would have a sprawling effect across the codebase if it turns out to matter. If it is something you think is very important, I'm happy to look into it more intensely.

@guggero
Member

guggero commented Jun 20, 2024

Hmm, okay, so things are slightly faster but not really fast enough. To be honest, I believe this could be an issue with your hardware, because we have other team members who are also running on macOS without any issues, so it can't be a general problem. Do you have another machine you could try this out on, to rule out something specific to your machine?

@Nsandomeno
Author

I'm going to work with someone eventually to do that, but I appreciate your time looking into it.

@Nsandomeno
Author

Nsandomeno commented Jun 21, 2024

One thing I'm still struggling to understand is why there's so much success on the other tests.

In the event we are running just this one itest case, shouldn't the timeout behave the same as it does for the handful of tests when running the entire suite? Or is there anything about this test specifically that strains this macOS weakness?

Any thoughts here would be much appreciated @guggero !
