Lotus daemon being killed and causing websocket errors #10050

Open
8 of 18 tasks
davidgasquez opened this issue Jan 18, 2023 · 5 comments
Labels: area/chain, kind/bug, need/team-input, P1


@davidgasquez

Checklist

  • This is not a security-related bug/issue. If it is, please follow the security policy.
  • This is not a question or a support request. If you have any lotus related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file an improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the latest release, the most recent RC (release candidate) for the upcoming release, or the dev branch (master), or I have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

Daemon `v1.18.0`-

Describe the Bug

We are running some jobs on non-bootstrapped nodes using Lily. Some of these nodes regularly log the following message:

{"level":"debug","ts":"2023-01-02T23:55:04.987Z","logger":"rpc","caller":"go-jsonrpc@v0.1.8/websocket.go:624","msg":"websocket error","error":"websocket: close 1000 (normal)"}

The lily daemon gets killed in this case. I think @TippyFlitsUK has been able to reproduce it in Lotus.
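For triage, it may help to pull the close code out of log lines like the one above. The following is only a sketch: it assumes each log line is a complete JSON object with the fields shown, and `logEntry` and `parseCloseCode` are hypothetical names, not part of Lotus, Lily, or go-jsonrpc. Close code 1000 is "normal closure" in RFC 6455, which suggests the message records the connection going away rather than explaining why it did.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
)

// logEntry mirrors the fields visible in the quoted log line (assumed shape).
type logEntry struct {
	Level  string `json:"level"`
	TS     string `json:"ts"`
	Logger string `json:"logger"`
	Caller string `json:"caller"`
	Msg    string `json:"msg"`
	Error  string `json:"error"`
}

// closeCodeRe matches error strings shaped like "websocket: close 1000 (normal)".
var closeCodeRe = regexp.MustCompile(`websocket: close (\d+)`)

// parseCloseCode is a hypothetical helper. It decodes one JSON log line and
// returns the websocket close code; ok is false if the line carries none.
func parseCloseCode(line string) (int, bool, error) {
	var e logEntry
	if err := json.Unmarshal([]byte(line), &e); err != nil {
		return 0, false, err
	}
	m := closeCodeRe.FindStringSubmatch(e.Error)
	if m == nil {
		return 0, false, nil
	}
	code, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false, err
	}
	return code, true, nil
}

func main() {
	line := `{"level":"debug","ts":"2023-01-02T23:55:04.987Z","logger":"rpc","caller":"go-jsonrpc@v0.1.8/websocket.go:624","msg":"websocket error","error":"websocket: close 1000 (normal)"}`
	code, ok, err := parseCloseCode(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// 1000 means the peer closed the connection normally (RFC 6455).
	fmt.Println("close code:", code, "found:", ok)
}
```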

This also appears in the Lily logs:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xd4cbda]

goroutine 28318710 [running]:
github.com/filecoin-project/go-amt-ipld/v2.(*Node).forEachAt(0xc09bc7a510, {0x56150f0?, 0xc0001a2000?}, {0x7fe57c6d2910?, 0xc0d461e140?}, 0xc696ac6000?, 0x0, 0x0, 0xc5ef75dba8)
        /go/pkg/mod/github.com/filecoin-project/go-amt-ipld/v2@v2.1.1-0.20201006184820-924ee87a1349/amt.go:270 +0x19a
github.com/filecoin-project/go-amt-ipld/v2.(*Root).ForEach(...)
        /go/pkg/mod/github.com/filecoin-project/go-amt-ipld/v2@v2.1.1-0.20201006184820-924ee87a1349/amt.go:257
github.com/filecoin-project/specs-actors/v2/actors/util/adt.(*Array).ForEach(0xc696989848?, {0x55fbfa0?, 0xc696ac6000?}, 0xc5ef75dc08?)
        /go/pkg/mod/github.com/filecoin-project/specs-actors/v2@v2.3.6/actors/util/adt/array.go:81 +0xc7
github.com/filecoin-project/lily/chain/actors/builtin/miner.(*deadline2).ForEachPartition(0xc689bcd550, 0xd763f68d80)
        /build/lily/chain/actors/builtin/miner/v2.go:535 +0xc7
github.com/filecoin-project/lily/tasks/actorstate/miner.LoadSectorState.func1(0x4047220?, {0x5619700, 0xc689bcd550})
        /build/lily/tasks/actorstate/miner/sector_events.go:302 +0xd8
github.com/filecoin-project/lily/chain/actors/builtin/miner.(*state2).ForEachDeadline.func1(0xc001566900?, 0xc40cd01860)
        /build/lily/chain/actors/builtin/miner/v2.go:345 +0xe2
github.com/filecoin-project/specs-actors/v2/actors/builtin/miner.(*Deadlines).ForEach(0xc08f360460?, {0x7fe64427ff48, 0xc0d461e140}, 0xc5ef75dd30)
        /go/pkg/mod/github.com/filecoin-project/specs-actors/v2@v2.3.6/actors/builtin/miner/deadline_state.go:89 +0x7c
github.com/filecoin-project/lily/chain/actors/builtin/miner.(*state2).ForEachDeadline(0xc08f360460, 0xd48934ede0)
        /build/lily/chain/actors/builtin/miner/v2.go:344 +0xd6
github.com/filecoin-project/lily/tasks/actorstate/miner.LoadSectorState({0x56150b8, 0xc6079230c0}, {0x5639c30, 0xc08f360460})
        /build/lily/tasks/actorstate/miner/sector_events.go:301 +0x222
github.com/filecoin-project/lily/tasks/actorstate/miner.DiffMinerSectorStates.func2()
        /build/lily/tasks/actorstate/miner/sector_events.go:383 +0x5f
golang.org/x/sync/errgroup.(*Group).Go.func1()
        /go/pkg/mod/golang.org/x/sync@v0.0.0-20220722155255-886fb9371eb4/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
        /go/pkg/mod/golang.org/x/sync@v0.0.0-20220722155255-886fb9371eb4/errgroup/errgroup.go:72 +0xa5

@rvagg left a comment on that in the original issue: filecoin-project/filet#22 (comment).
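The trace above shows the panic happening inside a goroutine started by `errgroup.(*Group).Go`. In Go, an unrecovered panic in any goroutine terminates the whole process, which would be consistent with the daemon being killed. Below is a minimal sketch of one possible mitigation, not a description of what Lily or Lotus actually do: wrap the task so a panic is converted into an error. `recoverToError`, `node`, and `countLeaves` are hypothetical names for illustration; the toy tree walk only stands in for a traversal that hits a missing node. The wrapper returns a `func() error`, the shape `errgroup.Group.Go` accepts, but the demo calls it directly to stay within the standard library.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// recoverToError wraps fn so that a panic raised inside it is returned as an
// error (with a stack trace) instead of crashing the process.
func recoverToError(fn func() error) func() error {
	return func() (err error) {
		defer func() {
			if r := recover(); r != nil {
				err = fmt.Errorf("recovered panic: %v\n%s", r, debug.Stack())
			}
		}()
		return fn()
	}
}

// node is a toy tree node; a nil entry in links models a missing child.
type node struct {
	links []*node
}

// countLeaves walks the tree without nil checks, so a nil child triggers a
// nil pointer dereference, the same class of panic as in the trace.
func countLeaves(n *node) int {
	if len(n.links) == 0 { // panics when n is nil
		return 1
	}
	total := 0
	for _, child := range n.links {
		total += countLeaves(child)
	}
	return total
}

func main() {
	tree := &node{links: []*node{&node{}, nil}}
	task := recoverToError(func() error {
		fmt.Println("leaves:", countLeaves(tree))
		return nil
	})
	if err := task(); err != nil {
		fmt.Println("task failed, but the process is still running")
	}
}
```

This only keeps the process alive and surfaces the failure as an error; it does not address why the node is nil in the first place.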

Logging Information

Shared before.

Repo Steps

  1. Run '...'
  2. Do '...'
  3. See error '...'
    ...
@TippyFlitsUK added the P1, need/team-input, and area/chain labels and removed the need/triage label on Jan 18, 2023
@TippyFlitsUK
Contributor

Many thanks @davidgasquez

I can confirm that I was able to reproduce this error locally. My initial suspicion was a resource usage limitation, but the issue is still present after testing on a 64-core, 1 TiB RAM system.

@davidgasquez
Author

Is there anything I can try or do to add more information to this one? We're running Lily to index the chain, and this causes jobs to fail. Recent jobs, being bigger, are more likely to fail due to this error.

@frrist
Member

frrist commented Jan 19, 2023

@TippyFlitsUK could you share the steps you followed to reproduce this?

@TippyFlitsUK
Contributor

Hey Forrest 👋
I simply ran the following command on servers with different specs:
lotus chain export --recent-stateroots=2880 test.chain

@davidgasquez
Author

Any updates on this one @TippyFlitsUK? Basically, I'm curious if this is something that will get worked on soon. 😅 No worries if not as Forrest is patching things up on Lily's side but still wanted to check.
