Stack overflow on chain fork on macOS with RocksDBStore #2338

Closed
tkiapril opened this issue Sep 22, 2022 · 5 comments · Fixed by #2379
Labels: bug (Something isn't working), storage (Related to storage (Libplanet.Store))

Comments

@tkiapril
Contributor

On macOS (tested on Intel Macs), when NineChronicles.Headless is run with a snapshot supplied and a chain fork is attempted, MoveNext() in RocksDBStore overflows the call stack. Windows and Linux are not affected. The problem has been present since at least version 0.40.

[15:35:20 INF] [Swarm] Fetched 33 excerpts from 37 peers.
[15:35:20 INF] [Swarm] As the local tip (#5009681 4d0e9efe9e060ee6f382aacf33ea852490d0d921060d9029a1061508a458f48a) is still not close enough to the topmost tip in the network (#5011430 296d74739c36ad1764bbfc7046b1be509f8c774dcc482d5dde53b234150a72a7), preload one more time...
[15:35:20 INF] [Swarm] Preloading (trial #1) started...
[15:35:20 DBG] [BlockChain] Trying to fork chain at 4d0e9efe9e060ee6f382aacf33ea852490d0d921060d9029a1061508a458f48a(prevId: d7ef1c41-b14e-45ab-9f4f-48fd43f470d1) (forkedId: ed7cc6bd-2a38-4a7b-bcce-e341064c400d)
Stack overflow.
   at RocksDbSharp.Native.rocksdb_get(IntPtr, IntPtr, Byte[], Int64, IntPtr ByRef, RocksDbSharp.ColumnFamilyHandle)
   at RocksDbSharp.Native.rocksdb_get(IntPtr, IntPtr, Byte[], Int64, RocksDbSharp.ColumnFamilyHandle)
   at Libplanet.RocksDBStore.RocksDBStore.GetPreviousChainInfo(System.Guid)
   at Libplanet.RocksDBStore.RocksDBStore+<IterateIndexes>d__107.MoveNext()
   at Libplanet.RocksDBStore.RocksDBStore+<IterateIndexes>d__107.MoveNext()
   at Libplanet.RocksDBStore.RocksDBStore+<IterateIndexes>d__107.MoveNext()

...

   at Libplanet.RocksDBStore.RocksDBStore+<IterateIndexes>d__107.MoveNext()
   at System.Collections.Generic.LargeArrayBuilder`1[[Libplanet.Blocks.BlockHash, Libplanet, Version=0.40.0.0, Culture=neutral, PublicKeyToken=null]].AddRange(System.Collections.Generic.IEnumerable`1<Libplanet.Blocks.BlockHash>)
   at System.Collections.Generic.EnumerableHelpers.ToArray[[Libplanet.Blocks.BlockHash, Libplanet, Version=0.40.0.0, Culture=neutral, PublicKeyToken=null]](System.Collections.Generic.IEnumerable`1<Libplanet.Blocks.BlockHash>)
   at System.Linq.Enumerable.ToArray[[Libplanet.Blocks.BlockHash, Libplanet, Version=0.40.0.0, Culture=neutral, PublicKeyToken=null]](System.Collections.Generic.IEnumerable`1<Libplanet.Blocks.BlockHash>)
   at Libplanet.RocksDBStore.RocksDBStore.ForkBlockIndexes(System.Guid, System.Guid, Libplanet.Blocks.BlockHash)
   at Libplanet.Headless.ReducedStore.ForkBlockIndexes(System.Guid, System.Guid, Libplanet.Blocks.BlockHash)
   at Libplanet.Blockchain.BlockChain`1[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Fork(Libplanet.Blocks.BlockHash, Boolean)
   at Libplanet.Net.Swarm`1+<PreloadAsync>d__114[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext(System.Threading.Thread)
   at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Runtime.CompilerServices.IAsyncStateMachineBox, Boolean)
   at System.Threading.Tasks.Task.RunContinuations(System.Object)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].SetExistingTaskResult(System.Threading.Tasks.Task`1<System.__Canon>, System.__Canon)
   at Libplanet.Net.Swarm`1+<GetPeersWithExcerpts>d__126[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext(System.Threading.Thread)
   at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Runtime.CompilerServices.IAsyncStateMachineBox, Boolean)
   at System.Threading.Tasks.Task.RunContinuations(System.Object)
   at System.Threading.Tasks.Task.FinishSlow(Boolean)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()
Abort trap: 6
@boscohyun
Member

I have the same issue.

  • M1 Mac

  • 9c-headless:v100291

  • Rider 2022.2.2

  • dotnet x64 6.0.401

  • snapshot: https://s3.console.aws.amazon.com/s3/object/9c-snapshots?prefix=main%2Fpartition%2Farchive%2Ffull%2F20220907075443_9c-main-snapshot.zip&region=ap-northeast-2#

  • CLI options:

    -V=100291/6ec8E598962F1f475504F82fD5bF3410eAE58B9B/MEUCIQCA3lzUAt0QBfG.+ezw4CQ69zBy669sANEt5juSJgzqcgIgbozfpcyeuKJDeJoT5exyGYDYBqCpxklsMEfs0SQ6qzo=/ZHUxNjpXaW5kb3dzQmluYXJ5VXJsdTcyOmh0dHBzOi8vcmVsZWFzZS5uaW5lLWNocm9uaWNsZXMuY29tL21haW4vdjEwMDI5MS9sYXVuY2hlci92MS9XaW5kb3dzLnppcHU5OnRpbWVzdGFtcHUxMDoyMDIyLTA5LTA3ZQ==
    -G=https://release.nine-chronicles.com/genesis-block-9c-main
    --store-type=rocksdb
    --store-path=/Users/seungmin/Downloads/FileZilla/9c-snapshots/20220907075443_9c-main-snapshot
    --peer=027bd36895d68681290e570692ad3736750ceaab37be402442ffb203967f98f7b6,9c-main-tcp-seed-1.planetarium.dev,31234
    -I=turn://0ed3e48007413e7c2e638f13ddd75ad272c6c507e081bd76a75e4b7adc86c9af:0apejou+ycZFfwtREeXFKdfLj2gCclKzz5ZJ49Cmy6I=@turn-us.planetarium.dev:3478
    -T=03eeedcd574708681afb3f02fb2aef7c643583089267d17af35e978ecaf2a1184e
    --port=31234
    --graphql-server
    --graphql-host=0.0.0.0
    --graphql-port=80
    --workers=1000
    --chain-tip-stale-behavior-type=reboot
    --no-miner
    
  • 9c-headless-stack-overflow.log

@longfin added the storage (Related to storage (Libplanet.Store)) and bug (Something isn't working) labels on Sep 26, 2022
@longfin self-assigned this on Sep 26, 2022
@sky1045
Contributor

sky1045 commented Sep 26, 2022

This may be relevant to dotnet/runtime#33622: the default thread stack size may be platform-specific.

@moreal
Contributor

moreal commented Oct 3, 2022

It isn't a solution, but you may be able to work around this issue by setting the COMPlus_DefaultStackSize=1000000000 environment variable (which sets the default stack size to 1GB 🙄).

@dahlia
Contributor

dahlia commented Oct 4, 2022

The allowed call stack size varies depending on the OS and its settings, but if your program overflows its stack, it is still probably your fault, because the default stack size is usually enough for most programs.

I believe this is rather a sign that we used recursion for a problem that recursion cannot solve. To address it, we need to rewrite the code with plain loops instead of configuring the stack size on macOS. Even if we paper over the problem this time with runtime settings, we will face it again and again as the blockchain grows longer.
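
To illustrate the recursion-versus-loop point, here is a minimal, language-neutral sketch in Python. It is not Libplanet's code: the `Chains` mapping, the function names, and the one-block-per-chain layout are all hypothetical stand-ins, and the fork-point index handling of the real store is left out. It only contrasts an iterator that nests once per ancestor chain (the shape suggested by the repeated MoveNext() frames in the trace) with one that walks the ancestry in a plain loop.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical in-memory stand-in for a store: each chain id maps to
# (previous chain id or None, block hashes owned by this chain segment).
Chains = Dict[str, Tuple[Optional[str], List[str]]]


def iterate_indexes_recursive(chains: Chains, chain_id: str) -> Iterator[str]:
    """Recursive shape: one nested generator per ancestor chain."""
    prev_id, blocks = chains[chain_id]
    if prev_id is not None:
        # Every ancestor adds another level of nesting to each step.
        yield from iterate_indexes_recursive(chains, prev_id)
    yield from blocks


def iterate_indexes_loop(chains: Chains, chain_id: str) -> Iterator[str]:
    """Loop shape: collect the ancestry first, then walk it oldest-first."""
    lineage: List[str] = []
    current: Optional[str] = chain_id
    while current is not None:
        lineage.append(current)
        current = chains[current][0]
    # Nesting depth stays constant no matter how many forks exist.
    for cid in reversed(lineage):
        yield from chains[cid][1]


def build_forked_chains(depth: int) -> Chains:
    """Build `depth` chains, each forked from the one before it."""
    chains: Chains = {}
    prev: Optional[str] = None
    for i in range(depth):
        cid = f"chain-{i}"
        chains[cid] = (prev, [f"block-{i}"])
        prev = cid
    return chains
```

Both functions yield the same sequence for a short ancestry. With a long one, such as `build_forked_chains(50000)`, the loop version still completes, while the recursive version is expected to exceed CPython's default recursion limit. The loop version pays for this with a list of ancestor ids, which grows with the number of forks but lives on the heap rather than on the call stack.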

@moreal
Contributor

moreal commented Oct 12, 2022

Are you still researching this issue, @longfin? If not, can I take this issue?
