
Stacks API Event Replay Procedure does not Succeed #1879

Closed
AshtonStephens opened this issue Mar 4, 2024 · 4 comments · Fixed by #1903

Describe the bug
Stacks API event replay procedure cannot complete.

To Reproduce
Steps to reproduce the behavior:

  1. Download event archive from https://archive.hiro.so/testnet/stacks-blockchain-api/testnet-stacks-blockchain-api-latest.gz
  2. Run the event replay. Note that your machine will likely need more than 16 GB of RAM.
  3. Ingest the events from the output following the api documentation.

Below is a script that does the majority of what I did, minus some initial installation. I haven't verified that this exact script runs, but it is fully representative of what I did, including the environment. The only difference is that I made a separate fork with two changes, which I elaborate on below.

EVENT_REPLAY_NUM_WORKERS=4
WORKING_DIR=$(pwd)

# Download archive
wget -q \
  "https://archive.hiro.so/testnet/stacks-blockchain-api/testnet-stacks-blockchain-api-latest.gz" \
  -O event-replay-archive.gz

# Download and run the event replay.
(
    git clone https://github.com/hirosystems/stacks-event-replay.git -b "v1.0.2"
    cd stacks-event-replay
    
    # Make virtual env
    python3 -m venv .venv
    source ".venv/bin/activate"
    pip install --upgrade pip
    pip install -r requirements.txt
    python3 -m event_replay --tsv-file ../event-replay-archive.gz
    
    # move the event output up a level
    mv events ../events
)

# Make sure the postgres output data path exists.
mkdir -p /var/lib/postgresql/data
# Run the postgres image
docker run -d \
  --name postgres \
  -p 5432:5432 \
  -e POSTGRES_DB=stacks_blockchain_api \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -v postgresql_data:/var/lib/postgresql/data \
  postgres:15-alpine

(
    # Now run and build the API.
    # Note: you probably won't get as far as my setup does until you upgrade duckdb to version "0.10.0",
    # and you'll also need to adjust the "1400" tx batch parameter I mention below.
    git clone https://github.com/hirosystems/stacks-blockchain-api.git -b "v7.8.2"
    
    # Note: another nit is that the API cannot be compiled outside of a git repo, so it might make
    # sense to remove the tagged release tarballs listed at the link below, since they cannot be
    # compiled: https://github.com/hirosystems/stacks-blockchain-api/releases/tag/v7.8.2
    echo "GIT_TAG=v7.8.2" >> .env
    
    # Make sure you have node v20
    npm install
    npm run build
    npm prune --production
    
    {
          echo "STACKS_EVENTS_DIR=$WORKING_DIR/events"
          # Set the max heap size to 29 GB. I started at the recommended 8 GB but kept
          # having to increase it because the process would crash.
          echo "NODE_OPTIONS=\"--max-old-space-size=29696\""
          echo "PG_PORT=5432"
          echo "PG_DATABASE=stacks_blockchain_api"
    
          # configure the chainID/networkID; testnet: 0x80000000, mainnet: 0x00000001
          echo "STACKS_CHAIN_ID=0x80000000"
          # manually set testnet values to connect to the blockstack testnet.
          echo "BTC_RPC_HOST=bitcoind.testnet.stacks.co"
          echo "BTC_RPC_PORT=18332"
          echo "BTC_RPC_USER=blockstack"
          echo "BTC_RPC_PW=blockstacksystem"
          # These probably don't matter for event replay, but they're part of my setup.
          echo "STACKS_CORE_EVENT_PORT=3700"
          echo "STACKS_CORE_EVENT_HOST=127.0.0.1"
          echo "STACKS_BLOCKCHAIN_API_PORT=3999"
          echo "STACKS_BLOCKCHAIN_API_HOST=127.0.0.1"
          echo "STACKS_CORE_RPC_HOST=127.0.0.1"
          echo "STACKS_CORE_RPC_PORT=20443"
    } >> .env
    
    node ./lib/index.js from-parquet-events --workers="$EVENT_REPLAY_NUM_WORKERS"
 )

What you'll likely see:

One bug I found in this process is here: https://github.com/hirosystems/stacks-blockchain-api/blob/develop/src/event-replay/parquet-based/importers/new-block-importer.ts#L87, where the API should not be batching 1400 txs. Each tx appears to expand into more than 46 SQL parameters, so a 1400-row batch exceeds postgres's maximum parameter count of 65534.

Below is the error; changing the line I highlighted to 500 fixes it for the time being.

{"level":"info","time":"2024-03-03T01:23:25.269Z","pid":21,"hostname":"d5c4f5914a8f","name":"stacks-blockchain-api","component":"event-replay","msg":"NEW_BLOCK events process started"}
node:internal/process/promises:289
            triggerUncaughtException(err, true /* fromPromise */);
            ^

Error: MAX_PARAMETERS_EXCEEDED: Max number of parameters (65534) exceeded
    at toBuffer (/app/node_modules/@hirosystems/api-toolkit/node_modules/postgres/cjs/src/connection.js:182:20)
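As a back-of-the-envelope check, the arithmetic behind the failure can be sketched as follows. The exact per-row parameter count depends on the txs table schema, so the 47 used here is an assumed illustrative value based on the ">46" observation above:

```python
# Postgres limits a single prepared statement to 65534 bound parameters.
MAX_PG_PARAMETERS = 65534

# Assumed: each tx row expands to ~47 bound parameters (the issue notes ">46").
PARAMS_PER_TX = 47

def max_batch_size(params_per_row: int, limit: int = MAX_PG_PARAMETERS) -> int:
    """Largest number of rows that fits in one multi-row INSERT."""
    return limit // params_per_row

# A 1400-row batch overflows the limit (1400 * 47 = 65800 > 65534)...
assert 1400 * PARAMS_PER_TX > MAX_PG_PARAMETERS
# ...while the per-statement ceiling is ~1394 rows, so 500 leaves ample headroom.
assert max_batch_size(PARAMS_PER_TX) == 1394
assert 500 * PARAMS_PER_TX < MAX_PG_PARAMETERS
```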

There were some other issues as well, some with duckdb, but they went away after upgrading duckdb to version 0.10.0 (latest). That may itself indicate a problem, but the program progressed after the upgrade, so I suspect it's fine.

But once all the other errors went away, we get this error:

{"level":"info","time":"2024-03-04T00:29:41.059Z","pid":43764,"hostname":"...","name":"stacks-blockchain-api","component":"event-replay","msg":"Worker has finished"}
┌─────────┬────────────────────┬───────────┐
│ (index) │ name               │ seconds   │
├─────────┼────────────────────┼───────────┤
│ 0       │ 'NEW_BLOCK_EVENTS' │ '528.712' │
└─────────┴────────────────────┴───────────┘
{"level":"info","time":"2024-03-04T00:29:41.062Z","pid":43764,"hostname":"...","name":"stacks-blockchain-api","component":"event-replay","msg":"RAW events process started"}

<--- Last few GCs --->

[43764:0x59640e0]   564172 ms: Scavenge (reduce) 4093.7 (4138.6) -> 4093.7 (4139.3) MB, 12.65 / 0.00 ms  (average mu = 0.972, current mu = 0.806) allocation failure;
[43764:0x59640e0]   564323 ms: Mark-Compact (reduce) 4094.9 (4139.6) -> 4094.9 (4140.6) MB, 144.38 / 0.00 ms  (+ 15.8 ms in 18 steps since start of marking, biggest step 15.4 ms, walltime since start of marking 209 ms) (average mu = 0.947, current mu = 0.

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0xca5580 node::Abort() [node]
 2: 0xb781f9  [node]
 3: 0xeca4d0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 4: 0xeca7b7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 5: 0x10dc505  [node]
 6: 0x10f4388 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 7: 0x10ca4a1 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 8: 0x10cb635 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 9: 0x10a7d56 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
10: 0x1099984 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [node]
11: 0x109c166 v8::internal::FactoryBase<v8::internal::Factory>::NewRawOneByteString(int, v8::internal::AllocationType) [node]
12: 0x10b3184 v8::internal::Factory::NewStringFromUtf8(v8::base::Vector<char const> const&, v8::internal::AllocationType) [node]
13: 0xedcdc2 v8::String::NewFromUtf8(v8::Isolate*, char const*, v8::NewStringType, int) [node]
14: 0xc479f5 napi_create_string_utf8 [node]
15: 0x7f96a9bb1edb Napi::String::New(napi_env__*, char const*, unsigned long) [.../stacks-blockchain-api/node_modules/duckdb/lib/binding/duckdb.node]
16: 0x7f96a9bb0196  [.../stacks-blockchain-api/node_modules/duckdb/lib/binding/duckdb.node]
17: 0x7f96a9bb083d  [.../stacks-blockchain-api/node_modules/duckdb/lib/binding/duckdb.node]

The program has 29 GB of RAM available to it on a 32 GB machine. I could get a 64 GB machine going, but at this point I suspect something else is going wrong. It might make sense to add the event replay procedure to some local testing steps so that it is easier to run when Nakamoto releases.

Expected behavior
We should be able to run the API and ingest the event archive in the way listed in the docs.

Additional context

This is needed to run part of a potential Nakamoto debugging environment, and it is the only part of the network currently failing to start up. It would be great if we could get this fixed in the very near future.

AshtonStephens (Author) commented:
@wileyj

csgui (Collaborator) commented Mar 4, 2024:

@AshtonStephens @wileyj GM. I can check this issue as well, since I've worked on this event-replay implementation. Thanks!

AshtonStephens (Author) commented:
It's not the end of the world that the event replay takes 16 GB, but it would be really nice if it didn't. I think you could have pandas write the output files as it goes, as opposed to in one fell swoop.
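That idea can be sketched as follows. This is a hypothetical stdlib-only stand-in for the pandas/Parquet pipeline: writing each chunk of events to its own output file as it is processed bounds peak memory by the chunk size rather than the size of the full archive:

```python
import json
import os
import tempfile
from itertools import islice
from typing import Iterable, Iterator

def chunked(events: Iterable[dict], size: int) -> Iterator[list]:
    """Yield successive fixed-size chunks from an event stream."""
    it = iter(events)
    while chunk := list(islice(it, size)):
        yield chunk

def write_incrementally(events: Iterable[dict], out_dir: str, chunk_size: int) -> int:
    """Write one file per chunk instead of materializing all events first."""
    n_files = 0
    for i, chunk in enumerate(chunked(events, chunk_size)):
        with open(os.path.join(out_dir, f"events-{i:05d}.json"), "w") as f:
            json.dump(chunk, f)
        n_files += 1  # the chunk goes out of scope; memory stays ~chunk_size events
    return n_files

# Usage: 10 events in chunks of 4 produce 3 files (4 + 4 + 2 events).
with tempfile.TemporaryDirectory() as d:
    events = ({"id": i} for i in range(10))
    assert write_incrementally(events, d, chunk_size=4) == 3
```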

@zone117x zone117x linked a pull request Mar 21, 2024 that will close this issue
zone117x pushed a commit that referenced this issue Mar 21, 2024
* chore: bump duckdb

* feat: event-replay readiness for nakamoto
blockstack-devops pushed a commit that referenced this issue Mar 21, 2024
## [7.9.0-nakamoto.9](v7.9.0-nakamoto.8...v7.9.0-nakamoto.9) (2024-03-21)

### Bug Fixes

* event-replay readiness for nakamoto & fix for [#1879](#1879) ([#1903](#1903)) ([1572e73](1572e73))
blockstack-devops pushed a commit that referenced this issue Mar 21, 2024
## [7.10.0-nakamoto.1](v7.9.0...v7.10.0-nakamoto.1) (2024-03-21)

### Features

* add signer-keys from pox4 events ([#1857](#1857)) ([c17ad23](c17ad23))
* ingest signer_bitvec ([#1900](#1900)) ([aa1750f](aa1750f))
* nakamoto block timestamps ([#1886](#1886)) ([f547832](f547832))
* pox 4 revoke events and signer-key support ([#1829](#1829)) ([5e5650a](5e5650a)), closes [#1849](#1849)
* pox stacker & signer cycle details ([#1873](#1873)) ([d2c2805](d2c2805))

### Bug Fixes

* event-replay readiness for nakamoto & fix for [#1879](#1879) ([#1903](#1903)) ([1572e73](1572e73))
* remove signer columns from tenure-change transactions ([#1845](#1845)) ([8ec726b](8ec726b))
* sql transactional consistency bug with fetching chaintip in various areas ([#1853](#1853)) ([ada8536](ada8536))
blockstack-devops pushed a commit that referenced this issue Mar 21, 2024
## [7.10.0-beta.1](v7.9.0...v7.10.0-beta.1) (2024-03-21)

### Features

* add signer-keys from pox4 events ([#1857](#1857)) ([c17ad23](c17ad23))
* ingest signer_bitvec ([#1900](#1900)) ([aa1750f](aa1750f))
* nakamoto block timestamps ([#1886](#1886)) ([f547832](f547832))
* pox 4 revoke events and signer-key support ([#1829](#1829)) ([5e5650a](5e5650a)), closes [#1849](#1849)
* pox stacker & signer cycle details ([#1873](#1873)) ([d2c2805](d2c2805))

### Bug Fixes

* event-replay readiness for nakamoto & fix for [#1879](#1879) ([#1903](#1903)) ([1572e73](1572e73))
* remove signer columns from tenure-change transactions ([#1845](#1845)) ([8ec726b](8ec726b))
* sql transactional consistency bug with fetching chaintip in various areas ([#1853](#1853)) ([ada8536](ada8536))
csgui (Collaborator) commented Mar 21, 2024:

The goal of this event-replay implementation was speed, so there is a tradeoff between computer resource usage and speed. Previous versions took days to finish.

Some improvements that were made:

  1. The batch size for inserting data into the txs table was reduced. This reduces the number of parameters passed to postgresql.
  2. When inserting raw events, all the related Parquet files were previously read in one operation, which increased memory usage. This process has been changed to read one file at a time.
  3. Some changes to support Nakamoto data.
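Item 2 can be sketched as a generator that streams one file's worth of events at a time. This is a hypothetical stdlib stand-in (the real importer reads Parquet via DuckDB), but the memory-bounding pattern is the same:

```python
import json
import os
import tempfile
from typing import Iterator

def iter_raw_events(events_dir: str) -> Iterator[dict]:
    """Yield events file-by-file so only one file is resident in memory."""
    for name in sorted(os.listdir(events_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(events_dir, name)) as f:
            yield from json.load(f)  # this file's rows are released before the next open

# Usage: three files of two events each are streamed in order.
with tempfile.TemporaryDirectory() as d:
    for i in range(3):
        with open(os.path.join(d, f"raw-{i}.json"), "w") as f:
            json.dump([{"seq": i * 2}, {"seq": i * 2 + 1}], f)
    seqs = [e["seq"] for e in iter_raw_events(d)]
    assert seqs == [0, 1, 2, 3, 4, 5]
```

Compared with loading every file up front, the peak resident set is one file's rows plus the consumer's working state, at the cost of not being able to sort or deduplicate across files in memory.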

To validate those changes, the file https://archive.hiro.so/testnet/stacks-blockchain-api/testnet-stacks-blockchain-api-latest.gz was used, and the event-replay process finished successfully on an Apple M1 Max with 64 GB of RAM.

The suggestions above will be taken into consideration for future improvements to the event-replay process. Thanks.

@AshtonStephens @wileyj please feel free to reach out if anything else is needed.

@csgui csgui closed this as completed Mar 21, 2024
blockstack-devops pushed a commit that referenced this issue Apr 15, 2024
## [7.10.0](v7.9.1...v7.10.0) (2024-04-15)

### Features

* add nakamoto block time to v2 endpoints ([#1921](#1921)) ([ae6bbe8](ae6bbe8))
* add signer-keys from pox4 events ([#1857](#1857)) ([c17ad23](c17ad23))
* ingest signer_bitvec ([#1900](#1900)) ([aa1750f](aa1750f))
* nakamoto block timestamps ([#1886](#1886)) ([f547832](f547832))
* pox 4 revoke events and signer-key support ([#1829](#1829)) ([5e5650a](5e5650a)), closes [#1849](#1849)
* pox stacker & signer cycle details ([#1873](#1873)) ([d2c2805](d2c2805))
* rosetta pox4 stacking support ([#1928](#1928)) ([2ba36f9](2ba36f9)), closes [#1929](#1929)

### Bug Fixes

* add nakamoto testnet to openapi docs ([#1910](#1910)) ([01fb971](01fb971))
* batch drop mempool transactions ([#1920](#1920)) ([a7ee96d](a7ee96d))
* cycle signer filter ([#1916](#1916)) ([dc7d600](dc7d600))
* cycles response for empty cycle info ([#1914](#1914)) ([a7a4558](a7a4558))
* delegate-stx burn-op parsing and test fix ([#1939](#1939)) ([73ec0db](73ec0db))
* event-replay readiness for nakamoto & fix for [#1879](#1879) ([#1903](#1903)) ([1572e73](1572e73))
* log message when sql migration is performed ([#1942](#1942)) ([49a4d25](49a4d25))
* other empty result responses ([#1915](#1915)) ([3cd2c64](3cd2c64))
* pox4 stack-stx burn-op handling ([#1936](#1936)) ([9e9a464](9e9a464))
* remove signer columns from tenure-change transactions ([#1845](#1845)) ([8ec726b](8ec726b))
* sql transactional consistency bug with fetching chaintip in various areas ([#1853](#1853)) ([ada8536](ada8536))
blockstack-devops pushed a commit that referenced this issue Apr 15, 2024
## [7.10.0-beta.1](v7.9.1...v7.10.0-beta.1) (2024-04-15)

### Features

* add nakamoto block time to v2 endpoints ([#1921](#1921)) ([ae6bbe8](ae6bbe8))
* add signer-keys from pox4 events ([#1857](#1857)) ([c17ad23](c17ad23))
* ingest signer_bitvec ([#1900](#1900)) ([aa1750f](aa1750f))
* nakamoto block timestamps ([#1886](#1886)) ([f547832](f547832))
* pox 4 revoke events and signer-key support ([#1829](#1829)) ([5e5650a](5e5650a)), closes [#1849](#1849)
* pox stacker & signer cycle details ([#1873](#1873)) ([d2c2805](d2c2805))
* rosetta pox4 stacking support ([#1928](#1928)) ([2ba36f9](2ba36f9)), closes [#1929](#1929)
* support multiple STX faucet source accounts ([#1946](#1946)) ([5d69c7c](5d69c7c))

### Bug Fixes

* add nakamoto testnet to openapi docs ([#1910](#1910)) ([01fb971](01fb971))
* batch drop mempool transactions ([#1920](#1920)) ([a7ee96d](a7ee96d))
* cycle signer filter ([#1916](#1916)) ([dc7d600](dc7d600))
* cycles response for empty cycle info ([#1914](#1914)) ([a7a4558](a7a4558))
* delegate-stx burn-op parsing and test fix ([#1939](#1939)) ([73ec0db](73ec0db))
* event-replay readiness for nakamoto & fix for [#1879](#1879) ([#1903](#1903)) ([1572e73](1572e73))
* log message when sql migration is performed ([#1942](#1942)) ([49a4d25](49a4d25))
* other empty result responses ([#1915](#1915)) ([3cd2c64](3cd2c64))
* pox4 stack-stx burn-op handling ([#1936](#1936)) ([9e9a464](9e9a464))
* remove signer columns from tenure-change transactions ([#1845](#1845)) ([8ec726b](8ec726b))
* sql transactional consistency bug with fetching chaintip in various areas ([#1853](#1853)) ([ada8536](ada8536))