
Opening database takes a very long time #3065

Closed
bowenwang1996 opened this issue Jul 31, 2020 · 10 comments · Fixed by #3123
Labels: C-bug Category: This is a bug · P-critical Priority: critical

Comments

@bowenwang1996 (Collaborator)

After the hard fork on testnet yesterday, we observed that restarting a node can take more than half an hour just to open the database. To reproduce, initialize and start a testnet node with

neard --home=<home> init --chain-id="testnet" --genesis=genesis.json
neard --home=<home> run

where genesis.json can be found at https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/genesis.json.

After the node starts, you should see that it is "Waiting for peers". Now shut down the node, restart it with neard --home=<home> run, and observe that it gets stuck.

bowenwang1996 added the C-bug (Category: This is a bug) and P-critical (Priority: critical) labels on Jul 31, 2020
frol (Collaborator) commented Aug 3, 2020

My $0.02:

  • strace reports tons of pread64 calls against the same .sst file; that file is roughly 263 MiB (276170959 bytes), while the rest of the .sst files are usually quite small (on the order of kilobytes).

  • pread64 reads 3566667 bytes from the same offset 272281385 over and over again (the reads succeed: strace reports 3566667 bytes read each time, and the file size (276170959) is larger than 272281385 + 3566667 = 275848052).

  • after 30-35 minutes it manages to reach "INFO stats: Server listening ...":

    Aug 03 12:16:35.054  INFO near: Opening store database at "/home/frol/.near/testnet/data"    
    Aug 03 12:51:37.397  INFO stats: Server listening at ed25519:612SsfzbL8UWShMRpu3j2cmENbAkXLHL4NiM2d4SPL6S@0.0.0.0:24567
    

frol (Collaborator) commented Aug 3, 2020

There is a similar report in Ceph: https://tracker.ceph.com/issues/21092

It seems to be related to the RocksDB cache size.
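For context, here is a minimal sketch of how a block cache is typically configured with the rust-rocksdb crate. The 512 MiB capacity and the overall setup are illustrative assumptions, not nearcore's actual settings, and API details (for example, whether Cache::new_lru_cache returns a Result) differ between crate versions:

    use rocksdb::{BlockBasedOptions, Cache, Options, DB};

    fn open_with_block_cache(path: &str) -> Result<DB, rocksdb::Error> {
        // Shared LRU block cache; 512 MiB is an arbitrary example capacity.
        let cache = Cache::new_lru_cache(512 * 1024 * 1024);

        let mut block_opts = BlockBasedOptions::default();
        block_opts.set_block_cache(&cache);

        let mut opts = Options::default();
        opts.create_if_missing(true);
        opts.set_block_based_table_factory(&block_opts);

        DB::open(&opts, path)
    }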

frol (Collaborator) commented Aug 3, 2020

@ailisp Was it you who tweaked RocksDB before? Maybe you have an idea of where we are missing the cache?

ailisp (Member) commented Aug 3, 2020

pread64 reads 3566667 bytes from the same offset 272281385 over and over again (the reads succeed: strace reports 3566667 bytes read each time, and the file size (276170959) is larger than 272281385 + 3566667 = 275848052).

Great finding! This doesn't sound like something that can be resolved by a smaller write-ahead-log (.log) size or a smaller data-file (.sst) size. Rather, it looks like the node exited before the sst file was properly updated and flushed.

after 30-35 minutes it manages to reach "INFO stats: Server listening ...":

RocksDB might have a 30-minute timeout on loading an sst file and give up to avoid an infinite loop like the one above. We might want to give up earlier to speed this up.
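If the hypothesis is that the process exits before memtables are flushed, one possible mitigation is an explicit, blocking flush on shutdown. This is only a sketch against the rust-rocksdb crate; whether nearcore's shutdown path actually lacks such a flush is an assumption here:

    use rocksdb::{FlushOptions, DB};

    fn shut_down(db: DB) -> Result<(), rocksdb::Error> {
        // Write memtables out to SST files and wait for the flush to finish,
        // so the next open does not have to replay a large write-ahead log.
        let mut flush_opts = FlushOptions::default();
        flush_opts.set_wait(true);
        db.flush_opt(&flush_opts)?;
        Ok(())
        // `db` is dropped here, which closes the database.
    }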

frol (Collaborator) commented Aug 3, 2020

@ailisp Can you take it over or help @mikhailOK with this?

P.S. How critical is having a corrupted .sst file? (I just want to learn, so I can better understand how to provide useful debug information)

ailisp (Member) commented Aug 3, 2020

P.S. How critical is having a corrupted .sst file? (I just want to learn, so I can better understand how to provide useful debug information)

I think in this case it's already corrupt; it shouldn't retry for 30 minutes. In general I'm not sure. From how RocksDB works, it is supposed to keep the write-ahead-log file until it has been fully flushed to an sst file. So if the last sst file isn't fully written, the log file should still exist and it should be okay to delete the last sst file and reconstruct it from the write-ahead-log file.
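For reference, RocksDB's WAL recovery behaviour is configurable at open time. A minimal sketch using the rust-rocksdb crate, illustrating the general mechanism rather than anything the node is known to do:

    use rocksdb::{DBRecoveryMode, Options, DB};

    fn open_tolerating_truncated_wal(path: &str) -> Result<DB, rocksdb::Error> {
        let mut opts = Options::default();
        // Drop a partially written record at the tail of the WAL instead of
        // failing the open; stricter modes such as AbsoluteConsistency would
        // refuse to open a database with a truncated log.
        opts.set_wal_recovery_mode(DBRecoveryMode::TolerateCorruptedTailRecords);
        DB::open(&opts, path)
    }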

@ailisp Can you take it over or help @mikhailOK with this?

I will help @mikhailOK.

frol (Collaborator) commented Aug 3, 2020

So if the last sst file isn't fully written, the log file should still exist and it should be okay to delete the last sst file and reconstruct it from the write-ahead-log file.

I tried removing it, but then it complained that there is no such .sst file 😄

ailisp (Member) commented Aug 3, 2020

I tried removing it, but then it complained that there is no such .sst file 😄

That's bad; RocksDB probably thought the last log had already been applied to the sst file and deleted the log :(

Or, I think more likely (given that the last sst file is much bigger than the old ones), RocksDB is appending into the latest sst file and compacting it into several smaller ones once it grows past some limit. If that is the case, we have to use RocksDB's own functions to fix the sst file by undoing the latest log.
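For the record, RocksDB does expose a repair facility, which the rust-rocksdb crate wraps as DB::repair. A minimal sketch; repairing in place is a last resort and can lose data that only lived in corrupted files:

    use rocksdb::{Options, DB};

    fn repair_store(path: &str) -> Result<(), rocksdb::Error> {
        // Best-effort reconstruction of the database from whatever SST and WAL
        // files are still readable.
        let opts = Options::default();
        DB::repair(&opts, path)
    }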

ailisp (Member) commented Aug 3, 2020

Is it possible the sst file got corrupted because we didn't do this?

[screenshot attached]

frol (Collaborator) commented Aug 3, 2020

Or, I think more likely (given that the last sst file is much bigger than the old ones)

In fact, it was not the last sst file; it is usually the first, or close to the first, based on the numbering in the file name.

mikhailOK added a commit that referenced this issue Aug 10, 2020
Use rocksdb merge operator for ColState. No longer need to atomically
read + write on update.

Fixes #3065

Test plan
---------
sanity test
manually check performance
mikhailOK added a commit that referenced this issue Aug 12, 2020
Use rocksdb merge operator for ColState. No longer need to atomically
read + write on update.

Fixes #3065

Test plan
---------
sanity test
manually check performance
manually check store version upgrade
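For readers unfamiliar with merge operators: instead of an atomic read-modify-write (get, combine, put), the writer issues a single merge record and RocksDB folds the operands together during reads and compaction. Below is a minimal sketch with the rust-rocksdb crate; the concatenation semantics are illustrative only (the actual ColState merge in #3123 has its own semantics), and the callback signature and operand iteration differ between crate versions:

    use rocksdb::{MergeOperands, Options, DB};

    // Example merge function: append every operand to the existing value.
    fn concat_merge(
        _key: &[u8],
        existing: Option<&[u8]>,
        operands: &MergeOperands,
    ) -> Option<Vec<u8>> {
        let mut result = existing.map(|v| v.to_vec()).unwrap_or_default();
        for op in operands.iter() {
            result.extend_from_slice(op);
        }
        Some(result)
    }

    fn open_with_merge(path: &str) -> Result<DB, rocksdb::Error> {
        let mut opts = Options::default();
        opts.create_if_missing(true);
        // Register the merge operator; writes then use db.merge(key, operand)
        // instead of get + put.
        opts.set_merge_operator_associative("concat", concat_merge);
        DB::open(&opts, path)
    }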