
Shutdown issue with badgerDS - keeps reading from the disk #7283

Open · RubenKelevra opened this issue May 6, 2020 · 10 comments

Labels: exp/intermediate (Prior experience is likely helpful), kind/bug (A bug in existing code, including security flaws), P2 (Medium: Good to have, but can wait until someone steps up), status/accepted (This issue has been accepted), topic/badger, topic/datastore

Comments

@RubenKelevra (Contributor)

Version information:

go-ipfs version: 0.6.0-dev
Repo version: 9
System version: amd64/linux
Golang version: go1.14.2

Commit 591c541

Description:

  • I created a fresh datastore with ipfs init --profile=badgerds.
  • Started the daemon
  • I pinned QmdB8kVBeWvLKyZrvxAAzrVfkLZC3zqcu6o7twLAqUcC67
  • IPFS ran for some hours with no user input

Then I tried to shut down the daemon. Unexpectedly, IPFS started reading from the disk for minutes while nothing was written (according to iotop):

[Screenshot_20200506_212433: iotop output showing sustained disk reads]

The following experimental features were activated in the config at the time:

  • Filestore
  • URLStore
  • QUIC

Datastore config:

"Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "child": {
        "path": "badgerds",
        "syncWrites": false,
        "truncate": true,
        "type": "badgerds"
      },
      "prefix": "badger.datastore",
      "type": "measure"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "1000GB"
  },
  • The IPFS binary had cap_net_bind_service=+ep set to be able to listen on port 443.
  • The environment variable LIBP2P_SWARM_FD_LIMIT was set to 1000.
  • IPFS was called with /usr/bin/ipfs daemon --init --migrate

I fetched the debug data and killed it with SIGABRT to get the stack trace - both are attached.

stacktrace.txt
debug.tar.gz

RubenKelevra added the kind/bug (A bug in existing code, including security flaws) and need/triage (Needs initial labeling and prioritization) labels on May 6, 2020
@Stebalien (Member) commented May 6, 2020

Ah, interesting. Badger is garbage collecting at that point. Or, to be accurate, it's scanning to see if there's anything that needs to be garbage collected.

I've filed dgraph-io/badger#1324. However, for now, we should probably do the same systemd-notify dance on shutdown.
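That dance would look roughly like the sketch below: while the slow Close() is still in flight, keep asking systemd to extend the stop timeout so the daemon isn't SIGKILLed mid-scan. This is only a sketch, assuming a Type=notify unit, systemd >= 236 (which added EXTEND_TIMEOUT_USEC), and the github.com/coreos/go-systemd bindings; closeWithKeepalive and the intervals are made up for illustration.

```go
package main

import (
	"log"
	"time"

	"github.com/coreos/go-systemd/v22/daemon"
)

// closeWithKeepalive (hypothetical helper) runs closeFn and, while it is
// still in flight, repeatedly asks systemd for 30 more seconds of stop
// timeout. SdNotify is a harmless no-op when not running under systemd.
func closeWithKeepalive(closeFn func() error) error {
	done := make(chan error, 1)
	go func() { done <- closeFn() }()

	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case err := <-done:
			return err
		case <-ticker.C:
			// EXTEND_TIMEOUT_USEC is in microseconds: 30000000 = 30s more.
			_, _ = daemon.SdNotify(false, "EXTEND_TIMEOUT_USEC=30000000")
		}
	}
}

func main() {
	// Stand-in for a slow datastore Close(), e.g. badger scanning on shutdown.
	slowClose := func() error { time.Sleep(45 * time.Second); return nil }
	if err := closeWithKeepalive(slowClose); err != nil {
		log.Fatal(err)
	}
}
```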

@RubenKelevra (Contributor, Author)

> Let's

...?

I had to wait a while to get all the data collected for the other bug report, but I guess badger is doing the same thing on startup, since a second startup completes within one second.

@Stebalien (Member)

> ...?

Sorry, dangling edit.

> I had to wait a while to get all the data collected for the other bug report, but I guess badger is doing the same thing on startup, since a second startup completes within one second.

Well, on startup badger may need to clean something if it was killed on shutdown. Otherwise, I'm not sure what it's doing.

@RubenKelevra (Contributor, Author)

> I had to wait a while to get all the data collected for the other bug report, but I guess badger is doing the same thing on startup, since a second startup completes within one second.

> Well, on startup badger may need to clean something if it was killed on shutdown. Otherwise, I'm not sure what it's doing.

Yeah, the stack trace suggests something like that is happening: "valuelog open, valuelog replayLog, valuelog iterate".

But it's strange that the same datastore can be opened within a second if the first opening process is killed.

Maybe there's a detection for an unclean recovery that skips the work on the second attempt?

@Stebalien (Member)

Ah, badger may then recognize that the datastore is corrupted and, instead of trying to fix it, just truncate the unsynced changes (we've configured it to do that because we explicitly call Sync() before/after pinning).
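A rough sketch of that configuration, assuming the badger v1.6-style options API (the path and error handling here are illustrative, not go-ipfs's actual wiring):

```go
package main

import (
	"log"

	"github.com/dgraph-io/badger"
)

func main() {
	// Mirror the issue's Datastore spec: async writes plus Truncate, so a
	// crash just drops the unsynced value-log tail on the next open
	// instead of refusing to start.
	opts := badger.DefaultOptions("/path/to/badgerds").
		WithSyncWrites(false). // "syncWrites": false
		WithTruncate(true)     // "truncate": true

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// ... write pin data here ...

	// Explicit durability point: with SyncWrites off, an explicit Sync()
	// before/after pinning is what makes the pinned data crash-safe.
	if err := db.Sync(); err != nil {
		log.Fatal(err)
	}
}
```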

@RubenKelevra (Contributor, Author)

> Ah, interesting. Badger is garbage collecting at that point. Or, to be accurate, it's scanning to see if there's anything that needs to be garbage collected.

I thought about that again... what's the trigger for this garbage collection in the first place?

Shouldn't we only garbage collect right after the IPFS GC has run (which is not active in this setup)? 🤔

I mean, is there anything badger can clean up at all if we haven't run our own GC?

@Stebalien (Member)

ipfs/go-ds-badger#51
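(That issue discusses when to trigger badger's value-log GC. The usual shape is a periodic loop over RunValueLogGC, roughly like the sketch below; the interval and discard ratio are illustrative, not go-ds-badger's actual values.)

```go
package gcsketch

import (
	"log"
	"time"

	"github.com/dgraph-io/badger"
)

// gcLoop is an illustrative periodic value-log GC loop. RunValueLogGC
// rewrites at most one value-log file per call and returns
// badger.ErrNoRewrite once nothing is worth reclaiming; the scan for
// candidates is what shows up as read-only disk activity.
func gcLoop(db *badger.DB, interval time.Duration) {
	for range time.Tick(interval) {
		for {
			err := db.RunValueLogGC(0.5) // rewrite files that are >= 50% garbage
			if err == badger.ErrNoRewrite {
				break // nothing (more) to collect this round
			}
			if err != nil {
				log.Printf("badger value-log GC: %v", err)
				break
			}
		}
	}
}
```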

@RubenKelevra (Contributor, Author) commented May 7, 2020

@Stebalien what data exactly is stored temporarily or semi-temporarily in the datastore that would accumulate if we didn't run the badger GC? DHT data?

If so, can I avoid having this background GC run if I switch to DHTclient?

Just searching for a temporary solution so my shutdowns don't crash :)

I wrote regarding the badger GC in ipfs/go-ds-badger#54 (comment):

> Maybe we can print a warning on the console when we run a GC event on Badger-DB, to at least inform the user what's going on.
>
> This would make the behavior a bit more transparent.
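Concretely, that could be as small as a log line wrapped around each GC pass (hypothetical wording, building on the loop sketched above):

```go
package gcsketch

import (
	"log"

	"github.com/dgraph-io/badger"
)

// runGCWithWarning sketches the suggested console warning, so users can
// tell why the disk is suddenly busy during a GC pass.
func runGCWithWarning(db *badger.DB) {
	log.Printf("badgerds: running value-log GC; disk reads may spike for a while")
	if err := db.RunValueLogGC(0.5); err != nil && err != badger.ErrNoRewrite {
		log.Printf("badgerds: value-log GC failed: %v", err)
	}
}
```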

@Stebalien (Member) commented May 7, 2020 via email

@RubenKelevra (Contributor, Author) commented May 7, 2020

> DHT data, local provider records, other misc stuff? I'd extend your shutdown timer for now. Also, how much data do you have?

That's the real database, not the test database:

```
[ipfs@vidar ~]$ ipfs repo stat --human
NumObjects: 677400
RepoSize:   154 GB
StorageMax: 1.0 TB
RepoPath:   /home/ipfs/.ipfs
Version:    fs-repo@9
```

But I plan to use a lot more storage on this server for another cluster... like 1-1.5 TB.

I mean, reading up to 2 TB at 2 MB/s isn't going to finish any time soon; that's on the order of a million seconds, or roughly eleven days (if all the data really is read).

And hard-killing the daemon on every security update is no good option either.

Stebalien added the exp/intermediate, P2, status/accepted, topic/badger, and topic/datastore labels and removed the need/triage label on May 22, 2020