
very large database #1320

Closed
RubenKelevra opened this issue Mar 7, 2021 · 21 comments
Assignees
Labels
  • effort/days: Estimated to take multiple days, but less than a week
  • exp/wizard: Extensive knowledge (implications, ramifications) required
  • kind/bug: A bug in existing code (including security flaws)
  • kind/enhancement: A net-new feature or improvement to an existing feature
  • P1: High: Likely tackled by core team if no one steps up
  • status/ready: Ready to be worked

Comments

@RubenKelevra
Collaborator

Additional information:

  • OS: Linux
  • IPFS Cluster version: 0.13.1
  • Installation method: dist.ipfs.io

Describe the bug:

I think the garbage collection in the database is broken/turned off. Otherwise, this database size is quite excessive for ~50k changes on a cluster.

$ du -hc .ipfs-cluster
9,1G    .ipfs-cluster/badger
9,1G    .ipfs-cluster
9,1G    total
@RubenKelevra added the kind/bug and need/triage labels on Mar 7, 2021
@hsanjuan added the exp/wizard, effort/days, kind/enhancement, P1, and status/ready labels and removed the need/triage label on Mar 9, 2021
@hsanjuan
Collaborator

hsanjuan commented Mar 9, 2021

Yes, it is. I observe a similar thing. I don't think the actual data stored is that large; I wonder if triggering a compaction would fix things.

@RubenKelevra
Collaborator Author

I wonder how this is triggered in the first place. I mean, we just write the pinset to the cluster, right? In this case, we shouldn't really discard anything from the database, so a compaction should not have to clean up anything - I suppose?

@hsanjuan
Collaborator

Depends. Pins that are removed from the cluster are also removed from the pins state (which is separate from the consensus state). The crdt part is mostly write-only, except for the heads. When a head is replaced by a new one, the old one is deleted. And then, I don't know what sort of accounting badger is carrying internally and how it handles tables, etc.

In general I'm growing very weary of badger.

As I said, triggering a compaction on open would give us a first clue as to whether there is indeed a lot of trash.
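
For reference, a minimal sketch of what triggering compaction on open could look like with badger v2's Go API; the worker count, discard ratio, and path are illustrative, not values from this thread:

package main

import (
    "log"

    badger "github.com/dgraph-io/badger/v2"
)

// openCompacted opens the datastore, forces an LSM compaction, and then
// rewrites value-log files until badger reports nothing left to reclaim.
func openCompacted(dir string) (*badger.DB, error) {
    db, err := badger.Open(badger.DefaultOptions(dir))
    if err != nil {
        return nil, err
    }
    if err := db.Flatten(2); err != nil { // compact all LSM levels with 2 workers
        db.Close()
        return nil, err
    }
    for {
        err := db.RunValueLogGC(0.5) // rewrite vlog files that look >=50% stale
        if err == badger.ErrNoRewrite {
            break // nothing (more) worth rewriting
        }
        if err != nil {
            db.Close()
            return nil, err
        }
    }
    return db, nil
}

func main() {
    db, err := openCompacted(".ipfs-cluster/badger")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
}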

@RubenKelevra
Collaborator Author

Maybe not on startup; that already takes quite a while. Compaction should be possible at any time, no? So why not just do it randomly every 2 hours or so? :)

@hsanjuan
Collaborator

Ah yeah, I was not suggesting it as a permanent solution, just mentioning it for testing. But anyway, we cannot run compaction regularly, as it essentially makes everything crawl to a stop, and that is not something that should happen randomly. Afaik, Filecoin compacts on start.

@hsanjuan
Collaborator

I have checked one of our cluster nodes with 160k pins and 70GB of badger datastore. The badger tool gave the following:

[Summary]
Level 0 size:          0 B
Level 1 size:       154 MB
Level 2 size:       207 MB
Total index size:   362 MB
Value log size:      74 GB

That is 74GB of value logs. After backing up and restoring using the badger tool:

[Summary]
Level 0 size:          0 B
Level 1 size:       116 MB
Total index size:   116 MB
Value log size:     238 MB

I have no idea how badger manages to put so much trash on disk. There are effectively no Deletes happening on this cluster.
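
For completeness, the same backup/restore round trip can also be done from Go rather than with the badger CLI tool; a rough sketch assuming badger v2 (directory and file names are placeholders):

package main

import (
    "log"
    "os"

    badger "github.com/dgraph-io/badger/v2"
)

func main() {
    // Dump the live contents of the bloated datastore to a backup file.
    src, err := badger.Open(badger.DefaultOptions("badger-old"))
    if err != nil {
        log.Fatal(err)
    }
    out, err := os.Create("cluster.backup")
    if err != nil {
        log.Fatal(err)
    }
    if _, err := src.Backup(out, 0); err != nil { // since=0 means a full backup
        log.Fatal(err)
    }
    out.Close()
    src.Close()

    // Restore into a fresh directory; only live data gets written back,
    // which is what shrank 74 GB of value logs down to a few hundred MB here.
    dst, err := badger.Open(badger.DefaultOptions("badger-new"))
    if err != nil {
        log.Fatal(err)
    }
    defer dst.Close()
    in, err := os.Open("cluster.backup")
    if err != nil {
        log.Fatal(err)
    }
    defer in.Close()
    if err := dst.Load(in, 16); err != nil { // 16 max pending writes, arbitrary
        log.Fatal(err)
    }
}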

@hsanjuan
Collaborator

@RubenKelevra are you able to test by setting value_threshold in the badger config to something like 9999?

@RubenKelevra
Collaborator Author

Yeah sure

@RubenKelevra
Collaborator Author

Settings previously:

  "datastore": {
    "badger": {
      "badger_options": {
        "dir": "",
        "value_dir": "",
        "sync_writes": true,
        "table_loading_mode": 2,
        "value_log_loading_mode": 0,
        "num_versions_to_keep": 1,
        "max_table_size": 67108864,
        "level_size_multiplier": 10,
        "max_levels": 7,
        "value_threshold": 32,
        "num_memtables": 5,
        "num_level_zero_tables": 5,
        "num_level_zero_tables_stall": 10,
        "level_one_size": 268435456,
        "value_log_file_size": 1073741823,
        "value_log_max_entries": 1000000,
        "num_compactors": 2,
        "compact_l_0_on_close": true,
        "read_only": false,
        "truncate": true
      } 
    }   
  }   

Size of .ipfs-cluster: 14 GB

@RubenKelevra
Collaborator Author

Hey @jarifibrahim mind having a look at this?

@jarifibrahim

Value log size: 74 GB

This is a known issue with Badger v1 and v2. The value log GC wasn't very effective at cleaning up the vlog files, and for this exact reason we have badger v3.

Badger v1 and v2 use vlog (value log) files as the Write Ahead Log (WAL) and also store big values in those files. The vlog files keep accumulating, and the vlog GC isn't able to clean them up fast enough.

Badger v3 only uses vlog files to store values that are greater than 1 MB; the memtables act as the WAL and are removed as soon as the data reaches disk.

I would recommend migrating over to badger v3. v2 and v3 are incompatible, but you can do a backup/restore to migrate.

@jarifibrahim

@RubenKelevra are you able to test by setting value_threshold in the badger config to something like 9999?

Doing this will reduce the number of vlog files that are generated. This would also help if you cannot migrate to badger v3 right now.

Set the value_threshold to 1 MB.
You would still have vlog files because of the WAL, but they should get cleaned up faster because they will contain very little useful data.
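
In badger's Go options this maps to Options.ValueThreshold; a sketch of the equivalent setting, assuming badger v2 (the path is a placeholder, and the value is the one floated earlier in this thread rather than an official recommendation):

package main

import (
    "log"

    badger "github.com/dgraph-io/badger/v2"
)

func main() {
    // Values smaller than ValueThreshold are stored inline in the LSM tree
    // instead of the value log, so raising it keeps most of the (small)
    // cluster state out of the vlog files entirely.
    opts := badger.DefaultOptions(".ipfs-cluster/badger").
        WithValueThreshold(9999) // badger v3 defaults to 1 MB
    db, err := badger.Open(opts)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
}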

@hsanjuan
Collaborator

hsanjuan commented May 31, 2021

@jarifibrahim do you know why value logs keep growing with otherwise gc-able data when there are not many deletes? I think that most data is add-only and never deleted, so unless I'm missing something, I'm not sure what makes up those 69GB that can just be freed.

Also, can you confirm that by increasing the value_threshold things will improve, as fewer things go onto the value log?

I know we can migrate to v3, but we have no go-datastore wrapper for it yet (whereas we do for leveldb).

@jarifibrahim

@jarifibrahim do you know why value logs keep growing with otherwise gc-able data when there are not many deletes? I think that most data is add-only and never deleted, so unless I'm missing something, I'm not sure what makes up those 69GB that can just be freed.

The vlog file is also the write-ahead log. You might have a lot of old vlog files which backup+restore cleaned up.
Here's how vlog GC works

  1. Pick a candidate vlog file
  2. Sample 10% of the file and calculate the amount of data that is useless
  3. If the amount of useless data found in step 2 is greater than vlogGCThreshold then GC the current file and move the valid data to a new file.

Step 2 might not find useless data easily if you have only added data or your adds and deletions are interleaved (which means the sampling will find all valid data or all stale data).
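
In code, the vlogGCThreshold above corresponds to the discardRatio argument of RunValueLogGC; a small sketch of the usual retry-until-ErrNoRewrite pattern, assuming badger v2:

package badgerutil

import (
    badger "github.com/dgraph-io/badger/v2"
)

// GCValueLog runs value-log GC passes until badger reports that no file was
// rewritten. discardRatio is the fraction of sampled data that must look
// stale before a vlog file is rewritten (the vlogGCThreshold described above).
func GCValueLog(db *badger.DB, discardRatio float64) error {
    for {
        err := db.RunValueLogGC(discardRatio)
        if err == badger.ErrNoRewrite {
            return nil // nothing (more) worth rewriting right now
        }
        if err != nil {
            return err
        }
        // A file was reclaimed; loop again, there may be more candidates.
    }
}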

Also, can you confirm that by increasing the value_threshold things will improve, as fewer things go onto the value log?

It should improve things, but you would still have some vlog files lying around. We use 1 MB as the default in badger master: https://github.com/dgraph-io/badger/blob/74ade987faa5561e4704ce568f1b265d168b3e95/options.go#L182

I know we can migrate to v3, but we have no go-datastore wrapper for it yet (whereas we do for leveldb).

Is this something @RubenKelevra would be able to help with?

@RubenKelevra
Collaborator Author

@jarifibrahim if I understand you right, we could reduce the threshold to rewrite those files more often.

Can we increase the sample size from 10% to 100%?

Performance isn't a key factor in this application, since the daemon is async to latency-critical operations. :)

@jarifibrahim

jarifibrahim commented Jun 3, 2021

Can we increase the sample size from 10% to 100%?

@RubenKelevra This would make GC very very slow. If we're going to sample 100% of the data, we're better off GC-ing the file directly instead of sampling it.

However, we could expose an option to set the sample size if that would help you.

Note - v3 gc doesn't do sampling. It wouldn't have this problem.

@RubenKelevra
Collaborator Author

Well, at least in my case I could basically "shut down" the database for a while, say after midnight when I know there isn't any new data going to be added, and let the GC run once.

This would be better than shutting down the daemon, since then all the network connections would have to be re-established and the current status of the other peers exchanged again.

If sampling is slow, a forced GC of the whole database is maybe the better option in this case. :)

V3 sounds nice btw, we have to look into it.

hsanjuan added a commit that referenced this issue Jun 9, 2021
Badger can take 1000x the amount of needed space if not GC'ed or compacted
(#1320), even for non-heavy usage. Cluster has no provisions to run datastore
GC operations and, while they could be added, they are not guaranteed to
help. Improvements in Badger v3 might help but would still need to GC
explicitly.

Cluster was however designed to support any go-datastore as backend.

This commit adds LevelDB support. The LevelDB go-datastore wrapper is mature, does
not need GC and should work well for most cluster use cases, which are not
overly demanding.

A new `--datastore` flag has been added on init. The store backend is selected
based on the value in the configuration, similar to how raft/crdt is. The
default is set to leveldb. From now on it should be easier to add additional
backends, e.g. badger v3.
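
For reference, the go-datastore wrapper the commit refers to is go-ds-leveldb; a minimal sketch of opening it (the path is a placeholder, and nil options fall back to the wrapper's defaults):

package main

import (
    "log"

    ds "github.com/ipfs/go-datastore"
    leveldb "github.com/ipfs/go-ds-leveldb"
)

func main() {
    // LevelDB needs no explicit GC pass; compaction happens in the background.
    store, err := leveldb.NewDatastore(".ipfs-cluster/leveldb", nil)
    if err != nil {
        log.Fatal(err)
    }
    defer store.Close()

    // The wrapper satisfies the standard go-datastore interface used by cluster.
    var _ ds.Datastore = store
}
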
@jarifibrahim

jarifibrahim commented Jun 13, 2021

Hey @RubenKelevra, I no longer have access to the dgraph-io/badger repository. You'll have to tag someone else from dgraph to help fix this.

Well, at least in my case I could basically "shut down" the database for a while, say after midnight when I know there isn't any new data going to be added and let the GC run once.

The DB has a StreamDB API that allows you to stream the DB and create a fresh one. This would be better than running GC, since streaming the data removes all stale data (and does so very quickly).
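
A sketch of what that stream-to-a-fresh-copy step could look like, assuming the StreamDB(outOptions) helper exposed by recent badger releases (v3 here; the directory names are placeholders):

package main

import (
    "log"

    badger "github.com/dgraph-io/badger/v3"
)

func main() {
    // Open the existing, bloated datastore.
    src, err := badger.Open(badger.DefaultOptions("badger-old"))
    if err != nil {
        log.Fatal(err)
    }
    defer src.Close()

    // Stream every live key-value into a brand-new directory. Stale versions
    // and old value-log files are simply not carried over.
    if err := src.StreamDB(badger.DefaultOptions("badger-new")); err != nil {
        log.Fatal(err)
    }
}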

@RubenKelevra
Collaborator Author

Hey @RubenKelevra, I no longer have access to the dgraph-io/badger repository. You'll have to tag someone else from dgraph to help fix this.

Oh, sorry to hear that. Hope you're all right.

Thanks for all the fish!

@jarifibrahim

jarifibrahim commented Jun 13, 2021

Thanks @RubenKelevra. I am happy to help debug any issues you encounter with Badger DB.

@hsanjuan hsanjuan added this to the Release v0.13.4 milestone Jun 28, 2021
@hsanjuan hsanjuan self-assigned this Jun 28, 2021
hsanjuan added a commit that referenced this issue Jul 1, 2021
Fix #1320: Add automatic GC to Badger datastore
@RubenKelevra
Collaborator Author

@hsanjuan great! It reduced my space consumption from 19.3 GB to 0.7 GB :)
