
very large database #1320

Closed
RubenKelevra opened this issue Mar 7, 2021 · 21 comments
Assignees
Labels
  • effort/days: Estimated to take multiple days, but less than a week
  • exp/wizard: Extensive knowledge (implications, ramifications) required
  • kind/bug: A bug in existing code (including security flaws)
  • kind/enhancement: A net-new feature or improvement to an existing feature
  • P1: High: Likely tackled by core team if no one steps up
  • status/ready: Ready to be worked

Comments

@RubenKelevra
Collaborator

Additional information:

  • OS: Linux
  • IPFS Cluster version: 0.13.1
  • Installation method: dist.ipfs.io

Describe the bug:

I think the garbage collection in the database is broken/turned off. Otherwise, this database size is quite excessive for ~50k changes on a cluster.

$ du -hc .ipfs-cluster
9,1G    .ipfs-cluster/badger
9,1G    .ipfs-cluster
9,1G    total
@RubenKelevra added the kind/bug and need/triage labels on Mar 7, 2021
@hsanjuan added the exp/wizard, effort/days, kind/enhancement, P1, and status/ready labels and removed the need/triage label on Mar 9, 2021
@hsanjuan
Collaborator

hsanjuan commented Mar 9, 2021

Yes, it is. I observe a similar thing. I don't think the actual data stored is that large; I wonder if triggering a compaction would fix things.

@RubenKelevra
Collaborator Author

I wonder how this is triggered in the first place. I mean, we just write the pinset to the cluster, right? In this case, we shouldn't really discard anything from the database, so a compaction should not have to clean up anything - I suppose?

@hsanjuan
Collaborator

Depends. Pins that are removed from the cluster are also removed from the pins state (which is separate from the consensus state). The crdt part is mostly write-only, except for the heads. When a head is replaced by a new one, the old one is deleted. And then, I don't know what sort of accounting badger is carrying internally and how it handles tables, etc.

In general I'm growing very weary of badger.

As I said, triggering a compaction on open would give us a first clue as to whether there is indeed a lot of trash.
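
For reference, a minimal sketch of what triggering compaction on open could look like with badger v2's Go API; the worker count, discard ratio, and path are illustrative, not values from this thread:

package main

import (
    "log"

    badger "github.com/dgraph-io/badger/v2"
)

// openCompacted opens the datastore, forces an LSM compaction, and then
// rewrites value-log files until badger reports nothing left to reclaim.
func openCompacted(dir string) (*badger.DB, error) {
    db, err := badger.Open(badger.DefaultOptions(dir))
    if err != nil {
        return nil, err
    }
    if err := db.Flatten(2); err != nil { // compact all LSM levels with 2 workers
        db.Close()
        return nil, err
    }
    for {
        err := db.RunValueLogGC(0.5) // rewrite vlog files that look >=50% stale
        if err == badger.ErrNoRewrite {
            break // nothing (more) worth rewriting
        }
        if err != nil {
            db.Close()
            return nil, err
        }
    }
    return db, nil
}

func main() {
    db, err := openCompacted(".ipfs-cluster/badger")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
}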

@RubenKelevra
Collaborator Author

Maybe not on startup; that already takes quite a while. Compaction should be possible at any time, no? So why not just do it randomly every 2 hours or so? :)

@hsanjuan
Collaborator

Ah yeah, I was not suggesting it as a permanent solution, just mentioning it for testing. But anyway, we cannot run compaction regularly, as it essentially makes everything crawl to a stop, and that is not something that should happen randomly. Afaik, Filecoin compacts on start.

@hsanjuan
Collaborator

I have checked one of our cluster nodes with 160k pins and 70GB of badger datastore. The badger tool gave the following:

[Summary]
Level 0 size:          0 B
Level 1 size:       154 MB
Level 2 size:       207 MB
Total index size:   362 MB
Value log size:      74 GB

That is 74GB of value logs. After backing up and restoring using the badger tool:

[Summary]
Level 0 size:          0 B
Level 1 size:       116 MB
Total index size:   116 MB
Value log size:     238 MB

I have no idea how badger manages to put so much trash on disk. There are effectively no Deletes happening on this cluster.
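
For completeness, the same backup/restore round trip can also be done from Go rather than with the badger CLI tool; a rough sketch assuming badger v2 (directory and file names are placeholders):

package main

import (
    "log"
    "os"

    badger "github.com/dgraph-io/badger/v2"
)

func main() {
    // Dump the live contents of the bloated datastore to a backup file.
    src, err := badger.Open(badger.DefaultOptions("badger-old"))
    if err != nil {
        log.Fatal(err)
    }
    out, err := os.Create("cluster.backup")
    if err != nil {
        log.Fatal(err)
    }
    if _, err := src.Backup(out, 0); err != nil { // since=0 means a full backup
        log.Fatal(err)
    }
    out.Close()
    src.Close()

    // Restore into a fresh directory; only live data gets written back,
    // which is what shrank 74 GB of value logs down to a few hundred MB here.
    dst, err := badger.Open(badger.DefaultOptions("badger-new"))
    if err != nil {
        log.Fatal(err)
    }
    defer dst.Close()
    in, err := os.Open("cluster.backup")
    if err != nil {
        log.Fatal(err)
    }
    defer in.Close()
    if err := dst.Load(in, 16); err != nil { // 16 max pending writes, arbitrary
        log.Fatal(err)
    }
}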

@hsanjuan
Collaborator

@RubenKelevra are you able to test by setting value_threshold in the badger config to something like 9999?

@RubenKelevra
Collaborator Author

Yeah sure

@RubenKelevra
Collaborator Author

Settings previously:

  "datastore": {
    "badger": {
      "badger_options": {
        "dir": "",
        "value_dir": "",
        "sync_writes": true,
        "table_loading_mode": 2,
        "value_log_loading_mode": 0,
        "num_versions_to_keep": 1,
        "max_table_size": 67108864,
        "level_size_multiplier": 10,
        "max_levels": 7,
        "value_threshold": 32,
        "num_memtables": 5,
        "num_level_zero_tables": 5,
        "num_level_zero_tables_stall": 10,
        "level_one_size": 268435456,
        "value_log_file_size": 1073741823,
        "value_log_max_entries": 1000000,
        "num_compactors": 2,
        "compact_l_0_on_close": true,
        "read_only": false,
        "truncate": true
      } 
    }   
  }   

Size of .ipfs-cluster: 14 GB

@RubenKelevra
Collaborator Author

Hey @jarifibrahim mind having a look at this?

@jarifibrahim

Value log size: 74 GB

This is a known issue with Badger v1 and v2. The value log GC wasn't very effective at cleaning up the vlog files, and for this exact reason we have badger v3.

Badger v1 and v2 use vlog (value log) files as the Write Ahead Log (WAL) and also store big values in those files. The vlog files keep accumulating, and the vlog GC isn't able to clean them up fast enough.

Badger v3 only uses vlog files to store values that are greater than 1 MB; the memtables act as the WAL and are removed as soon as the data reaches disk.

I would recommend migrating over to badger v3. v2 and v3 are incompatible, but you can do a backup/restore to migrate.

@jarifibrahim

@RubenKelevra are you able to test by setting value_threshold in the badger config to something like 9999?

Doing this will reduce the number of vlog files that are generated. This would also help if you cannot migrate to badger v3 right now.

Set the value_threshold to 1 MB.
You would still have vlog files because of the WAL, but they should get cleaned up faster because they will contain very little useful data.
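
In badger's Go options this maps to Options.ValueThreshold; a sketch of the equivalent setting, assuming badger v2 (the path is a placeholder, and the value is the one floated earlier in this thread rather than an official recommendation):

package main

import (
    "log"

    badger "github.com/dgraph-io/badger/v2"
)

func main() {
    // Values smaller than ValueThreshold are stored inline in the LSM tree
    // instead of the value log, so raising it keeps most of the (small)
    // cluster state out of the vlog files entirely.
    opts := badger.DefaultOptions(".ipfs-cluster/badger").
        WithValueThreshold(9999) // badger v3 defaults to 1 MB
    db, err := badger.Open(opts)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
}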

@hsanjuan
Collaborator

hsanjuan commented May 31, 2021

@jarifibrahim do you know why value logs keep growing with otherwise gc-able data when there are not many deletes? I think that most data is add-only and never deleted, so unless I'm missing something, I'm not sure what makes up those 69GB that can just be freed.

Also, can you confirm that by increasing the value_threshold things will improve, as fewer things go onto the value log?

I know we can migrate to v3, but we have no go-datastore wrapper for it yet (whereas we do for leveldb).

@jarifibrahim

@jarifibrahim do you know why value logs keep growing with otherwise gc-able data when there are not many deletes? I think that most data is add-only and never deleted, so unless I'm missing something, I'm not sure what makes up those 69GB that can just be freed.

The vlog file is also the write-ahead log. You might have a lot of old vlog files which backup+restore cleaned up.
Here's how vlog GC works

  1. Pick a candidate vlog file
  2. Sample 10% of the file and calculate the amount of data that is useless
  3. If the amount of useless data found in step 2 is greater than vlogGCThreshold then GC the current file and move the valid data to a new file.

Step 2 might not find useless data easily if you have only added data or your adds and deletions are interleaved (which means the sampling will find all valid data or all stale data).
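
In code, the vlogGCThreshold above corresponds to the discardRatio argument of RunValueLogGC; a small sketch of the usual retry-until-ErrNoRewrite pattern, assuming badger v2:

package badgerutil

import (
    badger "github.com/dgraph-io/badger/v2"
)

// GCValueLog runs value-log GC passes until badger reports that no file was
// rewritten. discardRatio is the fraction of sampled data that must look
// stale before a vlog file is rewritten (the vlogGCThreshold described above).
func GCValueLog(db *badger.DB, discardRatio float64) error {
    for {
        err := db.RunValueLogGC(discardRatio)
        if err == badger.ErrNoRewrite {
            return nil // nothing (more) worth rewriting right now
        }
        if err != nil {
            return err
        }
        // A file was reclaimed; loop again, there may be more candidates.
    }
}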

Also, can you confirm that by increasing the value_threshold things will improve, as fewer things go onto the value log?

It should improve things, but you would still have some vlog files lying around. We use 1 MB as the default in badger master: https://github.com/dgraph-io/badger/blob/74ade987faa5561e4704ce568f1b265d168b3e95/options.go#L182

I know we can migrate to v3, but we have no go-datastore wrapper for it yet (whereas we do for leveldb).

Is this something @RubenKelevra would be able to help with?

@RubenKelevra
Collaborator Author

@jarifibrahim if I understand you right, we could reduce the threshold to rewrite those files more often.

Can we increase the sample size from 10% to 100%?

Performance isn't a key factor in this application, since the daemon is async to latency-critical operations. :)

@jarifibrahim

jarifibrahim commented Jun 3, 2021

Can we increase the sample size from 10% to 100%?

@RubenKelevra This would make GC very very slow. If we're going to sample 100% of the data, we're better off GC-ing the file directly instead of sampling it.

However, we could expose an option to set the sample size if that would help you.

Note - v3 gc doesn't do sampling. It wouldn't have this problem.

@RubenKelevra
Collaborator Author

Well, at least in my case I could basically "shut down" the database for a while, say after midnight when I know there isn't any new data going to be added, and let the GC run once.

This would be better than shutting down the daemon, since then all the network connections would have to be re-established and the current status of the other peers exchanged again.

If sampling is slow, a forced GC of the whole database is maybe the better option in this case. :)

V3 sounds nice btw, we have to look into it.

hsanjuan added a commit that referenced this issue Jun 9, 2021
Badger can take 1000x the amount of needed space if not GC'ed or compacted
(#1320), even for non-heavy usage. Cluster has no provisions to run datastore
GC operations and, while they could be added, they are not guaranteed to
help. Improvements in Badger v3 might help but would still need to GC
explicitly.

Cluster was however designed to support any go-datastore as backend.

This commit adds LevelDB support. The LevelDB go-datastore wrapper is mature, does
not need GC and should work well for most cluster use cases, which are not
overly demanding.

A new `--datastore` flag has been added on init. The store backend is selected
based on the value in the configuration, similar to how raft/crdt is. The
default is set to leveldb. From now on it should be easier to add additional
backends, e.g. badger v3.
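
For reference, the go-datastore wrapper the commit refers to is go-ds-leveldb; a minimal sketch of opening it (the path is a placeholder, and nil options fall back to the wrapper's defaults):

package main

import (
    "log"

    ds "github.com/ipfs/go-datastore"
    leveldb "github.com/ipfs/go-ds-leveldb"
)

func main() {
    // LevelDB needs no explicit GC pass; compaction happens in the background.
    store, err := leveldb.NewDatastore(".ipfs-cluster/leveldb", nil)
    if err != nil {
        log.Fatal(err)
    }
    defer store.Close()

    // The wrapper satisfies the standard go-datastore interface used by cluster.
    var _ ds.Datastore = store
}
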
@jarifibrahim

jarifibrahim commented Jun 13, 2021

Hey @RubenKelevra, I no longer have access to the dgraph-io/badger repository. You'll have to tag someone else from dgraph to help fix this.

Well, at least in my case I could basically "shut down" the database for a while, say after midnight when I know there isn't any new data going to be added and let the GC run once.

The DB has a StreamDB API that allows you to stream the DB and create a fresh one. This would be better than running GC, since streaming the data removes all stale data (and does so very quickly).
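
A sketch of what that stream-to-a-fresh-copy step could look like, assuming the StreamDB(outOptions) helper exposed by recent badger releases (v3 here; the directory names are placeholders):

package main

import (
    "log"

    badger "github.com/dgraph-io/badger/v3"
)

func main() {
    // Open the existing, bloated datastore.
    src, err := badger.Open(badger.DefaultOptions("badger-old"))
    if err != nil {
        log.Fatal(err)
    }
    defer src.Close()

    // Stream every live key-value into a brand-new directory. Stale versions
    // and old value-log files are simply not carried over.
    if err := src.StreamDB(badger.DefaultOptions("badger-new")); err != nil {
        log.Fatal(err)
    }
}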

@RubenKelevra
Collaborator Author

Hey @RubenKelevra, I no longer have access to the dgraph-io/badger repository. You'll have to tag someone else from dgraph to help fix this.

Oh, sorry to hear that. Hope you're all right.

Thanks for all the fish!

@jarifibrahim

jarifibrahim commented Jun 13, 2021

Thanks @RubenKelevra. I am happy to help debug any issues you encounter with Badger DB.

@hsanjuan hsanjuan added this to the Release v0.13.4 milestone Jun 28, 2021
@hsanjuan hsanjuan self-assigned this Jun 28, 2021
hsanjuan added a commit that referenced this issue Jul 1, 2021
Fix #1320: Add automatic GC to Badger datastore
@RubenKelevra
Collaborator Author

@hsanjuan great! It reduced my space consumption from 19.3 GB to 0.7 GB :)
