Badger Allocates A Lot Of Memory When Iterating Over Large Key Value Stores #1326

@bonedaddy

Description

What version of Go are you using (go version)?

go version go1.14.2 linux/amd64

What operating system are you using?

NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

What version of Badger are you using?

v2.0.3

Does this issue reproduce with the latest master?

Haven't tried

Steps to Reproduce the issue

  1. Store a ton of data in your key-value store (in this case 1.7TB)
  2. Restart badger
  3. After service startup, iterate over all keys in the key-value store

What Badger options were set?

Default options with the following modifications:

	DefaultOptions = Options{
		GcDiscardRatio: 0.2,
		GcInterval:     15 * time.Minute,
		GcSleep:        10 * time.Second,
		Options:        badger.DefaultOptions(""),
	}
	DefaultOptions.Options.CompactL0OnClose = false
	DefaultOptions.Options.Truncate = true

I've also set the following:

  • ValueLogLoadingMode = FileIO
  • TableLoadingMode = FileIO
  • SyncWrites = false
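
For reference, the options above can be expressed with badger v2's option builders. This is a sketch of the equivalent setup, not my exact code (the `openDB` name and directory parameter are illustrative; the `GcDiscardRatio`/`GcInterval`/`GcSleep` fields belong to my wrapper's `Options` struct, not to badger itself):

```go
package main

import (
	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
)

// openDB opens a badger store with the modified options described above.
func openDB(dir string) (*badger.DB, error) {
	opts := badger.DefaultOptions(dir).
		WithCompactL0OnClose(false).
		WithTruncate(true).
		// FileIO instead of the default memory map, to reduce memory usage.
		WithValueLogLoadingMode(options.FileIO).
		WithTableLoadingMode(options.FileIO).
		WithSyncWrites(false)
	return badger.Open(opts)
}
```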

What did you do?

At the start of my service, the key-value store is iterated over to announce its contents to peers. However, when the store holds a large amount of data (1.7TB in this case), iterating over it allocates a large amount of memory.
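The iteration follows the standard badger pattern. A minimal sketch of what the startup scan does (the `announceKeys` name is hypothetical; my actual code is in `go-datastores/badger`'s `(*txn).query`), with value prefetching disabled since only keys are broadcast:

```go
package main

import (
	"fmt"

	badger "github.com/dgraph-io/badger/v2"
)

// announceKeys iterates over every key in the store so it can be
// broadcast to peers. Values are never read, only keys.
func announceKeys(db *badger.DB) error {
	return db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false // keys only; avoid loading values into memory
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			// KeyCopy because Item().Key() is only valid until Next().
			key := it.Item().KeyCopy(nil)
			fmt.Printf("announcing key %x\n", key)
		}
		return nil
	})
}
```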

What did you expect to see?

Being able to iterate over the keys without allocating a large amount of memory

What did you see instead?

2GB+ of allocations when iterating over all the keys in a large datastore of 1.7TB

Additional Information

I recorded the following profile which shows what's responsible for the memory allocations:

 2239.12MB 57.90% 57.90%  2239.12MB 57.90%  github.com/RTradeLtd/go-datastores/badger.(*txn).query
  687.09MB 17.77% 75.66%   687.09MB 17.77%  github.com/dgraph-io/badger/v2/table.(*Table).read
  513.05MB 13.27% 88.93%  1139.44MB 29.46%  github.com/RTradeLtd/go-datastores/badger.(*txn).query.func1
   83.20MB  2.15% 91.08%    83.20MB  2.15%  github.com/dgraph-io/badger/v2/skl.newArena
   69.16MB  1.79% 92.87%   109.17MB  2.82%  github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
      40MB  1.03% 93.90%       40MB  1.03%  github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal

It looks like this is because I have a function that iterates over all the keys in the key-value store to broadcast them to another peer. I'm not sure why this would result in such a massive amount of memory being allocated, though.

This seems somewhat related to other reported issues such as #1268. Using FileIO for the table and value log loading modes decreases memory usage a bit, but the overall process of reading keys and/or values from badger still seems to require a lot of memory.

Labels

  • area/performance — Performance related issues.
  • kind/enhancement — Something could be better.
  • priority/P2 — Somehow important but would not block a release.
  • status/accepted — We accept to investigate or work on it.