
Slow write performance when initializing with many keys #392

Closed
marcus-wishes opened this issue Jan 25, 2023 · 12 comments

@marcus-wishes

Hi,
maybe I am using BoltDB for an unsuitable purpose, but I have the following situation: I need a high-performance key/value store for reading data. The store is initialized with a large number of keys with small payloads (several GB in total) and after initialization is used only in read-only mode (the store is occasionally closed and opened again, but not very frequently).
My problem is that the initialization takes a huge amount of time.
What I tried so far:

  • used the normal Update approach
  • used a caching queue and Batch writing, which helped a bit

I am using both Windows and Linux, and the database files are also copied between different machines running both OSs.

Is there anything I am missing, or is this normal behavior of the tree rebalancing?

@ahrtr
Member

ahrtr commented Jan 30, 2023

What's the value of NoFreelistSync when opening the db? If the freelist isn't synced to disk, it may take a while to load the whole freelist on startup.

Note that only one goroutine can write at a time, so batch writing looks right to me.
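
(Aside: DB.Batch only helps when it is called concurrently from multiple goroutines; it coalesces those calls into a single read-write transaction. A minimal sketch, with illustrative file and bucket names:)

package main

import (
	"encoding/binary"
	"log"
	"sync"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("test.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Batch coalesces concurrent calls into one read-write
	// transaction, amortizing the per-commit fsync; from a single
	// goroutine it behaves like Update plus MaxBatchDelay.
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		i := i
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := db.Batch(func(tx *bolt.Tx) error {
				b, err := tx.CreateBucketIfNotExists([]byte("data"))
				if err != nil {
					return err
				}
				key := make([]byte, 8)
				binary.BigEndian.PutUint64(key, uint64(i))
				return b.Put(key, []byte("value"))
			}); err != nil {
				log.Println(err)
			}
		}()
	}
	wg.Wait()
}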

@ptabor
Contributor

ptabor commented Jan 30, 2023

  • NoFreelistSync=false, as Benjamin wrote, should be a game changer here
  • Please make sure you open the store as ReadOnly
  • Please experiment with bigger pages in such a case, like 64 KB or even more
  • Defragmenting the DB after writing should improve throughput thanks to better read-ahead.

To optimize writing, consider setting a higher 'AllocSize' option to minimize the number of file resizes during writes.
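
(A sketch of the read-side setup, with illustrative values. Note that PageSize is only honored when the file is first created, and, as noted in the next comment, AllocSize is not currently exposed in Options:)

package main

import (
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Serving reads from an already-built file: ReadOnly takes a
	// shared lock and never writes to the file. A larger PageSize
	// must be set by the writer that originally creates the db.
	db, err := bolt.Open("test.db", 0400, &bolt.Options{
		ReadOnly: true,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}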

@ahrtr
Member

ahrtr commented Jan 30, 2023

To optimize writing, consider setting a higher 'AllocSize' option to minimize the number of file resizes during writes.

Unfortunately, we don't expose the AllocSize in Options. It's hard coded to DefaultAllocSize = 16 * 1024 * 1024.

It seems we should add an item to Options.

@marcus-wishes
Author

Thank you guys for the input! I really appreciate it.
I have a test setup within my code base where I initialize a small database, and the results are the following. All on a PCIe 4 NVMe SSD:

PageSize default, NoFreelistSync false - took 321.526453s.
PageSize default, NoFreelistSync true - took 325.458892s.

PageSize 64kb, NoFreelistSync true - took 301.830299s.
PageSize 64kb, NoFreelistSync false - took 282.907058s.

During usage the database is opened in read-only mode.

@ptabor
Contributor

ptabor commented Jan 31, 2023

Interesting. Collecting a stack trace using pprof, or at least an anecdotal stack trace using SIGQUIT, would shed more light on what's taking the time. I think checkBuckets is not being called for RO transactions... and that's the logic I would suspect of causing the proactive scan of the whole tree.

@ahrtr
Member

ahrtr commented Jan 31, 2023

@marcus-wishes

  1. What exact bbolt version are you testing against?
  2. What exact size is the db file?

An e2e test case includes two stages: the first generates the sample db file, the second performs whatever test on that file. The NoFreelistSync used in the first stage is what matters most: if the db file you are testing on doesn't have a synced freelist, the test in the second stage will always scan the whole db to load the freelist.

Please run the following two commands,

$ go run ./cmd/bbolt/ page path-to-db 0
$ go run ./cmd/bbolt/ page path-to-db 1
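
(In the output, a Freelist value of <pgid=18446744073709551615>, the max-uint64 sentinel, means no freelist page was synced to disk.)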

@marcus-wishes
Author

marcus-wishes commented Mar 8, 2023

Sorry for the late response, I moved on to a custom solution, which works well.
I use the current version of bbolt (1.3.7).

When I run the test, I do the following:
I add 1 million keys, using uint64(i) encoded with binary.LittleEndian.PutUint64(keyBuf, i) as keys; values are always a []byte of length 12, but that is just for this test — in the real use case they will be variable-sized.
Because adding single entries took a long time, I added a custom WAL and use:

bucketName := []byte(idx.wal.Bucket())
idx.db.Batch(func(tx *bolt.Tx) error {
	bucket, err := tx.CreateBucketIfNotExists(bucketName)
	if err != nil {
		return packageError(fmt.Sprintf("error synchronizing WAL, unable to create the level bucket %s", bucketName), err)
	}

	// Drain the in-memory WAL into the bucket within one batched
	// transaction.
	for idx.wal.Len() > 0 {
		key, value := idx.wal.Pop()
		if err := bucket.Put(key, value); err != nil {
			return fmt.Errorf("error synchronizing WAL at bucket %s and key %s: %v", bucketName, key, err)
		}
	}
	return nil
})

to write the WAL entries into the DB. The resulting database file is 268,435,456 bytes.

During creation I use:

options := &bolt.Options{
	Timeout:        10 * time.Second,
	NoFreelistSync: true,
	PageSize:       64000, // intended as a 64 KB page size (64 KiB would be 65536)
}

I tried playing around with the PageSize values and also turned NoFreelistSync on and off, but the difference was very small. Filling the database takes around 25 minutes.
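
(One factor worth noting, not raised above: bbolt compares keys bytewise, so little-endian-encoded counters land in scattered positions and split pages across the whole tree, while big-endian-encoded counters insert sequentially at the right edge, which B+ trees handle much more cheaply. A minimal sketch:)

import "encoding/binary"

// bbolt orders keys bytewise, so big-endian counters sort in
// numeric order and each successive Put is a sequential insert
// instead of a random one.
func encodeKey(i uint64) []byte {
	key := make([]byte, 8)
	binary.BigEndian.PutUint64(key, i)
	return key
}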

$ go run ./cmd/bbolt/ testdb 0
Page ID:    0
Page Type:  meta
Total Size: 6400 bytes
Overflow pages: 0
Version:    2
Page Size:  6400 bytes
Flags:      00000000
Root:       <pgid=13437>
Freelist:   <pgid=18446744073709551615>
HWM:        <pgid=13438>
Txn ID:     4
Checksum:   8d76844bbfa04ef7

$ go run ./cmd/bbolt/ testdb 1
Page ID:    1
Page Type:  meta
Total Size: 6400 bytes
Overflow pages: 0
Version:    2
Page Size:  6400 bytes
Flags:      00000000
Root:       <pgid=26871>
Freelist:   <pgid=18446744073709551615>
HWM:        <pgid=26872>
Txn ID:     5
Checksum:   4317cb8880925ada

Maybe it is just not a good use case. And sorry for the minimalistic description; I have a lot going on at the moment.

@cenkalti
Member

This is a known issue. bbolt gets very slow when inserting many keys in a single transaction.

Code for demonstrating the issue: f75d1e0

Output:

% go run ./cmd/bbolt-init/main.go -v 100
putting 1 keys... took: 0s db-size: 0 MB
putting 2 keys... took: 0s db-size: 0 MB
putting 4 keys... took: 0s db-size: 0 MB
putting 8 keys... took: 0s db-size: 0 MB
putting 16 keys... took: 0s db-size: 0 MB
putting 32 keys... took: 0s db-size: 0 MB
putting 64 keys... took: 0s db-size: 0 MB
putting 128 keys... took: 0s db-size: 0 MB
putting 256 keys... took: 0s db-size: 0 MB
putting 512 keys... took: 0s db-size: 0 MB
putting 1024 keys... took: 0s db-size: 0 MB
putting 2048 keys... took: 0s db-size: 1 MB
putting 4096 keys... took: 0s db-size: 2 MB
putting 8192 keys... took: 0s db-size: 4 MB
putting 16384 keys... took: 0s db-size: 8 MB
putting 32768 keys... took: 0s db-size: 8 MB
putting 65536 keys... took: 0s db-size: 16 MB
putting 131072 keys... took: 0s db-size: 47 MB
putting 262144 keys... took: 0s db-size: 79 MB
putting 524288 keys... took: 0s db-size: 142 MB
putting 1048576 keys... took: 1s db-size: 268 MB
putting 2097152 keys... took: 3s db-size: 521 MB
putting 4194304 keys... took: 7s db-size: 1027 MB
putting 8388608 keys... took: 29s db-size: 2038 MB
putting 16777216 keys... took: 1m28s db-size: 4060 MB
putting 33554432 keys... ^Csignal: interrupt

As you can see, the numbers do not grow linearly with the key count. IIRC there is some operation in the B+ tree that shifts a large slice every time a page is written to disk. I need to capture a pprof profile to make sure.

Until this gets fixed, you can commit the transaction intermittently as a workaround.

Examples:
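
(A minimal sketch of the intermittent-commit workaround; the bucket name, chunk size, and key encoding are illustrative:)

package main

import (
	"encoding/binary"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("test.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const total, chunk = 1 << 20, 10000
	for start := uint64(0); start < total; start += chunk {
		// Commit every `chunk` keys so a single transaction's
		// dirty-page set never grows unboundedly.
		err := db.Update(func(tx *bolt.Tx) error {
			b, err := tx.CreateBucketIfNotExists([]byte("data"))
			if err != nil {
				return err
			}
			for i := start; i < start+chunk && i < total; i++ {
				key := make([]byte, 8)
				binary.BigEndian.PutUint64(key, i)
				if err := b.Put(key, []byte("value")); err != nil {
					return err
				}
			}
			return nil
		})
		if err != nil {
			log.Fatal(err)
		}
	}
}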

@cenkalti
Member

[flame graph image: profile of the bulk-insert run]

@cenkalti
Member

It can be seen from the above flame graph that most of the time is spent in the DB.mmap method, as bbolt needs to dereference all inodes before growing the file. Setting InitialMmapSize large enough (I set it to 10 GB in this case) makes the picture much better:

% go run ./cmd/bbolt-init/main.go -v 100
putting 1 keys... took: 0s db-size: 16 MB
putting 2 keys... took: 0s db-size: 16 MB
putting 4 keys... took: 0s db-size: 16 MB
putting 8 keys... took: 0s db-size: 16 MB
putting 16 keys... took: 0s db-size: 16 MB
putting 32 keys... took: 0s db-size: 16 MB
putting 64 keys... took: 0s db-size: 16 MB
putting 128 keys... took: 0s db-size: 16 MB
putting 256 keys... took: 0s db-size: 16 MB
putting 512 keys... took: 0s db-size: 16 MB
putting 1024 keys... took: 0s db-size: 16 MB
putting 2048 keys... took: 0s db-size: 16 MB
putting 4096 keys... took: 0s db-size: 17 MB
putting 8192 keys... took: 0s db-size: 18 MB
putting 16384 keys... took: 0s db-size: 20 MB
putting 32768 keys... took: 0s db-size: 24 MB
putting 65536 keys... took: 0s db-size: 31 MB
putting 131072 keys... took: 0s db-size: 47 MB
putting 262144 keys... took: 0s db-size: 79 MB
putting 524288 keys... took: 0s db-size: 142 MB
putting 1048576 keys... took: 0s db-size: 268 MB
putting 2097152 keys... took: 1s db-size: 521 MB
putting 4194304 keys... took: 2s db-size: 1027 MB
putting 8388608 keys... took: 7s db-size: 2038 MB
putting 16777216 keys... took: 17s db-size: 4060 MB
putting 33554432 keys... took: 48s db-size: 8105 MB

Still not ideal, but better than the previous results.
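
(A sketch of the corresponding option, with an illustrative path; InitialMmapSize reserves address space, not disk:)

package main

import (
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Pre-size the mmap so the file can grow without remapping,
	// which is what forces bbolt to dereference all inodes above.
	// 10 GB mirrors the experiment; any value comfortably above
	// the expected final db size works (64-bit platforms).
	db, err := bolt.Open("test.db", 0600, &bolt.Options{
		InitialMmapSize: 10 << 30, // 10 GB
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}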

@cenkalti
Member

Without mmap, the flame graph looks like this:
[flame graph image]

@cenkalti
Member

@marcus-wishes You can try setting InitialMmapSize and committing the transaction intermittently to prevent it from growing too large.

@ahrtr @ptabor We can document it in README and close this issue. What do you think?

@cenkalti cenkalti self-assigned this May 18, 2023
@github-actions github-actions bot added the stale label Apr 18, 2024
@github-actions github-actions bot closed this as not planned (stale) May 10, 2024