etcdutl snapshot restore cannot allocate memory #16052
Hi @fatihusta - Can you please confirm: are you running the …
Hi @jmhbnz, I suspected the boltdb initial memory map size.
@fatihusta What's the DB size? Have you verified that setting a small value (what value?) for …
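For context: etcdutl restores the snapshot through the etcd server's bbolt backend, which on 64-bit systems requests a 10 GiB initial mmap by default (the 0x280000000 argument visible in the backend.newBackend frame of the stack trace below). A minimal Go sketch of the underlying bbolt option, assuming go.etcd.io/bbolt; the path and the 100 MiB value are illustrative:

package main

import (
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// InitialMmapSize is the number of bytes bbolt maps up front.
	// On hosts that cannot satisfy a very large mapping, mmap(2)
	// fails with ENOMEM ("cannot allocate memory") and etcd's
	// backend panics with "failed to open database".
	opts := &bolt.Options{InitialMmapSize: 100 * 1024 * 1024} // 100 MiB, illustrative
	db, err := bolt.Open("snap.db", 0o600, opts)
	if err != nil {
		log.Fatalf("cannot open database: %v", err)
	}
	defer db.Close()
}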
@ahrtr I tested it and it worked. I'm using podman on macOS (M1).
podman machine list
NAME VM TYPE CREATED LAST UP CPUS MEMORY DISK SIZE
podman-machine-default* qemu 7 weeks ago Currently running 5 10.74GB 107.4GB
Here is the full output with and without --initial-memory-map-size.
bash test_original.sh
du -sh snap.db
24K snap.db
rm -rf /tmp/etcd
ETCDCTL_API=3 etcdutl snapshot restore snap.db \
--skip-hash-check=true \
--data-dir=/tmp/etcd \
--name=etc02 \
--initial-cluster=etc01=http://10.88.0.19:2380,etc03=http://10.88.0.21:2380,etc02=http://10.88.0.20:2380 \
--initial-advertise-peer-urls=http://10.88.0.20:2380
# Stack trace
2023-07-19T12:49:24Z info snapshot/v3_snapshot.go:248 restoring snapshot {"path": "snap.db", "wal-dir": "/tmp/etcd/member/wal", "data-dir": "/tmp/etcd", "snap-dir": "/tmp/etcd/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\tgo.etcd.io/etcd/etcdutl/v3/snapshot/v3_snapshot.go:254\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\tgo.etcd.io/etcd/etcdutl/v3/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.snapshotRestoreCommandFunc\n\tgo.etcd.io/etcd/etcdutl/v3/etcdutl/snapshot_command.go:117\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:897\nmain.Start\n\tgo.etcd.io/etcd/etcdutl/v3/ctl.go:50\nmain.main\n\tgo.etcd.io/etcd/etcdutl/v3/main.go:23\nruntime.main\n\truntime/proc.go:255"}
panic: failed to open database
goroutine 1 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0000ec6c0, {0xc00044c100, 0x2, 0x2})
go.uber.org/zap@v1.17.0/zapcore/entry.go:234 +0x499
go.uber.org/zap.(*Logger).Panic(0xc0000343f0, {0xb8395b, 0x11a4fa0}, {0xc00044c100, 0x2, 0x2})
go.uber.org/zap@v1.17.0/logger.go:227 +0x59
go.etcd.io/etcd/server/v3/mvcc/backend.newBackend({{0xc0000343f0, 0x18}, 0x5f5e100, 0x2710, {0x0, 0x0}, 0x280000000, 0xc000110cd0, 0x0, 0x0, ...})
go.etcd.io/etcd/server/v3@v3.5.7/mvcc/backend/backend.go:189 +0x43f
go.etcd.io/etcd/server/v3/mvcc/backend.NewDefaultBackend(...)
go.etcd.io/etcd/server/v3@v3.5.7/mvcc/backend/backend.go:169
go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).saveDB(0xc0005dfc50)
go.etcd.io/etcd/etcdutl/v3/snapshot/v3_snapshot.go:295 +0x198
go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore(0xc0005dfc50, {{0x4000800813, 0x7}, {0x400080084e, 0x5}, {0x400080083d, 0x9}, {0xc000034180, 0x14}, {0xc000079810, ...}, ...})
go.etcd.io/etcd/etcdutl/v3/snapshot/v3_snapshot.go:257 +0x8f9
go.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc({0x4000800866, 0x56}, {0xb7b8cb, 0xc}, {0x400080083d, 0x5355d1}, {0x0, 0x0}, {0x40008008db, 0x16}, ...)
go.etcd.io/etcd/etcdutl/v3/etcdutl/snapshot_command.go:147 +0x365
go.etcd.io/etcd/etcdutl/v3/etcdutl.snapshotRestoreCommandFunc(0xc000445400, {0xc00007f320, 0x6, 0x6})
go.etcd.io/etcd/etcdutl/v3/etcdutl/snapshot_command.go:117 +0xed
github.com/spf13/cobra.(*Command).execute(0xc000445400, {0xc00007f2c0, 0x6, 0x6})
github.com/spf13/cobra@v1.1.3/command.go:856 +0x5f8
github.com/spf13/cobra.(*Command).ExecuteC(0x1165a20)
github.com/spf13/cobra@v1.1.3/command.go:960 +0x3ad
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.1.3/command.go:897
main.Start(...)
go.etcd.io/etcd/etcdutl/v3/ctl.go:50
main.main()
go.etcd.io/etcd/etcdutl/v3/main.go:23 +0x5e
bash test_patched.sh
du -sh snap.db
24K snap.db
rm -rf /tmp/etcd
ETCDCTL_API=3 ./etcdutl snapshot restore snap.db \
--skip-hash-check=true \
--data-dir=/tmp/etcd \
--name=etc02 \
--initial-advertise-peer-urls=http://10.88.0.20:2380 \
--initial-cluster=etc01=http://10.88.0.19:2380,etc03=http://10.88.0.21:2380,etc02=http://10.88.0.20:2380 \
--initial-memory-map-size=104857600 # 100 MiB
2023-07-19T12:49:30Z info snapshot/v3_snapshot.go:263 restoring snapshot {"path": "snap.db", "wal-dir": "/tmp/etcd/member/wal", "data-dir": "/tmp/etcd", "snap-dir": "/tmp/etcd/member/snap", "initial-memory-map-size": 104857600}
2023-07-19T12:49:30Z info schema/membership.go:122 Trimming membership information from the backend...
2023-07-19T12:49:30Z info membership/cluster.go:403 added member {"cluster-id": "3f205b6177ed4a5c", "local-member-id": "0", "added-peer-id": "13e536940eb0139f", "added-peer-peer-urls": ["http://10.88.0.21:2380"], "added-peer-is-learner": false}
2023-07-19T12:49:30Z info membership/cluster.go:403 added member {"cluster-id": "3f205b6177ed4a5c", "local-member-id": "0", "added-peer-id": "59263d6b1540b571", "added-peer-peer-urls": ["http://10.88.0.19:2380"], "added-peer-is-learner": false}
2023-07-19T12:49:30Z info membership/cluster.go:403 added member {"cluster-id": "3f205b6177ed4a5c", "local-member-id": "0", "added-peer-id": "c9533c31e73abc10", "added-peer-peer-urls": ["http://10.88.0.20:2380"], "added-peer-is-learner": false}
2023-07-19T12:49:30Z info snapshot/v3_snapshot.go:284 restored snapshot {"path": "snap.db", "wal-dir": "/tmp/etcd/member/wal", "data-dir": "/tmp/etcd", "snap-dir": "/tmp/etcd/member/snap", "initial-memory-map-size": 104857600}
I tried to validate on Linux as you asked. I set a 200 MB memory limit per etcd container. Here is the output.
du -sh snap.db
20K snap.db
rm -rf /tmp/etcd
ETCDCTL_API=3 etcdutl snapshot restore snap.db \
--skip-hash-check=true \
--data-dir=/tmp/etcd \
--name=etc02 \
--initial-cluster=etc01=http://10.88.0.43:2380,etc03=http://10.88.0.42:2380,etc02=http://10.88.0.41:2380 \
--initial-advertise-peer-urls=http://10.88.0.41:2380
# Stack trace
2023-07-19T15:45:16+03:00 info snapshot/v3_snapshot.go:248 restoring snapshot {"path": "snap.db", "wal-dir": "/tmp/etcd/member/wal", "data-dir": "/tmp/etcd", "snap-dir": "/tmp/etcd/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\tgo.etcd.io/etcd/etcdutl/v3/snapshot/v3_snapshot.go:254\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\tgo.etcd.io/etcd/etcdutl/v3/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.snapshotRestoreCommandFunc\n\tgo.etcd.io/etcd/etcdutl/v3/etcdutl/snapshot_command.go:117\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.1.3/command.go:897\nmain.Start\n\tgo.etcd.io/etcd/etcdutl/v3/ctl.go:50\nmain.main\n\tgo.etcd.io/etcd/etcdutl/v3/main.go:23\nruntime.main\n\truntime/proc.go:255"}
2023-07-19T15:45:16+03:00 info membership/store.go:141 Trimming membership information from the backend...
2023-07-19T15:45:16+03:00 info membership/cluster.go:421 added member {"cluster-id": "b8cdf9557d6554c8", "local-member-id": "0", "added-peer-id": "5f49874613b800fe", "added-peer-peer-urls": ["http://10.88.0.42:2380"]}
2023-07-19T15:45:16+03:00 info membership/cluster.go:421 added member {"cluster-id": "b8cdf9557d6554c8", "local-member-id": "0", "added-peer-id": "814cd60c86f73ec0", "added-peer-peer-urls": ["http://10.88.0.43:2380"]}
2023-07-19T15:45:16+03:00 info membership/cluster.go:421 added member {"cluster-id": "b8cdf9557d6554c8", "local-member-id": "0", "added-peer-id": "aebd9e0da60f2a61", "added-peer-peer-urls": ["http://10.88.0.41:2380"]}
2023-07-19T15:45:16+03:00 info snapshot/v3_snapshot.go:269 restored snapshot {"path": "snap.db", "wal-dir": "/tmp/etcd/member/wal", "data-dir": "/tmp/etcd", "snap-dir": "/tmp/etcd/member/snap"}
(Note: this run without the flag completed successfully on Linux, presumably because Linux's default memory overcommit allows the 10 GiB mapping even under the 200 MB container limit.)
bash test_patched.sh
du -sh snap.db
20K snap.db
rm -rf /tmp/etcd
ETCDCTL_API=3 ./etcdutl snapshot restore snap.db \
--skip-hash-check=true \
--data-dir=/tmp/etcd \
--name=etc02 \
--initial-cluster=etc01=http://10.88.0.43:2380,etc03=http://10.88.0.42:2380,etc02=http://10.88.0.41:2380 \
--initial-advertise-peer-urls=http://10.88.0.41:2380 \
--initial-memory-map-size=104857600 # 100 MiB
2023-07-19T15:53:04+03:00 info snapshot/v3_snapshot.go:263 restoring snapshot {"path": "snap.db", "wal-dir": "/tmp/etcd/member/wal", "data-dir": "/tmp/etcd", "snap-dir": "/tmp/etcd/member/snap", "initial-memory-map-size": 104857600}
2023-07-19T15:53:04+03:00 info schema/membership.go:122 Trimming membership information from the backend...
2023-07-19T15:53:04+03:00 info membership/cluster.go:403 added member {"cluster-id": "b8cdf9557d6554c8", "local-member-id": "0", "added-peer-id": "5f49874613b800fe", "added-peer-peer-urls": ["http://10.88.0.42:2380"], "added-peer-is-learner": false}
2023-07-19T15:53:04+03:00 info membership/cluster.go:403 added member {"cluster-id": "b8cdf9557d6554c8", "local-member-id": "0", "added-peer-id": "814cd60c86f73ec0", "added-peer-peer-urls": ["http://10.88.0.43:2380"], "added-peer-is-learner": false}
2023-07-19T15:53:04+03:00 info membership/cluster.go:403 added member {"cluster-id": "b8cdf9557d6554c8", "local-member-id": "0", "added-peer-id": "aebd9e0da60f2a61", "added-peer-peer-urls": ["http://10.88.0.41:2380"], "added-peer-is-learner": false}
2023-07-19T15:53:04+03:00 info snapshot/v3_snapshot.go:284 restored snapshot {"path": "snap.db", "wal-dir": "/tmp/etcd/member/wal", "data-dir": "/tmp/etcd", "snap-dir": "/tmp/etcd/member/snap", "initial-memory-map-size": 104857600}
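For reference, the flag value is in bytes: 104857600 = 100 × 1024 × 1024, i.e. 100 MiB. A hypothetical Go helper (not part of etcdutl) that derives a value for --initial-memory-map-size from the snapshot file size, rounding up to the next 100 MiB for headroom:

package main

import (
	"fmt"
	"os"
)

// mmapSizeFor returns an initial mmap size in bytes sized to the
// snapshot file, rounded up to the next 100 MiB so that small
// snapshots restore even under tight container memory limits.
func mmapSizeFor(snapshotPath string) (int64, error) {
	fi, err := os.Stat(snapshotPath)
	if err != nil {
		return 0, err
	}
	const chunk = int64(100 * 1024 * 1024)
	return (fi.Size()/chunk + 1) * chunk, nil
}

func main() {
	size, err := mmapSizeFor("snap.db")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(size) // pass as --initial-memory-map-size
}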
Please backport the fix to 3.5 and add a changelog entry. Thanks.
/assign ivanvc
@ahrtr: GitHub didn't allow me to assign the following users: ivanvc. Note that only etcd-io members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Discussed during the sig-etcd triage meeting. @ivanvc, the assignment above failed; did you still want to work on this?
@jmhbnz, I didn't get this notification as the mention didn't have the at sign (@) 😅. I can backport this, no problem.
Bug report criteria
What happened?
etcdutl crashes when I restore the db from a snapshot inside a container.
What did you expect to happen?
The snapshot should be restored successfully without a crash.
How can we reproduce it (as minimally and precisely as possible)?
Restore the db in a container that does not have enough memory; see the sketch below.
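A minimal reproduction sketch in the same shell style as the test scripts above, assuming podman and the official etcd image (which ships etcdutl in v3.5+); the image tag, memory limit, and paths are illustrative:

# Cap the container's memory; on hosts where the default 10 GiB
# initial mmap cannot be satisfied, the unpatched restore panics
# with "cannot allocate memory".
podman run --rm --memory=200m -v "$PWD":/work -w /work \
    gcr.io/etcd-development/etcd:v3.5.7 \
    etcdutl snapshot restore snap.db --data-dir=/work/restored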
Anything else we need to know?
No response