Failed to open cluster database - Can't start/do anything with LXD #4808

Closed
Julien-Marcou opened this Issue Jul 19, 2018 · 10 comments

4 participants

Julien-Marcou commented Jul 19, 2018

Hello,

I'm trying to run/start LXD, but can't do anything (all commands are executed as root).

# service lxd status
 * lxd is not running

Trying to start it just makes the command run forever without starting LXD:

# service lxd start
Starting Container hypervisor based on LXC: lxd.

If I watch the logs while trying to start LXD, this is what I get:

# tail -f /var/log/lxd/lxd.log
lvl=info msg="LXD 3.0.1 is starting in normal mode" path=/var/lib/lxd t=2018-07-19T16:05:06+0200
lvl=info msg="Kernel uid/gid map:" t=2018-07-19T16:05:06+0200
lvl=info msg=" - u 0 0 4294967295" t=2018-07-19T16:05:06+0200
lvl=info msg=" - g 0 0 4294967295" t=2018-07-19T16:05:06+0200
lvl=info msg="Configured LXD uid/gid map:" t=2018-07-19T16:05:06+0200
lvl=info msg=" - u 0 100000 65536" t=2018-07-19T16:05:06+0200
lvl=info msg=" - g 0 100000 65536" t=2018-07-19T16:05:06+0200
lvl=warn msg="AppArmor support has been disabled because of lack of kernel support" t=2018-07-19T16:05:06+0200
lvl=warn msg="Couldn't find the CGroup blkio, I/O limits will be ignored." t=2018-07-19T16:05:06+0200
lvl=warn msg="Couldn't find the CGroup CPU controller, CPU time limits will be ignored." t=2018-07-19T16:05:06+0200
lvl=warn msg="Couldn't find the CGroup CPUacct controller, CPU accounting will not be available." t=2018-07-19T16:05:06+0200
lvl=warn msg="Couldn't find the CGroup CPUset controller, CPU pinning will be ignored." t=2018-07-19T16:05:06+0200
lvl=warn msg="Couldn't find the CGroup devices controller, device access control won't work." t=2018-07-19T16:05:06+0200
lvl=warn msg="Couldn't find the CGroup memory controller, memory limits will be ignored." t=2018-07-19T16:05:06+0200
lvl=warn msg="Couldn't find the CGroup network class controller, network limits will be ignored." t=2018-07-19T16:05:06+0200
lvl=warn msg="Couldn't find the CGroup pids controller, process limits will be ignored." t=2018-07-19T16:05:06+0200
lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2018-07-19T16:05:06+0200
lvl=info msg="Initializing local database" t=2018-07-19T16:05:06+0200
lvl=info msg="Initializing database gateway" t=2018-07-19T16:05:06+0200
address= id=1 lvl=info msg="Start database node" t=2018-07-19T16:05:06+0200
lvl=info msg="Raft: Restored from snapshot 1-182-1532008986500" t=2018-07-19T16:05:06+0200
lvl=info msg="Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]" t=2018-07-19T16:05:06+0200
lvl=info msg="Raft: Node at 0 [Leader] entering Leader state" t=2018-07-19T16:05:06+0200
lvl=info msg="LXD isn't socket activated" t=2018-07-19T16:05:06+0200
lvl=info msg="Starting /dev/lxd handler:" t=2018-07-19T16:05:06+0200
lvl=info msg=" - binding devlxd socket" socket=/var/lib/lxd/devlxd/sock t=2018-07-19T16:05:06+0200
lvl=info msg="REST API daemon:" t=2018-07-19T16:05:06+0200
lvl=info msg=" - binding Unix socket" socket=/var/lib/lxd/unix.socket t=2018-07-19T16:05:06+0200
lvl=info msg="Initializing global database" t=2018-07-19T16:05:06+0200
lvl=eror msg="Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1" t=2018-07-19T16:05:06+0200
lvl=info msg="Starting shutdown sequence" t=2018-07-19T16:05:06+0200
lvl=info msg="Stopping REST API handler:" t=2018-07-19T16:05:06+0200
lvl=info msg=" - closing socket" socket=/var/lib/lxd/unix.socket t=2018-07-19T16:05:06+0200
lvl=info msg="Stopping /dev/lxd handler" t=2018-07-19T16:05:06+0200
lvl=info msg=" - closing socket" socket=/var/lib/lxd/devlxd/sock t=2018-07-19T16:05:06+0200
lvl=info msg="Stop database gateway" t=2018-07-19T16:05:06+0200
lvl=info msg="Stop raft instance" t=2018-07-19T16:05:06+0200
lvl=info msg="Raft: Starting snapshot up to 192" t=2018-07-19T16:05:06+0200
lvl=info msg="Raft: Compacting logs from 55 to 64" t=2018-07-19T16:05:06+0200
lvl=info msg="Raft: Snapshot to 192 complete" t=2018-07-19T16:05:06+0200
lvl=info msg="Stopping REST API handler:" t=2018-07-19T16:05:06+0200
lvl=info msg="Stopping /dev/lxd handler" t=2018-07-19T16:05:06+0200
lvl=info msg="Stopping REST API handler:" t=2018-07-19T16:05:06+0200
lvl=info msg="Stopping /dev/lxd handler" t=2018-07-19T16:05:06+0200
lvl=info msg="Saving simplestreams cache" t=2018-07-19T16:05:06+0200
lvl=info msg="Saved simplestreams cache" t=2018-07-19T16:05:06+0200
# lxd --debug --group lxd
INFO[07-19|16:07:11] LXD 3.0.1 is starting in normal mode     path=/var/lib/lxd
DBUG[07-19|16:07:11] Unknown backing filesystem type: 0x53464846
INFO[07-19|16:07:11] Kernel uid/gid map:
INFO[07-19|16:07:11]  - u 0 0 4294967295
INFO[07-19|16:07:11]  - g 0 0 4294967295
INFO[07-19|16:07:11] Configured LXD uid/gid map:
INFO[07-19|16:07:11]  - u 0 100000 65536
INFO[07-19|16:07:11]  - g 0 100000 65536
WARN[07-19|16:07:11] AppArmor support has been disabled because of lack of kernel support
WARN[07-19|16:07:11] Couldn't find the CGroup blkio, I/O limits will be ignored.
WARN[07-19|16:07:11] Couldn't find the CGroup CPU controller, CPU time limits will be ignored.
WARN[07-19|16:07:11] Couldn't find the CGroup CPUacct controller, CPU accounting will not be available.
WARN[07-19|16:07:11] Couldn't find the CGroup CPUset controller, CPU pinning will be ignored.
WARN[07-19|16:07:11] Couldn't find the CGroup devices controller, device access control won't work.
WARN[07-19|16:07:11] Couldn't find the CGroup memory controller, memory limits will be ignored.
WARN[07-19|16:07:11] Couldn't find the CGroup network class controller, network limits will be ignored.
WARN[07-19|16:07:11] Couldn't find the CGroup pids controller, process limits will be ignored.
WARN[07-19|16:07:11] CGroup memory swap accounting is disabled, swap limits will be ignored.
INFO[07-19|16:07:11] Initializing local database
INFO[07-19|16:07:11] Initializing database gateway
INFO[07-19|16:07:11] Start database node                      address= id=1
INFO[07-19|16:07:11] Raft: Restored from snapshot 1-192-1532009106190
INFO[07-19|16:07:11] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]
INFO[07-19|16:07:12] Raft: Node at 0 [Leader] entering Leader state
INFO[07-19|16:07:12] LXD isn't socket activated
INFO[07-19|16:07:12] Starting /dev/lxd handler:
INFO[07-19|16:07:12]  - binding devlxd socket                 socket=/var/lib/lxd/devlxd/sock
INFO[07-19|16:07:12] REST API daemon:
INFO[07-19|16:07:12]  - binding Unix socket                   socket=/var/lib/lxd/unix.socket
INFO[07-19|16:07:12] Initializing global database
DBUG[07-19|16:07:12] Database error: failed to update node version info: updated 0 rows instead of 1
EROR[07-19|16:07:12] Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1
INFO[07-19|16:07:12] Starting shutdown sequence
INFO[07-19|16:07:12] Stopping REST API handler:
INFO[07-19|16:07:12]  - closing socket                        socket=/var/lib/lxd/unix.socket
INFO[07-19|16:07:12] Stopping /dev/lxd handler
INFO[07-19|16:07:12]  - closing socket                        socket=/var/lib/lxd/devlxd/sock
INFO[07-19|16:07:12] Stop database gateway
INFO[07-19|16:07:12] Stop raft instance
INFO[07-19|16:07:12] Raft: Starting snapshot up to 202
INFO[07-19|16:07:12] Raft: Compacting logs from 65 to 74
INFO[07-19|16:07:12] Raft: Snapshot to 202 complete
INFO[07-19|16:07:12] Stopping REST API handler:
INFO[07-19|16:07:12] Stopping /dev/lxd handler
INFO[07-19|16:07:12] Stopping REST API handler:
INFO[07-19|16:07:12] Stopping /dev/lxd handler
DBUG[07-19|16:07:12] Not unmounting temporary filesystems (containers are still running)
INFO[07-19|16:07:12] Saving simplestreams cache
INFO[07-19|16:07:12] Saved simplestreams cache
Error: failed to open cluster database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1

Do you know what is going on with my setup?
I'm using LXD 3.0.1 on Ubuntu 18.04 LTS
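
With logs this long, it can help to pull out only the error-level lines. A minimal sketch, assuming the daemon log keeps tagging errors with `lvl=eror` as in the output above; the function name `lxd_log_errors` is hypothetical and the default path is the one used in this post:

```shell
#!/bin/sh
# Print only the error-level lines of an LXD daemon log.
# Assumption: errors are tagged "lvl=eror", as in the log shown above.
# $1: log file to scan (defaults to the path tailed in this post).
lxd_log_errors() {
    log_file="${1:-/var/log/lxd/lxd.log}"
    grep 'lvl=eror' "$log_file"
}
```

Calling `lxd_log_errors` with no argument scans /var/log/lxd/lxd.log; the function returns a non-zero status when no error line is found.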

Julien-Marcou changed the title from "Unable to do anything with LXD (3.0.1)" to "Failed to open cluster database - Can't start/do anything with LXD" on Jul 19, 2018

Member

brauner commented Jul 19, 2018

@freeekanayaka looks like you might know what's happening.

Member

stgraber commented Jul 19, 2018

Can you do:

  • systemctl stop lxd lxd.socket
  • lxd --debug --group lxd

Author

Julien-Marcou commented Jul 19, 2018

I've already seen you give this advice on other issues, but this is where I'm having trouble 🤕, because I don't have any service called lxd.socket:

lxd.socket: unrecognized service
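
An "unrecognized service" reply like this usually means the machine is not managed by systemd, so there is no lxd.socket unit to stop. A rough sketch for picking the stop command, assuming systemd creates /run/systemd/system when it is the running init and that the SysV-style `service lxd stop` is the fallback; both function names are hypothetical:

```shell
#!/bin/sh
# Report whether systemd appears to be the running init.
# Assumption: /run/systemd/system exists only when systemd is in charge.
# $1 overrides the probed directory (hypothetical parameter, useful for testing).
init_flavour() {
    probe_dir="${1:-/run/systemd/system}"
    if [ -d "$probe_dir" ]; then
        echo systemd
    else
        echo other
    fi
}

# Print (without running it) the command that would stop LXD.
lxd_stop_command() {
    if [ "$(init_flavour "$1")" = systemd ]; then
        echo "systemctl stop lxd lxd.socket"
    else
        echo "service lxd stop"
    fi
}
```

The sketch only prints the command, so it is safe to run before deciding what to execute as root.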

Member

freeekanayaka commented Jul 19, 2018

It looks like the information stored in the database is inconsistent: there should be a row matching the node's address. I'm not sure how things ended up here. Did you experience some other failure prior to this one?

From the logs it looks like you are not exposing your node on the network (i.e. you answered "no" to the "Would you like LXD to be available over the network?" question from lxd init, and you didn't set a network address afterwards). Can you confirm that?

If that is the case, we can probably fix the situation with a one-off database query. Please try to create a /var/lib/lxd/database/global/patch.global.sql file with this content:

DELETE FROM nodes;
INSERT INTO nodes(id, name, address) VALUES(1, 'none', '0.0.0.0');

and start the daemon again.

Author

Julien-Marcou commented Jul 19, 2018

Well, if I try this, starting LXD again does nothing more (same log as above).

What I tried next was removing the database (local.db and the global folder from /var/lib/lxd/database/), then removing the LXD package and reinstalling it.
After that, lxd init just gives me Error: Failed to connect to local LXD: Get http://unix.socket/1.0: dial unix /var/lib/lxd/unix.socket: connect: no such file or directory
This is where I believe (maybe wrongly) that the LXD service should be running, and it is not.

So I tried to start it for the first time with service lxd start, and it just hangs without doing anything. The logs show this, and then nothing happens:

lvl=info msg="LXD 3.0.1 is starting in normal mode" path=/var/lib/lxd t=2018-07-19T17:58:43+0200
lvl=info msg="Kernel uid/gid map:" t=2018-07-19T17:58:43+0200
lvl=info msg=" - u 0 0 4294967295" t=2018-07-19T17:58:43+0200
lvl=info msg=" - g 0 0 4294967295" t=2018-07-19T17:58:43+0200
lvl=info msg="Configured LXD uid/gid map:" t=2018-07-19T17:58:43+0200
lvl=info msg=" - u 0 100000 65536" t=2018-07-19T17:58:43+0200
lvl=info msg=" - g 0 100000 65536" t=2018-07-19T17:58:43+0200
lvl=warn msg="AppArmor support has been disabled because of lack of kernel support" t=2018-07-19T17:58:43+0200
lvl=warn msg="Couldn't find the CGroup blkio, I/O limits will be ignored." t=2018-07-19T17:58:43+0200
lvl=warn msg="Couldn't find the CGroup CPU controller, CPU time limits will be ignored." t=2018-07-19T17:58:43+0200
lvl=warn msg="Couldn't find the CGroup CPUacct controller, CPU accounting will not be available." t=2018-07-19T17:58:43+0200
lvl=warn msg="Couldn't find the CGroup CPUset controller, CPU pinning will be ignored." t=2018-07-19T17:58:43+0200
lvl=warn msg="Couldn't find the CGroup devices controller, device access control won't work." t=2018-07-19T17:58:43+0200
lvl=warn msg="Couldn't find the CGroup memory controller, memory limits will be ignored." t=2018-07-19T17:58:43+0200
lvl=warn msg="Couldn't find the CGroup network class controller, network limits will be ignored." t=2018-07-19T17:58:43+0200
lvl=warn msg="Couldn't find the CGroup pids controller, process limits will be ignored." t=2018-07-19T17:58:43+0200
lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2018-07-19T17:58:43+0200
lvl=info msg="Initializing local database" t=2018-07-19T17:58:43+0200
lvl=info msg="Initializing database gateway" t=2018-07-19T17:58:44+0200
address= id=1 lvl=info msg="Start database node" t=2018-07-19T17:58:44+0200

If I then kill the service lxd start command (Ctrl+C) and try lxd --debug --group lxd, I get this:

DBUG[07-19|18:03:34] Connecting to a local LXD over a Unix socket
DBUG[07-19|18:03:34] Sending request to LXD                   etag= method=GET url=http://unix.socket/1.0
INFO[07-19|18:03:34] LXD 3.0.1 is starting in normal mode     path=/var/lib/lxd
DBUG[07-19|18:03:34] Unknown backing filesystem type: 0x53464846
INFO[07-19|18:03:34] Kernel uid/gid map:
INFO[07-19|18:03:34]  - u 0 0 4294967295
INFO[07-19|18:03:34]  - g 0 0 4294967295
INFO[07-19|18:03:34] Configured LXD uid/gid map:
INFO[07-19|18:03:34]  - u 0 100000 65536
INFO[07-19|18:03:34]  - g 0 100000 65536
WARN[07-19|18:03:34] AppArmor support has been disabled because of lack of kernel support
WARN[07-19|18:03:34] Couldn't find the CGroup blkio, I/O limits will be ignored.
WARN[07-19|18:03:34] Couldn't find the CGroup CPU controller, CPU time limits will be ignored.
WARN[07-19|18:03:34] Couldn't find the CGroup CPUacct controller, CPU accounting will not be available.
WARN[07-19|18:03:34] Couldn't find the CGroup CPUset controller, CPU pinning will be ignored.
WARN[07-19|18:03:34] Couldn't find the CGroup devices controller, device access control won't work.
WARN[07-19|18:03:34] Couldn't find the CGroup memory controller, memory limits will be ignored.
WARN[07-19|18:03:34] Couldn't find the CGroup network class controller, network limits will be ignored.
WARN[07-19|18:03:34] Couldn't find the CGroup pids controller, process limits will be ignored.
WARN[07-19|18:03:34] CGroup memory swap accounting is disabled, swap limits will be ignored.
INFO[07-19|18:03:34] Initializing local database
INFO[07-19|18:03:34] Initializing database gateway
INFO[07-19|18:03:34] Start database node                      address= id=1
INFO[07-19|18:03:34] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]
INFO[07-19|18:03:34] Raft: Node at 0 [Leader] entering Leader state
INFO[07-19|18:03:34] LXD isn't socket activated
DBUG[07-19|18:03:34] Connecting to a local LXD over a Unix socket
DBUG[07-19|18:03:34] Sending request to LXD                   etag= method=GET url=http://unix.socket/1.0
DBUG[07-19|18:03:34] Detected stale unix socket, deleting
DBUG[07-19|18:03:34] Detected stale unix socket, deleting
INFO[07-19|18:03:34] Starting /dev/lxd handler:
INFO[07-19|18:03:34]  - binding devlxd socket                 socket=/var/lib/lxd/devlxd/sock
INFO[07-19|18:03:34] REST API daemon:
INFO[07-19|18:03:34]  - binding Unix socket                   socket=/var/lib/lxd/unix.socket
INFO[07-19|18:03:34] Initializing global database
panic: cannot free page 0 or 1: 0

goroutine 12 [running]:
github.com/boltdb/bolt.(*freelist).free(0xc42019d9b0, 0x14, 0x7fe6b4650000)
        /build/lxd-0FDBXp/lxd-3.0.1/obj-x86_64-linux-gnu/src/github.com/boltdb/bolt/freelist.go:113 +0x3ae
github.com/boltdb/bolt.(*Tx).Commit(0xc4204021c0, 0xc4204f85d0, 0x8)
        /build/lxd-0FDBXp/lxd-3.0.1/obj-x86_64-linux-gnu/src/github.com/boltdb/bolt/tx.go:176 +0x1dc
github.com/hashicorp/raft-boltdb.(*BoltStore).StoreLogs(0xc42019b980, 0xc420200050, 0x1, 0x1, 0x0, 0x0)
        /build/lxd-0FDBXp/lxd-3.0.1/obj-x86_64-linux-gnu/src/github.com/hashicorp/raft-boltdb/bolt_store.go:187 +0x26b
github.com/hashicorp/raft.(*Raft).dispatchLogs(0xc4200ffb80, 0xc420081b10, 0x1, 0x1)
        /build/lxd-0FDBXp/lxd-3.0.1/obj-x86_64-linux-gnu/src/github.com/hashicorp/raft/raft.go:856 +0x29d
github.com/hashicorp/raft.(*Raft).leaderLoop(0xc4200ffb80)
        /build/lxd-0FDBXp/lxd-3.0.1/obj-x86_64-linux-gnu/src/github.com/hashicorp/raft/raft.go:599 +0x73a
github.com/hashicorp/raft.(*Raft).runLeader(0xc4200ffb80)
        /build/lxd-0FDBXp/lxd-3.0.1/obj-x86_64-linux-gnu/src/github.com/hashicorp/raft/raft.go:420 +0x385
github.com/hashicorp/raft.(*Raft).run(0xc4200ffb80)
        /build/lxd-0FDBXp/lxd-3.0.1/obj-x86_64-linux-gnu/src/github.com/hashicorp/raft/raft.go:140 +0x67
github.com/hashicorp/raft.(*Raft).(github.com/hashicorp/raft.run)-fm()
        /build/lxd-0FDBXp/lxd-3.0.1/obj-x86_64-linux-gnu/src/github.com/hashicorp/raft/api.go:505 +0x2a
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc4200ffb80, 0xc4201ab120)
        /build/lxd-0FDBXp/lxd-3.0.1/obj-x86_64-linux-gnu/src/github.com/hashicorp/raft/state.go:146 +0x53
created by github.com/hashicorp/raft.(*raftState).goFunc
        /build/lxd-0FDBXp/lxd-3.0.1/obj-x86_64-linux-gnu/src/github.com/hashicorp/raft/state.go:144 +0x66

After that I'm back at the starting point: I can't do anything, and everything ends with Error: failed to open cluster database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1, as in my original post.

Member

freeekanayaka commented Jul 19, 2018

Ok, so the "Error: failed to open cluster database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1" is most probably a consequence of the first failure ("panic: cannot free page 0 or 1: 0").

I'm not sure why the first failure happens; we've never seen it before. Perhaps a disk failure? I'll have to dig into boltdb's code to know.

Member

stgraber commented Jul 23, 2018

@Julien-Marcou could you share a tarball of /var/lib/lxd/database? This should let us reproduce that crash on our end and better investigate what's going on.

You can e-mail it to stgraber_at_ubuntu_dot_com
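
One way to produce such a tarball, as a minimal sketch: the function name `pack_lxd_database` and the default archive name are hypothetical, and /var/lib/lxd is assumed from the `path=` value in the logs. Stopping the daemon first is probably wise so the files are not changing while they are archived.

```shell
#!/bin/sh
# Pack the LXD database directory into a gzip-compressed tarball.
# $1: LXD state directory (assumed default: /var/lib/lxd, as in the logs)
# $2: output archive path (hypothetical default name)
pack_lxd_database() {
    lxd_dir="${1:-/var/lib/lxd}"
    out="${2:-lxd-database.tar.gz}"
    # -C switches into the state directory so the archive holds "database/..."
    tar -czf "$out" -C "$lxd_dir" database
}
```

Run as root, `pack_lxd_database` with no arguments writes lxd-database.tar.gz into the current directory.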

stgraber added the Incomplete label on Jul 23, 2018

Member

freeekanayaka commented Jul 26, 2018

@Julien-Marcou I took a look at your database, and the solution I first gave you does actually work, except that I had given you the wrong path for the SQL patch file :/

Here's the correct path and content:

Path: /var/lib/lxd/database/patch.global.sql
Content:

DELETE FROM nodes;
INSERT INTO nodes(id, name, address, schema, api_extensions, pending) VALUES(1, 'none', '0.0.0.0', 1, 1, 0);

After that file is in place, you should be able to start the daemon again.
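
As a sketch, the file can be written from a shell like this. The function name `write_global_patch` is hypothetical; the path and the SQL are the ones given in this comment, and the directory argument exists only so the function can be tried against a scratch directory first:

```shell
#!/bin/sh
# Write the one-off SQL patch file described above.
# $1: database directory (defaults to /var/lib/lxd/database, the path given above)
write_global_patch() {
    db_dir="${1:-/var/lib/lxd/database}"
    # Quoted heredoc delimiter: the SQL is written verbatim, with no shell expansion.
    cat > "$db_dir/patch.global.sql" <<'EOF'
DELETE FROM nodes;
INSERT INTO nodes(id, name, address, schema, api_extensions, pending) VALUES(1, 'none', '0.0.0.0', 1, 1, 0);
EOF
}
```

Run it as root with the daemon stopped, then start the daemon again.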

Author

Julien-Marcou commented Jul 31, 2018

Sorry for the late answer. It did work and I could start LXD, but I'm now running into a lot of other issues:

  • service lxd start : hangs without doing anything (no error in logs)
  • lxd init : hangs at the end without doing anything (no error in logs); I had to configure LXD without storage and without a bridge to continue
  • lxc network create lxdbr0 : Error: Failed to run: ip link set dev lxdbr0 mtu 1500: RTNETLINK answers: Invalid argument

I had to create the bridge manually with ip link add dev lxdbr0 type bridge, and then point lxd init at that bridge by answering yes to: Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: yes
Then finally I was able to create a container without LXD complaining... almost... I still had to add a root disk device to the default profile with lxc profile device add default root disk path=/ pool=default; otherwise it would say Error: Failed container creation: No root device could be found.
But ultimately lxc launch ubuntu:bionic test failed again with Error: Failed container creation: websocket: close 1006 (abnormal closure): unexpected EOF, without any error in the logs, in either /var/log/lxd.log or /var/log/lxd/test/lxc.log

But there is something I didn't mention: I'm trying this on Windows Subsystem for Linux (aka WSL) with the latest Windows 10 update of April 2018. There is no doubt that it is still not ready to be used with LXD, as I have no problem using LXD on a real Linux system.

I'll call this experiment a fail :-(

Member

stgraber commented Jul 31, 2018

Oh yeah, there's no way you can run LXD under WSL. I'm actually very surprised you got that far.

WSL doesn't support namespaces, cgroups or, as you noticed, more complex networking, all of which are needed for LXD to work. I think a basic cgroup implementation is planned, but I doubt that they have any intention to do complex networking in there. Supporting all the container features (namespaces, seccomp, apparmor, ...) would take them years to reverse engineer and re-implement, so it seems very doubtful that it'd ever make it onto their roadmap.
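
For anyone unsure whether they are in the same situation, here is a quick check, sketched under the assumption that the WSL kernel identifies itself with the word "Microsoft" in /proc/version. The function name `is_wsl` is hypothetical, and the file argument exists only so the check can be exercised against sample text:

```shell
#!/bin/sh
# Succeed (exit 0) when the kernel version string looks like WSL.
# Assumption: WSL kernels mention "Microsoft" in /proc/version.
# $1: file to inspect (defaults to /proc/version).
is_wsl() {
    version_file="${1:-/proc/version}"
    grep -qi 'microsoft' "$version_file" 2>/dev/null
}
```

Something like `is_wsl && echo "running under WSL"` before attempting lxd init would flag the environment early.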

stgraber closed this on Jul 31, 2018
