[1.8.3 to 1.9.0] Error starting daemon: could not restore image #17688

Closed
mbentley opened this issue Nov 4, 2015 · 14 comments
Labels
kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed.
Milestone

Comments

@mbentley
Contributor

mbentley commented Nov 4, 2015

Description of problem:
After upgrading from 1.8.3 to 1.9.0, I noticed that on one of my boxes, the daemon didn't come back up.

docker version:

# docker version
Client:
 Version:      1.9.0
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   76d6bc9
 Built:        Tue Nov  3 17:43:42 UTC 2015
 OS/Arch:      linux/amd64
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

docker info:
n/a

uname -a:

# uname -a
Linux docker1 3.13.0-66-generic #108-Ubuntu SMP Wed Oct 7 15:20:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Environment details (AWS, VirtualBox, physical, etc.):
KVM VM

How reproducible:
I can reproduce it on this one box I have; not sure about any others but I haven't seen the same upgrade issues on any other Ubuntu 14.04 box.

Steps to Reproduce:

  1. Start from Docker engine 1.8.3 on Ubuntu 14.04
  2. Upgrade the engine to 1.9.0
  3. Daemon doesn't start

Actual Results:
The daemon does not start; daemon logs show:

INFO[0000] [graphdriver] using prior storage driver "aufs"
INFO[0000] Option DefaultDriver: bridge
INFO[0000] Option DefaultNetwork: bridge
WARN[0000] /!\ DON'T BIND ON ANY IP ADDRESS WITHOUT setting -tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING /!\
INFO[0000] Listening for HTTP on tcp (0.0.0.0:2375)
INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
WARN[0000] Running modprobe bridge nf_nat br_netfilter failed with message: modprobe: WARNING: Module br_netfilter not found.
insmod /lib/modules/3.13.0-66-generic/kernel/net/llc/llc.ko
insmod /lib/modules/3.13.0-66-generic/kernel/net/802/stp.ko
insmod /lib/modules/3.13.0-66-generic/kernel/net/bridge/bridge.ko
insmod /lib/modules/3.13.0-66-generic/kernel/net/netfilter/nf_conntrack.ko
insmod /lib/modules/3.13.0-66-generic/kernel/net/netfilter/nf_nat.ko
, error: exit status 1
INFO[0000] Firewalld running: false
INFO[0000] Loading containers: start.
......
INFO[0000] Loading containers: done.
INFO[0000] Daemon has completed initialization
INFO[0000] Docker daemon                                 commit=f4bf5c7 execdriver=native-0.2 graphdriver=aufs version=1.8.3
INFO[0000] GET /v1.15/containers/json?all=0&size=0
INFO[0000] GET /v1.15/events
INFO[0000] GET /v1.15/containers/3688aa605635/json
INFO[0000] GET /v1.15/containers/0ad5731dcda8/json
INFO[0000] GET /v1.15/containers/a29267e946c7/json
INFO[149424] Processing signal 'terminated'
INFO[149425] GET /v1.15/containers/json?all=0&size=0
INFO[149425] GET /v1.15/containers/3688aa605635/json
INFO[149425] GET /v1.15/containers/0ad5731dcda8/json
INFO[149425] GET /v1.15/containers/json?all=0&size=0
INFO[149425] GET /v1.15/containers/3688aa605635/json
INFO[149425] GET /v1.15/containers/0ad5731dcda8/json
INFO[149425] GET /v1.15/containers/json?all=0&size=0
INFO[149425] GET /v1.15/containers/3688aa605635/json
INFO[149425] GET /v1.15/containers/0ad5731dcda8/json
WARN[0000] /!\ DON'T BIND ON ANY IP ADDRESS WITHOUT setting -tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING /!\
INFO[0000] API listen on [::]:2375
INFO[0000] API listen on /var/run/docker.sock
INFO[0000] [graphdriver] using prior storage driver "aufs"
FATA[0000] Error starting daemon: could not restore image 27cf784147099545: invalidimageid: image ID '27cf784147099545' is invalid

Expected Results:
Upgrade works as expected and daemon starts

Additional info:
I was able to roll back to Docker 1.8.3 so I can get detailed information about the containers and any other environmental info that might be helpful.

@runcom runcom added the kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. label Nov 4, 2015
@runcom runcom added this to the 1.9.1 milestone Nov 4, 2015
@runcom
Member

runcom commented Nov 4, 2015

Images are now restored on daemon startup, and you actually have one that is invalid (probably an old one, I suppose?).
ping @calavera: do we actually want not to fail in this case? Should we prevent corrupted images from being written under graph/ at all?

@mbentley you should have a folder under /var/lib/docker/0.0/graph/27cf784147099545/ which seems to be corrupted (?). Could you check? Could you also back up that directory, delete it, and see whether restarting the daemon works? You shouldn't even be able to see that image ID in docker images -a; could you check that too? Thanks!

@mbentley
Contributor Author

mbentley commented Nov 4, 2015

@runcom - Yup, that fixed it. It existed under /var/lib/docker/graph/27cf784147099545/. I kept a tar of it if it is of any interest.
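
For anyone applying the same workaround, the backup-then-remove step could be scripted roughly as below. This is only a sketch: quarantine_image and the backup directory are hypothetical names, not Docker commands, and it assumes the daemon is stopped and that the graph root is /var/lib/docker/graph as in this report.

```shell
#!/usr/bin/env bash
# Sketch only: quarantine_image is a hypothetical helper, not a Docker command.
# It archives one image directory from the graph root, then moves it aside.
quarantine_image() {
  local graph_root="$1" image_id="$2" backup_dir="$3"
  local src="$graph_root/$image_id"

  # Refuse to continue if the directory is not where we expect it.
  if [ ! -d "$src" ]; then
    echo "no such directory: $src" >&2
    return 1
  fi

  mkdir -p "$backup_dir" || return 1
  # Keep a tarball so the directory can be inspected or shared later.
  tar -C "$graph_root" -cf "$backup_dir/$image_id.tar" "$image_id" || return 1
  # Move rather than delete, so the step stays reversible.
  mv "$src" "$backup_dir/$image_id.removed"
}
```

With the daemon stopped, something like `quarantine_image /var/lib/docker/graph 27cf784147099545 /root/docker-graph-backup` (run as root; the backup path is an arbitrary choice) would archive and move the directory, after which the daemon can be started again. Moving instead of deleting keeps the change easy to undo.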

@runcom
Member

runcom commented Nov 4, 2015

@mbentley it could be useful to share it, if you want, so we can see which files got corrupted. To me it really looks like an old image (its ID was never validated at all). Could you actually see that image ID in docker images -a?

@mbentley
Contributor Author

mbentley commented Nov 4, 2015

@runcom - Nope, I couldn't see it with a docker images -a.

Here is the tar of the layer. It's pretty small.
https://dl.dropboxusercontent.com/u/30237834/27cf784147099545.tar

@runcom
Member

runcom commented Nov 4, 2015

@mbentley thanks a lot

@calavera
Contributor

calavera commented Nov 4, 2015

do we actually want not to fail in this case? Should we prevent corrupted images from being written under graph/ at all?

I'm not sure about it. Maybe we should just go back to the old behavior, ignoring all corrupted images. It looks like there are more cases than we expected. I think logging the error about invalid images is good in any case.

@runcom
Member

runcom commented Nov 4, 2015

I'm not sure about it. Maybe we should just go back to the old behavior, ignoring all corrupted images. It looks like there are more cases than we expected. I think logging the error about invalid images is good in any case.

Fair. I'm going to remove the return when the error != EOF, so everything is still logged but it won't actually make the daemon crash.
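
The log-and-continue behavior described above can be approximated from outside the daemon by scanning the graph directory and reporting suspicious entries without stopping at the first one. A minimal sketch, assuming a full image ID is exactly 64 lowercase hex characters (an assumption, not something stated in this thread) and using a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch only: report_invalid_image_dirs is a hypothetical helper, not part of Docker.
# Assumption: a valid full image ID is exactly 64 lowercase hex characters.
report_invalid_image_dirs() {
  local graph_root="$1" entry name
  for entry in "$graph_root"/*/; do
    # If the glob matched nothing, the literal pattern is left; skip it.
    [ -d "$entry" ] || continue
    name="$(basename "$entry")"
    if ! printf '%s\n' "$name" | grep -Eq '^[a-f0-9]{64}$'; then
      # Warn on stderr, print the name on stdout, and keep scanning.
      echo "WARN: '$name' does not look like a full image ID" >&2
      printf '%s\n' "$name"
    fi
  done
  return 0
}
```

Running `report_invalid_image_dirs /var/lib/docker/graph` before an upgrade would list entries such as 27cf784147099545. Other non-image entries (temporary directories, for example) may show up as well, so each result should be inspected rather than deleted blindly.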

@jkaplon

jkaplon commented Nov 5, 2015

Same issue for me on the upgrade, same image ID.

@runcom
Member

runcom commented Nov 5, 2015

@jkaplon #17695 has been merged and will probably be in 1.9.1 soon

@srfrnk

srfrnk commented Nov 6, 2015

I had the same problem, but the folder mentioned above didn't exist at that exact path.
I found it at /var/lib/docker/graph.
I had to run 'sudo mv /var/lib/docker/graph /var/lib/docker/graph.old' for it to work.
Then, after 'sudo systemctl start docker', the daemon seems to recreate the 'graph' folder. It is now empty, but everything is working well, including auto-started containers.
Hope it helps someone.

@runcom
Member

runcom commented Nov 6, 2015

@srfrnk the folder I mentioned above was specific to his case and may not be the same for everyone

@runcom
Member

runcom commented Nov 6, 2015

but thanks!

@srfrnk

srfrnk commented Nov 6, 2015

@runcom sure - no prob :) just thought maybe it'll save time for the next person

@mminke

mminke commented Nov 6, 2015

I also had the same problem. Removing the folder from /var/lib/docker/graph fixed the startup.
Btw, I use Ubuntu 15.10 with the https://apt.dockerproject.org/repo repository.
