New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graphdriver: prefer prior driver state #11999

Merged
merged 1 commit into from Apr 14, 2015

Conversation

Projects
None yet
10 participants
@vbatts
Copy link
Contributor

vbatts commented Apr 1, 2015

Before this, a storage driver would be defaulted to based on the
priority list, and only print a warning if there is state from other
drivers.

This meant a reordering of priority list would "break" users in an
upgrade of docker, such that there images in the prior driver's state
were now invisible.

With this change, prior state is scanned, and if present that driver is
preferred.

As such, we can reorder the priority list, and after an upgrade,
existing installs with prior drivers can have a contiguous experience,
while fresh installs may default to a driver in the new priority list.

Ref: #11962 (comment)

Upgrade overlayfs to first place, now that this will not break driver
usage on existing installs. See #12354 for overlay promotion.

/cc @jfrazelle @crosbymichael @tianon @unclejack

@@ -173,9 +199,19 @@ func New(root string, options []string) (driver Driver, err error) {
return nil, fmt.Errorf("No supported storage backend found")
}

func scanPriorDrivers(root string) []string {
priorDrivers := []string{}

This comment has been minimized.

@cpuguy83

cpuguy83 Apr 1, 2015

Contributor

Can this be a map[string]struct{} instead so we don't have to double-loop above?

This comment has been minimized.

@vbatts

vbatts Apr 1, 2015

Contributor

let me look at that

This comment has been minimized.

@vbatts

vbatts Apr 1, 2015

Contributor

I don't think I'll do a map here, because that would be unordered and still require the comparison. Perhaps I can just revisit the double iteration

return GetDriver(name, root, options)
}
}

// Guess for prior driver

This comment has been minimized.

@tianon

tianon Apr 1, 2015

Member

The fact that this is just guessing based on in-directory state is a little scary IMO (which is why what I described in #11962 (comment) included bits about explicitly recording requested and acheived states). Using this as a hint for "auto" in the case of not having any explicit state for which should be selected seems sane as a transition mechanism, but maybe I'm just being over-paranoid?

This comment has been minimized.

@vbatts

vbatts Apr 1, 2015

Contributor

though, the specified (e.g. -s <DRIVER> and DOCKER_DRIVER=<DRIVER>) has already been resolved above this though.

Are you recommending to output to a config file? I don't think that is needed. Also, I've added Info for recording this data, for easier troubleshooting. The only thing that would make this "scary" is if someone is using various drivers in the same directory, and then hoping that by not specifying a driver that docker would make the right choice (for some definition of right)

This comment has been minimized.

@tianon

tianon Apr 1, 2015

Member

I mean for current users, this might already be a problem. They might already be "auto" on devicemapper, for example, and had been kicked there from AUFS (but of course their images were gone so they went along their merry way and recreated everything in devicemapper), so they have data in their /var/lib/docker for both.

This comment has been minimized.

@tianon

tianon Apr 1, 2015

Member

(ie, yes, I'm recommending we actually explicitly specify the current choice in a flat file on disk so that we can give appropriate warnings when it's changed, for example, and so there's no guesswork involved)

This comment has been minimized.

@vbatts

vbatts Apr 1, 2015

Contributor

Let's walk through these cases, because i'm of the opinion that the "state" needed here is in the directories present.

Users on "auto", would not be kicked to devicemapper from AUFS, unless aufstools aren't present. But if they had been using devicemapper, pulled images, then installed aufs-tools, yes they would need to recreate/pull images. But there is already logging about that.

This comment has been minimized.

@vbatts

vbatts Apr 2, 2015

Contributor

@dqminh let me think on that. The mtime of a parent directory does not always change based on sub directories, though it may give warmer-fuzzies to a guess of recent driver used.

This comment has been minimized.

@vbatts

vbatts Apr 2, 2015

Contributor

@dqminh for example, i've just tested that:

  • clean slate
  • starting a daemon with -s devicemapper
  • docker pull busybox
  • kill daemon
  • starting daemon with -s overlay this time
  • `docker pull busybox'

as expected, overlay directory is newer. Now:

  • daemon with -s overlay
  • docker run busybox uptime
  • kill daemon
  • start daemon with -s devicemapper
  • docker run busybox uptime

now, my most recent used driver is devicemapper, but the newer mtime directory is still overlay.

I think deducing by cross referencing what directory is present and the priority list is a fairly solid guess. There is log output for the curious. Going beyond this would require additional state on disk, and may honestly be a very small percentage of real world cases.

This comment has been minimized.

@tianon

tianon Apr 2, 2015

Member

I still think we should bail in the "guess" path if there's any ambiguity, and force the human to resolve it appropriately.

This comment has been minimized.

@vbatts

vbatts Apr 2, 2015

Contributor

But I feel there is a case needing to allow bouncing back and forth... Not
blocking that. That is why it is Warn level log
On Apr 2, 2015 12:26 PM, "Tianon Gravi" notifications@github.com wrote:

In daemon/graphdriver/driver.go
#11999 (comment):

        return GetDriver(name, root, options)
    }
}
  • // Guess for prior driver

I still think we should bail in the "guess" path if there's any ambiguity,
and force the human to resolve it appropriately.


Reply to this email directly or view it on GitHub
https://github.com/docker/docker/pull/11999/files#r27673154.

This comment has been minimized.

@thaJeztah

thaJeztah Apr 2, 2015

Member

But I feel there is a case needing to allow bouncing back and forth...

But isn't that where the -s flag? If I am testing graph drivers, I can just restart the daemon with a different -s flag to toggle between graph drivers.

Bailing out with;

-----------------------------------------------
ERROR
-----------------------------------------------

The docker daemon failed to start because we were unable to determine 
the graph driver you're using. (Multiple graphs were found).

The following graphs were found;

- overlay
- aufs
- devmapper

Please specify which graph driver should be used by setting the '-s **drivername**' 
option and restart the daemon to use the selected graph driver.

(Or similar) Doesn't sound too bad?

if prior == name {
driver, err = GetDriver(name, root, options)
if err != nil {
if err == ErrNotSupported || err == ErrPrerequisites || err == ErrIncompatibleFS {

This comment has been minimized.

@vbatts

vbatts Apr 1, 2015

Contributor

@tianon @thaJeztah right here, you're saying that unlike below, in this scan for prior, if it is found and is not supported/compatible/prereq then it should fail. Right?

This comment has been minimized.

@tianon

tianon Apr 1, 2015

Member

IMO yeah

This comment has been minimized.

@tianon

tianon Apr 1, 2015

Member

Since then the user gets to make the choice of either fix their system to get support back, or to switch Docker's backend explicitly.

This comment has been minimized.

@vbatts

vbatts Apr 1, 2015

Contributor

understood.

This comment has been minimized.

@vbatts

vbatts Apr 1, 2015

Contributor

also, are we saying as well, if len(priorDrivers) > 1 bail as well?

This comment has been minimized.

@vbatts

vbatts Apr 1, 2015

Contributor

because that could seem problematic ...

This comment has been minimized.

@thaJeztah

thaJeztah Apr 1, 2015

Member

(Hmf GitHub cache (only noticed the comments above))

Yes, I think if there's any doubt; ask the user.

Users searching for "my images disappeared" problems cluttering the issue tracker (or worse; blogging about it first) isn't the best user experience.

A clear error (if possible) with clear instructions seems the better choice.

This comment has been minimized.

@tianon

tianon via email Apr 1, 2015

Member

@vbatts vbatts force-pushed the vbatts:vbatts-decide_storage branch from dfa69eb to 75aec33 Apr 1, 2015

@vbatts

This comment has been minimized.

Copy link
Contributor

vbatts commented Apr 1, 2015

Updated. PTAL

@vbatts

This comment has been minimized.

Copy link
Contributor

vbatts commented Apr 1, 2015

I'll paste output from the use cases once I get my laptop on the www

return nil, err
}
logrus.Infof("[graphdriver] using prior storage driver %q", name)
checkPriorDriver(name, root) // print out beacause there may be multiple

This comment has been minimized.

@thaJeztah

thaJeztah Apr 1, 2015

Member

s/beacause/because/

This comment has been minimized.

@vbatts

vbatts Apr 1, 2015

Contributor

Hahaha
On Apr 1, 2015 5:59 PM, "Sebastiaan van Stijn" notifications@github.com
wrote:

In daemon/graphdriver/driver.go
#11999 (comment):

  •   }
    
  •   for _, prior := range priorDrivers {
    
  •       // of the state found from prior drivers, check in order of our priority
    
  •       // which we would prefer
    
  •       if prior == name {
    
  •           driver, err = GetDriver(name, root, options)
    
  •           if err != nil {
    
  •               // unlike below, we will return error here, because there is prior
    
  •               // state, and now it is no longer supported/prereq/compatible, so
    
  •               // something changed and needs attention. Otherwise the daemon's
    
  •               // images would just "disappear".
    
  •               logrus.Errorf("[graphdriver] prior storage driver %q failed: %s", name, err)
    
  •               return nil, err
    
  •           }
    
  •           logrus.Infof("[graphdriver] using prior storage driver %q", name)
    
  •           checkPriorDriver(name, root) // print out beacause there may be multiple
    

s/beacause/because/


Reply to this email directly or view it on GitHub
https://github.com/docker/docker/pull/11999/files#r27616218.

@thaJeztah

This comment has been minimized.

Copy link
Member

thaJeztah commented Apr 1, 2015

Thanks so much, @vbatts for picking this up. Hope the maintainers can agree on this approach.

Only thing I think might be needed, is some basic instructions/pointers to the user how to solve the issue, but perhaps this can be added after is clear if this direction can be agreed upon.

@vbatts vbatts force-pushed the vbatts:vbatts-decide_storage branch from 75aec33 to 5b901d5 Apr 2, 2015

@vbatts

This comment has been minimized.

Copy link
Contributor

vbatts commented Apr 2, 2015

## start from a clean slate
vbatts@noyee ~ (master) $ sudo rm -rf /home/docker/
## initialize a devicemapper driver
vbatts@noyee ~ (master) $ sudo /home/vbatts/src/docker/docker/bundles/1.5.0-dev/dynbinary/docker-1.5.0-dev -d -g /home/docker -p /home/docker.pid -H unix:///home/docker.sock -s devicemapper
INFO[0000] +job serveapi(unix:///home/docker.sock)      
INFO[0000] Listening for HTTP on unix (/home/docker.sock) 
INFO[0000] [graphdriver] trying provided driver "devicemapper" 
INFO[0002] +job init_networkdriver()                    
INFO[0002] -job init_networkdriver() OK                 
INFO[0003] Loading containers: start.                   

INFO[0003] Loading containers: done.                    
INFO[0003] docker daemon: 1.5.0-dev 75aec33; execdriver: native-0.2; graphdriver: devicemapper 
INFO[0003] +job acceptconnections()                     
INFO[0003] -job acceptconnections() OK                  
INFO[0003] Daemon has completed initialization          
^CINFO[0006] Received signal 'interrupt', starting shutdown of docker... 
INFO[0006] -job serveapi(unix:///home/docker.sock) OK
## make another driver directory that won't work on fedora
vbatts@noyee ~ (master) $ sudo mkdir /home/docker/aufs
## start the daemon again. Since AUFS is higher priority, won't work here, but appears to be a prior driver, it should fail the whole daemon
vbatts@noyee ~ (master) $ sudo /home/vbatts/src/docker/docker/bundles/1.5.0-dev/dynbinary/docker-1.5.0-dev -d -g /home/docker -p /home/docker.pid -H unix:///home/docker.sock 
ERRO[0000] [graphdriver] prior storage driver "aufs" failed: driver not supported 
INFO[0000] +job serveapi(unix:///home/docker.sock)      
INFO[0000] Listening for HTTP on unix (/home/docker.sock) 
FATA[0000] Shutting down daemon due to errors: error intializing graphdriver: driver not supported
## and it does


## now restart but _specifying_ overlay. it should work
vbatts@noyee ~ (master) $ sudo /home/vbatts/src/docker/docker/bundles/1.5.0-dev/dynbinary/docker-1.5.0-dev -d -g /home/docker -p /home/docker.pid -H unix:///home/docker.sock -s overlay
INFO[0000] [graphdriver] trying provided driver "overlay" 
INFO[0000] +job init_networkdriver()                    
INFO[0000] +job serveapi(unix:///home/docker.sock)      
INFO[0000] Listening for HTTP on unix (/home/docker.sock) 
INFO[0000] -job init_networkdriver() OK                 
INFO[0000] Loading containers: start.                   

INFO[0000] Loading containers: done.                    
INFO[0000] docker daemon: 1.5.0-dev 75aec33; execdriver: native-0.2; graphdriver: overlay 
INFO[0000] +job acceptconnections()                     
INFO[0000] -job acceptconnections() OK                  
INFO[0000] Daemon has completed initialization          
^CINFO[0002] Received signal 'interrupt', starting shutdown of docker... 
INFO[0002] -job serveapi(unix:///home/docker.sock) OK   
## works fine. now run again, but with no specificication. And since one of the scanned prior devices is present _and_ supported, it should work _and_ log that there were others found
vbatts@noyee ~ (master) $ sudo /home/vbatts/src/docker/docker/bundles/1.5.0-dev/dynbinary/docker-1.5.0-dev -d -g /home/docker -p /home/docker.pid -H unix:///home/docker.sock 
INFO[0000] [graphdriver] using prior storage driver "overlay" 
WARN[0000] Graphdriver overlay selected. Your graphdriver directory /home/docker already contains data managed by other graphdrivers: aufs,devicemapper 
INFO[0000] +job init_networkdriver()                    
INFO[0000] +job serveapi(unix:///home/docker.sock)      
INFO[0000] Listening for HTTP on unix (/home/docker.sock) 
INFO[0000] -job init_networkdriver() OK                 
INFO[0000] Loading containers: start.                   

INFO[0000] Loading containers: done.                    
INFO[0000] docker daemon: 1.5.0-dev 75aec33; execdriver: native-0.2; graphdriver: overlay 
INFO[0000] +job acceptconnections()                     
INFO[0000] -job acceptconnections() OK                  
INFO[0000] Daemon has completed initialization
## Which it does.

## :-)
@vbatts

This comment has been minimized.

Copy link
Contributor

vbatts commented Apr 2, 2015

@jfrazelle @crosbymichael @unclejack @tiborvass there's been discussion here and it's ready for your review. LMKWYT

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 3, 2015

LGTM

return nil, err
}
logrus.Infof("[graphdriver] using prior storage driver %q", name)
checkPriorDriver(name, root) // print out because there may be multiple

This comment has been minimized.

@jessfraz

jessfraz Apr 3, 2015

Contributor

I don't understand the point in keeping this, am i missing something, its just iterating the range again

This comment has been minimized.

@vbatts

vbatts Apr 6, 2015

Contributor

for the sake of a print out. At this point you would get an output of the chosen driver, then if there were more than 1 prior drivers (not including vfs) then you'll see output that there is additional state present.

This comment has been minimized.

@vbatts

vbatts Apr 6, 2015

Contributor

@jfrazelle like the WARN from the last section of #11999 (comment)

This comment has been minimized.

@estesp

estesp Apr 6, 2015

Contributor

I assume this is related to the comment to the right: it's being used to print out the (existing) log message that roughly says "hey, we're using driver XYZ, but you have data managed by {list of prior drivers}". Given you already know if "len(priorDrivers) > 0" you could add the message directly here, but given the list is short (and I assume it has little room to grow by any large measure), it isn't much of a hit to scan the list again.

I think the function might just be more clearly renamed as "logPriorDrivers()"?

This comment has been minimized.

@vbatts

vbatts Apr 6, 2015

Contributor

fair. and since we're now acting on the prior drivers, the call to checkPriorDriver() in the subsequent loops will not even be needed. This output would only ever be needed at this very spot. I'll fix that.

@vbatts vbatts force-pushed the vbatts:vbatts-decide_storage branch from 5b901d5 to 50616a5 Apr 6, 2015

@vbatts

This comment has been minimized.

Copy link
Contributor

vbatts commented Apr 6, 2015

rebased on master, for good measure

@estesp

This comment has been minimized.

Copy link
Contributor

estesp commented Apr 6, 2015

I think you found the wrath of docker-py in the latest janky run, but other than that need for re-run, I like this. Especially if we want more use of overlay, I don't see how we get that until it becomes a default for new uses, and this allows that and keeps from frustrating upgraders from aufs.

@vbatts

This comment has been minimized.

Copy link
Contributor

vbatts commented Apr 6, 2015

@estesp yep. That's the plan.

@tianon

This comment has been minimized.

Copy link
Member

tianon commented Apr 6, 2015

IMO,

WARN[0000] Graphdriver overlay selected. Your graphdriver directory /home/docker already contains data managed by other graphdrivers: aufs,devicemapper 

should be fatal. If there's any ambiguity, then just guessing isn't going to help the user out very much, and will instead just hide the problem and they'll never know. I'm fully +1 to forcing the user to disambiguate manually via -s or nuking their data directory to go back to automatic behavior.

@vbatts

This comment has been minimized.

Copy link
Contributor

vbatts commented Apr 6, 2015

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 7, 2015

im with @tianon no sense in guessing

@vbatts vbatts force-pushed the vbatts:vbatts-decide_storage branch from 50616a5 to 5ade252 Apr 7, 2015

@vbatts

This comment has been minimized.

Copy link
Contributor

vbatts commented Apr 7, 2015

alrighty. Updated PTAL @jfrazelle @tianon
Here's the output that users would see in a "guess" scenario with multiple drivers present http://pastebin.com/qr9GgsVe

@thaJeztah

This comment has been minimized.

Copy link
Member

thaJeztah commented Apr 7, 2015

Nice! That literal error should probably be added to the docs somewhere so that people can Goooogle it. (guess #11999 (comment) would be too verbose for a daemon error)

if len(priorDrivers) > 0 {
logrus.Warnf("Graphdriver %s selected. Your graphdriver directory %s already contains data managed by other graphdrivers: %s", name, root, strings.Join(priorDrivers, ","))

return errors.New(fmt.Sprintf("Graphdriver %s selected. Your graphdriver directory %s already contains data managed by other graphdrivers: %s", name, root, strings.Join(priorDrivers, ",")))

This comment has been minimized.

@cpuguy83

cpuguy83 Apr 7, 2015

Contributor

I'd get rid of what driver is selected and just say why it's bailing out.
Also should include something like you must explicitly set a storage driver

@vbatts

This comment has been minimized.

Copy link
Contributor

vbatts commented Apr 8, 2015

@thaJeztah I'll truncate that down as much as possible. I don't much care for newlines in the log lines.
Also, perhaps the docs reference is interesting, but if this is clear enough, it may not be required.

@vbatts vbatts force-pushed the vbatts:vbatts-decide_storage branch from 5ade252 to 2819972 Apr 8, 2015

@crosbymichael

This comment has been minimized.

Copy link
Member

crosbymichael commented Apr 13, 2015

@jfrazelle I moved it

@crosbymichael

This comment has been minimized.

Copy link
Member

crosbymichael commented Apr 13, 2015

LGTM

anyone else want to review before merge?

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 13, 2015

LGTM

@LK4D4

This comment has been minimized.

Copy link
Contributor

LK4D4 commented Apr 13, 2015

#12327
#12080
Regressions in overlay, first is pretty nasty.

@vbatts

This comment has been minimized.

Copy link
Contributor

vbatts commented Apr 13, 2015

ick. Should i split the overlay promotion into its own PR, since the
driver-guess logic does not affect that and is itself ready for merge?
On Apr 13, 2015 6:18 PM, "Alexander Morozov" notifications@github.com
wrote:

#12327 #12327
#12080 #12080
Regressions in overlay, first is pretty nasty.


Reply to this email directly or view it on GitHub
#11999 (comment).

@thaJeztah

This comment has been minimized.

Copy link
Member

thaJeztah commented Apr 13, 2015

ick. Should i split the overlay promotion into its own PR

sgtm (but not my say :))

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 13, 2015

sgtm, and idk if regressions necessarily because they fail on 1.5.0 too

On Mon, Apr 13, 2015 at 3:54 PM, Sebastiaan van Stijn <
notifications@github.com> wrote:

ick. Should i split the overlay promotion into its own PR

sgtm (but not my say :))


Reply to this email directly or view it on GitHub
#11999 (comment).

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 13, 2015

but we should definitely fix before promoting to no.1

On Mon, Apr 13, 2015 at 3:55 PM, Jessica Frazelle jess@docker.com wrote:

sgtm, and idk if regressions necessarily because they fail on 1.5.0 too

On Mon, Apr 13, 2015 at 3:54 PM, Sebastiaan van Stijn <
notifications@github.com> wrote:

ick. Should i split the overlay promotion into its own PR

sgtm (but not my say :))


Reply to this email directly or view it on GitHub
#11999 (comment).

@vbatts vbatts force-pushed the vbatts:vbatts-decide_storage branch from 2819972 to 4c29682 Apr 14, 2015

@GordonTheTurtle GordonTheTurtle added dco/yes and removed dco/no labels Apr 14, 2015

@vbatts vbatts referenced this pull request Apr 14, 2015

Closed

graphdriver: promote overlay driver to first #12354

0 of 2 tasks complete

@vbatts vbatts changed the title graphdriver: prefer prior driver state and promote overlay to first place graphdriver: prefer prior driver state Apr 14, 2015

@vbatts vbatts force-pushed the vbatts:vbatts-decide_storage branch from 4c29682 to deec690 Apr 14, 2015

@GordonTheTurtle GordonTheTurtle added dco/yes and removed dco/no labels Apr 14, 2015

graphdriver: prefer prior driver state
Before this, a storage driver would be defaulted to based on the
priority list, and only print a warning if there is state from other
drivers.

This meant a reordering of priority list would "break" users in an
upgrade of docker, such that there images in the prior driver's state
were now invisible.

With this change, prior state is scanned, and if present that driver is
preferred.

As such, we can reorder the priority list, and after an upgrade,
existing installs with prior drivers can have a contiguous experience,
while fresh installs may default to a driver in the new priority list.

Ref: #11962 (comment)

Signed-off-by: Vincent Batts <vbatts@redhat.com>

@vbatts vbatts force-pushed the vbatts:vbatts-decide_storage branch from deec690 to b68e161 Apr 14, 2015

@GordonTheTurtle GordonTheTurtle added dco/yes and removed dco/no labels Apr 14, 2015

@vbatts

This comment has been minimized.

Copy link
Contributor

vbatts commented Apr 14, 2015

updated to only be the prior-driver priority selection.

overlay promotion is now #12354

PTAL

@thaJeztah

This comment has been minimized.

Copy link
Member

thaJeztah commented Apr 14, 2015

Don't see any downsides, so LGTM :)

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 14, 2015

LGTM

jessfraz pushed a commit that referenced this pull request Apr 14, 2015

Jessie Frazelle
Merge pull request #11999 from vbatts/vbatts-decide_storage
graphdriver: prefer prior driver state

@jessfraz jessfraz merged commit 74f4a88 into moby:master Apr 14, 2015

3 checks passed

docker/dco-signed All commits signed
Details
janky Jenkins build Docker-PRs 5729 has succeeded
Details
windows Jenkins build Windows-PRs 2698 has succeeded
Details
@thaJeztah

This comment has been minimized.

Copy link
Member

thaJeztah commented Apr 19, 2015

@jfrazelle I think this didn't make it in 1.6?

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 21, 2015

ya it didnt need to

@vbatts vbatts deleted the vbatts:vbatts-decide_storage branch Apr 27, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment