New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add --exclude parameter to docker commit #12594

Closed
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
7 participants
@mountkin
Contributor

mountkin commented Apr 21, 2015

Sometimes the container might generate temporary files or unix socket
files. Commit those temporary files to the image makes no sense.
I think adding a new parameter to docker commit API would be useful in
such cases.

Usage: docker commit --exclude=/tmp --exclude=/var/run/mysql.sock your-container

@cpuguy83

This comment has been minimized.

Contributor

cpuguy83 commented Apr 21, 2015

Oh that's a good idea.
Really interesting if we add a Dockerfile command for this.

@mountkin

This comment has been minimized.

Contributor

mountkin commented Apr 21, 2015

Test is broken, I'll look into it later.

@mountkin

This comment has been minimized.

Contributor

mountkin commented Apr 22, 2015

@cpuguy83 since the user can put any shell command into the Dockerfile, including rm -r /files-to-exclude, I don't think adding a Dockerfile command is that useful.
By the way, I noticed that the instructions supported by docker commit --change is a subset of Dockerfile instructions. Are there any special considerations that we should limit the usage of other instructions?
If the RUN command was supported by docker commit --change, this PR could be closed because the user can simply remove the temporary files via docker commit --change "RUN rm -r xxx".

@icecrime icecrime removed the dco/yes label Apr 23, 2015

@mountkin

This comment has been minimized.

Contributor

mountkin commented Apr 24, 2015

ping @jfrazelle @icecrime @cpuguy83 for design review, thanks

@duglin

This comment has been minimized.

Contributor

duglin commented Apr 24, 2015

@mountkin if you don't think a Dockerfile command would be useful, I'm curious to know why we need this feature if people can just do docker exec rm ... on the container prior to a docker commit ?

I'm a little nervous (not a full -1 or anything :-) ) about a command asks for a snapshot of a container to be created but then starts down the path of saying "well, not really a full snapshot. Don't include this, and this and this...." and then suddenly we have a mess.

@mountkin

This comment has been minimized.

Contributor

mountkin commented Apr 24, 2015

@duglin docker exec rm ... sometimes may interrupt the service because some of the temporary files can't be removed when the service is running (such as the unix socket file).
And some not well designed software might rely on a pid file at startup to ensure only one instance is running. But it doesn't check whether it's a stale pid file. If a stale pid file exists, the service won't even start. (I was bitten by this many times.)

I hope the two examples above can help to explain why I think the --exclude parameter is useful.
There is also a question on stackoverflow addressing the similar problem. http://stackoverflow.com/questions/29740079/docker-how-to-commit-but-exclude-directory-from-image

Of course during docker build temp files may also be created, but it's quite easy to clean them by appending a && rm tmp-files at the end of the RUN command. For example I usually add && apt-get clean after a apt-get install to reduce the size of the image. That's why I think a Dockerfile command is not necessary.

@duglin

This comment has been minimized.

Contributor

duglin commented Apr 24, 2015

@mountkin bummer that you can't run docker exec on a stopped container, then you could just stop it (and its services), do docker exec rm ... and then commit. Oh well.
I guess if you think of a 'commit' like the 'tar' command then this feature makes sense.
Thanks for the additional info.
design SGTM

@icecrime

This comment has been minimized.

Contributor

icecrime commented May 5, 2015

Design LGTM, thank @mountkin!

newExcludes := make([]string, len(excludes))
for i, ex := range excludes {
newExcludes[i] = strings.TrimLeft(ex, "/")
}

This comment has been minimized.

@icecrime

icecrime May 5, 2015

Contributor

I don't really understand this block:

  • archive.TarWithOptions uses filepath.Match, there's no assumption of relative or absolute path in there as far as I can tell
  • Why not leave it to the Match to fail, and to the user to fix his pattern?

I think I'd rather remove this, but please tell me what I'm missing!

This comment has been minimized.

@mountkin

mountkin May 6, 2015

Contributor

@icecrime In this line https://github.com/docker/docker/blob/master/pkg/archive/archive.go#L435 the srcPath is converted to a relative path (relative to the diff path of the container).
If the leading '/' was not stripped before calling archive.TarWithOptions, when the user run docker commit --exclude=/path/file.txt cid img, fileutils.OptimizedMatches won't mach the pattern and the file won't be excluded.

// So we have to strip the leading '/'.
newExcludes := make([]string, len(excludes))
for i, ex := range excludes {
newExcludes[i] = strings.TrimLeft(ex, "/")

This comment has been minimized.

@duglin

duglin May 14, 2015

Contributor

Do you ever actually check to make sure the incoming paths are absolute? I didn't notice it. I think we should generate an error if they're not.

Not 100% sure of the best place to do the check but this func seems reasonable since you want to catch all code paths that might hit this func. Of course, putting the check in the archive code might not be bad either, since if you're correct that its assuming relative paths then it really should enforce it.

This comment has been minimized.

@mountkin

mountkin May 18, 2015

Contributor

Perhaps we can check if the paths are absolute in the postCommit method of the server API?

This comment has been minimized.

@duglin

duglin May 26, 2015

Contributor

What was the final resolution of this comment/thread?

This comment has been minimized.

@mountkin

mountkin May 27, 2015

Contributor

I added the validation logic in the "postCommit" function of the server API.

@@ -24,6 +24,10 @@ Using an existing container's name or ID you can create a new image.
Apply specified Dockerfile instructions while committing the image
Supported Dockerfile instructions: `CMD`|`ENTRYPOINT`|`ENV`|`EXPOSE`|`ONBUILD`|`USER`|`VOLUME`|`WORKDIR`
**--exclude**=[]
Exclude the specified files while commiting the image.
Only the files that were generated by the container can be excluded.

This comment has been minimized.

@duglin

duglin May 14, 2015

Contributor

May want to mention that they must be absolute paths.

This comment has been minimized.

@duglin

duglin May 26, 2015

Contributor

missed this one

@@ -1740,6 +1740,8 @@ Json Parameters:
Query Parameters:
- **container** – source container
- **changes** – the change instructions to apply
- **excludes** – files to exclude

This comment has been minimized.

@duglin

duglin May 14, 2015

Contributor

absolute paths of files to exclude

@@ -386,7 +386,21 @@ func ExportChanges(dir string, changes []Change) (Archive, error) {
// during e.g. a diff operation the container can continue
// mutating the filesystem and we can see transient errors
// from this
changesLoop:

This comment has been minimized.

@duglin

duglin May 14, 2015

Contributor

Could be my old-school mindset (and I know goto's are ok in most languages) but it gives me the heebie-jeebies :-) Just saying...

This comment has been minimized.

@vbatts

vbatts May 26, 2015

Contributor

though in golang, these named loops allow for much cleaner breaking of the outside loop.

for _, change := range changes {
for _, exclude := range excludes {
exclude := filepath.Join("/", exclude)

This comment has been minimized.

@duglin

duglin May 14, 2015

Contributor

Need some help following the flow here. First, to level-set, 'changes' is not relative to 'dir' but 'excludes' is, right?

If so, I'm not sure I understand the first if-stmt below. When would 'exclude' ever match change.Path since 'exclude' should be a relative path from the root of the FS, while change.Path should include the FS path on the host, right?

This comment has been minimized.

@mountkin

mountkin May 18, 2015

Contributor

Hello @duglin
ExportChanges is called in NaiveDiffDriver, which is used when the storage-driver is set to vfs and overlay. If the storage-driver is set to aufs, archive.TarWithOptions will be called.
archive.TarWithOptions uses relative path to compare the exclude patterns, while ExportChanges uses absolute path.

The first if-sttm below is used to match --exclude=/path/file/* or --exclude=/path/file*.txt, but it can't match --exclude=/directory/to/exclude. The else block tries to match the --exclude=/director/to/exclude pattern.

@duglin

This comment has been minimized.

Contributor

duglin commented May 22, 2015

rebase is needed

@mountkin

This comment has been minimized.

Contributor

mountkin commented May 23, 2015

rebase done

@duglin

This comment has been minimized.

Contributor

duglin commented May 26, 2015

@vbatts could you look over the archive changes?

@@ -359,7 +359,7 @@ func ChangesSize(newDir string, changes []Change) int64 {
}
// ExportChanges produces an Archive from the provided changes, relative to dir.
func ExportChanges(dir string, changes []Change) (Archive, error) {
func ExportChanges(dir string, changes []Change, excludes []string) (Archive, error) {

This comment has been minimized.

@vbatts

vbatts May 26, 2015

Contributor

perhaps lets add a function of ExportChangesExcludes or similar. It is the best effort of the ./pkg/ to not break api for folks that may be using it. It's cool if you refactor this and the new function to reuse logic where possible.

This comment has been minimized.

@mountkin

mountkin May 27, 2015

Contributor

@vbatts If add a new function I thinks I'll do it like this:

  1. Rename the current ExportChanges to exportChangesImpl.
  2. Call exportChangesImpl in ExportChanges(dir string, changes []Change) with an empty string slice.
  3. Add a new function ExportChanges(dir string, changes []Change, excludes []string) to call exportChangesImpl directly.

Please let me know if it makes sense. Thanks.

This comment has been minimized.

@vbatts

vbatts May 27, 2015

Contributor

Sounds like what I expected. Perhaps just exportChanges would suffice,
since it won't collide with the exported function and is clearly related.
The Impl suffix feels a little java-ish. :-)
On May 27, 2015 01:19, "Shijiang Wei" notifications@github.com wrote:

In pkg/archive/changes.go
#12594 (comment):

@@ -359,7 +359,7 @@ func ChangesSize(newDir string, changes []Change) int64 {
}

// ExportChanges produces an Archive from the provided changes, relative to dir.
-func ExportChanges(dir string, changes []Change) (Archive, error) {
+func ExportChanges(dir string, changes []Change, excludes []string) (Archive, error) {

@vbatts https://github.com/vbatts If add a new function I thinks I'll
do it like this:

  1. Rename the current ExportChanges to exportChangesImpl.
  2. Call exportChangesImpl in ExportChanges(dir string, changes []Change)
    with an empty string slice.
  3. Add a new function ExportChanges(dir string, changes []Change,
    excludes []string) to call exportChangesImpl directly.

Please let me know if it makes sense. Thanks.


Reply to this email directly or view it on GitHub
https://github.com/docker/docker/pull/12594/files#r31103948.

This comment has been minimized.

@mountkin

mountkin May 27, 2015

Contributor

Ok. I'll refactor it and add the test tomorrow.

@vbatts

This comment has been minimized.

Contributor

vbatts commented May 26, 2015

given that you've introduced new behavior in pkg/archive, let's add some tests there to ensure it's outcome.

@duglin duglin referenced this pull request Jun 3, 2015

Closed

Proposal: The Docker Vault #10310

@cyphar

This comment has been minimized.

Contributor

cyphar commented Jun 4, 2015

A Dockerfile directive to this effect would be very useful for setting up ethereal data that shouldn't be committed. The only issue is that it would have to work via a different mechanism to this PR, because you'd expect to be able to access data tagged as "ethereal" during the entire build process (but not having it show up in the image history graph) as opposed to excluding a change from being committed. Both are useful, but they are marginally incompatible features.

@mountkin

This comment has been minimized.

Contributor

mountkin commented Jun 5, 2015

@cyphar What about add a new parameter to docker build to map a volume to the intermediate containers? Such as docker build -v /path/on/the/host:/data -t image-tag .

Edit:
After searching the github issues I found this one #3056, they are discussing about adding BIND_CONTEXT to Dockerfile, which seems like can solve your problem. The -v parameter was voted down by @cpuguy83 there.

@cyphar

This comment has been minimized.

Contributor

cyphar commented Jun 5, 2015

Volumes for builds break reproducible builds. We actually just need ethereal paths -- they maintain reproducibility without also breaking secrecy during builds.

Add --exclude parameter to docker commit
Sometimes the container might generate temporary files or unix socket
files. Commit those temporary files to the image makes no sense.
I think adding a new parameter to `docker commit` API would be useful in
such cases.

Usage: `docker commit --exclude=/tmp --exclude=/var/run/mysql.sock container img`

Signed-off-by: Shijiang Wei <mountkin@gmail.com>

mountkin added some commits May 27, 2015

Make sure the exclude patterns are absolute paths
Signed-off-by: Shijiang Wei <mountkin@gmail.com>
refactor and add tests for archive.ExportChanges
Signed-off-by: Shijiang Wei <mountkin@gmail.com>
@icecrime

This comment has been minimized.

Contributor

icecrime commented Jun 16, 2015

Collective review with @jfrazelle @crosbymichael @calavera

Sorry we validated design earlier, we agree with @cyphar that this is better solve with ethereal paths (aka tmpfs). We are discussing the right implementation for this (cf. #13587), but right now we're not going to go further with this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment