Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finalize TarSum Version 1 w/ refactor #8869

Merged
merged 1 commit into from
Nov 13, 2014
Merged

Conversation

jlhawn
Copy link
Contributor

@jlhawn jlhawn commented Oct 30, 2014

The current Dev version of TarSum includes hashing of extended
file attributes and omits inclusion of modified time headers.

I refactored the logic around the version differences to make it
more clear that the difference between versions is in how tar
headers are selected and ordered.

TarSum Version 1 is now declared with the new Dev version continuing
to track it.

Docker-DCO-1.1-Signed-off-by: Josh Hawn josh.hawn@docker.com (github: jlhawn)

Version0 Version = iota
// Prefix of "tarsum.dev"
Version1
// VersionDev is the prefix of "tarsum.dev"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "VersionDev has", also pretty sure these comments are intended to specify what the prefix of each type is so either should be with each version or removed.

@jlhawn
Copy link
Contributor Author

jlhawn commented Oct 30, 2014

I've run the benchmarks on my machine with both the master version of the package and this branch. It looks like we get some negligible to small improvement in speed and a significant improvement in memory performance which is most noticeable in archives with several files. This is probably due to the changes in how allocations are done when selecting tar headers. Have a look:

Benchmark9kTar                  @ master    @ branch    % delta

    ns/op                       12085       12038       -0.389 %
    MB/s                        762.57      765.55      +0.391 %
    B/op                        14183       14200       +0.120 %
    allocs/op                   22          22          +0.000 %


Benchmark9kTarGzi               @ master    @ branch    % delta

    ns/op                       206875      206610      -0.128 %
    MB/s                        44.55       44.61       +0.135 %
    B/op                        1470279     1470294     +0.001 %
    allocs/op                   74          74          +0.000 %


Benchmark1mbSingleFile          @ master    @ branch    % delta

    ns/op                       6505530     6510376     +0.074 %
    MB/s                        161.18      161.06      -0.074 %
    B/op                        42212       41625       -1.391 %
    allocs/op                   352         348         -1.136 %


Benchmark1mbSingleFileTarG      @ master    @ branch    % delta

    ns/op                       19138067    18679059    -2.398 %
    MB/s                        54.79       56.14       +2.464 %
    B/op                        1906895     1906389     -0.027 %
    allocs/op                   3372        3371        -0.030 %


Benchmark1kFiles                @ master    @ branch    % delta

    ns/op                       24156129    23880668    -1.140 %
    MB/s                        43.41       43.91       +1.151 %
    B/op                        2804772     2186320     -22.05 %
    allocs/op                   75203       71120       -5.429 %


Benchmark1kFilesTarG            @ master    @ branch    % delta

    ns/op                       60346171    59714904    -1.046 %
    MB/s                        17.38       17.56       +1.036 %
    B/op                        11574602    10949201    -5.403 %
    allocs/op                   145456      141310      -2.850 %

@jlhawn
Copy link
Contributor Author

jlhawn commented Oct 30, 2014

@ewindisch @vbatts @unclejack please review

@vbatts
Copy link
Contributor

vbatts commented Oct 31, 2014

reading over it, it LGTM. Perhaps, the maneuver to drop the mtime header could be more explicit, but at least you have a comment there ;-)

@vbatts
Copy link
Contributor

vbatts commented Oct 31, 2014

Also, I've made progress on writing a specification for the TarSum, and should have a PR for it. Hopefully next week.

@jlhawn
Copy link
Contributor Author

jlhawn commented Oct 31, 2014

updated with fixed variable name. I had called something an encoder when I should have called it a selector.

@jlhawn
Copy link
Contributor Author

jlhawn commented Nov 4, 2014

ping @ewindisch

@unclejack
Copy link
Contributor

@jlhawn Why is mtime being removed?

@vbatts
Copy link
Contributor

vbatts commented Nov 4, 2014

@unclejack the .wh. files primarily. but really are not needed.

@unclejack
Copy link
Contributor

LGTM

@dmp42
Copy link
Contributor

dmp42 commented Nov 4, 2014

ping @vbatts @ewindisch

1: "tarsum.dev",
Version0: "tarsum",
Version1: "tarsum.v1",
VersionDev: "tarsum.dev",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks for that. I didn't mean to leave those as just integers. :-)

@vbatts
Copy link
Contributor

vbatts commented Nov 5, 2014

LGTM
(also, your git commit message says "TurSum" :-)

@jlhawn
Copy link
Contributor Author

jlhawn commented Nov 5, 2014

gah... will amend now.

The current Dev version of TarSum includes hashing of extended
file attributes and omits inclusion of modified time headers.

I refactored the logic around the version differences to make it
more clear that the difference between versions is in how tar
headers are selected and ordered.

TarSum Version 1 is now declared with the new Dev version continuing
to track it.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
@jlhawn
Copy link
Contributor Author

jlhawn commented Nov 6, 2014

ping @vbatts @unclejack @jfrazelle @crosbymichael @vieux Please review again due to rebase 👍

@vbatts
Copy link
Contributor

vbatts commented Nov 6, 2014

@jlhawn did you functionally change anything with the rebase, or just double checking from LGTMs?

@jlhawn
Copy link
Contributor Author

jlhawn commented Nov 6, 2014

no functional changes, no. Just fixed my commit message, and since PR history shows LGTMs before the commit I just wanted double check from maintainers.

@vbatts
Copy link
Contributor

vbatts commented Nov 6, 2014

right
LGTM

@jlhawn
Copy link
Contributor Author

jlhawn commented Nov 6, 2014

I've been trying to ping @ewindisch too. It's odd that his GitHub username doesn't autocomplete so I'm not sure if he's even getting notifications about this. :( If that's the case I'll need some extra LGTMs from the core maintainers I guess.

@unclejack
Copy link
Contributor

still LGTM

@ewindisch
Copy link
Contributor

@jlhawn I've been at a conference this past week. I'll review first chance.

@dmp42
Copy link
Contributor

dmp42 commented Nov 10, 2014

ping @crosbymichael @vieux @jfrazelle

}
}

func v1TarHeaderSelect(h *tar.Header) (orderedHeaders [][2]string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really a fan of version number in a function name, could have probably been an interface, but I understand it might be out of scope for this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there is an interface tarHeaderSelector. I made it so that the tarHeaderSelectFunc type implements the interface. It's like http.Handler and http.HandlerFunc. The version numbers are in the func name because I hadn't thought of a better way to differentiate them.

@jessfraz
Copy link
Contributor

LGTM

@jlhawn
Copy link
Contributor Author

jlhawn commented Nov 10, 2014

thanks @ewindisch (hey auto-complete works again!)

for k := range h.Xattrs {
xAttrKeys = append(xAttrKeys, k)
}
sort.Strings(xAttrKeys)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrm. hold the phone. we should add tests around the case sensitivity of the these keys. How that relates to sorting and whether there could be collisions.

vbatts added a commit to vbatts/moby that referenced this pull request Nov 11, 2014
Ensuring case size of headers will still be accounted for.
moby#8869 (comment)

Signed-off-by: Vincent Batts <vbatts@redhat.com>
@vbatts
Copy link
Contributor

vbatts commented Nov 11, 2014

OK. Just gave myself necessary warm-fuzzies.
This PR is just waiting on a stamp-of-approval from @ewindisch.

@jlhawn
Copy link
Contributor Author

jlhawn commented Nov 13, 2014

ping @ewindisch

@ewindisch
Copy link
Contributor

LGTM. I initially had some issues, but upon review, it was nothing that wasn't already in the existing code.

@unclejack
Copy link
Contributor

Are we good to merge? Do we want anyone else to review?

@vbatts
Copy link
Contributor

vbatts commented Nov 13, 2014

LGTM good to go

@jlhawn
Copy link
Contributor Author

jlhawn commented Nov 13, 2014

+1

unclejack added a commit that referenced this pull request Nov 13, 2014
Finalize TarSum Version 1 w/ refactor
@unclejack unclejack merged commit 3aa2245 into moby:master Nov 13, 2014
mapuri pushed a commit to mapuri/docker that referenced this pull request Dec 4, 2014
Ensuring case size of headers will still be accounted for.
moby#8869 (comment)

Signed-off-by: Vincent Batts <vbatts@redhat.com>
@vbatts vbatts mentioned this pull request Dec 18, 2014
@jlhawn jlhawn deleted the tarsum_version branch July 31, 2015 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants