New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Absorb prune into ensure #944

Closed
sdboyer opened this Issue Aug 4, 2017 · 36 comments

Comments

Projects
None yet
@sdboyer
Member

sdboyer commented Aug 4, 2017

dep prune currently exists as a separate command. This is not optimal; we want to incorporate its behavior into dep ensure directly, obviating the need to run the extra command.

Projects relying on pruning will necessarily have to give up on (fast) vendor/ verification - #121, as that verification will need to rely on a hash of each dependency project's entire tree, prior to pruning. That's just going to be a cost of doing business.

This will involve a few things:

  1. Create a set of flags (express as a bitset, e.g. type PruneOptions uint8) in gps representing the orthogonal, combinable pruning modes:
    i. Remove *_test.go files
    ii. Remove non-go files (except LICENSE, COPYING)
    iii. Remove unused packages
    iv. Remove nested vendor dirs (note that gps needs to have this as a flag, but dep unconditionally sets it - we do not give the user the option to NOT strip vendor directories)
  2. Introduce a new top-level section in Gopkg.toml that allows the user to specify each of these separately. I'm imagining something like this:
    [prune]
      non-go = true
      test-go = true
      unused-packages = true
    (All default to false)
  3. Modify gps.WriteDepTree() to take a struct with a PruneOptions bitfield. (Keep an eye on #903, this overlaps with that)
  4. Hack up the logic currently in cmd/dep/prune.go, moving it over into unexported functions in gps that gps.WriteDepTree() calls into itself to perform the pruning.

#120 is absurdly long and less relevant now, so I'm going to close it in favor of this.

  • Remember to update the README when we're done with this!
@VojtechVitek

This comment has been minimized.

Show comment
Hide comment
@VojtechVitek

VojtechVitek Aug 4, 2017

Can we keep LICENSE and COPYING files with a non-go option?

VojtechVitek commented Aug 4, 2017

Can we keep LICENSE and COPYING files with a non-go option?

@sdboyer

This comment has been minimized.

Show comment
Hide comment
@sdboyer

sdboyer Aug 4, 2017

Member

@VojtechVitek 💯 yes, will update OP, i think it'd be needlessly creating pain not to do that

Member

sdboyer commented Aug 4, 2017

@VojtechVitek 💯 yes, will update OP, i think it'd be needlessly creating pain not to do that

@ibrasho

This comment has been minimized.

Show comment
Hide comment
@ibrasho

ibrasho Aug 4, 2017

Collaborator

I assume you meant gps.WriteDepTree instead of gps.WriteVendorTree. 😉

I already did 1, 2 & 3, I can send a PR (it can even be merged before 4 is completed since it doesn't impact behavior). I'm still working on 4 though.

Collaborator

ibrasho commented Aug 4, 2017

I assume you meant gps.WriteDepTree instead of gps.WriteVendorTree. 😉

I already did 1, 2 & 3, I can send a PR (it can even be merged before 4 is completed since it doesn't impact behavior). I'm still working on 4 though.

@ibrasho

This comment has been minimized.

Show comment
Hide comment
@ibrasho

ibrasho Aug 4, 2017

Collaborator

@sdboyer Just to check, after reading the comments in #903 I decided to update the signature of gps.WriteDepTree and make it a method of gps.SourceManager. The new signature is:

func (sm SourceManager) WriteDepTree(basedir string, l Lock, options PruneOptions) error

I've moved the current implementation to SourceMgr and added a dummy implementation for the other dummy sms.

Collaborator

ibrasho commented Aug 4, 2017

@sdboyer Just to check, after reading the comments in #903 I decided to update the signature of gps.WriteDepTree and make it a method of gps.SourceManager. The new signature is:

func (sm SourceManager) WriteDepTree(basedir string, l Lock, options PruneOptions) error

I've moved the current implementation to SourceMgr and added a dummy implementation for the other dummy sms.

@sdboyer

This comment has been minimized.

Show comment
Hide comment
@sdboyer

sdboyer Aug 4, 2017

Member

I assume you meant gps.WriteDepTree instead of gps.WriteVendorTree. 😉

derp, yes. updated OP. once, long ago, it was called WriteVendorTree(), and i still have that stuck in my mind

I already did 1, 2 & 3 already, I can send a PR (it can even be merged before 4 is completed since it doesn't impact behavior)

🎉 🎉 🎊 🎊 killin it!

Member

sdboyer commented Aug 4, 2017

I assume you meant gps.WriteDepTree instead of gps.WriteVendorTree. 😉

derp, yes. updated OP. once, long ago, it was called WriteVendorTree(), and i still have that stuck in my mind

I already did 1, 2 & 3 already, I can send a PR (it can even be merged before 4 is completed since it doesn't impact behavior)

🎉 🎉 🎊 🎊 killin it!

@karrick

This comment has been minimized.

Show comment
Hide comment
@karrick

karrick Aug 4, 2017

Contributor

@sdboyer The code I just uploaded to my version of dep creates a prune tree while it's verifying dependency hashes.

It might be advantageous to look at this and #121 together.

Contributor

karrick commented Aug 4, 2017

@sdboyer The code I just uploaded to my version of dep creates a prune tree while it's verifying dependency hashes.

It might be advantageous to look at this and #121 together.

@sdboyer

This comment has been minimized.

Show comment
Hide comment
@sdboyer

sdboyer Aug 4, 2017

Member

@karrick there's definitely a lot of overlap...or at least, adjacency...eventually between pruning and verification, but if the prune tree is the one we discussed in #121 (i haven't looked yet), then i think that's a separate kind of pruning from what's going on here.

now, it's possible that we might be able to join these together. however, my current plan (as noted in the OP) is that if you do the kind of pruning this issue is focused on, then you don't get vendor/ verification. that's probably a bridge we COULD cross, but i'd want that to be a separate thing, as there's a long-term goal of having the hash digests we put into Gopkg.lock be the same as the ones that end up being used for content-addressability in the "third space".

Member

sdboyer commented Aug 4, 2017

@karrick there's definitely a lot of overlap...or at least, adjacency...eventually between pruning and verification, but if the prune tree is the one we discussed in #121 (i haven't looked yet), then i think that's a separate kind of pruning from what's going on here.

now, it's possible that we might be able to join these together. however, my current plan (as noted in the OP) is that if you do the kind of pruning this issue is focused on, then you don't get vendor/ verification. that's probably a bridge we COULD cross, but i'd want that to be a separate thing, as there's a long-term goal of having the hash digests we put into Gopkg.lock be the same as the ones that end up being used for content-addressability in the "third space".

@docmerlin

This comment has been minimized.

Show comment
Hide comment
@docmerlin

docmerlin Aug 14, 2017

I'd like a flag to turn off pruning, for situations where speed is essential such as CI.

docmerlin commented Aug 14, 2017

I'd like a flag to turn off pruning, for situations where speed is essential such as CI.

@pierrre

This comment has been minimized.

Show comment
Hide comment
@pierrre

pierrre Aug 14, 2017

@docmerlin if you care about speed:

  • vendor your dependencies
  • don't run dep on CI

pierrre commented Aug 14, 2017

@docmerlin if you care about speed:

  • vendor your dependencies
  • don't run dep on CI
@sdboyer

This comment has been minimized.

Show comment
Hide comment
@sdboyer

sdboyer Aug 14, 2017

Member

i don't think pruning, at least once absorbed into ensure, is ever likely to contribute much to slowdowns. i'd need to see evidence to the contrary before i could consider adding another flag.

Member

sdboyer commented Aug 14, 2017

i don't think pruning, at least once absorbed into ensure, is ever likely to contribute much to slowdowns. i'd need to see evidence to the contrary before i could consider adding another flag.

@VojtechVitek

This comment has been minimized.

Show comment
Hide comment
@VojtechVitek

VojtechVitek Aug 14, 2017

I'd like a flag to turn off pruning, for situations where speed is essential such as CI.

Yeah, pruning will likely be even faster, since we can avoid all the extra I/O operations.

Let's not add any more flags. This should be the default behavior imho.

VojtechVitek commented Aug 14, 2017

I'd like a flag to turn off pruning, for situations where speed is essential such as CI.

Yeah, pruning will likely be even faster, since we can avoid all the extra I/O operations.

Let's not add any more flags. This should be the default behavior imho.

@sdboyer

This comment has been minimized.

Show comment
Hide comment
@sdboyer

sdboyer Aug 14, 2017

Member

Yeah, pruning will likely be even faster, since we can avoid all the extra I/O operations.

for now, at least, we generally have to write out the full tree, then prune items out, so we can't realize this gain right now. but you're certainly right that we have the headroom to be able to explore avoiding writing files that will be pruned in the first place.

Member

sdboyer commented Aug 14, 2017

Yeah, pruning will likely be even faster, since we can avoid all the extra I/O operations.

for now, at least, we generally have to write out the full tree, then prune items out, so we can't realize this gain right now. but you're certainly right that we have the headroom to be able to explore avoiding writing files that will be pruned in the first place.

@GlenDC

This comment has been minimized.

Show comment
Hide comment
@GlenDC

GlenDC Sep 8, 2017

Can we please also have some more fine grained options for when needed.

For example when using a dependency such as https://github.com/capnproto/go-capnproto2, I would like to still vendor the https://github.com/capnproto/go-capnproto2/tree/master/capnpc-go directory (and all its content) and the https://github.com/capnproto/go-capnproto2/blob/master/std/go.capnp file. This is useful as those files are still dependencies for our builds, even though they are not used go packages in our project or go-files.

In such cases it could be great if we could add some exception filters or w/e on a per dependency basis. Perhaps this scenarios is rare. But without such a possibility I would be resorted to use some shell script on top of dep to copy those files over afterwards somehow (during my bi-weekly dependency upgrade cycle).

GlenDC commented Sep 8, 2017

Can we please also have some more fine grained options for when needed.

For example when using a dependency such as https://github.com/capnproto/go-capnproto2, I would like to still vendor the https://github.com/capnproto/go-capnproto2/tree/master/capnpc-go directory (and all its content) and the https://github.com/capnproto/go-capnproto2/blob/master/std/go.capnp file. This is useful as those files are still dependencies for our builds, even though they are not used go packages in our project or go-files.

In such cases it could be great if we could add some exception filters or w/e on a per dependency basis. Perhaps this scenarios is rare. But without such a possibility I would be resorted to use some shell script on top of dep to copy those files over afterwards somehow (during my bi-weekly dependency upgrade cycle).

@pxue

This comment has been minimized.

Show comment
Hide comment
@pxue

pxue Sep 8, 2017

question about pruning / might be related, is there similar configuration as the govendor ignore build tag flag? use case is to ignore all *_test.go and appengine related code from vendor. The ignored config in the toml doesn't seem to be working as expected.

pxue commented Sep 8, 2017

question about pruning / might be related, is there similar configuration as the govendor ignore build tag flag? use case is to ignore all *_test.go and appengine related code from vendor. The ignored config in the toml doesn't seem to be working as expected.

@karrick

This comment has been minimized.

Show comment
Hide comment
@karrick

karrick Sep 8, 2017

Contributor

I'm not too certain what we're trying to solve with dep prune.

It seems like there's risk in pruning files that need to be around for a project dependency to build.

Contributor

karrick commented Sep 8, 2017

I'm not too certain what we're trying to solve with dep prune.

It seems like there's risk in pruning files that need to be around for a project dependency to build.

@sdboyer

This comment has been minimized.

Show comment
Hide comment
@sdboyer

sdboyer Sep 9, 2017

Member

I'm not too certain what we're trying to solve with dep prune.

my initial take was more along these lines, but many folks consider this a hard requirement for using dep, as they want to keep the size of their vendor directory minimal. so, i relented - see #120 for the history there.

question about pruning / might be related, is there similar configuration as the govendor ignore build tag flag? use case is to ignore all *_test.go and appengine related code from vendor. The ignored config in the toml doesn't seem to be working as expected.

ah, that's a separate thing. pruning is strictly about removing excess files from vendor/. what you're after, i think, goes deeper, as it speaks to what paths we actually follow when analyzing code. it's on the roadmap, but backburnered, as it's not something people have been really coming out of the woodwork to demand - #291

Can we please also have some more fine grained options for when needed.

it's possible that we could add some more rules, but we need to be careful about this. we're going to have vendor/ verification coming pretty soon, which is going to be a drastic performance improvement (and reliability as well) - we'll no longer need to rewrite all of vendor/ on each dep ensure run. that's going to be predicated on having a new hash digest value in Gopkg.lock - something like tree-digest this:

[[projects]]
  branch = "master"
  name = "github.com/foo/bar"
  packages = ["."]
  revision = "cd8b52f8269e01eb486dfeef29f8fe4d5b397e0b"
  tree-digest = "0ba2aa4b3860c5fcc1219b31afaf30c1fcf3b97cbb0a5116b498cafedbb06a1b"

that'll be a hash digest of the contents of the original, unpruned tree. awesome if you're not pruning, but useless if you are - the digests won't match. we need another hash of the tree AFTER pruning for it to be useful:

  tree-digest = "0ba2aa4b3860c5fcc1219b31afaf30c1fcf3b97cbb0a5116b498cafedbb06a1b"
  pruned-digest = "1ed417a0bec57ffe988fae1cba8f3d49934fb873394d61844e0b3c96d69573fe"

if we're going to do that, though, then we need new ways of answering the question of "are imports/Gopkg.toml in sync with Gopkg.lock?" - because if the pruning options in Gopkg.toml change, then the pruned-digest in Gopkg.lock will need to change, as will the contents of vendor. so, we can't name it pruned-digest, as that obfuscates the pruning options that went into it.

the main reason i proposed pruning be controlled via a set of boolean flags is because it would make a comprehensible way of capturing what the applied pruning options are. say V represents vendor stripping, U unused packages, T test files, N non-go files. we might then represent the tree digests like this:

  [tree-digests]
    V = "0ba2aa4b3860c5fcc1219b31afaf30c1fcf3b97cbb0a5116b498cafedbb06a1b"
    VUTN = "1ed417a0bec57ffe988fae1cba8f3d49934fb873394d61844e0b3c96d69573fe"

note that i'm not saying this will be the final form - maybe we use full words, maybe we express as the integer value of the bitfield; maybe we only include the explicit pruned value, rather than always having just the V type. lots of possibilities, i'm just sketching it out. point is, simply by virtue of what key(s) are present, we know which pruning options were set in Gopkg.toml, and thus can tell when we need to sync those up.

if we provide anything other than boolean controls here, then we'd likely have to just hash the prune options. i think that would probably work, but it'd be a hash key pointing to a hash value - utterly useless for understanding what's going on.

we can certainly think about adding more boolean options - for example, we could divide "non-go" files into strictly *.go files vs. all the other types of possible files (.c, cc, .h, etc...) files. or something else ¯\_(ツ)_/¯.

now, maybe those wouldn't cover your *.capnp example. if so, you'd have to just not set the N flag for that particular dependency. but, i think it's preferable for us to walk this line of comprehensibility and stick with flags, rather than giving more arbitrary control. any individual user can always choose to not care about dep's verification&fast paths, and implement their own, more fine-grained pruner. but minimizing vendor size is an exercise with seriously diminishing returns, and i'll happily trade some small efficiency losses there for greater comprehensibility.

Member

sdboyer commented Sep 9, 2017

I'm not too certain what we're trying to solve with dep prune.

my initial take was more along these lines, but many folks consider this a hard requirement for using dep, as they want to keep the size of their vendor directory minimal. so, i relented - see #120 for the history there.

question about pruning / might be related, is there similar configuration as the govendor ignore build tag flag? use case is to ignore all *_test.go and appengine related code from vendor. The ignored config in the toml doesn't seem to be working as expected.

ah, that's a separate thing. pruning is strictly about removing excess files from vendor/. what you're after, i think, goes deeper, as it speaks to what paths we actually follow when analyzing code. it's on the roadmap, but backburnered, as it's not something people have been really coming out of the woodwork to demand - #291

Can we please also have some more fine grained options for when needed.

it's possible that we could add some more rules, but we need to be careful about this. we're going to have vendor/ verification coming pretty soon, which is going to be a drastic performance improvement (and reliability as well) - we'll no longer need to rewrite all of vendor/ on each dep ensure run. that's going to be predicated on having a new hash digest value in Gopkg.lock - something like tree-digest this:

[[projects]]
  branch = "master"
  name = "github.com/foo/bar"
  packages = ["."]
  revision = "cd8b52f8269e01eb486dfeef29f8fe4d5b397e0b"
  tree-digest = "0ba2aa4b3860c5fcc1219b31afaf30c1fcf3b97cbb0a5116b498cafedbb06a1b"

that'll be a hash digest of the contents of the original, unpruned tree. awesome if you're not pruning, but useless if you are - the digests won't match. we need another hash of the tree AFTER pruning for it to be useful:

  tree-digest = "0ba2aa4b3860c5fcc1219b31afaf30c1fcf3b97cbb0a5116b498cafedbb06a1b"
  pruned-digest = "1ed417a0bec57ffe988fae1cba8f3d49934fb873394d61844e0b3c96d69573fe"

if we're going to do that, though, then we need new ways of answering the question of "are imports/Gopkg.toml in sync with Gopkg.lock?" - because if the pruning options in Gopkg.toml change, then the pruned-digest in Gopkg.lock will need to change, as will the contents of vendor. so, we can't name it pruned-digest, as that obfuscates the pruning options that went into it.

the main reason i proposed pruning be controlled via a set of boolean flags is because it would make a comprehensible way of capturing what the applied pruning options are. say V represents vendor stripping, U unused packages, T test files, N non-go files. we might then represent the tree digests like this:

  [tree-digests]
    V = "0ba2aa4b3860c5fcc1219b31afaf30c1fcf3b97cbb0a5116b498cafedbb06a1b"
    VUTN = "1ed417a0bec57ffe988fae1cba8f3d49934fb873394d61844e0b3c96d69573fe"

note that i'm not saying this will be the final form - maybe we use full words, maybe we express as the integer value of the bitfield; maybe we only include the explicit pruned value, rather than always having just the V type. lots of possibilities, i'm just sketching it out. point is, simply by virtue of what key(s) are present, we know which pruning options were set in Gopkg.toml, and thus can tell when we need to sync those up.

if we provide anything other than boolean controls here, then we'd likely have to just hash the prune options. i think that would probably work, but it'd be a hash key pointing to a hash value - utterly useless for understanding what's going on.

we can certainly think about adding more boolean options - for example, we could divide "non-go" files into strictly *.go files vs. all the other types of possible files (.c, cc, .h, etc...) files. or something else ¯\_(ツ)_/¯.

now, maybe those wouldn't cover your *.capnp example. if so, you'd have to just not set the N flag for that particular dependency. but, i think it's preferable for us to walk this line of comprehensibility and stick with flags, rather than giving more arbitrary control. any individual user can always choose to not care about dep's verification&fast paths, and implement their own, more fine-grained pruner. but minimizing vendor size is an exercise with seriously diminishing returns, and i'll happily trade some small efficiency losses there for greater comprehensibility.

@ibrasho

This comment has been minimized.

Show comment
Hide comment
@ibrasho

ibrasho Sep 9, 2017

Collaborator

@GlenDC It's worth it to reaffirm that pruning (beyond removing nested vendor dirs) is opt-in and not forced. So in the worst case scenario, you can just keep all options off and now files will be deleted.

I was thinking of adding an exception list to prune options, but that would complicate vendor verification in addition to the points @sdboyer mentioned.

Collaborator

ibrasho commented Sep 9, 2017

@GlenDC It's worth it to reaffirm that pruning (beyond removing nested vendor dirs) is opt-in and not forced. So in the worst case scenario, you can just keep all options off and now files will be deleted.

I was thinking of adding an exception list to prune options, but that would complicate vendor verification in addition to the points @sdboyer mentioned.

@vergenzt

This comment has been minimized.

Show comment
Hide comment
@vergenzt

vergenzt Nov 10, 2017

What's the status of this? It looks like some relevant PRs have been merged, but I can't tell whether this is nearing completion or not. Does it seem close? :)

vergenzt commented Nov 10, 2017

What's the status of this? It looks like some relevant PRs have been merged, but I can't tell whether this is nearing completion or not. Does it seem close? :)

@ibrasho

This comment has been minimized.

Show comment
Hide comment
@ibrasho

ibrasho Nov 11, 2017

Collaborator

@vergenzt Almost there. After #1226 gets merged, I'll push one more commit to complete #1219 and then we should have the initial implementation of this feature in dep.

Collaborator

ibrasho commented Nov 11, 2017

@vergenzt Almost there. After #1226 gets merged, I'll push one more commit to complete #1219 and then we should have the initial implementation of this feature in dep.

@ixdy

This comment has been minimized.

Show comment
Hide comment
@ixdy

ixdy Jan 9, 2018

Any update? Both #1226 and #1219 have merged.

ixdy commented Jan 9, 2018

Any update? Both #1226 and #1219 have merged.

@sdboyer

This comment has been minimized.

Show comment
Hide comment
@sdboyer

sdboyer Jan 9, 2018

Member
Member

sdboyer commented Jan 9, 2018

@sdboyer

This comment has been minimized.

Show comment
Hide comment
@sdboyer

sdboyer Jan 24, 2018

Member

and we are finally, FINALLY ready.

Member

sdboyer commented Jan 24, 2018

and we are finally, FINALLY ready.

@sdboyer

This comment has been minimized.

Show comment
Hide comment
@sdboyer

sdboyer Jan 24, 2018

Member

It's merged in, closing this. Release coming presently...

Member

sdboyer commented Jan 24, 2018

It's merged in, closing this. Release coming presently...

@sdboyer sdboyer closed this Jan 24, 2018

@sttts

This comment has been minimized.

Show comment
Hide comment
@sttts

sttts commented Jan 24, 2018

/cc @fejta

bpicode added a commit to bpicode/fritzctl that referenced this issue Feb 10, 2018

Define prune options in Gopkg.toml
The 'prune' command has been absorbed into 'ensure',
see golang/dep#944. Also
it received some makeover, allowing tests to be pruned.
'dep ensure' now got rid of a bunch of files for us.

bpicode added a commit to bpicode/depbot that referenced this issue Feb 10, 2018

Do not run prune
'prune' is now part of 'ensure' and configured on per-project-level in
Gopkg.toml. See golang/dep#944.

bartekn added a commit to stellar/go that referenced this issue Aug 2, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment