Deduping broken? #123

bitprophet opened this Issue Feb 28, 2014 · 5 comments




Report from the ML via @flamingbear

Operating System = Mac OS X 10.9.1

(invoke)savoie@savoie-laptop ~/projects/invoke-test$ python --version
Python 2.7.6

(invoke)savoie@savoie-laptop ~/projects/invoke-test$ pip freeze
-e git+

from invoke import task, run

def clean():

def build():

def package():

and the output:

(invoke)savoie@savoie-laptop ~/projects/invoke-test$ invoke build package

I checked real quick and the docs + state of Executor both imply deduping should be on by default, and there are basic tests proving it, so this is likely a case or cases not tested for, e.g.:

  • 2x tasks called at one time; I think my existing test covers this but it's low level so there may be interplay between CLI and Executor not tested;
  • One of the tasks is both explicitly given and is specified as a pre-task of the other explicitly given task.

Back on this and #120 after fixing #135 \o/

Double checked and the error in this ticket is because currently execution considers each top level (CLI-given) task independently, re: pre/post tasks and deduping.

I feel like it would probably make sense for them to "know about" each other for deduplication purposes.

I'm torn on whether we'd want this to be the case when they are not consecutive, e.g. two tasks both calling a clean beforehand might want to be clean -> taskA -> clean -> taskB instead of clean -> taskA -> taskB. Will leave deduplication in its existing 'aggressive' mode for now; we can probably add a third 'consecutive only' option later if people seem to need it.
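
To make the difference concrete, here's a rough sketch of the three behaviors discussed above. It is not Invoke's actual Executor code: the `PRE` map and the `expand` / `plan_*` helpers are hypothetical names made up for illustration, and the pre-task relationships (build needs clean, package needs build) are an assumption about the reporter's tasks file.

```python
from itertools import groupby

# Hypothetical pre-task map (assumed relationships, not Invoke's data model).
PRE = {
    "clean": [],
    "build": ["clean"],
    "package": ["build"],
    "taskA": ["clean"],
    "taskB": ["clean"],
}


def expand(name):
    """Return `name` preceded by its recursively expanded pre-tasks."""
    order = []
    for pre in PRE[name]:
        order.extend(expand(pre))
    order.append(name)
    return order


def plan_independent(cli_tasks):
    """Dedupe within each CLI-given task only; tasks don't know about each other."""
    plan = []
    for name in cli_tasks:
        seen = set()  # reset per top-level task
        for step in expand(name):
            if step not in seen:
                seen.add(step)
                plan.append(step)
    return plan


def plan_shared(cli_tasks):
    """'Aggressive' mode: one seen-set shared across all CLI-given tasks."""
    seen = set()
    plan = []
    for name in cli_tasks:
        for step in expand(name):
            if step not in seen:
                seen.add(step)
                plan.append(step)
    return plan


def plan_consecutive(cli_tasks):
    """'Consecutive only' mode: drop a step only if it repeats its neighbor."""
    steps = [step for name in cli_tasks for step in expand(name)]
    return [key for key, _ in groupby(steps)]


print(plan_independent(["build", "package"]))
# ['clean', 'build', 'clean', 'build', 'package']
print(plan_shared(["build", "package"]))
# ['clean', 'build', 'package']
print(plan_consecutive(["taskA", "taskB"]))
# ['clean', 'taskA', 'clean', 'taskB']
```

The only difference between the first two planners is where the `seen` set lives, which is roughly what "knowing about each other" amounts to here.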


Now wondering exactly why I stuck in "call tracking" based deduping too, at some point (apparently in 511393d). Or rather, why it and the other form of deduping (the one that looks at the entire list and tries removing duplicates before executing) are both implemented - they overlap significantly in behavior. IIRC it was done as a quick-and-easy method of deduplicating before the recent shakeup.

The downside of call tracking (which uses state within Task instances) is that it won't play nicely with multiprocess/threaded/etc forms of parallelism, which are planned. However, the "check before you start" methodology still would.

Feels like leaving the call tracking implementation in (because it's useful metadata) but not using it in deduplication is the way to go for now.
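
A minimal sketch of the two approaches side by side, to show why per-instance state is the awkward one. The `Task` class and both runner functions below are simplified stand-ins written for this comment, not the real Invoke classes or Executor methods.

```python
class Task:
    """Hypothetical stand-in for a task object that tracks its own calls."""

    def __init__(self, name, body):
        self.name = name
        self.body = body
        self.times_called = 0  # call-tracking metadata stored on the instance

    @property
    def called(self):
        return self.times_called > 0

    def __call__(self):
        self.times_called += 1
        return self.body()


def run_with_call_tracking(tasks):
    """Dedupe at call time by asking each task whether it already ran."""
    ran = []
    for task in tasks:
        if not task.called:  # depends on mutable state inside the task
            task()
            ran.append(task.name)
    return ran


def run_with_upfront_dedupe(tasks):
    """'Check before you start': build the deduped plan first, then run it."""
    plan = []
    for task in tasks:
        if task not in plan:
            plan.append(task)
    for task in plan:  # the plan is fixed before anything executes
        task()
    return [task.name for task in plan]


clean = Task("clean", lambda: None)
build = Task("build", lambda: None)

print(run_with_upfront_dedupe([clean, build, clean]))  # ['clean', 'build']
print(clean.times_called)                              # 1
# The call-tracking runner now skips everything, because the state lingers:
print(run_with_call_tracking([clean, build]))          # []
```

The upfront version never reads `times_called`, so the counter can stay around as metadata without influencing what gets executed. In a multiprocess setup each worker would presumably hold its own copy of that counter, which is the reason not to base deduplication on it.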

bitprophet added a commit that referenced this issue May 28, 2014:
Tweak no-deduping test to actually break right.
Old one was brokenly passing due to #123

Seem to have this working now. Left to do:

  • Finish remaining TODOs in the newly reworked Executor code
  • Make sure some more tests get written or skipped tests get filled out (eg at least some TODOs should spawn new tests)
  • Implement post-tasks now while we're at it, should be trivial (famous last words)
  • Write/update a changelog entry spanning this, #120, etc (feel like there was a third one in there?)

Also noting for the record/linkage that #45 still exists but I do think at this point, that level of extra control can and should wait. Probably wants the newer execute() to be teased out into more subroutines for extensibility - then a subclass implementing e.g. "dedupe pre/post tasks overlapping CLI ones, but not amongst themselves" would be a real possibility.


All done!

bitprophet closed this Jun 9, 2014