Complex dependency deduping #45

Closed
bitprophet opened this issue Feb 27, 2013 · 4 comments

Comments

@bitprophet
Member

Re #41

I didn't tackle this right off so as not to overcomplicate v1.0, but I could see some additional tweaks/options/flavors to how we handle deduplication:

  • Right now we basically build a list of all to-be-invoked tasks and naively dedupe. No fuss no muss.
  • However, task source could matter: invoked due to pre/post is not necessarily the same "priority level" as invoked directly by hand. Users might expect pre/post "dependencies" to get deduped but for directly-invoked things to always run.
    • For example, $ invoke foo foo being deduped might be confusing.
    • Make itself, however, does do this level of deduping; make foo foo will run foo once and then say foo is up to date.
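
To make the "task source" idea concrete, here is a minimal sketch. It is not Invoke's actual code: the `(name, source)` tuples and both function names are hypothetical, invented only to contrast naive deduplication with one that never drops directly invoked tasks.

```python
from typing import List, Tuple

# Hypothetical representation: (task_name, source), where source is
# "direct" (named on the command line) or "dep" (pulled in via pre/post).
Entry = Tuple[str, str]


def naive_dedupe(entries: List[Entry]) -> List[str]:
    """Keep only the first occurrence of each task name, whatever its source."""
    seen = set()
    result = []
    for name, _source in entries:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


def source_aware_dedupe(entries: List[Entry]) -> List[str]:
    """Always keep directly invoked tasks; drop pre/post tasks already queued."""
    seen = set()
    result = []
    for name, source in entries:
        if source == "direct" or name not in seen:
            result.append(name)
        seen.add(name)
    return result


# Assumed expansion of `invoke foo foo`, where foo has a pre-task named setup.
entries = [("setup", "dep"), ("foo", "direct"), ("setup", "dep"), ("foo", "direct")]

print(naive_dedupe(entries))         # ['setup', 'foo']
print(source_aware_dedupe(entries))  # ['setup', 'foo', 'foo']
```

Under these assumptions the naive version behaves like make (foo runs once), while the source-aware version runs foo twice but still runs setup only once.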
@bitprophet
Member Author

See #298 for one not-me user encountering confusion re: current state of deduplication and pre/post tasks. Has explicit examples of what they ran, expected, and got. ❤️

Some of their "expected" outputs aren't actual bugs, but simply how the current implementation is supposed to work (though it may not be documented perfectly). The crux of that issue is the thing that wasn't strongly considered, though - having a specific task appear in BOTH the pre AND post lists for explicitly selected tasks.

Basically, deduping is being done "locally" (IIRC) such that one doesn't end up with the same task repeated in a row. Another option could be to "fully" deduplicate, as in, if a task is already in the runlist at all, never add another copy of it. More likely, we'd want to only apply that to pre/post tasks - if a user said inv foo bar foo we would not want to skip that 2nd foo.
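
A rough sketch of the difference between the two strategies, working on plain lists of task names. The function names are made up for illustration and this is not how Invoke implements it; the task names `clean` and `build` are also assumptions.

```python
from itertools import groupby
from typing import List


def local_dedupe(names: List[str]) -> List[str]:
    """Collapse only back-to-back repeats of the same task name."""
    return [name for name, _run in groupby(names)]


def full_dedupe(names: List[str]) -> List[str]:
    """Keep the first occurrence of each name anywhere in the runlist."""
    return list(dict.fromkeys(names))


# Assumed expansion of a task `build` that lists `clean` as both pre and post.
pre_and_post = ["clean", "build", "clean"]
print(local_dedupe(pre_and_post))  # ['clean', 'build', 'clean']
print(full_dedupe(pre_and_post))   # ['clean', 'build']

# `inv foo bar foo`: full deduplication drops the second, explicitly requested foo.
explicit = ["foo", "bar", "foo"]
print(local_dedupe(explicit))  # ['foo', 'bar', 'foo']
print(full_dedupe(explicit))   # ['foo', 'bar']
```

The last case shows why full deduplication would probably need to be limited to pre/post tasks rather than applied to the whole runlist.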

In the end though, this ticket's core problem remains: there are a lot of strategies one could reasonably expect/employ for deduplicating tasks and we need to figure out the best way to select/swap them, or try to hone a single, least-surprising algorithm that captures most use cases.

@presidento
Contributor

Without maintaining a dependency graph, perfect deduplication is almost impossible, and parallel execution is impossible outright. (This assumes a pre hook means the pre-hooked task must run to completion before the original task is called, and a post hook means it must be called at some point after the original task has fully finished.) Without the graph, you don't know which parts of the queue can be parallelized and which tasks must wait for the preceding tasks to finish.
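
To illustrate what a graph buys you, here is a small sketch using `graphlib.TopologicalSorter` from the Python 3.9+ standard library. That module is an assumption of this example, not something used in Invoke or discussed in this thread, and the task names are hypothetical. Each key maps to the set of tasks that must finish first; a post hook could be modeled as a task that depends on the original.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical tasks: each key maps to the tasks that must finish before it.
graph: Dict[str, Set[str]] = {
    "build": {"clean", "fetch"},
    "test": {"build"},
    "docs": {"fetch"},
}


def parallel_batches(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into batches; tasks within one batch do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock its dependents
    return batches


print(parallel_batches(graph))
# [['clean', 'fetch'], ['build', 'docs'], ['test']]
```

Every task appears exactly once, so deduplication falls out of the graph for free, and each batch marks a set of tasks that could in principle run in parallel.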

Nevertheless, with pre/post annotations on the Call object and a two-round deduplication, good-enough deduplication is achievable. See my pull request (#299).
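
The sketch below is not the code from the pull request. It is only one possible reading of "two-round" deduplication over annotated calls. The `Call` dataclass is a hypothetical stand-in rather than Invoke's real `Call` class, and the `kind` field and the rules for each round are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Call:
    # Hypothetical stand-in, not Invoke's real Call class.
    name: str
    kind: str  # "pre", "direct", or "post"


def two_round_dedupe(calls: List[Call]) -> List[str]:
    # Round 1 (forward): drop a pre call if its task already appears earlier.
    seen = set()
    kept = []
    for call in calls:
        if call.kind == "pre" and call.name in seen:
            continue
        seen.add(call.name)
        kept.append(call)

    # Round 2 (backward): drop a post call if its task appears again later.
    seen_later = set()
    result = []
    for call in reversed(kept):
        if call.kind == "post" and call.name in seen_later:
            continue
        seen_later.add(call.name)
        result.append(call)
    result.reverse()
    return [call.name for call in result]


# Assumed expansion of `inv a b`, where both tasks have pre=setup and post=cleanup.
calls = [
    Call("setup", "pre"), Call("a", "direct"), Call("cleanup", "post"),
    Call("setup", "pre"), Call("b", "direct"), Call("cleanup", "post"),
]
print(two_round_dedupe(calls))  # ['setup', 'a', 'b', 'cleanup']
```

With these rules, pre tasks run as early as possible, post tasks as late as possible, and directly invoked tasks are never dropped.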

@bitprophet
Member Author

I don't think either Bruce or I noted this in another ticket, and this is probably the most appropriate one anyway: Bruce Eckel dug around during the PyCon sprints for prior art on dealing with DAGs in Python, so that we'd be prepared to implement a serious graph-oriented setup for dependencies.

He found dagger and updated a copy of it to be compatible with both Python 2 and 3 (it previously did not support Python 3): https://github.com/BruceEckel/pythondagger23

@bitprophet
Member Author

Rolling into #461
