Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prune operation if it isn't needed #247

Merged
merged 8 commits into from Apr 19, 2015

Conversation

dollschasingmen
Copy link
Contributor

No description provided.

(grouped FilterOperation))
(let [generators (concat (grouped Generator)
(grouped GeneratorSet))
nessecary-fields (->> check-input-operations
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

necessary-fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops

@Quantisan
Copy link
Collaborator

@dollschasingmen A couple questions to help me understand please. What's the gain in doing this? Is there any assumption being made in the pruning?

@dollschasingmen
Copy link
Contributor Author

main use case:

(def my-query [outvars] (<- outvars (my-gen !a !b) (pred1 !a !b :> c) (pred2 !a b :> !d))

then suppose you call it like this:

(my-query ["!a])

without the pruning code, both pred1 and pred2 would be called. with the pruning code only pred1 would be called. so you can use it to dynamically only request the functions that you need to run in the query. should be more efficient this way (since you just skip calling code that where the output is ignored). in practice, this is best if pred2, for example, is really slow/heavy.

so tl:dr, it only runs code that is actually needed.

assumptions - in theory none :) the main approach of this is to check if the output of a particular predicate is necessary -- aka, is the output var being used somewhere else.

there are a few exceptions to this -- which the test document.

@sritchie
Copy link
Collaborator

@dollschasingmen, sorry I never really looked at this. I know it's been a long, long time.

One thing we need to guarantee is that we include filtering variables in the set of variables that we protect, not just the final projection. a query like

(<- [?x]
  (src ?x)
  (* ?x ?x :> ?square)
  (even? ?square))

Needs to return only even squares even though the filter variable's removed. Does that make sense? Let me know if you're still interested here and I can give you a super-fast review turnaround time. Thanks again!

dollschasingmen added 2 commits April 18, 2015 20:42
@dollschasingmen
Copy link
Contributor Author

@sritchie, np, thanks for not forgetting this PR :)

This function doesn't prune filter predicates under any circumstances. The pruning action is limited to only non-filter and non-generator predicates. I added a few more tests to assert that (see latest commits).

Sorry, the function is a bit convoluted. It has been a long time and reading through it was not super simple even for me, ha. Not sure how to make it the logic more obvious/readable (suggestions very much appreciated here), but I re-wrote the function and test descriptions to make it all a bit more clear (hopefully?)

(let [check-input-operations (concat (grouped Operation)
(grouped FilterOperation)
(grouped Aggregator))]
(if (some #(empty? (:input %)) check-input-operations)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check for existance of no-input operators

@sritchie
Copy link
Collaborator

This looks great. Currently compilation's forced at the subquery boundary, but I can see this advancing down the road to actually filter down across subqueries when the user does some projection like (src ?x _) when using a query.

I'm happy to merge this now, but I do have one more Q / suggestion; I might be wrong here, but reading the code it looks like this optimization might not be able to handle chains of operations that end in some unused variable, like

(<- [?x]
  (src ?x)
  (* ?x ?x :> ?square)
  (dec ?square :> ?square-minus-one))

I bet the easiest way to deal with this would be to run the same optimization you're currently running, but to pass it through a fixed-point function like

(defn fixed-point [f guess]
  (let [next (f guess)]
    (if (= guess next)
      next
      (fixed-point f next))))

and cull all operations backwards. Up to you if you think this is a good idea or worth doing in this pull request. I always feel lame asking for MORE work in a pull request, especially given how badly I dropped the ball on reviewing this one.

Thanks again!

@sritchie
Copy link
Collaborator

This is pretty closely related to

#24

Both the chained-and-ignored predicate thing and duplicate operations happen a lot as a result of predicate macros.

@dollschasingmen
Copy link
Contributor Author

yea - good point re: handling chains of operations. i think i got it to work using basically that fixed-point idea, see latest commit.

also agree about the subquery boundary and pred-macros -- sure would be nice to be able to prune into that as well :) but that I might prefer to leave to another PR.

@dollschasingmen
Copy link
Contributor Author

hm not sure why CI failed, test passed locally

sritchie added a commit that referenced this pull request Apr 19, 2015
@sritchie sritchie merged commit 5150e6a into nathanmarz:develop Apr 19, 2015
@sritchie
Copy link
Collaborator

Looks like a dependency download failure with cascalog-math. Thanks again!

@slpsys
Copy link

slpsys commented Apr 20, 2015

Sweet!

@sritchie
Copy link
Collaborator

Love that this works now:

cascalog.api> (defn throw! [& xs] (throw (RuntimeException. "gotcha!")))
#'cascalog.api/throw!
cascalog.api> (??<- [?x] ([[1]] ?x) (throw! ?x :> ?ignored))
([1])
cascalog.api> (??<- [?x ?ignored] ([[1]] ?x) (throw! ?x :> ?ignored))
FlowException local step failed  cascading.flow.planner.FlowStepJob.blockOnJob (FlowStepJob.java:219)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants