New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Cascading platform from the base logic #256

Merged
merged 150 commits into from Jan 3, 2015

Conversation

Projects
None yet
6 participants
@bhiles
Contributor

bhiles commented Jul 11, 2014

To allow additional platforms to be added to Cascalog, we must first extract the Cascading platform from the base logic. This pull request begins the attempt to do that. Additionally, it begins adding in a Clojure platform, but that is barely built.

How to use

To set the platform you need to use the with-context macro and specify which platform to use.

Cascading

(require '[cascalog.api :as api]         
    '[cascalog.logic.platform :as platform]
    '[cascalog.cascading.platform])
(import 'cascalog.cascading.platform.CascadingPlatform)

(let [nums [[1][2]]]
    (platform/with-context (CascadingPlatform.)
        (api/??<- [?n] (nums ?n))))
=> ([1] [2])

Clojure

(require '[cascalog.clojure.platform])
(import 'cascalog.clojure.platform.ClojurePlatform)

(let [nums [[1][2]]]
    (platform/with-context (ClojurePlatform.)
        (api/??<- [?n] (nums ?n))))
=> ([(1)] [(2)])

Steps taken so far

  • removed all cascading dependencies from the logic namespace (except for testing.clj)
  • pulled in some functions from the cascading namespace into the logic.platform.clj since they appear platform agnostic (like compile-query)
  • extended IPlatform to handle to-generator so we only need one dynamic var (context) and not two (context and *runner)
  • built my really simple clojure platform and gave it a new namespace
  • modified some tests so they continue using the Cascading platform. But many tests (about 21) currently fail and I will fix them once I'm more confident the direction I'm moving is the right one.

Questions

  • Is this approach moving in the right direction?
  • Is with-context a good way to handle specifying which platform to use?
  • Is creating EmptyPlatform a suitable solution to giving a dynamic var a default value without pulling in an implementation namespace?
@patrickwmcgee

This comment has been minimized.

Contributor

patrickwmcgee commented Jul 11, 2014

This is awesome! I had a version that isn't as far along as this for a school project that I'm still working on https://github.com/patrickwmcgee/cascalog .

;; source-map is a map of identifier to tap, or source. Pipe is the
;; current pipe that the user needs to operate on.
(defrecord ClojureFlow [source-map sink-map trap-map tails pipe name])

This comment has been minimized.

@sritchie

sritchie Jul 14, 2014

Collaborator

so, this guy should probably be a Cascading only thing - my thought was that the IGenerator protocol could describe a generator for each specific combination of platform + generator (via multimethod double dispatch).

The pipes, tails, etc, don't need to come along for the ride with some other platform, ya know?

This comment has been minimized.

@bhiles

bhiles Jul 15, 2014

Contributor

I moved ClojureFlow back into into the cascading namespace.

(defn gen? [g]
(generator? *context* g))
(defn compile-query [query]

This comment has been minimized.

@sritchie

sritchie Jul 14, 2014

Collaborator

I agree that this should probably be general, and I like re-using the cascalog-zip stuff.

(defn set-context! [c]
(alter-var-root #'*context* (constantly c)))
(defmacro with-context

This comment has been minimized.

@sritchie

sritchie Jul 14, 2014

Collaborator

definitely think this is the right approach, BUT, I don't like that all existing code will break. I think we should add mutating "set-context!" functions too, so you can just set this stuff for your entire platform. Alternatively we could have ?- and ??- macros defined in each cascalog.*.api namespace? Thoughts?

This comment has been minimized.

@bhiles

bhiles Jul 14, 2014

Contributor

I don't really have a preference. I think it's an interesting idea to add a cascalog.cascading.api namespace, but that would definitely break existing code (although pretty easily fixed).

I implemented a set-context-* in cascalog.api since it was the simplest implementation. So the code will look like this:

(require '[cascalog.api :as api])
(api/set-cascading-context!)
(let [nums [[1][2]]]
    (api/??<- [?n] (nums ?n)))
=> ([1] [2])

So I defer to you and others on what design is preferred.

Additional changes I've made

  • got the tests in cascalog-core to pass
  • ripped out my hacky way of setting the context from all the tests and have implemented a setup step in the tests to set the context (like this)
  • modified can-generate? to be a function so it gets the context at runtime instead of initialization

This comment has been minimized.

@jeroenvandijk

jeroenvandijk Jul 16, 2014

I'm really excited about this too!

Just 2 my cents regarding the context switching. I'm not very familiar with the new code, so please ignore this if it doesn't make sense.

I think it makes development for the end-user easier if you can switch context without having to switch namespaces. This would allow for instance to have your tests run against the Clojure platform with one simple change. Everyone would program against cascalog.api and switch the context depending on where it needs to run. So +1 for your current approach.

I also think you don't have to break existing code in this case, because you can set the default context to cascading if it has not been set yet.

(defrecord ClojurePlatform []
IPlatform
(generator? [_ x]

This comment has been minimized.

@sritchie

sritchie Jul 14, 2014

Collaborator

yeah, this is what I want to hook into when defining other platforms. This should probably be a multimethod, honestly, for double dispatch.

This comment has been minimized.

@bhiles

bhiles Jul 15, 2014

Contributor

@sritchie I'm not quite sure what you want the platform interface to look like, so any elaboration on what it should look like and why would be very helpful.

I took a stab at an implementation by just moving IGenerator's into multimethods here, bhiles#1

@patrickwmcgee

This comment has been minimized.

Contributor

patrickwmcgee commented Jul 14, 2014

@bhiles to follow up, I'm also trying to work on this would you be interested in collaborating?

@bhiles

This comment has been minimized.

Contributor

bhiles commented Jul 14, 2014

@patrickwmcgee sounds great! But, I'm not quite sure how to divide up the work though.

I'm going to keep moving forward removing Cascading from the base logic (as per @sritchie's suggestions), so that that there is a clean abstraction that you can plug different platforms into. You have a lot of in-memory platform code written that might be easily pushed into the cleaned up abstraction. If that appeals to you, I'd be happy to take any pull requests on my branch.

Or, I'm open to other suggestions on ways we could collaborate :)

@sorenmacbeth

This comment has been minimized.

Collaborator

sorenmacbeth commented Jul 14, 2014

Just wanted to say I'm super excited about this happening!

@bhiles

This comment has been minimized.

Contributor

bhiles commented Jul 14, 2014

@sritchie thank you so much for the comments! I'm excited that I'm learning enough cascalog to modify it!

loop-join-fields
(cons [j-fields j-type] rest-type-seqs))))))
(defmulti agg-clojure

This comment has been minimized.

@sritchie

sritchie Dec 28, 2014

Collaborator

docstring

@@ -0,0 +1,83 @@
(ns cascalog.in-memory.tuple
(:refer-clojure :exclude [sort])

This comment has been minimized.

@sritchie

sritchie Dec 28, 2014

Collaborator

let's add some docs in this namespace (or in the platform namespace?) about the tuple structure, and how the in-memory platform defines this stuff.

(defn smallest-arity [fun]
"Returns the smallest number of arguments the function takes"
(->> fun meta :arglists first count))

This comment has been minimized.

@sritchie

sritchie Dec 28, 2014

Collaborator

nice

[tmpfiles (vec tmpforms)]))
(defn- doublify [tuples]
(defn doublify [tuples]

This comment has been minimized.

@sritchie

sritchie Dec 28, 2014

Collaborator

I forget why I did this. Probably for equality, yeah?

This comment has been minimized.

@sritchie

sritchie Dec 28, 2014

Collaborator

if you know, let's add a docstring

@@ -55,6 +55,14 @@ public static void execute(List<Object> taps, List<Object> gens) {
execute(null, taps, gens);
}
public static void setCascadingPlatform() {

This comment has been minimized.

@sritchie

sritchie Dec 28, 2014

Collaborator

nice. JCascalog is a maintenance nightmare, since none of the maintainers use it. I'd almost like to pull it out into a separate project.

This comment has been minimized.

@dkincaid

dkincaid Dec 28, 2014

Contributor

Since I haven't been able to convince our powers that be to try Clojure in any meaningful way we are using only JCascalog right now and using it extensively. I'd be happy to help contribute, but would need some help understanding the code base better. Everything I try doing it on my own I get very confused.

In any event it is extremely important to us to keep JCasclog working, so whatever I can do to help make sure that happens I'm in.

This comment has been minimized.

@sritchie

sritchie Dec 28, 2014

Collaborator

okay, this is great news. Definitely committed to keeping it going.

@sritchie

This comment has been minimized.

Collaborator

sritchie commented Dec 28, 2014

This is terrific work. Not many comments at all.

Are you free to jump on skype or IRC this week? I want to jump in and help write some docs on how to use the new in-memory platform - doing a pass together would help me get my mind around what's missing, from your perspective, and where we should go next with this.

Tackle those few comments and let's get this thing merged.

@sritchie

This comment has been minimized.

Collaborator

sritchie commented Dec 28, 2014

Oh, also, can you bump the major version in project.clj? This certainly busts semver in some way, which is fine. Finally adding new platforms is a huge deal.

@bhiles

This comment has been minimized.

Contributor

bhiles commented Dec 28, 2014

Awesome!

Skype (I'm bhiles) or IRC work or me, your call. I'm free anytime tomorrow, the morning on Tuesday, and the evening of Friday. Do any of those times work for you?

And I'm planning on addressing the comments tonight.

On Dec 28, 2014, at 11:21 AM, Sam Ritchie notifications@github.com wrote:

This is terrific work. Not many comments at all.

Are you free to jump on skype or IRC this week? I want to jump in and help write some docs on how to use the new in-memory platform - doing a pass together would help me get my mind around what's missing, from your perspective, and where we should go next with this.

Tackle those few comments and let's get this thing merged.


Reply to this email directly or view it on GitHub.

@bhiles

This comment has been minimized.

Contributor

bhiles commented Dec 30, 2014

@sritchie I believe I've addressed all of your comments.

@sritchie

This comment has been minimized.

Collaborator

sritchie commented Jan 1, 2015

@bhiles , how about next week? I thought the holidays would be less busy, but of course I was wrong.

If you can fix this merge conflict, I'll merge this. Nice work!!

@sritchie

This comment has been minimized.

Collaborator

sritchie commented Jan 2, 2015

@bhiles just wanted to ping and let you know that I'm good to merge this once the merge conflicts are fixed. Cheers!

@bhiles

This comment has been minimized.

Contributor

bhiles commented Jan 3, 2015

@sritchie I'm free almost anytime Wednesday - Sunday this week.

bhiles added some commits Jan 3, 2015

Merge branch 'develop' into bsh-clojure-engine-2
Conflicts:
	VERSION
	cascalog-core/src/clj/cascalog/cascading/types.clj
	cascalog-core/src/java/jcascalog/Api.java
	cascalog-core/test/cascalog/jcascalog_test.clj
@bhiles

This comment has been minimized.

Contributor

bhiles commented Jan 3, 2015

@sritchie the merge conflicts are resolved!!!! :D

sritchie added a commit that referenced this pull request Jan 3, 2015

Merge pull request #256 from bhiles/bsh-clojure-engine
Remove Cascading platform from the base logic

@sritchie sritchie merged commit 6467356 into nathanmarz:develop Jan 3, 2015

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
@sritchie

This comment has been minimized.

Collaborator

sritchie commented Jan 3, 2015

Niiiiiice. @sorenmacbeth, you pysched?

@sritchie

This comment has been minimized.

Collaborator

sritchie commented Jan 3, 2015

@bhiles I did find one issue with this. The cascading platform maps "collectify" across sources, so

[1 2 3] and [[1] [2] [3]] are equivalent and valid sources. Can we make this change for the in memory?

@patrickwmcgee

This comment has been minimized.

Contributor

patrickwmcgee commented Jan 3, 2015

🎆 Awesome job! 🎆

@bhiles

This comment has been minimized.

Contributor

bhiles commented Jan 3, 2015

@sritchie yay! First bug found! I created #266 to address the collectify issue.

@bhiles

This comment has been minimized.

Contributor

bhiles commented Jan 3, 2015

@patrickwmcgee thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment