Libraries/Functions in closures #136

Closed
ljank opened this issue Mar 2, 2015 · 7 comments

@ljank
Contributor

ljank commented Mar 2, 2015

As stated in the docs:

PigPen supports a number of different types of closures. (...) Compiled functions and mutable structures like atoms won't work.

We have time in millis in our data and would like to format it as YYYY-MM-DD, but that seems impossible for the reasons quoted above :( Is there any workaround to make functions work? Otherwise this statement looks far-fetched:

There are no special user defined functions (UDFs). Define Clojure functions, anonymously or named, and use them like you would in any Clojure program.

Thank you!

@mbossenbroek
Contributor

This just means that you can't close over a compiled function. For example:

(require '[pigpen.core :as pig]
         '[simple-time.core :as st])

;; works
(defn format-ts [data]
  (pig/map (fn [x] (st/format x :date)) data))

(format-ts my-data)

;; works
(defn format-ts [data format]
  (pig/map (fn [x] (st/format x format)) data))

(format-ts my-data :date)

;; won't work because f is compiled
(defn format-ts [data f]
  (pig/map f data))

(format-ts my-data (fn [x] (st/format x :date)))
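
One way to picture the restriction, as a sketch in plain Clojure rather than PigPen's actual implementation: code that still exists as a form (a quoted list) can be printed, shipped as text, and re-evaluated elsewhere, whereas an already compiled function object prints as an opaque tag that cannot be read back. The assumption here is that pig/map, being a macro, sees the form of an inline fn, while a function passed in as the argument f arrives already compiled. The names form, rebuilt, compiled, and readable? are illustrative only.

```clojure
;; Sketch only: plain Clojure, no PigPen involved.

;; A quoted form is ordinary data, so it survives a print/read round trip.
(def form '(fn [x] (inc x)))
(def round-tripped (read-string (pr-str form)))

;; The re-read form can be evaluated into a working function.
(def rebuilt (eval round-tripped))

;; A compiled function is an object; its printed representation is opaque.
(def compiled (fn [x] (inc x)))

(defn readable?
  "Returns true if the string s can be read back as Clojure data."
  [s]
  (try
    (read-string s)
    true
    (catch Exception _ false)))

(println (= form round-tripped))        ; the form round-trips intact
(println (rebuilt 1))                   ; the rebuilt function runs
(println (readable? (pr-str compiled))) ; the compiled fn does not read back
```

This is only an analogy for why the third variant fails: by the time f reaches pig/map there is no source form left to capture.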

There is a way around this, but it's not officially supported yet:

(defn format-ts [data f]
  (pigpen.map/map* f data))

(format-ts my-data (pigpen.code/trap (fn [x] (st/format x :date))))

Let me know if that's not clear or if you have a specific example of what you're trying to do.

Also, check out pigpen-support@googlegroups.com or https://groups.google.com/forum/#!forum/pigpen-support for future questions.

@ljank
Contributor Author

ljank commented Mar 2, 2015

I still get CompilerException java.lang.RuntimeException: No such namespace: st in the cases that are meant to work :\

@mbossenbroek
Contributor

Could you send a code sample and stack trace that you get?

@mbossenbroek
Contributor

Might be worth mentioning: any code that you close over needs to be in a file that will end up in the uberjar that goes to Hadoop. If you're just in the user ns in a REPL, the code I listed won't work.

If that's the case, let me know if that's not clear from the docs & I can update them.
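
The "No such namespace: st" error reported above is consistent with this: an alias such as st exists only in the namespace that required it, so a form evaluated in a namespace that never set up the alias cannot resolve it. The following is a minimal sketch in plain Clojure, not PigPen code; demo.with-alias, demo.without-alias, and the alias s are hypothetical names, and clojure.string stands in for simple-time.core.

```clojure
;; Sketch only: plain Clojure, hypothetical namespace names.

;; A form that relies on the alias `s`, kept as data until it is evaluated.
(def form '(s/upper-case "abc"))

(defn error-messages
  "Returns the messages of t and of all of its causes."
  [^Throwable t]
  (->> (iterate #(.getCause ^Throwable %) t)
       (take-while some?)
       (map #(str (.getMessage ^Throwable %)))))

;; Evaluate the form in a namespace where the alias has been set up.
(def with-alias
  (binding [*ns* (create-ns 'demo.with-alias)]
    (refer-clojure)
    (require '[clojure.string :as s])
    (eval form)))

;; Evaluate the same form in a namespace that never required anything.
(def without-alias
  (binding [*ns* (create-ns 'demo.without-alias)]
    (refer-clojure)
    (try
      (eval form)
      (catch Throwable t
        (error-messages t)))))

(println with-alias)    ; the upper-cased string
(println without-alias) ; messages that include "No such namespace: s"
```

Keeping the code in a real namespace file means the require forms travel with it, which is presumably why the file-based version resolves st.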

@ljank
Contributor Author

ljank commented Mar 2, 2015

I've spotted that it behaves differently when using pig/return versus loading data from files (same for JSON and Avro). This works just fine:

(require '[simple-time.core :as st])

(defn time->ymd
  [data]
  (pig/map (fn [entry]
             (assoc entry
               :ymd (st/format (st/datetime (:time entry)) :date)))
           data))

(->> (pig/return [{:time 1425254400010} {:time 1425254400019} {:time 1425254400090}])
     (time->ymd)
     (pig/dump))
; [{:ymd "2015-03-02", :time 1425254400010} 
;  {:ymd "2015-03-02", :time 1425254400019} 
;  {:ymd "2015-03-02", :time 1425254400090}]

For JSON:

(spit "/tmp/events.json" "{\"time\": 1425254400010}\n{\"time\": 1425254400019}\n{\"time\": 1425254400090}")
(->> (pig/load-json "/tmp/events.json")
     (time->ymd)
     (pig/dump))

CompilerException java.lang.RuntimeException: No such namespace: st

Same error while using Avro.

@mbossenbroek
Contributor

Yeah, it sounds like you're in a user ns. This complete example works for me:

(ns pigpen-demo.core
  (:require [clojure.pprint] ; loaded explicitly for the pprint calls below
            [pigpen.core :as pig]
            [simple-time.core :as st]))

(defn time->ymd
  [data]
  (pig/map (fn [entry]
             (assoc entry
               :ymd (st/format (st/datetime (:time entry)) :date)))
           data))

(clojure.pprint/pprint
  (->> (pig/return [{:time 1425254400010} {:time 1425254400019} {:time 1425254400090}])
       (time->ymd)
       (pig/dump)))

(spit "/tmp/events.json" "{\"time\": 1425254400010}\n{\"time\": 1425254400019}\n{\"time\": 1425254400090}")

(clojure.pprint/pprint
  (->> (pig/load-json "/tmp/events.json")
       (time->ymd)
       (pig/dump)))

and produces this output:

[{:ymd "2015-03-01", :time 1425254400010}
 {:ymd "2015-03-01", :time 1425254400019}
 {:ymd "2015-03-01", :time 1425254400090}]
Start reading from  /tmp/events.json
Stop reading from  /tmp/events.json
[{:ymd "2015-03-01", :time 1425254400010}
 {:ymd "2015-03-01", :time 1425254400019}
 {:ymd "2015-03-01", :time 1425254400090}]
nil
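
The dates here read 2015-03-01, whereas the earlier output in this thread reads 2015-03-02 for the same timestamps. A plausible explanation, which the thread does not confirm, is that the two runs used different default time zones: 1425254400010 ms is ten milliseconds past midnight UTC on 2015-03-02, which is still 2015-03-01 anywhere west of UTC. The sketch below checks that arithmetic with java.time (Java 8 or later) instead of simple-time; millis->ymd and the choice of America/Los_Angeles are illustrative assumptions.

```clojure
;; Sketch only: uses java.time, not simple-time.
(import '(java.time Instant ZoneId)
        '(java.time.format DateTimeFormatter))

(defn millis->ymd
  "Formats epoch milliseconds as YYYY-MM-DD in the given time zone."
  [millis zone-id]
  (-> (Instant/ofEpochMilli millis)
      (.atZone (ZoneId/of zone-id))
      (.toLocalDate)
      (.format DateTimeFormatter/ISO_LOCAL_DATE)))

(println (millis->ymd 1425254400010 "UTC"))                 ; 2015-03-02
(println (millis->ymd 1425254400010 "America/Los_Angeles")) ; 2015-03-01
```

If a stable date is needed regardless of where the job runs, passing the zone explicitly, as above, avoids depending on the machine's default.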

If you're in a file & still getting that exception, could you run these commands in the REPL and let me know what you get?

(pigpen.code/trap identity)
(ns-name *ns*)
(pigpen.code/ns-exists *1)

@ljank
Contributor Author

ljank commented Mar 3, 2015

You're right: everything works fine when running from a file rather than from the user namespace. Thank you for the lightning-fast and correct diagnosis!

Next time I'll use the mailing list. Sorry!

@ljank ljank closed this as completed Mar 3, 2015