Don't require join function to be anonymous #19

Closed
kyptin opened this issue Mar 31, 2014 · 5 comments · Fixed by #23

@kyptin

kyptin commented Mar 31, 2014

With PigPen 0.2.3, I was using join, but instead of writing an anonymous function inline in the join call, I defined a function with defn and passed its name. In other words, instead of:

(join [(xs :on first)
       (ys :on first)]
      (fn [x y] ...))

...I was doing:

(defn foo [x y] ...)
(join [(xs :on first)
       (ys :on first)]
      foo)

The second version produced output with the same structure as the first version, except that there were nils in most places. My guess is that the macros aren't quite evaluating things properly, but I don't know this for sure.
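Jeff's guess about the macros points at a real difference between the two calls. The sketch below uses a hypothetical macro, `code-of`, which is not part of PigPen; it only shows that a macro receives its argument as unevaluated code. With an inline `fn`, the whole function body is in the form the macro sees; with a named function, the macro sees only the symbol `foo`, and the definition has to be looked up separately. Whether PigPen's `join` captures its function argument exactly this way is an assumption here.

```clojure
;; Hypothetical macro, for illustration only (not a PigPen API):
;; it returns the code form it was given, quoted, instead of evaluating it.
(defmacro code-of [f]
  `'~f)

(defn foo [x y] {:x x, :y y})

;; Inline anonymous function: the macro sees the full function body.
(code-of (fn [x y] {:x x, :y y}))
;; => (fn [x y] {:x x, :y y})

;; Named function: the macro sees only a symbol, not foo's definition.
(code-of foo)
;; => foo
```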

Unfortunately, I can't share more details because they are proprietary. If you have trouble reproducing this issue, though, I can try to build a reproduction that I can share.

A possibly related issue: a print statement inside the function works in the first version, but prints nothing in the second.

Thanks very much!
-Jeff T.

@mbossenbroek (Contributor)

Unfortunately I can't repro that one.

Does the problem happen locally, on the cluster, or both? What's the type of the key you're trying to join on?

This works for me:

(require '[clojure.test :refer [deftest is]]
         '[pigpen.core :as pig])

(defn foo [x y]
  (prn x y)
  {:x x, :y y})

(deftest test-join
  (let [xs (pig/return [[1 "a"]
                        [1 "b"]
                        [2 "a"]])
        ys (pig/return [[1 "a"]
                        [2 "b"]
                        [2 "a"]])
        command (pig/join [(xs :on first)
                           (ys :on first)]
                          foo)]
    (is (= (pig/dump command)
           [{:x [2 "a"], :y [2 "b"]}
            {:x [2 "a"], :y [2 "a"]}
            {:x [1 "a"], :y [1 "a"]}
            {:x [1 "b"], :y [1 "a"]}]))))

I can also print from the function:

=> (test-join)
[2 "a"] [2 "b"]
[2 "a"] [2 "a"]
[1 "a"] [1 "a"]
[1 "b"] [1 "a"]
nil

Sometimes, when running locally, code executes on other threads. In CCW (Counterclockwise), at least, this makes the output appear in the console instead of the REPL, which is kind of annoying. If you're using CCW, could you check the console output? If not, what editor are you using?
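The console-versus-REPL behavior follows from how Clojure binds `*out*`: the REPL binds it for its own thread, while a thread started directly with `java.lang.Thread` sees only the root binding, which normally writes to the process's standard output (the console, in CCW's case). Unlike `future`, a raw `Thread` does not convey the caller's bindings. The sketch below is generic Clojure, not PigPen code, and we don't know how PigPen starts its worker threads; `run-on-worker` is a hypothetical helper that captures the caller's `*out*` and re-binds it on the worker thread so that output reaches the REPL.

```clojure
(defn run-on-worker
  "Hypothetical helper (not part of PigPen): runs f on a new thread and
  waits for it to finish. The caller's *out* is captured here and
  re-bound on the worker thread, so anything f prints goes wherever the
  caller's output goes (e.g. the REPL) instead of the root *out*."
  [f]
  (let [caller-out *out*
        worker (Thread. (fn []
                          (binding [*out* caller-out]
                            (f))))]
    (.start worker)
    (.join worker)))

;; Printed where the caller's output goes, not to the process's stdout:
(run-on-worker #(prn :from-worker-thread))
```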

To repro, what commands are you using before the join? Are you loading data from a file, doing any transformations, etc?

Thanks,
Matt

On Sunday, March 30, 2014 at 5:57 PM, Jeff Terrell wrote:

Relatedly, is there a good reason why my print statements don't work in the join function? If that's easy to fix, that would be helpful for my debugging.

@kyptin (Author)

kyptin commented Mar 31, 2014

I'm running locally, in a lein repl session. I'm using vim to edit the code.

I'm trying to join vectors. The key function for each vector is simply first.

I am doing a variety of transformations before the join, but I am not loading from a file.

I'll try to create a reproducible failure case tonight or tomorrow.

@mbossenbroek (Contributor)

Thanks. What's the data type of the join key?

Are you joining large maps or data structures? Or is it joining numbers, strings, keywords, or some other primitive?

The thread-switching happens when you're locally reading from a file, so that's the only reason I can think of for the printing not working.

The example I listed before prints when I run from a lein repl too.

Let me know what you can come up with for a repro case!

-Matt


@kyptin (Author)

kyptin commented Mar 31, 2014

I'm joining on strings, so yeah, it's a primitive.

Heh, I guess it's on me to reproduce this, then—you've certainly done your due diligence. Thanks!

@mbossenbroek (Contributor)

I followed up with Jeff in another thread, and we found that the problem was a stale function in the REPL. Restarting the REPL fixed the issue.

Right now I'm memoizing user functions based on what you pass to the PigPen operator. This has the unfortunate side effect of using stale versions of named functions. In your case, this means that if you load foo, load the join, and then modify and reload foo, the join will still use the first version.

The reasons for this are historical and performance-related. I never want to re-evaluate the same code on the cluster, and on the cluster the code never changes, hence the memoization. Previously, defining a function outside the operator (rather than inline) wasn't supported, so this wasn't a problem.
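The stale-function behavior can be reproduced in plain Clojure. This sketch assumes, as described above, that the cache is keyed on the code passed to the operator; `eval-user-code` is a hypothetical stand-in, not PigPen's actual implementation. Because the cache key is the symbol naming `foo` rather than the function it currently refers to, redefining `foo` does not invalidate the cached result.

```clojure
(def eval-user-code
  ;; Hypothetical stand-in for an operator's internal cache: it evaluates a
  ;; code form once and then reuses the result, keyed on the form itself.
  (memoize (fn [form] (eval form))))

(defn foo [x y] {:x x, :y y})

;; First use: the fully qualified symbol resolves to foo's current definition.
((eval-user-code `foo) 1 2)
;; => {:x 1, :y 2}

;; Redefine foo, e.g. after editing and reloading it in the REPL.
(defn foo [x y] {:left x, :right y})

;; Same form, same cache key: the stale function comes back.
((eval-user-code `foo) 1 2)
;; => {:x 1, :y 2}, not {:left 1, :right 2}
```

In plain Clojure, passing the var (`#'foo`) instead of its value is a common way to let callers pick up redefinitions, because calling a var invokes its current root value. This thread does not say whether PigPen 0.2.3 accepts a var in that position, or how the fix in #23 works.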

Fix coming soon...

-Matt

