Weird error when used with prismatic plumbing #138

Closed
aconbere opened this issue Mar 5, 2015 · 13 comments
Comments

@aconbere

aconbere commented Mar 5, 2015

When I try to use prismatic plumbing, I run into a weird error: java.lang.ClassNotFoundException: pig_graph_test.graph-record3363. graph-recordxxxx is a macro-generated record exposed by the library, but all my attempts to get pigpen to play nice with it have failed.

I suspect it has something to do with the way code/trap-* operates. I've included a little test repository that you can just hit with lein run to reproduce; it also includes the full stack trace.

https://github.com/aconbere/pigpen-graph-test

@mbossenbroek
Contributor

Thanks for the detailed repro! From the error it looks like they have local vars with periods in them, which doesn't play well with the edn/read-string. I should be able to easily exclude those from the closure, but I'll make sure that they aren't required by the closure first.

@aconbere
Author

aconbere commented Mar 5, 2015

@mbossenbroek I'd love to know more about how you derived that (after staring at this code for an hour or two I'm thoroughly confused and suspect I might learn something).

@mbossenbroek
Contributor

Certainly! pigpen.code/trap re-writes your function like this:

=> (let [x 42]
     (pigpen.code/trap
       (fn [y] (+ x y))))
(pigpen.pig/with-ns pigpen-demo.core (clojure.core/let [x (quote 42)] (fn [y] (+ x y))))

What this returns is an expression, which when evaluated, will evaluate your user function within your namespace, with all of the lexical scope that was present at script generation time. Anything that's bound at that time ends up in that let that encloses your function. It's a way of freezing everything we know now and reviving it later on a hadoop machine.
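To make the "freeze now, revive later" idea concrete, here is a minimal sketch of a macro that captures the current locals into a let around the user function. The name freeze-scope is hypothetical, this is not pigpen's actual implementation, and it leaves out the pigpen.pig/with-ns wrapper shown above.

```clojure
(defmacro freeze-scope
  "Hypothetical stand-in for pigpen.code/trap. Returns, as data, an
   expression that re-binds every local visible at the call site around
   the unevaluated function form `f`."
  [f]
  ;; &env maps each local symbol in scope to its compiler binding.
  (let [locals (vec (keys &env))]
    `(list 'let
           ;; Pair each local's name with its current value, quoted so the
           ;; value is not evaluated a second time when the form is revived.
           (vec (mapcat (fn [sym# val#] [sym# (list 'quote val#)])
                        '~locals
                        ~locals))
           '~f)))

;; Build the frozen expression while x is in scope.
(def frozen
  (let [x 42]
    (freeze-scope (fn [y] (+ x y)))))

;; Later, possibly elsewhere, evaluating the expression revives the closure.
(def revived (eval frozen))

(revived 1) ;; evaluates to 43
```

Note that the sketch captures every local in scope, whether or not the function uses it, which is how unrelated bindings can end up in the closure.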

I've seen in the past that macro expansion will often leave a bunch of junk in there that's not actually required by the user code. The for macro is a good example of that.

The java.lang.ClassNotFoundException is a classic example of Clojure interpreting any symbol with a period as a java class and trying to load it:

=> (eval '(prn x.y))
CompilerException java.lang.ClassNotFoundException: x.y, compiling:(/private/var/folders/54/cllx6y1d0nz92rmz915fgc4mmjkfgm/T/form-init3678987450896995460.clj:1:8) 

In pigpen, we take the result of pigpen.code/trap above, pr-str it, put it in the script, read it, eval it, and run it. If you're getting that error, that symbol is likely getting into the closure somehow and failing when we try to eval it.
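The print, read, eval round trip can be sketched in a few lines. The helper names round-trip and class-not-found? are made up for illustration, and the sketch uses clojure.core/read-string rather than whatever reader pigpen actually uses.

```clojure
(defn round-trip
  "Print an expression to a string, read it back, and evaluate it."
  [expr]
  (-> expr pr-str read-string eval))

;; A closure-style expression survives the trip and is callable afterwards.
(def add-42 (round-trip '(let [x 42] (fn [y] (+ x y)))))

(defn class-not-found?
  "True if round-tripping `expr` fails with a ClassNotFoundException
   anywhere in the exception's cause chain."
  [expr]
  (try
    (round-trip expr)
    false
    (catch Throwable t
      (->> (iterate #(.getCause ^Throwable %) t)
           (take-while some?)
           (some #(instance? ClassNotFoundException %))
           boolean))))

(add-42 1)                      ;; evaluates to 43
(class-not-found? '(prn x.y))   ;; the dotted symbol is treated as a class name
```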

At least that's my guess at this point :)

@aconbere
Author

aconbere commented Mar 6, 2015

oof, well the only thing I think I can offer at this point is that this dotted name that it can't find is probably needed. It's generated here https://github.com/Prismatic/plumbing/blob/master/src/plumbing/graph/positional.clj#L12-L30 and is building a record that is used in place of a map further on in the library for performance.

My initial worry is that I've seen very strange behavior in Clojure with regard to records and file load ordering (which in that case was solved by AOT compiling certain namespaces). And injecting a record into the namespace at run time seems like a very easy thing to break with the pigpen approach.

@aconbere
Author

aconbere commented Mar 6, 2015

@mbossenbroek one other question: can you think of any workarounds for this in the short term? I have some uses for pigpen that will be blocked on a fix. A workaround would free me up to continue. Also, let me know if there's anything else I can do to help here.

@mbossenbroek
Contributor

None off the top of my head. Sorry I didn't get a chance to look at this yesterday - I'll have something for you today though.

@aconbere
Author

aconbere commented Mar 6, 2015

HA! I have no expectation of you dropping everything and fixing my bugs ;-) I've already been very impressed with your responsiveness, and I'm mostly frustrated that I seem unable to fix this myself!

I'm trying to trace some of those trap calls to see if I can figure out what exactly is getting caught in there.

@mbossenbroek
Contributor

I found the problem & it wasn't what I thought it was. It's actually the serialization library that we use, nippy, that doesn't want to deserialize the record. What makes this even weirder is that I can only reproduce the problem if it's using the nippy jar that's AOT'ed into the pigpen jar.

I'll follow up with nippy's author & see what we can come up with.

@aconbere
Author

aconbere commented Mar 6, 2015

Ooooooooh so records are serializable and nippy is happily serializing it, but when it goes to deserialize it can't find the reference and it blows up?

Maybe because this is gensym'ed so probably doesn't result in a class file?

@mbossenbroek
Contributor

It seems to happen for normal records too - it looks like the immediate problem is that pigpen uses AOT. When I turn AOT off, it works locally. If all else fails, I can disable AOT for pigpen; I was just using it to generate 32 nearly identical copies of a java class to work around a pig limitation.

The gensym might well be a problem down the road though, as one machine will serialize the record and another will deserialize it. If the generated records have different ids on different machines, the receiving machine won't be able to deserialize the transported data. If those ids are locked in at jar compilation time (possibly via AOT), then this could work, but then we're back to the AOT problem.

Do you know if there's a way to disable record generation in prismatic's graph? Or at least have it generate stable ids?
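As a small illustration of the id concern: gensym appends a number taken from a counter that is global to the running JVM, so the suffix depends on how many ids that process has already handed out. The helper name fresh-record-name is hypothetical and only mimics the graph-record prefix seen in the error.

```clojure
(defn fresh-record-name
  "Hypothetical helper: a gensym'ed name with the graph-record prefix."
  []
  (gensym "graph-record"))

;; Two calls in the same process never agree on the suffix, and nothing
;; ties the suffix on one machine to the suffix on another.
(def name-a (fresh-record-name))
(def name-b (fresh-record-name))

(println name-a name-b)
(println "same name?" (= name-a name-b))
```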

@aconbere
Author

aconbere commented Mar 6, 2015

It's possible to disable the record generation, but the cost to performance hurts (at least for us), since we're using this to process a very large stream of data.

That being said... we are AOT'ing our code before putting it on the cluster, so it's likely we'd be seeing this problem if we went that route anyway.

Generating stable ids is interesting, but goes beyond my understanding of Clojure. From the way this is used, though, it looks like you could just take a hash of the map that is used to generate the record and turn that into a symbol instead of using gensym.
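A rough sketch of that idea, assuming the record is fully determined by its field keys (an assumption; plumbing's real inputs may differ). The function stable-record-name is hypothetical, and it uses a name-based UUID from the JDK as the hash so the result does not depend on process state.

```clojure
(defn stable-record-name
  "Hypothetical alternative to gensym: derive the record name from the
   field keys, so the same keys always produce the same symbol."
  [field-keys]
  (let [;; Sort first so key order does not change the name.
        canonical (pr-str (sort field-keys))
        ;; Name-based (MD5) UUID: deterministic for a given byte sequence.
        digest    (java.util.UUID/nameUUIDFromBytes
                    (.getBytes ^String canonical "UTF-8"))]
    (symbol (str "graph-record-" digest))))

(println (stable-record-name [:a :b]))
(println (= (stable-record-name [:a :b])
            (stable-record-name [:b :a])))
```

A matching name alone would not be enough: the machine that deserializes would still need to have defined the record class under that name before reading the data.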

@mbossenbroek
Contributor

IIRC, you said that disabling AOT resolved this, correct? Could I mark this as closed?

@aconbere
Author

You may certainly mark it as closed, it is happily running now.

On Mon, May 11, 2015 at 10:39 AM, Matt Bossenbroek <notifications@github.com> wrote:

> IIRC, you said that disabling AOT resolved this, correct? Could I mark this as closed?

