Issue #59 -- Custom serializers for defop params #63

Closed
wants to merge 3 commits into
from

4 participants

Addresses #59

Probably not quite ready for prime-time. It seems to work, but needs some tweaking. See comments in the commit.

mlimotte and 1 other commented on an outdated diff Mar 26, 2012

src/jvm/cascalog/KryoService.java
/** User: sritchie Date: 12/16/11 Time: 8:34 PM */
public class KryoService {
public static final Logger LOG = Logger.getLogger(KryoService.class);
static ObjectBuffer kryoBuf;
+ static Var require = RT.var("clojure.core", "require");

mlimotte Mar 26, 2012

I'm using this code to get the current project JobConf (line 47 and line 63). Is there a better way?

sritchie (Collaborator) Mar 26, 2012

Hey Marc,

Thanks for the pull req! The problem with doing it this way is that we're not going to pick up cluster-wide configuration settings, or settings that are supplied dynamically using with-job-conf.

The right way to do this is to modify KryoService to accept a JobConf, and use this to build a SerializationFactory in the same way that the MemorySourceTap does it:

https://github.com/nathanmarz/cascalog/blob/master/src/jvm/cascalog/TupleMemoryInputFormat.java#L133

This is almost exactly what we need to do. I think you can downcast the FlowProcess that comes in through a BaseOperation's prepare method to a HadoopFlowProcess, get its JobConf, and pass that through. What do you think?
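The suggestion amounts to handing the configuration to the service instead of having the service look it up. Below is a minimal, stdlib-only Java sketch of that shape, assuming a plain Map<String, String> as a stand-in for JobConf. Every name in it (ParamSerialization, ParamSerializationFactory, ParamSerDe, the "param.serializations" key, and the "utf8"/"java" entries) is hypothetical. The factory is only loosely modelled on how Hadoop's SerializationFactory walks the serializations listed under io.serializations and returns the first one that accepts a class. It is not Cascalog's, Kryo's, or Hadoop's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a pluggable serialization (not Hadoop's interface).
interface ParamSerialization {
    boolean accept(Class<?> c);
    byte[] serialize(Object o);
    Object deserialize(byte[] bytes);
}

// Handles Strings only, encoded as UTF-8.
class Utf8StringSerialization implements ParamSerialization {
    public boolean accept(Class<?> c) { return c == String.class; }
    public byte[] serialize(Object o) { return ((String) o).getBytes(StandardCharsets.UTF_8); }
    public Object deserialize(byte[] bytes) { return new String(bytes, StandardCharsets.UTF_8); }
}

// Fallback: plain Java serialization for any Serializable type.
class JavaParamSerialization implements ParamSerialization {
    public boolean accept(Class<?> c) { return Serializable.class.isAssignableFrom(c); }
    public byte[] serialize(Object o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
    public Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Builds its list of serializations from the configuration it is given,
// so per-job settings are honoured instead of a global default.
class ParamSerializationFactory {
    static final String KEY = "param.serializations"; // hypothetical key
    private static final Map<String, Supplier<ParamSerialization>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("utf8", Utf8StringSerialization::new);
        REGISTRY.put("java", JavaParamSerialization::new);
    }
    private final List<ParamSerialization> serializations = new ArrayList<>();

    ParamSerializationFactory(Map<String, String> conf) {
        for (String name : conf.getOrDefault(KEY, "java").split(",")) {
            Supplier<ParamSerialization> s = REGISTRY.get(name.trim());
            if (s == null) throw new IllegalArgumentException("unknown serialization: " + name);
            serializations.add(s.get());
        }
    }

    // First configured serialization that accepts the class wins.
    ParamSerialization getSerialization(Class<?> c) {
        for (ParamSerialization s : serializations) {
            if (s.accept(c)) return s;
        }
        throw new IllegalArgumentException("no serialization accepts " + c.getName());
    }
}

// Hypothetical service facade: the configuration is an argument on both sides.
class ParamSerDe {
    static byte[] serialize(Object param, Map<String, String> conf) {
        return new ParamSerializationFactory(conf).getSerialization(param.getClass()).serialize(param);
    }
    static Object deserialize(byte[] bytes, Class<?> type, Map<String, String> conf) {
        return new ParamSerializationFactory(conf).getSerialization(type).deserialize(bytes);
    }
}
```

Because the factory is built from whatever configuration the caller passes in, cluster-wide or per-job settings would reach it, provided the caller actually has the job's configuration at hand.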


mlimotte Mar 26, 2012

I considered this approach the first time, but the issue I saw is that while ClojureCascadingBase#prepare has a FlowProcess instance, which I can pass to KryoService#deserialize, the ClojureCascadingBase#initialize method has no FlowProcess to pass to KryoService#serialize.

Also, I didn't look too closely because of the issue above, but can we be confident that the FlowProcess being passed in is actually an instance of cascading.flow.hadoop.HadoopFlowProcess? If so, I can cast to that and use .getJobConf().

sritchie (Collaborator) Mar 27, 2012

Yup, we can definitely be confident -- Cascalog's going to need a bit of work before it supports local mode.

JobConf jc = ((HadoopFlowProcess) flowProcess).getJobConf();
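The one-line cast relies on every FlowProcess being Hadoop-backed, which holds for Cascalog until local mode is supported. A minimal sketch of a guarded version follows. FlowProcess, HadoopFlowProcess, LocalFlowProcess, and JobConf here are hypothetical stand-ins defined only so the example compiles on its own; they are not the real Cascading or Hadoop classes, and FlowProcesses.jobConfOf is an illustrative helper name.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Hadoop's JobConf: just a string property bag.
class JobConf {
    private final Map<String, String> props = new HashMap<>();
    String get(String key) { return props.get(key); }
    void set(String key, String value) { props.put(key, value); }
}

// Stand-ins for the Cascading flow-process hierarchy.
class FlowProcess {}

class HadoopFlowProcess extends FlowProcess {
    private final JobConf jobConf;
    HadoopFlowProcess(JobConf jobConf) { this.jobConf = jobConf; }
    JobConf getJobConf() { return jobConf; }
}

class LocalFlowProcess extends FlowProcess {}

final class FlowProcesses {
    private FlowProcesses() {}

    // Same downcast as the one-liner, but a non-Hadoop process (e.g. a future
    // local mode) fails with a descriptive message instead of a bare ClassCastException.
    static JobConf jobConfOf(FlowProcess flowProcess) {
        if (!(flowProcess instanceof HadoopFlowProcess)) {
            String kind = flowProcess == null ? "null" : flowProcess.getClass().getSimpleName();
            throw new IllegalStateException("expected HadoopFlowProcess, got " + kind);
        }
        return ((HadoopFlowProcess) flowProcess).getJobConf();
    }
}
```

In prepare, the operation would call jobConfOf(flowProcess) once and hand the result to the deserializing side; the guard only changes what happens if the assumption ever stops holding.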

mlimotte Mar 27, 2012

Good to know. That should work for the deserialize side. What are your thoughts on the serialize side (from ClojureCascadingBase/initialize)?

thanks,
Marc

Quantisan (Collaborator) commented Mar 24, 2013

What does this do, please?

sritchie (Collaborator) commented Aug 7, 2013

Closing as a duplicate of #65

sritchie closed this Aug 7, 2013
