hadoop-compat addons project with HadoopDataSource #437

Merged (4 commits) on Jan 28, 2014


@rmetzger (Member)

I picked up the code of #424 and enhanced it with the following:

  • Make hadoop-compat compatible with Hadoop YARN and remove the code in the org.apache.hadoop package.
  • Extend the user-code object wrapper (@aljoscha, please validate my changes) and fix InputFormat serialization.
  • Introduce a pluggable type converter.
  • Add a generic wrapper for Hadoop's Writable and WritableComparable.
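A pluggable type converter separates how Hadoop key/value pairs become records from the data source that reads them. The following is a minimal sketch of that idea, not the interface added in this PR: `TypeConverter`, `KeyValueConverter` and `ConvertingSource` are hypothetical names, and a `List<Object>` stands in for Stratosphere's `Record` so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical converter interface: turns one Hadoop key/value pair into a record.
// A List<Object> stands in for Stratosphere's Record so the sketch is self-contained.
interface TypeConverter<K, V> {
    void convert(List<Object> record, K hadoopKey, V hadoopValue);
}

// Default strategy (assumed): key in field 0, value in field 1, unchanged.
class KeyValueConverter<K, V> implements TypeConverter<K, V> {
    @Override
    public void convert(List<Object> record, K hadoopKey, V hadoopValue) {
        record.clear();
        record.add(hadoopKey);
        record.add(hadoopValue);
    }
}

// Hypothetical data source that delegates record layout to whichever converter it is given.
class ConvertingSource<K, V> {
    private final TypeConverter<K, V> converter;

    ConvertingSource(TypeConverter<K, V> converter) {
        this.converter = converter;
    }

    List<Object> nextRecord(K key, V value) {
        List<Object> record = new ArrayList<>();
        converter.convert(record, key, value);
        return record;
    }
}
```

Swapping in a different converter, for example a lambda that keeps only the value, changes the record layout without touching the source.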

The generic wrapper allows the following (the example uses MongoDB's Hadoop InputFormat):

```java
public void map(Record record, Collector<Record> out) throws Exception {
    // Unwrap the Hadoop Writables stored in field 0 (key) and field 1 (value)
    Writable keyWr = record.getField(0, WritableWrapper.class).value();
    Writable valWr = record.getField(1, WritableWrapper.class).value();
    // Cast to the concrete type produced by MongoDB's InputFormat
    BSONWritable key = (BSONWritable) keyWr;
    BSONWritable value = (BSONWritable) valWr;
    System.err.println("bson value has " + value);
}
```
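Because the wrapper must hold any Writable, it cannot know the concrete type when deserializing. One common approach, sketched here under that assumption and not taken from this PR, is to write the class name before the payload and re-create the object reflectively on read. The local `Writable` interface mirrors the two methods of `org.apache.hadoop.io.Writable` so the sketch compiles without Hadoop; `GenericWritableWrapper` and `IntBox` are hypothetical.

```java
import java.io.*;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical wrapper: writes the concrete class name, then delegates to the wrapped object.
class GenericWritableWrapper {
    private Writable value;

    GenericWritableWrapper() {}                              // used when deserializing
    GenericWritableWrapper(Writable value) { this.value = value; }

    Writable value() { return value; }

    void write(DataOutput out) throws IOException {
        out.writeUTF(value.getClass().getName());            // remember the concrete type
        value.write(out);                                    // then the payload itself
    }

    void read(DataInput in) throws IOException {
        String className = in.readUTF();
        try {
            // Writables need a no-arg constructor so they can be instantiated here
            Class<?> clazz = Class.forName(className);
            value = (Writable) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException("Cannot instantiate " + className, e);
        }
        value.readFields(in);
    }
}

// Example Writable used to exercise the wrapper.
class IntBox implements Writable {
    int v;
    IntBox() {}
    IntBox(int v) { this.v = v; }
    public void write(DataOutput out) throws IOException { out.writeInt(v); }
    public void readFields(DataInput in) throws IOException { v = in.readInt(); }
}
```

Writing the class name costs a few bytes per field; a converter that knows the concrete type in advance could omit it.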

Let's discuss whether the project should be called "hadoop-compat" or "hadoop-compatibility".

Open issues:

  • If the Hadoop InputFormat is a FileInputFormat, we should make the split assignment aware of file locality.
  • Add JUnit tests for various formats such as ORC, Parquet and Avro.
  • Write website documentation for this code.
  • Add HadoopDataSink.
  • See whether Hadoop counters can be mapped to our accumulators.
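For the first open issue, a simple locality-aware strategy hands each requesting host a split stored on that host and falls back to any remaining split otherwise, so no host sits idle. This is a sketch under assumptions: `Split` and `LocalityAwareAssigner` are hypothetical, and the host set stands in for what Hadoop's `InputSplit.getLocations()` reports.

```java
import java.util.*;

// Hypothetical split: an id plus the hosts that store its data.
class Split {
    final int id;
    final Set<String> hosts;

    Split(int id, String... hosts) {
        this.id = id;
        this.hosts = new HashSet<>(Arrays.asList(hosts));
    }
}

// Prefers splits stored on the requesting host; falls back to any remaining split.
class LocalityAwareAssigner {
    private final List<Split> remaining;

    LocalityAwareAssigner(Collection<Split> splits) {
        this.remaining = new LinkedList<>(splits);
    }

    /** Returns the next split for {@code host}, or null when all splits are assigned. */
    Split next(String host) {
        Iterator<Split> it = remaining.iterator();
        while (it.hasNext()) {
            Split s = it.next();
            if (s.hosts.contains(host)) {   // local read: no network transfer needed
                it.remove();
                return s;
            }
        }
        // No local split left: assign a remote one so the host does not sit idle
        return remaining.isEmpty() ? null : remaining.remove(0);
    }
}
```

The linear scan is fine for a modest number of splits; a per-host index would avoid it for larger inputs.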

@uce (Contributor) commented Jan 25, 2014

Nice. I vote for hadoop-compatibility, as "compat" is IMHO not a standard abbreviation.

What do you mean by "Add HadoopDataSource"? Is that a typo for sink?

@rmetzger (Member, Author)

I have seen "compat" used by some Linux developers ;)
And "source" was a typo; I meant sink.

I added another converter that allows using any Hadoop Writable!

@fhueske (Contributor) commented Jan 27, 2014

I'm not sure I would use either name.
This feature "only" supports Hadoop InputFormats, but there is a lot of other Hadoop functionality we do not support right now, such as OutputFormats and Map and Reduce functions.
If we plan to extend support to these interfaces as well, I would go with hadoop-compatibility.
Otherwise, hadoop-input would be a good name IMHO.

@rmetzger (Member, Author) commented Jan 27, 2014

The plan is to put everything related to Hadoop compatibility into this package.
I think the next step would be an OutputFormat.

I will change the name!

faisalmoeen and others added 4 commits January 27, 2014 11:51
This is a combination of commits:
* make hadoop-compat compatible to hadoop yarn and remove org.apache.hadoop code
* Extended usercode wrapper, fixed IF serialization
* introduce pluggable type converter
* make the converter interface more generic; (hopefully) improved fetching logic
@StephanEwen (Contributor)

We are thinking about adding proper interface compatibility with Hadoop later. That could all be in one project, called hadoop-compatibility.


@rmetzger merged commit d8fb870 into stratosphere:master on Jan 28, 2014
```java
Method customSerializer = null;
Method customDeserializer = null;
try {
    customSerializer = current.getClass().getDeclaredMethod("writeObject", java.io.ObjectOutputStream.class);
```
@rmetzger (Member, Author)

@aljoscha: Could you review this change? I have already committed it to master, but it would be good if you could confirm my approach.
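The fragment above looks up a class's private `writeObject(ObjectOutputStream)` serialization hook via reflection. The sketch below illustrates that lookup in isolation and is not the PR's code: `SerializationHooks`, `WithHook` and `WithoutHook` are hypothetical names. It treats `NoSuchMethodException` as "no custom hook" and, like Java serialization itself, only accepts private, non-static methods returning `void`.

```java
import java.io.*;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class SerializationHooks {
    /** Returns the declared hook method, or null if the class does not define a valid one. */
    static Method findHook(Class<?> clazz, String name, Class<?> streamType) {
        try {
            Method m = clazz.getDeclaredMethod(name, streamType);
            int mod = m.getModifiers();
            // Java serialization only honors private, non-static hooks returning void
            if (Modifier.isPrivate(mod) && !Modifier.isStatic(mod) && m.getReturnType() == void.class) {
                m.setAccessible(true);   // private methods need this before invoke()
                return m;
            }
        } catch (NoSuchMethodException e) {
            // no custom hook on this class; default serialization applies
        }
        return null;
    }

    static boolean hasCustomSerialization(Object obj) {
        Class<?> c = obj.getClass();
        return findHook(c, "writeObject", ObjectOutputStream.class) != null
            || findHook(c, "readObject", ObjectInputStream.class) != null;
    }
}

class WithHook implements Serializable {
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
    }
}

class WithoutHook implements Serializable {}
```

`getDeclaredMethod` only inspects the class itself, so hooks declared on a superclass are not found; Java serialization checks each class in the hierarchy separately, which a wrapper may need to mirror.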
