Basic tasks (Mapper, Reducer and Combiner) for hadoopcompatibility #833
Conversation
@Override
@SuppressWarnings("unchecked")
public void map(Record record, Collector<Record> out) throws Exception {
    output.wrapStratosphereCollector(out);
Not particularly great, this one... A Stratosphere Collector is wrapped every time map and reduce are called, which is not elegant. As far as I am aware, the implementation of the Collector interface is not user-configurable in Stratosphere (as it is in Hadoop), so one would have to work at the job driver level (the caller of the reduce function) to make use of it. I can do that, but what do you think would be the best approach?
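For readers following along, here is a minimal sketch of the wrapping pattern under discussion. The class name, the hard-coded Text/IntWritable types, and the field names are illustrative assumptions, not necessarily the PR's actual code, and the package names follow the 0.5-era Stratosphere layout:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.OutputCollector;

import eu.stratosphere.types.IntValue;
import eu.stratosphere.types.Record;
import eu.stratosphere.types.StringValue;
import eu.stratosphere.util.Collector;

// Illustrative adapter: exposes Hadoop's OutputCollector interface while
// forwarding each (key, value) pair to a Stratosphere Collector<Record>.
public class HadoopOutputCollector implements OutputCollector<Text, IntWritable> {

    private Collector<Record> stratosphereCollector;

    // Re-target the same wrapper object instead of constructing a new one
    // per map()/reduce() call.
    public void wrapStratosphereCollector(Collector<Record> collector) {
        this.stratosphereCollector = collector;
    }

    @Override
    public void collect(Text key, IntWritable value) {
        Record record = new Record(2);
        record.setField(0, new StringValue(key.toString()));
        record.setField(1, new IntValue(value.get()));
        stratosphereCollector.collect(record);
    }
}

The open question in the comment is whether this re-pointing can be hoisted out of map() and reduce() entirely, for example by handing the wrapper to the user function once, at the job driver level.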
Hey @atsikiridis, great to see your first contribution after 5 hours of GSoC.
Hey @atsikiridis, please also have a look at my PR #777. I have refactored the complete hadoop compatibility package in order to support our new Java API and the Hadoop …
* wrappers for Mapper, Reducer and Combiner (as a local Reducer)
* interface for wrappers of OutputCollectors and a default implementation
* new full example of WordCount using mapred Mapper and Reducer
* updated test case
… to use in reduce function.
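To make the wrapper deliverables listed above concrete, here is a hedged sketch of what a Mapper wrapper in the old Record API could look like, reusing the illustrative HadoopOutputCollector from the earlier sketch. The class name, constructor, and the Text/IntWritable type choices are assumptions for illustration, not the PR's actual code:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.Reporter;

import eu.stratosphere.api.java.record.functions.MapFunction;
import eu.stratosphere.types.Record;
import eu.stratosphere.types.StringValue;
import eu.stratosphere.util.Collector;

// Illustrative wrapper: a Stratosphere Record-API MapFunction that delegates
// each call to a Hadoop mapred Mapper via the reusable output collector.
public class HadoopMapFunction extends MapFunction {

    private final Mapper<Text, Text, Text, IntWritable> hadoopMapper;
    private final HadoopOutputCollector output = new HadoopOutputCollector();

    public HadoopMapFunction(Mapper<Text, Text, Text, IntWritable> hadoopMapper) {
        this.hadoopMapper = hadoopMapper;
    }

    @Override
    public void map(Record record, Collector<Record> out) throws Exception {
        // Re-point the reusable wrapper at the current Stratosphere collector
        // (the step the review comment above finds inelegant).
        output.wrapStratosphereCollector(out);
        Text key = new Text(record.getField(0, StringValue.class).getValue());
        Text value = new Text(record.getField(1, StringValue.class).getValue());
        hadoopMapper.map(key, value, output, Reporter.NULL);
    }
}

In real code the Mapper would more likely be instantiated from the task configuration than held as a field, since user functions are serialized and shipped to the workers; the field is kept here only to keep the sketch short.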
}

@Override
public Plan getPlan(String... args) {
You are using Stratosphere's old "Record" Java API here. We are moving away from it and will probably deprecate it in the 0.6 release.
Please check out the new Java API:
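For orientation, a job in the new Java API looks roughly like the sketch below. It follows the shape of the 0.5-era WordCount example; the exact package names and the Aggregations shorthand are written from memory, not taken from this PR:

import eu.stratosphere.api.java.DataSet;
import eu.stratosphere.api.java.ExecutionEnvironment;
import eu.stratosphere.api.java.aggregation.Aggregations;
import eu.stratosphere.api.java.functions.FlatMapFunction;
import eu.stratosphere.api.java.tuple.Tuple2;
import eu.stratosphere.util.Collector;

// Sketch of the new Java API style: DataSet transformations instead of a
// Record-API Plan assembled from operators.
public class WordCount {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> text = env.readTextFile(args[0]);

        DataSet<Tuple2<String, Integer>> counts = text
                .flatMap(new Tokenizer())
                .groupBy(0)
                .aggregate(Aggregations.SUM, 1);

        counts.writeAsCsv(args[1]);
        env.execute("WordCount with the new Java API");
    }

    // Splits each line into (word, 1) pairs.
    public static final class Tokenizer extends FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String token : line.toLowerCase().split("\\W+")) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<String, Integer>(token, 1));
                }
            }
        }
    }
}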
@atsikiridis Nice work so far! 👍 Thanks!
I have ported the code of this branch to the new Java API (basically rebased on the branch in #777). Here is the link: …

Due to the fact that there are some limitations with the …

By the way, if we don't need the initial code for the old Records API, maybe this PR should be closed. It's ported to the new API anyway, and I'll submit a new PR very soon. Thanks!
You should be able to support non-identical generic input/output types already.
Will this approach not work for you?
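For illustration, here is a hedged sketch of how non-identical input/output types fall out naturally once the wrapper is written against the new Java API, where FlatMapFunction already takes separate IN and OUT type parameters. The class name HadoopMapperWrapper and its internals are hypothetical:

import java.io.IOException;

import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

import eu.stratosphere.api.java.functions.FlatMapFunction;
import eu.stratosphere.api.java.tuple.Tuple2;
import eu.stratosphere.util.Collector;

// Hypothetical wrapper: the four Hadoop type parameters are carried
// independently, so input and output types need not be identical.
public class HadoopMapperWrapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
        extends FlatMapFunction<Tuple2<KEYIN, VALUEIN>, Tuple2<KEYOUT, VALUEOUT>> {

    private final Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> mapper;

    public HadoopMapperWrapper(Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> mapper) {
        this.mapper = mapper;
    }

    @Override
    public void flatMap(Tuple2<KEYIN, VALUEIN> value,
                        final Collector<Tuple2<KEYOUT, VALUEOUT>> out) throws Exception {
        // Adapt the Stratosphere collector to Hadoop's OutputCollector inline.
        mapper.map(value.f0, value.f1, new OutputCollector<KEYOUT, VALUEOUT>() {
            @Override
            public void collect(KEYOUT key, VALUEOUT val) throws IOException {
                out.collect(new Tuple2<KEYOUT, VALUEOUT>(key, val));
            }
        }, Reporter.NULL);
    }
}

Because IN and OUT are separate type parameters, nothing forces KEYIN/VALUEIN and KEYOUT/VALUEOUT to coincide, which is the point of the comment above.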
I think this PR is subsumed by apache/flink#37.
Wrappers for basic tasks (Mapper, Reducer, Combiner), a new interface for OutputCollectors, and a test case with a complete Hadoop WordCount. With these in place, along with HadoopDataSource and HadoopDataSink, the ground is set to start working seriously on the hadoop abstraction layer (which, by the way, is my Google Summer of Code project and officially starts today :)). Note that in some cases there is code that might be generalised / refactored very soon.