Update PR Meta Issue #22

fschueler · 2016-04-18T10:12:18Z

I copy and paste Matthias' suggestions here. Let's keep track of it and coordinate the work in this issue.

APIs and interfaces:
Unfortunately, you've put already effort into API extensions. I would recommend to remove the modifications of MLOutput and MLContext because MLOutput is not maintained and MLContext is about to get a major refactoring. Let's use the command line interface for now and extend the APIs once the backend is ready.
Isolation of Flink dependencies: We need to be very careful about introducing Flink API dependencies into shared classes in order to ensure that Spark and Flink backends can be used without dependency on the other. For example, there are (1) references to Flink in runtime/controlprogram/Program, and (2) references to the SparkExecutionContext in Flink execution modes. Could you please systematically go over shared classes and double check for isolation?
Bufferpool integration: I agree that the duplication of RDDObject and DataSetObject creates unnecessary redundancy in a core place like our bufferpool (CacheableData and ExecutionContext). Let's abstract this to something like DistDataObject with two implementations RDDObject and DataSetObject. Just let me know, I'd be happy to prepare this for you guys.
Broadcast memory budget: Currently, the broadcast memory budget is obtained from the SparkExecutionContext. It would be good to automatically obtain this size (e.g., size for user objects) to remove the dependency and because it is really important for operator selection.
Checkpointing: I see a CheckpointFLInstruction which always does a pass-through. Let's remove this instruction (until Flink supports distributed caching) and simply do not apply the rewrites for injecting these checkpoints (RewriteInjectSparkLoopCheckpointing, RewriteInjectSparkPReadCheckpointing).
Instructions w/ broadcasts: There seem to be some inconsistencies wrt when and how we broadcast data (e.g., join vs join w/ hint vs withBroadcastSet). I understand that Flink offers optimization capabilities in that regard. However, from a SystemML perspective I would recommend to stick to physical operator semantics (e.g., MAPMM with broadcast, CPMM/RMM with join). If experiments show that Flink actually does better than our operator selection, we could simply compile earlier (for smaller data sizes) CPMM/RMM. Btw, is there a way to pin the broadcast in distributed memory and reuse it across multiple invocations?
Advanced operations: You already added compiler support for advanced operations like weighted squared loss and cumulative aggregates. Let's put these aside for now and get a narrow prototype running well first before we broaden the scope.
Tests: It would be good if we could add a few tests for flink mapmm and tsmm to ensure they are producing correct results as this is the precondition for any experiments. Please have a look into FullMatrixMultiplicationTransposeSelfTest and FullDistributedMatrixMultiplicationTest.
Exception handling: There are a couple of instances where exceptions are simply put to stderr. Could you please double check that all caught exceptions are rethrown in order to simplify debugging. Also, please replace the assertions for dimensions mismatch checks with exceptions as these checks need to happen even if assertions are disabled (by default).
Formatting: Well, we don't have a style guide defined yet but the project-wide convention is to use tabs instead of spaces (e.g., issues in DMLScript). Furthermore, you've also done some code cleanups of formatting existing code. Could we separate these changes out of this PR to simplify the review? Feel free to submit them as an individual PR though.

FelixNeutatz · 2016-04-18T11:33:09Z

I am working on AggBinary problem and try to make the linear test run on "FLINK" mode

carabolic · 2016-04-18T11:34:15Z

I'll have a look at the Instructions w/ broadcasts issue. I will replace the general implementation with the one given by the name. That is replace Flink's join operator with BroadcastSets.

fschueler · 2016-04-18T12:29:58Z

I am having a look at the unnecessary API changes and Flink dependencies. The removal of the Flink dependency from Program.java touches the question of where to put the env.execute() parts ( #15 )

fschueler · 2016-04-18T13:26:50Z

I think for the CheckpointFLInstruction, we can just remove it since it was not generated during optimization anyways...

fschueler · 2016-04-26T15:40:06Z

I went over the Flink/Spark interdependencies but we should double check before we check this off the list.

fschueler · 2016-05-04T09:58:24Z

Is this up to date? What is left to do for the next update of the PR?

j143-zz · 2017-07-21T08:59:17Z

Hi @fschueler !
May be we can have a fresh look on this flink integration status and relist what are the current bottlenecks or things to be taken care off. I think this integration is important for engineering applications.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update PR Meta Issue #22

Update PR Meta Issue #22

fschueler commented Apr 18, 2016 •

edited

Loading

FelixNeutatz commented Apr 18, 2016

carabolic commented Apr 18, 2016

fschueler commented Apr 18, 2016

fschueler commented Apr 18, 2016

fschueler commented Apr 26, 2016

fschueler commented May 4, 2016

j143-zz commented Jul 21, 2017 •

edited

Loading

Update PR Meta Issue #22

Update PR Meta Issue #22

Comments

fschueler commented Apr 18, 2016 • edited Loading

FelixNeutatz commented Apr 18, 2016

carabolic commented Apr 18, 2016

fschueler commented Apr 18, 2016

fschueler commented Apr 18, 2016

fschueler commented Apr 26, 2016

fschueler commented May 4, 2016

j143-zz commented Jul 21, 2017 • edited Loading

fschueler commented Apr 18, 2016 •

edited

Loading

j143-zz commented Jul 21, 2017 •

edited

Loading