ExecutionEnvironment.execute() should not be called when no sinks are defined. #15

Open
fschueler opened this issue Apr 7, 2016 · 5 comments

Comments

@fschueler

Currently we call env.execute() after the execution of all program blocks if the ExecutionContext is of type FlinkExecutionContext.
In hybrid_flink mode this leads to the problem that execute() can be called even though no sinks are defined.

We should somehow check if an adequate plan exists before calling execute().

@fschueler
Author

This can happen not only in hybrid_flink mode but whenever collect() is called somewhere in between. Does anyone have an idea how we can keep track of the defined sources/sinks?

@aalexandrov

You can use createProgramPlan and inspect the result for sinks.

@aalexandrov

Alternatively, use an ExecutionEnvironment wrapper class based on the delegate pattern and override the void registerDataSink(DataSink<?> sink) method in order to keep track of this kind of meta-information.
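
A minimal sketch of the delegate idea, written against stand-in classes rather than Flink's real API so that it compiles on its own. FakeEnvironment, SinkTrackingEnvironment, hasPendingSinks() and executeIfNeeded() are hypothetical names, and the String sink argument replaces DataSink<?> for brevity. The stand-in assumes that execute() fails without sinks and clears the registered sinks once it has run.

```java
// Stand-in for the real environment; NOT Flink's ExecutionEnvironment.
class FakeEnvironment {
    // Private, as described in this thread: callers cannot inspect it.
    private int sinks = 0;

    void registerDataSink(String sinkName) {
        sinks++;
    }

    void execute() {
        if (sinks == 0) {
            throw new IllegalStateException("no data sinks defined");
        }
        // Assumption: running the plan consumes the registered sinks.
        sinks = 0;
    }
}

// Hypothetical wrapper: forwards calls to the delegate and mirrors
// the sink bookkeeping so that callers can query it.
class SinkTrackingEnvironment {
    private final FakeEnvironment delegate;
    private int pendingSinks = 0;

    SinkTrackingEnvironment(FakeEnvironment delegate) {
        this.delegate = delegate;
    }

    void registerDataSink(String sinkName) {
        pendingSinks++;
        delegate.registerDataSink(sinkName);
    }

    boolean hasPendingSinks() {
        return pendingSinks > 0;
    }

    // Runs the delegate only if a sink was registered since the last run.
    // Returns true if execute() was actually called.
    boolean executeIfNeeded() {
        if (!hasPendingSinks()) {
            return false;
        }
        pendingSinks = 0;
        delegate.execute();
        return true;
    }
}
```

With such a wrapper, the call site after the last program block would call executeIfNeeded() instead of execute(). Any eager operation that triggers execution on the real environment would also have to go through the wrapper so that the counter stays in sync.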

@fschueler
Author

Unfortunately, createProgramPlan already fails if no sinks are defined, and the sinks variable of the ExecutionEnvironment is private. I guess we will have to go with the alternative.

@fschueler
Author

Maybe the "final" execute (plus check) could happen in the finally block of the DMLScript in line 685, where the SparkContext is stopped.
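
A rough sketch of what that placement could look like, assuming a sink counter is available from some wrapper. GuardedRunner and all of its members are hypothetical and only illustrate the control flow; none of this is taken from DMLScript or Flink.

```java
// Hypothetical runner showing a guarded final execute inside finally.
class GuardedRunner {
    private int pendingSinks = 0;
    private int executions = 0;
    private boolean stopped = false;

    // Stand-in for registering a sink on the environment.
    void registerSink() {
        pendingSinks++;
    }

    int executions() {
        return executions;
    }

    boolean isStopped() {
        return stopped;
    }

    void run(Runnable programBlocks) {
        try {
            programBlocks.run();
        } finally {
            try {
                // The check: only trigger the final execute if sinks remain.
                if (pendingSinks > 0) {
                    executions++;      // stand-in for env.execute()
                    pendingSinks = 0;
                }
            } finally {
                // Cleanup (e.g. stopping contexts) runs in every case.
                stopped = true;
            }
        }
    }
}
```

One thing to keep in mind with this placement: the finally block also runs when a program block throws, so the final execute would be attempted after a failure unless the check takes that into account as well.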


2 participants