Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamline - Flink Runner #535

Open
geoHeil opened this issue Apr 10, 2017 · 7 comments
Open

Streamline - Flink Runner #535

geoHeil opened this issue Apr 10, 2017 · 7 comments

Comments

@geoHeil
Copy link

geoHeil commented Apr 10, 2017

Following up with #534 I will open an issue here to discuss flink support.

What API would flink need to implement? How much work needs to be done?

@geoHeil
Copy link
Author

geoHeil commented Apr 10, 2017

Probably support for apache beam would be most interesting.

@harshach harshach changed the title Discuss Flink support Streamline - Flink Runner Apr 10, 2017
@harshach
Copy link
Contributor

@arunmahadevan can you outline the work and it would be good to create sub-issues if possible.

@arunmahadevan
Copy link
Contributor

arunmahadevan commented Apr 11, 2017

At a high level the Streamline topology is represented as a TopologyDAG that captures the sources, sinks and the operations users intend to perform.

During deployment, the topology DAG gets translated into the respective runtime. The runtime providers needs to implement the TopologyActions interface. The framework invokes TopologyActions.deploy() passing the DAG as an argument. This is where the runtime specific translations take place. In case of Storm, StormTopologyActionsImpl translates the DAG into storm specific flux representation and then submits it into storm cluster. Similar implementation will be needed for Flink (E.g. FlinkTopologyActionsImpl).

The topology DAG has a traverse method that accepts a TopologyDagVisitor and will call back the appropriate visit methods as the different components in the topology DAG are traversed. In case of storm this is StormTopologyFluxGenerator. Similar implementation will have to be provided for Flink (E.g. FlinkDataStreamGenerator).

At a high level the topology DAG contains implementations of sources (StreamlineSource), sinks (StreamlineSink) and processors(StreamlineProcessor). These capture the operation user intends to perfom and the necessary design time configuration (what the user configured via the UI). The runtime needs to provide respective runtime translations for the different design time components.

I list down the currently supported sources, sinks and processors which needs to be translated in separate subtasks. To start with we could support only a subset of sources and sinks. E.g. only Kafka source and Kafka sink.

The different processors will have to be supported. Right now we have Rules, Join, PMML and Custom processors in the backend. The different processors have runtime components that can be reused while building the DataStream pipeline. E.g. The rules are evaluated via RuleProcessorRuntime which can be re-used (say within a datastream.flatMap). Maybe we can start with RuleProcessor, I will create separate tasks processors as well.

This is at a high level, we may need to refactor or come up with new interfaces within Streamline itself to support flink, which we can figure out as we proceed.

@arunmahadevan
Copy link
Contributor

@harshach, created an initial set of sub-tasks to get started. We could start with a basic kafka-rule-kafka topology for flink and add more stuff once this works.

@maver1ck
Copy link

Don't we want to support Apache Beam instead of Flink ?

@suez1224
Copy link

suez1224 commented Apr 1, 2018

@harshach @arunmahadevan what are the instructions to register Flink as another stream engine in the service pool, so the web UI can recognize it? what are the configurations need to be added or change? Thanks.

@haicc-wang
Copy link

@harshach @arunmahadevan what are the instructions to register Flink as another stream engine in the service pool, so the web UI can recognize it? what are the configurations need to be added or change? Thanks.

any updates?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants