
Elyra now part of ODH, but Airflow optional support needs to be there #45

Open
shalberd opened this issue May 26, 2023 · 2 comments
Labels
kind/enhancement New feature or request

Comments

@shalberd

shalberd commented May 26, 2023

Airflow became an optional tier-2 component of ODH in summer 2022.

https://github.com/opendatahub-io-contrib/airflow-on-openshift

Recently, Elyra became part of ODH via an overlay. Even more recently, Red Hat has taken over Elyra itself from IBM.

opendatahub-io/notebooks#58 (comment)

Because Kubeflow Pipelines is a top-tier focus of ODH, ODH wants Elyra to support only Kubeflow Pipelines.

Elyra has long supported Airflow in several ways:

Airflow-specific operators

https://medium.com/ibm-data-ai/getting-started-with-apache-airflow-operators-in-elyra-aae882f80c4a

Generic pipelines

https://medium.com/ibm-data-ai/automate-your-machine-learning-workflow-tasks-using-elyra-and-apache-airflow-adf297adc455

Airflow 2.x support is still lacking, but it will come. Some tweaks are needed, e.g. for rendering generic pipelines to DAGs, because the libraries have changed :-)

So it would be bad if the pipeline editor and runtime support for Airflow were removed. At the very least, allow Airflow support to be enabled optionally via a ConfigMap or an environment variable, based on this:

Background:

We plan to use both: Data Science Pipelines / Kubeflow Pipelines for pure ML development, and Airflow for ETL and data engineering tasks.

LaVLaS added the kind/enhancement (New feature or request) label on May 30, 2023
@LaVLaS
Member

LaVLaS commented Jun 14, 2023

So it would be bad if the pipeline editor and runtime support for Airflow were removed.

This statement should be clarified to show that runtime support for Airflow was not removed from the Elyra package in the ODH Elyra notebook images that are built and supported as part of ODH Core. We only restrict the Elyra PipelinesProcessor to kfp (Data Science Pipelines) since that is what ODH supports.

There is no official support for Airflow in ODH as the integration is currently an ODH Contrib component (https://github.com/opendatahub-io-contrib/airflow-on-openshift) with no guarantee that the deployment works.

At least allow for optionally enabling it via Configmap or ENV variable, based on this

Although ODH does not officially support Airflow, you can still build a custom notebook image with the Airflow pipeline processor enabled and import it into the ODH Dashboard. Based on an offline comment by @harshad16, you can build an Elyra notebook image with the Airflow pipeline processor by modifying jupyter_elyra_config.py and then building the notebook image.
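For illustration, here is a minimal sketch of the optional ConfigMap/ENV toggle discussed in this issue, written as a jupyter_elyra_config.py fragment. It assumes Elyra exposes a configurable `PipelineProcessorRegistry.runtimes` trait that lists the enabled runtime names (`kfp`, `airflow`); verify that against the Elyra version in your image. The environment variable name `ELYRA_ENABLE_AIRFLOW` is hypothetical, not an existing Elyra or ODH setting; on OpenShift it could be populated from a ConfigMap on the workbench.

```python
# jupyter_elyra_config.py (sketch, not the ODH-shipped file)
import os

# Hypothetical variable name; not an existing Elyra or ODH setting.
AIRFLOW_TOGGLE_ENV = "ELYRA_ENABLE_AIRFLOW"


def enabled_runtimes(environ=None):
    """Return runtime names: always kfp, plus airflow when the toggle is truthy."""
    if environ is None:
        environ = os.environ
    runtimes = ["kfp"]
    toggle = environ.get(AIRFLOW_TOGGLE_ENV, "").strip().lower()
    if toggle in {"1", "true", "yes"}:
        runtimes.append("airflow")
    return runtimes


# Jupyter executes config files with get_config() in scope; when this file is
# run as a plain script, get_config is absent and the assignment is skipped.
if "get_config" in globals():
    c = get_config()  # noqa: F821
    # Assumed trait name; check it against your Elyra version.
    c.PipelineProcessorRegistry.runtimes = enabled_runtimes()
```

With the variable unset, the image keeps the kfp-only restriction; setting `ELYRA_ENABLE_AIRFLOW=true` on the workbench adds the Airflow runtime without rebuilding the image.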

If the Elyra Airflow notebook image works with the airflow-on-openshift deployment in odh-contrib, you could submit a PR for review to odh-contrib/workbench-images.

@shalberd
Author

@LaVLaS @harshad16 Airflow itself is no problem. For now, I have started talking to the Red Hat folks about what it takes to run it successfully as a whole (never use the mucked-up Postgres image that comes with it; use a decent way of running Postgres, such as Crunchy Postgres via OLM), among other things.

opendatahub-io-contrib/airflow-on-openshift#7 (comment)
