New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add airflow package catalog connector #2437
Conversation
Thanks for making a pull request to Elyra! To try out this branch on binder, follow this link: |
This PR replaces #2409, which had issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some NITs below, but otherwise this is looking good and working for me!
That being said, I'm also wondering what would be the best way to handle adding the relevant strings to the available_airflow_operators
configurable trait for both the Airflow packages and the provider packages. (As a reminder, we use that list to render the appropriate import
statement in the DAG template during processing).
I'll start looking into some options. @kevin-bates might have some ideas here as well
...age-catalog-connector/airflow_package_catalog_connector/airflow_package_catalog_connector.py
Outdated
Show resolved
Hide resolved
...age-catalog-connector/airflow_package_catalog_connector/airflow_package_catalog_connector.py
Outdated
Show resolved
Hide resolved
...age-catalog-connector/airflow_package_catalog_connector/airflow_package_catalog_connector.py
Outdated
Show resolved
Hide resolved
...age-catalog-connector/airflow_package_catalog_connector/airflow_package_catalog_connector.py
Outdated
Show resolved
Hide resolved
catalog-connectors/airflow/airflow-package-catalog-connector/README.md
Outdated
Show resolved
Hide resolved
...age-catalog-connector/airflow_package_catalog_connector/airflow_package_catalog_connector.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working well for me! I think the only thing left is to remove the catalog-connectors
directory and it's files
elyra/pipeline/airflow/package_catalog_connector/airflow-package-catalog.json
Outdated
Show resolved
Hide resolved
{ | ||
"$schema": "https://raw.githubusercontent.com/elyra-ai/elyra/master/elyra/metadata/schemas/meta-schema.json", | ||
"$id": "https://raw.githubusercontent.com/elyra-ai/elyra/master/catalog-connectors/airflow/airflow-package-catalog-connector/airflow-package-catalog.json", | ||
"title": "Apache Airflow package operator catalog", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above comment! Though I guess this won't really apply until we have the related documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a README in the directory. Can you please take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, just had the comment regarding the utility of 'package' in all the naming in an attempt to perhaps shorten things. If the comment is accepted, it does affect naming in several other locations of this PR.
This PR adds a catalog connector for Apache Airflow packages. Connector instances (of which there should typically be only one) require the user to configure a download URL for the Apache Airflow package that is used in the cluster.
Requires #2418
What changes were proposed in this pull request?
catalog-connectors
directory to the repository root, containing in theairflow
subdirectory the newly introduced connectorMakefile
to include newlint-connectors
target, which was also added as a dependency to thelint
tasklint-connectors
task to Github'sbuild.yaml
extras
install dependency insetup.py
(The connector is also installed if one runspip install elyra[all]
)Notes:
The connector is not included in the Elyra release process and needs to be published independently, as necessary.
The connector declares file name (e.g.
operators/bash_operator.py
) as hash keys, which are used to internally identify operators in the palette. The archive name, e.g.apache_airflow-1.10.15-py2.py3-none-any.whl
, is currently not part of the key to avoid potential versioning issues. For example, assume user A adds operators from archiveapache_airflow-1.10.15-py2.py3-none-any.whl
to the Elyra deployment and creates a pipeline using some of the operators. User B adds operators from an older archive, such asapache_airflow-1.10.12-py2.py3-none-any.whl
. If we were to include the archive name as is as a key, user B would not be able to run pipelines that user A created (and vice versa) because (pseudo code)We need to decide whether the implemented behavior is sufficient (archive version numbers are completely ignored, even though this might lead to issues if the loaded operator signatures in Elyra's pipeline editor are significantly different from those of the operators that are installed in the Airflow cluster) or if semver support is required. The latter could be accomplished by using only parts of the archive name as key, e.g. by omitting/masking minor and patch version numbers. It does require though that archive names follow a constant naming pattern to allow for the extraction of version strings.
How was this pull request tested?
Notes:
Developer's Certificate of Origin 1.1