Migrate kedro-airflow to static metadata #172

Merged
astrojuanlu merged 5 commits into main from dev/kedro-airflow-static-metadata
Apr 18, 2023

Member

@astrojuanlu astrojuanlu commented Apr 17, 2023

Description

See kedro-org/kedro#2334.

Also, I saw kedro-datasets was missing explicit build-system, so I added it (I think pip has a fallback for cases like this, but better not to rely on it).

Development notes

I merged requirements.txt, setup.cfg and setup.py into a single pyproject.toml, yay!

Differences in RECORD (what goes into the wheel):

--- dist.old/kedro_airflow-0.5.1.dist-info/RECORD       2023-04-17 13:49:26
+++ dist.new/kedro_airflow-0.5.1.dist-info/RECORD       2023-04-17 13:54:00
@@ -1,7 +1,7 @@
 kedro_airflow/__init__.py,sha256=rPTM7Yfk6BHZqBvDv8iFF6DXQ0kNAw24pIHIFdzOZ4w,78
 kedro_airflow/airflow_dag_template.j2,sha256=1hF1L8DF1jUC1QnR6WIzIAkQBHr70FUebVeXQX6pW6s,2565
 kedro_airflow/plugin.py,sha256=Sh7w91mjhstzv5L5g-dTKqeBGTDx8_gMk8pGY7MRi-8,3018
-kedro_airflow-0.5.1.dist-info/METADATA,sha256=LFllLrI94LBNdMXtbAYq1Rdh9vLY40Iy3obsjdEfZ3U,3695
+kedro_airflow-0.5.1.dist-info/METADATA,sha256=2UDgUC5yMq6Qqk1VbGGzXN3XP2JqHKQEbMHjraUIRHc,3882
 kedro_airflow-0.5.1.dist-info/WHEEL,sha256=pkctZYzUS4AYVn6dJ-7367OJZivF2e8RA9b_ZBjif18,92
 kedro_airflow-0.5.1.dist-info/entry_points.txt,sha256=AAmoJsZRgb4tG0WeptLbvvjjmbEDF_1JcsxQVKypT20,65
 kedro_airflow-0.5.1.dist-info/top_level.txt,sha256=qnSLh-V5c3yVXYcKflrXSLyrWkeGZKEMSgLz3Nq2_2c,14
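
A comparison like the one above can be reproduced with the standard library alone. The following is a minimal sketch, not the procedure actually used for this PR: the helper names `read_record` and `diff_records` are hypothetical, and it assumes each wheel contains a single `*.dist-info/RECORD` entry.

```python
import difflib
import zipfile


def read_record(wheel_path):
    """Return the lines of the RECORD file stored inside a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        # Assumption: the wheel holds one *.dist-info/RECORD entry.
        record_name = next(
            name for name in wheel.namelist() if name.endswith(".dist-info/RECORD")
        )
        return wheel.read(record_name).decode("utf-8").splitlines()


def diff_records(old_wheel, new_wheel):
    """Return a unified diff (as a list of lines) of two wheels' RECORD files."""
    return list(
        difflib.unified_diff(
            read_record(old_wheel),
            read_record(new_wheel),
            fromfile=str(old_wheel),
            tofile=str(new_wheel),
            lineterm="",
        )
    )
```

Calling `diff_records` on the wheels built before and after the migration should show only the entries whose hash or size changed, which is a quick way to confirm that no file was dropped from the package.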

Differences in METADATA:

--- dist.old/kedro_airflow-0.5.1.dist-info/METADATA     2023-04-17 13:49:26
+++ dist.new/kedro_airflow-0.5.1.dist-info/METADATA     2023-04-17 13:48:06
@@ -2,10 +2,12 @@
 Name: kedro-airflow
 Version: 0.5.1
 Summary: Kedro-Airflow makes it easy to deploy Kedro projects to Airflow
-Home-page: https://github.com/kedro-org/kedro-plugins/tree/main/kedro-airflow
 Author: Kedro
 License: Apache Software License (Apache 2.0)
-Requires-Python: >=3.7, <3.11
+Project-URL: Source, https://github.com/kedro-org/kedro-plugins/tree/main/kedro-datasets
+Project-URL: Documentation, https://github.com/kedro-org/kedro-plugins/blob/main/kedro-airflow/README.md
+Project-URL: Tracker, https://github.com/kedro-org/kedro-plugins/issues
+Requires-Python: <3.11,>=3.7
 Description-Content-Type: text/markdown
 Requires-Dist: kedro (>=0.17.5)
 Requires-Dist: python-slugify (>=4.0)
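
The `Requires-Python` change in this diff is only a reordering of the same two clauses. A small sketch of how that could be checked: METADATA uses an RFC 822-style header format, so the standard `email.parser` module can read it. The helper names and the shortened `OLD`/`NEW` samples below are illustrative, not taken from the repository.

```python
from email.parser import Parser


def parse_metadata(text):
    """Parse RFC 822-style METADATA text into a message object."""
    return Parser().parsestr(text)


def normalized_requires_python(metadata):
    """Return the Requires-Python clauses as an order-independent set."""
    raw = metadata.get("Requires-Python", "")
    return frozenset(clause.strip() for clause in raw.split(",") if clause.strip())


def project_urls(metadata):
    """Map each Project-URL label to its URL."""
    urls = {}
    for entry in metadata.get_all("Project-URL", []):
        label, _, url = entry.partition(",")
        urls[label.strip()] = url.strip()
    return urls


# Shortened samples based on the diff above (illustration only).
OLD = """\
Name: kedro-airflow
Version: 0.5.1
Requires-Python: >=3.7, <3.11
"""
NEW = """\
Name: kedro-airflow
Version: 0.5.1
Project-URL: Tracker, https://github.com/kedro-org/kedro-plugins/issues
Requires-Python: <3.11,>=3.7
"""

old_meta, new_meta = parse_metadata(OLD), parse_metadata(NEW)
# The clause order differs, but the set of constraints is identical.
same_python = normalized_requires_python(old_meta) == normalized_requires_python(new_meta)
```

The same `project_urls` helper can be pointed at the full METADATA file to review each `Project-URL` entry after the migration.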

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

@astrojuanlu
Member Author

Ooops, CI is using requirements.txt. Will have a look.

astrojuanlu and others added 4 commits April 17, 2023 16:29
See kedro-org/kedro#2334.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@astrojuanlu astrojuanlu force-pushed the dev/kedro-airflow-static-metadata branch from 98c03db to 80237bc Compare April 17, 2023 14:34
@astrojuanlu astrojuanlu changed the title Migrate kedro-airflow to static metadata Migrate kedro-airflow to static metadata Apr 17, 2023
Member

@merelcht merelcht left a comment

Might be good to add this to the release notes for our own records.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Contributor

@antonymilne antonymilne left a comment

This is heroic work, thank you for doing it!

I have the same thing to double check about the entrypoints as in #174 (review), but LGTM 👍
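
One way to double-check the entry points is to read `entry_points.txt` from the built wheel and compare it with what the old `setup.py` declared. The sketch below parses that file's INI-like format with `configparser`. The `EXAMPLE` content (group `kedro.project_commands`, target `kedro_airflow.plugin:commands`) is an assumption for illustration and should be checked against the real file; `parse_entry_points` is a hypothetical helper name.

```python
import configparser


def parse_entry_points(text):
    """Return {group: {name: target}} from entry_points.txt content."""
    # Only "=" separates names from targets; ":" is part of the target.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(text)
    return {group: dict(parser.items(group)) for group in parser.sections()}


# Assumed content, for illustration only; check the real file in the wheel.
EXAMPLE = """\
[kedro.project_commands]
airflow = kedro_airflow.plugin:commands
"""

entry_points = parse_entry_points(EXAMPLE)
```

If the dictionary obtained from the old and new wheels is identical, the plugin registration was not affected by the move to `pyproject.toml`.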

Contributor

I presume there's no new [tool.setuptools] configuration that could replace this?

Contributor

I guess not given that you introduced a MANIFEST.in in #173...

Member Author

MANIFEST.in controls what goes into the source distribution (https://packaging.python.org/en/latest/guides/using-manifest-in/), and setuptools then uses that by virtue of include-package-data = True (https://setuptools.pypa.io/en/latest/userguide/datafiles.html).

However, the packaging.python.org docs are a bit outdated, and I'm not sure how applicable they are in a post-PEP 621 world. The official Python docs are not much better: https://docs.python.org/3/distutils/sourcedist.html?highlight=manifest

To be honest, I'm not 100% sure whether MANIFEST.in can be dropped completely in favor of setuptools package_data. I'd need to investigate a bit, and unfortunately I don't have much time for that at the moment (these simple PRs, on the other hand, took mere minutes to put together). It's an excellent question, but I'd prefer to explore it in a separate PR.
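
Whichever mechanism ends up selecting the data files, the outcome can be checked on the built wheel itself. A minimal sketch, assuming a wheel path and a top-level package name are known; `packaged_data_files` is a hypothetical helper, not part of setuptools.

```python
import zipfile


def packaged_data_files(wheel_path, package):
    """List the non-Python files that a wheel ships inside `package`."""
    prefix = package + "/"
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name
            for name in wheel.namelist()
            # Skip Python sources and directory entries.
            if name.startswith(prefix) and not name.endswith((".py", "/"))
        )
```

For kedro-airflow, the result would be expected to include `kedro_airflow/airflow_dag_template.j2`, which the RECORD listing in the PR description shows as part of the wheel.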

@astrojuanlu astrojuanlu merged commit 322894f into main Apr 18, 2023
@astrojuanlu astrojuanlu deleted the dev/kedro-airflow-static-metadata branch April 18, 2023 11:25
tingtingQB pushed a commit to tingtingQB/kedro-plugins that referenced this pull request May 1, 2023
* Migrate kedro-airflow to static metadata

See kedro-org/kedro#2334.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add explicit PEP 518 build requirements for kedro-datasets

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Typos

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Remove dangling reference to requirements.txt

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add release notes

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>
dannyrfar pushed a commit to dannyrfar/kedro-plugins that referenced this pull request May 3, 2023
* Migrate kedro-airflow to static metadata

See kedro-org/kedro#2334.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add explicit PEP 518 build requirements for kedro-datasets

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Typos

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Remove dangling reference to requirements.txt

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add release notes

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
McDonnellJoseph pushed a commit to McDonnellJoseph/kedro-plugins that referenced this pull request May 11, 2023
* Migrate kedro-airflow to static metadata

See kedro-org/kedro#2334.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add explicit PEP 518 build requirements for kedro-datasets

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Typos

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Remove dangling reference to requirements.txt

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add release notes

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>
noklam added a commit that referenced this pull request May 31, 2023
* Fix links on GitHub issue templates (#150)

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* add spark_stream_dataset.py

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* Migrate most of `kedro-datasets` metadata to `pyproject.toml` (#161)

* Include missing requirements files in sdist

Fix gh-86.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Migrate most project metadata to `pyproject.toml`

See kedro-org/kedro#2334.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Move requirements to `pyproject.toml`

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* restructure the stream dataset to align with the other spark dataset

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* adding README.md for specification

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* Update kedro-datasets/kedro_datasets/spark/spark_stream_dataset.py

Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* rename the dataset

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* resolve comments

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* fix format and pylint

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* Update kedro-datasets/kedro_datasets/spark/README.md

Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* add unit tests and SparkStreamingDataset in init.py

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* add unit tests

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* update test_save

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* Upgrade Polars (#171)

* Upgrade Polars

Signed-off-by: Juan Luis Cano Rodríguez <hello@juanlu.space>

* Update Polars to 0.17.x

---------

Signed-off-by: Juan Luis Cano Rodríguez <hello@juanlu.space>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* if release is failed, it return exit code and fail the CI (#158)

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* Migrate `kedro-airflow` to static metadata (#172)

* Migrate kedro-airflow to static metadata

See kedro-org/kedro#2334.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add explicit PEP 518 build requirements for kedro-datasets

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Typos

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Remove dangling reference to requirements.txt

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add release notes

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* Migrate `kedro-telemetry` to static metadata (#174)

* Migrate kedro-telemetry to static metadata

See kedro-org/kedro#2334.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add release notes

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* ci: port lint, unit test, and e2e tests to Actions (#155)

* Add unit test + lint test on GA

* trigger GA - will revert

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Fix lint

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add end to end tests

* Add cache key

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add cache action

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Rename workflow files

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Lint + add comment + default bash

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add windows test

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Update workflow name + revert changes to READMEs

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add kedro-telemetry/RELEASE.md to trufflehog ignore

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Add pytables to test_requirements remove from workflow

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Revert "Add pytables to test_requirements remove from workflow"

This reverts commit 8203daa.

* Separate pip freeze step

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* Migrate `kedro-docker` to static metadata (#173)

* Migrate kedro-docker to static metadata

See kedro-org/kedro#2334.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Address packaging warning

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Fix tests

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Actually install current plugin with dependencies

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add release notes

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* Introducing .gitpod.yml to kedro-plugins (#185)

Currently, opening Gitpod installs Python 3.11, which breaks everything because we don't support it yet. This PR introduces a simple .gitpod.yml to get started.

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* sync APIDataSet  from kedro's `develop` (#184)

* Update APIDataSet

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Sync ParquetDataSet

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Sync Test

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Linting

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Revert Unnecessary ParquetDataSet Changes

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Sync release notes

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

---------

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* formatting

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* formatting

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* formatting

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* formatting

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* add spark_stream_dataset.py

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* restructure the stream dataset to align with the other spark dataset

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* adding README.md for specification

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* Update kedro-datasets/kedro_datasets/spark/spark_stream_dataset.py

Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* rename the dataset

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* resolve comments

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* fix format and pylint

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* Update kedro-datasets/kedro_datasets/spark/README.md

Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* add unit tests and SparkStreamingDataset in init.py

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* add unit tests

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* update test_save

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* formatting

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* formatting

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* formatting

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* formatting

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* lint

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* lint

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* lint

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* update test cases

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* add negative test

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* remove code snippets for testing

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* lint

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* update tests

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* update test and remove redundancy

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* linting

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* refactor file format

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* fix read me file

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* docs: Add community contributions (#199)

* Add community contributions

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Use newer link to docs

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* adding test for raise error

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* update test and remove redundancy

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* linting

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* refactor file format

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* fix read me file

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* adding test for raise error

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* fix readme file

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* fix readme

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* fix conflicts

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* fix ci errors

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* fix lint issue

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* update class documentation

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* add additional test cases

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* add s3 read test cases

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* add s3 read test cases

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* add s3 read test case

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* test s3 read

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* remove redundant test cases

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* fix streaming dataset configurations

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* update streaming datasets doc

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* resolve comments re documentation

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* bugfix lint

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* update link

Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>

* revert the changes on CI

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* test(docker): remove outdated logging-related step (#207)

* fix kedro-docker e2e test

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* fix: add timeout to request to satisfy bandit lint

---------

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* ci: ensure plugin requirements get installed in CI (#208)

* ci: install the plugin alongside test requirements

* ci: install the plugin alongside test requirements

* Update kedro-airflow.yml

* Update kedro-datasets.yml

* Update kedro-docker.yml

* Update kedro-telemetry.yml

* Update kedro-airflow.yml

* Update kedro-datasets.yml

* Update kedro-airflow.yml

* Update kedro-docker.yml

* Update kedro-telemetry.yml

* ci(telemetry): update isort config to correct sort

* Don't use profile ¯\_(ツ)_/¯

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* chore(datasets): remove empty `tool.black` section

* chore(docker): remove empty `tool.black` section

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* ci: Migrate the release workflow from CircleCI to GitHub Actions (#203)

* Create check-release.yml

* change from test pypi to pypi

* split into jobs and move version logic into script

* update github actions output

* lint

* changes based on review

* changes based on review

* fix script to not append continuously

* change pypi api token logic

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* build: Relax Kedro bound for `kedro-datasets` (#140)

* Less strict pin on Kedro for datasets

Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* ci: don't run checks on both `push`/`pull_request` (#192)

* ci: don't run checks on both `push`/`pull_request`

* ci: don't run checks on both `push`/`pull_request`

* ci: don't run checks on both `push`/`pull_request`

* ci: don't run checks on both `push`/`pull_request`

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* chore: delete extra space ending check-release.yml (#210)

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* ci: Create merge-gatekeeper.yml to make sure PR only merged when all tests checked. (#215)

* Create merge-gatekeeper.yml

* Update .github/workflows/merge-gatekeeper.yml

---------

Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* ci: Remove the CircleCI setup (#209)

* remove circleci setup files and utils

* remove circleci configs in kedro-telemetry

* remove redundant .github in kedro-telemetry

* Delete continue_config.yml

* Update check-release.yml

* lint

* increase timeout to 40 mins for docker e2e tests

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* feat: Dataset API add `save` method (#180)

* [FEAT] add save method to APIDataset

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [ENH] create save_args parameter for api_dataset

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [ENH] add tests for socket + http errors

Signed-off-by: <jmcdonnell@fieldbox.ai>
Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [ENH] check save data is json

Signed-off-by: <jmcdonnell@fieldbox.ai>
Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [FIX] clean code

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [ENH] handle different data types

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [FIX] test coverage for exceptions

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [ENH] add examples in APIDataSet docstring

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* sync APIDataSet  from kedro's `develop` (#184)

* Update APIDataSet

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Sync ParquetDataSet

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Sync Test

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Linting

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Revert Unnecessary ParquetDataSet Changes

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Sync release notes

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

---------

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [FIX] remove support for delete method

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [FIX] lint files

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [FIX] fix conflicts

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [FIX] remove fail save test

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [ENH] review suggestions

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [ENH] fix tests

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

* [FIX] reorder arguments

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>

---------

Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>
Signed-off-by: <jmcdonnell@fieldbox.ai>
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Co-authored-by: jmcdonnell <jmcdonnell@fieldbox.ai>
Co-authored-by: Nok Lam Chan <mediumnok@gmail.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* ci: Automatically extract release notes for GitHub Releases (#212)

* ci: Automatically extract release notes

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* fix lint

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Raise exceptions

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Lint

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Lint

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* feat: Add metadata attribute to datasets (#189)

* Add metadata attribute to all datasets

Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* feat: Add ManagedTableDataset for managed Delta Lake tables in Databricks (#206)

* committing first version of UnityTableCatalog with unit tests. This dataset allows users to interface with Unity catalog tables in Databricks to both read and write.

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* renaming dataset

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* adding mlflow connectors

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* fixing mlflow imports

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* cleaned up mlflow for initial release

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* cleaned up mlflow references from setup.py for initial release

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* fixed deps in setup.py

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* adding comments before initial PR

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* moved validation to dataclass

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* bug fix in type of partition column and cleanup

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* updated docstring for ManagedTableDataSet

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* added backticks to catalog

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* fixing regex to allow hyphens

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/test_requirements.txt

Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* adding backticks to catalog

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Require pandas < 2.0 for compatibility with spark < 3.4

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Replace use of walrus operator

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add test coverage for validation methods

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Remove unused versioning functions

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Fix exception catching for invalid schema, add test for invalid schema

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add pylint ignore

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add tests/databricks to ignore for no-spark tests

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Nok Lam Chan <mediumnok@gmail.com>

* Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py

Co-authored-by: Nok Lam Chan <mediumnok@gmail.com>

* Remove spurious mlflow test dependency

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add explicit check for database existence

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Remove character limit for table names

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Refactor validation steps in ManagedTable

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Remove spurious checks for table and schema name existence

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

---------

Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Co-authored-by: Danny Farah <danny.farah@quantumblack.com>
Co-authored-by: Danny Farah <danny_farah@mckinsey.com>
Co-authored-by: Nok Lam Chan <mediumnok@gmail.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* docs: Update APIDataset docs and refactor (#217)

* Update APIDataset docs and refactor

* Acknowledge community contributor

* Fix more broken doc

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>

* Lint

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Fix release notes of upcoming kedro-datasets

---------

Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* feat: Release `kedro-datasets` version `1.3.0` (#219)

* Modify release version and RELEASE.md

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Add proper name for ManagedTableDataSet

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

* Update kedro-datasets/RELEASE.md

Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Revert lost semicolon for release 1.2.0

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>

---------

Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* docs: Fix APIDataSet docstring (#220)

* Fix APIDataSet docstring

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Add release notes

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

* Separate [docs] extras from [all] in kedro-datasets

Fix gh-143.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* Update kedro-datasets/tests/spark/test_spark_streaming_dataset.py

Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* Update kedro-datasets/kedro_datasets/spark/spark_streaming_dataset.py

Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* Update kedro-datasets/setup.py

Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

* fix linting issue

Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>

---------

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Signed-off-by: Tingting_Wan <tingting_wan@mckinsey.com>
Signed-off-by: Juan Luis Cano Rodríguez <hello@juanlu.space>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Nok Chan <nok.lam.chan@quantumblack.com>
Signed-off-by: Tom Kurian <tom_kurian@mckinsey.com>
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
Signed-off-by: jmcdonnell <jmcdonnell@fieldbox.ai>
Signed-off-by: <jmcdonnell@fieldbox.ai>
Signed-off-by: Ahdra Merali <ahdra.merali@quantumblack.com>
Signed-off-by: Danny Farah <danny_farah@mckinsey.com>
Signed-off-by: Jannic Holzer <jannic.holzer@quantumblack.com>
Co-authored-by: Juan Luis Cano Rodríguez <hello@juanlu.space>
Co-authored-by: Tingting Wan <110382691+Tingting711@users.noreply.github.com>
Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Nok Lam Chan <mediumnok@gmail.com>
Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Co-authored-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
Co-authored-by: Tom Kurian <tom_kurian@mckinsey.com>
Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com>
Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Co-authored-by: McDonnellJoseph <90898184+McDonnellJoseph@users.noreply.github.com>
Co-authored-by: jmcdonnell <jmcdonnell@fieldbox.ai>
Co-authored-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>
Co-authored-by: Jannic <37243923+jmholzer@users.noreply.github.com>
Co-authored-by: Danny Farah <danny.farah@quantumblack.com>
Co-authored-by: Danny Farah <danny_farah@mckinsey.com>
Co-authored-by: kuriantom369 <116743025+kuriantom369@users.noreply.github.com>