feat: Add ManagedTableDataset for managed Delta Lake tables in Databricks #206

Merged 45 commits on May 22, 2023
Changes from 33 commits
d1bc9ab
committing first version of UnityTableCatalog with unit tests. This d…
dannyrfar Feb 10, 2023
798055e
renaming dataset
dannyrfar Feb 14, 2023
f2ea255
adding mlflow connectors
dannyrfar Feb 23, 2023
9bb88c2
fixing mlflow imports
dannyrfar Feb 23, 2023
20d20b5
cleaned up mlflow for initial release
dannyrfar Mar 8, 2023
d6bc149
cleaned up mlflow references from setup.py for initial release
dannyrfar Mar 8, 2023
aee12a2
fixed deps in setup.py
dannyrfar Mar 8, 2023
911e53f
adding comments before intiial PR
dannyrfar Mar 13, 2023
10932fb
moved validation to dataclass
dannyrfar Mar 14, 2023
74471a8
bug fix in type of partition column and cleanup
dannyrfar Mar 21, 2023
4022f0d
updated docstring for ManagedTableDataSet
dannyrfar Mar 21, 2023
f6531e1
added backticks to catalog
dannyrfar Apr 5, 2023
3ed18a1
fixing regex to allow hyphens
dannyrfar Apr 11, 2023
a149b4d
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
dannyrfar May 3, 2023
c854a64
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
dannyrfar May 3, 2023
c994c3f
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
dannyrfar May 3, 2023
09bf847
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
dannyrfar May 3, 2023
b7e8cff
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
dannyrfar May 3, 2023
e7b8e40
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
dannyrfar May 3, 2023
31a0c73
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
dannyrfar May 3, 2023
83704b4
Update kedro-datasets/test_requirements.txt
dannyrfar May 3, 2023
1bf1e29
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
dannyrfar May 3, 2023
9e391ee
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
dannyrfar May 3, 2023
651e379
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
dannyrfar May 3, 2023
b267616
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
dannyrfar May 3, 2023
f8f9786
adding backticks to catalog
dannyrfar May 3, 2023
57248ea
Require pandas < 2.0 for compatibility with spark < 3.4
jmholzer May 4, 2023
944009a
Replace use of walrus operator
jmholzer May 4, 2023
25a293e
Add test coverage for validation methods
jmholzer May 4, 2023
3d6b682
Remove unused versioning functions
jmholzer May 4, 2023
b37a198
Fix exception catching for invalid schema, add test for invalid schema
jmholzer May 5, 2023
952cf3d
Add pylint ignore
jmholzer May 5, 2023
743816e
Add tests/databricks to ignore for no-spark tests
jmholzer May 12, 2023
0a160a5
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
jmholzer May 17, 2023
daf5411
Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py
jmholzer May 17, 2023
c1e78cd
Remove spurious mlflow test dependency
jmholzer May 18, 2023
5fe9fec
Merge branch 'main' into feat/add-managed-delta-table-dataset
jmholzer May 19, 2023
4c830bf
Merge branch 'main' into feat/add-managed-delta-table-dataset
jmholzer May 22, 2023
dbfd641
Add explicit check for database existence
jmholzer May 22, 2023
5ea0d66
Remove character limit for table names
jmholzer May 22, 2023
c2fd478
Refactor validation steps in ManagedTable
jmholzer May 22, 2023
e228164
Merge branch 'feat/add-managed-delta-table-dataset' of github.com:ked…
jmholzer May 22, 2023
47e2a18
Merge branch 'main' into feat/add-managed-delta-table-dataset
jmholzer May 22, 2023
7e52e9c
Remove spurious checks for table and schema name existence
jmholzer May 22, 2023
f03bbe3
Merge branch 'feat/add-managed-delta-table-dataset' of github.com:ked…
jmholzer May 22, 2023
4 changes: 2 additions & 2 deletions Makefile
@@ -52,10 +52,10 @@ sign-off:

 # kedro-datasets related only
 test-no-spark:
-	cd kedro-datasets && pytest tests --no-cov --ignore tests/spark --numprocesses 4 --dist loadfile
+	cd kedro-datasets && pytest tests --no-cov --ignore tests/spark --ignore tests/databricks --numprocesses 4 --dist loadfile

 test-no-spark-sequential:
-	cd kedro-datasets && pytest tests --no-cov --ignore tests/spark
+	cd kedro-datasets && pytest tests --no-cov --ignore tests/spark --ignore tests/databricks

 # kedro-datasets/snowflake tests skipped from default scope
 test-snowflake-only:
3 changes: 3 additions & 0 deletions kedro-datasets/.gitignore
@@ -145,3 +145,6 @@ kedro.db
 kedro/html
 docs/tmp-build-artifacts
 docs/build
+spark-warehouse
+metastore_db/
+derby.log
8 changes: 8 additions & 0 deletions kedro-datasets/kedro_datasets/databricks/__init__.py
@@ -0,0 +1,8 @@
+"""Provides interface to Unity Catalog Tables."""
+
+__all__ = ["ManagedTableDataSet"]
+
+from contextlib import suppress
+
+with suppress(ImportError):
+    from .managed_table_dataset import ManagedTableDataSet