Schema overhaul #785
Merged
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
pingsutw requested review from eapolinario, kumare3 and wild-endeavor as code owners on December 20, 2021 08:22
* [pr into #785] Turn structured dataset into dataclass
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* Inspect class
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* Inspect class
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* Resolved conflict
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* Resolved conflict
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* Resolved conflict
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* update
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* tests
  Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
* nit
  Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
* improve printing
  Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
* Fixed tests and lint
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* Fixed tests
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* Fixed tests
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* Pandas read local dir instead of file
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* Remove duplicate code
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* support FlyteSchema -> StructuredDataset
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* Remove inspect.isclass
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* Fix tests
  Signed-off-by: Kevin Su <pingsutw@apache.org>
* Fix lint
  Signed-off-by: Kevin Su <pingsutw@apache.org>

Co-authored-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
pingsutw commented on Jan 11, 2022
pingsutw commented on Jan 11, 2022
wild-endeavor previously approved these changes on Jan 12, 2022
wild-endeavor approved these changes on Jan 13, 2022
This was referenced Jan 14, 2022
eapolinario pushed a commit that referenced this pull request on Jan 28, 2022
Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
kennyworkman pushed a commit to kennyworkman/flytekit that referenced this pull request on Feb 8, 2022
Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net>
kennyworkman pushed a commit to kennyworkman/flytekit that referenced this pull request on Feb 8, 2022
Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
maximsmol pushed a commit to maximsmol/flytekit that referenced this pull request on Mar 8, 2022
Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
eapolinario added a commit that referenced this pull request on Mar 21, 2022
* Add support union type Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed test Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed test Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Updated tests Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Update Union to use tagged Unions Signed-off-by: maximsmol <maximsmol@gmail.com> * Update to use string tags (part 1) Signed-off-by: maximsmol <maximsmol@gmail.com> * Working implementation, update tests Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixes, more tests Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix incorrect unwrapped literal-union matching, update test Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix duplicate tag handling, add tests for collections containing unions Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix type hint test Signed-off-by: maximsmol <maximsmol@gmail.com> * Add implicit wrapping union type tests Signed-off-by: maximsmol <maximsmol@gmail.com> * Add union ambiguity tests Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixup tests, make TypeTransformerFailed compatible with all old exception types Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixup models + add tests Signed-off-by: maximsmol <maximsmol@gmail.com> * Implement changed design Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix PR comments Signed-off-by: maximsmol <maximsmol@gmail.com> * Remote entrypoint serialize (#733) Signed-off-by: Emirhan Karagül <emirhan350z@gmail.com> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix lint 
error in remote.py (#755) Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Support enum in dataclass (#753) * Add support enum in dataclass Signed-off-by: Kevin Su <pingsutw@apache.org> * Update test Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * Fix subworkflow and launch plan FlyteRemote behavior (#751) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Extras: Shell task (#747) Signed-off-by: maximsmol <maximsmol@gmail.com> * Add support FlyteSchema in dataclass (#722) * schema in dataclass Signed-off-by: Kevin Su <pingsutw@apache.org> * Added tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * updated Signed-off-by: Kevin Su <pingsutw@apache.org> * updated Signed-off-by: Kevin Su <pingsutw@apache.org> * updated Signed-off-by: Kevin Su <pingsutw@apache.org> * Remove workflow_execution.py (#758) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Get raw input/output from remote execution (#675) * [wip] for feast demo Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * clean up a bit Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * add a test and move where constructor is called Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * remove unneeded import Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * add a part of a test Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Added tests Signed-off-by: Kevin Su 
<pingsutw@apache.org> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * typo Signed-off-by: Kevin Su <pingsutw@apache.org> Co-authored-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Co-authored-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix mypy errors in flytekit/types (#757) Signed-off-by: Lisa <aeioulisa@gmail.com> Signed-off-by: Kevin Su <pingsutw@apache.org> * Remote client failed to fetch FlytePickle object (#764) * Fetch pickle value from flytekit remote Signed-off-by: Kevin Su <pingsutw@apache.org> * Fix tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Remove default value Signed-off-by: Kevin Su <pingsutw@apache.org> * Add support FlyteFile in dataclass (#725) * Add support Flyte File and directory in dataclass Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * add task_resolver arg to @task decorator (#765) Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Copy metadata into map task from underlying (#766) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Support for delayed annotations (#760) Signed-off-by: Stefan Nelson-Lindall <stef@stripe.com> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Complex dataclass unit tests (#773) Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * remote_source lost on serialization of @dataclass_json with FlyteFile (#774) * remote_source lost on serialization of 
@dataclass_json with FlyteFile Signed-off-by: Kevin Su <pingsutw@apache.org> * updated tests Signed-off-by: Kevin Su <pingsutw@apache.org> * updated tests Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Single-task execution FlyteRemote sync (#778) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Logging updates (#775) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: doc-requirements.txt to reduce vulnerabilities (#779) The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-LXML-2316995 Signed-off-by: maximsmol <maximsmol@gmail.com> * Add cache_serialize parameter to tasks (#673) * added cache_serialize parameter for basic tasks Signed-off-by: Daniel Rammer <daniel@union.ai> * fixed typo Signed-off-by: Daniel Rammer <daniel@union.ai> * updated flyteidl version Signed-off-by: Daniel Rammer <daniel@union.ai> * changed flyteidl requirements everywhere Signed-off-by: Daniel Rammer <daniel@union.ai> * remove flyteidl version requires in setup.py so we can use develop Signed-off-by: Daniel Rammer <daniel@union.ai> * removed flyteidl git repos from a variety of requirements packages Signed-off-by: Daniel Rammer <daniel@union.ai> * updated variable discovery_serializable to cache_serializable Signed-off-by: Daniel Rammer <daniel@union.ai> * updated requirements Signed-off-by: Daniel Rammer <daniel@union.ai> * fixed TaskMetadata _cache_serializable variable name Signed-off-by: Daniel Rammer <daniel@union.ai> * propgating cache_serialize parameter through to tasks Signed-off-by: Daniel Rammer <daniel@union.ai> * added cache_serializable to SdkRawContainerTask Signed-off-by: Daniel Rammer <daniel@union.ai> * fixing cache_serializable variable propogation issues Signed-off-by: Daniel Rammer <daniel@union.ai> * added documentation 
Signed-off-by: Daniel Rammer <daniel@union.ai> * added unit tests for cache_serialize metadata Signed-off-by: Daniel Rammer <daniel@union.ai> * linter added spaces in unit tests Signed-off-by: Daniel Rammer <daniel@union.ai> Signed-off-by: maximsmol <maximsmol@gmail.com> * When using the `task` and `workflow` decorator, correctly wrap the fu… (#780) * When using the `task` and `workflow` decorator, correctly wrap the function This enables tooling such as docstring search tools to unwrap the object and show the correct docstring. Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> * Remove blackshark copyright header Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> * Fix broken great expectations test Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> * Add test for stacked decorators Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> Co-authored-by: Bernhard Stadlbauer <bstadlbauer@gmx.net> Co-authored-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add option to flyte-cli for specifying root certificate (#783) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add validation check to cacert switch (#787) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Remove pytz constraint (#786) * Remove pytz constraint from setup.py Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerate requirements files Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Put pytz back Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * make requirements.txt Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Lint: remove f-string misuse 
(#788) Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Pyarrow greater than 4.0.0 (#790) * pyarrow>=4.0.0 Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerate requirements. Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Support python 3.10 (#791) * [wip] Support python 3.10 Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add Dockerfile.py310 Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Stringify python version Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Skip flytekit-modin plugin tests on 3.10 Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add 3.9 and 3.10 to list of supported version in plugins Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Comment why flytekit-modin is not running on 3.10 and disable fail-fast in plugin tests Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * add `with_overrides` to map task (#794) * add with_overrides Signed-off-by: Samhita Alla <aallasamhita@gmail.com> * remove Resources Signed-off-by: Samhita Alla <aallasamhita@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * bump docsearch version (#805) Signed-off-by: Samhita Alla <aallasamhita@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * update docs for new navbar theme (#806) Signed-off-by: maximsmol <maximsmol@gmail.com> * fix requirment.txt github issue (#810) Signed-off-by: Yuvraj <code@evalsocket.dev> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add sphinx panels (#815) Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * 
Schema overhaul (#785) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Parent workflow serialization fails when calling a launch plan with fixed inputs (#814) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix sagemaker plugin (#817) Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Remove legacy API (#807) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Add Bigquery plugin (#789) * Add bigquery plugin Signed-off-by: Kevin Su <pingsutw@apache.org> * Update dependency Signed-off-by: Kevin Su <pingsutw@apache.org> * update get_custom Signed-off-by: Kevin Su <pingsutw@apache.org> * Add structured dataset Signed-off-by: Kevin Su <pingsutw@apache.org> * Add structured dataset Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated comment Signed-off-by: Kevin Su <pingsutw@apache.org> * Add BQ in GA Signed-off-by: Kevin Su <pingsutw@apache.org> * alphabetical order Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed flytekit-papermill ImportError (#818) Signed-off-by: Kevin Su <pingsutw@apache.org> * Add support for string-format-like sytax for shell task (#792) * POC: Add support for f-string like sytax for shell task This commit is a proof of concept adding f-string like syntax for shell_tasks. This supports using nested types for script inputs, such as data classes. This change was motivated by the desire to combine shell_tasks that have multiple inputs with map_tasks which only support tasks with a single input. This commit is only a starting point, since it makes some changes to the shell_task API (adds a template_style field), and modifies some of the default behavior for ease of implementation (e.g. 
throwing an error when there are unused input arguments). Signed-off-by: Zach Palchick <palchicz@zymergen.com> * Drop support for old/regex style for doing string interpolation Signed-off-by: Zach Palchick <palchicz@zymergen.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Expose configured RawOutputPrefix during execution (#813) * Expose configured RawOutputPrefix during execution Signed-off-by: Kevin Su <pingsutw@apache.org> * Remove sdk_runnable.py and spark_task.py Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add SecretsManager back to old import location (#820) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add some tests (#819) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed flaky spark test (#821) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: plugins/flytekit-greatexpectations/requirements.txt to reduce vulnerabilities (#823) The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-IPYTHON-2348630 Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: plugins/flytekit-papermill/requirements.txt to reduce vulnerabilities (#825) The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-IPYTHON-2348630 Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: requirements-spark2.txt to reduce vulnerabilities (#826) The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-IPYTHON-2348630 Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: requirements.txt to reduce vulnerabilities (#824) The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-IPYTHON-2348630 Signed-off-by: maximsmol 
<maximsmol@gmail.com> * Intratask checkpointing (#771) * wip - intratask checkpointing Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * sync checkpointer with tests Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Checkpinter in entrypoint Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * checkpoint in progress Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * wip Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Intratask checkpointer Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Checkpoint updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Intra-task checkpointing Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Test and entrypoint updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * lint fixed Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * test fixes Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * fmt Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated entrypoint Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * update Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * print Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * SyncCheckpointer working Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * update Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * fixed import problems Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * fixed test Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * fixed imports Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * fixed lints and errors Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * lint fix Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * addressed comments Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Support reading subset column (#822) * Support StructuredDatasetDecoder read subset column 
Signed-off-by: Kevin Su <pingsutw@apache.org> * Added tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed typo Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated tests Signed-off-by: Kevin Su <pingsutw@apache.org> * [pr into #822] (#827) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Kevin Su <pingsutw@apache.org> * [pr into #822] Final update to structured dataset column subsetting (#828) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Co-authored-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Fix spark regression (#830) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Update argument setting for in fast registered, dynamically generated, pod tasks (#835) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * `ctx` Context can be used within shell tasks - to access context vars and secrets (#832) * Adding context to a substitutable parameter in shell task Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Support for secrets in context Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * addressed comments Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Expose Checkpoint as a top-level interface in flytekit (#839) Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Parse duration field from flyteidl to `flytekit.models.execution.ExecutionClosure` (#829) * Parse duration field from flyteidl to `flytekit.models.execution.ExecutionClosure` Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> * Add test for execution closure Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> * Add tests to Flyte remote Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> * Split execution test into 
with output and with error Signed-off-by: Bernhard Stadlbauer <b.stadlbauer@gmx.net> Co-authored-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> Signed-off-by: maximsmol <maximsmol@gmail.com> * Gate new Structured Dataset feature & remove old config objects (#831) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Fixing out of order for conditional outputs (#843) Signed-off-by: maximsmol <maximsmol@gmail.com> * Set default values to map task template (#841) * Set sane defaults in map task templates Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove unused method Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Put ArrayJob.from_dict back Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Define parallelism=0 as unbounded Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove special case to handle 0 Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Regenerate requirements files - dependencies dropping support for python 3.7 (#838) * Regenerate requirements files Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Put restrictions on numpy and pandas versions Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Use --use-deprecated=legacy-resolver Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * use pip==22.0.3 everywhere Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove --use-deprecated=legacy-resolver Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Relax click Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerated plugins requirements Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo 
Apolinario <eapolinario@users.noreply.github.com> * TypeAnnotation (#759) * feat: support for annotated simple + list Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: addition of annotation att to Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: core obj Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: proto model Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: testing suite Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: more stable typing introspection Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: strip legacy Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: explicitly allow only one annotation Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: direct type transformer tests Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: there and back test Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: typing_extensions for get_origin Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: more semantic list generic unwrap Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: tmp requirements file with custom idl Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: nits Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: semantic error for unsupported complex literals Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: but Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: more tests ;) Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: imports Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: complex annotations Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: temp requirements files for unit tests Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: lint bug Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: tmp setup.py Signed-off-by: Kenny Workman 
<kennyworkman@sbcglobal.net> * fix: use typing_extensions Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: typing_extensions for annotated Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: typing_ext Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: plugin tmp requirements Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: bump requirements Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: doc requirements Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: whitespace Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: bump flytekit Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: numpy version Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: lint Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: pandas version Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: bump requirements Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: test import Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: flake8 lint Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: merge Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: requirements Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: requirements Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: lint Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: papermill req Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: req Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * Remove singleton from structured dataset transformer engine (#848) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Kevin Su <pingsutw@apache.org> * Assign input and output to FlyteWorkflowExecution (#842) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add reference entities to FlyteTask and 
FlyteLaunchPlan (#850) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix fast registration error (#851) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Samhita Alla <aallasamhita@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add support for local execute in pod task (#852) Signed-off-by: ggydush-fn <greg.gydush@freenome.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add anonymous retry (#854) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: formatting, linting, typing_extensions * fix: do not use SDK types Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: update test comment Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: also check literal type castability when tags match Signed-off-by: maximsmol <maximsmol@gmail.com> * Point flyteidl to maxim's fork in CI and requirements files Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * [Core feature] Add Raw AWS Batch Task (#782) * Init plugin Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * address comment Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed typo Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated AWS config * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * Added comment Signed-off-by: Kevin Su <pingsutw@apache.org> * Update config Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * use pyflyte execute Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Added comment Signed-off-by: Kevin Su 
<pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add structured dataset encoder/decoder in fsspec plugin (#849) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Delete unnecessary auth configuration (#858) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed format alias in Flytekit docs (#844) * Fixed format alias Fixed docs for file format alias that weren't rendered properly. A warning popped up stating 'ignore' is deprecated, and to use 'ignore_paths' instead. Signed-off-by: SmritiSatyanV <smriti@union.ai> * Bump idl (#862) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Updated authoring.rst (#863) * Updated authoring.rst Added directive Rephrased sentence * Fixed build error Signed-off-by: SmritiSatyanV <smriti@union.ai> * test-build-1 Signed-off-by: SmritiSatyanV <smriti@union.ai> Signed-off-by: maximsmol <maximsmol@gmail.com> * Updated authoring.rst (#866) Added the directive correctly Signed-off-by: SmritiSatyanV <smriti@union.ai> Signed-off-by: maximsmol <maximsmol@gmail.com> * Change docs for HTMLPage type to say HTMLPage instead of PNGImage (#868) Signed-off-by: maximsmol <maximsmol@gmail.com> * Revisit StructuredDatasetDecoder interface (#865) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Remove legacy mentions in contributing guide (#870) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add GCS protocol in the structured dataset (#869) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Make fetched entities callable within workflows (#867) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Caching of offloaded objects (#762) * Remove flyteidl from install_requires 
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Expose hash in Literal Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Set hash in TypeEngine Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Modify cache key calculation to take hash into account Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Opt-in PandasDataFrameTransformer Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add unit tests Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Iterate using a flyteidl branch Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerate requirements files Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerate requirements files Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Move _hash_overridable to StructureDatasetTransformerEngine Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Move HashMethod to flytekit.core.hash Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Fix `unit_test` make target Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Split `unit_test` make target in two lines Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add assert to structured dataset compatibility test Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove TODO Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerate plugins requirements files pointing to the right version of flyteidl. Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Set hash as a property of the literal Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Install plugins requirements in CI. 
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add hash.setter Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Install flyteidl directly Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Revert "Regenerate plugins requirements files pointing to the right version of flyteidl." This reverts commit c2dbb54. Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * wip - Add support for univariate lists Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add support for lists of annotated objects Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Revamp generation of cache key (to cover case of literals collections and maps) Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Leave TODO for warning Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Revert "Add support for lists of annotated objects" This reverts commit 4b5f608. Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Revert "wip - Add support for univariate lists" This reverts commit adaa448. 
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove docstring Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add flyteidl>=0.23.0 Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove mentions to branch flyteidl@add-hash-to-literal Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Bump flyteidl in plugins requirements Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerate plugins requirements again Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Restore papermill/requirements.txt Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Point flytekitplugins-spark to the offloaded-objects-caching branch in papermill tests Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Set flyteidl>=0.23.0 in papermill dev-requirements Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * FlyteRemote fetch of conditional nodes (#772) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Removed root logger (#871) * Removed root logger Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated logger Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated log level Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated logger Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated logger Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated logger Signed-off-by: Kevin Su <pingsutw@apache.org> * lint fixed Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix flytekit_compatibility/test_schema_types.py test Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * make lint Signed-off-by: Eduardo Apolinario 
<eapolinario@users.noreply.github.com> * fix: annotated type conversion error Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: _are_types_castable based on tests Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: test failing if using random order Signed-off-by: maximsmol <maximsmol@gmail.com> * Merge branch 'master' into maximsmol/union_type Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: merge issue Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: requirements Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: schema transformer error Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: test Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: merge issue Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: union + annotated behavior Signed-off-by: maximsmol <maximsmol@gmail.com> * Regenerate requirements Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Bump requirements in plugins Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Handle nested Annotated Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Leave TODO re: strucutured dataset type castability Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove mention to flyteidl@union_type in doc-requirements Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Linting Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Use tempfile.mkdtemp to create a temporary directory for local data persistence. Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Revert "Use tempfile.mkdtemp to create a temporary directory for local data persistence." This reverts commit 861b157. 
Signed-off-by: eduardo apolinario <eduardo@eduardos-MacBook-Pro.local> * Force temporary file to not be deleted in test Signed-off-by: eduardo apolinario <eduardo@eduardos-MacBook-Pro.local> * Regenerate papermill dev-requirements Signed-off-by: eduardo apolinario <eduardo@eduardos-MacBook-Pro.local> * Remove duplicate code Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Put a lower bound on the pip version installed in CI Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Kevin Su <pingsutw@apache.org> Co-authored-by: Emirhan Karagül <emirhan350z@gmail.com> Co-authored-by: Eduardo Apolinario <653394+eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Kevin Su <pingsutw@gmail.com> Co-authored-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Co-authored-by: Ketan Umare <16888709+kumare3@users.noreply.github.com> Co-authored-by: Lisa <30621230+aeioulisa@users.noreply.github.com> Co-authored-by: Niels Bantilan <niels.bantilan@gmail.com> Co-authored-by: Stef Nelson-Lindall <bethebunny@gmail.com> Co-authored-by: Snyk bot <github+bot@snyk.io> Co-authored-by: Dan Rammer <hamersaw@protonmail.com> Co-authored-by: bstadlbauer <11799671+bstadlbauer@users.noreply.github.com> Co-authored-by: Bernhard Stadlbauer <bstadlbauer@gmx.net> Co-authored-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> Co-authored-by: Samhita Alla <aallasamhita@gmail.com> Co-authored-by: Yuvraj <code@evalsocket.dev> Co-authored-by: Zach Palchick <palchicz@users.noreply.github.com> Co-authored-by: Snyk bot <snyk-bot@snyk.io> Co-authored-by: Kenny Workman <31255434+kennyworkman@users.noreply.github.com> Co-authored-by: ggydush-fn <69013027+ggydush-fn@users.noreply.github.com> Co-authored-by: SmritiSatyanV <94349093+SmritiSatyanV@users.noreply.github.com> Co-authored-by: Matthew Griffin <1matthewgriffin@gmail.com> Co-authored-by: eduardo apolinario 
<eduardo@eduardos-MacBook-Pro.local>
myz540 pushed a commit to ProjectAussie/flytekit that referenced this pull request Apr 11, 2022
* Add support union type Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed test Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed test Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Updated tests Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Update Union to use tagged Unions Signed-off-by: maximsmol <maximsmol@gmail.com> * Update to use string tags (part 1) Signed-off-by: maximsmol <maximsmol@gmail.com> * Working implementation, update tests Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixes, more tests Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix incorrect unwrapped literal-union matching, update test Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix duplicate tag handling, add tests for collections containing unions Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix type hint test Signed-off-by: maximsmol <maximsmol@gmail.com> * Add implicit wrapping union type tests Signed-off-by: maximsmol <maximsmol@gmail.com> * Add union ambiguity tests Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixup tests, make TypeTransformerFailed compatible with all old exception types Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixup models + add tests Signed-off-by: maximsmol <maximsmol@gmail.com> * Implement changed design Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix PR comments Signed-off-by: maximsmol <maximsmol@gmail.com> * Remote entrypoint serialize (flyteorg#733) Signed-off-by: Emirhan Karagül <emirhan350z@gmail.com> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix 
lint error in remote.py (flyteorg#755) Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Support enum in dataclass (flyteorg#753) * Add support enum in dataclass Signed-off-by: Kevin Su <pingsutw@apache.org> * Update test Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * Fix subworkflow and launch plan FlyteRemote behavior (flyteorg#751) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Extras: Shell task (flyteorg#747) Signed-off-by: maximsmol <maximsmol@gmail.com> * Add support FlyteSchema in dataclass (flyteorg#722) * schema in dataclass Signed-off-by: Kevin Su <pingsutw@apache.org> * Added tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * updated Signed-off-by: Kevin Su <pingsutw@apache.org> * updated Signed-off-by: Kevin Su <pingsutw@apache.org> * updated Signed-off-by: Kevin Su <pingsutw@apache.org> * Remove workflow_execution.py (flyteorg#758) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Get raw input/output from remote execution (flyteorg#675) * [wip] for feast demo Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * clean up a bit Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * add a test and move where constructor is called Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * remove unneeded import Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * add a part of a test Signed-off-by: Yee Hing Tong 
<wild-endeavor@users.noreply.github.com> * Added tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * typo Signed-off-by: Kevin Su <pingsutw@apache.org> Co-authored-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Co-authored-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix mypy errors in flytekit/types (flyteorg#757) Signed-off-by: Lisa <aeioulisa@gmail.com> Signed-off-by: Kevin Su <pingsutw@apache.org> * Remote client failed to fetch FlytePickle object (flyteorg#764) * Fetch pickle value from flytekit remote Signed-off-by: Kevin Su <pingsutw@apache.org> * Fix tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Remove default value Signed-off-by: Kevin Su <pingsutw@apache.org> * Add support FlyteFile in dataclass (flyteorg#725) * Add support Flyte File and directory in dataclass Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * add task_resolver arg to @task decorator (flyteorg#765) Signed-off-by: Niels Bantilan <niels.bantilan@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Copy metadata into map task from underlying (flyteorg#766) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Support for delayed annotations (flyteorg#760) Signed-off-by: Stefan Nelson-Lindall <stef@stripe.com> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Complex dataclass unit tests (flyteorg#773) Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol 
<maximsmol@gmail.com> * remote_source lost on serialization of @dataclass_json with FlyteFile (flyteorg#774) * remote_source lost on serialization of @dataclass_json with FlyteFile Signed-off-by: Kevin Su <pingsutw@apache.org> * updated tests Signed-off-by: Kevin Su <pingsutw@apache.org> * updated tests Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Single-task execution FlyteRemote sync (flyteorg#778) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Logging updates (flyteorg#775) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: doc-requirements.txt to reduce vulnerabilities (flyteorg#779) The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-LXML-2316995 Signed-off-by: maximsmol <maximsmol@gmail.com> * Add cache_serialize parameter to tasks (flyteorg#673) * added cache_serialize parameter for basic tasks Signed-off-by: Daniel Rammer <daniel@union.ai> * fixed typo Signed-off-by: Daniel Rammer <daniel@union.ai> * updated flyteidl version Signed-off-by: Daniel Rammer <daniel@union.ai> * changed flyteidl requirements everywhere Signed-off-by: Daniel Rammer <daniel@union.ai> * remove flyteidl version requires in setup.py so we can use develop Signed-off-by: Daniel Rammer <daniel@union.ai> * removed flyteidl git repos from a variety of requirements packages Signed-off-by: Daniel Rammer <daniel@union.ai> * updated variable discovery_serializable to cache_serializable Signed-off-by: Daniel Rammer <daniel@union.ai> * updated requirements Signed-off-by: Daniel Rammer <daniel@union.ai> * fixed TaskMetadata _cache_serializable variable name Signed-off-by: Daniel Rammer <daniel@union.ai> * propgating cache_serialize parameter through to tasks Signed-off-by: Daniel Rammer <daniel@union.ai> * added cache_serializable to 
SdkRawContainerTask Signed-off-by: Daniel Rammer <daniel@union.ai> * fixing cache_serializable variable propogation issues Signed-off-by: Daniel Rammer <daniel@union.ai> * added documentation Signed-off-by: Daniel Rammer <daniel@union.ai> * added unit tests for cache_serialize metadata Signed-off-by: Daniel Rammer <daniel@union.ai> * linter added spaces in unit tests Signed-off-by: Daniel Rammer <daniel@union.ai> Signed-off-by: maximsmol <maximsmol@gmail.com> * When using the `task` and `workflow` decorator, correctly wrap the fu… (flyteorg#780) * When using the `task` and `workflow` decorator, correctly wrap the function This enables tooling such as docstring search tools to unwrap the object and show the correct docstring. Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> * Remove blackshark copyright header Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> * Fix broken great expectations test Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> * Add test for stacked decorators Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> Co-authored-by: Bernhard Stadlbauer <bstadlbauer@gmx.net> Co-authored-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add option to flyte-cli for specifying root certificate (flyteorg#783) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add validation check to cacert switch (flyteorg#787) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Remove pytz constraint (flyteorg#786) * Remove pytz constraint from setup.py Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerate requirements files Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Put pytz back Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * make requirements.txt 
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Lint: remove f-string misuse (flyteorg#788) Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Pyarrow greater than 4.0.0 (flyteorg#790) * pyarrow>=4.0.0 Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerate requirements. Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Support python 3.10 (flyteorg#791) * [wip] Support python 3.10 Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add Dockerfile.py310 Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Stringify python version Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Skip flytekit-modin plugin tests on 3.10 Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add 3.9 and 3.10 to list of supported version in plugins Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Comment why flytekit-modin is not running on 3.10 and disable fail-fast in plugin tests Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * add `with_overrides` to map task (flyteorg#794) * add with_overrides Signed-off-by: Samhita Alla <aallasamhita@gmail.com> * remove Resources Signed-off-by: Samhita Alla <aallasamhita@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * bump docsearch version (flyteorg#805) Signed-off-by: Samhita Alla <aallasamhita@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * update docs for new navbar theme (flyteorg#806) Signed-off-by: maximsmol 
<maximsmol@gmail.com> * fix requirment.txt github issue (flyteorg#810) Signed-off-by: Yuvraj <code@evalsocket.dev> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add sphinx panels (flyteorg#815) Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Schema overhaul (flyteorg#785) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Parent workflow serialization fails when calling a launch plan with fixed inputs (flyteorg#814) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix sagemaker plugin (flyteorg#817) Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Remove legacy API (flyteorg#807) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Add Bigquery plugin (flyteorg#789) * Add bigquery plugin Signed-off-by: Kevin Su <pingsutw@apache.org> * Update dependency Signed-off-by: Kevin Su <pingsutw@apache.org> * update get_custom Signed-off-by: Kevin Su <pingsutw@apache.org> * Add structured dataset Signed-off-by: Kevin Su <pingsutw@apache.org> * Add structured dataset Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated comment Signed-off-by: Kevin Su <pingsutw@apache.org> * Add BQ in GA Signed-off-by: Kevin Su <pingsutw@apache.org> * alphabetical order Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed flytekit-papermill ImportError (flyteorg#818) Signed-off-by: Kevin Su <pingsutw@apache.org> * Add support for string-format-like sytax for shell task (flyteorg#792) * POC: Add support for f-string like sytax for shell task This commit is a proof of concept adding f-string like syntax for shell_tasks. This supports using nested types for script inputs, such as data classes. 
This change was motivated by the desire to combine shell_tasks that have multiple inputs with map_tasks which only support tasks with a single input. This commit is only a starting point, since it makes some changes to the shell_task API (adds a template_style field), and modifies some of the default behavior for ease of implementation (e.g. throwing an error when there are unused input arguments). Signed-off-by: Zach Palchick <palchicz@zymergen.com> * Drop support for old/regex style for doing string interpolation Signed-off-by: Zach Palchick <palchicz@zymergen.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Expose configured RawOutputPrefix during execution (flyteorg#813) * Expose configured RawOutputPrefix during execution Signed-off-by: Kevin Su <pingsutw@apache.org> * Remove sdk_runnable.py and spark_task.py Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add SecretsManager back to old import location (flyteorg#820) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add some tests (flyteorg#819) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed flaky spark test (flyteorg#821) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: plugins/flytekit-greatexpectations/requirements.txt to reduce vulnerabilities (flyteorg#823) The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-IPYTHON-2348630 Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: plugins/flytekit-papermill/requirements.txt to reduce vulnerabilities (flyteorg#825) The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-IPYTHON-2348630 Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: requirements-spark2.txt to reduce vulnerabilities (flyteorg#826) 
The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-IPYTHON-2348630 Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: requirements.txt to reduce vulnerabilities (flyteorg#824) The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-IPYTHON-2348630 Signed-off-by: maximsmol <maximsmol@gmail.com> * Intratask checkpointing (flyteorg#771) * wip - intratask checkpointing Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * sync checkpointer with tests Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Checkpinter in entrypoint Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * checkpoint in progress Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * wip Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Intratask checkpointer Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Checkpoint updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Intra-task checkpointing Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Test and entrypoint updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * lint fixed Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * test fixes Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * fmt Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated entrypoint Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * update Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * print Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * SyncCheckpointer working Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * updated Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * update Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * fixed import problems Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * fixed test Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * fixed imports 
Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * fixed lints and errors Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * lint fix Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * addressed comments Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Support reading subset column (flyteorg#822) * Support StructuredDatasetDecoder read subset column Signed-off-by: Kevin Su <pingsutw@apache.org> * Added tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed typo Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated tests Signed-off-by: Kevin Su <pingsutw@apache.org> * [pr into flyteorg#822] (flyteorg#827) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Kevin Su <pingsutw@apache.org> * [pr into flyteorg#822] Final update to structured dataset column subsetting (flyteorg#828) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Co-authored-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Fix spark regression (flyteorg#830) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Update argument setting for in fast registered, dynamically generated, pod tasks (flyteorg#835) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * `ctx` Context can be used within shell tasks - to access context vars and secrets (flyteorg#832) * Adding context to a substitutable parameter in shell task Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * Support for secrets in context Signed-off-by: Ketan Umare <ketan.umare@gmail.com> * addressed comments Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Expose Checkpoint as a top-level interface in flytekit (flyteorg#839) Signed-off-by: Ketan Umare <ketan.umare@gmail.com> Signed-off-by: maximsmol 
<maximsmol@gmail.com> * Parse duration field from flyteidl to `flytekit.models.execution.ExecutionClosure` (flyteorg#829) * Parse duration field from flyteidl to `flytekit.models.execution.ExecutionClosure` Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> * Add test for execution closure Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> * Add tests to Flyte remote Signed-off-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> * Split execution test into with output and with error Signed-off-by: Bernhard Stadlbauer <b.stadlbauer@gmx.net> Co-authored-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> Signed-off-by: maximsmol <maximsmol@gmail.com> * Gate new Structured Dataset feature & remove old config objects (flyteorg#831) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Fixing out of order for conditional outputs (flyteorg#843) Signed-off-by: maximsmol <maximsmol@gmail.com> * Set default values to map task template (flyteorg#841) * Set sane defaults in map task templates Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove unused method Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Put ArrayJob.from_dict back Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Define parallelism=0 as unbounded Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove special case to handle 0 Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Regenerate requirements files - dependencies dropping support for python 3.7 (flyteorg#838) * Regenerate requirements files Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Put restrictions on numpy and pandas versions Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Use 
--use-deprecated=legacy-resolver Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * use pip==22.0.3 everywhere Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove --use-deprecated=legacy-resolver Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Relax click Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerated plugins requirements Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * TypeAnnotation (flyteorg#759) * feat: support for annotated simple + list Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: addition of annotation att to Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: core obj Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: proto model Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: testing suite Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: more stable typing introspection Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: strip legacy Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: explicitly allow only one annotation Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: direct type transformer tests Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: there and back test Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: typing_extensions for get_origin Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: more semantic list generic unwrap Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: tmp requirements file with custom idl Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: nits Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: semantic error for unsupported complex literals Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * 
fix: but Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * feat: more tests ;) Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: imports Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: complex annotations Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: temp requirements files for unit tests Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: lint bug Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: tmp setup.py Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: use typing_extensions Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: typing_extensions for annotated Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: typing_ext Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: plugin tmp requirements Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: bump requirements Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: doc requirements Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: whitespace Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: bump flytekit Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: numpy version Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: lint Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: pandas version Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: bump requirements Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: test import Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: flake8 lint Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: merge Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: requirements Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: requirements Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: lint Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: 
papermill req Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * fix: req Signed-off-by: Kenny Workman <kennyworkman@sbcglobal.net> * Remove singleton from structured dataset transformer engine (flyteorg#848) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Kevin Su <pingsutw@apache.org> * Assign input and output to FlyteWorkflowExecution (flyteorg#842) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add reference entities to FlyteTask and FlyteLaunchPlan (flyteorg#850) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix fast registration error (flyteorg#851) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: Samhita Alla <aallasamhita@gmail.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add support for local execute in pod task (flyteorg#852) Signed-off-by: ggydush-fn <greg.gydush@freenome.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add anonymous retry (flyteorg#854) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: formatting, linting, typing_extensions * fix: do not use SDK types Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: update test comment Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: also check literal type castability when tags match Signed-off-by: maximsmol <maximsmol@gmail.com> * Point flyteidl to maxim's fork in CI and requirements files Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * [Core feature] Add Raw AWS Batch Task (flyteorg#782) * Init plugin Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * address comment Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed typo Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated AWS config * Fixed lint Signed-off-by: Kevin 
Su <pingsutw@apache.org> * Added comment Signed-off-by: Kevin Su <pingsutw@apache.org> * Update config Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * use pyflyte execute Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed tests Signed-off-by: Kevin Su <pingsutw@apache.org> * Added comment Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add structured dataset encoder/decoder in fsspec plugin (flyteorg#849) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Delete unnecessary auth configuration (flyteorg#858) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fixed format alias in Flytekit docs (flyteorg#844) * Fixed format alias Fixed docs for file format alias that weren't rendered properly. A warning popped up stating 'ignore' is deprecated, and to use 'ignore_paths' instead. 
Signed-off-by: SmritiSatyanV <smriti@union.ai> * Bump idl (flyteorg#862) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> * Updated authoring.rst (flyteorg#863) * Updated authoring.rst Added directive Rephrased sentence * Fixed build error Signed-off-by: SmritiSatyanV <smriti@union.ai> * test-build-1 Signed-off-by: SmritiSatyanV <smriti@union.ai> Signed-off-by: maximsmol <maximsmol@gmail.com> * Updated authoring.rst (flyteorg#866) Added the directive correctly Signed-off-by: SmritiSatyanV <smriti@union.ai> Signed-off-by: maximsmol <maximsmol@gmail.com> * Change docs for HTMLPage type to say HTMLPage instead of PNGImage (flyteorg#868) Signed-off-by: maximsmol <maximsmol@gmail.com> * Revisit StructuredDatasetDecoder interface (flyteorg#865) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Remove legacy mentions in contributing guide (flyteorg#870) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Add GCS protocol in the structured dataset (flyteorg#869) Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Make fetched entities callable within workflows (flyteorg#867) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Caching of offloaded objects (flyteorg#762) * Remove flyteidl from install_requires Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Expose hash in Literal Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Set hash in TypeEngine Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Modify cache key calculation to take hash into account Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Opt-in PandasDataFrameTransformer Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add unit tests 
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Iterate using a flyteidl branch Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerate requirements files Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerate requirements files Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Move _hash_overridable to StructureDatasetTransformerEngine Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Move HashMethod to flytekit.core.hash Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Fix `unit_test` make target Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Split `unit_test` make target in two lines Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add assert to structured dataset compatibility test Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove TODO Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerate plugins requirements files pointing to the right version of flyteidl. Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Set hash as a property of the literal Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Install plugins requirements in CI. Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add hash.setter Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Install flyteidl directly Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Revert "Regenerate plugins requirements files pointing to the right version of flyteidl." This reverts commit c2dbb54. 
Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * wip - Add support for univariate lists Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add support for lists of annotated objects Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Revamp generation of cache key (to cover case of literals collections and maps) Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Leave TODO for warning Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Revert "Add support for lists of annotated objects" This reverts commit 4b5f608. Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Revert "wip - Add support for univariate lists" This reverts commit adaa448. Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove docstring Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Add flyteidl>=0.23.0 Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Remove mentions to branch flyteidl@add-hash-to-literal Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Bump flyteidl in plugins requirements Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Regenerate plugins requirements again Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Restore papermill/requirements.txt Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Point flytekitplugins-spark to the offloaded-objects-caching branch in papermill tests Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Set flyteidl>=0.23.0 in papermill dev-requirements Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * FlyteRemote fetch of conditional nodes (flyteorg#772) Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * Removed root 
logger (flyteorg#871) * Removed root logger Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated logger Signed-off-by: Kevin Su <pingsutw@apache.org> * Fixed lint Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated log level Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated logger Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated logger Signed-off-by: Kevin Su <pingsutw@apache.org> * Updated logger Signed-off-by: Kevin Su <pingsutw@apache.org> * lint fixed Signed-off-by: Kevin Su <pingsutw@apache.org> Signed-off-by: maximsmol <maximsmol@gmail.com> * Fix flytekit_compatibility/test_schema_types.py test Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Signed-off-by: maximsmol <maximsmol@gmail.com> * make lint Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * fix: annotated type conversion error Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: _are_types_castable based on tests Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: test failing if using random order Signed-off-by: maximsmol <maximsmol@gmail.com> * Merge branch 'master' into maximsmol/union_type Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: merge issue Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: requirements Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: schema transformer error Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: test Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: merge issue Signed-off-by: maximsmol <maximsmol@gmail.com> * fix: union + annotated behavior Signed-off-by: maximsmol <maximsmol@gmail.com> * Regenerate requirements Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Bump requirements in plugins Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Handle nested Annotated Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Leave TODO re: strucutured dataset type castability Signed-off-by: Eduardo 
Apolinario <eapolinario@users.noreply.github.com> * Remove mention to flyteidl@union_type in doc-requirements Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Linting Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Use tempfile.mkdtemp to create a temporary directory for local data persistence. Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Revert "Use tempfile.mkdtemp to create a temporary directory for local data persistence." This reverts commit 861b157. Signed-off-by: eduardo apolinario <eduardo@eduardos-MacBook-Pro.local> * Force temporary file to not be deleted in test Signed-off-by: eduardo apolinario <eduardo@eduardos-MacBook-Pro.local> * Regenerate papermill dev-requirements Signed-off-by: eduardo apolinario <eduardo@eduardos-MacBook-Pro.local> * Remove duplicate code Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> * Put a lower bound on the pip version installed in CI Signed-off-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Kevin Su <pingsutw@apache.org> Co-authored-by: Emirhan Karagül <emirhan350z@gmail.com> Co-authored-by: Eduardo Apolinario <653394+eapolinario@users.noreply.github.com> Co-authored-by: Eduardo Apolinario <eapolinario@users.noreply.github.com> Co-authored-by: Kevin Su <pingsutw@gmail.com> Co-authored-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> Co-authored-by: Ketan Umare <16888709+kumare3@users.noreply.github.com> Co-authored-by: Lisa <30621230+aeioulisa@users.noreply.github.com> Co-authored-by: Niels Bantilan <niels.bantilan@gmail.com> Co-authored-by: Stef Nelson-Lindall <bethebunny@gmail.com> Co-authored-by: Snyk bot <github+bot@snyk.io> Co-authored-by: Dan Rammer <hamersaw@protonmail.com> Co-authored-by: bstadlbauer <11799671+bstadlbauer@users.noreply.github.com> Co-authored-by: Bernhard Stadlbauer <bstadlbauer@gmx.net> Co-authored-by: Bernhard Stadlbauer <bstadlbauer@blackshark.ai> 
Co-authored-by: Samhita Alla <aallasamhita@gmail.com> Co-authored-by: Yuvraj <code@evalsocket.dev> Co-authored-by: Zach Palchick <palchicz@users.noreply.github.com> Co-authored-by: Snyk bot <snyk-bot@snyk.io> Co-authored-by: Kenny Workman <31255434+kennyworkman@users.noreply.github.com> Co-authored-by: ggydush-fn <69013027+ggydush-fn@users.noreply.github.com> Co-authored-by: SmritiSatyanV <94349093+SmritiSatyanV@users.noreply.github.com> Co-authored-by: Matthew Griffin <1matthewgriffin@gmail.com> Co-authored-by: eduardo apolinario <eduardo@eduardos-MacBook-Pro.local> Signed-off-by: Mike Zhong <mzhong@embarkvet.com>
TL;DR
This PR introduces a new type and user experience for working with tabular data in Flyte. Please see the design doc for the complete background.
The new type is called `StructuredDataset` (see the IDL PR). We thought the name sufficiently generic and differentiated from `FlyteSchema`, which we're also keeping around for the time being (see more in the Migration section below). The main difference between the `FlyteSchema` and `StructuredDataset` types themselves is that columns in the latter are just named `LiteralType`s. That is, columns are no longer constrained to be purely primitives.

The type handling has also changed from that of `FlyteSchema` to something we hope is more straightforward and easier to understand, but we've tried to keep the user experience as similar as possible.

Note: The `[kwtypes(...)]` syntax used for specifying columns and storage formats will be changing before the next non-beta release of flytekit, so if you try this out now, keep in mind you'll have some minor updates to make in a couple of weeks.

Note: To handle the new type, you will need to be on Propeller version v0.16.14 and Admin version v0.6.78 or later.
Type
Are all requirements met?
Design
Type Handling
We found the double interaction between the core `TypeEngine` and the `SchemaEngine` too confusing.

In the new model, contributors will only be dealing with the `StructuredDatasetTransformerEngine`. You register your encoders/decoders with that only, and it in turn will handle interaction with the core `TypeEngine`.

If you have a new dataframe type that you want to be able to use in Flytekit, you'll just have to subclass `StructuredDatasetEncoder` and `StructuredDatasetDecoder` and implement the logic. Encoders are responsible for taking a dataframe in Python memory and turning it into a Flyte literal (complete with any uploading/writing of data), and decoders do the opposite (and should handle downloading of data).

Encoders and decoders are now distinct entities rather than two functions within one class because we felt that the asymmetry increases flexibility. One might choose to write a custom decoder for pandas but use the default encoder.
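To make the encoder/decoder split concrete, here is a minimal, self-contained sketch. It does not use flytekit: `Encoder`, `Decoder`, `DefaultEncoder`, `ColumnSubsetDecoder`, `FAKE_STORE`, and the `mem://` URI are all hypothetical stand-ins, a "dataframe" is just a list of row dicts, and a "literal" is just a dict pointing at the stored data. It only illustrates pairing a default encoder with a custom decoder.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Hypothetical stand-ins, not flytekit types.
Rows = List[Dict[str, Any]]  # a toy "dataframe": a list of row dicts
Literal = Dict[str, str]     # a toy "literal": just points at stored data

# In-memory dict standing in for remote storage (assumption for the sketch).
FAKE_STORE: Dict[str, Rows] = {}


class Encoder(ABC):
    @abstractmethod
    def encode(self, df: Rows, uri: str) -> Literal:
        """Write the dataframe out and return a literal describing it."""


class Decoder(ABC):
    @abstractmethod
    def decode(self, literal: Literal) -> Rows:
        """Fetch the data a literal points at and rebuild the dataframe."""


class DefaultEncoder(Encoder):
    def encode(self, df: Rows, uri: str) -> Literal:
        FAKE_STORE[uri] = [dict(row) for row in df]  # the "upload"
        return {"uri": uri}


class ColumnSubsetDecoder(Decoder):
    """Custom decoder that only materializes the requested columns."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns

    def decode(self, literal: Literal) -> Rows:
        rows = FAKE_STORE[literal["uri"]]  # the "download"
        return [{c: row[c] for c in self.columns} for row in rows]


# Default encoder on the way out, custom decoder on the way in.
literal = DefaultEncoder().encode([{"a": 1, "b": 2}, {"a": 3, "b": 4}], "mem://t1/out")
subset = ColumnSubsetDecoder(["a"]).decode(literal)
```

Because the two sides are separate classes, swapping in `ColumnSubsetDecoder` requires no change to the encoder.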
Encoder/Decoder (aka Handler) Selection
This PR ships with handlers to encode and decode pandas and Spark dataframes, and users can of course add their own. How does the transformer engine pick the encoder/decoder?

The handlers in the new transformer engine are keyed off of three things:
* The Python dataframe type
* The storage protocol (`s3`, `bq`, `gs`, etc.)
* The storage format

Each en/decoder is required to have all three. When the transformer engine needs to convert a Python dataframe in memory to a Flyte literal, it will pick an encoder based on an exact match of all three, except for the `format`. For the format, if an exact match cannot be found, it will then search for an encoder with the format `""`. The same pattern is used for decoders, when converting from a Flyte literal to a Python dataframe.

Default protocols and formats
When registering an en/decoder with the `FLYTE_DATASET_TRANSFORMER` singleton, you have the option of specifying whether the protocol and storage format that the handler handles should be taken as the default for that dataframe type.

For example, if you register a handler for protocol "s3" and format "parquet" as the default for a dataframe type, this will tell flytekit that whenever it sees that dataframe type, it will assume that the format should be "parquet" and the protocol should be "s3", and it will look for the appropriate en/decoder.
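The selection and default rules described above can be sketched as a small lookup table. This is a toy model, not the flytekit implementation: `HandlerRegistry`, `register`, `find`, and the `default_for_type` flag are hypothetical names chosen for illustration, and handlers are represented by plain strings.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[type, str, str]  # (dataframe type, protocol, format)


class HandlerRegistry:
    """Toy registry; handler values are just names standing in for real objects."""

    def __init__(self) -> None:
        self._handlers: Dict[Key, str] = {}
        self._defaults: Dict[type, Tuple[str, str]] = {}

    def register(self, df_type: type, protocol: str, fmt: str, handler: str,
                 default_for_type: bool = False) -> None:
        self._handlers[(df_type, protocol, fmt)] = handler
        if default_for_type:
            # Remember this protocol/format as the default for the type.
            self._defaults[df_type] = (protocol, fmt)

    def find(self, df_type: type, protocol: Optional[str] = None,
             fmt: Optional[str] = None) -> str:
        # Fill in anything the caller left unspecified from the type's defaults.
        if protocol is None or fmt is None:
            if df_type not in self._defaults:
                raise LookupError(f"no default protocol/format for {df_type.__name__}")
            default_protocol, default_fmt = self._defaults[df_type]
            protocol = default_protocol if protocol is None else protocol
            fmt = default_fmt if fmt is None else fmt
        # Type and protocol must match exactly; format may fall back to "".
        for candidate in (fmt, ""):
            handler = self._handlers.get((df_type, protocol, candidate))
            if handler is not None:
                return handler
        raise LookupError(f"no handler for {df_type.__name__}/{protocol}/{fmt}")


registry = HandlerRegistry()
registry.register(list, "s3", "parquet", "parquet_encoder", default_for_type=True)
registry.register(list, "s3", "", "generic_encoder")

default_choice = registry.find(list)                # uses the s3/parquet default
fallback_choice = registry.find(list, "s3", "csv")  # no csv handler, falls back to ""
```

Note that only the format has a fallback; a request for an unregistered protocol raises `LookupError` in this sketch.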
Usage
Please see the `tests/flytekit/unit/type_engines/structured_dataset/test_structured_dataset_workflow.py` test for user examples. The basic rule, though, is that if you return the dataframe itself from a task, then you'll get the default behavior (as determined by the default protocol and format). If you want to customize behavior (for example, specifying the URI that the data will be stored in), then you have to change the return signature and wrap the object returned inside the Python `StructuredDataset` class (as opposed to the `literals.StructuredDataset` model class).

Reading is done much the same as before. If you declare a task input as `pd.DataFrame`, then the default handler for `pd.DataFrame` will be used, and whatever object the decoder returns will be what the task function is called with. Alternatively you can

Migration and Compatibility
We'll be keeping the existing `FlyteSchema` object around for some time. However, there are still some backwards compatibility issues you may come across.

With this PR, `pd.DataFrame` and `pyspark.sql.DataFrame` will henceforth be stored in the Flyte backend as a `StructuredDataset` literal. Previously they were stored as `Schema` literals. We recommend that if you have any cached dataframe tasks, you bump the cache version and recompute.

This can also lead to compatibility issues between tasks. For example, if you have two tasks where `t2(a=t1())`, once you upgrade to this version of flytekit, when `t2` runs, it will try to use the new code to read the old `Schema` literal. The reverse is also true.

To handle this, we've added code to both the existing SchemaEngine and the new StructuredDataset engine to handle the other. Let us know if you have problems.
Plugins
We updated the Spark plugin and made a few fixes to the Great Expectations plugin, but most of the dataframe-related plugins remain untouched. We will be updating these in the weeks to come.
Upcoming Changes
There are a couple of changes that we felt would be better to do after we had a beta release to play with, but they should really be thought of as part of this PR.

The column and format syntax is currently `StructuredDataset[kwtypes(col_a=int), "my_parquet"]`. This is going to be replaced within the next week or two.

Separately,
Detailed Description of Changes
Core changes
Compatibility
Plugins
* `flytekit-sqlalchemy`, `flytekit-snowflake`, `flytekit-hive`, `flytekit-aws-athena` need backend changes to handle the new StructuredDataset type in addition to FlyteSchema. Probably lower priority unless someone asks for it. After backend changes are done, we can update the Python type hints.
* `flytekit-modin` seems to operate on its own type, with no interoperability with the normal `pandas.DataFrame` type, so we don't have to worry about `pandas` -> `modin` compatibility.
* `flytekit-dolt` seems to be the same as `modin`.

Tracking Issues
User submitted, not the main issue: flyteorg/flyte#523
flyteorg/flyte#2074
Follow-up issue
NA