[pr into #785] Turn structured dataset into dataclass #802

Merged 21 commits into structured-dataset-proposal on Jan 11, 2022

pingsutw (Member) commented Jan 4, 2022

Signed-off-by: Kevin Su <pingsutw@apache.org>

TL;DR

Please replace this text with a description of what this PR accomplishes.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

How did you fix the bug, make the feature etc. Link to any design docs etc

Tracking Issue

https://github.com/lyft/flyte/issues/

Follow-up issue

NA
OR
https://github.com/lyft/flyte/issues/

Signed-off-by: Kevin Su <pingsutw@apache.org>
codecov bot commented Jan 4, 2022

Codecov Report

Merging #802 (f70e32d) into structured-dataset-proposal (4f42207) will increase coverage by 0.10%.
The diff coverage is 93.05%.

@@                       Coverage Diff                       @@
##           structured-dataset-proposal     #802      +/-   ##
===============================================================
+ Coverage                        85.60%   85.71%   +0.10%     
===============================================================
  Files                              353      356       +3     
  Lines                            30465    30585     +120     
  Branches                          3674     3679       +5     
===============================================================
+ Hits                             26080    26216     +136     
+ Misses                            3716     3700      -16     
  Partials                           669      669              
Impacted Files                                          Coverage Δ
flytekit/models/types.py                                98.83% <ø> (ø)
flytekit/types/structured/structured_dataset.py         80.35% <73.68%> (+6.25%) ⬆️
tests/flytekit/unit/core/hint_handling/a.py             81.81% <81.81%> (ø)
tests/flytekit/unit/core/hint_handling/b.py             83.33% <83.33%> (ø)
tests/flytekit/unit/core/test_interface.py              85.40% <91.66%> (+0.43%) ⬆️
flytekit/core/interface.py                              82.51% <100.00%> (+0.69%) ⬆️
flytekit/core/type_engine.py                            89.13% <100.00%> (+0.13%) ⬆️
flytekit/types/structured/basic_dfs.py                  92.72% <100.00%> (+4.59%) ⬆️
...ekit/unit/core/hint_handling/test_hint_handling.py   100.00% <100.00%> (ø)
tests/flytekit/unit/core/test_type_engine.py            99.62% <100.00%> (+0.01%) ⬆️
... and 5 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4f42207...f70e32d. Read the comment docs.
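
For readers checking the headline numbers, the percentages in the coverage diff appear to be hits divided by lines, truncated (not rounded) to two decimals. That truncation rule is an inference from the figures in this report, not something taken from Codecov's documentation, and `truncate_pct` is a hypothetical helper. A minimal sketch:

```python
import math


def truncate_pct(value: float) -> float:
    """Truncate (not round) a percentage to two decimal places."""
    return math.floor(value * 100) / 100


# Hits / Lines, taken from the coverage diff in this report.
base = 26080 / 30465 * 100  # structured-dataset-proposal before the merge
head = 26216 / 30585 * 100  # after merging #802

# Assumed display rule: each figure, and the delta, is truncated separately.
base_shown = truncate_pct(base)
head_shown = truncate_pct(head)
delta_shown = truncate_pct(head - base)

print(base_shown, head_shown, delta_shown)
```

Under that assumption the delta is computed from the unrounded values, which would explain why the report says +0.10% even though 85.71 minus 85.60 is 0.11.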

wild-endeavor (Contributor) left a comment

A couple of questions.

@@ -337,7 +337,7 @@ def to_literal(
     ) -> Literal:
         # If the type signature has the StructuredDataset class, it will, or at least should, also be a
         # StructuredDataset instance.
-        if issubclass(python_type, StructuredDataset):
+        if inspect.isclass(python_type) and issubclass(python_type, StructuredDataset):
wild-endeavor (Contributor):

This works for Python 3.7-3.10, right?

pingsutw (Member, Author):

Yeah, I've tested it with Python 3.7 through 3.10.
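
To illustrate what the `inspect.isclass` guard in this hunk protects against: `issubclass` raises `TypeError` when its first argument is a typing alias such as `Annotated[...]` rather than a real class. The sketch below uses a hypothetical stand-in `StructuredDataset` class and a hypothetical helper name, not flytekit's actual code, and imports `Annotated` from `typing`, which assumes Python 3.9 or newer (older interpreters would need `typing_extensions`).

```python
import inspect
from typing import Annotated  # assumes Python 3.9+


class StructuredDataset:
    """Hypothetical stand-in for flytekit's StructuredDataset."""


def is_structured_dataset_type(python_type) -> bool:
    # Annotated[...] is a typing alias object, not a class, so a bare
    # issubclass() call would raise TypeError. inspect.isclass() returns
    # False for it, and the `and` short-circuits before issubclass runs.
    return inspect.isclass(python_type) and issubclass(python_type, StructuredDataset)


annotated = Annotated[StructuredDataset, "my_cols"]

# Demonstrate the failure mode of the unguarded call.
try:
    issubclass(annotated, StructuredDataset)
    unguarded_raises = False
except TypeError:
    unguarded_raises = True

print(unguarded_raises)
print(is_structured_dataset_type(StructuredDataset))
print(is_structured_dataset_type(annotated))
```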

@@ -390,6 +390,8 @@ def to_literal(
)

# Otherwise assume it's a dataframe instance. Wrap it with some defaults
if get_origin(python_type) is Annotated:
wild-endeavor (Contributor):

Should we have done this at the top of this function, or is it okay here?

pingsutw (Member, Author):

We extract the python_type in get_transformer rather than in to_literal, so python_type could still be Annotated here.
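
As a rough illustration of the `get_origin(python_type) is Annotated` check discussed here, the snippet below shows how an `Annotated` hint can be detected and split into its underlying type and metadata with the standard `typing` helpers. `DataFrame` is a hypothetical stand-in for a dataframe class, and the imports assume Python 3.9 or newer.

```python
from typing import Annotated, get_args, get_origin  # assumes Python 3.9+


class DataFrame:
    """Hypothetical stand-in for a dataframe class such as pandas.DataFrame."""


python_type = Annotated[DataFrame, "my_cols"]

if get_origin(python_type) is Annotated:
    # get_args returns the underlying type followed by the metadata items.
    underlying, *metadata = get_args(python_type)
else:
    # Plain classes have no origin, so they pass through unchanged.
    underlying, metadata = python_type, []

print(underlying, metadata)
```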

pingsutw and others added 17 commits January 5, 2022 01:55
Signed-off-by: Kevin Su <pingsutw@apache.org> (13 commits)
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com> (4 commits)
@@ -542,7 +551,7 @@ def _get_dataset_type(self, t: typing.Union[Type[StructuredDataset], typing.Any]
             raise ValueError(f"Unrecognized Annotated type for StructuredDataset {t}")
 
         # 2. Fill in columns by checking for StructuredDataset metadata. For example, StructuredDataset[my_cols, parquet]
-        elif issubclass(t, StructuredDataset):
+        elif inspect.isclass(t) and issubclass(t, StructuredDataset):
wild-endeavor (Contributor):

Remind me again what this inspect.isclass is supposed to catch? Can you add a comment? I keep forgetting.

pingsutw (Member, Author):

That's for Annotated[pd.DataFrame, my_col]. I've now moved expected_python_type = get_args(expected_python_type)[0] to the beginning of to_python and to_literal, so inspect.isclass(t) is no longer needed and I removed it.
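
The approach described in this reply, unwrapping `Annotated` once at the start so later checks see a plain class, could look roughly like the sketch below. The names `unwrap_annotated` and `to_literal_sketch`, the stand-in classes, and the string return values are all hypothetical and only mirror the shape of the change; this is not flytekit's implementation. It also assumes that whatever remains after unwrapping is a real class, since a bare `issubclass` would still fail on other typing aliases.

```python
from typing import Annotated, get_args, get_origin  # assumes Python 3.9+


class StructuredDataset:
    """Hypothetical stand-in for flytekit's StructuredDataset."""


class DataFrame:
    """Hypothetical stand-in for a dataframe class."""


def unwrap_annotated(t):
    # Strip the Annotated wrapper, if any, and keep only the underlying type.
    return get_args(t)[0] if get_origin(t) is Annotated else t


def to_literal_sketch(python_type) -> str:
    # Normalize first, so everything below can assume a plain class.
    python_type = unwrap_annotated(python_type)
    if issubclass(python_type, StructuredDataset):
        return "structured-dataset"
    # Otherwise assume it is a dataframe type.
    return "dataframe"


print(to_literal_sketch(Annotated[DataFrame, "my_cols"]))
print(to_literal_sketch(StructuredDataset))
```

Normalizing at the entry point keeps each later branch simple, at the cost of discarding the annotation metadata unless it is captured separately before unwrapping.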

Signed-off-by: Kevin Su <pingsutw@apache.org> (3 commits)
@pingsutw pingsutw merged commit e66f80f into structured-dataset-proposal Jan 11, 2022