Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Union Types RFCs #1926

Merged
merged 3 commits into from
Jun 27, 2022
Merged

Union Types RFCs #1926

merged 3 commits into from
Jun 27, 2022

Conversation

maximsmol
Copy link
Contributor

@maximsmol maximsmol commented Dec 9, 2021

PR'd to allow comments/discussion

@welcome
Copy link

welcome bot commented Dec 9, 2021

Thank you for opening this pull request! 🙌
These tips will help get your PR across the finish line: - Most of the repos have a PR template; if not, fill it out to the best of your knowledge. - Sign off your commits (Reference: DCO Guide).

Copy link
Contributor

@EngHabu EngHabu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an awesome document. Once we are done with this work, I would love if you can/are willing to present this work (in OSS, Blog, Official Docs).. I've a few questions about the assumptions below... happy to jump on a call if it's easier to hash out the details.


# Proposed Implementation

Use a string tag. In Haskell use the symbolic tag. In C++ use the index (serialized to a string) as the tag. In Python use the name of the type transformer (already present on all transformers).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with most of this and I appreciate the deep dive into various implementations in other languages… one goal that I would like to emphasize is that a task written in Java (or C++.. etc.) that accepts a Union should be usable from a workflow written in Python… and vis versa… I could be wrong but I got the sense that there is an assumption of homogeneity between tasks/wfs when it comes to the language used…
This will manifest in how the tag will be filled and subsequently used. We need to be able to consistently populate it across languages as well as consistently consume it in any of the languages…


In Python the correspondence between a tag and the choice of type transformer must be 1-to-1 i.e. type transformer names must be made unique which is already the case but not formalized.

The matching procedure for Haskell and C++ is trivial. In Python, we must deal specially with duplicates. For example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we do "compile" the workflows on the server side, we will need to be able to validate something like this:

@task
def task2(typing.Union[int, str]) -> int:
  return 5
 
@workflow
def my_wf() -> int:
   a = task1()
   return task2(in=a)

The system will need to know that task2's input accept task1's output (even if task1 is declared in a completely different language)... so the way we represent the type will need to be language agnostic.
Moreover, since the system is able to make this connection, it should also be able to, at compile time, to define the target type binding/choice (i.e. will it be the int or the str)...

The target container's SDK/language will also need to understand that of course and be able to map between flyteidl's representation of the target type choice (e.g. int) to its needed Union's representation (index based tag or otherwise)...

...
```

In this case, having the `MyList` tag is not enough to disambiguate the choice of variant. We iterate each candidate variant and try to match it recursively with the literal. It is not required that all type transformers can fail gracefully when given a value of any incorrect type, only that they fail on union `LiteralRepr`s and that the union type transformer can recognize its own literals. The only possible difference between how the types are resolved is in the choice of type transformers, which is made completely unambiguous by traversing the type tree with its tags (since the tags resolve the only ambiguity). This procedure thus guarantees a value will be recovered appropriately from a union IDL representation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposal is to make this matching a compile-time matching. Based on the LiteralType of the connection (task1 returns foo of type blah, try to match blah against one of the union variants of x)...
At runtime, the SDK, if needed, can use the compile-time matching information to instruct the language of which variant is desired... I say 'if needed' because in Python, as you said, we don't actually need to pass the typing information of the literal...


We could use an index-based tag, but there are multiple reasons why that is not a good idea either:

- `typing.Union` semantics are that of a set (the order does not matter for equality comparison, duplicates are eliminated), though in practice it the source ordering of the variants can be recovered in cases without duplicates. Note that the CPython implementation actually uses a tuple behind the scenes, so this is more about intent and less about factual behavior of the class. The [PEP](https://www.python.org/dev/peps/pep-0484/#union-types) also specifies that Union is expected to accept a "set" of types, though it is unclear whether this is referring to a specific data structure or just a figure of speech
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new insight for me. The whole section around colliding types within a Union and the equivalence of Unions of different variant ordering... Much appreciated. I think as long as we do not store the tag in the Literal itself and rather in the BindingInfo (that's computed for a particular task version), we will be fine, right? because as you said, a Literal produced by task1 v1 may be used to call task2 v1 in one case or task2 v2 in another... and it has to be portable... The binding, however, is computed when compiling the workflow closure (where all versions are pinned)

EngHabu
EngHabu previously approved these changes Apr 9, 2022
Copy link
Contributor

@EngHabu EngHabu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind checking DCO guide to make it happy? then we can merge this!

hackmd-deploy and others added 3 commits June 26, 2022 21:10
Signed-off-by: maximsmol <maximsmol@gmail.com>
Signed-off-by: maximsmol <maximsmol@gmail.com>
Signed-off-by: maximsmol <maximsmol@gmail.com>
Copy link
Contributor

@EngHabu EngHabu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@EngHabu EngHabu merged commit b16ffd7 into flyteorg:master Jun 27, 2022
@welcome
Copy link

welcome bot commented Jun 27, 2022

Congrats on merging your first pull request! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants