
[Alpha] Native typing alpha 0

Pre-release
@wild-endeavor wild-endeavor released this 11 Nov 19:23
· 87 commits to annotations since this release

https://pypi.org/project/flytekit/0.16.0a0/

Please refer to the design doc for additional background information.

Native Typing Support

Take a look at the examples in our cookbook. Because of changes to the Dockerfile, this PR has not been merged to master.

We recommend trying out the new interface with that repo as the starting point.
Here is a list of the salient features in the new version:

  1. Write code in Python using Python 3.7+ typing annotations. Once you are happy with the Python function, decorate it with @task. There is no need to use @inputs, @outputs, or @spark_task.
  2. A Workflow is just another python function. Use the @workflow decorator on a python function that accepts inputs and produces outputs. The inputs to the workflow function will become inputs to the workflow and the outputs will become outputs of the workflow.
    • Syntactically, workflows and tasks look the same, but semantically the contents of a workflow function are run at compile time and the contents of tasks are run at runtime. The Flyte serialization process converts the workflow function to the graph representation understood by Flyte Admin.

    • For this reason, please do not use random(), datetime.now(), etc. in the workflow function: they would be evaluated once at compile time rather than on each execution. Constants are valid in the workflow, however.

    • Currently, tasks and workflows can only be invoked with keyword arguments (t1(in1=3) as opposed to t1(3)). This ensures that we bind the correct variables.

    • Native inputs to the workflow function are transformed into Flytekit objects. That is, while tasks can accept them as inputs, they cannot be directly manipulated or accessed with Python native functions (like range for an int) inside the workflow. Attempting to do so should produce an error.

    • In the workflow, tasks can be called directly, and their outputs can be captured as variables and passed on to the next task:
      a, b = t1(x=y); return t2(x=a)

    • You can print the outputs of a task within a workflow when run locally, but they may look a little different (stay tuned)

  3. Workflows and tasks can be run locally, just as if you were invoking a Python function.
  4. The Flyte interpreter today supports translating most Python native objects, such as
    • int, float, str, datetime.datetime, datetime.timedelta, os.PathLike, TextIO, Dict[type, type], List[type], and also untyped dict
    • It also supports advanced types like pandas.DataFrame (more coming)
    • Flyte introduced 2 new types, FlyteFile and FlyteSchema
    • Most interestingly, the TypeSystem is completely extensible: new types can be added by providing the required translations (more on this later)
  5. The Flyte Task plugins have also been greatly simplified. Extending Tasks and adding new types is easier (more on this later)
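
The compile-time versus runtime distinction in item 2 can be illustrated with a toy model. The sketch below is not flytekit's implementation: Promise, toy_task, and NODES are hypothetical names invented for illustration. It only assumes what the notes above state, namely that a task called inside a workflow body yields a placeholder rather than a native value, that positional arguments are rejected, and that the workflow body runs once to record a graph.

```python
from typing import Any, Callable, Dict, List, Tuple, get_type_hints

# Hypothetical registry of recorded nodes: (task name, bound inputs).
NODES: List[Tuple[str, Dict[str, Any]]] = []


class Promise:
    """Placeholder for a value that only exists once the task actually runs."""

    def __init__(self, source: str) -> None:
        self.source = source

    def __repr__(self) -> str:
        return f"Promise({self.source})"


def toy_task(fn: Callable[..., Any]) -> Callable[..., Promise]:
    """Toy stand-in for @task: records a node instead of running fn."""
    hints = get_type_hints(fn)  # the interface is read from the annotations

    def wrapper(*args: Any, **kwargs: Any) -> Promise:
        if args:
            # Keyword-only calls make the input binding unambiguous.
            raise TypeError(f"{fn.__name__} must be called with keyword arguments")
        unknown = set(kwargs) - (set(hints) - {"return"})
        if unknown:
            raise TypeError(f"{fn.__name__} got unexpected inputs: {sorted(unknown)}")
        NODES.append((fn.__name__, kwargs))
        return Promise(f"{fn.__name__}.out")

    return wrapper


@toy_task
def t1(x: int) -> int:
    return x + 2


@toy_task
def t2(x: int) -> str:
    return f"value={x}"


def my_wf(y: Promise) -> Promise:
    a = t1(x=y)     # a is a Promise here, not an int
    return t2(x=a)  # passing it to the next task records the dependency


# Running the workflow body once records the graph; no task body executes.
result = my_wf(y=Promise("wf.y"))
print(NODES)
```

In this model, calling range(a) inside my_wf raises a TypeError because a is a placeholder, which mirrors the warning above about native functions. A call to random() in the workflow body would likewise be evaluated once, while the graph is being recorded.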

Setup

  1. If you don't already have one, create a virtualenv with Python 3.7 or higher.
  2. pip install flytekit[all]==0.16.0a0
  3. Upgrade your requirements(3).in/txt to this version as well, and run pip-compile if that is part of your process.
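
To confirm that the environment matches the steps above, a small check like the following may help. It is a sketch, not part of flytekit: check_environment is a hypothetical helper, and it assumes that on Python 3.7 the importlib_metadata backport is installed, since importlib.metadata only ships with Python 3.8 and later.

```python
import sys
from typing import Optional, Tuple

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: assumes the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version


def check_environment(package: str = "flytekit") -> Tuple[bool, Optional[str]]:
    """Return (python_is_new_enough, installed_version_or_None)."""
    python_ok = sys.version_info >= (3, 7)
    try:
        installed = version(package)
    except PackageNotFoundError:
        installed = None
    return python_ok, installed


python_ok, installed = check_environment()
print(f"Python 3.7+: {python_ok}; flytekit installed: {installed}")
```

After step 2, the reported flytekit version should be 0.16.0a0.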

Iteration

If you look through the examples in the native_typing folder, you'll see that each file has a __main__ section. That means you can run each of these examples locally. (We'll be adding unit tests for each later.)

python recipes/native_typing/simple.py
Running recipes/native_typing/simple.py main...
Running parent_wf(a=3) (5, 'world', 'world')
Running parent_wf_with_subwf_default(a=30) (32, 'world', 'world')
Running parent_wf_with_lp_with_default(a=40) (42, 'world', 'world')
Running parent_wf_with_lp_overriding_input(a=50) (52, 'world', 'world')
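
The pattern that makes each file runnable looks roughly like the sketch below. The function names and bodies are hypothetical stand-ins, not the contents of simple.py, and the decorators are omitted so that the snippet runs without flytekit installed. In the cookbook, the corresponding functions carry @task and @workflow.

```python
from typing import Tuple


def add_two(a: int) -> Tuple[int, str]:
    # Stand-in for a function that would be decorated with @task.
    return a + 2, "world"


def example_wf(a: int) -> Tuple[int, str]:
    # Stand-in for a function that would be decorated with @workflow.
    total, greeting = add_two(a=a)
    return total, greeting


if __name__ == "__main__":
    # Executed only when the file is run directly: python this_file.py
    print(f"Running example_wf(a=3) {example_wf(a=3)}")
```

Because the guard only fires when the file is executed as a script, importing the module (for example during serialization) does not trigger the local run.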

Registration

Note:
Besides the obvious interface changes, one key difference is that the instructions below use two commands rather than one to register tasks and workflows with the control plane. This two-step process has always been available, and we would like to move to it steadily as the flytectl project gains momentum. As a result, you'll need to use flyte-cli in addition to pyflyte.

From here, note that the sandbox.config file's workflow_packages setting has been scoped down to only the new native_typing folder. This is just to make the following steps quicker.

As shown above, you do not need to build any containers to play around locally. When the time comes to register your workflows with Flyte, use the following commands to serialize and register. The instructions below assume you have a Flyte installation running.

  1. If you haven't used flyte-cli before, run flyte-cli setup-config -h flyte.lyft.net (only if your Flyte admin has authentication turned on, not for local deployments). See more information here.
  2. make serialize_sandbox (this creates protobuf files in the _pb_output directory of the repo)
  3. flyte-cli register-files _pb_output/*
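
If you prefer to drive steps 2 and 3 from a script, they can be wrapped as sketched below. This is only an illustration: build_commands and serialize_and_register are hypothetical helpers, and the commands are exactly the two listed above with no extra flags. It assumes make and flyte-cli are on the PATH and that the script runs from the repo root.

```python
import glob
import subprocess
from typing import List


def build_commands(pb_dir: str = "_pb_output") -> List[List[str]]:
    """Return the serialize and register commands as argument lists."""
    # subprocess does not expand wildcards, so expand _pb_output/* here.
    pb_files = sorted(glob.glob(f"{pb_dir}/*"))
    return [
        ["make", "serialize_sandbox"],
        ["flyte-cli", "register-files", *pb_files],
    ]


def serialize_and_register(pb_dir: str = "_pb_output") -> None:
    """Run step 2, then step 3 with the files that step 2 produced."""
    subprocess.run(["make", "serialize_sandbox"], check=True)
    # Build the register command only after serialization has written files.
    register_cmd = build_commands(pb_dir)[1]
    subprocess.run(register_cmd, check=True)
```

Calling serialize_and_register() stops at the first failing command because of check=True, so registration is not attempted if serialization fails.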

At Lyft, additional instructions will be sent out.