
[Alpha] - v0.16.0a2

Pre-release
@wild-endeavor released this 08 Jan 18:06

Alpha release - v0.16.0a2 (almost final)

We've been hard at work over the holidays putting this together. This will be the final alpha release for the new natively typed Flytekit. Please see the two earlier releases, as well as the proposal doc, which will be updated again and finalized for the coming beta release; we expect to cut the beta by the end of next week.

Changes

Potentially Breaking

First, the updates in this release that may break your existing code, and what changes you'll need to make.

  • Task settings have been condensed into TaskConfig subclasses rather than being passed as loose kwargs. For instance,
    from flytekit.taskplugins.hive.task import HiveConfig, HiveTask

    HiveTask(
        # cluster_label="flyte", (old)
        task_config=HiveConfig(cluster_label="flyte"),  # (new)
        ...
    )
    
  • The Spark context is now nested inside the Spark session returned by current_context()
    import flytekit

    sess = flytekit.current_context().spark_session
    count = sess.parallelize(...)  # (old)
    count = sess.sparkContext.parallelize(...)  # (new)
    
  • Tasks that take an explicit metadata argument should be changed to use the TaskMetadata dataclass instead of the metadata() function.
    # from flytekit import metadata (old)
    from flytekit import TaskMetadata  # (new)
    # in a task declaration
    WaitForObjectStoreFile(
        metadata=metadata(retries=2),  # (old)
        metadata=TaskMetadata(retries=2),  # (new)
        ...
    )
    
  • Types have been moved to a subfolder so your imports may need to change. This was done because previously importing one custom type (say FlyteFile) would trigger imports of all the custom types in flytekit.
    # from flytekit.types import FlyteFile, FlyteSchema (old)
    from flytekit.types.file import FlyteFile  # (new)
    from flytekit.types.schema import FlyteSchema
    
  • Flytekit tasks and workflows by default assign names to the outputs, o0, o1, etc. If you want to explicitly name your outputs, you can use a typing.NamedTuple, like
    import typing

    rankings = typing.NamedTuple("Rankings", order=int)
    def t1() -> rankings:
        ...
    
    Previously, though, flytekit was accidentally unwrapping single-output named tuples into their bare value. That has been fixed, so the named field must now be accessed explicitly.
    r = t1()
    # read_ordering_task(r=r)  # (old)
    read_ordering_task(r=r.order)  # (new)
    

Process Changes

Registration of Flyte entities (that is, translating your Python tasks, workflows, and launch plans from Python code into something that Flyte understands) has always been a two-step process, even if it looks like one step. The first step is compilation (aka serialization), where your Python code is compiled down to protobuf files; the second step is sending those files over the wire to the Flyte control plane.

In this release, we've further isolated where certain settings are read and applied. That is, when you call

pyflyte --config /code/sandbox.config serialize workflows -f /tmp/output

the project, domain, and version values are no longer baked into the compiled protobuf files. kubernetes-service-account, assumable-iam-role, and output-location-prefix have also been removed. If you inspect the protobuf files via flyte-cli parse-proto -f serialized_task.pb -p flyteidl.admin.task_pb2.TaskSpec, you should see that those values are now missing. Instead, they will be filled in during the second step, by flyte-cli. Your registration command should now look like:

flyte-cli register-files -p <project> -d <domain> -v <your version, we recommend still the git sha> --kubernetes-service-account <account> OR --assumable-iam-role <role> --output-location-prefix s3://some/bucket -h flyte.host.com serialized_protos/*

Note, however, that the container image that tasks run in is still specified at serialization time (and we suggest that the image version also be the same git sha).

This change was made because serialized protos are now completely portable. You can serialize your code and hand the resulting protos to someone else, and as long as they have access to the container images specified in the task specifications, they can register them under a completely different AWS account or cloud provider.

In the near future, we hope to add some automation to our examples repo so that with each release of flytekit, we also publish the docker image for the cookbook (Second Edition) along with all the serialized artifacts so that users can just run one registration command to pick them all up and play around with them.

Other Changes

  • Resources, Auth, and custom environment variables are now piped through correctly in the task decorator
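    A minimal sketch of what this looks like (the Resources import location, the kwarg names, and the specific values here are illustrative assumptions):
    from flytekit import task, Resources

    @task(
        requests=Resources(cpu="1", mem="400Mi"),  # resource requests for the task's pod
        environment={"MY_VAR": "my-value"},        # custom env var visible inside the task
    )
    def compute() -> int:
        return 42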

  • Schedules and Notifications added to launch plans
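    A rough sketch of what this enables (the CronSchedule/Email import paths and kwargs here are assumptions; see the proposal doc for the final API):
    from flytekit import task, workflow, LaunchPlan
    from flytekit.core.schedule import CronSchedule
    from flytekit.core.notification import Email
    from flytekit.models.core.execution import WorkflowExecutionPhase

    @task
    def say() -> str:
        return "hello"

    @workflow
    def my_wf() -> str:
        return say()

    lp = LaunchPlan.create(
        "scheduled_wf",
        my_wf,
        schedule=CronSchedule(cron_expression="0 1 * * *"),  # daily at 01:00
        notifications=[
            Email(phases=[WorkflowExecutionPhase.FAILED], recipients_email=["me@example.com"]),
        ],
    )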

  • Reference Entities refactored and Reference Launch Plans added

  • Added FlyteDirectory as a parallel to FlyteFile
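    A small sketch of the parallel, assuming FlyteDirectory follows the new types import layout and, like FlyteFile, is materialized locally before use:
    import os
    from flytekit import task
    from flytekit.types.directory import FlyteDirectory

    @task
    def count_files(d: FlyteDirectory) -> int:
        # once downloaded, the directory behaves like a local path (assumption)
        return len(os.listdir(d.path))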

  • Minor cleanup of some mypy warnings

  • Shift operators (aka the runs_before function), along with an explicit node-creation call, have been introduced. For those of you familiar with the existing Flytekit (master), this style should be reminiscent.
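    A minimal sketch of both styles, assuming the create_node helper lives in flytekit.core.node_creation:
    from flytekit import task, workflow
    from flytekit.core.node_creation import create_node

    @task
    def t1():
        ...

    @task
    def t2():
        ...

    @workflow
    def wf():
        n0 = create_node(t1)
        n1 = create_node(t2)
        n0 >> n1  # shift operator: t1 runs before t2
        # equivalently: n0.runs_before(n1)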

  • Additional task types ported over (a sketch of the new task_config usage follows this list)

    • Pytorch
    • Sagemaker task and custom training task
    • Sagemaker HPO
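    These follow the same task_config pattern described above; for instance, a hedged sketch for the Pytorch plugin (the import path and the num_workers field are assumptions):
    from flytekit import task
    from flytekit.taskplugins.pytorch.task import PyTorch

    @task(task_config=PyTorch(num_workers=2))  # requests a 2-worker distributed job
    def train():
        ...
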
  • The default node and output names that Flytekit assigns have been condensed from out0, out1, etc., to o0, o1, etc. Node names have been shortened from node-0 to n0. This is just to save on disk, network, and compute.

  • Workflow metadata settings like failure policy and interruptible have been added to the workflow decorator.
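    For example (the WorkflowFailurePolicy import path and enum value are assumptions; interruptible is a plain bool):
    from flytekit import workflow
    from flytekit.core.workflow import WorkflowFailurePolicy

    @workflow(
        failure_policy=WorkflowFailurePolicy.FAIL_IMMEDIATELY,
        interruptible=True,
    )
    def wf():
        ...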