# Demo Showcasing Flyte & Feast Integration: Feature Engineering and Training Pipeline

In this demo, we will learn how to interact with Feast through Flyte. The goal will be to train a simple [Gaussian Naive Bayes model using sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html) on the [Horse-Colic dataset from UCI](https://archive.ics.uci.edu/ml/datasets/Horse+Colic).

The model aims to classify if the lesion of the horse is surgical or not. It uses a modified version of the original dataset.

**NOTE**
We will not dive into the dataset or the model, as the aim of this tutorial is to show how you can use Feast as a feature store and Flyte to engineer features that stay identical across online serving and offline training.

### Step 1: Code 💻

We have used [Flytekit](https://docs.flyte.org/projects/flytekit/en/latest/), Flyte's Python SDK, to express the pipeline in pure Python. The actual workflow code is auto-documented and rendered using Sphinx [here](https://docs.flyte.org/projects/cookbook/en/latest/auto/case_studies/feature_engineering/feast_integration/index.html).

### Step 2: Launch an execution 🚀

We can use [FlyteConsole](https://github.com/flyteorg/flyteconsole) to launch, monitor, and introspect Flyte executions. However, in our case, we will use [flytekit.remote](https://docs.flyte.org/projects/flytekit/en/latest/design/control_plane.html) to interact with the Flyte backend.

#### Set up Flytekit remote from config

To work with Flyte-sandbox, we need to create a simple local config at `~/.flyte/config`
that points to the Flyte-sandbox server and execution environment. We will initialize Flytekit remote with this server.

Example configuration:
```
[platform]
url = localhost:30081
insecure = True
```
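
Before initializing the remote, it can help to confirm that the config parses as expected. The following is a minimal sketch using only the standard library; it reads the example from an in-memory string rather than from `~/.flyte/config`, and the helper name `read_platform_settings` is hypothetical (it is not part of Flytekit).

```python
import configparser

# Same content as the example configuration above.
EXAMPLE_CONFIG = """
[platform]
url = localhost:30081
insecure = True
"""


def read_platform_settings(config_text: str) -> dict:
    """Parse the [platform] section of an INI-style config (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    platform = parser["platform"]
    return {
        "url": platform.get("url"),
        # getboolean understands "True"/"False", "yes"/"no", "1"/"0", and so on.
        "insecure": platform.getboolean("insecure", fallback=False),
    }


settings = read_platform_settings(EXAMPLE_CONFIG)
print(settings)
```

To check a real file instead, the same parser could be pointed at the path with `parser.read(os.path.expanduser("~/.flyte/config"))`.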

We will also pin FlyteRemote to one project and domain.

**NOTE** The integration also sets up access to S3 or other equivalent datastores needed by Feast.

In [None]:
from flytekit.remote import FlyteRemote
remote = FlyteRemote.from_config("flytesnacks", "development")

#### Retrieve the latest registered version of the pipeline

FlyteRemote provides convenient methods to retrieve a version of the pipeline from the remote server.

**NOTE** It is possible to get a specific version of the workflow and trigger a launch for that, but let's just get the latest.

In [None]:
lp = remote.fetch_launch_plan(name="feast_integration.feast_workflow.feast_workflow")
lp.id.version

#### Launch

`remote.execute` simplifies starting an execution for the launch plan. Let's use the default inputs.

In [None]:
execution = remote.execute(lp, inputs={})
print(f"http://localhost:30081/console/projects/{execution.id.project}/domains/{execution.id.domain}/executions/{execution.id.name}")
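
The console link printed above follows a fixed pattern of project, domain, and execution name. If you need that link in more than one place, a small helper keeps the pattern in one spot. This is a sketch: `console_url` is a hypothetical name, and the base URL is assumed to be the sandbox address used throughout this demo.

```python
from urllib.parse import quote


def console_url(base_url: str, project: str, domain: str, name: str) -> str:
    """Build a FlyteConsole link for an execution (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters cannot break the URL.
    segments = [quote(part, safe="") for part in (project, domain, name)]
    return "{}/console/projects/{}/domains/{}/executions/{}".format(
        base_url.rstrip("/"), *segments
    )


# Illustrative values; in the notebook these would come from `execution.id`.
url = console_url("http://localhost:30081/", "flytesnacks", "development", "abc123")
print(url)
```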

### Step 3: Wait for the execution to complete 

It is possible to launch an execution synchronously and wait for it to complete, but since all the processes are completely detached (you can even close your laptop and come back to it later), we will show how to sync the execution state back.

**Side Note**
It is possible to fetch an existing execution or simply retrieve a started execution. Also, if you launch an execution with a name that already exists, Flyte will respect that and not start a new execution!

In [None]:
from flytekit.models.core.execution import WorkflowExecutionPhase

synced_execution = remote.sync(execution)
print(f"Execution {synced_execution.id.name} is in Phase - {WorkflowExecutionPhase.enum_to_string(synced_execution.closure.phase)}")
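
A single sync only reports the phase at that moment. To block until the execution reaches a terminal phase, you can sync repeatedly with a timeout. Below is a minimal, generic polling sketch; `wait_for` is a hypothetical helper, and the phase strings stand in for a real execution so that the example runs without a Flyte backend.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed names of terminal phases, used here only as plain strings.
TERMINAL_PHASES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED_OUT"}


def wait_for(
    refresh: Callable[[], T],
    is_done: Callable[[T], bool],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> T:
    """Call `refresh` until `is_done` accepts its result or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = refresh()
        if is_done(state):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("execution did not reach a terminal phase in time")
        time.sleep(poll_seconds)


# Stand-in for repeated syncs of a remote execution.
fake_phases = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
final_phase = wait_for(
    refresh=lambda: next(fake_phases),
    is_done=lambda phase: phase in TERMINAL_PHASES,
    poll_seconds=0.01,
    timeout_seconds=5.0,
)
print(final_phase)
```

With the objects from this notebook, `refresh` could plausibly be `lambda: remote.sync(execution)` and `is_done` could inspect `closure.phase`; that wiring is an assumption and is not exercised here.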

### Step 4: Retrieve output

Let's fetch the workflow outputs.

In [None]:
from feast_dataobjects import FeatureStore

# "raw_outputs" in FlyteRemote lets you associate a type with the output and resolves Literals to Python objects.
# For example, a data class is returned as a (serialized) marshmallow schema when "outputs" is used,
# but as a data class instance when "raw_outputs" is used.
fs = synced_execution.raw_outputs.get("o0", FeatureStore)
model = synced_execution.outputs['o1']

Next, we inspect the feature store configuration and model. 

In [None]:
fs.config

**NOTE** The output model is available locally as a JobLibSerialized file, which can be downloaded and loaded.

In [None]:
model

### Step 5: Cool, let's predict

We now have the model and feature store with us! So, how can we generate predictions? We can simply re-use the `predict` function from the workflow; Flytekit will automatically manage the IO for us.

**NOTE** We set a few environment variables to facilitate AWS access.

#### Load features from an online feature store

Let's re-use the feature definition from the Flyte workflow.

```python
inference_point = fs.get_online_features(FEAST_FEATURES, [{"Hospital Number": "533738"}])
```

In [None]:
import os

from feast_workflow import predict, FEAST_FEATURES

os.environ["FLYTE_AWS_ENDPOINT"] = os.environ["FEAST_S3_ENDPOINT_URL"] = "http://localhost:30084/"
os.environ["FLYTE_AWS_ACCESS_KEY_ID"] = os.environ["AWS_ACCESS_KEY_ID"] = "minio"
os.environ["FLYTE_AWS_SECRET_ACCESS_KEY"] = os.environ["AWS_SECRET_ACCESS_KEY"] = "miniostorage"

inference_point = fs.get_online_features(FEAST_FEATURES, [{"Hospital Number": "533738"}])

inference_point
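
The cell above writes the credentials into `os.environ` for the rest of the session. If you would rather scope them to a single block, a context manager can restore the previous values afterwards. This is a sketch using only the standard library; `temporary_env` is a hypothetical helper, not something Flytekit or Feast provides.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides: str):
    """Set environment variables for the duration of a `with` block, then restore them."""
    previous = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old_value in previous.items():
            if old_value is None:
                # The variable did not exist before, so remove it again.
                os.environ.pop(key, None)
            else:
                os.environ[key] = old_value


before = os.environ.get("AWS_ACCESS_KEY_ID")
with temporary_env(AWS_ACCESS_KEY_ID="minio", AWS_SECRET_ACCESS_KEY="miniostorage"):
    inside = os.environ["AWS_ACCESS_KEY_ID"]
after = os.environ.get("AWS_ACCESS_KEY_ID")
print(inside)
```

The feature lookup and the `predict` call would then go inside the `with` block, since both need the credentials.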

#### Generate a prediction

Notice how we are passing the serialized model and some loaded features.

In [None]:
predict(model_ser=model, features=inference_point)

### Next Steps 

We can, of course, observe the intermediates from the workflow in the UI and download the intermediate data.

### Future 🔮

We want to improve the integration experience further so that the `predict` function can run both in an inference server and in a workflow. We are almost there, but we need to remove `model de-serialization` from the `predict` method.