# Demo of running a Flyte + Feast feature engineering and training pipeline
In this demo we will learn how to interact with Feast through Flyte. The goal will be to train a simple [Gaussian Naive Bayes model using sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html) on the [Horse-Colic dataset from UCI](https://archive.ics.uci.edu/ml/datasets/Horse+Colic).

**NOTE**
We will not dive deeply into the dataset or the model. The aim of this tutorial is to show how you can use Feast as the feature store, and use Flyte to engineer features that stay identical across online predictions and offline training.

## Step 1: Check out the code for the pipeline
We have used [flytekit](https://docs.flyte.org/projects/flytekit/en/latest/), Flyte's Python SDK, to express the pipeline in pure Python. The code is auto-documented and rendered using Sphinx [here]().

## Step 2: Launch an execution
We can use the [FlyteConsole](https://github.com/flyteorg/flyteconsole) to launch, monitor and introspect Flyte executions, but in this case we will use [flytekit.remote](https://docs.flyte.org/projects/flytekit/en/latest/design/control_plane.html) to interact with the Flyte backend.

### Set up FlyteRemote from config
To work with the Flyte sandbox, we have created a simple local config that points to the sandbox server and execution environment. We will initialize `FlyteRemote` with this server and pin it to one project (`flytesnacks`) and one domain (`development`).

**Note:** this config also sets up access to S3, or an equivalent datastore such as MinIO in the sandbox, which Feast needs.

```python
from flytekit.remote import FlyteRemote

# Default project and domain used by all subsequent remote calls
remote = FlyteRemote.from_config("flytesnacks", "development")
```

```
Using default config file at /Users/ketanumare/.flyte/config
No images specified, will use the default image
```


### Retrieve the latest registered version of the pipeline
`FlyteRemote` provides convenient methods to retrieve a version of the pipeline from the remote server.

**Note:** It is possible to fetch a specific version of the workflow and launch that, but here we will just fetch the latest version.

```python
# from feast_integration.feast_workflow import feast_workflow
# No version is passed, so the latest registered version of the launch plan is fetched
lp = remote.fetch_launch_plan(name="feast_integration.feast_workflow.feast_workflow")
lp.id.version
```

```
'86b9fe59988c2b91eb9a852048b6c3179939198a'
```

### Launch an execution
`remote.execute` makes it simple to start an execution of the launch plan. We will not provide any inputs and will just use the defaults.

```python
exe = remote.execute(lp, inputs={})
print(f"http://localhost:30081/console/projects/{exe.id.project}/domains/{exe.id.domain}/executions/{exe.id.name}")
```

```
http://localhost:30081/console/projects/flytesnacks/domains/development/executions/f9f180a56e67b4c9781e
```
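The URL printed above follows the sandbox console's path layout. A small helper that builds it from an execution's project, domain and name might look like the sketch below; `console_url` is a hypothetical name, and the base URL `http://localhost:30081` is assumed from the sandbox setup used in this demo.

```python
def console_url(project: str, domain: str, name: str,
                base: str = "http://localhost:30081") -> str:
    """Build the FlyteConsole URL for an execution (hypothetical helper).

    Assumes the sandbox console is served at `base`, as in this demo.
    """
    return f"{base}/console/projects/{project}/domains/{domain}/executions/{name}"


# Usage with the identifiers of the execution launched above
print(console_url("flytesnacks", "development", "f9f180a56e67b4c9781e"))
```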


## Step 3: Wait for the execution to complete
It is possible to launch an execution synchronously and wait for it to complete. However, because executions run fully detached from the client (you can even close your laptop and come back later), we will show how to sync the execution state back instead.

```python
from flytekit.models.core.execution import WorkflowExecutionPhase
exe = remote.sync(exe)
print(f"Execution {exe.id.name} is in Phase - {WorkflowExecutionPhase.enum_to_string(exe.closure.phase)}")
```

```
Execution f9f180a56e67b4c9781e is in Phase - SUCCEEDED
```
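Syncing once returns the current phase. To block until the execution finishes, one common pattern is to poll: sync, check whether the phase is terminal, and sleep otherwise. The sketch below keeps Flyte out of the picture by taking a `get_phase` callable (hypothetical; in this notebook it could wrap `remote.sync(exe)` and `WorkflowExecutionPhase.enum_to_string`), and it assumes SUCCEEDED, FAILED, ABORTED and TIMED_OUT are the terminal phase names.

```python
import time
from typing import Callable

# Phase names assumed to be terminal (as strings from WorkflowExecutionPhase.enum_to_string)
TERMINAL_PHASES = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED_OUT"}


def wait_for_terminal_phase(get_phase: Callable[[], str],
                            poll_interval_s: float = 5.0,
                            timeout_s: float = 600.0) -> str:
    """Poll `get_phase` until it returns a terminal phase or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"execution still in phase {phase} after {timeout_s}s")
        time.sleep(poll_interval_s)


# Usage with a fake phase sequence standing in for repeated remote.sync(exe) calls
phases = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
print(wait_for_terminal_phase(lambda: next(phases), poll_interval_s=0.0))  # SUCCEEDED
```

In the notebook, `get_phase` might be `lambda: WorkflowExecutionPhase.enum_to_string(remote.sync(exe).closure.phase)`; passing it in as a callable keeps the loop testable without a running backend.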


```python
exe.sync()
```

## Step 4: Let's sync data from this execution

**Side Note**
It is possible to fetch an existing execution, or to retrieve one you started earlier. Also, if you launch an execution with the same name as an existing one, Flyte respects the name and will not start a new execution!

To fetch an execution by name:
```python
exe = remote.fetch_workflow_execution(name='fdde7d53867b74cd9885')
exe = remote.sync(exe)
```
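The same-name behaviour described above makes launches idempotent: submitting an existing name again returns the existing execution rather than starting a new one. The in-memory sketch below models only that semantics; `Execution`, `ExecutionRegistry` and `launch` are hypothetical and are not part of flytekit.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Execution:
    name: str
    inputs: dict


@dataclass
class ExecutionRegistry:
    """Hypothetical in-memory stand-in for the backend's execution store."""
    executions: Dict[str, Execution] = field(default_factory=dict)

    def launch(self, name: str, inputs: dict) -> Execution:
        # If an execution with this name already exists, return it unchanged
        existing = self.executions.get(name)
        if existing is not None:
            return existing
        execution = Execution(name=name, inputs=inputs)
        self.executions[name] = execution
        return execution


registry = ExecutionRegistry()
first = registry.launch("f9f180a56e67b4c9781e", {})
second = registry.launch("f9f180a56e67b4c9781e", {})
print(first is second)  # True: the second launch did not create a new execution
```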

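```python
from feast_dataobjects import FeatureStore

# Output o0: the feature store, read back as a FeatureStore object
fs = exe.raw_outputs.get('o0', FeatureStore)
# Output o1: the trained model file
model = exe.outputs['o1']
```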
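The keys `o0` and `o1` come from flytekit's default naming for outputs that are returned positionally (as a tuple): the i-th unnamed output is called `o{i}`. The sketch below mimics that convention in plain Python so the mapping from the workflow's return values to `exe.outputs` is explicit; `name_outputs` is a hypothetical helper, not a flytekit function, and the string values are placeholders.

```python
from typing import Any, Dict, Tuple


def name_outputs(values: Tuple[Any, ...]) -> Dict[str, Any]:
    """Assign flytekit-style default names (o0, o1, ...) to positional outputs."""
    return {f"o{i}": value for i, value in enumerate(values)}


# In this workflow the first output is the feature store and the second the model
outputs = name_outputs(("feature_store", "model.joblib.dat"))
print(outputs)  # {'o0': 'feature_store', 'o1': 'model.joblib.dat'}
```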

#### Let's inspect the feature store configuration

```python
fs.config
```

```
FeatureStoreConfig(registry_path='registry.db', project='horsecolic', s3_bucket='feast-integration', online_store_path='online.db')
```

#### The model is also available as a joblib-serialized file, which can be downloaded locally and loaded

```python
model
```

```
/var/folders/hs/n83kv4c57c9bcnlpg66mh6gw0000gn/T/flytetkd8_fqj/control_plane_metadata/local_flytekit/e51f130fdbe204ce483f3e80ef78c032/model.joblib.dat
```

```python
from feast_workflow import predict, FEAST_FEATURES

fs1 = fs._build_feast_feature_store()
fs1.config
```

```python
inference_point = fs.get_online_features(FEAST_FEATURES, [{"Hospital Number": 5290409}])
inference_point
```

```
{'nasogastric reflux PH': [None],
 'abdominal distension': [None],
 'surgical lesion': [None],
 'rectal temperature': [None],
 'outcome': [None],
 'total protein': [None],
 'packed cell volume': [None],
 'nasogastric tube': [None],
 'peripheral pulse': [None],
 'Hospital Number': [5290409]}
```
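The online response is columnar: each feature name maps to a list with one value per entity row that was requested. A model usually wants one record per entity instead, so a conversion step like the sketch below is handy; `to_rows` is a hypothetical helper, and the input is a trimmed copy of the response shown above.

```python
from typing import Any, Dict, List


def to_rows(response: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Convert a columnar response (feature -> values per entity) into one dict per entity row."""
    lengths = {len(values) for values in response.values()}
    if len(lengths) > 1:
        raise ValueError("all feature columns must have the same length")
    n_rows = lengths.pop() if lengths else 0
    return [{name: values[i] for name, values in response.items()} for i in range(n_rows)]


# Trimmed copy of the online response above (one entity row)
response = {"rectal temperature": [None], "outcome": [None], "Hospital Number": [5290409]}
print(to_rows(response))
# [{'rectal temperature': None, 'outcome': None, 'Hospital Number': 5290409}]
```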

```python
predict(model_ser=model, features=inference_point)
```

```
{"asctime": "2021-09-28 09:46:29,348", "name": "flytekit", "levelname": "ERROR", "message": "Exception when executing Input contains NaN, infinity or a value too large for dtype('float64').", "exc_info": "Traceback (most recent call last):\n  File \"/Users/ketanumare/src/flytekit/flytekit/core/base_task.py\", line 464, in dispatch_execute\n    native_outputs = self.execute(**native_inputs)\n  File \"/Users/ketanumare/src/flytekit/flytekit/core/python_function_task.py\", line 157, in execute\n    return exception_scopes.user_entry_point(self._task_function)(**kwargs)\n  File \"/Users/ketanumare/.virtualenvs/flyte-feast/lib/python3.8/site-packages/wrapt/wrappers.py\", line 566, in __call__\n    return self._self_wrapper(self.__wrapped__, self._self_instance,\n  File \"/Users/ketanumare/src/flytekit/flytekit/common/exceptions/scopes.py\", line 198, in user_entry_point\n    return wrapped(*args, **kwargs)\n  File \"/Users/ketanumare/src/flytesnacks/cookbook/case

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
```

Every feature the online store returned for this entity is `None`, so the model input contains missing values and the prediction fails.
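A cheap guard is to check the online response for missing values before calling `predict`. The sketch below works on the columnar dict shape shown above (feature name mapped to a list with one value per entity row); `find_missing_features` is a hypothetical helper, and treating `None` or NaN as "missing" is an assumption about what the model cannot accept.

```python
import math
from typing import Any, Dict, List


def find_missing_features(response: Dict[str, List[Any]], row: int = 0) -> List[str]:
    """Return the names of features whose value in `row` is None or NaN."""
    missing = []
    for name, values in response.items():
        value = values[row]
        if value is None or (isinstance(value, float) and math.isnan(value)):
            missing.append(name)
    return missing


# A trimmed copy of the online response shown above
inference_point = {
    "rectal temperature": [None],
    "packed cell volume": [None],
    "Hospital Number": [5290409],
}
print(find_missing_features(inference_point))
# ['rectal temperature', 'packed cell volume']
```

If the returned list is non-empty, the caller can skip the prediction or impute the values, rather than letting the model raise.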

```python
from flytekit.configuration import aws
from flytekit.configuration import set_flyte_config_file

set_flyte_config_file('/Users/ketanumare/.flyte/config')
print(aws.S3_ACCESS_KEY_ID.get())
print(aws.S3_ENDPOINT.get())
print(aws.S3_SECRET_ACCESS_KEY.get())
```

```
minio
http://localhost:30084
miniostorage
```


```
!cat /Users/ketanumare/.flyte/config
```

```ini
[platform]
url=localhost:30081
insecure=True

[credentials]
client_id=flytectl
redirect_uri=http://localhost:53593/callback
oauth_scopes=offline,all
authorization_metadata_key=flyte-authorization
auth_mode=standard

[aws]
s3_endpoint=http://localhost:30084
s3_access_key_id=minio
s3_secret_access_key=miniostorage
```
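The `[aws]` section is what gives the workflow, and Feast, access to the sandbox's MinIO object store. A quick way to sanity-check such a file is to parse it with Python's standard `configparser`. The sketch below reads the relevant sections of the config above from a string; `load_s3_settings` is a hypothetical helper.

```python
import configparser
from typing import Dict

# The [platform] and [aws] sections of the sandbox config printed above
SANDBOX_CONFIG = """
[platform]
url=localhost:30081
insecure=True

[aws]
s3_endpoint=http://localhost:30084
s3_access_key_id=minio
s3_secret_access_key=miniostorage
"""


def load_s3_settings(text: str) -> Dict[str, str]:
    """Parse a Flyte-style INI config and return its [aws] settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("aws"):
        raise KeyError("config has no [aws] section")
    return dict(parser["aws"])


settings = load_s3_settings(SANDBOX_CONFIG)
print(settings["s3_endpoint"])  # http://localhost:30084
```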
