# Titanic KFP Example

This example demonstrates how to use managed AI Pipelines, Cloud AI Platform (CAIP) Training, and CAIP Prediction with the Kaggle Titanic dataset. We will build a Kubeflow Pipelines (KFP) pipeline that will:
1. Split the dataset into train and test sets.
1. Perform feature engineering on the train dataset and apply the same feature engineering to the test dataset.
1. Train a Keras model on the data (optionally on Cloud AI Platform Training).
1. Run a hyperparameter search over the Keras model.
1. Push the best Keras model to Cloud AI Platform Prediction.
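
Step 2 above means that any statistic used for feature engineering is learned from the train data only and then reused unchanged on the test data. The sketch below illustrates that pattern with median imputation of the `Age` column. The function names, the choice of imputation, and the sample rows are assumptions made for illustration; they are not taken from this example's components.

```python
from statistics import median


def fit_age_imputer(train_rows):
    """Learn the fill value from the training rows only."""
    ages = [float(r["Age"]) for r in train_rows if r["Age"] != ""]
    return {"age_median": median(ages)}


def apply_age_imputer(rows, params):
    """Fill missing ages using the statistic learned on the train set."""
    return [
        {**r, "Age": float(r["Age"]) if r["Age"] != "" else params["age_median"]}
        for r in rows
    ]


# Tiny made-up rows for illustration only (not real Titanic records).
train_rows = [{"Age": "20"}, {"Age": ""}, {"Age": "40"}]
test_rows = [{"Age": ""}, {"Age": "50"}]

params = fit_age_imputer(train_rows)               # fit on train only
train_clean = apply_age_imputer(train_rows, params)
test_clean = apply_age_imputer(test_rows, params)  # reuse train statistics
```

Keeping the fit and apply steps separate is what prevents information from the test set leaking into the features.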

## Prereqs:
1. Install the [KFP SDK](https://www.kubeflow.org/docs/pipelines/sdk/install-sdk/).
1. Create an AI Pipelines instance.
1. Copy the [Kaggle Titanic train.csv and test.csv](https://www.kaggle.com/c/titanic) to a GCS bucket.

In [2]:
import os
import kfp

In [8]:
input_data = "gs://xoonij-titanic-mlops/input.csv"
train_output = "gs://xoonij-titanic-mlops/train.csv"
val_output = "gs://xoonij-titanic-mlops/val.csv"

In [4]:
# Load the reusable split component from its YAML definition.
split_dataset_op = kfp.components.load_component_from_file(
    os.path.join('components', 'split_dataset', 'component.yaml'))
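
The implementation behind `component.yaml` is not shown in this notebook. As a rough idea of the kind of logic a split step could wrap, here is a minimal sketch of a seeded shuffle-and-split in plain Python. The function name `split_rows`, the 20% validation fraction, and the seed are all assumptions for illustration, not the component's actual behavior.

```python
import random


def split_rows(rows, val_fraction=0.2, seed=42):
    """Shuffle rows deterministically and return (train, val)."""
    shuffled = list(rows)                  # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Stand-in data: ten row indices instead of real CSV rows.
rows = list(range(10))
train, val = split_rows(rows)
```

Using a fixed seed means repeated pipeline runs produce the same split, which makes downstream results easier to compare.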

In [5]:
@kfp.dsl.pipeline(
    name="Titanic KFP Pipeline",
    description="Example pipeline using the Titanic Dataset."
)
def titanic_kfp_pipeline(input_data: str,
                         train_output: str,
                         val_output: str):
    """Kubeflow pipeline example for the Titanic dataset."""
    # Single step for now: split the input CSV into train and validation files.
    split_dataset_op(input_data, train_output, val_output)

In [6]:
kfp.compiler.Compiler().compile(titanic_kfp_pipeline, 'titanic-kfp-pipeline.zip')

In [7]:
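client = kfp.Client(host='bdcf010b13b422b-dot-us-central2.pipelines.googleusercontent.com')
my_experiment = client.create_experiment(name='titanic-kfp-pipeline')
# The pipeline parameters have no defaults, so pass the GCS paths defined above.
my_run = client.run_pipeline(
    my_experiment.id, 'titanic-kfp-pipeline', 'titanic-kfp-pipeline.zip',
    params={'input_data': input_data, 'train_output': train_output, 'val_output': val_output})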