## Housing Price Prediction Using Kubeflow and Feast

* Predict housing prices using Feast and Kubeflow

Set up the notebook:
- Install `feast` with pip.
- Activate user service account with credentials JSON.
- Hacks to retrieve essential information for deployments and serving.

**NOTE**: This code block might hang for a long time.

In [6]:
import demo_util
demo_util.notebook_setup()

INFO:root:Adding /home/jovyan/LinearModel/fairing to path


In [7]:
import importlib
importlib.reload(demo_util)
working_dir = "/home/jovyan/LinearModel"

In [8]:
PROJECT, ZONE, APP_NAME = demo_util.get_project_config()
print('PROJECT =', PROJECT)
print('APP_NAME =', APP_NAME)
print('ZONE =', ZONE)

PROJECT = aliz-development
APP_NAME = kubeflow-asia
ZONE = asia-southeast1-a


In [30]:
# fairing:include-cell
import fairing
import sys
import importlib
import uuid
import logging
import os
import json
import requests
import pandas as pd
import numpy as np
from retrying import retry
from feast.sdk.resources.entity import Entity
from feast.sdk.resources.storage import Storage
from feast.sdk.resources.feature import Feature, Datastore, ValueType
from feast.sdk.resources.feature_set import FeatureSet, FileType
import feast.specs.FeatureSpec_pb2 as feature_pb

from feast.sdk.importer import Importer
from feast.sdk.client import Client

In [10]:
# Connect to the Feast deployment
FEAST_CORE_URL = '10.148.0.99:6565'
FEAST_SERVING_URL = '10.148.0.100:6566'
STAGING_LOCATION = 'gs://kubecon-19-gojek/staging'
fs = Client(core_url=FEAST_CORE_URL,serving_url=FEAST_SERVING_URL, verbose=True)

## Load precomputed feature data

In [11]:
df = pd.read_csv('usa_housing.csv', index_col=False)
df.head()

| Unnamed: 0 | avg_area_income | avg_area_house_age | avg_area_number_of_rooms | avg_area_number_of_bedrooms | area_population | price | area_code | timestamp |
|---|---|---|---|---|---|---|---|---|
| 0 | 79545.458574 | 5.682861 | 7.009188 | 4.09 | 23086.800503 | 1059034.0 | NE 37010-5101 | 2018-01-01T00:00:00 |
| 1 | 79248.642455 | 6.0029 | 6.730821 | 3.09 | 40173.072174 | 1505891.0 | CA 48958 | 2018-01-01T00:00:00 |
| 2 | 61287.067179 | 5.86589 | 8.512727 | 5.13 | 36882.1594 | 1058988.0 | WI 06482-3489 | 2018-01-01T00:00:00 |
| 3 | 63345.240046 | 7.188236 | 5.586729 | 3.26 | 34310.242831 | 1260617.0 | FPO AP 44820 | 2018-01-01T00:00:00 |
| 4 | 59982.197226 | 5.040555 | 7.839388 | 4.23 | 26354.109472 | 630943.5 | FPO AE 09386 | 2018-01-01T00:00:00 |


## Register entity and features

In [12]:
# Create importer
importer = Importer.from_df(df, 
                           entity='usa_housing', 
                           owner='user@website.com',  
                           staging_location=STAGING_LOCATION,
                           id_column='area_code', 
                           timestamp_column='timestamp',
                           serving_store=Datastore(id='SERVING'),
                           warehouse_store=Datastore(id='WAREHOUSE'))

# Update feature and entity metadata. Ideally you want to update these manually
# so that they contain adequate information for the next user
importer.entity.description = 'entity level description' 
for feature_id in importer.features:
    importer.features[feature_id].description = 'feature level description'
    
# Ingest the feature data into the store
fs.run(importer, apply_features=True, apply_entity=True)

Successfully applied entity with name: usa_housing
---
name: usa_housing
description: entity level description

Successfully applied feature with id: usa_housing.avg_area_income
---
id: usa_housing.avg_area_income
name: avg_area_income
owner: user@website.com
description: feature level description
valueType: DOUBLE
entity: usa_housing
dataStores:
  serving:
    id: SERVING
  warehouse:
    id: WAREHOUSE

Successfully applied feature with id: usa_housing.avg_area_house_age
---
id: usa_housing.avg_area_house_age
name: avg_area_house_age
owner: user@website.com
description: feature level description
valueType: DOUBLE
entity: usa_housing
dataStores:
  serving:
    id: SERVING
  warehouse:
    id: WAREHOUSE

Successfully applied feature with id: usa_housing.avg_area_number_of_rooms
---
id: usa_housing.avg_area_number_of_rooms
name: avg_area_number_of_rooms
owner: user@website.com
description: feature level description
valueType: DOUBLE
entity: usa_housing
dataStores:
  serving:
    id: SERV

'feastimport1558387886119'

## Define a Feature Set for this project

In [13]:
ENTITY_ID = 'usa_housing'
TRAINING_FEATURES_SET = [
    'usa_housing.avg_area_income',
    'usa_housing.avg_area_house_age',
    'usa_housing.avg_area_number_of_rooms',
    'usa_housing.avg_area_number_of_bedrooms',
    'usa_housing.area_population',
    'usa_housing.price'
]

feature_set = FeatureSet(entity=ENTITY_ID, 
                         features=TRAINING_FEATURES_SET)
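
The feature IDs above follow the pattern `<entity>.<feature>`. The sketch below uses plain Python only (no Feast calls) to show how the DataFrame column names and the serving subset can be derived from that list; the model class later in this notebook serves with the same features minus `price`. The helper names `column_name` and `serving_features` are hypothetical and exist only for this illustration.

```python
# Illustrative only: plain-Python helpers, not part of the Feast SDK.
TRAINING_FEATURES_SET = [
    'usa_housing.avg_area_income',
    'usa_housing.avg_area_house_age',
    'usa_housing.avg_area_number_of_rooms',
    'usa_housing.avg_area_number_of_bedrooms',
    'usa_housing.area_population',
    'usa_housing.price',
]
LABEL = 'usa_housing.price'


def column_name(feature_id):
    """Return the part after the entity prefix, e.g. 'price'."""
    return feature_id.split('.', 1)[1]


def serving_features(feature_ids, label):
    """Drop the label: it is the prediction target, not a serving input."""
    return [f for f in feature_ids if f != label]


columns = [column_name(f) for f in TRAINING_FEATURES_SET]
serving = serving_features(TRAINING_FEATURES_SET, LABEL)
print(columns)
print(serving)
```

Keeping the label last in the training list matters here, because the training code in this notebook relies on `price` being the final selected column.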

## Retrieve a Training Set from Feast

In [14]:
# Retrieve feature data for training from Feast
dataset = fs.create_dataset(feature_set, "2018-01-01", "2018-01-31")
training_df = fs.download_dataset_to_df(dataset, STAGING_LOCATION)

creating training dataset for features: ['usa_housing.avg_area_income', 'usa_housing.avg_area_house_age', 'usa_housing.avg_area_number_of_rooms', 'usa_housing.avg_area_number_of_bedrooms', 'usa_housing.area_population', 'usa_housing.price']
created dataset usa_housing_1558387901327_20180101_20180131: aliz-development.fs_usa_housing.1558387901327_20180101_20180131


## Train Linear Model

In [72]:
# fairing:include-cell
class HousingModel(object):
  """Model class."""
  SERVING_FEATURE_SET = [
        'usa_housing.avg_area_income',
        'usa_housing.avg_area_house_age',
        'usa_housing.avg_area_number_of_rooms',
        'usa_housing.avg_area_number_of_bedrooms',
        'usa_housing.area_population']

  def __init__(self):
    self.m = None
    self.b = None
    self.fs = None
    self.serving_fs = None

    logging.basicConfig(level=logging.INFO,
        format=('%(levelname)s|%(asctime)s'
                '|%(pathname)s|%(lineno)d| %(message)s'),
        datefmt='%Y-%m-%dT%H:%M:%S',
        )
    logging.getLogger().setLevel(logging.INFO)

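  def train(self, training_df):
    """Fit price = m . x + b by ordinary least squares."""
    np.set_printoptions(precision=3)
    # Feature IDs look like 'usa_housing.<column>'; keep only the column name.
    columns = [x.split('.')[1] for x in TRAINING_FEATURES_SET]
    train_data = training_df[columns].to_numpy()
    # The last selected column is 'price' (the label). Overwrite it with ones
    # so that it acts as the intercept column of the design matrix.
    train_data[:, -1] = 1
    Y = training_df['price'].to_numpy()

    x = np.linalg.lstsq(train_data, Y, rcond=0)[0]
    # All coefficients except the last are slopes; the last is the intercept.
    m, b = x[:-1], x[-1]

    self.m = m
    self.b = b
    return m, b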

  def predict(self, feature_id, feature_names):
    logging.info('feature_id = %s', feature_id)
    logging.info('feature_names = %s', feature_names)
    # Lazily load the saved parameters and create the Feast client on first use.
    if any([i is None for i in [self.m, self.b, self.fs, self.serving_fs]]):
      with open('simple_model.dat', 'r') as f:
        model = json.load(f)
        self.m = np.array(model.get('m', []))
        self.b = float(model.get('b', 0))

        _FEAST_CORE_URL = model['FEAST_CORE_URL']
        _FEAST_SERVING_URL = model['FEAST_SERVING_URL']
        _ENTITY_ID = model['ENTITY_ID']

        logging.info('FEAST_CORE_URL: %s', _FEAST_CORE_URL)
        logging.info('FEAST_SERVING_URL: %s', _FEAST_SERVING_URL)
        logging.info('ENTITY_ID: %s', _ENTITY_ID)
        logging.info('FEATURES_SET: %s', self.SERVING_FEATURE_SET)

        self.fs = Client(core_url=_FEAST_CORE_URL,
            serving_url=_FEAST_SERVING_URL,
            verbose=True)
        self.serving_fs = FeatureSet(
            entity=_ENTITY_ID,
            features=self.SERVING_FEATURE_SET)

    features = self.fs.get_serving_data(
        self.serving_fs,
        entity_keys=[feature_id])
    X = features.to_numpy()[0][1:]
    logging.info('X: %s', str(X))

    return [sum(self.m * X) + self.b]

  def save_model(self, model_path):
    """Save the model parameters and Feast connection settings to a JSON file."""

    model = {
        'm': self.m.tolist(),
        'b': self.b,
        'FEAST_CORE_URL': FEAST_CORE_URL,
        'FEAST_SERVING_URL': FEAST_SERVING_URL,
        'ENTITY_ID': ENTITY_ID,
    }
    
    logging.info('Saving model to %s', model_path)

    with open(model_path, 'w+') as f:
        json.dump(model, f)
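
The `train` method fits the intercept by overwriting the label column with ones before calling `np.linalg.lstsq`. The following is a minimal sketch of that trick on synthetic, noise-free data; the data, the coefficients `3`, `-2`, `5`, and the variable names are made up for illustration and are not taken from the housing dataset.

```python
import numpy as np

# Synthetic data (made up for illustration): label = 3*x1 - 2*x2 + 5.
rng = np.random.default_rng(0)
features = rng.uniform(0, 10, size=(50, 2))
label = 3.0 * features[:, 0] - 2.0 * features[:, 1] + 5.0

# Mimic the notebook's layout: feature columns followed by the label column.
data = np.column_stack([features, label])

# Overwrite the label column with ones so it becomes the intercept column.
design = data.copy()
design[:, -1] = 1.0

# Solve min ||design @ coeffs - label||; the last coefficient is the intercept.
coeffs = np.linalg.lstsq(design, label, rcond=None)[0]
m, b = coeffs[:-1], coeffs[-1]
print(m, b)
```

Because the synthetic label is an exact linear function of the features, the fit should recover the slopes and intercept up to floating-point error. On real data such as the housing set, the result is only the least-squares approximation.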

## Train Locally 

In [81]:
model = HousingModel()
m, b = model.train(training_df)
print(m, b)

[2.158e+01 1.656e+05 1.207e+05 1.651e+03 1.520e+01] -2637299.0333282975


## Save the model

In [20]:
MODEL_FILE = 'simple_model.dat'


model_path = os.path.join(os.getcwd(), MODEL_FILE)
model.save_model(model_path)

INFO:root:Saving model to /home/jovyan/LinearModel/simple_model.dat
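
Despite the `.dat` extension, the saved file is plain JSON, which is why `predict` can restore it with `json.load`. A minimal round-trip sketch follows, using a temporary directory and placeholder values; the coefficients and the `*.example` URLs are hypothetical and do not point at a real deployment.

```python
import json
import os
import tempfile

# Hypothetical parameters; the URLs are placeholders, not a real deployment.
saved = {
    'm': [1.5, -0.5],
    'b': 10.0,
    'FEAST_CORE_URL': 'core.example:6565',
    'FEAST_SERVING_URL': 'serving.example:6566',
    'ENTITY_ID': 'usa_housing',
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'simple_model.dat')
    # Write the model as JSON, then read it back from the same path.
    with open(path, 'w') as f:
        json.dump(saved, f)
    with open(path) as f:
        loaded = json.load(f)

print(loaded == saved)
```

NumPy arrays are not JSON serializable, which is why `save_model` converts `self.m` with `.tolist()` before dumping.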


## Local Prediction

In [82]:
model.predict('FPO AE 09386', None)


INFO:root:feature_id = FPO AE 09386
INFO:root:feature_names = None
INFO:root:FEAST_CORE_URL: 10.148.0.99:6565
INFO:root:FEAST_SERVING_URL: 10.148.0.100:6566
INFO:root:ENTITY_ID: usa_housing
INFO:root:FEATURES_SET: ['usa_housing.avg_area_income', 'usa_housing.avg_area_house_age', 'usa_housing.avg_area_number_of_rooms', 'usa_housing.avg_area_number_of_bedrooms', 'usa_housing.area_population']
INFO:root:X: [59982.19722570803 5.040554523106283 7.839387785120487 4.23
 26354.109472103148]


[845388.7662961711]
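
The prediction is the dot product of the coefficients with the logged feature vector, plus the intercept. As a rough sanity check, the sketch below redoes that arithmetic in plain Python using the coefficients as printed by the training cell. Those printed values are rounded, so the result is expected to be close to, but not identical to, the reported prediction.

```python
# Coefficients as printed by the training cell (rounded in that output).
m = [2.158e+01, 1.656e+05, 1.207e+05, 1.651e+03, 1.520e+01]
b = -2637299.0333282975

# Feature values logged for entity key 'FPO AE 09386'.
X = [59982.19722570803, 5.040554523106283, 7.839387785120487, 4.23,
     26354.109472103148]

# Linear model: prediction = m . X + b
prediction = sum(mi * xi for mi, xi in zip(m, X)) + b

# Value returned by model.predict in the notebook output.
reported = 845388.7662961711
relative_gap = abs(prediction - reported) / reported
print(prediction, relative_gap)
```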

## Use fairing to build the Docker image

* This uses the append builder to rapidly build Docker images

In [43]:
GCP_PROJECT = fairing.cloud.gcp.guess_project_name()
DOCKER_REGISTRY = 'gcr.io/{}/fairing-job'.format(GCP_PROJECT)
PY_VERSION = ".".join([str(x) for x in sys.version_info[0:3]])
base_image = "gcr.io/aliz-development/kubecon-demo/notebook:v20190520-67db96e-dirty-c5f145"

In [83]:
from fairing.builders import append
import fairing_util
preprocessor = fairing_util.ConvertNotebookPreprocessorWithFire("HousingModel")

if not preprocessor.input_files:
    preprocessor.input_files = set()

# Bake the model into the container    
input_files=["simple_model.dat"]
preprocessor.input_files =  set([os.path.normpath(f) for f in input_files])
preprocessor.preprocess()
builder = append.append.AppendBuilder(registry=DOCKER_REGISTRY,
                                      base_image=base_image, preprocessor=preprocessor)
builder.build()


INFO:root:Creating docker context: /tmp/fairing.context.tar.gz
INFO:root:Adding files to context: [PosixPath('ames-feast.py'), 'simple_model.dat']
INFO:root:Context: /tmp/fairing.context.tar.gz, Adding /home/jovyan/LinearModel/fairing/fairing/__init__.py at /app/fairing/__init__.py
INFO:root:Context: /tmp/fairing.context.tar.gz, Adding /home/jovyan/LinearModel/fairing/fairing/runtime_config.py at /app/fairing/runtime_config.py
INFO:root:Context: /tmp/fairing.context.tar.gz, Adding ames-feast.py at /app/ames-feast.py
INFO:root:Context: /tmp/fairing.context.tar.gz, Adding simple_model.dat at /app/simple_model.dat
INFO:root:Loading Docker credentials for repository 'gcr.io/aliz-development/kubecon-demo/notebook:v20190520-67db96e-dirty-c5f145'
INFO:root:Invoking 'docker-credential-gcloud' to obtain Docker credentials.
INFO:root:Successfully obtained Docker credentials.
INFO:root:Loading Docker credentials for repository 'gcr.io/aliz-development/fairing-job/fairing-job:31CB2869'
INFO:root:I

## Deploy with Kubeflow

In [87]:
from fairing.deployers import serving
import fairing_util
pod_spec = builder.generate_pod_spec()

module_name = os.path.splitext(preprocessor.executable.name)[0]
deployer = serving.serving.Serving(module_name + ".HousingModel",
                                   service_type="ClusterIP",
                                   labels={"app": "ames"})

url = deployer.deploy(pod_spec)

# Pass the name directly; wrapping it in print() would log "None".
logging.info("Created deployment %s", deployer.deployment.metadata.name)

INFO:root:Cluster endpoint: http://fairing-service-m6k78.kubeflow.svc.cluster.local
INFO:root:Created deployment fairing-deployer-mh9kb


In [78]:
!kubectl describe deploy {deployer.deployment.metadata.name}

Name:                   fairing-deployer-sxhqp
Namespace:              kubeflow
CreationTimestamp:      Mon, 20 May 2019 22:23:30 +0000
Labels:                 app=ames
                        fairing-deployer=serving
                        fairing-id=e53c78f2-7b4d-11e9-91dd-a6a881dd7379
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=ames,fairing-deployer=serving,fairing-id=e53c78f2-7b4d-11e9-91dd-a6a881dd7379
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=ames
           fairing-deployer=serving
           fairing-id=e53c78f2-7b4d-11e9-91dd-a6a881dd7379
  Containers:
   model:
    Image:      gcr.io/aliz-development/fairing-job/fairing-job:18E3B5BE
    Port:       <none>
    Host Port:  <none>
    Command:
      seldon-core-microserv

In [25]:
!kubectl get deploy -o yaml {deployer.deployment.metadata.name}

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2019-05-20T21:34:08Z"
  generateName: fairing-deployer-
  generation: 1
  labels:
    app: ames
    fairing-deployer: serving
    fairing-id: ffd4b492-7b46-11e9-91dd-a6a881dd7379
  name: fairing-deployer-hfk9z
  namespace: kubeflow
  resourceVersion: "2319324"
  selfLink: /apis/extensions/v1beta1/namespaces/kubeflow/deployments/fairing-deployer-hfk9z
  uid: ffd70fc7-7b46-11e9-852c-42010a9400a1
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: ames
      fairing-deployer: serving
      fairing-id: ffd4b492-7b46-11e9-91dd-a6a881dd7379
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: ames
        fairing-deployer: serv

## Call the prediction endpoint

In [85]:
@retry(wait_exponential_multiplier=1000, wait_exponential_max=5000,
       stop_max_delay=2*60*1000)
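def predict(url, entity_id):
    """POST an entity key to the prediction endpoint and return the response."""
    pdata = {
        'strData': entity_id,
    }
    serialized_data = json.dumps(pdata)
    # The payload is sent as a form field named 'json'.
    r = requests.post(url, data={'json': serialized_data}, timeout=5)
    return r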
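
The `@retry` arguments configure capped exponential backoff: waits grow with each failed attempt up to `wait_exponential_max` (5000 ms), and retrying stops once `stop_max_delay` (2 minutes) has elapsed. The sketch below is a standalone approximation of such a schedule, not the `retrying` package's implementation. The function name `backoff_schedule` is hypothetical, and the exact formula (multiplier times 2 to the power of the attempt number) is an assumption about how the waits are computed.

```python
def backoff_schedule(multiplier_ms, max_ms, retries):
    """Capped exponential waits, in milliseconds, before each retry.

    Assumption: the wait before retry n is multiplier_ms * 2**n,
    capped at max_ms. This mirrors the decorator arguments only loosely.
    """
    return [min(multiplier_ms * 2 ** n, max_ms) for n in range(1, retries + 1)]


# Same multiplier and cap as the decorator above.
waits = backoff_schedule(1000, 5000, 5)
print(waits)
```

Note that the decorator retries only when the wrapped function raises an exception, for example a connection error or timeout while the service is still starting. An HTTP error status returned by the server does not trigger a retry unless the function raises on it.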

In [86]:
full_url = url + ":5000/predict"
r = predict(full_url, 'CA 48958')
if r.ok:
    logging.info("Response: %s", r.content)
else:
    logging.error("Prediction failed; %s", r.content)

INFO:root:Response: b'{"data":{"tensor":{"shape":[1],"values":[1494937.691618489]}},"meta":{}}\n'
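
The response body is JSON with the prediction nested under `data.tensor.values`. A minimal sketch of extracting the value follows, using the exact bytes logged above; the helper name `extract_prediction` is hypothetical, and the shape check assumes a one-dimensional tensor as in this response.

```python
import json

# Response body copied from the log output above.
content = (b'{"data":{"tensor":{"shape":[1],'
           b'"values":[1494937.691618489]}},"meta":{}}\n')


def extract_prediction(raw):
    """Return the single predicted value from a tensor-style JSON response."""
    payload = json.loads(raw)
    tensor = payload['data']['tensor']
    values = tensor['values']
    # Assumes a 1-D tensor: the declared shape must match the value count.
    if tensor['shape'] != [len(values)]:
        raise ValueError('unexpected tensor shape: %r' % tensor['shape'])
    return values[0]


price = extract_prediction(content)
print(price)
```

For comparison, the `price` column for `CA 48958` in the loaded data is 1505891.0.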
