### Overview
Predicting an outcome with a linear regression model is a common machine learning use case. In this tutorial, we build a model that predicts the best driver.

The basic local mode lets you try Feast quickly.

This tutorial uses Feast with scikit-learn to train a model locally.
 

## Step 1: Install Feast and scikit-learn

Install both packages with pip.


In [1]:
!pip install feast scikit-learn



#### Check the Feast version

In [2]:
!feast version 

Feast SDK Version: "feast 0.21.2"
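
The same check can be made from Python, which is handy when the notebook needs to confirm both packages before continuing. This is a minimal sketch using only the standard library; the helper name `installed_version` is an illustrative choice, not part of Feast.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages this tutorial depends on.
for name in ("feast", "scikit-learn"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```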


## Step 2: Clone the Git repo

Clone the driver ranking Git repository into your Colab folder.

In [3]:
!git clone https://github.com/juskuz/feast-driver-ranking-demo-aitech.git

Cloning into 'feast-driver-ranking-demo-aitech'...
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 11 (delta 0), reused 11 (delta 0), pack-reused 0
Unpacking objects: 100% (11/11), done.


## Step 3: Apply and deploy feature definitions

`feast apply` scans the Python files in the given repository directory for feature definitions and deploys infrastructure according to `feature_store.yaml`.

In [4]:
# %%shell
# cd /content/feast-driver-ranking-demo-aitech/driver_ranking/
!feast -c feast-driver-ranking-demo-aitech/driver_ranking/ apply

Created entity driver_id
Created feature view driver_hourly_stats

Created sqlite table driver_ranking_driver_hourly_stats



### Inspect the files created under your local folder

In [5]:
%%shell
cd /content/feast-driver-ranking-demo-aitech/driver_ranking/data/
ls -l 

total 20
-rw-r--r-- 1 root root 16384 May 22 00:09 online_store.db
-rw-r--r-- 1 root root   950 May 22 00:09 registry.db
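
Since `feast apply` reported creating a sqlite table, `online_store.db` can be opened with Python's built-in `sqlite3` module to see which tables exist. The sketch below defines a hypothetical helper, `list_tables`, and demonstrates it on an in-memory database so it runs anywhere; the demo table's columns are made up and are not Feast's actual schema.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Demo on an in-memory database. The table name mirrors the one reported by
# `feast apply`, but the columns here are placeholders for illustration.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE driver_ranking_driver_hourly_stats (entity_key BLOB, value BLOB)"
)
print(list_tables(demo))  # ['driver_ranking_driver_hourly_stats']
```

To inspect the real store, pass `sqlite3.connect("/content/feast-driver-ranking-demo-aitech/driver_ranking/data/online_store.db")` to `list_tables` instead of the in-memory connection.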




## Step 4: Train your model

In [6]:
import feast
from joblib import dump
import pandas as pd
from sklearn.linear_model import LinearRegression

# Load driver order data
orders = pd.read_csv("/content/feast-driver-ranking-demo-aitech/driver_orders.csv", sep="\t")
orders["event_timestamp"] = pd.to_datetime(orders["event_timestamp"])

# Connect to your feature store provider
fs = feast.FeatureStore(repo_path="/content/feast-driver-ranking-demo-aitech/driver_ranking")
        
# Retrieve historical training data from the offline store
training_df = fs.get_historical_features(
    entity_df=orders,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips"
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head())

# Train model
target = "trip_completed"

reg = LinearRegression()
train_X = training_df[training_df.columns.drop(target).drop("event_timestamp")]
train_Y = training_df.loc[:, target]
reg.fit(train_X[sorted(train_X)], train_Y)

# Save model
dump(reg, "driver_model.bin")


----- Feature schema -----

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 360 to 3615
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype              
---  ------           --------------  -----              
 0   event_timestamp  10 non-null     datetime64[ns, UTC]
 1   driver_id        10 non-null     int64              
 2   trip_completed   10 non-null     int64              
 3   conv_rate        10 non-null     float32            
 4   acc_rate         10 non-null     float32            
 5   avg_daily_trips  10 non-null     int32              
dtypes: datetime64[ns, UTC](1), float32(2), int32(1), int64(2)
memory usage: 440.0 bytes
None

----- Example features -----

               event_timestamp  driver_id  trip_completed  conv_rate  \
360  2021-04-16 20:29:28+00:00       1001               1   0.701558   
721  2021-04-17 04:29:28+00:00       1002               0   0.775499   
1082 2021-04-17 12:29:28+00:00       1003               0   0

['driver_model.bin']
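
`get_historical_features` joins each row of `orders` with the feature values that were known at that row's `event_timestamp`, which keeps later feature values from leaking into the training data. The idea can be sketched in plain Python as follows. The data is made up, the helper `latest_value_as_of` is hypothetical, and this is not how Feast implements the join (Feast also takes a feature view's `ttl` into account, which is omitted here).

```python
from bisect import bisect_right
from datetime import datetime

# Made-up feature history for one driver: (timestamp, conv_rate), sorted by time.
history = [
    (datetime(2021, 4, 16, 18), 0.50),
    (datetime(2021, 4, 16, 20), 0.70),
    (datetime(2021, 4, 17, 4), 0.78),
]


def latest_value_as_of(history, event_time):
    """Return the most recent value recorded at or before event_time, else None."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_time)
    return history[idx - 1][1] if idx else None


# An order at 20:29 sees the 20:00 value, never the later 04:00 one.
print(latest_value_as_of(history, datetime(2021, 4, 16, 20, 29)))  # 0.7
# An order placed before any feature row exists gets no value.
print(latest_value_as_of(history, datetime(2021, 4, 16, 12)))  # None
```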

## Step 5: Materialize your online store
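`feast materialize-incremental` loads the latest feature values from the offline store into the online store, up to the given end timestamp, so that they can be served at prediction time.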

In [7]:
!cd /content/feast-driver-ranking-demo-aitech/driver_ranking/ && feast materialize-incremental 2022-01-01T00:00:00

Materializing 1 feature views to 2022-01-01 00:00:00+00:00 into the sqlite online store.

driver_hourly_stats from 2021-05-23 00:10:48+00:00 to 2022-01-01 00:00:00+00:00:
100%|████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 335.99it/s]
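
Conceptually, materialization keeps only the newest feature row per entity up to the end timestamp and writes it to a key-value store for fast lookup. The sketch below illustrates that reduction with made-up rows and a plain dictionary standing in for the online store; the function name `materialize` is hypothetical, and the incremental bookkeeping Feast does (starting each run where the previous one ended) is left out.

```python
from datetime import datetime

# Made-up offline rows: (driver_id, event_timestamp, conv_rate).
offline_rows = [
    (1001, datetime(2021, 12, 30), 0.61),
    (1001, datetime(2021, 12, 31), 0.70),
    (1002, datetime(2021, 12, 31), 0.42),
    (1002, datetime(2022, 1, 2), 0.99),  # after the cutoff, so it is skipped
]


def materialize(rows, end_date):
    """Keep only the newest value per driver with a timestamp <= end_date."""
    newest = {}
    for driver_id, ts, conv_rate in rows:
        if ts > end_date:
            continue
        if driver_id not in newest or ts > newest[driver_id][0]:
            newest[driver_id] = (ts, conv_rate)
    # Drop the timestamps: the online store only serves the latest value.
    return {driver_id: value for driver_id, (_, value) in newest.items()}


online_store = materialize(offline_rows, datetime(2022, 1, 1))
print(online_store)  # {1001: 0.7, 1002: 0.42}
```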


## Step 6: Make a prediction

In [8]:
import pandas as pd
import feast
from joblib import load


class DriverRankingModel:
    def __init__(self):
        # Load model
        self.model = load("/content/driver_model.bin")

        # Set up feature store
        self.fs = feast.FeatureStore(repo_path="/content/feast-driver-ranking-demo-aitech/driver_ranking/")

    def predict(self, driver_ids):
        # Read features from Feast
        driver_features = self.fs.get_online_features(
            entity_rows=[{"driver_id": driver_id} for driver_id in driver_ids],
            features=[
                "driver_hourly_stats:conv_rate",
                "driver_hourly_stats:acc_rate",
                "driver_hourly_stats:avg_daily_trips",
            ],
        )
        df = pd.DataFrame.from_dict(driver_features.to_dict())

        # Make prediction
        df["prediction"] = self.model.predict(df[sorted(df)])

        # Choose best driver
        best_driver_id = df["driver_id"].iloc[df["prediction"].argmax()]

        # return best driver
        return best_driver_id

In [9]:
def make_drivers_prediction():
    drivers = [1001, 1002, 1003, 1004]
    model = DriverRankingModel()
    best_driver = model.predict(drivers)
    print(f"Prediction for best driver id: {best_driver}")

In [10]:
make_drivers_prediction()

Prediction for best driver id: 1003
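
`predict` returns only the single best driver. If the full ordering is of interest, the per-driver scores can be sorted instead of reduced with `argmax`. The following is a small sketch with made-up scores; `rank_drivers` is a hypothetical helper and not part of the notebook's `DriverRankingModel`.

```python
def rank_drivers(predictions):
    """Return driver ids ordered from highest to lowest predicted score."""
    return sorted(predictions, key=predictions.get, reverse=True)


# Made-up scores keyed by driver id; in the notebook these would come from
# the `prediction` column built inside `DriverRankingModel.predict`.
scores = {1001: 0.31, 1002: 0.12, 1003: 0.87, 1004: 0.55}
ranking = rank_drivers(scores)
print(ranking)     # [1003, 1004, 1001, 1002]
print(ranking[0])  # 1003, the same driver argmax would pick
```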
