In this notebook, I build a small pipeline that takes a single query point, preprocesses and featurizes it, and passes it to the trained model for prediction. Checking how the model behaves on a single query point is an important part of modeling.

In the `03-Modeling-FI.ipynb` notebook, I noticed that the __Gradient Boosting__ ensemble classifier outperformed all the other models, including the Random Forest and XGBoost classifiers. Although the train loss of the Random Forest and XGBoost classifiers is negligible, their cross-validation loss is higher, which suggests that both models are overfitting.

__1. Packages__

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
from IPython.display import display

In [3]:
from matplotlib import pyplot as plt
from matplotlib import style
# Note: matplotlib >= 3.6 renamed this style to 'seaborn-v0_8-deep'.
style.use(style='seaborn-deep')

In [4]:
from tabulate import tabulate

In [5]:
import numpy as np
import os
import pandas as pd
import pickle
import seaborn as sns

In [6]:
from sklearn.preprocessing import MinMaxScaler

__2. Features and target__

In [7]:
features = ['alpha', 'delta', 'u', 'g', 'r', 'i', 'z', 'redshift']
target = 'class'

__3. Fetch the raw data__

In [8]:
def fetch_data(features):
    """
    This function fetches the raw data.
    """
    data = {f: [float(input("  '{}': ".format(f)))] for f in features}
    df = pd.DataFrame(data=data)
    print("Raw data is fetched successfully.")
    return df

In [9]:
df = fetch_data(features=features)

  'alpha': 20
  'delta': 20
  'u': 20
  'g': 20
  'r': 20
  'i': 20
  'z': 20
  'redshift': 20
Raw data is fetched successfully.
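
Because `fetch_data` relies on `input()`, it cannot be reused from a script or a test, and a non-numeric entry raises a bare `ValueError` from `float`. The block below is a minimal sketch of a non-interactive alternative. `parse_query` is a hypothetical helper that is not part of the original notebook; it returns the same single-row shape (`{feature: [value]}`) that `fetch_data` passes to `pd.DataFrame`.

```python
import math

FEATURES = ['alpha', 'delta', 'u', 'g', 'r', 'i', 'z', 'redshift']


def parse_query(raw, features=FEATURES):
    """
    Hypothetical helper (not in the original notebook).
    Validates a mapping of raw values and returns {feature: [float]},
    the same single-row layout that fetch_data builds.
    """
    missing = [f for f in features if f not in raw]
    if missing:
        raise ValueError("Missing features: {}".format(missing))

    row = {}
    for f in features:
        try:
            value = float(raw[f])
        except (TypeError, ValueError) as exc:
            raise ValueError(
                "Feature '{}' is not numeric: {!r}".format(f, raw[f])
            ) from exc
        # Reject NaN and infinities, which float() accepts silently.
        if not math.isfinite(value):
            raise ValueError("Feature '{}' is not finite.".format(f))
        row[f] = [value]
    return row


# Same query as the interactive run above: every feature set to 20.
query = parse_query({f: "20" for f in FEATURES})
```

The resulting `query` can be passed to `pd.DataFrame(data=query)` in place of the interactive prompt.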


__4. Preprocess the raw data__

In [10]:
def preprocess(df, features):
    """
    This function preprocesses the raw data.
    """
    with open(file='scaling.pkl', mode='rb') as pre_pkl:
        scaling = pickle.load(file=pre_pkl)
    
    df = scaling.transform(X=df)
    df = pd.DataFrame(data=df, columns=features)
    return df

In [11]:
df = preprocess(df=df, features=features)
display(df)

      alpha     delta         u         g         r         i         z  redshift
0  0.055541  0.373981  0.413298  0.472173  0.515344  0.464463  0.525388   2.84993
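
Note that the scaled `redshift` (2.84993) is greater than 1. This is what min-max scaling produces when a query value lies above the maximum seen while the scaler was fitted. The sketch below illustrates the formula, assuming `scaling.pkl` holds a fitted `MinMaxScaler` with the default `(0, 1)` range (the import in section 1 suggests this, but the notebook does not confirm it). The training range used here is hypothetical and is not taken from the real data.

```python
def min_max_scale(x, data_min, data_max):
    """Min-max scaling to the (0, 1) range: (x - min) / (max - min)."""
    if data_max == data_min:
        raise ValueError("data_min and data_max must differ.")
    return (x - data_min) / (data_max - data_min)


# Hypothetical training range for one feature (illustrative values only).
train_min, train_max = 0.0, 7.0

inside = min_max_scale(3.5, train_min, train_max)    # within the fitted range
outside = min_max_scale(20.0, train_min, train_max)  # above the fitted maximum

print(inside, outside)
```

A scaled value outside [0, 1] is therefore a useful hint that the query point lies outside the range the model was trained on, so its prediction should be treated with care.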


__5. Feature engineering on preprocessed data__

In [12]:
def featurize(df):
    """
    This function featurizes the dataframe.
    It selects the important features obtained using Random Forest (RF).
    Please refer to the 02-Modeling and 03-Modeling-FI notebooks.
    """
    fi_cols = ['g', 'redshift', 'g-r', 'i-z', 'u-r', 'i-r', 'z-r']
    df['g-r'] = df['g'] - df['r']
    df['i-z'] = df['i'] - df['z']
    df['u-r'] = df['u'] - df['r']
    df['i-r'] = df['i'] - df['r']
    df['z-r'] = df['z'] - df['r']
    df = df[fi_cols]
    return df

In [13]:
df = featurize(df=df)
display(df)

          g  redshift       g-r        i-z        u-r        i-r       z-r
0  0.472173   2.84993  -0.04317  -0.060926  -0.102045  -0.050881  0.010045
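
As a quick sanity check, the engineered columns can be recomputed by hand from the scaled values shown in section 4. The sketch below does this with plain dictionaries. `color_features` is a hypothetical stand-in for `featurize`; unlike `featurize`, which adds columns to the DataFrame it receives, it builds a new mapping and leaves its input untouched.

```python
# Scaled values copied from the preprocessing output in section 4.
scaled = {
    'u': 0.413298, 'g': 0.472173, 'r': 0.515344,
    'i': 0.464463, 'z': 0.525388, 'redshift': 2.84993,
}


def color_features(row):
    """
    Hypothetical stand-in for featurize: returns the seven selected
    features as a new dict without modifying the input.
    """
    return {
        'g': row['g'],
        'redshift': row['redshift'],
        'g-r': row['g'] - row['r'],
        'i-z': row['i'] - row['z'],
        'u-r': row['u'] - row['r'],
        'i-r': row['i'] - row['r'],
        'z-r': row['z'] - row['r'],
    }


feats = color_features(scaled)
for name, value in feats.items():
    print("{:>8}: {:.6f}".format(name, value))
```

The recomputed differences agree with the displayed row up to rounding in the last digit, since the inputs here are the already-rounded six-decimal values.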


__6. Predictions__

In [14]:
def prediction(X):
    """
    This function predicts the class of the datapoint.
    """
    with open(file='gb_classifier.pkl', mode='rb') as m_pkl:
        clf, sig_clf = pickle.load(file=m_pkl)
    
    pred = sig_clf.predict(X=X)
    print("Predicted class: {}".format(pred))

In [15]:
prediction(X=df)

Predicted class: ['QSO']


__7. Machine learning pipeline__

In [16]:
def ml_application(features):
    """
    This is a local machine learning application.
    """
    print("Please provide the data for below features.")
    df = fetch_data(features=features)
    df = preprocess(df=df, features=features)
    df = featurize(df=df)
    prediction(X=df)

In [18]:
ml_application(features=features)

Please provide the data for below features.
  'alpha': 11
  'delta': 1
  'u': 11
  'g': 1
  'r': 11
  'i': 11
  'z': 11
  'redshift': 0
Raw data is fetched successfully.
Predicted class: ['STAR']
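
`ml_application` prints the prediction rather than returning it, which makes the pipeline hard to reuse or test. The block below is a minimal sketch of one alternative: chaining the steps as plain functions and returning the final result. `make_pipeline` and the three `toy_*` steps are hypothetical placeholders; they only illustrate the composition and do not reproduce the real scaler, feature selection, or trained classifier.

```python
from functools import reduce


def make_pipeline(*steps):
    """Chain single-argument callables left to right and return the result."""
    def run(value):
        return reduce(lambda acc, step: step(acc), steps, value)
    return run


# Toy stand-ins (hypothetical) for preprocess, featurize and the model.
def toy_preprocess(row):
    # Placeholder scaling, NOT the fitted scaler from scaling.pkl.
    return {k: v / 100.0 for k, v in row.items()}


def toy_featurize(row):
    # Keeps one magnitude and one difference, mirroring the idea of featurize.
    return {'g': row['g'], 'g-r': row['g'] - row['r']}


def toy_predict(feats):
    # Placeholder rule with made-up labels, NOT the trained classifier.
    return 'A' if feats['g-r'] < 0 else 'B'


predict_one = make_pipeline(toy_preprocess, toy_featurize, toy_predict)
label = predict_one({'g': 1.0, 'r': 11.0})
print("Predicted label:", label)
```

With the real steps, each function would take and return a DataFrame, and the last step would return the output of `sig_clf.predict` instead of printing it.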
