In this notebook, I build a pipeline that feeds a query point to the trained model and prints its prediction. This is an important step in modeling, because it checks how the model behaves on a single query point rather than on a whole test set.

From the `03-Modeling-FI.ipynb` notebook, I noticed that the __Gradient Boosting__ ensemble classifier outperformed all the other models, including the Random Forest and XGBoost classifiers. Although the train loss of the Random Forest and XGBoost classifiers is negligible, their cross-validation loss is much higher, which indicates that both models are overfitting.
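The overfitting argument rests on the gap between train loss and cross-validation loss. Below is a minimal sketch of that comparison, assuming the loss in question is multiclass log loss; the function name and the probability values are hypothetical and only illustrate how a near-zero train loss can sit next to a much larger validation loss.

```python
import numpy as np

def multiclass_log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: integer class indices, shape (n,)
    proba:  predicted class probabilities, shape (n, k), rows summing to 1
    """
    # clip so that log(0) never occurs for a confident wrong prediction
    proba = np.clip(np.asarray(proba, dtype=float), eps, 1 - eps)
    y_true = np.asarray(y_true)
    # pick, for each row, the probability given to that row's true class
    p_true = proba[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(p_true)))

# Hypothetical predictions for two points whose true classes are 0 and 1.
y = [0, 1]
train_proba = [[0.99, 0.005, 0.005], [0.005, 0.99, 0.005]]  # memorized training data
cv_proba = [[0.60, 0.30, 0.10], [0.20, 0.50, 0.30]]        # held-out folds

train_loss = multiclass_log_loss(y, train_proba)
cv_loss = multiclass_log_loss(y, cv_proba)
gap = cv_loss - train_loss  # a large positive gap is the overfitting signal
```

A model that generalizes well keeps `gap` small; the Random Forest and XGBoost results described above correspond to the case where `train_loss` is close to zero but `cv_loss` is not.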

__1. Packages__

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
from IPython.display import display

In [3]:
import numpy as np
import os
import pandas as pd
import pickle

__2. Features and target__

In [4]:
features = ['alpha', 'delta', 'u', 'g', 'r', 'i', 'z', 'redshift']
target = 'class'

__3. Fetch the raw data__

In [5]:
def fetch_data(features):
    """
    This function fetches the raw data.
    """
    data = {f: [float(input("  '{}': ".format(f)))] for f in features}
    df = pd.DataFrame(data=data)
    print("Raw data is fetched successfully.")
    return df
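`fetch_data` reads every value through `input()`, which makes the pipeline hard to test or call from other code. A minimal non-interactive sketch, assuming the same feature order as the `features` list above; `build_query` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

FEATURES = ['alpha', 'delta', 'u', 'g', 'r', 'i', 'z', 'redshift']

def build_query(values, features=FEATURES):
    """Build a one-row DataFrame from a dict instead of prompting with input()."""
    missing = [f for f in features if f not in values]
    if missing:
        raise KeyError(f"missing features: {missing}")
    # float() raises ValueError on non-numeric input, like the interactive version
    row = {f: [float(values[f])] for f in features}
    # keep the column order of `features`, presumably the order the scaler was fitted on
    return pd.DataFrame(data=row, columns=features)

query = build_query({'alpha': 130, 'delta': 110, 'u': 1, 'g': 1,
                     'r': 1, 'i': 1, 'z': 1, 'redshift': 0})
```

The resulting `query` has the same shape and columns as the DataFrame returned by `fetch_data` for the inputs shown below, so it could be passed to `preprocess` in its place.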

In [6]:
df = fetch_data(features=features)

  'alpha': 130
  'delta': 110
  'u': 1
  'g': 1
  'r': 1
  'i': 1
  'z': 1
  'redshift': 0
Raw data is fetched successfully.


__4. Preprocess the raw data__

In [7]:
def preprocess(df, features):
    """
    This function scales the raw data with the fitted scaler loaded from scaling.pkl.
    """
    with open(file='scaling.pkl', mode='rb') as pre_pkl:
        scaling = pickle.load(file=pre_pkl)
    
    df = scaling.transform(X=df)
    df = pd.DataFrame(data=df, columns=features)
    return df
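`preprocess` only calls `transform` on a scaler that was fitted earlier and pickled. The notebook does not say which scaler `scaling.pkl` holds, so the sketch below uses a hypothetical `SimpleStandardScaler` purely to show the fit-once, persist, reload, transform-only pattern. Refitting at inference time would be wrong: a single query row has zero variance, so its statistics say nothing about the training distribution.

```python
import io
import pickle
import numpy as np

class SimpleStandardScaler:
    """Hypothetical stand-in for whatever transformer is stored in scaling.pkl."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        self.scale_[self.scale_ == 0] = 1.0  # avoid division by zero for constant columns
        return self

    def transform(self, X):
        # uses only the statistics learned in fit(); nothing is re-estimated here
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_

# Training time: fit on (made-up) training data and persist the fitted object.
train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
buf = io.BytesIO()  # in-memory stand-in for open('scaling.pkl', 'wb')
pickle.dump(SimpleStandardScaler().fit(train), buf)

# Inference time: reload the fitted scaler and only call transform().
buf.seek(0)
scaler = pickle.load(buf)
query_scaled = scaler.transform([[3.0, 30.0]])
```

Here the query equals the training mean, so both scaled values come out as zero.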

In [8]:
df = preprocess(df=df, features=features)
display(df)

|   | alpha | delta | u | g | r | i | z | redshift |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.361102 | 1.268287 | -0.458855 | -0.473307 | -0.446692 | -0.373591 | -0.435595 | 0.00142 |


__5. Feature engineering on preprocessed data__

In [9]:
def featurize(df):
    """
    This function featurizes the dataframe.
    It selects the important features obtained from the Random Forest feature importances.
    Please refer to the 02-Modeling and 03-Modeling-FI notebooks.
    """
    fi_cols = ['redshift', 'g-r', 'i-z', 'u-r', 'i-r', 'z-r', 'g']
    df['g-r'] = df['g'] - df['r']
    df['i-z'] = df['i'] - df['z']
    df['u-r'] = df['u'] - df['r']
    df['i-r'] = df['i'] - df['r']
    df['z-r'] = df['z'] - df['r']
    df = df[fi_cols]
    return df
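The color features are pairwise band differences. A minimal alternative sketch drives them from a list of band pairs and leaves the input DataFrame untouched (the original `featurize` adds columns to the frame it receives). `COLOR_PAIRS` and `featurize_pairs` are hypothetical names; the pairs and output columns are copied from `featurize`.

```python
import pandas as pd

# Each new column is the first band minus the second band, as in featurize().
COLOR_PAIRS = [('g', 'r'), ('i', 'z'), ('u', 'r'), ('i', 'r'), ('z', 'r')]
FI_COLS = ['redshift', 'g-r', 'i-z', 'u-r', 'i-r', 'z-r', 'g']

def featurize_pairs(df, pairs=COLOR_PAIRS, keep=FI_COLS):
    """Return a new frame with color differences added and only `keep` columns."""
    colors = {f"{a}-{b}": df[a] - df[b] for a, b in pairs}
    # assign() returns a copy, so the caller's DataFrame keeps its original columns
    return df.assign(**colors)[keep]

# The scaled row displayed in section 4.
scaled = pd.DataFrame(
    [[0.361102, 1.268287, -0.458855, -0.473307, -0.446692, -0.373591, -0.435595, 0.00142]],
    columns=['alpha', 'delta', 'u', 'g', 'r', 'i', 'z', 'redshift'])
feats = featurize_pairs(scaled)
```

On that row, `feats` reproduces the values displayed below (for example `g-r` = -0.473307 - (-0.446692) = -0.026615), and adding a new color is a one-line change to `COLOR_PAIRS` and `FI_COLS`.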

In [10]:
df = featurize(df=df)
display(df)

|   | redshift | g-r | i-z | u-r | i-r | z-r | g |
|---|---|---|---|---|---|---|---|
| 0 | 0.00142 | -0.026615 | 0.062004 | -0.012163 | 0.0731 | 0.011096 | -0.473307 |


__6. Predictions__

In [11]:
def prediction(X):
    """
    This function predicts the class of a single data point and prints it with the model's confidence.
    """
    with open(file='gb_classifier.pkl', mode='rb') as m_pkl:
        # The pickle holds the base classifier and, presumably, its sigmoid-calibrated wrapper.
        clf, sig_clf = pickle.load(file=m_pkl)
    
    pred_proba = sig_clf.predict_proba(X=X)
    confidence = np.round(a=np.max(pred_proba)*100, decimals=2)
    pred_class = sig_clf.predict(X=X)[0]
    # Map raw labels to readable names; any other label is reported as 'Star'.
    labels = {'QSO': 'Quasi-Stellar Object', 'GALAXY': 'Galaxy'}
    pred_class = labels.get(pred_class, 'Star')
    print("The predicted class is '{}' with a confidence of {}%.".format(pred_class, confidence))

In [12]:
prediction(X=df)

The predicted class is 'Quasi-Stellar Object' with a confidence of 50.92%.


__7. Machine learning pipeline__

In [13]:
def ml_pipeline(features):
    """
    This function runs the local machine learning pipeline on one query point: fetch, preprocess, featurize, and predict.
    """
    print("Please provide the data for below features.")
    df = fetch_data(features=features)
    df = preprocess(df=df, features=features)
    df = featurize(df=df)
    prediction(X=df)
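`ml_pipeline` is a fixed chain in which each step consumes the previous step's output. The same idea can be written as a generic composition, which makes it easy to swap or test individual steps. A minimal sketch; `run_pipeline` and the toy steps are hypothetical stand-ins, not the notebook's functions.

```python
from functools import reduce

def run_pipeline(data, steps):
    """Pass `data` through each callable in `steps`, left to right."""
    # reduce feeds the output of one step into the next, starting from `data`
    return reduce(lambda acc, step: step(acc), steps, data)

# Toy stand-ins for parse -> scale -> featurize -> predict (hypothetical steps).
steps = [
    lambda row: {k: float(v) for k, v in row.items()},   # parse raw strings
    lambda row: {k: v / 10 for k, v in row.items()},      # toy scaling
    lambda row: {**row, 'g-r': row['g'] - row['r']},      # add a color feature
    lambda row: 'red' if row['g-r'] > 0 else 'blue',      # toy decision rule
]
result = run_pipeline({'g': '5', 'r': '3'}, steps)
```

With the notebook's own functions, the step list would be `preprocess` (with `features` bound), `featurize`, and `prediction`, applied to the DataFrame returned by `fetch_data`.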

In [14]:
ml_pipeline(features=features)

Please provide the data for below features.
  'alpha': 270
  'delta': 345
  'u': 11
  'g': 1
  'r': 11
  'i': 11
  'z': 11
  'redshift': 0
Raw data is fetched successfully.
The predicted class is 'Quasi-Stellar Object' with a confidence of 52.77%.
