In this notebook, I build a pipeline that feeds query data to the model and returns its prediction. This is an important part of modeling because it shows how the model behaves on a single query point.

In the `03-Modeling-FI.ipynb` notebook, I noticed that the __Gradient Boosting__ ensemble classifier outperformed all the other models, including the Random Forest and XGBoost classifiers. Although the training loss of the Random Forest and XGBoost classifiers is negligible, their cross-validation loss is higher, which indicates that both models are overfitting.

__1. Packages__

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
from IPython.display import display

In [3]:
from matplotlib import pyplot as plt
from matplotlib import style
# Note: matplotlib >= 3.6 renames this style to 'seaborn-v0_8-deep'
style.use(style='seaborn-deep')

In [4]:
from tabulate import tabulate

In [5]:
import numpy as np
import os
import pandas as pd
import pickle
import seaborn as sns

In [6]:
from sklearn.preprocessing import MinMaxScaler

__2. Data reading__

In [7]:
test_df = pd.read_csv(filepath_or_buffer='test_data.csv')

In [8]:
print("The shape of the test data: {}".format(test_df.shape))
print(list(test_df.columns))

The shape of the test data: (20000, 9)
['alpha', 'delta', 'u', 'g', 'r', 'i', 'z', 'redshift', 'class']


In [9]:
display(test_df.head())

| | alpha | delta | u | g | r | i | z | redshift | class |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 246.974162 | 33.152325 | 21.96427 | 20.25309 | 19.0962 | 18.56822 | 18.19525 | 0.281372 | GALAXY |
| 1 | 127.61369 | 31.759591 | 23.06762 | 22.91654 | 21.22299 | 20.07707 | 19.41036 | 0.0 | GALAXY |
| 2 | 347.477767 | -9.626819 | 23.01191 | 20.07524 | 18.91103 | 18.53486 | 18.30015 | 0.000184 | STAR |
| 3 | 170.479754 | 47.432146 | 19.52529 | 18.16651 | 17.59703 | 17.29709 | 17.09104 | 0.056898 | GALAXY |
| 4 | 327.967136 | 21.291223 | 25.352 | 24.05777 | 21.90221 | 20.32482 | 19.49638 | -0.00013 | STAR |


In [10]:
labels = np.unique(ar=test_df['class'].values)
print(labels)

['GALAXY' 'QSO' 'STAR']


In [11]:
display(test_df.describe())

| | alpha | delta | u | g | r | i | z | redshift |
|---|---|---|---|---|---|---|---|---|
| count | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 | 20000.0 |
| mean | 178.1891 | 24.217515 | 22.092386 | 20.631011 | 19.641835 | 19.077594 | 18.765109 | 0.5751 |
| std | 96.233557 | 19.65578 | 2.261144 | 2.035309 | 1.853532 | 1.754459 | 1.766172 | 0.729548 |
| min | 0.013337 | -18.785328 | 12.10168 | 10.4982 | 10.11604 | 10.00865 | 10.44131 | -0.005675 |
| 25% | 127.928798 | 5.237683 | 20.352412 | 18.96556 | 18.13386 | 17.730847 | 17.461047 | 0.053931 |
| 50% | 181.501058 | 23.851404 | 22.195655 | 21.08923 | 20.12814 | 19.39765 | 18.99737 | 0.419844 |
| 75% | 234.490546 | 40.038436 | 23.70146 | 22.126745 | 21.033005 | 20.40055 | 19.92008 | 0.699654 |
| max | 359.99981 | 82.288657 | 28.09553 | 31.60224 | 27.59332 | 25.67336 | 26.0935 | 7.011245 |


In [12]:
imp_cols = ['u', 'g', 'r', 'i', 'z', 'redshift']
target = 'class'

In [13]:
X = test_df[imp_cols]

In [14]:
y = test_df[target].values
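
The notebook imports `MinMaxScaler`, which suggests the features are scaled before prediction, although that step does not appear in this section. As a rough sketch of what min-max scaling does to a single query point, the snippet below applies `(x - min) / (max - min)` by hand. It uses the `min` and `max` rows of the `describe()` output above, which is an assumption made for illustration: in practice the scaler would normally be fit on the training data, not the test data. The helper name `min_max_scale` is hypothetical.

```python
import numpy as np

# min and max per feature, copied from the describe() output above,
# in the order u, g, r, i, z, redshift (test-data statistics, used
# here only for illustration)
col_min = np.array([12.10168, 10.4982, 10.11604, 10.00865, 10.44131, -0.005675])
col_max = np.array([28.09553, 31.60224, 27.59332, 25.67336, 26.0935, 7.011245])


def min_max_scale(Xq, col_min, col_max):
    """Hypothetical helper: map each feature to [0, 1] via (x - min) / (max - min)."""
    return (Xq - col_min) / (col_max - col_min)


# First row of the test data (u, g, r, i, z, redshift)
Xq = np.array([[21.96427, 20.25309, 19.0962, 18.56822, 18.19525, 0.281372]])
Xq_scaled = min_max_scale(Xq, col_min, col_max)
print(Xq_scaled.round(3))
```

A fitted `MinMaxScaler` with its default feature range performs the same arithmetic through `transform`, using the minimum and maximum it learned during `fit`.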

__3. Fetch the data for the model__

In [24]:
def fetch_data(features):
    """
    This function fetches the raw data.
    """
    data = [float(input("Enter the feature '{}' data: ".format(f)))
            for f in features]
    print()
    return np.array(object=data).reshape(1, -1)
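
Because `fetch_data` reads from `input()`, it cannot be exercised in an automated run. A minimal sketch of a non-interactive counterpart follows. The name `build_query` and its dictionary argument are assumptions introduced here for illustration and are not part of the original notebook.

```python
import numpy as np


def build_query(values, features):
    """
    Hypothetical helper: build a (1, n_features) query array from a dict,
    keeping the column order given by `features`.
    """
    missing = [f for f in features if f not in values]
    if missing:
        raise KeyError("Missing feature(s): {}".format(missing))
    data = [float(values[f]) for f in features]
    return np.array(data).reshape(1, -1)


imp_cols = ['u', 'g', 'r', 'i', 'z', 'redshift']

# Feature values of the first row of the test data shown earlier
row = {'u': 21.96427, 'g': 20.25309, 'r': 19.0962,
       'i': 18.56822, 'z': 18.19525, 'redshift': 0.281372}

Xq = build_query(values=row, features=imp_cols)
print("Query point Xq: {}".format(Xq))
```

Returning the same `(1, n_features)` shape as `fetch_data` means either function could feed the same downstream steps.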

In [25]:
print("Query point Xq: {}".format(fetch_data(features=imp_cols)))

Enter the feature 'u' data: 12
Enter the feature 'g' data: 12
Enter the feature 'r' data: 12
Enter the feature 'i' data: 12
Enter the feature 'z' data: 12
Enter the feature 'redshift' data: 12

Query point Xq: [[12. 12. 12. 12. 12. 12.]]
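
The query point above sets every feature to 12, which is not a realistic observation: a redshift of 12, for instance, is well beyond the maximum of 7.011245 reported by `describe()`. A small sanity check can flag such values before they reach the model. The sketch below is one possible way to do it; the helper `out_of_range` is hypothetical, and the ranges come from the test data shown earlier, whereas the training-data ranges would be the more appropriate reference in practice.

```python
# Observed (min, max) per feature, copied from the describe() output above
observed_range = {
    'u': (12.10168, 28.09553),
    'g': (10.4982, 31.60224),
    'r': (10.11604, 27.59332),
    'i': (10.00865, 25.67336),
    'z': (10.44131, 26.0935),
    'redshift': (-0.005675, 7.011245),
}


def out_of_range(query, ranges):
    """Hypothetical helper: list the features whose value lies outside the observed range."""
    return [f for f, v in query.items()
            if not ranges[f][0] <= v <= ranges[f][1]]


# Same query as above: every feature set to 12
query = {f: 12.0 for f in observed_range}
flagged = out_of_range(query, observed_range)
print(flagged)  # ['u', 'redshift']
```

An out-of-range value does not make a prediction impossible, but it does mean the model is being asked to extrapolate, so its output for such a point deserves less trust.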
