# Load Model and Make Predictions

<font color='steelblue'>

<span style="font-family:Arial; font-size:1.6em;">
    <strong>Load Logistic Regression Model</strong><br><br>
    Use the new Income Prediction dataset to make predictions with the model built earlier<br><br>
</span>
<span style="font-family:Arial; font-size:1.4em;">
    <b>The following steps are included in the processing:</b>
    <ol>
        <li>Load new dataset</li>
        <li>Load the data preprocessor built earlier</li>
        <li>Apply the processor to the dataframe</li>
        <li>Load the Logistic Regression Model already built</li>
        <li>Make predictions using processed data</li>
        <li>Write the predictions to .csv file</li>
    </ol>    
</span>

</font>

## Dataset Review

The Adult dataset we are going to use is publicly available at the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Adult).
The data is derived from census records and contains information about 48,842 individuals and their annual income.
We will use this information to predict whether an individual earns **<=50K or >50K** a year.
The dataset is rather clean, and consists of both numeric and categorical variables.

Attribute Information:

- age: continuous
- workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked
- fnlwgt: continuous
- education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc...
- education-num: continuous
- marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent...
- occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners...
- relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried
- race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black
- sex: Female, Male
- capital-gain: continuous
- capital-loss: continuous
- hours-per-week: continuous
- native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany...

Target/Label: <=50K, >50K
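
Before scoring, the attribute list above can be turned into a quick column check on the incoming data. This is only a minimal sketch: the column names are assumed to match the attribute list exactly (the real CSV header may spell them differently), and `check_columns` is a hypothetical helper that is not part of the original notebook.

```python
import pandas as pd

# Column names assumed to match the attribute list above;
# the actual CSV header may use different spellings.
EXPECTED_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
]

def check_columns(frame, expected=EXPECTED_COLUMNS):
    """Return (missing, unexpected) column names. Hypothetical helper."""
    missing = [c for c in expected if c not in frame.columns]
    unexpected = [c for c in frame.columns if c not in expected]
    return missing, unexpected

# Tiny illustrative frame, not real data
sample = pd.DataFrame({"age": [39], "workclass": ["Private"], "income": ["<=50K"]})
missing, unexpected = check_columns(sample)
print("missing:", missing)
print("unexpected:", unexpected)
```

Running the same check on the loaded dataframe would flag a stray target column or a renamed attribute before the preprocessor sees it.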

In [None]:
%config IPCompleter.greedy = True

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

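# matplotlib 3.6 renamed 'seaborn-whitegrid' to 'seaborn-v0_8-whitegrid'; use the old name on older versions
plt.style.use('seaborn-v0_8-whitegrid')    # grids in the plots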
import warnings
warnings.filterwarnings('ignore')

## Load Data

In [None]:
df = pd.read_csv('../datasets/agent-new.csv')

In [None]:
df.tail()

In [None]:
df.shape

## Use Data Processing Pipeline
<span style="font-family:times, serif; font-size:14pt; font-style:bold">
    Load the pipeline saved during the data cleaning process (pickle file) and perform the following steps:
<ol>
<li>Load the pickle file</li>
<li>Using the loaded preprocessor, transform the dataframe into sparse format</li>
<li>Use the transformed data as model input (no training/test split is needed when scoring new data)</li>
</ol>
</span>

In [None]:
from pickle import load
# load the fitted preprocessor; the with-block closes the file afterwards
with open('../preprocessor.pkl', 'rb') as f:
    processor = load(f)

In [None]:
# use transform, not fit_transform: the loaded preprocessor was already fitted on the training data
df1 = processor.transform(df)

In [None]:
df1
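
The difference between `transform` and `fit_transform` on a loaded preprocessor can be illustrated with a small stand-in. This is a sketch under stated assumptions: the real contents of `preprocessor.pkl` are not shown in this notebook, so a simple `ColumnTransformer` with a scaler and a one-hot encoder is assumed here, and the two small dataframes are made-up illustrative values.

```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative stand-in for the training data (not the real dataset)
train = pd.DataFrame({
    "age": [25, 38, 52, 41],
    "workclass": ["Private", "State-gov", "Private", "Self-emp-inc"],
})
# New data that happens to contain fewer categories than the training data
new = pd.DataFrame({"age": [30, 47], "workclass": ["Private", "Private"]})

# Assumed preprocessor layout; the real one lives in preprocessor.pkl
pre = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["workclass"]),
])
pre.fit(train)

# Round-trip through pickle in memory, mirroring the save/load via a file
saved = pickle.dumps(pre)

kept = pickle.loads(saved).transform(new)        # keeps the training-time encoding
refit = pickle.loads(saved).fit_transform(new)   # re-learns the encoding from new data

print(kept.shape)   # 1 numeric column + 3 one-hot columns
print(refit.shape)  # 1 numeric column + 1 one-hot column
```

Refitting changes the number of output columns, so a model trained on the original encoding would receive input of the wrong width.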

## Load Model
<span style="font-family:times, serif; font-size:14pt; font-style:bold">
    Load the Logistic Regression Model that was persisted

</span>

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
from joblib import load

In [None]:
# joblib.load accepts a file path directly, so no open() call is needed
model = load('../logRegModel.joblib')
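
The persist-and-restore cycle behind this step can be sketched with a throwaway model. The data below is synthetic and the file name `demo_model.joblib` is hypothetical; only `joblib.dump`, `joblib.load`, and `LogisticRegression` are real library calls.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LogisticRegression

# Tiny synthetic data, used only so there is something to fit
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_model.joblib")  # hypothetical file name
    dump(clf, path)         # persist the fitted model
    restored = load(path)   # restore it from the path

# The restored model should reproduce the original predictions
same = np.array_equal(clf.predict(X), restored.predict(X))
print(same)
```

Persisted models are best loaded with the same scikit-learn version that saved them, and only from trusted sources, since joblib files are pickle-based.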

## Make Predictions

In [None]:
# make predictions on the new (preprocessed) data
y_pred = model.predict(df1)

In [None]:
# convert to list
preds = list(y_pred)

In [None]:
# change the predictions from 0 and 1 to "<=50K" and ">50K" (spelled as in the dataset)
vals = ["<=50K" if x == 0 else ">50K" for x in preds]
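
As an alternative to mapping over a Python list, the conversion can be done in one vectorized step with `np.where`. This sketch assumes, as the cell above does, that 0 encodes "<=50K" and 1 encodes ">50K"; the prediction array is made up for illustration and the label strings follow the dataset's own spelling.

```python
import numpy as np

# Example predictions; 0 is assumed to mean "<=50K" and 1 to mean ">50K"
y_pred = np.array([0, 1, 1, 0])

# Vectorized label mapping, no intermediate Python list needed
labels = np.where(y_pred == 0, "<=50K", ">50K")
print(labels.tolist())  # ['<=50K', '>50K', '>50K', '<=50K']
```

The resulting array can be assigned to a dataframe column directly.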

In [None]:
# add the predictions to dataframe
df['Predictions'] = vals

In [None]:
df.tail()

In [None]:
# Write the dataframe to .csv file
df.to_csv('../predictions.csv')
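
By default `to_csv` also writes the row index as an extra unnamed column. Whether that is wanted depends on how the file is consumed. The sketch below shows the `index=False` variant on a made-up frame; the file name `predictions_demo.csv` is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Illustrative frame standing in for df with its Predictions column
out = pd.DataFrame({"age": [39, 50], "Predictions": ["<=50K", ">50K"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "predictions_demo.csv")  # hypothetical file name
    out.to_csv(path, index=False)   # index=False skips the row-index column
    back = pd.read_csv(path)        # read the file back to inspect it

print(list(back.columns))  # ['age', 'Predictions']
```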