# 02 - Make Predictions
---

### Description:
The `make_predictions` script should load the model saved by the first script and output a `predictions.csv` file containing the predictions (one prediction per line).

### Sections:
1. Preparation Works
2. Standardizing and Performing Dimensional Reduction

### 1. Preparation Works

Loading the required libraries, the unlabeled testing data, and the trained model.

In [1]:
# Importing all the related libraries
import pickle
import pandas as pd
import numpy as np

from sklearn.preprocessing import StandardScaler

In [2]:
model_file = r"03_model_binary.pickle"
testing_data = r"data/test_potus_by_county.csv"

In [3]:
# Loading in the model (the context manager closes the file afterwards) and the data
with open(model_file, 'rb') as f:
    trained_model = pickle.load(f)
testing_df = pd.read_csv(testing_data)

In [4]:
testing_df.head(5)

| Unnamed: 0 | Total population | Median age | % BachelorsDeg or higher | Unemployment rate | Per capita income | Total households | Average household size | % Owner occupied housing | % Renter occupied housing | % Vacant housing | Median home value | Population growth | House hold growth | Per capita income growth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 695751 | 34.692593 | 19.896296 | 10.651852 | 25624.074074 | 248724 | 2.816296 | 42.044444 | 25.755556 | 32.207407 | 179329.851852 | -0.295185 | -0.239259 | 2.62963 |
| 1 | 28759 | 37.3 | 13.1 | 14.3 | 15978.0 | 10447 | 2.45 | 58.7 | 21.2 | 20.1 | 79052.0 | -0.23 | -0.1 | 0.71 |
| 2 | 10926 | 36.2 | 9.5 | 17.0 | 12847.0 | 3749 | 2.48 | 58.7 | 20.1 | 21.2 | 72014.0 | -0.77 | -0.75 | 1.05 |
| 3 | 13266 | 40.9 | 10.9 | 20.9 | 15340.0 | 5624 | 2.34 | 59.8 | 13.9 | 26.3 | 68596.0 | -0.61 | -0.45 | 0.73 |
| 4 | 42841 | 37.0 | 16.3 | 22.6 | 15369.0 | 17046 | 2.48 | 54.5 | 28.2 | 17.2 | 74130.0 | -0.74 | -0.57 | 0.76 |


### 2. Standardizing and Performing Dimensional Reduction

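Since the model was fitted to preprocessed data, the testing data must go through the same `StandardScaler()` and `PCA` steps so that its features match what the model expects. Note that only the model is loaded here, so the cells below fit a new scaler and PCA on the testing data itself; this approximates the training-time transformation but does not reproduce it exactly, and reusing the scaler and PCA fitted on the training data would be the reliable approach.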

In [5]:
# Setting up the scaler to standardize the data
scaler = StandardScaler()

X = np.array(testing_df)
X_scaled = scaler.fit_transform(X)
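`StandardScaler` rescales each column as z = (x - mean) / std. The cell above calls `fit_transform()` on the testing data, so the mean and standard deviation come from the test set rather than from the training set the model saw. The sketch below uses synthetic data (an assumption; the real training matrix is not loaded in this notebook) to contrast reusing a training-fitted scaler with refitting on the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the training and testing feature matrices (assumption:
# the real training data is not available in this notebook).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
X_test = rng.normal(loc=55.0, scale=12.0, size=(50, 3))

# Fit the scaler once, on the training data only.
scaler = StandardScaler().fit(X_train)

# Reuse the training statistics: z = (x - mean_train) / std_train
X_test_reused = scaler.transform(X_test)

# Refitting on the test set uses the test set's own mean and standard deviation.
X_test_refit = StandardScaler().fit_transform(X_test)

# The manual formula reproduces transform() with the training statistics.
manual = (X_test - scaler.mean_) / scaler.scale_
print(np.allclose(manual, X_test_reused))        # True
print(np.allclose(X_test_refit, X_test_reused))  # False: the two scalings differ
```

Because the test set here is drawn with a different mean, the refitted version centres it at zero and hides that shift, whereas the reused scaler keeps the data on the same scale the model was trained on.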

In [6]:
from sklearn.decomposition import PCA

# Reducing the 15 columns (including `Unnamed: 0`) to 10 principal components
pca = PCA(n_components=10, svd_solver='auto')
X_pca = pca.fit_transform(X_scaled)
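One way to guarantee that the test data goes through exactly the same scaling and PCA projection as the training data is to fit them together with the classifier in a scikit-learn `Pipeline` and pickle the whole pipeline. The sketch below assumes the training script could be changed to do this; the synthetic data and the `LogisticRegression` classifier are stand-ins, not the model stored in `03_model_binary.pickle`.

```python
import pickle

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with 15 features (assumption; the real training set is not shown here).
rng = np.random.default_rng(42)
X_train = rng.normal(size=(300, 15))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_test = rng.normal(size=(20, 15))

# Bundle scaling, PCA (15 -> 10 components) and the classifier into one estimator.
# LogisticRegression is only a stand-in for whatever model the training script used.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    LogisticRegression(),
)
pipeline.fit(X_train, y_train)

# Pickle the whole pipeline (in memory here) so the fitted scaler and PCA travel with the model.
restored = pickle.loads(pickle.dumps(pipeline))

# The restored pipeline applies the training-time scaler and PCA before predicting.
predictions = restored.predict(X_test)
print(predictions.shape)                                      # (20,)
print(np.array_equal(predictions, pipeline.predict(X_test)))  # True
```

In a real workflow the training script would write the pipeline with `pickle.dump(pipeline, f)`, and this notebook could then call `trained_model.predict(X)` on the raw features, with no separate `StandardScaler` or `PCA` cells.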

In [7]:
# Applying the trained model for prediction
testing_df['Prediction'] = trained_model.predict(X_pca)
# Mapping the encoded labels back to the candidates' names
testing_df['Prediction'] = testing_df['Prediction'].map({
    0: "Barack Obama",
    1: "Mitt Romney",
})
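`Series.map` with a dictionary replaces each encoded label with its name and returns `NaN` for any value that is not a key, which makes an unexpected label easy to spot. A small illustration, assuming (as the cell above does) that the model encodes Barack Obama as 0 and Mitt Romney as 1:

```python
import pandas as pd

labels = {0: "Barack Obama", 1: "Mitt Romney"}

# Encoded predictions are translated to names one-to-one.
raw = pd.Series([0, 1, 1, 0])
named = raw.map(labels)
print(named.tolist())  # ['Barack Obama', 'Mitt Romney', 'Mitt Romney', 'Barack Obama']

# A code missing from the dictionary (here 2) becomes NaN, so checking for NaN
# after mapping catches outputs that do not match the assumed 0/1 encoding.
unexpected = pd.Series([0, 2]).map(labels)
print(unexpected.isna().tolist())  # [False, True]
```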

In [8]:
testing_df.head(5)

| Unnamed: 0 | Total population | Median age | % BachelorsDeg or higher | Unemployment rate | Per capita income | Total households | Average household size | % Owner occupied housing | % Renter occupied housing | % Vacant housing | Median home value | Population growth | House hold growth | Per capita income growth | Prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 695751 | 34.692593 | 19.896296 | 10.651852 | 25624.074074 | 248724 | 2.816296 | 42.044444 | 25.755556 | 32.207407 | 179329.851852 | -0.295185 | -0.239259 | 2.62963 | Barack Obama |
| 1 | 28759 | 37.3 | 13.1 | 14.3 | 15978.0 | 10447 | 2.45 | 58.7 | 21.2 | 20.1 | 79052.0 | -0.23 | -0.1 | 0.71 | Barack Obama |
| 2 | 10926 | 36.2 | 9.5 | 17.0 | 12847.0 | 3749 | 2.48 | 58.7 | 20.1 | 21.2 | 72014.0 | -0.77 | -0.75 | 1.05 | Barack Obama |
| 3 | 13266 | 40.9 | 10.9 | 20.9 | 15340.0 | 5624 | 2.34 | 59.8 | 13.9 | 26.3 | 68596.0 | -0.61 | -0.45 | 0.73 | Mitt Romney |
| 4 | 42841 | 37.0 | 16.3 | 22.6 | 15369.0 | 17046 | 2.48 | 54.5 | 28.2 | 17.2 | 74130.0 | -0.74 | -0.57 | 0.76 | Barack Obama |


In [9]:
testing_df.groupby("Prediction").size()

Prediction
Barack Obama     616
Mitt Romney     1285
dtype: int64
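`Series.value_counts()` gives the same counts as the `groupby(...).size()` call above, and `normalize=True` turns them into shares. A sketch on a few hypothetical predictions:

```python
import pandas as pd

# A handful of hypothetical predictions for illustration only.
preds = pd.Series(
    ["Barack Obama", "Mitt Romney", "Mitt Romney", "Mitt Romney"],
    name="Prediction",
)

counts = preds.value_counts()                # absolute counts per candidate
shares = preds.value_counts(normalize=True)  # fraction of all predictions

# Same counts as grouping the single-column frame and taking the group sizes.
grouped = preds.to_frame().groupby("Prediction").size()
print(counts.sort_index().tolist() == grouped.tolist())  # True

print(counts["Mitt Romney"])   # 3
print(shares["Barack Obama"])  # 0.25
```

Applied to the counts above, the shares would be roughly 32.4% Barack Obama (616 of 1901) and 67.6% Mitt Romney (1285 of 1901).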

### Outputting the predictions to CSV

In [10]:
testing_df.to_csv("04_predictions.csv", index=False)
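The description asks for a `predictions.csv` file with one prediction per line, whereas the cell above writes every column of `testing_df`. If only the predictions are wanted, writing the `Prediction` column alone with `index=False` and `header=False` produces that layout. A sketch with a small stand-in frame (hypothetical, built from three rows of the output above), writing to an in-memory buffer so it runs anywhere:

```python
import io

import pandas as pd

# Hypothetical stand-in for testing_df after the Prediction column is added.
df = pd.DataFrame({
    "Total population": [695751, 28759, 13266],
    "Prediction": ["Barack Obama", "Barack Obama", "Mitt Romney"],
})

# Write only the Prediction column, with no index and no header,
# so each line of the output holds exactly one prediction.
buffer = io.StringIO()
df["Prediction"].to_csv(buffer, index=False, header=False)
print(buffer.getvalue())
```

Against the real data, the equivalent call would be `testing_df['Prediction'].to_csv("predictions.csv", index=False, header=False)`.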