# Scikit-Learn Perceptron & Adaline Implementations

In this section we use the `linear_model` module from scikit-learn (`sklearn`) to predict whether an individual earns over 50K. We reuse the hyperparameters selected for the from-scratch implementations earlier in this notebook.

In [3]:
# scikit-learn Perceptron and Adaline implementations
import numpy as np
import pandas as pd
from pathlib import Path
from sklearn.linear_model import Perceptron as SkPerceptron, SGDRegressor
from sklearn.metrics import accuracy_score

---

## Load Processed Datasets

Read the preprocessed train/test feature matrices and labels, plus the validation features. These CSVs were produced earlier by the preprocessing pipeline (one-hot encoding plus numeric scaling) and serve as the common input for every model.

In [4]:
# Reload features
X_train = pd.read_csv("../data/processed/X_train.csv")
X_test  = pd.read_csv("../data/processed/X_test.csv")
X_val = pd.read_csv("../data/processed/X_val.csv")

# Reload targets (squeezed into Series)
y_train = pd.read_csv("../data/processed/y_train.csv").squeeze("columns")
y_test  = pd.read_csv("../data/processed/y_test.csv").squeeze("columns")

# Align one-hot columns (safety)
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
X_val  = X_val.reindex(columns=X_train.columns,  fill_value=0)
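To see what the `reindex(..., fill_value=0)` safety step and `squeeze("columns")` do, here is a small sketch on hypothetical toy frames (the column names are made up). Note that `reindex` not only adds missing dummy columns as zeros and matches the training column order, it also silently drops any column that does not exist in the training set.

```python
import pandas as pd

# Hypothetical one-hot frames: the test split lacks one dummy and has an extra one
train_df = pd.DataFrame({"age": [0.1], "workclass_Private": [1], "workclass_Self-emp": [0]})
test_raw = pd.DataFrame({"workclass_Private": [0], "age": [-0.3], "workclass_Never-worked": [1]})

# Same call as in the cell above: reorder to the training columns,
# add missing columns filled with 0, and drop columns the model never saw
test_aligned = test_raw.reindex(columns=train_df.columns, fill_value=0)
print(test_aligned)

# squeeze("columns") turns a single-column DataFrame into a Series
labels = pd.DataFrame({"income": [0, 1, 1]}).squeeze("columns")
print(type(labels).__name__, labels.tolist())
```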

In [5]:
# To numpy
Xtr = X_train.to_numpy(dtype=np.float64, copy=False)
ytr_ppn = y_train.to_numpy(dtype=np.int64,  copy=False)   # Perceptron wants class labels
ytr_ada = y_train.to_numpy(dtype=np.float64, copy=False)  # Adaline learns on 0.0/1.0
Xte = X_test.to_numpy(dtype=np.float64,  copy=False)
yte = y_test.to_numpy(dtype=np.int64,    copy=False)
Xv  = X_val.to_numpy(dtype=np.float64,   copy=False)

**Use Best Hyperparameters from CV**

Plug in the (eta, max_iter) pairs selected by cross-validation on the training set only (the test set was not used for selection). These are the “winning” settings we use to train the final sklearn models below.
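The cross-validation itself was run earlier on the from-scratch models. For comparison, a search of the same kind can be expressed directly in scikit-learn with `GridSearchCV`. This is a minimal sketch: the synthetic data stands in for `Xtr, ytr_ppn`, and the grid values are hypothetical rather than the ones actually explored.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for (Xtr, ytr_ppn); only training data goes into the search
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical grid; the values explored in the earlier scratch CV may differ
param_grid = {"eta0": [0.001, 0.01, 0.1], "max_iter": [50, 150]}

search = GridSearchCV(
    Perceptron(tol=None, shuffle=True, random_state=42),
    param_grid=param_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_demo, y_demo)

print(search.best_params_, round(search.best_score_, 4))
```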

In [6]:
# Use best hyperparameters from CV
best_eta_ppn, best_max_iter_ppn = 0.01, 150
best_eta_ada, best_max_iter_ada = 0.001, 150  

---

## Train scikit-learn Perceptron

Instantiate sklearn’s Perceptron with the chosen learning rate (`eta0`) and number of epochs (`max_iter`), disable the convergence-based stopping check (`tol=None`) so that every epoch runs, and train on the full `X_train, y_train`.

In [None]:
sk_ppn = SkPerceptron(
    eta0=best_eta_ppn,
    max_iter=best_max_iter_ppn,
    tol=None,                # no convergence check: always run all max_iter epochs
    shuffle=True,
    random_state=42
)
sk_ppn.fit(Xtr, ytr_ppn)

# Test predictions
sk_y_pred_test_ppn = sk_ppn.predict(Xte)
print(f"[sklearn Perceptron] Test accuracy: {accuracy_score(yte, sk_y_pred_test_ppn):.4f}")

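# Validation predictions (create the output folder if it does not exist yet)
sk_val_pred_ppn = sk_ppn.predict(Xv)
out_dir = Path("../outputs/predictions")
out_dir.mkdir(parents=True, exist_ok=True)
pd.DataFrame({"income": sk_val_pred_ppn}).to_csv(out_dir / "Group_18_Perceptron_PredictedOutputs.csv", index=False)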

[sklearn Perceptron] Test accuracy: 0.8129
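One property worth keeping in mind when reading the chosen `eta0`: with sklearn's `Perceptron` and its default `penalty=None`, the weights start at zero and every update is multiplied by `eta0`. Changing `eta0` therefore only rescales `coef_` and `intercept_` and leaves the predictions unchanged, as long as the shuffling seed is the same. The sketch below checks this on synthetic data; powers of two are used for `eta0` so the rescaling is exact in floating point. It also shows that `tol=None` makes `n_iter_` equal `max_iter`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

def fit_ppn(eta0):
    # Hypothetical helper: same seed and tol=None, so both runs see identical sample orders
    model = Perceptron(eta0=eta0, max_iter=20, tol=None, shuffle=True, random_state=42)
    return model.fit(X, y)

a = fit_ppn(0.5)
b = fit_ppn(0.25)

# tol=None: no convergence check, so every requested epoch runs
print(a.n_iter_, b.n_iter_)
# The learned weights differ only by the factor 0.5 / 0.25 = 2 ...
print(np.allclose(a.coef_, 2 * b.coef_), np.allclose(a.intercept_, 2 * b.intercept_))
# ... so the decision-function signs, and hence the predictions, are identical
print(np.array_equal(a.predict(X), b.predict(X)))
```

In practice this means the Perceptron's CV result is driven by `max_iter` (and the shuffling order), not by the learning rate.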


--- 

## Train scikit-learn Adaline (via SGDRegressor)

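Classic Adaline minimizes the squared error of a linear unit. We emulate it with `SGDRegressor(loss="squared_error")`, regressing on targets in `{0.0, 1.0}` with a constant learning rate (`eta0`) and then thresholding the continuous output at 0.5 to obtain class labels. Because `SGDRegressor` updates the weights one sample at a time, this corresponds to the stochastic (per-sample) variant of Adaline rather than full-batch gradient descent. No regularization (`penalty=None`) mirrors the textbook Adaline update.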

In [None]:
# Train a linear regressor on targets {0,1}, then threshold at 0.5 for class prediction
sk_ada = SGDRegressor(
    loss="squared_error",
    penalty=None,             # mirrors classic Adaline (no regularization)
    learning_rate="constant",
    eta0=best_eta_ada,
    max_iter=best_max_iter_ada,
    tol=None,                 # no convergence check: always run all max_iter epochs
    random_state=42
)
sk_ada.fit(Xtr, ytr_ada)

# Test predictions
test_scores = sk_ada.predict(Xte)               # continuous scores
sk_y_pred_test_ada = (test_scores >= 0.5).astype(int)
print(f"[sklearn Adaline]   Test accuracy: {accuracy_score(yte, sk_y_pred_test_ada):.4f}")

# Validation predictions
val_scores = sk_ada.predict(Xv)
sk_val_pred_ada = (val_scores >= 0.5).astype(int)
pd.DataFrame({"prediction": sk_val_pred_ada}).to_csv("../outputs/predictions/Group_18_Adaline_PredictedOutputs.csv", index=False)

[sklearn Adaline]   Test accuracy: 0.8253


In [9]:
print("Saved:")
print(" - ../outputs/predictions/Group_18_Perceptron_PredictedOutputs.csv")
print(" - ../outputs/predictions/Group_18_Adaline_PredictedOutputs.csv")

Saved:
 - ../outputs/predictions/Group_18_Perceptron_PredictedOutputs.csv
 - ../outputs/predictions/Group_18_Adaline_PredictedOutputs.csv
