## Linear and Support Vector Modeling for Class Target (Python & SAS Viya)

**EXAMPLE:** Linear and Support Vector Based Modeling for Class Target using Python & SAS Viya  
**DATA SOURCE:**  
Training Data: adult_train.csv, Testing Data: adult_test.csv   
Becker, B. and Kohavi, R. (1996). Adult. UCI Machine Learning Repository. [Link](https://doi.org/10.24432/C5XW20)  
                 
**DESCRIPTION:** This template demonstrates a workflow for building predictive models in Python using non-tree-based modeling techniques such as Logistic Regression and Support Vector Machines (SVM).  
**PURPOSE:** The goal is to predict the likelihood of a binary outcome, in this case, whether income exceeds $50K/yr.  
**DETAILS:**  
- Classification models built: Logistic Regression, Support Vector Machine (SVM), and a soft-voting ensemble of the two.
- Scoring: each model scores the test data.
- Model Assessment: a classification report for each model.
- Model Comparison: ROC curves for all models are overlaid on one plot, each labeled with its AUC score, to compare how well the models predict the event.

In [None]:
# Importing necessary libraries
import os
import pandas as pd
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.impute import SimpleImputer
from sasviya.ml.linear_model import LogisticRegression
from sasviya.ml.svm import SVC
from sklearn.metrics import roc_curve, roc_auc_score, classification_report
import matplotlib.pyplot as plt

# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

### Data Loading and Preprocessing
- **Importing Data and Defining Variables**
    - Load the dataset for both training and testing partitions.
    - Define the variables needed for the analysis: a binary-encoded target and one-hot-encoded input features.
- **Imputation for Missing Values**
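    - The original data has no missing values, so missing values are inserted into a copy of the training partition for selected interval variables (`age` and `hours_per_week`) to demonstrate mean imputation. The models in this notebook are fit on the original training data, not on the imputed copy.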
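
One detail of the variable-definition step is easier to see on toy data: when the two partitions are one-hot encoded separately, they can end up with different columns, which is why the test features are reindexed to the training columns. The sketch below uses made-up rows and hypothetical names (`train_demo`, `test_demo`), not the Adult data.

```python
import pandas as pd

# Toy partitions (made-up values): the test split lacks the "Private" level
# and contains a level ("Never-worked") that never appears in training.
train_demo = pd.DataFrame({"age": [25, 40, 31],
                           "workclass": ["Private", "State-gov", "Private"]})
test_demo = pd.DataFrame({"age": [52, 19],
                          "workclass": ["State-gov", "Never-worked"]})

# One-hot encode each partition separately
X_train_demo = pd.get_dummies(train_demo)
X_test_demo = pd.get_dummies(test_demo)

# Align the test columns to the training columns: dummy columns missing from
# the test split are added and filled with 0, unseen levels are dropped.
X_test_aligned = X_test_demo.reindex(columns=X_train_demo.columns, fill_value=0)

print(list(X_train_demo.columns))
print(list(X_test_aligned.columns))
```

After alignment both frames share the same columns in the same order, so a model fit on one can score the other.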

In [None]:
# Construct the workspace path relative to the current working directory
workspace = f"{os.path.abspath('')}/../../data/"

# Importing Data and Defining Variables
train_data = pd.read_csv(os.path.join(workspace, "adult_train.csv"))
test_data = pd.read_csv(os.path.join(workspace, "adult_test.csv"))

# Encode categorical target variable as binary labels
train_data['target_binary'] = train_data['target'].replace({'<=50K': 0, '>50K': 1})
test_data['target_binary'] = test_data['target'].replace({'<=50K': 0, '>50K': 1})

# Define input features (X) and target variable (y)
X_train = pd.get_dummies(train_data.drop(columns=['target', 'target_binary']))
y_train = train_data['target_binary']
X_test = pd.get_dummies(test_data.drop(columns=['target', 'target_binary']))
y_test = test_data['target_binary']

# Reindex the testing dataset with the columns from the training dataset
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)


**Perform Imputation using Mean Strategy**
- ***Note: Random missing values inserted for demonstration of imputation***
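
As a rough illustration of what the mean strategy does, the sketch below runs `SimpleImputer` on a tiny made-up array. Applying the fitted imputer to a second array with `transform` is an addition for illustration only; the notebook cell imputes just the training copy. The names `train_demo` and `test_demo` are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Two toy columns with one missing value each (made-up numbers)
train_demo = np.array([[20.0, 40.0],
                       [np.nan, 50.0],
                       [40.0, np.nan]])
test_demo = np.array([[np.nan, 35.0]])

imputer_demo = SimpleImputer(strategy="mean")

# fit_transform learns each column's mean from the non-missing training values
train_filled = imputer_demo.fit_transform(train_demo)

# transform reuses the training means instead of recomputing them
test_filled = imputer_demo.transform(test_demo)

print(imputer_demo.statistics_)  # per-column means learned from train_demo
print(train_filled)
print(test_filled)
```

Reusing the training means on new rows keeps information from the scored data out of the preprocessing step.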


In [None]:
# Insert missing values randomly in the training data for demonstration purposes
train_data_imputed = train_data.copy()
np.random.seed(12345)  # Set seed for reproducibility
train_data_imputed.loc[train_data_imputed.sample(frac=0.02).index, 'age'] = np.nan  # 2% missing values for 'age'
train_data_imputed.loc[train_data_imputed.sample(frac=0.03).index, 'hours_per_week'] = np.nan  # 3% missing values for 'hours_per_week'

# Print summary of missing values before imputation
print("Summary of missing values before imputation:")
print(train_data_imputed.isnull().sum())

# Imputation for missing values using mean strategy
imputer = SimpleImputer(strategy='mean')
train_data_imputed[['age', 'hours_per_week']] = imputer.fit_transform(train_data_imputed[['age', 'hours_per_week']])

# Print summary of missing values after imputation
print("\nSummary of missing values after imputation:")
print(train_data_imputed.isnull().sum())

### Logistic Regression Model Training, Scoring and Evaluation
For more information regarding SAS Viya Logistic Regression, refer to [this link](https://documentation.sas.com/?cdcId=workbenchcdc&cdcVersion=default&docsetId=explore&docsetTarget=n0110bswc89wqjn1tht4ceu4hs7y.htm).


In [None]:
# Initialize Logistic Regression model
sas_lr = LogisticRegression()

# Fit the model
sas_lr.fit(X_train, y_train)

# Score on the test partition
y_pred_log = sas_lr.predict(X_test)

# Calculate predicted probabilities for the positive class ('>50K')
y_pred_proba_log = sas_lr.predict_proba(X_test)['P_target_binary1'].values

**Logistic Regression Model Evaluation**  
&emsp; Generate Classification Report
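
To make the report's columns concrete, the sketch below derives precision, recall, and F1 for the positive class from confusion-matrix counts and checks them against scikit-learn's metric functions. The labels are made up and the `*_demo` names are hypothetical.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Made-up labels: 4 negatives and 4 positives
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 1, 1, 1, 0, 0]

# For binary labels, ravel() yields counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, precision_score(y_true_demo, y_pred_demo))
print(recall, recall_score(y_true_demo, y_pred_demo))
print(f1, f1_score(y_true_demo, y_pred_demo))
```

The classification report shows these same quantities per class, along with the number of rows in each class (support).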



In [None]:
# Calculate classification report
class_report_log = classification_report(test_data['target_binary'], y_pred_log)

# Print classification report for Logistic Regression
print("\nClassification Report:")
print(class_report_log)

### SVC Model Training, Scoring and Evaluation
For more information regarding SAS Viya SVC, refer to [this link](https://documentation.sas.com/?cdcId=workbenchcdc&cdcVersion=default&docsetId=explore&docsetTarget=p1udx0532v47xfn1l3ix3scjh8uj.htm).


In [None]:
# Initialize SVM model
sas_svc_model = SVC()

# Fit the model
sas_svc_model.fit(X_train, y_train)

# Score on the test partition
y_pred_svm = sas_svc_model.predict(X_test)

# Calculate predicted probabilities for the positive class ('>50K')
y_pred_proba_svm = sas_svc_model.predict_proba(X_test)['P_target_binary1'].values

**SVM Model Evaluation**  
&emsp; Generate Classification Report

In [None]:
# Calculate classification report 
class_report_svm = classification_report(test_data['target_binary'], y_pred_svm)

# Print classification report for SVM
print("\nClassification Report for SVM:")
print(class_report_svm)

### Ensemble Model
##### With soft voting, the final prediction is the class with the highest predicted probability after averaging across the Logistic Regression and SVM models.
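
A minimal sketch of probability averaging, assuming equal weights for both models (the `VotingClassifier` default when no `weights` are given). The probability arrays are made up and stand in for the two models' outputs.

```python
import numpy as np

# Made-up class probabilities for three rows; columns are [P(class 0), P(class 1)]
proba_a = np.array([[0.90, 0.10],
                    [0.40, 0.60],
                    [0.55, 0.45]])
proba_b = np.array([[0.70, 0.30],
                    [0.20, 0.80],
                    [0.35, 0.65]])

# Soft voting with equal weights: average the probabilities, then take the
# class with the highest averaged probability
avg_proba = (proba_a + proba_b) / 2
labels = avg_proba.argmax(axis=1)

print(avg_proba)
print(labels)
```

In the third row the two models disagree, and the averaged probabilities settle it in favor of the model that is more confident.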

In [None]:
# Initialize ensemble model with logistic regression and SVM
ensemble_model = VotingClassifier(estimators=[('logistic', sas_lr), ('svm', sas_svc_model)], voting='soft')

# Fit the ensemble model
ensemble_model.fit(X_train, y_train)

# Score on the test partition
y_pred_ensemble = ensemble_model.predict(X_test)

# Calculate predicted probabilities for the positive class ('>50K')
y_pred_proba_ensemble = ensemble_model.predict_proba(X_test)[:, 1]

**Ensemble Model Evaluation**  
&emsp; Generate Classification Report

In [None]:
# Calculate classification report
class_report_ensemble = classification_report(test_data['target_binary'], y_pred_ensemble)

# Print classification report for Ensemble Model
print("\nClassification Report for Ensemble Model:")
print(class_report_ensemble)

### Model Comparison - Overlaid ROC Curves
##### Visualize ROC curves of multiple models on the same plot for easy comparison. 
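
For intuition on what the AUC in the legend measures, the sketch below computes it two ways on made-up scores: with `roc_auc_score`, and as the share of (positive, negative) pairs in which the positive row receives the higher score. The `*_demo` names are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up labels and predicted scores for the positive class
y_true_demo = np.array([0, 0, 1, 1])
scores_demo = np.array([0.1, 0.4, 0.35, 0.8])

# Each ROC point is the (FPR, TPR) pair obtained at one score threshold
fpr, tpr, thresholds = roc_curve(y_true_demo, scores_demo)
auc_sklearn = roc_auc_score(y_true_demo, scores_demo)

# Pairwise view: count positive/negative pairs ranked correctly (ties count half)
pos = scores_demo[y_true_demo == 1]
neg = scores_demo[y_true_demo == 0]
auc_pairs = np.mean([float(p > n) + 0.5 * float(p == n) for p in pos for n in neg])

print(auc_sklearn, auc_pairs)
```

An AUC of 0.5 corresponds to the diagonal reference line, and values closer to 1 mean the model ranks positive rows above negative rows more consistently.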


In [None]:
# Calculate ROC curve
# Logistic Regression
fpr_log, tpr_log, thresholds_log = roc_curve(test_data['target_binary'], y_pred_proba_log)
roc_auc_log = roc_auc_score(test_data['target_binary'], y_pred_proba_log)
# SVM
fpr_svm, tpr_svm, _ = roc_curve(test_data['target_binary'], y_pred_proba_svm)
roc_auc_svm = roc_auc_score(test_data['target_binary'], y_pred_proba_svm)
# Ensemble
fpr_ensemble, tpr_ensemble, _ = roc_curve(test_data['target_binary'], y_pred_proba_ensemble)
roc_auc_ensemble = roc_auc_score(test_data['target_binary'], y_pred_proba_ensemble)

# Plot ROC curves for Logistic Regression, SVM, and Ensemble models
plt.figure(figsize=(10, 8))
# Plot Logistic Regression ROC curve
plt.plot(fpr_log, tpr_log, color='blue', lw=2, label='Logistic Regression ROC curve (AUC = %0.2f)' % roc_auc_log)
# Plot SVM ROC curve
plt.plot(fpr_svm, tpr_svm, color='green', lw=2, label='SVM ROC curve (AUC = %0.2f)' % roc_auc_svm)
# Plot Ensemble Model ROC curve
plt.plot(fpr_ensemble, tpr_ensemble, color='orange', lw=2, label='Ensemble Model ROC curve (AUC = %0.2f)' % roc_auc_ensemble)

# Plot the diagonal reference line (performance of a random classifier)
plt.plot([0, 1], [0, 1], color='gray', lw=1, linestyle='--')

# Add labels and legend
plt.legend(loc="lower right")
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curves for Logistic Regression, SVM, and Ensemble Models')
plt.show()