# **Phishing Website Detection by Machine Learning Techniques**

*Final project of AI & Cybersecurity Course*

## **1. Objective:**
Phishing websites are a common social engineering technique: they mimic trusted uniform resource locators (URLs) and webpages to deceive users. The objective of this project is to train machine learning models and deep neural networks on a purpose-built dataset to predict phishing websites. Both phishing and benign URLs are gathered to form the dataset, and the required URL-based and website-content-based features are extracted from them. The performance of each model is measured and compared.

*This project was developed in Google Colaboratory.*<br>
*The required packages for this notebook are imported when needed.*

## **2. Loading Data:**

The features are extracted and stored in a CSV file. The extraction process can be seen in the 'Phishing Website Detection_Feature Extraction.ipynb' file.

The resulting CSV file is uploaded to this notebook and loaded into a DataFrame.

In [1]:

#importing basic packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, f1_score, roc_auc_score, recall_score, confusion_matrix, classification_report
import pickle

In [2]:
# Load the extracted-feature dataset
DATA_PATH = 'Datasets/6.full_urldata_features.csv'
try:
    data0 = pd.read_csv(DATA_PATH)
    print("Data loaded successfully.")
except FileNotFoundError:
    print(f"Error: '{DATA_PATH}' not found. Please check the file path.")
    # Re-raise so that later cells do not run without data
    raise

Data loaded successfully.


## **3. Familiarizing with Data**
In this step, a few DataFrame methods are used to look into the data and its features, and the data is prepared for training.
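
The inspection calls themselves are not shown in this notebook. A minimal sketch of what such a first look could involve is given below, using a small made-up stand-in frame. The column names `Domain`, `Have_IP`, `URL_Depth`, and `Label` appear elsewhere in this notebook; the values and the name `sample` are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for data0; the values are invented for illustration.
sample = pd.DataFrame({
    'Domain': ['example.com', 'example.org', 'example.net', 'example.edu'],
    'Have_IP': [0, 0, 1, 0],
    'URL_Depth': [1, 3, 5, 2],
    'Label': [0, 0, 1, 1],
})

print(sample.shape)                    # (rows, columns)
print(sample.columns.tolist())         # feature names
print(sample.dtypes)                   # column types
print(sample.isnull().sum())           # missing values per column
print(sample['Label'].value_counts())  # class balance
```

The same calls can be run on `data0` once it is loaded.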

In [3]:
# Drop the 'Domain' column and shuffle the rows (fixed seed for reproducibility)
data = data0.drop(['Domain'], axis=1).copy()
data = data.sample(frac=1, random_state=42).reset_index(drop=True)

In [4]:
# Separate features and target, then split into training (80%) and test (20%) sets
y = data['Label']
X = data.drop('Label', axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=12)


In [5]:
# Lists that collect each model's metrics for the final comparison
ML_Model = []
acc_train = []
acc_test = []
precision_train = []
precision_test = []
f1_train = []
f1_test = []
recall_train = []
recall_test = []
auc_train = []
auc_test = []


In [6]:
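# Append one model's train/test metrics (rounded to 3 decimals) to the result lists.
# Argument order: accuracy, precision, F1, recall, AUC, each as (train, test).
def storeResults(model, a, b, c, d, e, f, g, h, i, j):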
    ML_Model.append(model)
    acc_train.append(round(a, 3))
    acc_test.append(round(b, 3))
    precision_train.append(round(c, 3))
    precision_test.append(round(d, 3))
    f1_train.append(round(e, 3))
    f1_test.append(round(f, 3))
    recall_train.append(round(g, 3))
    recall_test.append(round(h, 3))
    auc_train.append(round(i, 3))
    auc_test.append(round(j, 3))
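
Eleven positional arguments are easy to pass in the wrong order. One possible alternative, sketched here with hypothetical names (`results_rows`, `store_results_row`) that are not part of this notebook, is to collect one dictionary per model with keyword-named metrics:

```python
results_rows = []  # hypothetical container: one dict per model

def store_results_row(model, **metrics):
    """Append one row of metrics, rounded to 3 decimals."""
    row = {'ML Model': model}
    row.update({name: round(value, 3) for name, value in metrics.items()})
    results_rows.append(row)

# Usage with made-up numbers
store_results_row('Demo model', acc_train=0.91234, acc_test=0.90051)
print(results_rows)
```

A list of such dictionaries can be passed directly to `pd.DataFrame` to build the comparison table.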

In [7]:
print("\n--- Training Decision Tree Classifier ---")
tree = DecisionTreeClassifier(max_depth=5)
tree.fit(X_train, y_train)

y_test_tree = tree.predict(X_test)
y_train_tree = tree.predict(X_train)

acc_train_tree = accuracy_score(y_train, y_train_tree)
acc_test_tree = accuracy_score(y_test, y_test_tree)
precision_train_tree = precision_score(y_train, y_train_tree)
precision_test_tree = precision_score(y_test, y_test_tree)
f1_train_tree = f1_score(y_train, y_train_tree)
f1_test_tree = f1_score(y_test, y_test_tree)
recall_train_tree = recall_score(y_train, y_train_tree)
recall_test_tree = recall_score(y_test, y_test_tree)
auc_train_tree = roc_auc_score(y_train, tree.predict_proba(X_train)[:, 1])
auc_test_tree = roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1])

print("Decision Tree: Train Accuracy: {:.3f}, Test Accuracy: {:.3f}".format(acc_train_tree, acc_test_tree))
print("Decision Tree Classification Report (Test Set):")
print(classification_report(y_test, y_test_tree))
print("Confusion Matrix (Test Set):\n", confusion_matrix(y_test, y_test_tree))
storeResults('Decision Tree', acc_train_tree, acc_test_tree, precision_train_tree, precision_test_tree, f1_train_tree, f1_test_tree, recall_train_tree, recall_test_tree, auc_train_tree, auc_test_tree)



--- Training Decision Tree Classifier ---
Decision Tree: Train Accuracy: 0.860, Test Accuracy: 0.868
Decision Tree Classification Report (Test Set):
              precision    recall  f1-score   support

           0       0.81      0.97      0.88      1025
           1       0.96      0.76      0.85       975

    accuracy                           0.87      2000
   macro avg       0.88      0.87      0.87      2000
weighted avg       0.88      0.87      0.87      2000

Confusion Matrix (Test Set):
 [[993  32]
 [232 743]]
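
The summary metrics follow directly from the confusion matrix. As a worked check, assuming scikit-learn's layout (rows are true classes, columns are predicted classes) and assuming that label 1 marks phishing, the Decision Tree numbers above can be recomputed by hand:

```python
# Counts copied from the Decision Tree confusion matrix above
tn, fp = 993, 32
fn, tp = 232, 743

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted-phishing URLs that are phishing
recall = tp / (tp + fn)      # share of phishing URLs that were caught
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
# -> 0.868 0.959 0.762 0.849
```

Under that assumption, the 232 false negatives are phishing URLs classified as benign, which is why recall for class 1 is noticeably lower than precision.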


In [8]:
print("\n--- Training Random Forest Classifier ---")
forest = RandomForestClassifier(max_depth=5)
forest.fit(X_train, y_train)

y_test_forest = forest.predict(X_test)
y_train_forest = forest.predict(X_train)

acc_train_forest = accuracy_score(y_train, y_train_forest)
acc_test_forest = accuracy_score(y_test, y_test_forest)
precision_train_forest = precision_score(y_train, y_train_forest)
precision_test_forest = precision_score(y_test, y_test_forest)
f1_train_forest = f1_score(y_train, y_train_forest)
f1_test_forest = f1_score(y_test, y_test_forest)
recall_train_forest = recall_score(y_train, y_train_forest)
recall_test_forest = recall_score(y_test, y_test_forest)
auc_train_forest = roc_auc_score(y_train, forest.predict_proba(X_train)[:, 1])
auc_test_forest = roc_auc_score(y_test, forest.predict_proba(X_test)[:, 1])

print("Random Forest: Train Accuracy: {:.3f}, Test Accuracy: {:.3f}".format(acc_train_forest, acc_test_forest))
print("Random Forest Classification Report (Test Set):")
print(classification_report(y_test, y_test_forest))
print("Confusion Matrix (Test Set):\n", confusion_matrix(y_test, y_test_forest))
storeResults('Random Forest', acc_train_forest, acc_test_forest, precision_train_forest, precision_test_forest, f1_train_forest, f1_test_forest, recall_train_forest, recall_test_forest, auc_train_forest, auc_test_forest)



--- Training Random Forest Classifier ---
Random Forest: Train Accuracy: 0.864, Test Accuracy: 0.873
Random Forest Classification Report (Test Set):
              precision    recall  f1-score   support

           0       0.82      0.97      0.89      1025
           1       0.96      0.77      0.85       975

    accuracy                           0.87      2000
   macro avg       0.89      0.87      0.87      2000
weighted avg       0.89      0.87      0.87      2000

Confusion Matrix (Test Set):
 [[996  29]
 [226 749]]


In [9]:
print("\n--- Training Multilayer Perceptrons ---")
mlp = MLPClassifier(alpha=0.001, hidden_layer_sizes=([100, 100, 100]), random_state=42)
mlp.fit(X_train, y_train)

y_test_mlp = mlp.predict(X_test)
y_train_mlp = mlp.predict(X_train)

acc_train_mlp = accuracy_score(y_train, y_train_mlp)
acc_test_mlp = accuracy_score(y_test, y_test_mlp)
precision_train_mlp = precision_score(y_train, y_train_mlp)
precision_test_mlp = precision_score(y_test, y_test_mlp)
f1_train_mlp = f1_score(y_train, y_train_mlp)
f1_test_mlp = f1_score(y_test, y_test_mlp)
recall_train_mlp = recall_score(y_train, y_train_mlp)
recall_test_mlp = recall_score(y_test, y_test_mlp)
auc_train_mlp = roc_auc_score(y_train, mlp.predict_proba(X_train)[:, 1])
auc_test_mlp = roc_auc_score(y_test, mlp.predict_proba(X_test)[:, 1])

print("Multilayer Perceptrons: Train Accuracy: {:.3f}, Test Accuracy: {:.3f}".format(acc_train_mlp, acc_test_mlp))
print("Multilayer Perceptrons Classification Report (Test Set):")
print(classification_report(y_test, y_test_mlp))
print("Confusion Matrix (Test Set):\n", confusion_matrix(y_test, y_test_mlp))
storeResults('Multilayer Perceptrons', acc_train_mlp, acc_test_mlp, precision_train_mlp, precision_test_mlp, f1_train_mlp, f1_test_mlp, recall_train_mlp, recall_test_mlp, auc_train_mlp, auc_test_mlp)



--- Training Multilayer Perceptrons ---
Multilayer Perceptrons: Train Accuracy: 0.881, Test Accuracy: 0.879
Multilayer Perceptrons Classification Report (Test Set):
              precision    recall  f1-score   support

           0       0.84      0.94      0.89      1025
           1       0.93      0.81      0.87       975

    accuracy                           0.88      2000
   macro avg       0.89      0.88      0.88      2000
weighted avg       0.89      0.88      0.88      2000

Confusion Matrix (Test Set):
 [[968  57]
 [184 791]]


In [10]:
print("\n--- Training XGBoost Classifier ---")
xgb = XGBClassifier(learning_rate=0.4, max_depth=7, random_state=42)
xgb.fit(X_train, y_train)

y_test_xgb = xgb.predict(X_test)
y_train_xgb = xgb.predict(X_train)

acc_train_xgb = accuracy_score(y_train, y_train_xgb)
acc_test_xgb = accuracy_score(y_test, y_test_xgb)
precision_train_xgb = precision_score(y_train, y_train_xgb)
precision_test_xgb = precision_score(y_test, y_test_xgb)
f1_train_xgb = f1_score(y_train, y_train_xgb)
f1_test_xgb = f1_score(y_test, y_test_xgb)
recall_train_xgb = recall_score(y_train, y_train_xgb)
recall_test_xgb = recall_score(y_test, y_test_xgb)
auc_train_xgb = roc_auc_score(y_train, xgb.predict_proba(X_train)[:, 1])
auc_test_xgb = roc_auc_score(y_test, xgb.predict_proba(X_test)[:, 1])

print("XGBoost: Train Accuracy: {:.3f}, Test Accuracy: {:.3f}".format(acc_train_xgb, acc_test_xgb))
print("XGBoost Classification Report (Test Set):")
print(classification_report(y_test, y_test_xgb))
print("Confusion Matrix (Test Set):\n", confusion_matrix(y_test, y_test_xgb))
storeResults('XGBoost', acc_train_xgb, acc_test_xgb, precision_train_xgb, precision_test_xgb, f1_train_xgb, f1_test_xgb, recall_train_xgb, recall_test_xgb, auc_train_xgb, auc_test_xgb)



--- Training XGBoost Classifier ---
XGBoost: Train Accuracy: 0.885, Test Accuracy: 0.887
XGBoost Classification Report (Test Set):
              precision    recall  f1-score   support

           0       0.84      0.96      0.90      1025
           1       0.96      0.81      0.87       975

    accuracy                           0.89      2000
   macro avg       0.90      0.89      0.89      2000
weighted avg       0.90      0.89      0.89      2000

Confusion Matrix (Test Set):
 [[988  37]
 [188 787]]


In [11]:
print("\n--- Training Support Vector Machines ---")
svm = SVC(kernel='linear', C=1.0, random_state=12, probability=True)
svm.fit(X_train, y_train)

y_test_svm = svm.predict(X_test)
y_train_svm = svm.predict(X_train)

acc_train_svm = accuracy_score(y_train, y_train_svm)
acc_test_svm = accuracy_score(y_test, y_test_svm)
precision_train_svm = precision_score(y_train, y_train_svm)
precision_test_svm = precision_score(y_test, y_test_svm)
f1_train_svm = f1_score(y_train, y_train_svm)
f1_test_svm = f1_score(y_test, y_test_svm)
recall_train_svm = recall_score(y_train, y_train_svm)
recall_test_svm = recall_score(y_test, y_test_svm)
auc_train_svm = roc_auc_score(y_train, svm.predict_proba(X_train)[:, 1])
auc_test_svm = roc_auc_score(y_test, svm.predict_proba(X_test)[:, 1])

print("SVM: Train Accuracy: {:.3f}, Test Accuracy: {:.3f}".format(acc_train_svm, acc_test_svm))
print("SVM Classification Report (Test Set):")
print(classification_report(y_test, y_test_svm))
print("Confusion Matrix (Test Set):\n", confusion_matrix(y_test, y_test_svm))
storeResults('SVM', acc_train_svm, acc_test_svm, precision_train_svm, precision_test_svm, f1_train_svm, f1_test_svm, recall_train_svm, recall_test_svm, auc_train_svm, auc_test_svm)



--- Training Support Vector Machines ---
SVM: Train Accuracy: 0.860, Test Accuracy: 0.874
SVM Classification Report (Test Set):
              precision    recall  f1-score   support

           0       0.82      0.96      0.89      1025
           1       0.95      0.78      0.86       975

    accuracy                           0.87      2000
   macro avg       0.89      0.87      0.87      2000
weighted avg       0.89      0.87      0.87      2000

Confusion Matrix (Test Set):
 [[988  37]
 [216 759]]


In [12]:
print("\n--- Final Model Comparison ---")
results = pd.DataFrame({
    'ML Model': ML_Model,
    'Train Accuracy': acc_train,
    'Test Accuracy': acc_test,
    'Train Precision': precision_train,
    'Test Precision': precision_test,
    'Train F1-Score': f1_train,
    'Test F1-Score': f1_test,
    'Train Recall': recall_train,
    'Test Recall': recall_test,
    'Train AUC': auc_train,
    'Test AUC': auc_test
})

results_sorted = results.sort_values(by=['Test Accuracy', 'Test F1-Score'], ascending=False)
print(results_sorted)


--- Final Model Comparison ---
                 ML Model  Train Accuracy  Test Accuracy  Train Precision  \
3                 XGBoost           0.885          0.887            0.964   
2  Multilayer Perceptrons           0.881          0.879            0.947   
4                     SVM           0.860          0.874            0.951   
1           Random Forest           0.864          0.873            0.965   
0           Decision Tree           0.860          0.868            0.964   

   Test Precision  Train F1-Score  Test F1-Score  Train Recall  Test Recall  \
3           0.955           0.876          0.875         0.802        0.807   
2           0.933           0.873          0.868         0.810        0.811   
4           0.954           0.845          0.857         0.760        0.778   
1           0.963           0.849          0.855         0.758        0.768   
0           0.959           0.843          0.849         0.749        0.762   

   Train AUC  Test AUC  
3    
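
Accuracy alone hides which kind of error each model makes. The sketch below tabulates missed phishing URLs (false negatives) and false alarms (false positives) from the test-set confusion matrices printed earlier, assuming label 1 marks phishing. The names `errors` and `ranked` are illustrative.

```python
# (false positives, false negatives) copied from each test-set confusion matrix
errors = {
    'Decision Tree': (32, 232),
    'Random Forest': (29, 226),
    'Multilayer Perceptrons': (57, 184),
    'XGBoost': (37, 188),
    'SVM': (37, 216),
}

# Rank by missed phishing URLs, fewest first
ranked = sorted(errors.items(), key=lambda kv: kv[1][1])
for name, (fp, fn) in ranked:
    print(f"{name:<24} missed phishing: {fn:>3}   false alarms: {fp:>3}")
```

On this split the Multilayer Perceptrons model misses slightly fewer phishing URLs than XGBoost (184 vs. 188) but raises more false alarms (57 vs. 37).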

In [13]:
print("\n--- Saving and Testing Best Model ---")
# Use context managers so the file handles are closed after use
with open("XGBoostClassifier.pickle.dat", "wb") as f:
    pickle.dump(xgb, f)
print("XGBoost model saved as 'XGBoostClassifier.pickle.dat'.")

with open("XGBoostClassifier.pickle.dat", "rb") as f:
    loaded_model = pickle.load(f)
print("XGBoost model loaded successfully.")

# Example predictions with the loaded model, using two hand-crafted feature vectors
legitimate_features = pd.DataFrame([{
    'Have_IP': 0, 'Have_At': 0, 'URL_Length': 1, 'URL_Depth': 1,
    'Redirection': 0, 'https_Domain': 0, 'TinyURL': 0, 'Prefix/Suffix': 0,
    'DNS_Record': 0, 'Web_Traffic': 1, 'Domain_Age': 1, 'Domain_End': 1,
    'iFrame': 0, 'Mouse_Over': 0, 'Right_Click': 1, 'Web_Forwards': 0,
    'Has_LoginForm': 0, 'Form_Action_Suspect': 0, 'Suspicious_Keywords': 0
}])

phishing_features = pd.DataFrame([{
    'Have_IP': 1, 'Have_At': 1, 'URL_Length': 1, 'URL_Depth': 5,
    'Redirection': 1, 'https_Domain': 0, 'TinyURL': 1, 'Prefix/Suffix': 1,
    'DNS_Record': 1, 'Web_Traffic': 1, 'Domain_Age': 0, 'Domain_End': 0,
    'iFrame': 1, 'Mouse_Over': 1, 'Right_Click': 1, 'Web_Forwards': 1,
    'Has_LoginForm': 1, 'Form_Action_Suspect': 1, 'Suspicious_Keywords': 1
}])

# Predict class probabilities with the loaded model
prob_legit = loaded_model.predict_proba(legitimate_features)
prob_phish = loaded_model.predict_proba(phishing_features)

print("\nProbability for legitimate URL (loaded model):", prob_legit)
print("Probability for phishing URL (loaded model):", prob_phish)


--- Saving and Testing Best Model ---
XGBoost model saved as 'XGBoostClassifier.pickle.dat'.
XGBoost model loaded successfully.

Probability for legitimate URL (loaded model): [[0.9846673  0.01533267]]
Probability for phishing URL (loaded model): [[0.3729285 0.6270715]]
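
`predict_proba` returns one row per sample, with one probability per class in the order of the model's `classes_` attribute (assumed here to be `[0, 1]`, with 1 meaning phishing). A minimal sketch of turning the probabilities printed above into labels follows. The helper name `to_label` and the threshold values are illustrative assumptions, not part of the notebook.

```python
def to_label(proba_row, threshold=0.5):
    """Return 1 (phishing) if the class-1 probability reaches the threshold."""
    return int(proba_row[1] >= threshold)

# Probabilities copied from the output above
prob_legit_row = [0.9846673, 0.01533267]
prob_phish_row = [0.3729285, 0.6270715]

print(to_label(prob_legit_row))                 # 0
print(to_label(prob_phish_row))                 # 1
print(to_label(prob_phish_row, threshold=0.7))  # 0: a stricter threshold misses this URL
```

The hand-crafted phishing example scores only about 0.63, so the chosen threshold directly decides whether it is flagged.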
