Phishing Detection Model

Experimenting with hybrid models to detect phishing emails on online platforms.

Done for an SMU module.

Project's Inspiration and Description

Phishing scams remain widespread, with losses amounting to over $660 million in 2022 alone. Scammers constantly evolve their tactics while the general public spends more time online, so we wanted to build an improved model that better detects and classifies potential phishing scams, to protect society and our loved ones.

Phishing email detection generally falls into two approaches: the blacklist method and the machine-learning method. The blacklist method checks the email address or URL against a list of known phishing sources. Although blacklists are simple and efficient, a paper that analysed the effectiveness of phishing blacklists concluded that they were ineffective against fresh feeds. This is why we turned to the machine-learning method.
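To make the blacklist limitation concrete, here is a minimal sketch of a blacklist lookup. The domain names, the `BLACKLIST` set, and the `blacklisted_domains` helper are hypothetical and exist only for illustration; they are not part of this project's code.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklist entries, for illustration only.
BLACKLIST = {"known-phish.example", "bad-login.example"}

# Simple pattern for http/https links; real emails may need a stricter one.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def blacklisted_domains(email_body, blacklist=BLACKLIST):
    """Return the blacklisted domains that are linked in the email body."""
    domains = set()
    for url in URL_PATTERN.findall(email_body):
        host = urlparse(url).hostname
        if host:
            domains.add(host.lower())
    return domains & blacklist


known = "Verify your account at http://known-phish.example/login now"
fresh = "Verify your account at http://brand-new-phish.example/login now"

print(blacklisted_domains(known))  # the listed domain is caught
print(blacklisted_domains(fresh))  # a domain not yet listed slips through
```

The second email is identical in wording but uses a domain that is not on the list yet, so the lookup returns nothing. A model trained on the content of the email does not depend on the domain having been reported first.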

What we did differently from most existing studies that apply machine learning models:

trained on a larger dataset of 11,000 samples
tested different models and compared their results
factored sentiment analysis into the prediction
combined different models to predict phishing emails
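
One common way to combine several models is soft voting, where each model's predicted phishing probability is averaged into a single score. The sketch below assumes this approach purely for illustration; the README does not state which combination rule the project uses, and the `soft_vote` function, the model names, and the probabilities are all hypothetical.

```python
def soft_vote(probabilities, weights=None, threshold=0.5):
    """Combine per-model phishing probabilities by weighted average.

    probabilities: dict mapping a model name to its phishing probability.
    weights: optional dict of per-model weights (defaults to equal weights).
    Returns (combined score, True if the score reaches the threshold).
    """
    if weights is None:
        weights = {name: 1.0 for name in probabilities}
    total_weight = sum(weights[name] for name in probabilities)
    score = sum(p * weights[name] for name, p in probabilities.items()) / total_weight
    return score, score >= threshold


# Hypothetical outputs from three separately trained models.
score, is_phishing = soft_vote({"text": 0.9, "sentiment": 0.6, "url": 0.3})
print(round(score, 2), is_phishing)
```

Weights let a stronger model count for more; with equal weights this reduces to a plain average.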

Project Methodology

First, we break each email down into three components: the text message, the sentiment of the email, and the URLs in the email. Second, we clean the dataset, using regex to handle issues such as inconsistent spacing. We then analyse each component, train different models on it, and identify the best model for each component. After that, we combine the best models into an Ensemble Model, which is then tested on a foreign dataset. Finally, we measure the performance of the Ensemble Model and compare it with that of the other models.
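
The first two steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the `split_email` and `clean_text` helpers and the specific regex patterns are hypothetical, and the sentiment component is assumed to be computed later from the cleaned text.

```python
import re

# Simple pattern for http/https links (an assumption, not the project's pattern).
URL_PATTERN = re.compile(r"https?://\S+")


def clean_text(text):
    """Strip HTML-like tags, collapse runs of whitespace, and lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


def split_email(raw):
    """Separate an email body into its URLs and its cleaned text.

    The cleaned text would feed both the text model and the sentiment
    model; the URLs would feed the URL model.
    """
    urls = URL_PATTERN.findall(raw)
    text = clean_text(URL_PATTERN.sub(" ", raw))
    return {"text": text, "urls": urls}


raw = "<p>URGENT   notice</p>\n\nConfirm here: http://phish.example/a  today"
parts = split_email(raw)
print(parts["text"])
print(parts["urls"])
```

Removing the URLs before cleaning keeps link fragments out of the text features, so each component is analysed on its own.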

Some Results

While the final Ensemble Model does seem to produce some suspicious results, the accuracy of each individual model is still relatively high.

Here are some of them (based on original ipynb results):

Future Work

A Chrome extension that automates detection and analysis of Gmail content and presents the individual model results
Allow the model to be used as a Web API or a pickle package
