PhishShield

Datasets

The project uses two separate datasets, one for training each machine learning model.

Dataset for Feature-based Model

The dataset used to train the feature-based model

Dataset for Text-based Model

The dataset used to train the text-based model

Machine Learning Models

The repository includes two machine learning models:

Feature-based Model:

This model is built in Python with scikit-learn. It is a supervised classifier: it learns from labeled examples and predicts whether a URL is phishing or legitimate from 29 URL features.
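To make the idea of URL features concrete, here is a minimal sketch using only the standard library. The 29 features used by the project are not listed in this README, so the five features below, and the function name `extract_url_features`, are hypothetical stand-ins chosen for illustration only.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url):
    """Return a few illustrative lexical features for a URL (hypothetical set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        # True when the host is a dotted IPv4 literal instead of a domain name
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "uses_https": parsed.scheme == "https",
        "has_at_symbol": "@" in url,
    }


features = extract_url_features("http://192.168.0.1/login")
print(features)
```

Each URL becomes one row of numeric or boolean values, which is the shape of input a scikit-learn classifier expects.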

Text-based Model:

This model is also written in Python. It analyzes the text content of URLs: it extracts features from the text and trains a separate classifier on them to detect phishing websites.

Both models are trained with scikit-learn pipelines, in which custom transformers preprocess the data before it reaches the classifier. Hyperparameters are tuned using grid search with cross-validation.
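The training setup described above can be sketched as follows. This is not the project's actual code: the transformer `Log1pTransformer`, the choice of `LogisticRegression`, the parameter grid, and the synthetic data are all assumptions made so that the example runs on its own.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class Log1pTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: compress large non-negative counts."""

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        return np.log1p(np.asarray(X, dtype=float))


# Synthetic stand-in data: 200 samples with 29 non-negative numeric features.
rng = np.random.default_rng(0)
X = rng.integers(0, 50, size=(200, 29))
y = (X[:, 0] + X[:, 1] > 50).astype(int)  # arbitrary labeling rule, illustration only

pipeline = Pipeline([
    ("preprocess", Log1pTransformer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# The "clf__" prefix routes the parameter to the pipeline step named "clf".
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

# 5-fold cross-validation for every candidate value of C, scored by F1.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Putting the preprocessing inside the pipeline means it is refit on each training fold during cross-validation, so nothing learned from a validation fold leaks into the model being scored.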

Each model is evaluated using metrics such as accuracy, precision, recall, and F1-score to assess its effectiveness in distinguishing between phishing and legitimate websites.
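For reference, the four metrics can be computed from the confusion counts as in the sketch below. It uses plain Python so the arithmetic is visible; the function name `classification_metrics` and the toy labels are illustrative assumptions, with 1 standing for phishing and 0 for legitimate. scikit-learn offers equivalent functions in `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`).

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, not project results: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision answers "of the URLs flagged as phishing, how many really were?", while recall answers "of the real phishing URLs, how many were flagged?". F1 is the harmonic mean of the two.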

Flask Deployment with Caching

Both machine learning models are deployed using Flask, a lightweight web framework for Python. The Flask app exposes endpoints that return predictions from the trained models. Results of previous predictions are cached to disk to improve performance.
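The idea behind caching predictions to disk can be sketched with the standard library alone. This README does not say which caching mechanism the app uses, so the helper `cached_predict`, the one-JSON-file-per-URL layout, and the `fake_model` stand-in are all assumptions for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_predict(url, predict_fn, cache_dir):
    """Return predict_fn(url), reusing a result stored on disk if one exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so it can be used safely as a file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["result"]
    result = predict_fn(url)
    path.write_text(json.dumps({"url": url, "result": result}), encoding="utf-8")
    return result


calls = []


def fake_model(url):
    """Stand-in for a trained model; records each call it receives."""
    calls.append(url)
    return "phishing" if "login" in url else "legitimate"


demo_dir = Path(tempfile.mkdtemp())  # fresh directory for the demo
first = cached_predict("http://example.com/login", fake_model, demo_dir)
second = cached_predict("http://example.com/login", fake_model, demo_dir)
print(first, second, len(calls))
```

The second call is answered from the file written by the first, so the model runs only once. A real deployment would also need an expiry policy, since a URL's status can change over time.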

Webpage Interface

The web interface is built using HTML, CSS, and Bootstrap to provide a user-friendly experience. Users can input a URL and receive predictions on whether it is a phishing website or not.

Usage

To use PhishShield, follow these steps:

Clone the repository:

git clone --depth=1 https://github.com/praneeth-katuri/PhishShield.git

Install the required dependencies:

Python Version: 3.12.3
```
pip install -r requirements.txt
```
Run the NLTK setup script:
```
python setup_nltk.py
```
Edit the .env file and enter your reCAPTCHA keys and Flask secret key.

To generate a Flask secret key, run the command below in a terminal and copy the printed key into the .env file:
```
python -c "import secrets; print(secrets.token_hex(16))"
```
To start the Flask application, run the following command in your terminal:
```
python app.py
```
To access the webpage interface, open http://127.0.0.1:5000 in your web browser.

Results

The performance of the phishing detection models is evaluated using metrics such as accuracy, precision, recall, and F1-score. The results demonstrate the effectiveness of each model in distinguishing between phishing and legitimate websites.

Feature-based Model

Text-based Model

Contributing

Contributions to this project are welcome! If you have ideas for improvements or new features, feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
