swayamsankar/Factify

Factify: AI-Based Fake News & Email Detection System

AI News & Email Classifier with Deep Learning

The rapid proliferation of fake news and spam/phishing emails presents significant challenges to information integrity and cybersecurity. Factify is a deep learning–based system that classifies news articles as "Real" or "Fake" and detects spam/phishing emails, with a reported test accuracy of 98.75%.

Modern social media platforms, online news portals, and digital communication systems enable rapid information sharing. However, they also accelerate the spread of misinformation and malicious emails. These forms of digital deception undermine public trust, distort democratic processes, and can lead to financial fraud, identity theft, reputational damage, and public panic.

The core objective of this project is to automatically distinguish between fake and real news articles and identify spam/phishing emails using advanced Natural Language Processing (NLP), Deep Learning, and Explainable AI techniques.


👨‍💻 Developed By

  • Swayam Sankar Nayak
  • Tushar Mallick
  • Rachna
  • Priyanshu Kumari

🖼️ Project Preview

[Three project preview screenshots]

🔑 Key Features

  • 98.75% test accuracy with the proposed hybrid LSTM–GRU-based model
  • Detection of both Fake News & Spam/Phishing Emails
  • Multilingual support using Transformer models (mBERT / XLM-R)
  • Explainable AI integration (LIME, SHAP, Attention Visualization)
  • End-to-End CI/CD Pipeline with Docker & GitHub Actions
  • Web-based Interface (HTML/CSS Frontend + Flask Backend)
  • Microservice-ready Architecture for scalability
  • Multiple Neural Network architectures for comparison

🛠️ Technical Stack

| Component | Technology |
|---|---|
| Programming Language | Python 3.9 |
| Machine Learning | TensorFlow 2.8 |
| NLP & Data Processing | Pandas, NLTK |
| Transformer Models | mBERT, XLM-R |
| Visualization | Matplotlib, Seaborn |
| Backend | Flask |
| Frontend | HTML, CSS |
| Containerization | Docker |
| CI/CD | GitHub Actions |

🧠 Model Architecture

🔹 Hybrid LSTM–GRU (Best Performing Model)

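from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, Dropout, Dense

model = Sequential([
    Embedding(10000, 100),             # 10,000-word vocabulary, 100-dim vectors
    LSTM(100, return_sequences=True),  # emit the full sequence so the GRU can consume it
    GRU(100),                          # condense the sequence into one 100-dim vector
    Dropout(0.2),                      # drop 20% of units during training
    Dense(1, activation='sigmoid')     # probability of the positive class
])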

Why Hybrid LSTM–GRU?

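  • The LSTM layer captures long-range dependencies in the token sequence
  • The GRU layer uses fewer gates than an LSTM, which lowers parameter count and compute cost
  • Dropout (rate 0.2) reduces overfitting
  • The sigmoid output gives the probability of the positive class for binary classification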
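
To make the last point concrete, here is a minimal sketch of how a sigmoid output could be turned into a class label. The label names follow the data schema given later in this README (0 = Fake/Spam, 1 = Real/Legitimate); the 0.5 threshold and the helper names `sigmoid` and `to_label` are assumptions for illustration, not part of the project code.

```python
import math

# Label convention from the data schema: 0 = Fake/Spam, 1 = Real/Legitimate.
LABELS = {0: "Fake/Spam", 1: "Real/Legitimate"}


def sigmoid(z: float) -> float:
    """Logistic function: squashes a real-valued score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative inputs, use the equivalent form that avoids overflow.
    e = math.exp(z)
    return e / (1.0 + e)


def to_label(probability: float, threshold: float = 0.5) -> str:
    """Map a probability of class 1 to a label (threshold is an assumption)."""
    return LABELS[1] if probability >= threshold else LABELS[0]


print(to_label(sigmoid(2.0)))   # Real/Legitimate
print(to_label(sigmoid(-2.0)))  # Fake/Spam
```

In practice the threshold might be tuned on a validation set, for example to favor recall on phishing emails.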

📊 Model Performance Comparison

| Model | Precision | Recall | F1-Score | Final Accuracy | Key Features |
|---|---|---|---|---|---|
| Logistic Regression (TF-IDF) | 88% | 87% | 87% | 89% | Sparse TF-IDF |
| SVM | 90% | 90% | 90% | 91% | Margin-based classifier |
| LSTM | 95% | 94% | 94% | 95% | Sequential learning |
| Transformer (mBERT / XLM-R) | 98% | 98% | 98% | 98.2% | Multilingual embeddings |
| Hybrid LSTM–GRU + Transformer + LIME & SHAP (Proposed) | 98.7% | 98.6% | 98.65% | 98.75% | Hybrid Deep Learning + Explainable AI |
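
The F1-Score column can be cross-checked from the Precision and Recall columns, since F1 is their harmonic mean. The sketch below does this for the proposed model's row; `f1_score` is a helper written for this example, and it assumes the reported precision and recall are single aggregate values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the proposed model, in percent, from the table.
print(round(f1_score(98.7, 98.6), 2))  # 98.65
```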

🏗️ System Architecture

graph TD
    A[Input Data] --> B[Preprocessing]
    B --> C[Feature Extraction]
    C --> D[Model Inference]
    D --> E[Prediction Output]

🧩 System Components

1️⃣ Data Ingestion Layer

  • CSV / JSON file support
  • Email header input support
  • Database connectors

2️⃣ Processing Layer

  • Text normalization
  • Tokenization
  • Sequence padding
  • URL extraction (for emails)
  • MX/SPF validation
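
As a rough illustration of the tokenization and sequence-padding steps, the sketch below maps words to integer ids and left-pads each sequence to a fixed length. It is a plain-Python stand-in, not the project's tokenizer: the reserved ids `PAD` and `OOV`, the helper names, and `max_len=8` are assumptions, while the vocabulary size of 10,000 matches the `Embedding(10000, 100)` layer shown above.

```python
from collections import Counter

PAD, OOV = 0, 1  # reserved ids (assumed convention for this sketch)


def build_vocab(texts, vocab_size=10000):
    """Map the most frequent words to integer ids, starting after PAD/OOV."""
    counts = Counter(word for text in texts for word in text.split())
    most_common = counts.most_common(vocab_size - 2)
    return {word: idx for idx, (word, _) in enumerate(most_common, start=2)}


def encode(text, vocab, max_len=8):
    """Turn text into a fixed-length id sequence (truncate, then left-pad)."""
    ids = [vocab.get(word, OOV) for word in text.split()][:max_len]
    return [PAD] * (max_len - len(ids)) + ids


texts = ["breaking news about markets", "win a free prize now"]
vocab = build_vocab(texts)
encoded = encode("free news today", vocab)  # "today" is unseen, so it maps to OOV
print(encoded)
```

Every encoded sequence has the same length, which is what lets a batch of texts be stacked into one tensor for the embedding layer.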

3️⃣ Model Layer

  • Ensemble of LSTM variants
  • Transformer-based classifiers
  • Metadata-driven phishing engine
  • Model versioning

🔄 Data Pipeline

📂 Data Sources

  • Kaggle True/Fake News Dataset
  • 42,000 balanced labeled articles
  • Curated Spam/Phishing Email Samples

🧹 Preprocessing Steps

1. Cleaning

  • URL removal (news)
  • HTML tag stripping
  • Special character removal

2. Normalization

  • Lowercasing
  • Stopword removal
  • Stemming / Lemmatization
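
The cleaning and normalization steps above could be chained roughly as follows. This is a sketch under stated assumptions: the regular expressions and the tiny `STOPWORDS` set are illustrative (a real run would more likely use NLTK's stopword list, since NLTK is in the technical stack), and stemming/lemmatization is left out to keep the example dependency-free.

```python
import re

# Tiny illustrative stopword list (assumption, not the project's list).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on"}


def clean_text(raw: str) -> str:
    """Apply URL removal, tag stripping, lowercasing, and stopword removal."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", raw)  # URL removal
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tag stripping
    text = text.lower()                                  # lowercasing
    text = re.sub(r"[^a-z0-9\s]", " ", text)            # special character removal
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


print(clean_text("<p>The Senate is VOTING today!</p> See https://example.com/x"))
# senate voting today see
```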

3. Feature Engineering

  • Word counts
  • Sentence counts
  • Character counts
  • Suspicious keyword frequency (emails)
  • URL-based risk features
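
A minimal sketch of how these features might be computed for one message is shown below. The keyword set `SUSPICIOUS`, the feature names, and the choice of "URL points at a raw IP address" as a risk signal are hypothetical examples, not the project's actual feature definitions.

```python
import re

# Hypothetical keyword list, for illustration only.
SUSPICIOUS = {"urgent", "verify", "password", "winner", "free"}
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
IP_URL_RE = re.compile(r"https?://\d{1,3}(\.\d{1,3}){3}", re.IGNORECASE)


def extract_features(text: str) -> dict:
    """Compute simple count-based and URL-based features for one text."""
    urls = URL_RE.findall(text)
    body = URL_RE.sub(" ", text)  # keep dots inside URLs from splitting sentences
    words = re.findall(r"[A-Za-z0-9']+", body)
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]
    hits = sum(1 for w in words if w.lower() in SUSPICIOUS)
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "char_count": len(text),
        "suspicious_keyword_freq": hits / len(words) if words else 0.0,
        "url_count": len(urls),
        "has_ip_url": any(IP_URL_RE.match(u) for u in urls),
    }


features = extract_features(
    "URGENT: verify your password now! Visit http://192.168.0.1/login today."
)
print(features)
```

Features like these can be fed to a classical model or concatenated with the neural network's text representation; which of the two Factify does is not stated here.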

📊 Data Schema

| Column | Type | Description |
|---|---|---|
| clean_text | String | Processed news/email content |
| target | Integer | 0 = Fake/Spam, 1 = Real/Legitimate |
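
For illustration, rows matching this schema could be assembled and checked as in the sketch below. The helper names `build_rows`, `validate`, and `to_csv` are hypothetical, and the sample texts are made up; only the column names and the 0/1 label convention come from the schema.

```python
import csv
import io


def build_rows(real_texts, fake_texts):
    """Label real/legitimate texts as 1 and fake/spam texts as 0."""
    rows = [{"clean_text": t, "target": 1} for t in real_texts]
    rows += [{"clean_text": t, "target": 0} for t in fake_texts]
    return rows


def validate(rows) -> bool:
    """Check that every row has a string text and a 0/1 integer target."""
    return all(
        isinstance(r["clean_text"], str) and r["target"] in (0, 1) for r in rows
    )


def to_csv(rows) -> str:
    """Serialize rows to CSV text with the schema's two columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["clean_text", "target"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_rows(["senate passes budget"], ["aliens endorse candidate"])
print(to_csv(rows))
```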

🔍 Explainable AI Integration

Factify integrates advanced interpretability tools:

  • LIME – Highlights influential words locally
  • SHAP – Provides feature contribution scores
  • Attention Visualization – Shows important tokens in predictions

This ensures transparency, trust, and practical usability.
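
To give a feel for word-level explanations, the sketch below uses leave-one-word-out occlusion: remove each word in turn and measure how much the score drops. This is deliberately simpler than LIME (which fits a local surrogate model on perturbed samples) and SHAP (which estimates Shapley-value attributions), and it is not the project's implementation. `toy_spam_score` is a hypothetical stand-in for a trained model.

```python
def toy_spam_score(text: str) -> float:
    """Hypothetical stand-in for a model: 0.4 per flagged word, capped at 1."""
    flagged = {"free", "winner", "urgent"}
    hits = sum(word in flagged for word in text.lower().split())
    return min(1.0, 0.4 * hits)


def word_importance(text: str, score_fn) -> list:
    """Rank words by the score drop observed when each one is removed."""
    words = text.split()
    base = score_fn(text)
    importances = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        importances.append((word, base - score_fn(reduced)))
    return sorted(importances, key=lambda pair: pair[1], reverse=True)


ranked = word_importance("urgent free prize inside", toy_spam_score)
for word, drop in ranked:
    print(f"{word}: {drop:.2f}")
```

Words whose removal lowers the score the most are the ones the scorer relies on; neutral words show a drop of zero.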


🏆 Final Results

  • ✅ 98.75% Test Accuracy
  • ✅ High Precision & Recall Balance
  • ✅ Multilingual Robustness
  • ✅ Metadata-aware Email Detection
  • ✅ Deployable Web Application

📁 Project Folder Structure

Factify/
├── .github/
│   └── workflows/
│       └── ci.yml                 # CI/CD pipeline configuration
│
├── config/
│   └── config.yaml                # Hyperparameters & file paths
│
├── data/
│   ├── raw/                       # Original datasets (True.csv, Fake.csv, Emails)
│   └── processed/                 # Cleaned & processed datasets
│
├── logs/
│   └── prediction.log             # Model prediction logs
│
├── models/
│   └── saved_models/              # Trained models & tokenizer files
│
├── notebooks/
│   └── experiment.ipynb           # EDA & experimentation
│
├── src/
│   ├── __init__.py
│   ├── data_loader.py             # Data loading utilities
│   ├── preprocessing.py           # Text cleaning & tokenization
│   ├── eda.py                     # Exploratory Data Analysis
│   ├── model.py                   # Model building & training
│   ├── evaluation.py              # Performance evaluation
│   ├── utils.py                   # Helper functions
│   └── logger.py                  # Logging configuration
│
├── static/
│   └── style.css                  # Frontend styling
│
├── templates/
│   └── index.html                 # Web interface
│
├── test/
│   └── test_app.py                # Unit tests
│
├── img/                           # Visualization images
├── Dockerfile                     # Container configuration
├── render.yaml                    # Deployment configuration
├── setup.py                       # Package setup
├── requirements.txt               # Dependencies
├── app.py                         # Flask API backend
├── main.py                        # Pipeline execution script
├── README.md                      # Project documentation
└── LICENSE

🚀 Future Enhancements

  • Multimodal analysis (Images & Videos)
  • Adversarial robustness improvements
  • Model compression for lightweight deployment
  • Real-time browser & email extensions

📌 Conclusion

Factify is a unified, scalable, and explainable AI system that detects both fake news articles and spam/phishing emails with high accuracy. By combining deep learning, transformer models, metadata analysis, and explainable AI techniques, the system provides a reliable defense mechanism against digital misinformation and cyber threats.

Developed as a collaborative major project by:

Swayam Sankar Nayak, Tushar Mallick, Rachna, and Priyanshu Kumari


