The rapid proliferation of fake news and spam/phishing emails presents significant challenges to information integrity and cybersecurity. Factify is an intelligent deep learning-based system that automatically classifies news articles as "Real" or "Fake" and detects spam/phishing emails with an accuracy of up to 98.7%.
Modern social media platforms, online news portals, and digital communication systems enable rapid information sharing. However, they also accelerate the spread of misinformation and malicious emails. These forms of digital deception undermine public trust, distort democratic processes, and can lead to financial fraud, identity theft, reputational damage, and public panic.
The core objective of this project is to automatically distinguish between fake and real news articles and identify spam/phishing emails using advanced Natural Language Processing (NLP), Deep Learning, and Explainable AI techniques.
- Swayam Sankar Nayak
- Tushar Mallick
- Rachna
- Priyanshu Kumari
- 98.7% Accuracy using Hybrid LSTM-GRU architecture
- Detection of both Fake News & Spam/Phishing Emails
- Multilingual support using Transformer models (mBERT / XLM-R)
- Explainable AI integration (LIME, SHAP, Attention Visualization)
- End-to-End CI/CD Pipeline with Docker & GitHub Actions
- Web-based Interface (HTML/CSS Frontend + Flask Backend)
- Microservice-ready Architecture for scalability
- Multiple Neural Network architectures for comparison
| Component | Technology |
|---|---|
| Programming Language | Python 3.9 |
| Machine Learning | TensorFlow 2.8 |
| NLP & Data Processing | Pandas, NLTK |
| Transformer Models | mBERT, XLM-R |
| Visualization | Matplotlib, Seaborn |
| Backend | Flask |
| Frontend | HTML, CSS |
| Containerization | Docker |
| CI/CD | GitHub Actions |
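```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, Dropout, Dense

model = Sequential([
    Embedding(10000, 100),             # 10,000-word vocabulary, 100-dim vectors
    LSTM(100, return_sequences=True),  # emit the full sequence for the GRU
    GRU(100),                          # summarize the sequence into one vector
    Dropout(0.2),                      # drop 20% of units during training
    Dense(1, activation='sigmoid')     # probability for binary classification
])
```
- LSTM captures long-term dependencies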
- GRU improves computational efficiency
- Dropout prevents overfitting
- Sigmoid outputs binary classification probability
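
To make the last bullet concrete, here is a minimal sketch of how a sigmoid output might be turned into a class label. The helper names `sigmoid` and `to_label` and the 0.5 threshold are illustrative assumptions, not taken from the project code; the label names follow the dataset convention of 0 = Fake/Spam and 1 = Real/Legitimate.

```python
import math


def sigmoid(z: float) -> float:
    """Map a raw score (logit) to a value between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def to_label(probability: float, threshold: float = 0.5) -> str:
    """Convert an output probability into a class name.

    Assumption: the probability refers to class 1 (Real/Legitimate)
    and 0.5 is the cut-off; the project may use a different value.
    """
    return "Real/Legitimate" if probability >= threshold else "Fake/Spam"


if __name__ == "__main__":
    for logit in (-3.0, 0.0, 2.5):
        p = sigmoid(logit)
        print(f"logit={logit:+.1f}  p={p:.3f}  label={to_label(p)}")
```

Lowering the threshold makes the classifier more willing to call an item Real/Legitimate; raising it makes the classifier stricter, which may be preferable for phishing detection where a missed attack is costly.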
| Model | Precision | Recall | F1-Score | Final Accuracy | Key Features |
|---|---|---|---|---|---|
| Logistic Regression (TF-IDF) | 88% | 87% | 87% | 89% | Sparse TF-IDF |
| SVM | 90% | 90% | 90% | 91% | Margin-based classifier |
| LSTM | 95% | 94% | 94% | 95% | Sequential learning |
| Transformer (mBERT / XLM-R) | 98% | 98% | 98% | 98.2% | Multilingual embeddings |
| Hybrid LSTM-GRU + Transformer + LIME & SHAP (Proposed) | 98.7% | 98.6% | 98.65% | 98.75% | Hybrid Deep Learning + Explainable AI |
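
For readers who want to check how the columns in this table relate to each other, the following sketch computes precision, recall, F1-score, and accuracy from confusion-matrix counts. The `ConfusionCounts` class and the example counts are hypothetical and are not the project's evaluation results; which class is treated as "positive" is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative


def precision(c: ConfusionCounts) -> float:
    """Share of positive predictions that were correct."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of actual positives that were found."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """Share of all predictions that were correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


if __name__ == "__main__":
    counts = ConfusionCounts(tp=90, fp=10, fn=5, tn=95)  # made-up numbers
    p, r = precision(counts), recall(counts)
    print(f"precision={p:.3f} recall={r:.3f} "
          f"f1={f1_score(p, r):.3f} accuracy={accuracy(counts):.3f}")
```

As a sanity check on the table, `f1_score(0.987, 0.986)` evaluates to about 0.9865, which agrees with the 98.65% F1-score reported for the proposed model.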
```mermaid
graph TD
    A[Input Data] --> B[Preprocessing]
    B --> C[Feature Extraction]
    C --> D[Model Inference]
    D --> E[Prediction Output]
```
- CSV / JSON file support
- Email header input support
- Database connectors
- Text normalization
- Tokenization
- Sequence padding
- URL extraction (for emails)
- MX/SPF validation
- Ensemble of LSTM variants
- Transformer-based classifiers
- Metadata-driven phishing engine
- Model versioning
- Kaggle True/Fake News Dataset
- 42,000 balanced labeled articles
- Curated Spam/Phishing Email Samples
- URL removal (news)
- HTML tag stripping
- Special character removal
- Lowercasing
- Stopword removal
- Stemming / Lemmatization
- Word counts
- Sentence counts
- Character counts
- Suspicious keyword frequency (emails)
- URL-based risk features
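
The cleaning and feature-engineering steps listed above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's `preprocessing.py`: the stopword list and the `SUSPICIOUS` keyword set are tiny hypothetical samples (a real pipeline would more likely use NLTK's resources), the function names are invented for this example, and stemming/lemmatization is left out.

```python
import re

# Tiny illustrative lists; both are assumptions made for this sketch.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for"}
SUSPICIOUS = {"urgent", "verify", "password", "winner"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> str:
    """Remove URLs and HTML tags, lowercase, strip symbols and stopwords."""
    text = URL_RE.sub(" ", text)        # URL removal
    text = TAG_RE.sub(" ", text)        # HTML tag stripping
    text = text.lower()                 # lowercasing
    text = NON_ALPHA_RE.sub(" ", text)  # special character removal
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


def count_features(text: str) -> dict:
    """Compute simple count-based features on the raw text."""
    # Naive sentence split on ., ! and ?; dots inside URLs will also split.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "word_count": len(text.split()),
        "sentence_count": len(sentences),
        "char_count": len(text),
        "url_count": len(URL_RE.findall(text)),
        "suspicious_keyword_count": sum(w in SUSPICIOUS for w in words),
    }


if __name__ == "__main__":
    raw = "<p>BREAKING: Visit http://example.com NOW!!! The offer is free.</p>"
    print(clean_text(raw))
    print(count_features("URGENT: verify your password now"))
```

Count features are computed on the raw text rather than the cleaned text so that signals such as URL count are not lost during cleaning.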
| Column | Type | Description |
|---|---|---|
| clean_text | String | Processed news/email content |
| target | Integer | 0 = Fake/Spam, 1 = Real/Legitimate |
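
Rows in this schema still need to be converted into fixed-length integer sequences before they can feed an embedding layer. The sketch below shows one way to do tokenization and sequence padding in plain Python. The names `build_vocab`, `encode`, and `pad`, the reserved ids, and the sample rows are assumptions for illustration; the project itself may rely on the Keras tokenizer utilities instead.

```python
from collections import Counter

PAD_ID = 0  # reserved for padding
OOV_ID = 1  # reserved for out-of-vocabulary words


def build_vocab(texts, max_words=10000):
    """Map the most frequent words to integer ids, starting at 2."""
    counts = Counter(word for text in texts for word in text.split())
    most_common = counts.most_common(max(max_words - 2, 0))
    return {word: idx for idx, (word, _) in enumerate(most_common, start=2)}


def encode(text, vocab):
    """Replace each word with its id, or OOV_ID if it is unknown."""
    return [vocab.get(word, OOV_ID) for word in text.split()]


def pad(sequence, max_len):
    """Keep the last max_len ids and left-pad with PAD_ID to max_len."""
    if len(sequence) > max_len:
        sequence = sequence[len(sequence) - max_len:]
    return [PAD_ID] * (max_len - len(sequence)) + list(sequence)


if __name__ == "__main__":
    # Hypothetical rows following the clean_text / target schema.
    rows = [
        {"clean_text": "breaking offer free", "target": 0},
        {"clean_text": "council approves budget offer", "target": 1},
    ]
    vocab = build_vocab(row["clean_text"] for row in rows)
    X = [pad(encode(row["clean_text"], vocab), max_len=5) for row in rows]
    y = [row["target"] for row in rows]
    print(X, y)
```

Left-padding and left-truncation mirror the defaults of Keras `pad_sequences`, and the default `max_words=10000` is chosen to line up with an embedding layer whose vocabulary size is 10,000.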
Factify integrates advanced interpretability tools:
- LIME: Highlights influential words locally
- SHAP: Provides feature contribution scores
- Attention Visualization: Shows important tokens in predictions
This ensures transparency, trust, and practical usability.
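
The intuition behind perturbation-based explanations can be shown without any library. The sketch below is neither LIME nor SHAP; it uses plain leave-one-word-out occlusion, a simpler technique that measures how much the prediction changes when each word is removed. `toy_predict` is a hypothetical stand-in for the trained model, and its keyword weights are invented for the example.

```python
from typing import Callable, List, Tuple


def toy_predict(text: str) -> float:
    """Hypothetical stand-in for a trained model: returns P(Real/Legitimate)."""
    risky = {"free": 0.3, "winner": 0.3, "urgent": 0.2}  # invented weights
    score = 0.9
    for word in text.lower().split():
        score -= risky.get(word, 0.0)
    return max(0.0, min(1.0, score))


def word_importance(
    text: str, predict: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Score each word as baseline prediction minus prediction without it.

    A negative score means removing the word raises P(Real/Legitimate),
    so the word was pushing the prediction toward Fake/Spam.
    """
    words = text.split()
    baseline = predict(text)
    scores = []
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        scores.append((word, baseline - predict(perturbed)))
    return scores


if __name__ == "__main__":
    for word, score in word_importance("urgent claim your free prize", toy_predict):
        print(f"{word:>8}  {score:+.2f}")
```

LIME generalizes this idea by sampling many perturbed versions of the input and fitting a local linear model to the resulting predictions, while SHAP distributes the prediction among features using Shapley values.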
- 98.75% Test Accuracy
- High Precision & Recall Balance
- Multilingual Robustness
- Metadata-aware Email Detection
- Deployable Web Application
```
Factify/
├── .github/
│   └── workflows/
│       └── ci.yml              # CI/CD pipeline configuration
│
├── config/
│   └── config.yaml             # Hyperparameters & file paths
│
├── data/
│   ├── raw/                    # Original datasets (True.csv, Fake.csv, Emails)
│   └── processed/              # Cleaned & processed datasets
│
├── logs/
│   └── prediction.log          # Model prediction logs
│
├── models/
│   └── saved_models/           # Trained models & tokenizer files
│
├── notebooks/
│   └── experiment.ipynb        # EDA & experimentation
│
├── src/
│   ├── __init__.py
│   ├── data_loader.py          # Data loading utilities
│   ├── preprocessing.py        # Text cleaning & tokenization
│   ├── eda.py                  # Exploratory Data Analysis
│   ├── model.py                # Model building & training
│   ├── evaluation.py           # Performance evaluation
│   ├── utils.py                # Helper functions
│   └── logger.py               # Logging configuration
│
├── static/
│   └── style.css               # Frontend styling
│
├── templates/
│   └── index.html              # Web interface
│
├── test/
│   └── test_app.py             # Unit tests
│
├── img/                        # Visualization images
├── Dockerfile                  # Container configuration
├── render.yaml                 # Deployment configuration
├── setup.py                    # Package setup
├── requirements.txt            # Dependencies
├── app.py                      # Flask API backend
├── main.py                     # Pipeline execution script
├── README.md                   # Project documentation
└── LICENSE
```
- Multimodal analysis (Images & Videos)
- Adversarial robustness improvements
- Model compression for lightweight deployment
- Real-time browser & email extensions
Factify is a unified, scalable, and explainable AI system that detects both fake news articles and spam/phishing emails with high accuracy. By combining deep learning, transformer models, metadata analysis, and explainable AI techniques, the system provides a reliable defense mechanism against digital misinformation and cyber threats.
Developed as a collaborative major project by:
Swayam Sankar Nayak, Tushar Mallick, Rachna, and Priyanshu Kumari
- Email: swayamsankar898@gmail.com
- GitHub: https://github.com/swayamsankar
- Email: rachnachaubey2002@gmail.com
- GitHub: https://github.com/rachna108
- Email: kumaripriyanshu2404@gmail.com
- GitHub: https://github.com/Priya24-ux
- Email: tusharmallick354@gmail.com
- GitHub: https://github.com/TusharMallick123


