A comprehensive multi-model fake news detection system using BERT, RoBERTa, and TF-IDF models with ensemble predictions and explainability.
- Multi-Model Architecture: Combines BERT, RoBERTa, and TF-IDF models for robust predictions
- Ensemble Methods: Multiple ensemble strategies including confidence-weighted, majority vote, and adaptive methods
- Explainability: Detailed explanations for predictions with text analysis and indicator detection
- Flexible Interface: CLI, batch processing, and interactive modes
- Risk Assessment: Automatic risk level calculation (HIGH/MEDIUM/LOW)
- Comprehensive Analysis: Text features, sentiment analysis, and credibility indicators
- Python 3.8+
- PyTorch 2.0+
- Transformers 4.30+
- See requirements.txt for full dependencies
# Clone the repository
git clone <repository-url>
cd fake-news-detection
# Install dependencies
pip install -r requirements.txt
# Ensure model files are in place
# models/bert/final_model/
# models/roberta/final_model/
# models/tf_idf/fake_news_classifier.joblib
# models/tf_idf/tfidf_model.joblib
├── main.py                      # Main entry point with CLI
├── services/
│   ├── inference.py             # Model inference for all three models
│   ├── orchestrator.py          # Ensemble prediction logic
│   └── explainer.py             # Explanation and interpretability
├── models/
│   ├── bert/final_model/        # BERT model files
│   ├── roberta/final_model/     # RoBERTa model files
│   └── tf_idf/                  # TF-IDF model and vectorizer
└── requirements.txt             # Python dependencies
The easiest way to get started:
# Run the demo with sample texts
python demo.py --demo
# Or start interactive mode (default)
python demo.py
# Or use the menu-driven interface
bash run.sh
# Analyze a single text
python main.py --text "Scientists discover groundbreaking cure for cancer"
# Analyze text from a file
python main.py --file article.txt
# Batch mode: create a file with one text per line
python main.py --batch news_articles.txt
# Interactive mode
python main.py --interactive
# Use only BERT and RoBERTa
python main.py --text "News text..." --no-tfidf
# Use only TF-IDF
python main.py --text "News text..." --no-bert --no-roberta
# Choose an ensemble method
python main.py --text "News..." --ensemble majority_vote
# Available: majority_vote, weighted_average, unanimous, confidence_weighted, adaptive
# Simple output
python main.py --text "News..." --simple
# No explanations
python main.py --text "News..." --no-explain
# Quiet mode
python main.py --text "News..." --quiet
from main import FakeNewsDetector
# Initialize detector
detector = FakeNewsDetector(
use_bert=True,
use_roberta=True,
use_tfidf=True,
ensemble_method='confidence_weighted'
)
# Analyze text
text = "Breaking news: Scientists discover miracle cure!"
result = detector.predict(text, explain=True)
# Print formatted result
detector.print_result(result, detailed=True)
# Access prediction data
print(f"Prediction: {result['final_prediction']}")
print(f"Confidence: {result['confidence']:.2%}")
print(f"Risk Level: {result['risk_level']}")from services.inference import MultiModelInference
# Initialize inference
inference = MultiModelInference(
use_bert=True,
use_roberta=True,
use_tfidf=True
)
# Get predictions from all models
predictions = inference.predict_all(text)
# Access individual model predictions
print(predictions['BERT'])
print(predictions['RoBERTa'])
print(predictions['TF-IDF'])
from services.orchestrator import PredictionOrchestrator, EnsembleMethod
# Initialize orchestrator
orchestrator = PredictionOrchestrator(
ensemble_method=EnsembleMethod.CONFIDENCE_WEIGHTED
)
# Get ensemble prediction
ensemble_result = orchestrator.ensemble_predict(predictions)
# Print interpretation
interpretation = orchestrator.interpret_results(ensemble_result)
print(interpretation)
from services.explainer import PredictionExplainer
# Initialize explainer
explainer = PredictionExplainer()
# Get explanation for a prediction
explanation = explainer.explain_prediction(
text=text,
prediction='FAKE',
confidence=0.92,
model_name='BERT'
)
# Get ensemble explanation
ensemble_explanation = explainer.explain_ensemble(
text=text,
individual_predictions=predictions,
ensemble_result=ensemble_result
)
- confidence_weighted: Weights predictions by their confidence scores. Best for balanced results.
- majority_vote: Simple voting; each model gets one vote.
- weighted_average: Uses predefined model weights. Customize with the model_weights parameter.
- unanimous: Requires all models to agree. Returns UNCERTAIN if they disagree.
- adaptive: Chooses a strategy based on the level of model agreement.
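To make the confidence-weighted strategy concrete, here is a minimal sketch of the idea. It assumes each model's result is a dict with 'prediction' and 'confidence' keys; the actual logic lives in services/orchestrator.py and may differ.

# Minimal sketch of confidence-weighted ensembling (illustrative only;
# the real implementation in services/orchestrator.py may differ).
def confidence_weighted(predictions):
    scores = {"REAL": 0.0, "FAKE": 0.0}
    for result in predictions.values():
        scores[result["prediction"]] += result["confidence"]
    label = max(scores, key=scores.get)
    return {"final_prediction": label, "confidence": scores[label] / sum(scores.values())}

example = {
    "BERT":    {"prediction": "FAKE", "confidence": 0.895},
    "RoBERTa": {"prediction": "FAKE", "confidence": 0.92},
    "TF-IDF":  {"prediction": "REAL", "confidence": 0.65},
}
print(confidence_weighted(example))  # FAKE, with ~0.74 of the total confidence mass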
======================================================================
FAKE NEWS DETECTION RESULT
======================================================================
🚫 PREDICTION: FAKE
📊 Confidence: 89.23%
🤝 Consensus: 85.00%
⚠️ Risk Level: HIGH
----------------------------------------------------------------------
INDIVIDUAL MODEL PREDICTIONS
----------------------------------------------------------------------
BERT:
Prediction: FAKE
Confidence: 89.50%
Probabilities: REAL=10.50%, FAKE=89.50%
RoBERTa:
Prediction: FAKE
Confidence: 92.00%
Probabilities: REAL=8.00%, FAKE=92.00%
TF-IDF:
Prediction: REAL
Confidence: 65.00%
Probabilities: REAL=65.00%, FAKE=35.00%
----------------------------------------------------------------------
KEY FINDINGS
----------------------------------------------------------------------
• ⚠️ Models disagree: 2 predict FAKE, 1 predicts REAL
• ✅ High average confidence: 82.2%
• ⚠️ Multiple fake news indicators found (5 total)
----------------------------------------------------------------------
RECOMMENDATION
----------------------------------------------------------------------
❌ This content is likely FAKE NEWS. Do NOT share or trust this information.
======================================================================
The explainer analyzes the following (a short illustrative sketch follows this list):
- Text Features: Word count, sentence structure, capitalization patterns
- Fake Indicators: Clickbait phrases, emotional language, unreliable sources
- Credibility Indicators: Source attribution, balanced language, specific details
- Readability: Complexity score and reading level
- Sentiment: Positive/negative emotional content
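As a rough illustration of how indicator detection can work with the phrase lists kept in explainer.fake_indicators and explainer.credibility_indicators (a sketch only; the scoring in services/explainer.py may differ):

# Sketch of phrase-based indicator matching (illustrative only).
def find_indicators(text, indicator_dict):
    lowered = text.lower()
    hits = {}
    for category, phrases in indicator_dict.items():
        found = [p for p in phrases if p in lowered]
        if found:
            hits[category] = found
    return hits

# Hypothetical phrase list for demonstration:
fake_indicators = {"clickbait": ["you won't believe", "shocking", "miracle cure"]}
print(find_indicators("Breaking news: Scientists discover miracle cure!", fake_indicators))
# {'clickbait': ['miracle cure']}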
- HIGH: Strong indication of fake news or low confidence in authenticity
- MEDIUM: Moderate confidence, verification recommended
- LOW: High confidence in assessment
- UNKNOWN: Unable to determine (errors or insufficient data)
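A small sketch of how a prediction and its confidence could map to these levels. The thresholds below are illustrative assumptions, not the values the system actually uses.

# Illustrative risk-level mapping; the 0.6 / 0.85 thresholds are assumptions.
def risk_level(prediction, confidence):
    if prediction not in ("REAL", "FAKE") or confidence is None:
        return "UNKNOWN"   # errors or insufficient data
    if prediction == "FAKE" or confidence < 0.6:
        return "HIGH"      # fake verdict, or low confidence in authenticity
    if confidence < 0.85:
        return "MEDIUM"    # moderate confidence; verification recommended
    return "LOW"           # high confidence that the content is authentic

print(risk_level("FAKE", 0.89))  # HIGH
print(risk_level("REAL", 0.70))  # MEDIUM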
custom_weights = {
'BERT': 0.3,
'RoBERTa': 0.5,
'TF-IDF': 0.2
}
orchestrator = PredictionOrchestrator(
ensemble_method=EnsembleMethod.WEIGHTED_AVERAGE,
model_weights=custom_weights
)
explainer = PredictionExplainer()
# Add custom fake news indicators
explainer.fake_indicators['custom'] = [
'phrase1', 'phrase2', 'phrase3'
]
# Add custom credibility indicators
explainer.credibility_indicators['custom'] = [
'verified by', 'confirmed by', 'according to experts'
]
- GPU Acceleration: Models automatically use GPU if available (a quick device check follows this list)
- Batch Processing: Use batch mode for multiple texts to improve efficiency
- Model Selection: Disable unused models to reduce memory usage
- Simple Output: Use the --simple flag for faster processing
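For the GPU tip, a quick way to see which device PyTorch will select (a standard torch check; the detector's own device handling may differ):

import torch

# Report whether a CUDA GPU is visible; the transformer models fall back to CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Transformer models will run on: {device}")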
# Ensure model files are in correct locations
ls models/bert/final_model/
ls models/roberta/final_model/
ls models/tf_idf/
# Use fewer models
python main.py --text "..." --no-bert
# Or process in smaller batches
# Force CPU mode if GPU issues occur
export CUDA_VISIBLE_DEVICES=""
# Example: analyze a single text with simple output
python main.py -t "Scientists announce breakthrough in renewable energy" -s
# Example: analyze a file with the confidence-weighted ensemble
python main.py -f suspicious_article.txt -e confidence_weighted
# Example: create file news_batch.txt with one article per line, then run without explanations
python main.py -b news_batch.txt --no-explain
from main import FakeNewsDetector
detector = FakeNewsDetector(verbose=False)
articles = [
"Breaking: Aliens land in Times Square!",
"Study shows effectiveness of new vaccine",
"You won't believe this shocking secret!"
]
for article in articles:
result = detector.predict(article, explain=False)
print(f"{result['final_prediction']}: {article[:50]}...")Contributions are welcome! Areas for improvement:
- Additional ensemble methods
- More sophisticated text analysis
- Model fine-tuning utilities
- Web interface
- API endpoint
See LICENSE.md for details.
- BERT: Google Research
- RoBERTa: Facebook AI
- Transformers library: Hugging Face
For questions or issues, please open a GitHub issue.
Note: This system is for educational and research purposes. Always verify important information from multiple trusted sources.