# Fake News Detector - Test Model
This notebook covers the testing workflow for the model, run on the ISOT Fake News detection dataset, which is available on Kaggle.

Kaggle link: https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset

In [1]:
# Step 1: Install required packages
!pip install kagglehub[hf-datasets] pandas --quiet
!python -m nltk.downloader punkt_tab wordnet stopwords > /dev/null

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [None]:
# Step 2: Import libraries
import re
from time import time
import joblib
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

## Load the model

In [7]:

model_path = "/content/fake_news_model.pkl"

try:
    model = joblib.load(model_path)
    print(f"Model loaded successfully from {model_path}")
except FileNotFoundError:
    print(f"Error: Model file not found at {model_path}")
    model = None

Model loaded successfully from /content/fake_news_model.pkl


## Defining the text preprocessing function

In [12]:
# Text preprocessing function
def preprocess_text(text):
    if not isinstance(text, str):
        return ""

    # Lowercase
    text = text.lower()

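    # Remove URLs first: the character filter below strips ':' and '/',
    # after which the URL pattern can no longer match.
    # Keep this order in sync with the preprocessing used at training time.
    text = re.sub(r'https?://\S+|www\.\S+', '', text)

    # Remove special characters/numbers except basic punctuation
    text = re.sub(r'[^a-zA-Z\s.,!?]', '', text)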

    # Tokenize
    tokens = word_tokenize(text)

    # Remove stopwords and lemmatize
    lemmatizer = WordNetLemmatizer()
    stop_words = set(stopwords.words('english'))
    tokens = [lemmatizer.lemmatize(word) for word in tokens if word not in stop_words]

    # Remove short words (length < 2)
    tokens = [word for word in tokens if len(word) > 1]

    return ' '.join(tokens)
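
The order of the two regex substitutions matters. The character filter deletes `:` and `/`, so a URL pattern applied afterwards has nothing left to match and the URL's letters stay in the text. The standalone sketch below compares both orders using the same two patterns as `preprocess_text`. The function names and the sample sentence are made up for illustration.

```python
import re

URL_PATTERN = r'https?://\S+|www\.\S+'
CHAR_PATTERN = r'[^a-zA-Z\s.,!?]'


def strip_chars_then_urls(text):
    # Character filter first: ':' and '/' vanish, so the URL pattern never matches
    text = re.sub(CHAR_PATTERN, '', text.lower())
    return re.sub(URL_PATTERN, '', text)


def strip_urls_then_chars(text):
    # URL removal first: the whole URL is dropped before other characters are filtered
    text = re.sub(URL_PATTERN, '', text.lower())
    return re.sub(CHAR_PATTERN, '', text)


sample = "Read more at https://example.com/story now"
print(strip_chars_then_urls(sample).split())
# ['read', 'more', 'at', 'httpsexample.comstory', 'now']
print(strip_urls_then_chars(sample).split())
# ['read', 'more', 'at', 'now']
```

Whichever order is used at test time should match the one used when the model was trained, otherwise URL-bearing articles produce different features.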

## Defining the sample test function

In [13]:
# Test with sample predictions
def test_samples(model):
    samples = [
        ("Scientists confirm climate change is accelerating at unprecedented rates. New data shows ice caps melting 40% faster than previous estimates.", 0),
        ("BREAKING: Celebrities injecting themselves with alien DNA to gain immortality, secret documents reveal!", 1),
        ("New study finds that regular exercise can reduce the risk of heart disease by up to 30%", 0),
        ("Government secretly adding mind-control chemicals to drinking water supplies nationwide", 1)
    ]

    print("\n🧪 Sample Predictions:")
    for text, true_label in samples:
        processed = preprocess_text(text)
        prediction = model.predict([processed])[0]
        proba = model.predict_proba([processed])[0]

        print(f"\nText: {text[:80]}...")
        print(f"True: {'Fake' if true_label == 1 else 'Real'}")
        print(f"Pred: {'Fake' if prediction == 1 else 'Real'}")
        print(f"Confidence: {max(proba)*100:.1f}%")
        # Assumes model.classes_ is ordered [0, 1], so column 0 = Real and column 1 = Fake
        print(f"Probabilities: [Real: {proba[0]:.4f}, Fake: {proba[1]:.4f}]")

# Test samples
test_samples(model)


🧪 Sample Predictions:

Text: Scientists confirm climate change is accelerating at unprecedented rates. New da...
True: Real
Pred: Fake
Confidence: 64.0%
Probabilities: [Real: 0.3603, Fake: 0.6397]

Text: BREAKING: Celebrities injecting themselves with alien DNA to gain immortality, s...
True: Fake
Pred: Fake
Confidence: 91.2%
Probabilities: [Real: 0.0882, Fake: 0.9118]

Text: New study finds that regular exercise can reduce the risk of heart disease by up...
True: Real
Pred: Fake
Confidence: 59.6%
Probabilities: [Real: 0.4036, Fake: 0.5964]

Text: Government secretly adding mind-control chemicals to drinking water supplies nat...
True: Fake
Pred: Fake
Confidence: 54.4%
Probabilities: [Real: 0.4562, Fake: 0.5438]
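
The four printed predictions can be tallied rather than read by eye. The sketch below copies the true and predicted labels from the output above, assuming the same encoding as `test_samples` (1 = Fake, 0 = Real). The helper names `accuracy` and `confusion_counts` are hypothetical.

```python
# (true_label, predicted_label) pairs copied from the printed output; 1 = Fake, 0 = Real
results = [(0, 1), (1, 1), (0, 1), (1, 1)]


def accuracy(pairs):
    """Fraction of pairs where the prediction equals the true label."""
    return sum(1 for true, pred in pairs if true == pred) / len(pairs)


def confusion_counts(pairs, positive=1):
    """Count true/false positives and negatives, treating `positive` as the Fake class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for true, pred in pairs:
        if pred == positive:
            counts["tp" if true == positive else "fp"] += 1
        else:
            counts["tn" if true != positive else "fn"] += 1
    return counts


print(accuracy(results))          # 0.5
print(confusion_counts(results))  # {'tp': 2, 'fp': 2, 'tn': 0, 'fn': 0}
```

Every sample is predicted as Fake here, so both Real samples show up as false positives.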
