# 📰 Fake News Detection using Machine Learning
This Jupyter Notebook implements a **Fake News Detection System** using machine learning techniques.

**Contents:**
- Introduction
- Data Preprocessing
- Model Training & Evaluation
- Predicting Fake News
- Conclusion


## 📌 Introduction
Fake news has become a critical issue in today's digital world. This project aims to classify news articles as **real** or **fake** using **Natural Language Processing (NLP)** techniques and **machine learning models**.

**Goal:** Develop a model that accurately identifies fake news.

## 🗄️ Data Preprocessing
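Before building the model, we:
- Load the dataset
- Handle missing values (fill missing titles with an empty string, drop rows with missing text)
- Preprocess the text (lowercasing, then removing digits, punctuation, and stopwords; stemming is a possible extra step that the code in this notebook does not apply)
- Vectorize the text using **TF-IDF** (the code in this notebook uses `TfidfVectorizer`; **Count Vectorizer** is an alternative)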
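
To make the difference between the two vectorizers concrete, here is a minimal sketch on a three-sentence toy corpus. The corpus is made up for illustration and is not taken from the dataset. Both vectorizers build the same vocabulary, but TF-IDF down-weights words that occur in many documents.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus, for illustration only
docs = [
    "breaking news president signs bill",
    "shocking news you will not believe",
    "president signs trade bill",
]

count_vec = CountVectorizer()
tfidf_vec = TfidfVectorizer()

counts = count_vec.fit_transform(docs)  # raw term counts per document
tfidf = tfidf_vec.fit_transform(docs)   # counts reweighted by inverse document frequency

# Column indices of two words from the first document
news_col = tfidf_vec.vocabulary_["news"]          # appears in 2 of 3 documents
breaking_col = tfidf_vec.vocabulary_["breaking"]  # appears in 1 of 3 documents

print("count  news / breaking:", counts[0, count_vec.vocabulary_["news"]],
      counts[0, count_vec.vocabulary_["breaking"]])
print("tf-idf news / breaking:", round(tfidf[0, news_col], 3),
      round(tfidf[0, breaking_col], 3))
```

In the first document both words occur once, so their counts are equal, while the rarer word "breaking" receives the larger TF-IDF weight.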

## 🤖 Model Training & Evaluation
Candidate machine learning models for this task include:
- **Logistic Regression**
- **Naïve Bayes Classifier**
- **Support Vector Machines (SVM)**
- **Random Forest Classifier**
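
One way to compare these candidates is to fit each on the same split and collect a score per model. The sketch below is an illustration under assumptions, not the notebook's own code: `compare_models` is a hypothetical helper, `LinearSVC` stands in for the SVM, and the toy texts are invented. Each model is wrapped in a pipeline so that TF-IDF is fitted on the training split only.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def compare_models(texts, labels, test_size=0.25, seed=42):
    """Fit each candidate on the same split and return {name: test accuracy}."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    candidates = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes": MultinomialNB(),
        "Linear SVM": LinearSVC(),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, clf in candidates.items():
        # The vectorizer lives inside the pipeline, so it only sees training text
        pipe = make_pipeline(TfidfVectorizer(), clf)
        pipe.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipe.predict(X_test))
    return scores


# Hypothetical toy data (1 = real, 0 = fake), far too small for meaningful scores
toy_texts = [
    "government publishes annual budget report",
    "city council approves new transit plan",
    "central bank holds interest rates steady",
    "court issues ruling on trade dispute",
    "health agency releases vaccination data",
    "parliament passes education funding bill",
    "shocking secret cure doctors hide from you",
    "you will not believe this miracle trick",
    "celebrity clone spotted in secret bunker",
    "aliens secretly control world leaders",
    "miracle pill melts fat overnight shocking",
    "secret plot to hide the moon exposed",
]
toy_labels = [1] * 6 + [0] * 6

scores = compare_models(toy_texts, toy_labels)
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

On the real dataset, the same function could be called with `df['clean_text']` and `df['label']` in place of the toy lists.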

The code in this notebook trains **Logistic Regression** and evaluates it on a held-out 20% test split using accuracy, precision, recall, and F1-score.
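
For reference, the four metrics can be computed by hand from the confusion counts. The sketch below uses made-up labels and a hypothetical helper named `binary_metrics`; in practice, scikit-learn's `classification_report` produces the same quantities.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many are found
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration: 3 TP, 1 FN, 1 FP, 3 TN
y_true_demo = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred_demo = [1, 0, 1, 0, 1, 0, 1, 0]

metrics = binary_metrics(y_true_demo, y_pred_demo)
print(metrics)  # every metric equals 0.75 for this example
```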

## 📰 Predicting Fake News
After training, we classify new articles as **real or fake**. Each input goes through the same preprocessing and TF-IDF transformation as the training data, and the model returns a label (0 = fake, 1 = real).
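
A hard label hides how confident the model is. As a sketch, a pipeline that supports `predict_proba` can return the class probability alongside the label. The helper `predict_with_confidence` and the toy training texts are assumptions for illustration. Raw text is passed straight to `TfidfVectorizer` here, which lowercases by default, rather than through the notebook's own preprocessing function.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data (1 = real, 0 = fake)
train_texts = [
    "government publishes annual budget report",
    "central bank holds interest rates steady",
    "court issues ruling on trade dispute",
    "shocking secret cure doctors hide from you",
    "aliens secretly control world leaders",
    "miracle pill melts fat overnight",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Vectorizer and classifier are fitted together, so new text is
# transformed with exactly the vocabulary learned during training
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(train_texts, train_labels)


def predict_with_confidence(text, clf=pipeline):
    """Return (label, probability of that label) for a single article."""
    proba = clf.predict_proba([text])[0]
    best = proba.argmax()
    label = "Real" if clf.classes_[best] == 1 else "Fake"
    return label, float(proba[best])


label, confidence = predict_with_confidence("secret miracle cure shocks doctors")
print(f"{label} (confidence {confidence:.2f})")
```

A probability near 0.5 signals that the input sits close to the decision boundary, which is worth surfacing instead of a bare label.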

## 📌 Conclusion
This notebook demonstrates a **Fake News Detection System** built with TF-IDF features and Logistic Regression. Future improvements could include deep learning models such as LSTMs or transformers (BERT, GPT), which may improve accuracy.

In [6]:
import pandas as pd
import numpy as np
import re
import string
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
file_path = "WELFake_Dataset.csv"  # Update path if needed
df = pd.read_csv(file_path)

# Drop unnecessary index column if present
if 'Unnamed: 0' in df.columns:
    df = df.drop(columns=['Unnamed: 0'])

# Fill missing titles with an empty string
df['title'] = df['title'].fillna("")

# Drop rows where text is missing (text is essential)
df = df.dropna(subset=['text'])

# Combine title and text for processing
df['content'] = df['title'] + " " + df['text']

# Ensure labels are numeric (already 0 = Fake, 1 = Real)
df['label'] = df['label'].astype(int)

# Download necessary NLTK resources
nltk.download('stopwords')
nltk.download('punkt')
stop_words = set(stopwords.words('english'))

# Text preprocessing function
def preprocess_text(text):
    if not isinstance(text, str):  # Ensure input is a string
        return ""
    text = text.lower()
    text = re.sub(r'\d+', '', text)  # Remove numbers
    text = text.translate(str.maketrans('', '', string.punctuation))  # Remove punctuation
    text = text.strip()
    words = text.split()
    words = [word for word in words if word not in stop_words]  # Remove stopwords
    return " ".join(words)  # Joining an empty list already yields ""

df['clean_text'] = df['content'].apply(preprocess_text)

# Remove empty processed rows
df = df.dropna(subset=['clean_text'])
df = df[df['clean_text'].str.strip() != ""]

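# Vectorization using TF-IDF
# Note: the vectorizer is fitted on the full dataset before the split below, so
# vocabulary and IDF statistics from the test rows are visible during training.
# For a stricter evaluation, split first and fit the vectorizer on the training rows only.
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(df['clean_text'])
y = df['label']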

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training
model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print(classification_report(y_test, y_pred))

# Function to predict if a news paragraph is real or fake
def predict_news(news_text):
    processed_text = preprocess_text(news_text)  # Preprocess input
    transformed_text = vectorizer.transform([processed_text])  # Vectorize input
    prediction = model.predict(transformed_text)[0]  # Predict (0 = Fake, 1 = Real)
    return "Real News ✅" if prediction == 1 else "Fake News ❌"

# Example input
news_paragraph = input("Enter a news paragraph to check if it's Fake or Real:\n")
print("Prediction:", predict_news(news_paragraph))

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Accuracy: 0.9480
              precision    recall  f1-score   support

           0       0.95      0.95      0.95      7010
           1       0.95      0.95      0.95      7409

    accuracy                           0.95     14419
   macro avg       0.95      0.95      0.95     14419
weighted avg       0.95      0.95      0.95     14419

Enter a news paragraph to check if it's Fake or Real:
All we can say on this one is it s about time someone sued the Southern Poverty Law Center!On Tuesday, D. James Kennedy Ministries (DJKM) filed a lawsuit against the Southern Poverty Law Center (SPLC), the charity navigation organization GuideStar, and Amazon, for defamation, religious discrimination, and trafficking in falsehood. The SPLC listed DJKM as a  hate group,  while GuideStar also categorized it in those terms, and Amazon kept the ministry off of its charity donation program, Amazon Smile. We embarked today on a journey to right a terrible wrong,  Dr. Frank Wright, president and CEO at