# 📰 Fake News Detection (Beginner Friendly, Google Colab)

This notebook guides you step by step through building a fake news detector with Python and machine learning. No advanced knowledge is required!

We'll use the [Fake and Real News Dataset](https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset) from Kaggle. Follow along and you'll have a working model in less than 30 minutes!

## Step 1: Install and Import Libraries

Google Colab has most libraries pre-installed. Just import them below. If you get an error, you can install missing packages with `!pip install package-name`.

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

## Step 2: Download the Dataset from Kaggle

Downloading the dataset requires a **Kaggle account** and an **API key**.

1. Go to your Kaggle account settings: https://www.kaggle.com/settings
2. Scroll down to "API" and click "Create New API Token". This downloads a `kaggle.json` file (your API key).
3. Upload `kaggle.json` to this Colab session (use the file upload icon on the left sidebar, or run the code below).

In [2]:
from google.colab import files
uploaded = files.upload()  # Upload kaggle.json here

Saving kaggle.json to kaggle.json


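Now let's set up the Kaggle API and download the dataset. These commands assume a Linux shell, which Google Colab provides. On a local Windows machine they fail with "not recognized" errors like those in the output below.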

In [3]:
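# Put the API key where the Kaggle command-line tool expects it
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
# Restrict the key file so only you can read it
!chmod 600 ~/.kaggle/kaggle.json
# Download the dataset archive and extract it (-o overwrites on re-run)
!kaggle datasets download -d clmentbisaillon/fake-and-real-news-dataset
!unzip -o fake-and-real-news-dataset.zip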

The syntax of the command is incorrect.
'cp' is not recognized as an internal or external command,
operable program or batch file.
'chmod' is not recognized as an internal or external command,
operable program or batch file.
'kaggle' is not recognized as an internal or external command,
operable program or batch file.
'unzip' is not recognized as an internal or external command,
operable program or batch file.
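
Before moving on, it can help to confirm that the download and unzip actually produced the two CSV files. The sketch below is one way to check. The helper name `missing_files` is made up for this illustration and is not part of the Kaggle tooling.

```python
from pathlib import Path

def missing_files(names, folder="."):
    """Return the file names from `names` that do not exist in `folder`."""
    return [name for name in names if not (Path(folder) / name).is_file()]

# The two files this notebook expects after unzipping the dataset.
missing = missing_files(["Fake.csv", "True.csv"])
if missing:
    print("Missing files:", missing, "- re-run the download cell in Colab.")
else:
    print("Both dataset files are present.")
```

If any file is reported missing, the next step will stop with a `FileNotFoundError`.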


## Step 3: Load and Combine the Dataset

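This dataset has two files: `Fake.csv` and `True.csv`. Let's load them, label each row, and combine them into one shuffled table. If you get a `FileNotFoundError` like the one shown in the output below, the download in Step 2 did not complete.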

In [4]:
fake = pd.read_csv('Fake.csv')
true = pd.read_csv('True.csv')

fake['label'] = 'FAKE'
true['label'] = 'REAL'

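# Combine both sets, shuffle the rows reproducibly, and renumber the index
df = pd.concat([fake, true]).sample(frac=1, random_state=42).reset_index(drop=True)
# Keep only the article text and its label
df = df[['text', 'label']]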
df.head()

FileNotFoundError: [Errno 2] No such file or directory: 'Fake.csv'

## Step 4: Split the Data into Training and Test Sets

In [5]:
X = df['text']
y = df['label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f'Train size: {len(X_train)}, Test size: {len(X_test)}')

Train size: 35918, Test size: 8980
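
The sizes printed above follow directly from the 80/20 split. Here is a quick arithmetic check, assuming scikit-learn rounds the test share up to a whole row and gives the remainder to training. The helper `split_sizes` is made up for this illustration.

```python
import math

def split_sizes(n_samples, test_size=0.2):
    """Estimate train/test row counts for a fractional test_size."""
    # Assumption: the test share is rounded up, the rest goes to training.
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

# Total number of articles, taken from the sizes printed above.
n_total = 35918 + 8980
print(split_sizes(n_total))
```

With `test_size=0.2`, roughly one article in five is held back so the model can be checked on text it has never seen.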


## Step 5: TF-IDF Vectorization (Convert Text to Numbers)

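This step turns text into numbers the machine learning model can work with. TF-IDF gives each word in an article a weight: words that appear often in that article but rarely in the rest of the dataset get high weights. `stop_words='english'` drops very common words such as "the" and "and", and `max_df=0.7` ignores words that appear in more than 70% of the articles.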

In [8]:
vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
print(f'Train TF-IDF shape: {X_train_tfidf.shape}')

Train TF-IDF shape: (35918, 111193)
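
To make the idea concrete, here is a minimal pure-Python sketch of TF-IDF on two made-up sentences. It assumes raw word counts, a smoothed IDF of ln((1 + n) / (1 + df)) + 1, and L2 normalisation, which should correspond to scikit-learn's default settings. The whitespace tokeniser and the helper name `tfidf_vectors` are simplifications invented for this example, so do not expect it to reproduce `TfidfVectorizer` exactly.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Toy TF-IDF: word counts times a smoothed IDF, then L2-normalised."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenised for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(tok for doc in tokenised for tok in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        # Scale each row to length 1 so long and short texts are comparable.
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        vectors.append([w / norm for w in row])
    return vocab, vectors

# Hypothetical example sentences, not taken from the dataset.
toy_docs = ["aliens landed in paris", "parliament passed the budget in paris"]
vocab, vectors = tfidf_vectors(toy_docs)
for term, weight in zip(vocab, vectors[0]):
    print(f"{term:12s}{weight:.3f}")
```

In the first sentence, "aliens" ends up with a higher weight than "paris" because "paris" appears in both sentences and is therefore less distinctive.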


## Step 6: Train the Machine Learning Model

In [7]:
model = LogisticRegression(max_iter=1000)
model.fit(X_train_tfidf, y_train)

## Step 7: Evaluate the Model

In [None]:
y_pred = model.predict(X_test_tfidf)
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2f}')
print('\nClassification Report:')
print(classification_report(y_test, y_pred))
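
The classification report lists precision, recall, and F1-score for each label. The sketch below computes those numbers by hand for one label on a tiny made-up set of predictions, so you can see what they mean. The function `binary_metrics` and its `positive` parameter are hypothetical names used only here.

```python
def binary_metrics(y_true, y_pred, positive="FAKE"):
    """Compute accuracy, precision, recall and F1 for one label by counting."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # wrongly flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels and predictions, not results from the real model.
toy_true = ["FAKE", "FAKE", "REAL", "REAL", "FAKE", "REAL"]
toy_pred = ["FAKE", "REAL", "REAL", "FAKE", "FAKE", "REAL"]
print(binary_metrics(toy_true, toy_pred))
```

Precision answers "of the articles flagged as FAKE, how many really were?", while recall answers "of the truly FAKE articles, how many did we catch?".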

## Step 8: Upload Your Own News File (CSV)

You or your teacher can upload a CSV file with a `text` column containing the news articles you want to classify.

In [None]:
from google.colab import files
import io

uploaded_user = files.upload()  # Upload your news CSV file here

for fn in uploaded_user.keys():
    user_df = pd.read_csv(io.BytesIO(uploaded_user[fn]))
    print(f"Loaded file: {fn}")
    print(user_df.head())

## Step 9: Predict FAKE/REAL for Uploaded News

This will predict and show FAKE/REAL for each row in the uploaded file, and let you download the results as a CSV.

In [None]:
if 'user_df' in locals():
    if 'text' in user_df.columns:
        # Treat empty cells as empty text instead of the string 'nan'
        user_tfidf = vectorizer.transform(user_df['text'].fillna('').astype(str))
        predictions = model.predict(user_tfidf)
        user_df['Prediction'] = predictions
        print(user_df[['text', 'Prediction']])
        # Save and offer download
        user_df.to_csv('predictions.csv', index=False)
        files.download('predictions.csv')
    else:
        print("Uploaded file must have a 'text' column.")
else:
    print("No file uploaded or variable user_df not defined.")

## Step 10: Make Your Own Predictions (Single Article)

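Enter your own news article text and see if it's classified as FAKE or REAL. Run Steps 5 and 6 first in the same session. Otherwise `vectorizer` and `model` do not exist yet and you get a `NameError` like the one shown below.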

In [1]:
def predict_news(text):
    tfidf = vectorizer.transform([text])
    return model.predict(tfidf)[0]

# Try your own example:
sample_text = "The government has announced a new policy for health care."
print('Prediction:', predict_news(sample_text))

NameError: name 'vectorizer' is not defined
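
A label alone does not tell you how sure the model is. Logistic regression can also report a probability for each class through `predict_proba`. The sketch below shows the idea on a tiny made-up training set so that it runs on its own. The toy texts, the pipeline `toy_model`, and the helper `predict_with_confidence` are assumptions for illustration, not part of the notebook above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, only to make this sketch runnable by itself.
toy_texts = [
    "Aliens secretly replaced the president, insiders claim",
    "Miracle pill cures every disease overnight, doctors shocked",
    "Parliament passed the annual budget on Tuesday",
    "The central bank left interest rates unchanged",
]
toy_labels = ["FAKE", "FAKE", "REAL", "REAL"]

# Chain the vectorizer and the classifier into a single object.
toy_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
toy_model.fit(toy_texts, toy_labels)

def predict_with_confidence(text, fitted_model=toy_model):
    """Return a {label: probability} dict for one article."""
    probabilities = fitted_model.predict_proba([text])[0]
    return dict(zip(fitted_model.classes_, probabilities))

print(predict_with_confidence("The government has announced a new policy for health care."))
```

With the real model from this notebook, the equivalent call would be `model.predict_proba(vectorizer.transform([text]))`. Probabilities close to 0.5 mean the model is unsure about that article.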

# 🎉 Congratulations!

You have built a fake news detector from scratch! Try experimenting with different models or preprocessing steps to improve accuracy. If you have any issues, ask for help or explore the explanations in the notebook.