# 🏆 Fake News Classification Mini-Hackathon 🏆  

Welcome to this exciting **Machine Learning Society Mini-Hackathon**, a collaboration between **Manu's Machine Learning Lectures** and the **Kaggle Team**!  

📰 **Your Mission:** Build a model that can distinguish between **Fake News (1)** and **Real News (0)**.  
🚀 **Why Participate?** This is your chance to test your machine learning skills, experiment with NLP techniques, and compete against other talented individuals!  

💡 **Key Challenge:** Fake news detection is a crucial task in today's world. Can your model accurately classify news articles based on their content?  

Let’s get started! Good luck and happy coding! 🎯


In [1]:
# Let's import some basic libraries

import kagglehub
import os
import pandas as pd

In [2]:
# Let's download the dataset!

path = kagglehub.dataset_download("emineyetm/fake-news-detection-datasets")
dataset_path = path
# print("Path to dataset files:", path)
# print("Dataset files:", os.listdir(dataset_path))



# 📊 Data Exploration & Train-Test Split  

Before jumping into model training, let’s take a look at the dataset and the rules for this competition.  

### **Dataset Overview**  
This dataset consists of **news articles**, each containing the following features:  
- **Title:** The headline of the article.  
- **Text:** The main content of the article.  
- **Subject:** The category of the article (e.g., politics, world news, etc.).  
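- **Date:** The publication date of the article.  
- **Target:** The label indicating whether the news is **Fake (1)** or **Real (0)**. The raw files do not contain this column; the notebook adds it when combining `Fake.csv` and `True.csv`.  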

### **Train-Test Split (75%-25%)**  
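For this hackathon, the dataset split is **predefined**: the combined data is shuffled with a fixed seed (`random_state=1`), and the first 75% of rows form the training set.  
- **Training Set (75%)** – You **must only train** your models on this portion.  
- **Test Set (25%)** – This is used to evaluate the final model performance.  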

⚠️ **Important:** To ensure a fair competition, everyone should follow this split and avoid using test data for training.  

Explore the dataset, check for missing values, understand the distributions, and let’s get ready to build some awesome models! 🚀  
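
The checks suggested above could look roughly like the sketch below. It runs on a small made-up frame (`toy_df` and its values are placeholders, not rows from the dataset); the same calls should apply to the combined DataFrame once it has been built.

```python
import pandas as pd

# Toy stand-in for the combined DataFrame; values are invented for illustration.
toy_df = pd.DataFrame({
    "title": ["Headline A", "Headline B", None, "Headline D"],
    "text": ["Body text A", " ", "Body text C", "Body text D"],
    "subject": ["News", "politicsNews", "politics", "worldnews"],
    "target": [1, 0, 1, 0],
})

# Missing values per column.
missing_per_column = toy_df.isna().sum()

# Articles whose body is empty or whitespace only (not counted as NaN by pandas).
blank_text_rows = int((toy_df["text"].fillna("").str.strip() == "").sum())

# Fraction of each label, to see how balanced the classes are.
class_share = toy_df["target"].value_counts(normalize=True)

# Article length in characters, as a first look at the text distribution.
text_length = toy_df["text"].fillna("").str.len()

print(missing_per_column)
print("Blank text rows:", blank_text_rows)
print(class_share)
print(text_length.describe())
```

Whitespace-only bodies are worth checking separately because `isna()` does not flag them.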


In [3]:
subdir_path = os.path.join(dataset_path, "News _dataset")  # Path to subdirectory
print("Files in subdirectory:", os.listdir(subdir_path))

Files in subdirectory: ['Fake.csv', 'True.csv']


In [4]:
true_df = pd.read_csv(os.path.join(subdir_path, 'True.csv'))
true_df

Unnamed: 0,title,text,subject,date
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017"
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017"
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017"
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017"
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017"
...,...,...,...,...
21412,'Fully committed' NATO backs new U.S. approach...,BRUSSELS (Reuters) - NATO allies on Tuesday we...,worldnews,"August 22, 2017"
21413,LexisNexis withdrew two products from Chinese ...,"LONDON (Reuters) - LexisNexis, a provider of l...",worldnews,"August 22, 2017"
21414,Minsk cultural hub becomes haven from authorities,MINSK (Reuters) - In the shadow of disused Sov...,worldnews,"August 22, 2017"
21415,Vatican upbeat on possibility of Pope Francis ...,MOSCOW (Reuters) - Vatican Secretary of State ...,worldnews,"August 22, 2017"


In [5]:
fake_df = pd.read_csv(os.path.join(subdir_path, 'Fake.csv'))
fake_df

Unnamed: 0,title,text,subject,date
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017"
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017"
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017"
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017"
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017"
...,...,...,...,...
23476,McPain: John McCain Furious That Iran Treated ...,21st Century Wire says As 21WIRE reported earl...,Middle-east,"January 16, 2016"
23477,JUSTICE? Yahoo Settles E-mail Privacy Class-ac...,21st Century Wire says It s a familiar theme. ...,Middle-east,"January 16, 2016"
23478,Sunnistan: US and Allied ‘Safe Zone’ Plan to T...,Patrick Henningsen 21st Century WireRemember ...,Middle-east,"January 15, 2016"
23479,How to Blow $700 Million: Al Jazeera America F...,21st Century Wire says Al Jazeera America will...,Middle-east,"January 14, 2016"


In [6]:
fake_df["target"] = 1 # 1 = Fake News
true_df["target"] = 0 # 0 = Real News
df = pd.concat([fake_df, true_df], ignore_index=True)
df = df.sample(frac=1, random_state=1).reset_index(drop=True) # Shuffle the dataset
df

Unnamed: 0,title,text,subject,date,target
0,Trump Calls For This Racist Policy To Be Forc...,Donald Trump is calling for one of the most co...,News,"September 21, 2016",1
1,Republican ex-defense secretary Cohen backs Hi...,WASHINGTON (Reuters) - Former Republican U.S. ...,politicsNews,"September 7, 2016",0
2,"TEACHER QUITS JOB After 5th, 6th Grade Muslim ...",You re never to young to commit jihad Teachers...,politics,"May 9, 2017",1
3,LAURA INGRAHAM RIPS INTO THE PRESS…Crowd Goes ...,Laura Ingraham reminds the Never Trump people ...,politics,"Jul 21, 2016",1
4,Germany's Merkel suffers state vote setback as...,BERLIN/HANOVER (Reuters) - Germany s Social De...,worldnews,"October 14, 2017",0
...,...,...,...,...,...
44893,Guatemala federal auditor to probe president's...,GUATEMALA CITY (Reuters) - Guatemala s federal...,worldnews,"September 13, 2017",0
44894,House Democrats will stage sit-in until they g...,WASHINGTON (Reuters) - U.S. House of Represent...,politicsNews,"June 22, 2016",0
44895,D’oh!: Trump Tells Crowd In Richest County In...,"While in Virginia, GOP presidential nominee Do...",News,"August 3, 2016",1
44896,JUDGE JEANINE TELLS THE LEFT TO KNOCK IT OFF: ...,Judge Jeanine Pirro has had it with the left a...,politics,"Dec 11, 2016",1


In [7]:
train_size = int(0.75 * len(df))

train_df = df.iloc[:train_size]
test_df = df.iloc[train_size:]

print("Training set size:", len(train_df))
print("Test set size:", len(test_df))

Training set size: 33673
Test set size: 11225
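
Because the split is a plain slice of the shuffled frame rather than a stratified split, it may be worth confirming that the two parts do not overlap and have similar label proportions. The following is a minimal sketch on a made-up eight-row frame (`toy_df` is a placeholder); the same checks should carry over to `train_df` and `test_df`.

```python
import pandas as pd

# Toy stand-in for the shuffled DataFrame (8 rows, balanced labels).
toy_df = pd.DataFrame({"target": [1, 0, 1, 1, 0, 0, 1, 0]})

# Same slicing logic as the notebook: first 75% for training, the rest for testing.
train_size = int(0.75 * len(toy_df))
toy_train = toy_df.iloc[:train_size]
toy_test = toy_df.iloc[train_size:]

# The two slices should share no rows and together cover the whole frame.
no_overlap = toy_train.index.intersection(toy_test.index).empty
covers_all = len(toy_train) + len(toy_test) == len(toy_df)

# Fraction of fake (1) labels in each slice.
train_fake_share = toy_train["target"].mean()
test_fake_share = toy_test["target"].mean()

print("No overlap:", no_overlap, "| Covers all rows:", covers_all)
print("Fake share (train):", train_fake_share)
print("Fake share (test):", test_fake_share)
```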


In [8]:
train_df

Unnamed: 0,title,text,subject,date,target
0,Trump Calls For This Racist Policy To Be Forc...,Donald Trump is calling for one of the most co...,News,"September 21, 2016",1
1,Republican ex-defense secretary Cohen backs Hi...,WASHINGTON (Reuters) - Former Republican U.S. ...,politicsNews,"September 7, 2016",0
2,"TEACHER QUITS JOB After 5th, 6th Grade Muslim ...",You re never to young to commit jihad Teachers...,politics,"May 9, 2017",1
3,LAURA INGRAHAM RIPS INTO THE PRESS…Crowd Goes ...,Laura Ingraham reminds the Never Trump people ...,politics,"Jul 21, 2016",1
4,Germany's Merkel suffers state vote setback as...,BERLIN/HANOVER (Reuters) - Germany s Social De...,worldnews,"October 14, 2017",0
...,...,...,...,...,...
33668,Schumer says U.S. budget deal doable if Trump ...,WASHINGTON (Reuters) - Senate Democratic Leade...,politicsNews,"April 23, 2017",0
33669,WOMAN PULLED OVER FOR 51 MPH IN SCHOOL ZONE: “...,I ve never been more grateful there are so man...,left-news,"Sep 11, 2015",1
33670,"INDIAN-AMERICAN, Inventor Of Email Announces R...",Boston-based entrepreneur and inventor of Emai...,politics,"Feb 25, 2017",1
33671,Trump aides divided over policy shielding 'dre...,WASHINGTON (Reuters) - Divisions have emerged ...,politicsNews,"January 28, 2017",0


# 🚀 Start Your Submission Here!  

Now it’s your turn to shine! ✨  

📌 **Instructions:**  
- Implement your **feature engineering** and **model training** below.  
- Remember to use only the **75% training set** for training your model.  
- Test your model on the **25% test set** and analyze its performance.  
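
As one possible starting point for feature engineering, the sketch below shows two hand-crafted steps. Both are assumptions for illustration, not competition requirements: the previews above suggest that real articles often open with a `CITY (Reuters) - ` dateline, which a model could learn as a shortcut, so `strip_dateline` (a hypothetical helper) removes it; `uppercase_ratio` (also hypothetical) measures how much of a title is in capitals.

```python
import re
import pandas as pd

# Assumed pattern: a short location prefix followed by "(Reuters) -" at the very start.
DATELINE = re.compile(r"^[^()]{0,60}\(Reuters\)\s*-\s*")

def strip_dateline(text: str) -> str:
    """Remove a leading 'CITY (Reuters) - ' prefix if present."""
    return DATELINE.sub("", text, count=1)

def uppercase_ratio(title: str) -> float:
    """Share of alphabetic characters in the title that are uppercase."""
    letters = [c for c in title if c.isalpha()]
    return sum(c.isupper() for c in letters) / len(letters) if letters else 0.0

# Made-up rows in the style of the dataset previews.
toy = pd.DataFrame({
    "title": ["TEACHER QUITS JOB", "Schumer says budget deal doable"],
    "text": ["You are never too young", "WASHINGTON (Reuters) - Senate Democratic Leader"],
})
toy["text_clean"] = toy["text"].map(strip_dateline)
toy["title_upper_ratio"] = toy["title"].map(uppercase_ratio)
print(toy[["text_clean", "title_upper_ratio"]])
```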

💡 **Pro Tip:** Try experimenting with different NLP techniques (TF-IDF, Word Embeddings, Transformers) to boost your model’s accuracy!  
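
For the TF-IDF route, one way to experiment is to wrap the vectorizer and classifier in a scikit-learn pipeline so that both are fitted on the training data only. The sketch below uses a tiny invented corpus, and the settings (`ngram_range`, `sublinear_tf`, `max_features`, `max_iter`) are illustrative choices rather than tuned values.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus, only to show the mechanics; 1 = fake, 0 = real as in the notebook.
texts = [
    "SHOCKING secret the media will not tell you",
    "You will not believe this SHOCKING trick",
    "Senate passes budget bill after long debate",
    "Central bank holds interest rates steady",
]
labels = [1, 1, 0, 0]

# Unigrams and bigrams with sublinear term-frequency scaling, followed by a linear classifier.
pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True, max_features=5000),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(texts, labels)

# Predicting on the training texts here only demonstrates the call, not real performance.
toy_predictions = pipeline.predict(texts)
print(toy_predictions)
```

In the notebook, the equivalent calls would be `pipeline.fit(train_df["content"], y_train)` followed by `pipeline.predict(test_df["content"])`.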

Best of luck! 🎯 Let’s see who builds the most accurate Fake News Classifier! 🏆  


In [9]:
# Sample submission

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import warnings
# Silences all warnings, including any ConvergenceWarning raised by LogisticRegression
warnings.filterwarnings("ignore")

In [10]:
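# Work on copies so that adding a column does not modify slices of df
train_df = train_df.copy()
test_df = test_df.copy()

# Combine title, subject, and body into one text field per article
train_df["content"] = train_df["title"] + " " + train_df["subject"] + " " + train_df["text"]
test_df["content"] = test_df["title"] + " " + test_df["subject"] + " " + test_df["text"]

print("Training set size:", len(train_df))
print("Test set size:", len(test_df))

# Baseline: keep only the 10 most frequent terms as TF-IDF features
vectorizer = TfidfVectorizer(max_features=10)

# Fit the vocabulary on the training set only, then reuse it for the test set
X_train = vectorizer.fit_transform(train_df["content"])
X_test = vectorizer.transform(test_df["content"])

y_train = train_df["target"]
y_test = test_df["target"]

# Cap the solver at 10 iterations
model = LogisticRegression(max_iter=10)
model.fit(X_train, y_train)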

Training set size: 33673
Test set size: 11225


# 🏆 Test Your Accuracy Here!  

Now that you've trained your model, let's see how well it performs! 🚀  

📌 **Instructions:**  
- Use the **test set (25%)** to make predictions.  
- Compare your predictions against the actual labels.  
- Calculate the **accuracy** score to evaluate performance.  

📊 **Accuracy Calculation:**  
$$
\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}
$$

🚀 **Submit your final accuracy score below and see how you rank!** 🔥  
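
As a worked instance of the formula above, the sketch below computes accuracy by hand on eight made-up labels and compares it with `accuracy_score`. The confusion matrix is an optional extra that shows which class the errors fall in.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels and predictions (1 = fake, 0 = real).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Correct predictions divided by total predictions: 6 / 8.
manual_accuracy = (y_true == y_pred).sum() / len(y_true)
sklearn_accuracy = accuracy_score(y_true, y_pred)

# Rows are true labels, columns are predicted labels, in the order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

print("Manual accuracy:", manual_accuracy)    # 0.75
print("sklearn accuracy:", sklearn_accuracy)  # 0.75
print(cm)
```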

In [11]:

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Test Accuracy:", accuracy)


Test Accuracy: 0.7515367483296214
