<a href="https://colab.research.google.com/github/mohammadreza-mohammadi94/Deep_Learning_Projects/blob/main/Fake_News_Detection_ANN/Fake_News_Detection_ANN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fake News Detection

# Project Content:
1. [Preparing Project](#1)
    - 1.1 [Downloading Dataset](#1.1)
    - 1.2 [Importing Libraries](#1.2)
    - 1.3 [Importing Dataset](#1.3)
2. [Preprocessing](#2)
3. [Preparing For Modelling](#3)
    - 3.1 [Train/Test Splitting](#3.1)
    - 3.2 [Vectorize the Text Data](#3.2)
4. [ANN](#4)
    - 4.1 [Evaluate the ANN Model](#4.1)




## 1. Preparing Project <a id=1></a>

### 1.1 Downloading Dataset <a id=1.1></a>

In [1]:
from google.colab import userdata
import os

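# The Kaggle CLI reads its API token from the KAGGLE_KEY environment variable;
# 'KAGGLE_PASS' is only the name of the Colab secret that stores that token.
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_PASS')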
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')

In [2]:
!kaggle datasets download -d emineyetm/fake-news-detection-datasets

Dataset URL: https://www.kaggle.com/datasets/emineyetm/fake-news-detection-datasets
License(s): unknown
Downloading fake-news-detection-datasets.zip to /content
 71% 29.0M/41.0M [00:00<00:00, 52.1MB/s]
100% 41.0M/41.0M [00:00<00:00, 61.4MB/s]


In [3]:
!unzip fake-news-detection-datasets.zip

Archive:  fake-news-detection-datasets.zip
  inflating: News _dataset/Fake.csv  
  inflating: News _dataset/True.csv  


### 1.2 Importing Libraries <a id=1.2></a>

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer


### 1.3 Importing Dataset <a id=1.3></a>

In [5]:
true_news = pd.read_csv("/content/News _dataset/True.csv")
fake_news = pd.read_csv("/content/News _dataset/Fake.csv")

In [14]:
true_news.head(2)

| | title | text | subject | date | label |
|---|---|---|---|---|---|
| 0 | As U.S. budget fight looms, Republicans flip t... | WASHINGTON (Reuters) - The head of a conservat... | politicsNews | December 31, 2017 | 1 |
| 1 | U.S. military to accept transgender recruits o... | WASHINGTON (Reuters) - Transgender people will... | politicsNews | December 29, 2017 | 1 |


In [13]:
fake_news.head(2)

| | title | text | subject | date | label |
|---|---|---|---|---|---|
| 0 | Donald Trump Sends Out Embarrassing New Year’... | Donald Trump just couldn t wish all Americans ... | News | December 31, 2017 | 0 |
| 1 | Drunk Bragging Trump Staffer Started Russian ... | House Intelligence Committee Chairman Devin Nu... | News | December 31, 2017 | 0 |


In [9]:
# Add a label column to differentiate between true and false news
true_news['label'] = 1  # Label for true news
fake_news['label'] = 0  # Label for false news

# Combine the two datasets into one
data = pd.concat([true_news, fake_news], axis=0).reset_index(drop=True)

## 2. Preprocessing <a id=2></a>

In [10]:
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')

# Initialize stop words and lemmatizer
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...


In [11]:
def preprocess_text(text):
    # Remove special characters and numbers
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'\d', ' ', text)

    # Convert to lowercase
    text = text.lower()

    # Tokenize the text
    words = word_tokenize(text)

    # Remove stop words and lemmatize
    words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]

    # Join words back into a single string
    text = ' '.join(words)

    return text
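
To make the effect of the two `re.sub` calls concrete, here is a small standard-library sketch. The helper name `strip_non_letters` and the sample sentence are made up for illustration, and a plain whitespace `split()` stands in for `word_tokenize` so the snippet runs without NLTK data.

```python
import re

def strip_non_letters(text):
    """Mirror the first steps of preprocess_text (hypothetical helper)."""
    text = re.sub(r'\W', ' ', text)  # punctuation and symbols become spaces
    text = re.sub(r'\d', ' ', text)  # digits become spaces
    return text.lower()

# Made-up sentence in the style of the dataset
sample = "WASHINGTON (Reuters) - 2 senators said: it's over!"

# Whitespace split is an assumption standing in for word_tokenize
tokens = strip_non_letters(sample).split()
print(tokens)
```

Note that the apostrophe in "it's" is also replaced, so the contraction is split into `it` and `s`. Underscores survive, because `\W` treats `_` as a word character.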

In [12]:
# Apply the preprocessing function to the text column (assuming 'text' is the column with news articles)
data['text'] = data['text'].apply(preprocess_text)

## 3. Preparing For Modelling <a id=3></a>

### 3.1 Train/Test Splitting <a id=3.1></a>

In [15]:
X = data['text']  # Features
y = data['label']  # Labels

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
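
As a sanity check on the split, the sketch below reproduces the size arithmetic. With a float `test_size`, scikit-learn rounds the test count up and gives the rest to training. The total of 44,898 rows is an assumption (it is not printed anywhere in this notebook), chosen because it is consistent with the 8,980 test rows in the evaluation report. The helper names are hypothetical.

```python
import math

def split_sizes(n_samples, test_size):
    """Row counts produced by a float test_size: the test count is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

def steps_per_epoch(n_rows, batch_size):
    """Number of batches needed to cover n_rows once."""
    return math.ceil(n_rows / batch_size)

# ASSUMPTION: 44,898 rows in the combined frame (not shown in the notebook)
n_train, n_test = split_sizes(44898, 0.2)

train_steps = steps_per_epoch(n_train, 64)  # batch_size used in model.fit
eval_steps = steps_per_epoch(n_test, 32)    # Keras default batch size for evaluate/predict
print(n_train, n_test, train_steps, eval_steps)
```

Under that assumption the step counts match the `562/562` and `281/281` progress counters in the training and evaluation logs.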

### 3.2 Vectorize the Text Data <a id=3.2></a>

In [16]:
# Initialize TF-IDF Vectorizer
tfidf = TfidfVectorizer(max_features=5000)

# Fit and transform the training data, and transform the testing data
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)
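
As a rough illustration of what `TfidfVectorizer` computes with its default settings (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, then L2 normalization per document), here is a dependency-free sketch. The function name `tfidf_rows` and the three toy documents are made up for this example, and the sketch ignores scikit-learn's tokenization rules and the `max_features` cap.

```python
import math
from collections import Counter

def tfidf_rows(docs):
    """Toy TF-IDF: raw counts * smoothed idf, L2-normalized per document."""
    tokenized = [d.split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Document frequency: in how many documents each term occurs
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / norm for v in raw])
    return vocab, rows

# Made-up, already-preprocessed documents
docs = ["trump said senate", "senate vote said", "shocking video trump"]
vocab, rows = tfidf_rows(docs)

for doc, row in zip(docs, rows):
    print(doc, [round(v, 3) for v in row])
```

Terms that occur in fewer documents (here `shocking` and `video`) receive a larger weight than terms shared across documents, which is the property the classifier relies on.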

## 4. ANN <a id=4></a>

In [17]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

In [18]:
# Build the ANN model
model = Sequential()

# First hidden layer; input_shape declares the number of TF-IDF features
model.add(Dense(units=512, activation='relu', input_shape=(X_train_tfidf.shape[1],)))

# Further hidden layers, each preceded by dropout for regularization
model.add(Dropout(0.3))
model.add(Dense(units=256, activation='relu'))

model.add(Dropout(0.3))
model.add(Dense(units=128, activation='relu'))

# Output layer: a single sigmoid unit giving P(label == 1), i.e. true news
model.add(Dense(units=1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model (note: the test split is reused as validation data here)
model.fit(X_train_tfidf, y_train, epochs=10, batch_size=64, validation_data=(X_test_tfidf, y_test))



Epoch 1/10
562/562 ━━━━━━━━━━━━━━━━━━━━ 36s 61ms/step - accuracy: 0.9445 - loss: 0.1605 - val_accuracy: 0.9901 - val_loss: 0.0331
Epoch 2/10
562/562 ━━━━━━━━━━━━━━━━━━━━ 44s 67ms/step - accuracy: 0.9961 - loss: 0.0137 - val_accuracy: 0.9866 - val_loss: 0.0474
Epoch 3/10
562/562 ━━━━━━━━━━━━━━━━━━━━ 44s 72ms/step - accuracy: 0.9981 - loss: 0.0061 - val_accuracy: 0.9902 - val_loss: 0.0380
Epoch 4/10
562/562 ━━━━━━━━━━━━━━━━━━━━ 33s 58ms/step - accuracy: 0.9988 - loss: 0.0042 - val_accuracy: 0.9881 - val_loss: 0.0501
Epoch 5/10
562/562 ━━━━━━━━━━━━━━━━━━━━ 41s 58ms/step - accuracy: 0.9986 - loss: 0.0046 - val_accuracy: 0.9904 - val_loss: 0.0457
Epoch 6/10
562/562 ━━━━━━━━━━━━━━━━━━━━ 40s 56ms/step - accuracy: 0.9995 - loss: 0.0019 - val_accuracy: 0.9888 - val_loss: 0.0578
Epoch 7/10
[output truncated]

<keras.src.callbacks.history.History at 0x790cf3db3580>
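
To get a feel for the size of this network, the sketch below counts its trainable parameters by hand. It assumes the TF-IDF matrix has exactly 5,000 columns (the `max_features` cap being reached, which is not shown in the outputs). `dense_params` is a hypothetical helper, and dropout layers are left out because they have no weights.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

# ASSUMPTION: 5000 input features; the remaining sizes come from the model definition
layer_sizes = [5000, 512, 256, 128, 1]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)
```

Almost all of the roughly 2.7 million parameters sit in the first layer. This may help explain why training accuracy approaches 1.0 within a few epochs while validation loss drifts upward.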

### 4.1 Evaluate the ANN Model <a id=4.1></a>

In [19]:
# Evaluate the model
loss, accuracy = model.evaluate(X_test_tfidf, y_test)
print(f'Accuracy: {accuracy}')

# Make predictions
y_pred = (model.predict(X_test_tfidf) > 0.5).astype("int32")

# Print classification report and confusion matrix
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

281/281 ━━━━━━━━━━━━━━━━━━━━ 4s 15ms/step - accuracy: 0.9909 - loss: 0.0557
Accuracy: 0.9898663759231567
281/281 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step
[[4581   69]
 [  22 4308]]
              precision    recall  f1-score   support

           0       1.00      0.99      0.99      4650
           1       0.98      0.99      0.99      4330

    accuracy                           0.99      8980
   macro avg       0.99      0.99      0.99      8980
weighted avg       0.99      0.99      0.99      8980
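
The figures in the classification report can be re-derived from the confusion matrix printed above. In scikit-learn's layout, rows are true labels and columns are predicted labels. The sketch below is plain Python, with variable names chosen for this example, and treats label 1 (true news) as the positive class.

```python
# Confusion matrix copied from the output above: rows = true, columns = predicted
cm = [[4581, 69],
      [22, 4308]]

tn, fp = cm[0]  # true label 0 (fake): predicted 0 / predicted 1
fn, tp = cm[1]  # true label 1 (true): predicted 0 / predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_1 = tp / (tp + fp)  # of articles predicted true, share that are true
recall_1 = tp / (tp + fn)     # of true articles, share that were found
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

print(f"accuracy={accuracy:.4f}")
print(f"class 1: precision={precision_1:.2f} recall={recall_1:.2f}")
print(f"class 0: precision={precision_0:.2f} recall={recall_0:.2f}")
```

Of the 91 errors, 69 are fake articles classified as true and 22 are true articles classified as fake, so the model is slightly more likely to let a fake article through than to reject a real one.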

