# Fake News Detection

![](./banner.png)

### Importing the required libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
import re
import string

### Loading the fake and real news datasets


In [2]:
df_fake = pd.read_csv("Fake.csv")
df_true = pd.read_csv("True.csv")

In [3]:
df_fake.tail()

Unnamed: 0,title,text,subject,date
23476,McPain: John McCain Furious That Iran Treated ...,21st Century Wire says As 21WIRE reported earl...,Middle-east,"January 16, 2016"
23477,JUSTICE? Yahoo Settles E-mail Privacy Class-ac...,21st Century Wire says It s a familiar theme. ...,Middle-east,"January 16, 2016"
23478,Sunnistan: US and Allied ‘Safe Zone’ Plan to T...,Patrick Henningsen 21st Century WireRemember ...,Middle-east,"January 15, 2016"
23479,How to Blow $700 Million: Al Jazeera America F...,21st Century Wire says Al Jazeera America will...,Middle-east,"January 14, 2016"
23480,10 U.S. Navy Sailors Held by Iranian Military ...,21st Century Wire says As 21WIRE predicted in ...,Middle-east,"January 12, 2016"


In [4]:
df_true.head(5)

Unnamed: 0,title,text,subject,date
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017"
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017"
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017"
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017"
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017"


**Adding a column called "class" to the fake and real news datasets to label fake (0) and true (1) news.**

In [5]:
df_fake["class"] = 0
df_true["class"] = 1

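Setting aside the last 10 rows of each dataset for manual testing. Note that `tail(10)` only copies these rows: they are not removed from `df_fake` and `df_true`, so they also remain in the data used for training later on.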

In [6]:
df_fake_manual_testing = df_fake.tail(10)

In [7]:
df_true_manual_testing = df_true.tail(10)
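
If the goal is to keep the manual-testing rows out of training altogether, they also have to be dropped from the source frame. The following is a minimal sketch on a toy frame, not the notebook's own code; the `toy`, `holdout` and `remaining` names are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for df_fake / df_true (hypothetical data).
toy = pd.DataFrame({"text": [f"article {i}" for i in range(25)], "class": 0})

# .copy() gives an independent frame, so later column assignments
# do not raise SettingWithCopyWarning.
holdout = toy.tail(10).copy()

# Drop the same rows by index label so they cannot reach the training data.
remaining = toy.drop(holdout.index)

print(len(holdout), len(remaining))
```

The same two calls could be applied to `df_fake` and `df_true` before they are merged.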

Merging the two manual-testing dataframes into a single dataset and saving it to a CSV file

In [8]:
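# "class" was already assigned in In [5], so these two lines are redundant.
# Assigning to the result of tail() is also what triggers the
# SettingWithCopyWarning shown below.
df_fake_manual_testing["class"] = 0
df_true_manual_testing["class"] = 1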

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)


In [9]:
df_fake_manual_testing.head()

Unnamed: 0,title,text,subject,date,class
23471,Seven Iranians freed in the prisoner swap have...,"21st Century Wire says This week, the historic...",Middle-east,"January 20, 2016",0
23472,#Hashtag Hell & The Fake Left,By Dady Chery and Gilbert MercierAll writers ...,Middle-east,"January 19, 2016",0
23473,Astroturfing: Journalist Reveals Brainwashing ...,Vic Bishop Waking TimesOur reality is carefull...,Middle-east,"January 19, 2016",0
23474,The New American Century: An Era of Fraud,Paul Craig RobertsIn the last years of the 20t...,Middle-east,"January 19, 2016",0
23475,Hillary Clinton: ‘Israel First’ (and no peace ...,Robert Fantina CounterpunchAlthough the United...,Middle-east,"January 18, 2016",0


In [10]:
df_manual_testing = pd.concat([df_fake_manual_testing,df_true_manual_testing], axis = 0)
df_manual_testing.to_csv("manual_testing.csv")
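
By default `to_csv` also writes the index as an unnamed first column, which pandas shows as `Unnamed: 0` when the file is read back. The sketch below illustrates this with an in-memory buffer instead of a file; the `frame` contents and variable names are made up for the example.

```python
import io
import pandas as pd

# Hypothetical two-row frame with non-default index labels.
frame = pd.DataFrame({"text": ["a", "b"], "class": [0, 1]}, index=[23479, 21415])

buffer = io.StringIO()
frame.to_csv(buffer)                 # the index is written as an unnamed first column

buffer.seek(0)
default_read = pd.read_csv(buffer)   # the index comes back as a regular column

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)   # treat the first column as the index again

print(list(default_read.columns))
print(list(restored.columns))
```

Passing `index=False` to `to_csv` is the alternative when the index carries no information.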

### Merging the main fake and true dataframes

In [11]:
df_marge = pd.concat([df_fake, df_true], axis =0 )
df_marge.head(10)

Unnamed: 0,title,text,subject,date,class
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017",0
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017",0
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017",0
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017",0
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017",0
5,Racist Alabama Cops Brutalize Black Boy While...,The number of cases of cops brutalizing and ki...,News,"December 25, 2017",0
6,"Fresh Off The Golf Course, Trump Lashes Out A...",Donald Trump spent a good portion of his day a...,News,"December 23, 2017",0
7,Trump Said Some INSANELY Racist Stuff Inside ...,In the wake of yet another court decision that...,News,"December 23, 2017",0
8,Former CIA Director Slams Trump Over UN Bully...,Many people have raised the alarm regarding th...,News,"December 22, 2017",0
9,WATCH: Brand-New Pro-Trump Ad Features So Muc...,Just when you might have thought we d get a br...,News,"December 21, 2017",0


In [12]:
df_marge.columns

Index(['title', 'text', 'subject', 'date', 'class'], dtype='object')

#### The "title", "subject" and "date" columns are not used for detecting fake news here, so we drop them.

In [13]:
df = df_marge.drop(["title", "subject","date"], axis = 1)

**Checking null values**

In [14]:
df.isnull().sum()

text     0
class    0
dtype: int64

#### Randomly shuffling the dataframe 

In [15]:
df = df.sample(frac = 1)

In [16]:
df.head()

Unnamed: 0,text,class
17988,Sources tell UtahPolicy.com that former Massac...,0
5364,"NORTH CHARLESTON, S.C. (Reuters) - U.S. Presid...",1
13953,A tale of a mother and the dangerous religion ...,0
19335,She s pulling down a cool $2 Million a year as...,0
21128,MEXICO CITY (Reuters) - Tropical Storm Lidia s...,1


In [17]:
# drop=True discards the old index instead of adding it as an "index" column
df = df.reset_index(drop=True)
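
The shuffle above produces a different order on every run. If a repeatable order is wanted, `sample` accepts a `random_state`. A small sketch on a toy frame follows; the seed value 42 and the variable names are arbitrary choices for the example.

```python
import pandas as pd

# Toy frame standing in for the merged dataset (hypothetical data).
toy = pd.DataFrame({"text": list("abcdef"), "class": [0, 0, 0, 1, 1, 1]})

# random_state fixes the permutation; reset_index(drop=True) renumbers the rows.
shuffled = toy.sample(frac=1, random_state=42).reset_index(drop=True)
again = toy.sample(frac=1, random_state=42).reset_index(drop=True)

print(shuffled.equals(again))
```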

### Creating a function to lowercase the text and remove extra spaces, special characters, URLs and links

In [18]:
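def wordopt(text):
    # Raw strings (r'...') keep the regex escapes from being read as string escapes.
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)       # drop bracketed text such as [video]
    text = re.sub(r'\W', ' ', text)           # every non-word character becomes a space
    # After the step above no ':', '/', '.', '<', '>' or newline remains, so the
    # URL, tag and newline patterns below no longer match anything, and the
    # punctuation step only removes underscores. They are kept unchanged so the
    # function behaves as it did when the results below were produced.
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)      # drop tokens that contain a digit
    return text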

In [19]:
df["text"] = df["text"].apply(wordopt)
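
Because `wordopt` replaces all non-word characters before it looks for URLs and tags, fragments such as `https` or a domain name stay behind as ordinary tokens. The sketch below shows one possible reordering that removes URLs and tags first. `clean_text` is a hypothetical variant, not the function used for the results in this notebook, and swapping it in would change the features and possibly the scores.

```python
import re
import string

def clean_text(text):
    """Hypothetical variant of wordopt with URL and tag removal done first."""
    text = text.lower()
    text = re.sub(r"\[.*?\]", "", text)                  # bracketed text
    text = re.sub(r"https?://\S+|www\.\S+", "", text)    # URLs, while ':', '/' and '.' still exist
    text = re.sub(r"<.*?>+", "", text)                   # HTML-like tags
    text = re.sub(r"[%s]" % re.escape(string.punctuation), " ", text)  # punctuation -> space
    text = re.sub(r"\w*\d\w*", "", text)                 # tokens containing a digit
    text = re.sub(r"\s+", " ", text).strip()             # collapse runs of whitespace
    return text

sample = "Read <b>THIS</b> [video] at https://example.com/a1 now! 2016 was wild."
print(clean_text(sample))   # read this at now was wild
```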

#### Defining the independent variable (the text) as x and the dependent variable (the class label) as y

In [20]:
x = df["text"]
#print(x)
y = df["class"]

#### Splitting the dataset into training set and testing set. 

In [21]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
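
The split above is unseeded, so the exact scores vary from run to run. A sketch of a seeded, stratified split on toy data follows; `random_state=42` and the toy lists are assumptions for illustration, and `stratify` keeps the class ratio the same in both parts.

```python
from sklearn.model_selection import train_test_split

# Toy data: 20 documents per class (hypothetical).
texts = [f"doc {i}" for i in range(40)]
labels = [0] * 20 + [1] * 20

x_tr, x_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

print(len(x_tr), len(x_te), sum(y_te))
```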

### Convert text to vectors

In [22]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [45]:
vectorization = TfidfVectorizer()
xv_train = vectorization.fit_transform(x_train)
xv_test = vectorization.transform(x_test)
print(xv_train)
#xv_train.shape

  (0, 50228)	0.0473492340161167
  (0, 19446)	0.03666452614767858
  (0, 90402)	0.022802943592474683
  (0, 84186)	0.06197920498487919
  (0, 52458)	0.026372558192204836
  (0, 13350)	0.08962435360375202
  (0, 78507)	0.06942673587987268
  (0, 13782)	0.09356263659454216
  (0, 29450)	0.09750091958533229
  (0, 29556)	0.10423345882662541
  (0, 80034)	0.04764551409744013
  (0, 36941)	0.030929691857311044
  (0, 29380)	0.055794475972665
  (0, 40555)	0.041255607789837195
  (0, 26276)	0.07671445485058005
  (0, 60765)	0.06984011745626079
  (0, 40021)	0.08403584110274608
  (0, 7441)	0.08139454865884507
  (0, 38008)	0.02200518733900505
  (0, 80812)	0.03291932616150664
  (0, 87585)	0.021210364988125056
  (0, 29890)	0.10423345882662541
  (0, 75717)	0.03204173634619308
  (0, 84123)	0.04572770647728642
  (0, 57913)	0.03366093727297933
  :	:
  (33672, 12072)	0.02311031085236438
  (33672, 34298)	0.02563321918998089
  (33672, 89267)	0.017333057472437147
  (33672, 92836)	0.01594006581921586
  (33672, 12215)	0.
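
Each printed line above is a `(document index, vocabulary index)` pair followed by the TF-IDF weight of that term in that document. The toy sketch below shows the same fit/transform pattern at a readable size; the sentences and variable names are made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini corpus.
train_docs = ["trump said the senate will vote", "the senate vote was delayed"]
test_docs = ["the president said nothing about aliens"]

vec = TfidfVectorizer()
train_matrix = vec.fit_transform(train_docs)   # learns the vocabulary and IDF weights
test_matrix = vec.transform(test_docs)         # reuses them; unseen words are ignored

print(train_matrix.shape, test_matrix.shape)
print(sorted(vec.vocabulary_))
```

Fitting only on the training text, as the notebook does, keeps vocabulary and IDF statistics from the test set out of the features.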

# 1. Logistic Regression

In [24]:
from sklearn.linear_model import LogisticRegression
LR = LogisticRegression()
LR.fit(xv_train,y_train)

LogisticRegression()

In [25]:
pred_lr=LR.predict(xv_test)

In [26]:
LR.score(xv_test, y_test)

0.9862806236080178

In [27]:
print(classification_report(y_test, pred_lr))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      5888
           1       0.98      0.99      0.99      5337

    accuracy                           0.99     11225
   macro avg       0.99      0.99      0.99     11225
weighted avg       0.99      0.99      0.99     11225



# 2. Decision Tree Classification

In [28]:
from sklearn.tree import DecisionTreeClassifier
DT = DecisionTreeClassifier()

In [29]:
DT.fit(xv_train, y_train)

DecisionTreeClassifier()

In [30]:
pred_dt = DT.predict(xv_test)

In [31]:
DT.score(xv_test, y_test)

0.9953674832962138

In [32]:
print(classification_report(y_test, pred_dt))

              precision    recall  f1-score   support

           0       0.99      1.00      1.00      5888
           1       1.00      0.99      1.00      5337

    accuracy                           1.00     11225
   macro avg       1.00      1.00      1.00     11225
weighted avg       1.00      1.00      1.00     11225



# 3. Gradient Boosting Classifier

In [33]:
from sklearn.ensemble import GradientBoostingClassifier
GBC = GradientBoostingClassifier(random_state=0)

In [34]:
GBC.fit(xv_train, y_train)

GradientBoostingClassifier(random_state=0)

In [35]:
pred_gbc = GBC.predict(xv_test)

In [36]:
GBC.score(xv_test, y_test)

0.995456570155902

In [37]:
print(classification_report(y_test, pred_gbc))

              precision    recall  f1-score   support

           0       1.00      0.99      1.00      5888
           1       0.99      1.00      1.00      5337

    accuracy                           1.00     11225
   macro avg       1.00      1.00      1.00     11225
weighted avg       1.00      1.00      1.00     11225



# 4. Random Forest Classifier

In [38]:
from sklearn.ensemble import RandomForestClassifier
RFC = RandomForestClassifier(random_state=0)

In [39]:
RFC.fit(xv_train, y_train)

RandomForestClassifier(random_state=0)

In [40]:
pred_rfc = RFC.predict(xv_test)

In [41]:
RFC.score(xv_test, y_test)

0.9877951002227171

In [42]:
print(classification_report(y_test, pred_rfc))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      5888
           1       0.99      0.99      0.99      5337

    accuracy                           0.99     11225
   macro avg       0.99      0.99      0.99     11225
weighted avg       0.99      0.99      0.99     11225
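
The four sections above repeat the same fit and score steps for each classifier. One way to express that pattern compactly is a loop over scikit-learn pipelines, sketched here on a tiny made-up corpus with only two of the models. The corpus, the `models` dictionary and the `scores` name are assumptions for illustration, and the numbers it prints say nothing about the real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up corpus, purely to keep the sketch runnable.
docs = [
    "shocking secret they hide",
    "you will not believe this shocking trick",
    "senate passes budget bill",
    "court rules on budget appeal",
] * 5
labels = [0, 0, 1, 1] * 5

models = {
    "LR": LogisticRegression(),
    "DT": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    # The pipeline vectorizes the text and then fits the classifier.
    pipe = make_pipeline(TfidfVectorizer(), model)
    pipe.fit(docs, labels)
    scores[name] = pipe.score(docs, labels)   # training accuracy on the toy data only

print(scores)
```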



# Model Testing With Manual Entry

In [46]:
def output_label(n):
    # Map the numeric class back to a readable label (0 = fake, 1 = true).
    if n == 0:
        return "Fake News"
    elif n == 1:
        return "True News"

def manual_testing(news):
    # Clean and vectorize the input the same way as the training data.
    new_def_test = pd.DataFrame({"text": [news]})
    new_def_test["text"] = new_def_test["text"].apply(wordopt)
    new_xv_test = vectorization.transform(new_def_test["text"])
    pred_LR = LR.predict(new_xv_test)
    pred_DT = DT.predict(new_xv_test)
    pred_GBC = GBC.predict(new_xv_test)
    pred_RFC = RFC.predict(new_xv_test)

    print("\n\nLR Prediction: {} \nDT Prediction: {} \nGBC Prediction: {} \nRFC Prediction: {}".format(
        output_label(pred_LR[0]),
        output_label(pred_DT[0]),
        output_label(pred_GBC[0]),
        output_label(pred_RFC[0])))
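
`predict` returns only the hard label. Where a rough confidence is useful, classifiers such as `LogisticRegression` also expose `predict_proba`. A minimal sketch on made-up data follows; the corpus, the `LABELS` mapping and the query sentence are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up corpus (0 = fake, 1 = true), for illustration only.
docs = [
    "shocking secret they hide",
    "you will not believe this shocking trick",
    "senate passes budget bill",
    "court rules on budget appeal",
] * 5
labels = [0, 0, 1, 1] * 5

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(docs), labels)

LABELS = {0: "Fake News", 1: "True News"}

# One row of probabilities, ordered like clf.classes_.
proba = clf.predict_proba(vec.transform(["senate budget vote"]))[0]
predicted_class = int(clf.classes_[proba.argmax()])

print(LABELS[predicted_class], proba.round(3))
```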

In [47]:
news = input()  # input() already returns a str
manual_testing(news)

BRUSSELS (Reuters) - NATO allies on Tuesday welcomed President Donald Trump s decision to commit more forces to Afghanistan, as part of a new U.S. strategy he said would require more troops and funding from America s partners. Having run for the White House last year on a pledge to withdraw swiftly from Afghanistan, Trump reversed course on Monday and promised a stepped-up military campaign against  Taliban insurgents, saying:  Our troops will fight to win .  U.S. officials said he had signed off on plans to send about 4,000 more U.S. troops to add to the roughly 8,400 now deployed in Afghanistan. But his speech did not define benchmarks for successfully ending the war that began with the U.S.-led invasion of Afghanistan in 2001, and which he acknowledged had required an   extraordinary sacrifice of blood and treasure .  We will ask our NATO allies and global partners to support our new strategy, with additional troops and funding increases in line with our own. We are confident they w