# Fake news detection using Python

Imagine waking up tomorrow and seeing an article that says "A cure for COVID-19 has been found!" Without thinking about it, you'd probably click the link, right? And then, to your disappointment, you'd find out that the article was written by some idiot who was just spreading fake news!

Any sane person would be greatly disappointed, and would stop just short of cursing the son of a turtle (*ahem*) who wrote the article.

Unfortunately, lots of people spread fake news through social media... so today we're looking at how to automate its detection with Python (Yippee).


Right, so what do we need?

Oh, nothing much. You need Python installed (*obviously*), plus the following Python libraries:

- numpy
- scikit-learn (imported in code as `sklearn`)
- pandas

If you don't have them installed, don't worry: copy and paste the following command into your terminal / command prompt. Note that the package is called `scikit-learn` on pip, even though the import name is `sklearn`.

```shell
pip install numpy pandas scikit-learn
```
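
Not sure whether everything actually got installed? Here is a minimal sketch that checks, using only the standard library. The helper name `find_missing` and the `REQUIRED` mapping are made up for this example; the one thing to remember is that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Maps the name you import to the name you pip install.
# (Hypothetical helper data for this sketch.)
REQUIRED = {"numpy": "numpy", "pandas": "pandas", "sklearn": "scikit-learn"}


def find_missing(required=REQUIRED):
    """Return the pip names of the packages that cannot be imported."""
    missing = []
    for import_name, pip_name in required.items():
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(import_name) is None:
            missing.append(pip_name)
    return missing


missing = find_missing()
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All set!")
```

If anything shows up as missing, run the `pip install` command again for just those packages.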

Once the installation is done, download the dataset (the code below expects it as `news.csv` in your working directory), and we can start coding! (Yaaaayyyy!)

In [1]:
import os
import numpy as np
import pandas as pd
import itertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

In [2]:
# Read the dataset from the CSV file (expected in the working directory)
dataframe = pd.read_csv('news.csv')

In [3]:
dataframe.head()

| | Unnamed: 0 | title | text | label |
|---|---|---|---|---|
| 0 | 8476 | You Can Smell Hillary’s Fear | Daniel Greenfield, a Shillman Journalism Fello... | FAKE |
| 1 | 10294 | Watch The Exact Moment Paul Ryan Committed Pol... | Google Pinterest Digg Linkedin Reddit Stumbleu... | FAKE |
| 2 | 3608 | Kerry to go to Paris in gesture of sympathy | U.S. Secretary of State John F. Kerry said Mon... | REAL |
| 3 | 10142 | Bernie supporters on Twitter erupt in anger ag... | — Kaydee King (@KaydeeKing) November 9, 2016 T... | FAKE |
| 4 | 875 | The Battle of New York: Why This Primary Matters | It's primary day in New York and front-runners... | REAL |


In [4]:
dataframe.tail()

| | Unnamed: 0 | title | text | label |
|---|---|---|---|---|
| 6330 | 4490 | State Department says it can't find emails fro... | The State Department told the Republican Natio... | REAL |
| 6331 | 8062 | The ‘P’ in PBS Should Stand for ‘Plutocratic’ ... | The ‘P’ in PBS Should Stand for ‘Plutocratic’ ... | FAKE |
| 6332 | 8622 | Anti-Trump Protesters Are Tools of the Oligarc... | Anti-Trump Protesters Are Tools of the Oligar... | FAKE |
| 6333 | 4021 | In Ethiopia, Obama seeks progress on peace, se... | ADDIS ABABA, Ethiopia —President Obama convene... | REAL |
| 6334 | 4330 | Jeb Bush Is Suddenly Attacking Trump. Here's W... | Jeb Bush Is Suddenly Attacking Trump. Here's W... | REAL |


In [5]:
dataframe.shape

(6335, 4)
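
Before training anything, it's worth a quick look at the columns, the class balance, and any missing text. The sketch below uses a tiny made-up CSV (not the real dataset) with the same layout, including the header-less first column that pandas names `Unnamed: 0`. The names `toy_csv` and `toy` are just for this example.

```python
import io

import pandas as pd

# Tiny made-up CSV with the same layout as the dataset:
# a leading column with no header, then title, text and label.
toy_csv = io.StringIO(
    ",title,text,label\n"
    "11,Headline A,Some body text,FAKE\n"
    "12,Headline B,More body text,REAL\n"
    "13,Headline C,,REAL\n"
)
toy = pd.read_csv(toy_csv)

# pandas gives a column with an empty header the name "Unnamed: 0"
print(list(toy.columns))

# How many articles are there per class?
label_counts = toy["label"].value_counts()
print(label_counts)

# Count rows whose text is missing; these are worth handling
# before the text is passed to a vectorizer
missing_text = int(toy["text"].isna().sum())
print(missing_text)
```

On the real data you would run the same three checks on `dataframe` instead of `toy`.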

In [6]:
# Hold out 20% of the articles for testing; random_state keeps the split repeatable
x_train, x_test, y_train, y_test = train_test_split(
    dataframe['text'], dataframe['label'], test_size=0.2, random_state=7
)


In [7]:
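# Ignore English stop words and terms found in more than 70% of the documents
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)

# Learn the vocabulary from the training set only, then reuse it on the test set
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)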
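
Wondering what the vectorizer is doing under the hood? Here is a rough pure-Python sketch of the idea: count terms, drop the ones that show up in too many documents (`max_df`), weight the rest by a smoothed inverse document frequency, and scale each row to unit length. The function `tfidf` and the toy headlines are made up for illustration, the tokenizer is a plain `split()`, and stop-word removal is left out, so treat this as an approximation of the technique rather than scikit-learn's actual code.

```python
import math
from collections import Counter


def tfidf(docs, max_df=0.7):
    """Toy TF-IDF: max_df filtering, smoothed idf, L2-normalised rows."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))

    # Keep only terms that appear in at most max_df of the documents
    vocab = sorted(t for t, df in doc_freq.items() if df <= max_df * n_docs)

    # Smoothed inverse document frequency: rarer terms get larger weights
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}

    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights = {t: counts[t] * idf[t] for t in vocab if counts[t]}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vocab, rows


docs = [
    "candidate shocks nation",
    "candidate wins primary",
    "candidate visits paris",
    "aliens visit paris",
]
vocab, rows = tfidf(docs)

print(vocab)    # "candidate" is in 3 of 4 documents, so max_df=0.7 drops it
print(rows[2])  # "visits" (1 document) outweighs "paris" (2 documents)
```

The takeaway: words that appear everywhere carry little signal, so they are either dropped or down-weighted, while rarer words dominate each article's feature vector.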

In [8]:
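# Train the classifier (no random_state is set, so the score can vary slightly between runs)
pac = PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train, y_train)

# Predict on the held-out articles and measure the accuracy
y_pred = pac.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score*100, 2)}%')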

Accuracy: 92.58%
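
So why is it called "passive aggressive"? The classifier stays passive when an example is already classified with enough margin, and updates aggressively when it is not. Below is a small pure-Python sketch of the textbook PA-I update rule for labels in {-1, +1}. The function names, the toy data, and the `C` and `epochs` values are all assumptions for illustration; this is not scikit-learn's implementation.

```python
def pa_train(samples, labels, n_features, C=1.0, epochs=5):
    """Passive-Aggressive (PA-I) training for labels in {-1, +1}."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            loss = max(0.0, 1.0 - margin)  # hinge loss
            if loss == 0.0:
                continue  # passive: the margin is already large enough
            sq_norm = sum(xi * xi for xi in x)
            if sq_norm == 0.0:
                continue  # an all-zero sample cannot move the weights
            tau = min(C, loss / sq_norm)  # aggressive: step size capped by C
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return w


def pa_predict(w, x):
    """Predict +1 or -1 from the sign of the dot product."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Hypothetical two-feature toy data: +1 stands for FAKE, -1 for REAL
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, -1, -1]

w = pa_train(X, y, n_features=2)
preds = [pa_predict(w, x) for x in X]
print(w)
print(preds)
```

In the real pipeline the feature vectors are the sparse TF-IDF rows, so each article nudges the weights of the words it contains.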


In [9]:
# Rows are the true labels, columns the predicted ones, both ordered FAKE, REAL
confusion_matrix(y_test, y_pred, labels=['FAKE', 'REAL'])

array([[587,  51],
       [ 43, 586]])
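
What do those four numbers tell us? With rows as true labels and columns as predictions, we got 587 fake articles right and mistook 51 of them for real, while 43 real articles were flagged as fake and 586 were correctly passed. Here is a small sketch that turns the matrix into accuracy, precision and recall for the FAKE class. The variable names are just for this example.

```python
# Confusion matrix from the run above: rows = true label,
# columns = predicted label, both in the order ['FAKE', 'REAL'].
matrix = [[587, 51],
          [43, 586]]

tp, fn = matrix[0]  # true FAKE: predicted FAKE / predicted REAL
fp, tn = matrix[1]  # true REAL: predicted FAKE / predicted REAL

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
precision_fake = tp / (tp + fp)  # of everything flagged FAKE, how much was fake?
recall_fake = tp / (tp + fn)     # of all fake articles, how many were caught?

print(f"Test articles:    {total}")
print(f"Accuracy:         {accuracy:.2%}")
print(f"Precision (FAKE): {precision_fake:.2%}")
print(f"Recall (FAKE):    {recall_fake:.2%}")
```

The accuracy works out to the same 92.58% reported earlier, which is a handy sanity check that the matrix and the score come from the same predictions.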