# **Detecting Fake News with Natural Language Processing**


**Introduction:**

We get news from a variety of sources throughout the day in our daily lives, but it can be tough to tell which ones are phoney and which are genuine.


**Do you believe everything you read on the internet?**



Every piece of news we consume is false. If you listen to fake news, you're gathering incorrect information from the world, which can have an impact on society because a person's ideas or opinions can shift after eating fake news that the user believes is true.



How can we determine if news is fake or real, given that all news we see in our daily lives is not authentic?



***In this article, we will focus on text-based news and attempt to develop a model that will assist us in determining if a certain piece of news is fake or not***.

# **Libraries were once utilised to solve NLP issues.**


To deal with NLP-based difficulties, a variety of libraries and algorithms are employed. For text cleaning, a regular expression(re) is the most commonly used library. The next-level libraries NLTK and spacy are used to do natural language tasks such as stopword removal, named entity recognition, part of speech tagging, phrase matching, and so on.

`pip install nltk`

In [1]:
import pandas as pd
import numpy as np


# **Read dataset from CSV File**

**We willbe , using pandas `read_csv` to load in the dataframe.**
The data set contains of two csv file one file contains the real news and another file contains fake news. So, we will import both the dataset and stored in dataframe "real_news" and "fake_news" and later on we will concate the both dataset by using pandas  `pd.concat`.


### **Reading real news dataset:**

In [2]:
real_news= pd.read_excel("/content/drive/MyDrive/Fake News Classifier/Real_News.xlsx")
real_news.head()

Unnamed: 0,Title,News_text,Subject,label
0,Trump judicial nominee withdraws from consider...,WASHINGTON (Reuters) - A lawyer nominated by P...,politicsNews,1
1,No. 2 Democrat in Senate calls on Franken to r...,"(Reuters) - U.S. Senator Dick Durbin, the No. ...",politicsNews,1
2,"WTO chief won't debate Trump, but rallies supp...",GENEVA (Reuters) - The head of the World Trade...,politicsNews,1
3,Hungary says it is facing 'frontal assault' fr...,BUDAPEST (Reuters) - Hungary is facing a front...,worldnews,1
4,Senate Republicans shove tax bill ahead as Dem...,WASHINGTON (Reuters) - U.S. Senate Republicans...,politicsNews,1


In [3]:
real_news.shape

(21755, 4)

### **Reading fake news datset:**

In [4]:
fake_news=pd.read_excel("/content/drive/MyDrive/Fake News Classifier/Fake_News.xlsx")
fake_news.head()

Unnamed: 0,Title,News_text,Subject,label
0,Harry Reid UNLEASHES Anti-Trump Rant On Senat...,Senate Minority Leader Harry Reid has it out f...,News,0
1,BUSTED! MEDIA Caught Red-Handed Trying To Demo...,In their desire to push Hillary Clinton across...,left-news,0
2,Trumpâ€™s Latest Appointee Is A Climate Denie...,There are people currently being turned down f...,News,0
3,[VIDEO] TWO STREET PREACHERS SEVERELY BEATEN B...,Gay pride? Tolerance is a one-way street for t...,politics,0
4,Dem. Rep. Says Steve Bannon Is A â€˜Stone Col...,"Congressman Hakeem Jeffries says that, while h...",News,0


In [5]:
fake_news.shape

(23697, 4)

### **Merging the dataset:**

We have seen real news data and fake news data. We are going to **concat** these two data frame and stored in one data frame called df. So we will use  ```
pd.concat``` 



In [6]:
df= pd.concat([real_news,fake_news],axis=0)
df.head()

Unnamed: 0,Title,News_text,Subject,label
0,Trump judicial nominee withdraws from consider...,WASHINGTON (Reuters) - A lawyer nominated by P...,politicsNews,1
1,No. 2 Democrat in Senate calls on Franken to r...,"(Reuters) - U.S. Senator Dick Durbin, the No. ...",politicsNews,1
2,"WTO chief won't debate Trump, but rallies supp...",GENEVA (Reuters) - The head of the World Trade...,politicsNews,1
3,Hungary says it is facing 'frontal assault' fr...,BUDAPEST (Reuters) - Hungary is facing a front...,worldnews,1
4,Senate Republicans shove tax bill ahead as Dem...,WASHINGTON (Reuters) - U.S. Senate Republicans...,politicsNews,1


In [7]:
df.shape

(45452, 4)

*So, We have combined these two data frame into one data frame called df.* 

# **Data Cleaning:**
So, the Machine learning algorithms does'nt undersatand the text, we need to convert these words into numbers, so that machine can understand.

Text preprocessing is a technique for cleaning text data and preparing it for use in a model. Text data comprises noise in the form of emotions, punctuation, and text in a different case, among other things. When it comes to Human Language, there are many various ways to communicate the same thing, and this is only the beginning of our problems. Machines cannot understand words; they require numbers, thus we must translate text to numbers in a timely manner.

Steps:
1.   **Tokenization**: Splitting sentences into words, by using NLTK library.
2.   **Removal of Stop-words**: When data analysis needs to be data driven at the word level, the commonly occurring words (stop-words) should be removed. One can either create a long list of stop-words or one can use predefined language specific libraries.
1. **Removal of Punctuations**: All the punctuation marks according to the priorities should be dealt with. For example: “.”, “,”,”?” are important punctuations that should be retained while others need to be removed.
2.   **Converting text into lower case**
1.   **Lemmatization**: Is the process of converting words into meaningful words. 














# **Library for text cleaning:**

First thing first, you load all the necessary libraries:

In [8]:
import nltk # For text cleaning
import re # for regural expression, to remove punctuations
from nltk.corpus import stopwords # for removing stopwords from sentences
from nltk.stem import WordNetLemmatizer   # for lemmatization- to make sentence in proper meaning
from wordcloud import WordCloud  # Word cloud
from nltk.stem import PorterStemmer

import matplotlib.pyplot as plt # Matplot lib
% matplotlib inline

In [9]:
#checking for missing values
df.isnull().sum()

Title        0
News_text    0
Subject      0
label        0
dtype: int64

In [10]:
# Now we’ll create a copy of this dataset and also reset its index values.
news=df.copy()
news.reset_index(drop=True)
news.head(10)

Unnamed: 0,Title,News_text,Subject,label
0,Trump judicial nominee withdraws from consider...,WASHINGTON (Reuters) - A lawyer nominated by P...,politicsNews,1
1,No. 2 Democrat in Senate calls on Franken to r...,"(Reuters) - U.S. Senator Dick Durbin, the No. ...",politicsNews,1
2,"WTO chief won't debate Trump, but rallies supp...",GENEVA (Reuters) - The head of the World Trade...,politicsNews,1
3,Hungary says it is facing 'frontal assault' fr...,BUDAPEST (Reuters) - Hungary is facing a front...,worldnews,1
4,Senate Republicans shove tax bill ahead as Dem...,WASHINGTON (Reuters) - U.S. Senate Republicans...,politicsNews,1
5,Jeb Bush endorses Ted Cruz for Republican nomi...,WASHINGTON(Reuters) - Former candidate Jeb Bus...,politicsNews,1
6,"Trump, dogged at home, begins longest presiden...",HONOLULU (Reuters) - President Donald Trump ar...,worldnews,1
7,China's CIC head says Trump to be careful in c...,MELBOURNE (Reuters) - The CEO of Chinaâ€™s sov...,politicsNews,1
8,White House to keep paying Obamacare subsidies...,WASHINGTON (Reuters) - The Trump administratio...,politicsNews,1
9,Comey to testify to Senate panel in public ses...,WASHINGTON (Reuters) - Former FBI Director Jam...,politicsNews,1


In [11]:
news.isnull().sum()

Title        0
News_text    0
Subject      0
label        0
dtype: int64

In [12]:
news.dropna()

Unnamed: 0,Title,News_text,Subject,label
0,Trump judicial nominee withdraws from consider...,WASHINGTON (Reuters) - A lawyer nominated by P...,politicsNews,1
1,No. 2 Democrat in Senate calls on Franken to r...,"(Reuters) - U.S. Senator Dick Durbin, the No. ...",politicsNews,1
2,"WTO chief won't debate Trump, but rallies supp...",GENEVA (Reuters) - The head of the World Trade...,politicsNews,1
3,Hungary says it is facing 'frontal assault' fr...,BUDAPEST (Reuters) - Hungary is facing a front...,worldnews,1
4,Senate Republicans shove tax bill ahead as Dem...,WASHINGTON (Reuters) - U.S. Senate Republicans...,politicsNews,1
...,...,...,...,...
23692,CLOAKED ORDER: Whoâ€™s Really Behind â€˜New Au...,"21st Century Wire says Earlier this week, the ...",Middle-east,0
23693,Bill Maher Gets His Swagger On Over Liberal C...,Bill Maher finished Friday s episode of Real T...,News,0
23694,WHOA! BLACK WOMAN FED UP WITH BLACK RACISTS NA...,WOW This woman absolutely nails it!,politics,0
23695,"State Dept. Releases 7,000 Clinton E-mails But...","The State Department released 7,000 Clinton e-...",politics,0


In [13]:
nltk.download('stopwords')
nltk.download('wordnet')


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

### **Punctuation Removal:** 
In this step, all the punctuations from the text are removed. string library of Python contains some pre-defined list of punctuations such as ‘!”#$%&'()*+,-./:;?@[\]^_`{|}~’

In [14]:
#library that contains punctuation
import string
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [15]:
#defining the function to remove punctuation
def remove_punctuation(text):
    punctuationfree="".join([i for i in text if i not in string.punctuation])
    return punctuationfree
#storing the puntuation free text
news['News_text']= news['News_text'].apply(lambda x:remove_punctuation(x))


In [16]:
#remove punctuation
news['News_text'] = news['News_text'].apply(lambda x: re.sub('[%s]' % re.escape(string.punctuation), '' , x))
news.head()

Unnamed: 0,Title,News_text,Subject,label
0,Trump judicial nominee withdraws from consider...,WASHINGTON Reuters A lawyer nominated by Pres...,politicsNews,1
1,No. 2 Democrat in Senate calls on Franken to r...,Reuters US Senator Dick Durbin the No 2 Democ...,politicsNews,1
2,"WTO chief won't debate Trump, but rallies supp...",GENEVA Reuters The head of the World Trade Or...,politicsNews,1
3,Hungary says it is facing 'frontal assault' fr...,BUDAPEST Reuters Hungary is facing a frontal ...,worldnews,1
4,Senate Republicans shove tax bill ahead as Dem...,WASHINGTON Reuters US Senate Republicans ramm...,politicsNews,1


## **Remove any words or digits that contain digits.**

It is not uncommon for words and numerals to be written together in a text, posing a barrier for machines to grasp. As a result, we must eliminate words and digits that are mixed, such as game57 or game5ts7. Because this type of word is difficult to digest, it is preferable to eliminate it or replace it with an empty string. For this, we employ regular expressions.

In [17]:
#remove words and digits
news['News_text'] = news['News_text'].apply(lambda x: re.sub('W*dw*','',x))
news.head()

Unnamed: 0,Title,News_text,Subject,label
0,Trump judicial nominee withdraws from consider...,WASHINGTON Reuters A lawyer nominate by Presi...,politicsNews,1
1,No. 2 Democrat in Senate calls on Franken to r...,Reuters US Senator Dick Durbin the No 2 Democ...,politicsNews,1
2,"WTO chief won't debate Trump, but rallies supp...",GENEVA Reuters The hea of the Worl Trae Organ...,politicsNews,1
3,Hungary says it is facing 'frontal assault' fr...,BUDAPEST Reuters Hungary is facing a frontal ...,worldnews,1
4,Senate Republicans shove tax bill ahead as Dem...,WASHINGTON Reuters US Senate Republicans ramm...,politicsNews,1


## **Lowering the text:**

- It is one of the most common preprocessing steps where the text is converted into the same case preferably lower case. But it is not necessary to do this step every time you are working on an NLP problem as for some problems lower casing can lead to loss of information.

- For example, if in any project we are dealing with the emotions of a person, then the words written in upper cases can be a sign of frustration or excitement.

In [None]:
news['News_text']= news['News_text'].apply(lambda x: x.lower())
news.head()


## **Tokenization:**

In this step, the text is split into smaller units. We can use either sentence tokenization or word tokenization based on our problem statement.

In [19]:
def tokenization(text):
    tokens = re.split('W+',text)
    return tokens
#applying function to the column
news['News_text']= news['News_text'].apply(lambda x: tokenization(x))
news.head()

Unnamed: 0,Title,News_text,Subject,label
0,Trump judicial nominee withdraws from consider...,[washington reuters a lawyer nominate by pres...,politicsNews,1
1,No. 2 Democrat in Senate calls on Franken to r...,[reuters us senator dick durbin the no 2 demo...,politicsNews,1
2,"WTO chief won't debate Trump, but rallies supp...",[geneva reuters the hea of the worl trae orga...,politicsNews,1
3,Hungary says it is facing 'frontal assault' fr...,[budapest reuters hungary is facing a frontal...,worldnews,1
4,Senate Republicans shove tax bill ahead as Dem...,[washington reuters us senate republicans ram...,politicsNews,1


## **Stop word removal:** 
- Stopwords are the commonly used words and are removed from the text as they do not add any value to the analysis. These words carry less or no meaning.

- NLTK library consists of a list of words that are considered stopwords for the English language. Some of them are : [i, me, my, myself, we, our, ours, ourselves, you, you’re, you’ve, you’ll, you’d, your, yours, yourself, yourselves, he, most, other, some, such, no, nor, not, only, own, same, so, then, too, very, s, t, can, will, just, don, don’t, should, should’ve, now, d, ll, m, o, re, ve, y, ain, aren’t, could, couldn’t, didn’t, didn’t]

In [20]:
#importing nlp library
import nltk
#Stop words present in the library
stopwords = nltk.corpus.stopwords.words('english')
stopwords[0:10]
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]

In [21]:
#defining the function to remove stopwords from tokenized text
def remove_stopwords(text):
    output= [i for i in text if i not in stopwords]
    return output


In [22]:
#applying the function
news['News_text']= news['News_text'].apply(lambda x:remove_stopwords(x))

news.head()

Unnamed: 0,Title,News_text,Subject,label
0,Trump judicial nominee withdraws from consider...,[washington reuters a lawyer nominate by pres...,politicsNews,1
1,No. 2 Democrat in Senate calls on Franken to r...,[reuters us senator dick durbin the no 2 demo...,politicsNews,1
2,"WTO chief won't debate Trump, but rallies supp...",[geneva reuters the hea of the worl trae orga...,politicsNews,1
3,Hungary says it is facing 'frontal assault' fr...,[budapest reuters hungary is facing a frontal...,worldnews,1
4,Senate Republicans shove tax bill ahead as Dem...,[washington reuters us senate republicans ram...,politicsNews,1


## **Lemmatization:** 
- It stems the word but makes sure that it does not lose its meaning.  Lemmatization has a pre-defined dictionary that stores the context of words and checks the word in the dictionary while diminishing.

In [23]:
from nltk.stem import WordNetLemmatizer
#defining the object for Lemmatization
wordnet_lemmatizer = WordNetLemmatizer()

In [24]:
#defining the function for lemmatization
def lemmatizer(text):
  lemm_text = [wordnet_lemmatizer.lemmatize(word) for word in text]
  return lemm_text
news['News_text']=news['News_text'].apply(lambda x:lemmatizer(x))

In [25]:
news.head()

Unnamed: 0,Title,News_text,Subject,label
0,Trump judicial nominee withdraws from consider...,[washington reuters a lawyer nominate by pres...,politicsNews,1
1,No. 2 Democrat in Senate calls on Franken to r...,[reuters us senator dick durbin the no 2 demo...,politicsNews,1
2,"WTO chief won't debate Trump, but rallies supp...",[geneva reuters the hea of the worl trae orga...,politicsNews,1
3,Hungary says it is facing 'frontal assault' fr...,[budapest reuters hungary is facing a frontal...,worldnews,1
4,Senate Republicans shove tax bill ahead as Dem...,[washington reuters us senate republicans ram...,politicsNews,1


In [26]:
news['News_text']=news['News_text'].apply(str)

# **Creating Independent & dependent varibales**

In [27]:
#Independent varibale
X= news['News_text']

#Dependent variable
y= news['label']


## **Bag of Words**
The term "bag of words" refers to a text representation that describes the presence of words within text data. This is based on the assumption that two similar text fields will include similar types of words and so have a similar bag of words. Furthermore, we can deduce something about the document's significance solely from the text.

Sklearn provides a separate function for it to be implemented, as illustrated below:

In [28]:
from sklearn.feature_extraction.text import CountVectorizer
bow = CountVectorizer(max_features=5000, lowercase=True, ngram_range=(1,3),analyzer = "word")
X= bow.fit_transform(news['News_text']).toarray()


## **N-Grams as Characteristics**
N-Grams are a collection of N-words put together. When compared to words (Unigrams), N-grams (N > 1) are often more informative. Furthermore, bigrams (N = 2) are regarded as the most important of all the others. The code below creates a bigram of text.
## **Split the Data:**
The most important stage in machine learning is splitting the data. We use the trainset to train our model and the testing set to test our data. Using Scikit learns train_test_split function, we split our data into train and test.

In [29]:
#Dataset is now split into train and test. 
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=0)

## **Preparing data for model building:**

In [30]:
#The array is converted into dataframe.
df_news=pd.DataFrame(X_train,columns=bow.get_feature_names_out())
df_news.head()

Unnamed: 0,10,100,11,12,13,14,15,16,17,18,19,20,200,2000,2001,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2016 election,2016 presiential,2017,2017realdonaltrump,2018,2019,21,21st,21st century,21st century wire,21wire,21wiretv,22,...,yes,yesteray,yet,yet to,york,york city,york reuters,york times,you,you are,you can,you have,you have to,you know,you on,you re,you think,you to,you ve,you want,young,your,youtube,youâ,zero,zika,zone,œa,œhe,œi,œi think,œif,œit,œitâ,œiâ,œthe,œthere,œthis,œwe,œweâ
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


# **Building Model:**
### **Naive Bayes Multinomial:**
When it’s in discrete form, it’s useful for classification. It comes in handy when working with text. Each text will be turned into a word count vector. It can deal with negative numbers.

It’s pre-programmed in the Scikit Learn Library. We can then import that class into our project and build a Multinomial Naive Bayes Class object.

- Use our vectorized train data to train the classifier.
- Once the classifier has been fitted to the training set, we may use the  predict technique to forecast the test set outcome.

In [31]:
# Building the model
from sklearn.naive_bayes import MultinomialNB
nb= MultinomialNB()
nb.fit(X_train,y_train)
#predicting 
y_pred=nb.predict(X_test)

# **Classification Metrics**
To see how well our model works, we utilize several metrics to see how accurate it is. In Scikit-learn, there are a variety of categorization metrics to choose from.


**Precision, Recall, F1-Score, Confusion Matrix, Accuracy Score**

In [32]:
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report
cm= confusion_matrix(y_test,y_pred)
print(cm)
accuracy= accuracy_score(y_test,y_pred)
print(accuracy)
report= classification_report(y_test,y_pred)
print(report)

[[4558  238]
 [  89 4206]]
0.964030359696403
              precision    recall  f1-score   support

           0       0.98      0.95      0.97      4796
           1       0.95      0.98      0.96      4295

    accuracy                           0.96      9091
   macro avg       0.96      0.96      0.96      9091
weighted avg       0.96      0.96      0.96      9091



***As you can see, our Precision, Recall, and F1 Score are all excellent. As a result, we can confidently state that our model performs admirably on unobserved data. The accuracy score on the Test Dataset is 96.5%, which is excellent.***

# **Term Frequency – Inverse Document Frequency (TF-IDF)**
We don’t have to calculate TF and IDF every time beforehand and then multiply it to obtain TF-IDF. Instead, sklearn has a separate function to directly obtain it:
### **Term Frequency**
The ratio of the number of words in a sentence to the length of the sentence is known as term frequency.

As a result, we can define term frequency as follows:

***TF = (Number of times term T appears in a particular row) /  (number of terms in that row)***

### **Inverse Document Frequency**
Inverse document frequency (IDF) is based on the idea that a word isn't really useful if it appears in every document.

As a result, the log of the ratio of the total number of rows to the number of rows in which that word appears is the IDF of each word.

**IDF = log(N/n), where N is the total number of rows and n denotes the number of rows containing the word.**

In [36]:
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(max_features=5000, lowercase=True, analyzer='word',
 stop_words= 'english',ngram_range=(1,3))
X = tfidf.fit_transform(news['News_text']).toarray()

In [37]:
#Dataset is now split into train and test. 
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=0)

In [38]:
#The array is converted into dataframe.
df_news=pd.DataFrame(X_train,columns=bow.get_feature_names_out())
df_news.head()

Unnamed: 0,10,100,11,12,13,14,15,16,17,18,19,20,200,2000,2001,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2016 election,2016 presiential,2017,2017realdonaltrump,2018,2019,21,21st,21st century,21st century wire,21wire,21wiretv,22,...,yes,yesteray,yet,yet to,york,york city,york reuters,york times,you,you are,you can,you have,you have to,you know,you on,you re,you think,you to,you ve,you want,young,your,youtube,youâ,zero,zika,zone,œa,œhe,œi,œi think,œif,œit,œitâ,œiâ,œthe,œthere,œthis,œwe,œweâ
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.179055,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04995,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [39]:
# Building the model
from sklearn.naive_bayes import MultinomialNB
nb= MultinomialNB()
nb.fit(X_train,y_train)
#predicting 
y_pred=nb.predict(X_test)

In [40]:
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report
cm= confusion_matrix(y_test,y_pred)
print(cm)
accuracy= accuracy_score(y_test,y_pred)
print(accuracy)
report= classification_report(y_test,y_pred)
print(report)

[[4572  224]
 [ 116 4179]]
0.9626003739962601
              precision    recall  f1-score   support

           0       0.98      0.95      0.96      4796
           1       0.95      0.97      0.96      4295

    accuracy                           0.96      9091
   macro avg       0.96      0.96      0.96      9091
weighted avg       0.96      0.96      0.96      9091



# **Conclusion:**

We learned how to use Python to detect fake news today. We used a Fake and real News dataset to fit our model, which included a Text cleaning function, Bag of word, and an initialized Multinomial Naive Bayes Classifier. We were able to get a magnitude accuracy of 96.5 percent. With TF-IDF we able to get accuracy 96.2 %, which same to BOW.
I hope you had a good time working on this project