*Welcome!*

# **End-to-End Fake News Detection Application** 

>Using news headlines, we train a machine learning model to classify a headline as fake or real, based on the word patterns that separate fake headlines from real ones.

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer 
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

Loading Dataset
>Data source: Kaggle

In [2]:
news_data = pd.read_csv("/content/fake_or_real_news.csv")

In [5]:
news_data.head()

| | Unnamed: 0 | title | text | label |
|---|---|---|---|---|
| 0 | 8476 | You Can Smell Hillary’s Fear | Daniel Greenfield, a Shillman Journalism Fello... | FAKE |
| 1 | 10294 | Watch The Exact Moment Paul Ryan Committed Pol... | Google Pinterest Digg Linkedin Reddit Stumbleu... | FAKE |
| 2 | 3608 | Kerry to go to Paris in gesture of sympathy | U.S. Secretary of State John F. Kerry said Mon... | REAL |
| 3 | 10142 | Bernie supporters on Twitter erupt in anger ag... | — Kaydee King (@KaydeeKing) November 9, 2016 T... | FAKE |
| 4 | 875 | The Battle of New York: Why This Primary Matters | It's primary day in New York and front-runners... | REAL |


In [7]:
news_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6335 entries, 0 to 6334
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  6335 non-null   int64 
 1   title       6335 non-null   object
 2   text        6335 non-null   object
 3   label       6335 non-null   object
dtypes: int64(1), object(3)
memory usage: 198.1+ KB


The dataset contains 6,335 articles, and every column reports 6,335 non-null entries, so there are no missing values to handle.
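The `info()` output above already implies there are no nulls, but an explicit check is easy to add and also shows how balanced the two classes are. The following is a minimal sketch on a hypothetical toy frame (`toy` is not part of the notebook; it only mimics the column names). In the notebook the same two calls would be made on `news_data`.

```python
import pandas as pd

# Hypothetical toy frame standing in for news_data (same column names).
toy = pd.DataFrame({
    "title": ["Headline A", "Headline B", None, "Headline D"],
    "label": ["FAKE", "REAL", "REAL", "FAKE"],
})

# Number of missing values in each column.
missing_per_column = toy.isnull().sum()

# Number of rows per class, to see whether FAKE and REAL are balanced.
label_counts = toy["label"].value_counts()

print(missing_per_column)
print(label_counts)
```

A heavily skewed `label_counts` would make plain accuracy harder to interpret, so it is worth looking at before training.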

In [8]:
x = np.array(news_data['title'])
y = np.array(news_data['label'])

## Naive Bayes - Multinomial

In [9]:
CV = CountVectorizer()

In [12]:
x = CV.fit_transform(x)
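To make the vectorization step less opaque, here is a small sketch of what `CountVectorizer` produces. The two headlines and the names `toy_headlines`, `toy_cv`, and `toy_counts` are made up for illustration and are not from the dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two invented headlines, used only to illustrate the bag-of-words encoding.
toy_headlines = ["Senate passes budget bill", "Budget bill shocks senate"]

toy_cv = CountVectorizer()
toy_counts = toy_cv.fit_transform(toy_headlines)  # sparse document-term matrix

# Tokens are lowercased, and columns follow the sorted vocabulary.
vocab = sorted(toy_cv.vocabulary_)
print(vocab)
print(toy_counts.toarray())
```

Each row is a headline and each column counts how often one vocabulary word appears in it. Word order is discarded, which is why both toy headlines share most of their non-zero columns.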

In [15]:
xtrain, xtest, ytrain, ytest = train_test_split(x,y, test_size=0.2, random_state = 42)

In [18]:
model = MultinomialNB()

In [19]:
model.fit(xtrain,ytrain)

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

In [21]:
AccuracyScore = model.score(xtest,ytest)
print (AccuracyScore)

0.8074191002367798
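For a classifier, `model.score` reports mean accuracy, which hides how the errors split between the two classes. A confusion matrix shows that split. The sketch below uses hand-written, hypothetical label lists (`y_true`, `y_pred`) so it runs on its own; in the notebook they would be `ytest` and `model.predict(xtest)`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only, not results from the dataset.
y_true = ["FAKE", "FAKE", "REAL", "REAL", "REAL"]
y_pred = ["FAKE", "REAL", "REAL", "REAL", "FAKE"]

labels = ["FAKE", "REAL"]
# Rows are true classes, columns are predicted classes, both in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)
acc = accuracy_score(y_true, y_pred)

print(cm)
print(acc)
```

The off-diagonal cells count fake headlines passed as real and real headlines flagged as fake, two mistakes that usually carry different costs.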


### **Model Test**
>To put the trained model to the test, I'll take a headline from a news article found on Google News and check whether the model labels it as real or fake.

In [23]:
headline_news = "Garda Operation targets gang behind HSE cyber attack"
data_test = CV.transform([headline_news]).toarray()

In [25]:
output = model.predict(data_test)
print(output)

['REAL']


To test the trained model on fake data, I'll write a made-up headline:

In [28]:
headline_news2 = "Joe Biden is now the Prime Minister of United Kingdom"
data_test2 = CV.transform([headline_news2]).toarray()

In [29]:
output2 = model.predict(data_test2)
print(output2)

['FAKE']


Both spot checks return the expected label, which suggests the trained model behaves sensibly, although two examples are no substitute for the test-set accuracy reported above. Now let's build an end-to-end application for news detection.
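One caveat with spot checks like these: `CountVectorizer.transform` silently ignores words it did not see during fitting. A headline made up entirely of unseen words becomes an all-zero vector, and Multinomial Naive Bayes then falls back to the class priors. The sketch below shows this on an invented four-headline training set (all names and headlines here are hypothetical).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented training data, balanced between the two classes.
train_titles = [
    "Aliens endorse candidate in secret meeting",
    "Miracle cure hidden by doctors",
    "Senate passes budget bill",
    "Central bank holds interest rates",
]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]

toy_cv = CountVectorizer()
toy_model = MultinomialNB().fit(toy_cv.fit_transform(train_titles), train_labels)

# None of these tokens are in the fitted vocabulary, so the vector is all zeros.
unseen = toy_cv.transform(["zzz qqq xxx"])
proba = toy_model.predict_proba(unseen)

print(unseen.nnz)           # number of non-zero features
print(toy_model.classes_)   # column order of `proba`
print(proba)
```

Inspecting `predict_proba` alongside `predict` is a cheap way to notice when a confident-looking label is really just the prior.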

## **End-to-End Detection Application**

To develop an end-to-end application for the machine learning model to detect bogus news in real-time, I'll be using Python's streamlit module.

In [31]:
pip install streamlit

Collecting streamlit
  Downloading streamlit-0.88.0-py2.py3-none-any.whl (8.0 MB)
Collecting base58
  Downloading base58-2.1.0-py3-none-any.whl (5.6 kB)
Collecting pydeck>=0.1.dev5
  Downloading pydeck-0.7.0-py2.py3-none-any.whl (4.3 MB)
Collecting blinker
  Downloading blinker-1.4.tar.gz (111 kB)
Collecting gitpython!=3.1.19
  Downloading GitPython-3.1.18-py3-none-any.whl (170 kB)
Collecting validators
  Downloading validators-0.18.2-py3-none-any.whl (19 kB)
Collecting watchdog
  Downloading watchdog-2.1.5-py3-none-manylinux2014_x86_64.whl (75 kB)
Collecting gitdb<5,>=4.0.1
  Downloading gitdb-4.0.7-py3-none-any.whl (63 kB)

In [7]:
!pip install pyngrok

Collecting pyngrok
  Downloading pyngrok-5.1.0.tar.gz (745 kB)

In [36]:
%%writefile FakeNewsDetectionApp.py
import streamlit as st

Overwriting FakeNewsDetectionApp.py


In [45]:
st.title("Fake News Detection App")

DeltaGenerator(_root_container=0, _provided_cursor=None, _parent=None, _block_type=None, _form_data=None)

In [46]:
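def FakeNewsDetectionApp():
  # Ask the user for a headline; an empty box renders nothing.
  UserPrompt = st.text_area("Enter any News Headline:")
  if len(UserPrompt) < 1:
    st.write("")
  else:
    # Vectorize with the CountVectorizer fitted above, then classify.
    data = CV.transform([UserPrompt]).toarray()
    output = model.predict(data)
    st.title(output[0])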

In [47]:
FakeNewsDetectionApp()
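Note that `%%writefile` saves only the cell it heads, so as written the file contains just the import line, and `streamlit run` starts a separate process that cannot see `CV` or `model` from the notebook. Below is a minimal sketch of a self-contained app file, written with `pathlib` as a plain-Python stand-in for `%%writefile`. It assumes the CSV sits next to the script and, unlike the notebook, refits on all headlines rather than the 80% training split. The names `APP_SOURCE` and `app_path` are my own.

```python
from pathlib import Path

# Full source of the app, so the file no longer depends on notebook state.
APP_SOURCE = '''\
import pandas as pd
import streamlit as st
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Assumption: the CSV sits next to this script; adjust the path as needed.
news_data = pd.read_csv("fake_or_real_news.csv")

# Refit the vectorizer and model inside the app process, because objects
# created in notebook cells are not visible to the Streamlit process.
cv = CountVectorizer()
features = cv.fit_transform(news_data["title"])
model = MultinomialNB().fit(features, news_data["label"])

st.title("Fake News Detection App")
headline = st.text_area("Enter any News Headline:")
if headline.strip():
    prediction = model.predict(cv.transform([headline]))
    st.title(prediction[0])
'''

app_path = Path("FakeNewsDetectionApp.py")
app_path.write_text(APP_SOURCE, encoding="utf-8")

# Syntax check only; this does not import Streamlit or run the app.
compile(APP_SOURCE, str(app_path), "exec")
```

Streamlit reruns the script on every interaction, so this version retrains each time. That is quick for a few thousand headlines, but caching or loading a saved model would be the natural next step; I have left that out because the caching API differs between Streamlit versions.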

To check that the file was written to the current Colab sandbox, list the directory contents with the list command, !ls:

In [48]:
!ls

FakeNewsDetectionApp.py  fake_or_real_news.csv	sample_data


In [49]:
!ngrok authtoken <YOUR_NGROK_AUTHTOKEN>

Authtoken saved to configuration file: /root/.ngrok2/ngrok.yml


Next, start Streamlit as a background process. The trailing & runs the command in the background, &>/dev/null discards its output, and --server.port 80 tells Streamlit to start its server on port 80:

In [50]:
!streamlit run --server.port 80 FakeNewsDetectionApp.py &>/dev/null&
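Because the command above discards all output, a failed start is silent. One way to confirm that something is actually listening before opening a tunnel is a plain TCP connection attempt. This is a sketch using only the standard library; `is_port_open` is a hypothetical helper, not part of Streamlit or pyngrok.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Port 80 matches the --server.port value used above.
print(is_port_open("127.0.0.1", 80))
```

The server may need a few seconds to come up, so a `False` right after launch is not necessarily an error; retrying after a short pause is reasonable.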

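To create the tunnel, we use pyngrok and pass in the port Streamlit is listening on (i.e., 80).

In [51]:
from pyngrok import ngrok

# Set up a tunnel to the Streamlit port (80, matching --server.port above).
# The port is passed positionally so the call does not depend on whether the
# installed pyngrok names this parameter `port` (older releases) or `addr` (5.x).
public_url = ngrok.connect(80)
public_url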

2021-09-05 18:35:44.333 Opening tunnel named: http-80-05b44311-fd4e-4ee9-8889-94d16bbb22ed
2021-09-05 18:35:44.363 t=2021-09-05T18:35:44+0000 lvl=info msg="no configuration paths supplied"
2021-09-05 18:35:44.367 t=2021-09-05T18:35:44+0000 lvl=info msg="using configuration at default config path" path=/root/.ngrok2/ngrok.yml
2021-09-05 18:35:44.369 t=2021-09-05T18:35:44+0000 lvl=info msg="open config file" path=/root/.ngrok2/ngrok.yml err=nil
2021-09-05 18:35:44.377 t=2021-09-05T18:35:44+0000 lvl=info msg="starting web service" obj=web addr=127.0.0.1:4040
2021-09-05 18:35:44.686 t=2021-09-05T18:35:44+0000 lvl=info msg="tunnel session started" obj=tunnels.session
2021-09-05 18:35:44.689 t=2021-09-05T18:35:44+0000 lvl=info msg="client session established" obj=csess id=7b241cc1e22b
2021-09-05 18:35:44.695 t=2021-09-05T18:35:44+0000 lvl=info msg=start pg=/api/tunnels id=9b078441f27f0700
2021-09-05 18:35:44.698 t=2021-09-05T18:35:44+0000 lvl=info msg=end pg=/api/tunnels id=9b078441f27f0700 

<NgrokTunnel: "http://3e0b-34-91-118-134.ngrok.io" -> "http://localhost:80">

In [43]:
!pgrep streamlit

1063
1145
1189
1248
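Four PIDs here suggest that earlier launches were left running, since a shell `&` gives the notebook no handle to stop them. A possible alternative is to start the server with `subprocess.Popen` and keep the returned object. The sketch below uses a stand-in sleep command so it runs anywhere; `start_background` and `stop_background` are hypothetical helper names.

```python
import subprocess
import sys


def start_background(cmd):
    """Start cmd without blocking, discarding its output like &>/dev/null."""
    return subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def stop_background(proc, timeout=5.0):
    """Ask the process to exit, and force-kill it if it does not in time."""
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


# Stand-in command for illustration. In the notebook this would be
# ["streamlit", "run", "--server.port", "80", "FakeNewsDetectionApp.py"].
proc = start_background([sys.executable, "-c", "import time; time.sleep(60)"])
still_running = proc.poll() is None  # None means the process has not exited

exit_code = stop_background(proc)
print(still_running, exit_code)
```

Keeping the handle means a rerun of the cell can stop the previous server first instead of stacking a new one on top.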


Shut down ngrok from Python using the kill function:

In [44]:
ngrok.kill()

2021-09-05 18:34:46.064 Killing ngrok process: 1260
