# Fake News Detection of Live Media using Speech to Text conversion

Team Members:
<ul>
<li>Yashashwini Dixit 19BCE1239</li>
<li>Kaarthik E. 19BAI1096</li>
<li>Aiswarya Sanjay 19BPS1049</li>
</ul>


### Importing required libraries
Here we import the core libraries; any additional libraries are installed later, where they are needed.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
import re
import string

### Loading the fake and real datasets

In [2]:
df_fake = pd.read_csv("C:\\Users\\dixit\\Documents\\Research_Work\\Fake.csv")
df_true = pd.read_csv("C:\\Users\\dixit\\Documents\\Research_Work\\True.csv")

In [3]:
df_fake.head(5)

Unnamed: 0,title,text,subject,date
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017"
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017"
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017"
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017"
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017"


In [4]:
df_true.head(5)

Unnamed: 0,title,text,subject,date
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017"
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017"
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017"
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017"
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017"


Adding a "class" column to the fake and real news datasets, so that fake news is labelled 0 and real news is labelled 1.

In [5]:
df_fake["class"] = 0
df_true["class"] = 1

In [6]:
df_fake.shape, df_true.shape

((23481, 5), (21417, 5))

Merging the fake and true dataframes into one

In [7]:
df_marge = pd.concat([df_fake, df_true], axis =0 )
df_marge.head(10)

Unnamed: 0,title,text,subject,date,class
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017",0
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017",0
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017",0
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017",0
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017",0
5,Racist Alabama Cops Brutalize Black Boy While...,The number of cases of cops brutalizing and ki...,News,"December 25, 2017",0
6,"Fresh Off The Golf Course, Trump Lashes Out A...",Donald Trump spent a good portion of his day a...,News,"December 23, 2017",0
7,Trump Said Some INSANELY Racist Stuff Inside ...,In the wake of yet another court decision that...,News,"December 23, 2017",0
8,Former CIA Director Slams Trump Over UN Bully...,Many people have raised the alarm regarding th...,News,"December 22, 2017",0
9,WATCH: Brand-New Pro-Trump Ad Features So Muc...,Just when you might have thought we d get a br...,News,"December 21, 2017",0


In [8]:
df_marge.columns

Index(['title', 'text', 'subject', 'date', 'class'], dtype='object')

#### The "title", "subject" and "date" columns are not required for detecting fake news, so we drop them.

In [9]:
df = df_marge.drop(["title", "subject","date"], axis = 1)

In [10]:
df.isnull().sum()

text     0
class    0
dtype: int64

#### Randomly shuffling the dataframe 

In [11]:
df = df.sample(frac = 1)

In [12]:
df.head()

Unnamed: 0,text,class
5151,WASHINGTON (Reuters) - With his administration...,1
20570,BERLIN (Reuters) - German officials found list...,1
18997,BRUSSELS (Reuters) - European Commission Presi...,1
2966,As the majority of Americans are completely de...,0
15348,It s time to stop hitting the snooze button Am...,0


In [13]:
df.reset_index(inplace = True)
df.drop(["index"], axis = 1, inplace = True)
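The shuffle and index reset above can also be written in a reproducible form. The following is a minimal sketch on a hypothetical toy dataframe (`toy` and the seed value 42 are assumptions for illustration, not part of the original run): passing `random_state` makes the shuffle repeatable, and `reset_index(drop=True)` discards the old index in one step instead of creating and then dropping an "index" column.

```python
import pandas as pd

# Hypothetical toy data standing in for the merged news dataframe
toy = pd.DataFrame({"text": ["a", "b", "c", "d"], "class": [0, 0, 1, 1]})

# Shuffle all rows with a fixed seed, then replace the old index with 0..n-1
shuffled = toy.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```

With a fixed seed, rerunning the notebook gives the same row order, which makes the reported scores easier to compare between runs.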

In [14]:
df.columns

Index(['text', 'class'], dtype='object')

In [15]:
df.head()

Unnamed: 0,text,class
0,WASHINGTON (Reuters) - With his administration...,1
1,BERLIN (Reuters) - German officials found list...,1
2,BRUSSELS (Reuters) - European Commission Presi...,1
3,As the majority of Americans are completely de...,0
4,It s time to stop hitting the snooze button Am...,0


#### Creating a function to convert the text to lowercase and remove bracketed text, URLs, HTML tags, special characters and words containing digits.

In [16]:
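def wordopt(text):
    # lowercase everything so that matching is case-insensitive
    text = text.lower()
    # remove bracketed fragments such as "[video]"
    text = re.sub(r'\[.*?\]', '', text)
    # remove URLs and HTML tags first; once punctuation has been
    # replaced with spaces these patterns can no longer match
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    # replace every remaining non-word character (including newlines) with a space
    text = re.sub(r'\W', ' ', text)
    # remove leftover punctuation (only "_" survives the previous step)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    # remove words that contain digits
    text = re.sub(r'\w*\d\w*', '', text)
    return text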

In [17]:
df["text"] = df["text"].apply(wordopt)
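In a cleaning function such as `wordopt`, the order of the substitutions matters: once every non-word character has been replaced with a space, a URL pattern has nothing left to match. The sketch below compares the two orders on a hypothetical sample sentence; the helper names `clean_nonword_first` and `clean_url_first` are assumptions made for this illustration only.

```python
import re

URL_PATTERN = r"https?://\S+|www\.\S+"

def clean_nonword_first(text):
    # Non-word characters are replaced before the URL pattern is applied
    text = re.sub(r"\W", " ", text.lower())
    return re.sub(URL_PATTERN, "", text)

def clean_url_first(text):
    # The URL is removed while "://" and "." are still present
    text = re.sub(URL_PATTERN, "", text.lower())
    return re.sub(r"\W", " ", text)

sample = "Read more at https://example.com/story now"
nonword_first = clean_nonword_first(sample)
url_first = clean_url_first(sample)
print(nonword_first.split())
print(url_first.split())
```

In the first variant the pieces of the URL stay in the text as ordinary words, while in the second variant the URL is removed entirely.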

#### Defining the independent and dependent variables as x and y respectively

In [18]:
x = df["text"]
y = df["class"]

#### Splitting the dataset into training set and testing set. 

In [19]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
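The split above is random on every run. A possible variant, shown here as a sketch on hypothetical toy data, fixes the seed with `random_state` and passes `stratify` so that both classes keep the same proportion in the training and testing sets. The names `texts`, `labels` and the seed value are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 8 "fake" (0) and 8 "real" (1) articles
texts = [f"article {i}" for i in range(16)]
labels = [0] * 8 + [1] * 8

# Same 75/25 split as in the notebook, but seeded and stratified by class
x_tr, x_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)
print(len(x_tr), len(x_te), sum(y_te))
```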

#### Convert text to vectors

In [20]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [21]:
vectorization = TfidfVectorizer()
xv_train = vectorization.fit_transform(x_train)
xv_test = vectorization.transform(x_test)
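To give an intuition for what the vectorizer computes, here is a simplified pure-Python sketch of TF-IDF weighting. It assumes the smoothed IDF formula and L2 normalisation that, to our understanding, are the scikit-learn defaults; tokenisation is plain whitespace splitting, which differs from the library, and `tfidf_vectors` and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (illustrative only)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: counts[t] * idf[t] for t in counts}
        # L2-normalise so that every document vector has unit length
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

docs_demo = ["trump wins vote", "senate vote delayed"]
vectors_demo = tfidf_vectors(docs_demo)
print(vectors_demo[0])
```

A word that appears in every document, such as "vote" here, receives a lower weight than a word that appears in only one document.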

### 1. Logistic Regression

In [22]:
from sklearn.linear_model import LogisticRegression

In [23]:
LR = LogisticRegression()
LR.fit(xv_train,y_train)

LogisticRegression()

In [24]:
pred_lr=LR.predict(xv_test)

In [25]:
LR.score(xv_test, y_test)

0.9874387527839643

In [26]:
print(classification_report(y_test, pred_lr))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      5829
           1       0.99      0.99      0.99      5396

    accuracy                           0.99     11225
   macro avg       0.99      0.99      0.99     11225
weighted avg       0.99      0.99      0.99     11225
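The precision and recall columns of such a report follow directly from counts of correct and incorrect predictions. The sketch below recomputes them for a small set of hypothetical labels; `precision_recall`, `y_true_demo` and `y_pred_demo` are illustrative names and do not come from the notebook.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels: 0 = fake news, 1 = real news
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 1, 1, 0, 1, 0]

real_scores = precision_recall(y_true_demo, y_pred_demo, positive=1)
fake_scores = precision_recall(y_true_demo, y_pred_demo, positive=0)
print(real_scores, fake_scores)
```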



### 2. Decision Tree Classification

In [27]:
from sklearn.tree import DecisionTreeClassifier

In [28]:
DT = DecisionTreeClassifier()
DT.fit(xv_train, y_train)

DecisionTreeClassifier()

In [29]:
pred_dt = DT.predict(xv_test)

In [30]:
DT.score(xv_test, y_test)

0.996347438752784

In [31]:
print(classification_report(y_test, pred_dt))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00      5829
           1       1.00      1.00      1.00      5396

    accuracy                           1.00     11225
   macro avg       1.00      1.00      1.00     11225
weighted avg       1.00      1.00      1.00     11225



### 3. Gradient Boosting Classifier

In [32]:
from sklearn.ensemble import GradientBoostingClassifier

In [33]:
GBC = GradientBoostingClassifier(random_state=0)
GBC.fit(xv_train, y_train)

GradientBoostingClassifier(random_state=0)

In [35]:
pred_gbc = GBC.predict(xv_test)

In [36]:
GBC.score(xv_test, y_test)

0.9959910913140312

In [37]:
print(classification_report(y_test, pred_gbc))

              precision    recall  f1-score   support

           0       1.00      0.99      1.00      5829
           1       0.99      1.00      1.00      5396

    accuracy                           1.00     11225
   macro avg       1.00      1.00      1.00     11225
weighted avg       1.00      1.00      1.00     11225



### 4. Random Forest Classifier

In [38]:
from sklearn.ensemble import RandomForestClassifier

In [39]:
RFC = RandomForestClassifier(random_state=0)
RFC.fit(xv_train, y_train)

RandomForestClassifier(random_state=0)

In [40]:
pred_rfc = RFC.predict(xv_test)

In [41]:
RFC.score(xv_test, y_test)

0.9902895322939866

In [42]:
print(classification_report(y_test, pred_rfc))

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      5829
           1       0.99      0.99      0.99      5396

    accuracy                           0.99     11225
   macro avg       0.99      0.99      0.99     11225
weighted avg       0.99      0.99      0.99     11225



# Model Testing

### News

In [43]:
def output_label(n):
    # map the numeric class to a readable label
    if n == 0:
        return "Fake News"
    elif n == 1:
        return "Real News"

def manual_testing(news):
    # clean the input with the same preprocessing that was used for training
    testing_news = {"text": [news]}
    new_def_test = pd.DataFrame(testing_news)
    new_def_test["text"] = new_def_test["text"].apply(wordopt)
    new_x_test = new_def_test["text"]
    # reuse the vectorizer that was fitted on the training set
    new_xv_test = vectorization.transform(new_x_test)
    pred_LR = LR.predict(new_xv_test)
    pred_DT = DT.predict(new_xv_test)
    pred_GBC = GBC.predict(new_xv_test)
    pred_RFC = RFC.predict(new_xv_test)

    # print the label predicted by each of the four models
    print("\n\nLR Prediction: {} \nDT Prediction: {} \nGBC Prediction: {} \nRFC Prediction: {}".format(
        output_label(pred_LR[0]),
        output_label(pred_DT[0]),
        output_label(pred_GBC[0]),
        output_label(pred_RFC[0])))

In [45]:
pip install SpeechRecognition

Note: you may need to restart the kernel to use updated packages.


In [46]:
pip install pydub

Note: you may need to restart the kernel to use updated packages.


In [48]:
pip install ffmpeg

Note: you may need to restart the kernel to use updated packages.


In [49]:
# importing libraries 
import speech_recognition as sr 
import os 
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

# a function that splits the audio file into chunks
# and applies speech recognition
def get_large_audio_transcription(path):
    """
    Splitting the large audio file into chunks
    and apply speech recognition on each of these chunks
    """
    # open the audio file using pydub
    sound = AudioSegment.from_wav(path)  
    # split the audio wherever silence lasts 500 milliseconds or more and get chunks
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len = 500,
        # adjust this per requirement
        silence_thresh = sound.dBFS-14,
        # keep 500 milliseconds of silence around each chunk, adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk 
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
            # try converting it to text
            try:
                text = r.recognize_google(audio_listened)
            except sr.UnknownValueError as e:
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text
    # return the text for all chunks detected
    return whole_text
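Since the function above reads its input with `AudioSegment.from_wav`, it can help to confirm that a recording really is a readable WAV file before transcribing it. The sketch below uses only the standard-library `wave` module; `wav_info` is a hypothetical helper, and the one-second silent file is generated purely for demonstration.

```python
import os
import tempfile
import wave

def wav_info(path):
    """Return basic properties of a WAV file (raises wave.Error if it is not one)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }

# Demonstration: write one second of 16-bit mono silence, then inspect it
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 16000)
    info = wav_info(demo_path)

print(info)
```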

In [63]:
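n = 2
for i in range(n):
    # the input recordings are expected to be named D0.wav, D1.wav, ...
    path = "D" + str(i) + ".wav"
    transcript = get_large_audio_transcription(path)
    print("\nFull News:", transcript)
    # save each transcript so that it can be classified in the next cell
    with open("out" + str(i) + ".txt", "w") as text_file:
        text_file.write(transcript)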

Error: 
audio-chunks\chunk2.wav : West palm beach florida washington reuters the white house set on friday it was set to kick off torch next week republican and democratic congressional leaders and immigration policy government spending and other issues that need to be wrapped up in the new year. 
audio-chunks\chunk3.wav : Expected salary of legislative activity comes as republicans and democrats begins at the stage for midterm congressional elections in november. 
audio-chunks\chunk4.wav : President trump to your trademark sign as republican party is equal to maintain control of congress democratic look for opening to receive the way in the senate and the house of representatives. 
audio-chunks\chunk5.wav : Invent a proper your trademark finance budget cheap mein mundeya ne and legislative the famed director mart store near me with senate majority leader mitch mcconnell and house speaker paul ryan both republican and democratic candidates and representative names in the white house sa

In [64]:
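for i in range(n):
    # use a separate name for the file so that the loop bound `n` is not overwritten
    filename = "out" + str(i) + ".txt"
    with open(filename, "r") as myfile:
        news = myfile.read()
    print("News", (i+1), ": ")
    manual_testing(news)
    print("\n")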

News 1 : 


LR Prediction: Real News 
DT Prediction: Real News 
GBC Prediction: Real News 
RFC Prediction: Real News


News 2 : 


LR Prediction: Fake News 
DT Prediction: Fake News 
GBC Prediction: Fake News 
RFC Prediction: Fake News




In [58]:
conda install pyaudio

In [59]:
import speech_recognition as sr

# Get audio from the microphone
recognizer = sr.Recognizer()

n = int(input("Input the number of news you want to input:"))
for i in range(n):
    print("\nNews", i+1)
    with sr.Microphone() as source:
        # keep listening until a recording is transcribed successfully
        while True:
            audio = recognizer.listen(source)
            try:
                txt1 = recognizer.recognize_google(audio)
            except sr.UnknownValueError:
                print("Could not understand audio")
                continue
            print("You said:", txt1)
            # save the transcript so that it can be classified in the next cell
            with open("out" + str(i) + ".txt", "w") as text_file:
                text_file.write(txt1)
            break

Input the number of news you want to input:2

News 1
Could not understand audio
You said: hello

News 2
Could not understand audio
Could not understand audio
You said: say no to dairy Government Girls University will be asked to seek and undertaking from students that they will not be party to climb of giving are accepting delivery Governor Arif Mohammad Khan assets student will be required to give the undertaking before accepting the decrease and diplomas become governor who is also the chancellor of University said while participating is in past organised by Daniel Organisation on Wednesday against the practice of delivery mistake in said it be writing to Vice Chancellor seeking the introduction of this matter


In [60]:
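for i in range(n):
    # use a separate name for the file so that the loop bound `n` is not overwritten
    filename = "out" + str(i) + ".txt"
    with open(filename, "r") as myfile:
        news = myfile.read()
    print("News", (i+1), ": ")
    manual_testing(news)
    print("\n")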

News 1 : 


LR Prediction: Fake News 
DT Prediction: Fake News 
GBC Prediction: Fake News 
RFC Prediction: Fake News


News 2 : 


LR Prediction: Real News 
DT Prediction: Fake News 
GBC Prediction: Fake News 
RFC Prediction: Real News
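When the four models disagree, as they do for News 2 above, one simple way to reach a single verdict is a majority vote. This is only a sketch of one possible rule, not something the notebook itself does; `majority_label` is a hypothetical helper, and ties are reported as "Undecided" by assumption.

```python
from collections import Counter

def majority_label(predictions):
    """Return the most common label, or "Undecided" when the top two tie."""
    ranked = Counter(predictions.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "Undecided"
    return ranked[0][0]

# Labels copied from the News 2 output above
votes_news_2 = {"LR": "Real News", "DT": "Fake News",
                "GBC": "Fake News", "RFC": "Real News"}
# A hypothetical three-to-one split for comparison
votes_example = {"LR": "Fake News", "DT": "Fake News",
                 "GBC": "Fake News", "RFC": "Real News"}

verdict_news_2 = majority_label(votes_news_2)
verdict_example = majority_label(votes_example)
print(verdict_news_2, "|", verdict_example)
```

With an even number of models a tie is always possible, so an odd number of voters or a confidence-weighted vote may be preferable in practice.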


