<a href="https://colab.research.google.com/github/mohithxoxo/NLP/blob/master/nlp_with_Bidirectional.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf 
import tensorflow_datasets as tfds 
import os

In [None]:
real = pd.read_csv('https://raw.githubusercontent.com/KaiDMML/FakeNewsNet/master/dataset/politifact_real.csv')
fake = pd.read_csv('https://raw.githubusercontent.com/KaiDMML/FakeNewsNet/master/dataset/politifact_fake.csv')

# Pre-Processing and Cleaning

In [19]:
fake = fake.drop(columns=['id','news_url','tweet_ids'])
real = real.drop(columns=['id','news_url','tweet_ids'])
fake['label'] = 1  # 1 = fake news
real['label'] = 0  # 0 = real news
dataset = pd.concat([real, fake], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
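The labelling-and-stacking step can be tried on toy data first. This sketch uses `pd.concat`, which works on pandas 2.x, where `DataFrame.append` no longer exists. The two-source frames and their titles are made up for illustration, and the `toy_` names are hypothetical so they do not overwrite the notebook's `real`, `fake` and `dataset`.

```python
import pandas as pd

# Hypothetical toy frames standing in for the PolitiFact CSVs (only a "title" column)
toy_real = pd.DataFrame({"title": ["Senate passes budget", "Storm hits coast"]})
toy_fake = pd.DataFrame({"title": ["Moon declared to be made of cheese"]})

# Same label convention as the notebook: 1 = fake, 0 = real
toy_fake["label"] = 1
toy_real["label"] = 0

# Stack real rows first, then fake rows, and renumber the index from 0
toy_dataset = pd.concat([toy_real, toy_fake], ignore_index=True)
print(toy_dataset)
print(toy_dataset["label"].value_counts())
```

Note that the combined frame is ordered by class (all real rows, then all fake rows), exactly like the notebook's `dataset`.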

# Encoding the Corpus

In [21]:
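vocab_size = 10000  # target vocabulary size; the built encoder's vocab_size is only approximately this
# SubwordTextEncoder moved from tfds.features.text to tfds.deprecated.text in tensorflow_datasets 4.0
encoder = tfds.deprecated.text.SubwordTextEncoder.build_from_corpus(dataset["title"], vocab_size)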

Here, we define a function that encodes every title in the DataFrame into a list of subword ids and then zero-pads the lists at the end ("**post**" padding) to the length of the longest title, using the [**tf.keras.preprocessing.sequence.pad_sequences()**](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences?hl=en) function.

In [23]:
def enc(dataframe):
    # Encode each title into a list of subword ids
    tokenized = [encoder.encode(sentence) for sentence in dataframe["title"].values]
    # Zero-pad at the end so every row has the length of the longest title
    out = tf.keras.preprocessing.sequence.pad_sequences(tokenized, padding="post")
    return out

x = enc(dataset)
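To make the "post" padding concrete, here is a NumPy-only sketch of what `pad_sequences(..., padding="post")` does with its default `maxlen=None` (pad to the longest sequence, no truncation). `post_pad` is a hypothetical helper for illustration, not part of Keras, and it ignores the other `pad_sequences` options.

```python
import numpy as np

def post_pad(sequences, value=0):
    """Zero-pad lists of token ids at the end to the length of the longest one.

    A simplified stand-in for pad_sequences(..., padding="post") with
    maxlen=None, so nothing is truncated.
    """
    maxlen = max(len(seq) for seq in sequences)
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        out[i, :len(seq)] = seq  # ids go to the front; the tail keeps the pad value
    return out

print(post_pad([[111, 3752, 4], [699, 535]]))
# [[ 111 3752    4]
#  [ 699  535    0]]
```

This is why every row of `x` ends in a run of zeros: shorter titles are filled out to the longest title's length.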

In [24]:
x

array([[ 111, 3752,    4, ...,    0,    0,    0],
       [ 496,    9, 3757, ...,    0,    0,    0],
       [ 158,  771, 1628, ...,    0,    0,    0],
       ...,
       [3877, 4236, 3876, ...,    0,    0,    0],
       [ 181,  223,  303, ...,    0,    0,    0],
       [ 699,  535,    0, ...,    0,    0,    0]], dtype=int32)

In [25]:
y = dataset["label"]
print(y)

0       0
1       0
2       0
3       0
4       0
       ..
1051    1
1052    1
1053    1
1054    1
1055    1
Name: label, Length: 1056, dtype: int64


In [26]:
# Model Definition
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),  # Embedding layer sized to the encoder's vocabulary
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),  # First Bidirectional layer: 64 LSTM units per direction
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),  # Second Bidirectional layer: 32 LSTM units per direction
    tf.keras.layers.Dense(64, activation='relu'),  # Dense layer with 64 units
    tf.keras.layers.Dropout(0.5),  # 50% dropout
    tf.keras.layers.Dense(1)  # Single output unit with no activation, i.e. a raw logit
])

# The output is a logit, so the loss applies the sigmoid itself and accuracy thresholds at 0
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.BinaryAccuracy(name='acc', threshold=0.0)])  # Compiling the Model
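The final `Dense(1)` layer has no activation, so the model outputs a raw logit z rather than a probability; a loss configured for logits (`from_logits=True` in Keras) applies the sigmoid internally. The NumPy sketch below illustrates two consequences: sigmoid(z) > 0.5 exactly when z > 0, which is why a prediction can be thresholded at 0, and the numerically stable binary cross-entropy computed directly on logits. The function names are illustrative, not Keras APIs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_logits(y_true, logits):
    """Mean binary cross-entropy on logits, in the stable form
    max(z, 0) - z*y + log(1 + exp(-|z|))."""
    z = np.asarray(logits, dtype=np.float64)
    y = np.asarray(y_true, dtype=np.float64)
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))

logits = np.array([-2.0, 0.0, 3.0])
print(sigmoid(logits))                   # probabilities of the "fake" class (label 1)
print((logits > 0).astype(int))          # same labels as thresholding probabilities at 0.5
print(bce_from_logits([0, 1, 1], logits))
```

Computing the loss on logits avoids taking `log` of probabilities that have been rounded to exactly 0 or 1.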

# Training the Model

In [28]:
history = model.fit(x,y, epochs = 20)
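The training call above uses every example and reports training accuracy only. Because `dataset` is ordered by class (all real titles, then all fake ones), note that Keras's `validation_split` takes the last fraction of the arrays before shuffling, so here it would hold out only fake news. If you want a held-out set, a shuffled split is safer. The sketch below assumes NumPy inputs; `holdout_split`, the 20% fraction and the seed are illustrative choices, not part of the original notebook.

```python
import numpy as np

def holdout_split(x, y, frac=0.2, seed=42):
    """Shuffle x and y together and return (x_train, y_train, x_val, y_val)."""
    x = np.asarray(x)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))   # one shared permutation keeps (x, y) pairs aligned
    n_val = int(len(x) * frac)
    val, train = idx[:n_val], idx[n_val:]
    return x[train], y[train], x[val], y[val]

# Toy data ordered like the notebook's: all 0 labels first, then all 1 labels
x_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0] * 5 + [1] * 5)
x_tr, y_tr, x_va, y_va = holdout_split(x_toy, y_toy)
print(y_va)
```

With the notebook's arrays this could be used as `x_tr, y_tr, x_va, y_va = holdout_split(x, y)` followed by `model.fit(x_tr, y_tr, validation_data=(x_va, y_va), epochs=20)`.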

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


# Predicting with the Model

Here, we write two helper functions for prediction: `pad_to_size`, which zero-pads an encoded vector to a fixed length, and `sample_predict`, which encodes a string, optionally pads it, and runs it through the model.

In [44]:
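def pad_to_size(vec, size):
  # Append zeros until vec has length `size` (no-op if it is already that long or longer)
  zeros = [0] * (size - len(vec))
  vec.extend(zeros)
  return vec

def sample_predict(sample_pred_text, pad):
  # Encode the text with the same subword encoder used for training
  encoded_sample_pred_text = encoder.encode(sample_pred_text)

  if pad:
    encoded_sample_pred_text = pad_to_size(encoded_sample_pred_text, 64)
  # Token ids are integers, so keep an integer dtype for the Embedding layer
  encoded_sample_pred_text = tf.cast(encoded_sample_pred_text, tf.int32)
  # Add a batch dimension: shape (1, sequence_length)
  predictions = model.predict(tf.expand_dims(encoded_sample_pred_text, 0))

  return predictions

sample_pred_text = 'ELECTORAL COLLEGE ELECTOR COMMITS SUICIDE TO AVOID VOTING FOR DONALD TRUMP'  # fake news
predictions = sample_predict(sample_pred_text, pad=False)
# The model outputs a logit, so 0 is the decision boundary (sigmoid(0) = 0.5)
if predictions[0][0] > 0:
  print('Fake News')
else:
  print('Real News')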

Fake News
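`pad_to_size` above extends the caller's list in place and leaves it unchanged when the text is already longer than `size`. A variant that returns a new list and also truncates long inputs (keeping the front, consistent with "post" padding) might look like the sketch below; `pad_or_truncate` is a hypothetical helper, not something the notebook defines.

```python
def pad_or_truncate(vec, size, pad_value=0):
    """Return a new list of exactly `size` ids: cut long inputs, pad short ones."""
    clipped = list(vec[:size])  # copy, so the caller's list is left untouched
    return clipped + [pad_value] * (size - len(clipped))

ids = [181, 223, 303]
print(pad_or_truncate(ids, 5))  # [181, 223, 303, 0, 0]
print(pad_or_truncate(ids, 2))  # [181, 223]
print(ids)                      # [181, 223, 303]
```

Returning a fresh list avoids surprises when the same encoded vector is reused for several predictions.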


# Download the Model Weights for Yourself

In [None]:
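model.save('my_model.h5')  # saves architecture, weights and optimizer state in HDF5 format
from IPython.display import FileLink
FileLink(r'my_model.h5')  # renders a clickable download link in the notebook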