# IBMZ DATATHON 2021

In this project, we look at 'true.csv' and 'fake.csv', a dataset of real and fake news articles. Each row holds an article's title, text, subject, and date. From the dataset, we train neural network classifiers on the article titles and texts to predict whether a news article is real or fake.


In [2]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [3]:
import tensorflow as tf

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [5]:
true_dataset=pd.read_csv('datasets/true.csv')
fake_dataset=pd.read_csv('datasets/fake.csv')

In [6]:
true_dataset.head()

Unnamed: 0,title,text,subject,date
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017"
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017"
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017"
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017"
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017"


In [7]:
true_dataset.iloc[5]['title']

'White House, Congress prepare for talks on spending, immigration'

In [8]:
true_dataset.tail()

Unnamed: 0,title,text,subject,date
21412,'Fully committed' NATO backs new U.S. approach...,BRUSSELS (Reuters) - NATO allies on Tuesday we...,worldnews,"August 22, 2017"
21413,LexisNexis withdrew two products from Chinese ...,"LONDON (Reuters) - LexisNexis, a provider of l...",worldnews,"August 22, 2017"
21414,Minsk cultural hub becomes haven from authorities,MINSK (Reuters) - In the shadow of disused Sov...,worldnews,"August 22, 2017"
21415,Vatican upbeat on possibility of Pope Francis ...,MOSCOW (Reuters) - Vatican Secretary of State ...,worldnews,"August 22, 2017"
21416,Indonesia to buy $1.14 billion worth of Russia...,JAKARTA (Reuters) - Indonesia will buy 11 Sukh...,worldnews,"August 22, 2017"


In [9]:
fake_dataset.head()

Unnamed: 0,title,text,subject,date
0,Donald Trump Sends Out Embarrassing New Year’...,Donald Trump just couldn t wish all Americans ...,News,"December 31, 2017"
1,Drunk Bragging Trump Staffer Started Russian ...,House Intelligence Committee Chairman Devin Nu...,News,"December 31, 2017"
2,Sheriff David Clarke Becomes An Internet Joke...,"On Friday, it was revealed that former Milwauk...",News,"December 30, 2017"
3,Trump Is So Obsessed He Even Has Obama’s Name...,"On Christmas day, Donald Trump announced that ...",News,"December 29, 2017"
4,Pope Francis Just Called Out Donald Trump Dur...,Pope Francis used his annual Christmas Day mes...,News,"December 25, 2017"


In [10]:
fake_dataset.iloc[0]['title']

' Donald Trump Sends Out Embarrassing New Year’s Eve Message; This is Disturbing'

In [11]:
print(f"Shape of true dataset is {true_dataset.shape}")
print(f"Shape of fake dataset is {fake_dataset.shape}")

Shape of true dataset is (21417, 4)
Shape of fake dataset is (23481, 4)


In [12]:
true_dataset["label"]=1
fake_dataset["label"]=0

In [13]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent.
full_dataset=pd.concat([true_dataset,fake_dataset])

In [14]:
full_dataset.iloc[0]

title      As U.S. budget fight looms, Republicans flip t...
text       WASHINGTON (Reuters) - The head of a conservat...
subject                                         politicsNews
date                                      December 31, 2017 
label                                                      1
Name: 0, dtype: object

In [15]:
full_dataset=full_dataset.sample(frac = 1).reset_index(drop=True)

In [16]:
full_dataset.head()

Unnamed: 0,title,text,subject,date,label
0,VIDEO: The Dallas Shooting Agenda,Daily Shooter 21st Century WireThe Dallas Snip...,US_News,"July 12, 2016",0
1,Joy Behar Delivers BRUTALLY Honest Message To...,Ever since Donald Trump lost the popular vote ...,News,"January 25, 2017",0
2,"More Republicans expect Clinton, rather than T...",NEW YORK (Reuters) - More Republicans now thin...,politicsNews,"October 26, 2016",1
3,Yemen war needs a political solution: U.S. def...,RIYADH (Reuters) - A political solution throug...,politicsNews,"April 17, 2017",1
4,JUST IN: TRUMP ENDS FREE MONEY TRAIN After Pak...,The Trump administration on Friday announced t...,politics,"Dec 30, 2017",0


In [17]:
vocab_size = 10000
embedding_dim = 16
max_length = 100
trunc_type='post'
padding_type='post'
oov_tok = "<OOV>"
training_size = int(len(full_dataset)*0.8)

In [18]:
titles=[]
texts=[]
target=[]

In [19]:
# Convert each column to a plain Python list.
titles=full_dataset['title'].tolist()
texts=full_dataset['text'].tolist()
target=full_dataset['label'].tolist()

In [20]:
titles

['VIDEO: The Dallas Shooting Agenda',
 ' Joy Behar Delivers BRUTALLY Honest Message To Trump That He’ll Have To Agree With (VIDEO)',
 'More Republicans expect Clinton, rather than Trump, to win U.S. election',
 'Yemen war needs a political solution: U.S. defense secretary',
 'JUST IN: TRUMP ENDS FREE MONEY TRAIN After Pakistan Refuses To Stop Housing Terrorists…Cuts STUNNING About Of Military Aid\xa0',
 'DESPERATION OR STUPIDITY? GERMAN State Recruits REFUGEES With No Passports For Police Officer Jobs',
 'Trump to receive first security briefing: sources',
 ' Hillary Clinton and Bernie Sanders Join Forces To Sue Arizona For Voter Suppression',
 'AWESOME! SEAN SPICER Gives Trump’s Salary Away At Press Briefing [Video]',
 ' Email ‘Scandal’ Unraveling: Explosive NBC Report Blows Up Major Anti-Hillary Talking Point',
 'Two more women accuse Senate candidate Moore of sexual misconduct',
 "Turkey's Vakifbank denies involvement\xa0in processes mentioned in U.S. lawsuit",
 'U.N. chief Guterres

In [21]:
titles[10]

'Two more women accuse Senate candidate Moore of sexual misconduct'

In [22]:
from sklearn.model_selection import train_test_split

In [23]:
titles_list=[]
texts_list=[]
target_list=[]

In [24]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent.
dataset=pd.concat([true_dataset,fake_dataset])
dataset=dataset.reset_index(drop=True)
# Convert each column to a plain Python list.
titles_list=dataset['title'].tolist()
texts_list=dataset['text'].tolist()
target_list=dataset['label'].tolist()

In [34]:
training_s,testing_s,training_l,testing_l=train_test_split(titles_list,target_list,test_size=0.4,random_state=40,stratify=target_list)

In [36]:
training_s=np.array(training_s)
testing_l=np.array(testing_l)
testing_s=np.array(testing_s)
training_l=np.array(training_l)
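
As a quick sanity check on the split, the sizes implied by `test_size=0.4` can be worked out by hand. The sketch below assumes scikit-learn rounds the test share up and that `fit` uses Keras's default batch size of 32 (the notebook does not set one). The 21417 and 23481 row counts come from the shapes printed earlier.

```python
import math

n_samples = 21417 + 23481            # true rows + fake rows
n_test = math.ceil(0.4 * n_samples)  # test share, rounded up
n_train = n_samples - n_test
batch_size = 32                      # assumed: Keras default for fit()
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_samples, n_train, n_test, steps_per_epoch)
```

The resulting 842 steps agree with the `842/842` progress counter in the training logs further down.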

In [37]:
tokenizer_l=Tokenizer(num_words=vocab_size,oov_token=oov_tok)
tokenizer_l.fit_on_texts(training_s)

In [82]:
training_sequences_l=tokenizer_l.texts_to_sequences(training_s)
training_padded_l=pad_sequences(training_sequences_l,maxlen=max_length,padding=padding_type,truncating=trunc_type)

In [84]:
testing_sequences_l=tokenizer_l.texts_to_sequences(testing_s)
testing_padded_l=pad_sequences(testing_sequences_l,maxlen=max_length,padding=padding_type,truncating=trunc_type)
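
To make the tokenize-and-pad step more concrete, here is a minimal pure-Python sketch of the same idea. It is not the Keras implementation: the helper names `fit_word_index` and `to_padded` are hypothetical, words are split on whitespace only, and no `num_words` cap or punctuation filtering is applied. It does mirror three settings used above: index 1 is reserved for the OOV token, and both padding and truncation are `'post'`.

```python
from collections import Counter

def fit_word_index(texts, oov_token="<OOV>"):
    """Assign ids by descending word frequency; id 1 is reserved for OOV."""
    counts = Counter(word for t in texts for word in t.lower().split())
    word_index = {oov_token: 1}
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = rank
    return word_index

def to_padded(texts, word_index, maxlen, oov_token="<OOV>"):
    """Turn texts into fixed-length id lists (0 is the padding id)."""
    oov_id = word_index[oov_token]
    rows = []
    for t in texts:
        ids = [word_index.get(w, oov_id) for w in t.lower().split()]
        ids = ids[:maxlen]                     # truncating='post'
        ids = ids + [0] * (maxlen - len(ids))  # padding='post'
        rows.append(ids)
    return rows

# Fit on "training" headlines only, then encode an unseen headline.
word_index = fit_word_index(["trump to receive first security briefing", "trump wins"])
padded = to_padded(["trump receives briefing today"], word_index, maxlen=6)
print(padded)  # [[2, 1, 7, 1, 0, 0]]
```

Words the tokenizer never saw during fitting ("receives", "today") map to the OOV id 1, which is why the tokenizer is fitted on the training split only and reused for the test split.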

In [85]:
temp_model=tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size,embedding_dim,input_length=max_length),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24,activation='relu'),
    tf.keras.layers.Dense(1,activation='sigmoid')
])

temp_model.compile(loss='binary_crossentropy',optimizer="adam",metrics=['accuracy'])

In [86]:
temp_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 100, 16)           160000    
_________________________________________________________________
global_average_pooling1d (Gl (None, 16)                0         
_________________________________________________________________
dense (Dense)                (None, 24)                408       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 25        
Total params: 160,433
Trainable params: 160,433
Non-trainable params: 0
_________________________________________________________________
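
The parameter counts in the summary can be reproduced by hand from the layer sizes. A short check, using only the values defined in the notebook (`vocab_size`, `embedding_dim`, and the Dense widths 24 and 1):

```python
vocab_size = 10000
embedding_dim = 16

embedding_params = vocab_size * embedding_dim  # one 16-d vector per token id
pooling_params = 0                             # averaging has no weights
dense_params = embedding_dim * 24 + 24         # weights + biases
output_params = 24 * 1 + 1                     # weights + bias

total = embedding_params + pooling_params + dense_params + output_params
print(embedding_params, dense_params, output_params, total)
```

Almost all of the weights sit in the embedding table, so `vocab_size` and `embedding_dim` dominate the model size.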


In [89]:
num_epochs = 30
history = temp_model.fit(training_padded_l, training_l, epochs=num_epochs, validation_data=(testing_padded_l, testing_l), verbose=2)

Epoch 1/30
842/842 - 2s - loss: 0.4074 - accuracy: 0.8401 - val_loss: 0.1617 - val_accuracy: 0.9478
Epoch 2/30
842/842 - 2s - loss: 0.1175 - accuracy: 0.9601 - val_loss: 0.1081 - val_accuracy: 0.9601
Epoch 3/30
842/842 - 2s - loss: 0.0780 - accuracy: 0.9726 - val_loss: 0.0938 - val_accuracy: 0.9654
Epoch 4/30
842/842 - 2s - loss: 0.0587 - accuracy: 0.9799 - val_loss: 0.0862 - val_accuracy: 0.9681
Epoch 5/30
842/842 - 2s - loss: 0.0472 - accuracy: 0.9842 - val_loss: 0.0846 - val_accuracy: 0.9694
Epoch 6/30
842/842 - 2s - loss: 0.0379 - accuracy: 0.9880 - val_loss: 0.0834 - val_accuracy: 0.9693
Epoch 7/30
842/842 - 2s - loss: 0.0305 - accuracy: 0.9906 - val_loss: 0.0920 - val_accuracy: 0.9668
Epoch 8/30
842/842 - 2s - loss: 0.0249 - accuracy: 0.9921 - val_loss: 0.0892 - val_accuracy: 0.9684
Epoch 9/30
842/842 - 2s - loss: 0.0198 - accuracy: 0.9939 - val_loss: 0.0902 - val_accuracy: 0.9704
Epoch 10/30
842/842 - 2s - loss: 0.0166 - accuracy: 0.9948 - val_loss: 0.0942 - val_accuracy: 0.9703
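
In the ten epochs shown, the training loss keeps falling while `val_loss` stops improving partway through, which hints at overfitting. The sketch below only locates the best epoch from the logged values. Stopping training there (for example with an early-stopping callback) would be one option, but it is not something the notebook does.

```python
# val_loss values copied from epochs 1-10 of the log above.
val_loss = [0.1617, 0.1081, 0.0938, 0.0862, 0.0846,
            0.0834, 0.0920, 0.0892, 0.0902, 0.0942]

# Epochs are 1-based in the Keras log.
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i]) + 1
best_val_loss = val_loss[best_epoch - 1]
print(best_epoch, best_val_loss)  # 6 0.0834
```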

In [90]:
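import pickle

# Save the fitted headline tokenizer; the context manager closes the file.
with open('heading_tokenizer_l.pkl', 'wb') as f:
    pickle.dump(tokenizer_l, f)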


In [91]:
temp_model.save("fakenews_temp")

Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: fakenews_temp\assets


In [None]:
vocab_size = 10000
embedding_dim = 16
max_length = 100
trunc_type='post'
padding_type='post'
oov_tok = "<OOV>"
#training_size = int(len(full_dataset)*0.8)

In [39]:
training_t,testing_t,training_tl,testing_tl=train_test_split(texts_list,target_list,test_size=0.4,random_state=40,stratify=target_list)

In [40]:
training_t=np.array(training_t)
testing_tl=np.array(testing_tl)
testing_t=np.array(testing_t)
training_tl=np.array(training_tl)

In [41]:
tokenizer_t=Tokenizer(num_words=vocab_size,oov_token=oov_tok)
tokenizer_t.fit_on_texts(training_t)

In [42]:
training_sequences_t=tokenizer_t.texts_to_sequences(training_t)
training_padded_t=pad_sequences(training_sequences_t,maxlen=max_length,padding=padding_type,truncating=trunc_type)

In [43]:
testing_sequences_t=tokenizer_t.texts_to_sequences(testing_t)
testing_padded_t=pad_sequences(testing_sequences_t,maxlen=max_length,padding=padding_type,truncating=trunc_type)

In [49]:
temp_model_text=tf.keras.Sequential([
    # NOTE: input_length=1000 does not match the padded length used above (max_length=100).
    tf.keras.layers.Embedding(vocab_size,embedding_dim,input_length=1000),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24,activation='relu'),
    tf.keras.layers.Dense(1,activation='sigmoid')
])

temp_model_text.compile(loss='binary_crossentropy',optimizer="adam",metrics=['accuracy'])

In [50]:
temp_model_text.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 1000, 16)          160000    
_________________________________________________________________
global_average_pooling1d_2 ( (None, 16)                0         
_________________________________________________________________
dense_4 (Dense)              (None, 24)                408       
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 25        
Total params: 160,433
Trainable params: 160,433
Non-trainable params: 0
_________________________________________________________________


In [51]:
num_epochs = 30
history = temp_model_text.fit(training_padded_t, training_tl, epochs=num_epochs, validation_data=(testing_padded_t, testing_tl), verbose=2)

Epoch 1/30
842/842 - 2s - loss: 0.2092 - accuracy: 0.9617 - val_loss: 0.0427 - val_accuracy: 0.9891
Epoch 2/30
842/842 - 2s - loss: 0.0230 - accuracy: 0.9947 - val_loss: 0.0207 - val_accuracy: 0.9946
Epoch 3/30
842/842 - 2s - loss: 0.0075 - accuracy: 0.9987 - val_loss: 0.0148 - val_accuracy: 0.9957
Epoch 4/30
842/842 - 2s - loss: 0.0026 - accuracy: 0.9997 - val_loss: 0.0129 - val_accuracy: 0.9960
Epoch 5/30
842/842 - 2s - loss: 0.0012 - accuracy: 0.9999 - val_loss: 0.0126 - val_accuracy: 0.9962
Epoch 6/30
842/842 - 2s - loss: 6.9141e-04 - accuracy: 1.0000 - val_loss: 0.0125 - val_accuracy: 0.9963
Epoch 7/30
842/842 - 2s - loss: 5.2345e-04 - accuracy: 1.0000 - val_loss: 0.0127 - val_accuracy: 0.9963
Epoch 8/30
842/842 - 2s - loss: 4.4040e-04 - accuracy: 1.0000 - val_loss: 0.0129 - val_accuracy: 0.9967
Epoch 9/30
842/842 - 2s - loss: 4.1810e-04 - accuracy: 1.0000 - val_loss: 0.0131 - val_accuracy: 0.9965
Epoch 10/30
842/842 - 2s - loss: 3.9654e-04 - accuracy: 1.0000 - val_loss: 0.0131 - 

In [53]:
import pickle

In [56]:
# Save the fitted text tokenizer; the context manager closes the file.
with open('text_tokenizer_l.pkl', 'wb') as f:
    pickle.dump(tokenizer_t, f)

In [55]:
temp_model_text.save("fakenews_temp_text")

Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: fakenews_temp_text\assets


In [121]:
training_sentences = np.array(titles[:training_size])
testing_sentences  = np.array(titles[training_size:])
training_labels    = np.array(target[:training_size])
testing_labels     = np.array(target[training_size:])

In [122]:
training_sentences.shape

(35918,)

In [123]:
testing_sentences.shape

(8980,)

In [124]:
training_labels.shape

(35918,)

In [125]:
testing_labels.shape

(8980,)

In [126]:
np.array(titles[training_size:53878]).shape

(8980,)

In [127]:
testing_sentences.shape

(8980,)

In [128]:
testing_labels.shape

(8980,)

In [129]:
35918+8980

44898

In [130]:
tokenizer=Tokenizer(num_words=vocab_size,oov_token=oov_tok)
tokenizer.fit_on_texts(training_sentences)

In [131]:
import pickle
with open('heading_tokenizer.pkl', 'wb') as f:
    pickle.dump(tokenizer, f)

In [85]:
word_index=tokenizer.word_index

In [86]:
training_sequences=tokenizer.texts_to_sequences(training_sentences)
training_padded=pad_sequences(training_sequences,maxlen=max_length,padding=padding_type,truncating=trunc_type)

In [87]:
testing_sequences=tokenizer.texts_to_sequences(testing_sentences)
testing_padded=pad_sequences(testing_sequences,maxlen=max_length,padding=padding_type,truncating=trunc_type)

In [88]:
model=tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size,embedding_dim,input_length=max_length),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24,activation='relu'),
    tf.keras.layers.Dense(1,activation='sigmoid')
])

model.compile(loss='binary_crossentropy',optimizer="adam",metrics=['accuracy'])

In [89]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 100, 16)           160000    
_________________________________________________________________
global_average_pooling1d (Gl (None, 16)                0         
_________________________________________________________________
dense (Dense)                (None, 24)                408       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 25        
Total params: 160,433
Trainable params: 160,433
Non-trainable params: 0
_________________________________________________________________


In [90]:
num_epochs = 30
history = model.fit(training_padded, training_labels, epochs=num_epochs, validation_data=(testing_padded, testing_labels), verbose=2)

Epoch 1/30
1123/1123 - 3s - loss: 0.3635 - accuracy: 0.8610 - val_loss: 0.1474 - val_accuracy: 0.9455
Epoch 2/30
1123/1123 - 3s - loss: 0.1060 - accuracy: 0.9630 - val_loss: 0.0940 - val_accuracy: 0.9648
Epoch 3/30
1123/1123 - 3s - loss: 0.0735 - accuracy: 0.9742 - val_loss: 0.0794 - val_accuracy: 0.9689
Epoch 4/30
1123/1123 - 3s - loss: 0.0575 - accuracy: 0.9802 - val_loss: 0.0802 - val_accuracy: 0.9684
Epoch 5/30
1123/1123 - 3s - loss: 0.0459 - accuracy: 0.9845 - val_loss: 0.0728 - val_accuracy: 0.9722
Epoch 6/30
1123/1123 - 3s - loss: 0.0383 - accuracy: 0.9874 - val_loss: 0.0705 - val_accuracy: 0.9744
Epoch 7/30
1123/1123 - 3s - loss: 0.0318 - accuracy: 0.9899 - val_loss: 0.0747 - val_accuracy: 0.9714
Epoch 8/30
1123/1123 - 3s - loss: 0.0264 - accuracy: 0.9917 - val_loss: 0.0702 - val_accuracy: 0.9755
Epoch 9/30
1123/1123 - 3s - loss: 0.0225 - accuracy: 0.9930 - val_loss: 0.0796 - val_accuracy: 0.9738
Epoch 10/30
1123/1123 - 2s - loss: 0.0183 - accuracy: 0.9952 - val_loss: 0.0776 - 

In [91]:
sentence=[
    "granny starting to fear spiders in the garden might be real",
    "the weather today is bright and sunny",
    "Progressive Couple Thrilled With Latest Mandates",
    "Biden won US election against Trump in fair ways,' Republican-funded review claims"
]

In [92]:
sequences=tokenizer.texts_to_sequences(sentence)

In [93]:
padded=pad_sequences(sequences,maxlen=max_length,padding=padding_type,truncating=trunc_type)

In [94]:
print(model.predict(padded))

[[1.2463725e-12]
 [3.5248686e-05]
 [2.1977275e-08]
 [8.6118162e-01]]
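
Because real articles were labelled 1 and fake ones 0, each sigmoid output can be read as the model's score for "real". A minimal sketch of turning the scores above into labels follows. The 0.5 cut-off and the helper name `to_label` are assumptions, since the notebook itself never applies a threshold.

```python
def to_label(prob, threshold=0.5):
    """Map a sigmoid score to a label (1 = real, 0 = fake in this notebook)."""
    return "real" if prob >= threshold else "fake"

# Scores copied from the model.predict output above.
scores = [1.2463725e-12, 3.5248686e-05, 2.1977275e-08, 8.6118162e-01]
labels = [to_label(p) for p in scores]
print(labels)  # ['fake', 'fake', 'fake', 'real']
```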


In [95]:
model.save("fakenews")

Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: fakenews\assets


In [11]:
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import pickle

In [12]:
new_models=keras.models.load_model("fakenews")

In [13]:
new_models.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 100, 16)           160000    
_________________________________________________________________
global_average_pooling1d (Gl (None, 16)                0         
_________________________________________________________________
dense (Dense)                (None, 24)                408       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 25        
Total params: 160,433
Trainable params: 160,433
Non-trainable params: 0
_________________________________________________________________


In [14]:
sentence=[
    "granny starting to fear spiders in the garden might be real",
    "the weather today is bright and sunny",
    "Progressive Couple Thrilled With Latest Mandates",
    "'Biden won US election against Trump in fair ways,' Republican-funded review claims"
]

In [15]:
vocab_size = 10000
embedding_dim = 16
max_length = 100
trunc_type='post'
padding_type='post'
oov_tok = "<OOV>"
#training_size = int(len(full_dataset)*0.8)

In [16]:
with open('heading_tokenizer.pkl','rb') as f:
    tokenizer=pickle.load(f)
#tokenizer.fit_on_texts(training_sentences)

In [17]:
sequences=tokenizer.texts_to_sequences(sentence)
padded=pad_sequences(sequences,maxlen=max_length,padding=padding_type,truncating=trunc_type)

In [18]:
print(new_models.predict(padded))

[[3.4875745e-15]
 [2.1293759e-04]
 [1.5511760e-01]
 [9.9997640e-01]]


In [19]:
type(new_models.predict(padded))

numpy.ndarray

In [20]:
x=new_models.predict(padded)

In [21]:
x

array([[3.4875745e-15],
       [2.1293759e-04],
       [1.5511760e-01],
       [9.9997640e-01]], dtype=float32)

In [22]:
x[0]

array([3.4875745e-15], dtype=float32)

In [23]:
x[0][0]

3.4875745e-15

In [26]:
float(x[0][0])

3.4875745158160403e-15

In [29]:
print(float("{:.8f}".format(x[2][0])))

0.1551176


In [125]:
from tensorflow import keras
from tensorflow.keras.preprocessing.sequence import pad_sequences
import pickle

def predict(sentence):
    """Return the model's score for "real" (label 1) for each headline.

    `sentence` must be a list of strings; a bare string would be
    tokenized character by character.
    """
    # Load the headline tokenizer and model saved earlier in the notebook.
    with open('heading_tokenizer_l.pkl','rb') as f:
        tokenizer=pickle.load(f)
    new_models=keras.models.load_model("fakenews_temp")

    # These must match the padding settings used at training time.
    max_length = 100
    trunc_type='post'
    padding_type='post'

    sequences=tokenizer.texts_to_sequences(sentence)
    padded=pad_sequences(sequences,maxlen=max_length,padding=padding_type,truncating=trunc_type)
    x=new_models.predict(padded)

    # Round each score to 8 decimal places.
    return [float("{:.8f}".format(i[0])) for i in x]

In [131]:
y=[]
for i in range(50):
    y.append(training_s[i])
q=predict(y)
for i in range(50):
    print(q[i],training_l[i])

1.0 1
0.99998367 1
8.83e-06 0
0.99981993 1
1.3e-07 0
0.99999982 1
0.0 0
0.0 0
0.99999571 1
0.0 0
0.99999964 1
0.9999994 1
0.0066916 0
0.0 0
0.0 0
1.675e-05 0
0.99998116 1
0.99999988 1
0.0 0
0.99999923 1
0.99998736 1
1.0 1
0.99998462 1
0.99999905 1
3.35e-05 0
1e-08 0
0.99991667 1
0.0 0
1.0 1
1e-08 0
0.00011879 0
0.9999724 1
0.99999976 1
0.0 0
0.0 0
0.99999976 1
9.5e-07 0
0.99999756 1
0.99999964 1
0.99938601 1
0.0 0
0.99999869 1
1.09e-06 0
0.00447515 0
0.0 0
4.362e-05 0
0.99999917 1
0.0 0
1.0 1
0.9999969 1


In [133]:
y=[]
for i in range(50):
    y.append(testing_s[i])
q=predict(y)
for i in range(50):
    print(i,q[i],testing_l[i])

0 0.0 0
1 0.95685339 1
2 0.99999511 1
3 1.988e-05 0
4 0.99999934 1
5 0.99999785 1
6 0.0 0
7 0.0 0
8 0.9999845 0
9 1.0 1
10 0.99999982 1
11 0.0 0
12 0.99999386 1
13 0.0 0
14 0.99999917 1
15 0.0 0
16 0.99999619 1
17 0.0 0
18 0.0 0
19 1.0 1
20 0.0 0
21 1.0 1
22 1.0 1
23 0.9999975 1
24 0.0 0
25 0.0 0
26 0.0 0
27 0.99999422 1
28 7e-08 0
29 1.0 1
30 0.0 0
31 0.82988662 1
32 0.9999997 1
33 0.99996161 1
34 0.99999499 1
35 0.99999976 1
36 0.0 0
37 0.99999702 1
38 1.0 1
39 0.10930344 0
40 0.99993241 0
41 0.0 0
42 1.0 1
43 1.0 1
44 0.0 0
45 3.8e-07 0
46 0.0 0
47 0.0 0
48 0.0 0
49 0.0 0
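
In the 50 test rows above, rows 8 and 40 receive a score near 1 although their label is 0, so both mistakes are fake headlines scored as real. Counting such cases separately from the opposite error is what a confusion matrix does. The sketch below tallies the four counts for a handful of rows copied from the output (rows 0, 1, 8, 39 and 40). The 0.5 threshold and the helper name `confusion_counts` are assumptions.

```python
def confusion_counts(scores, labels, threshold=0.5):
    """Count TP/FP/TN/FN, treating label 1 (real) as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            counts["tp"] += 1
        elif predicted == 1 and label == 0:
            counts["fp"] += 1   # fake headline scored as real
        elif predicted == 0 and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1   # real headline scored as fake
    return counts

# (score, label) pairs for rows 0, 1, 8, 39 and 40 of the output above.
scores = [0.0, 0.95685339, 0.9999845, 0.10930344, 0.99993241]
labels = [0, 1, 0, 0, 0]
counts = confusion_counts(scores, labels)
print(counts)  # {'tp': 1, 'fp': 2, 'tn': 2, 'fn': 0}
```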


In [135]:
print(y[40])

CONSERVATIVES FIGHT BACK Against Proposed “Obamacare Lite”…DEMAND Full Repeal Of Obamacare


# Try it out

In [57]:
to_be_predicted_heading=input("Enter your article heading here: ")
to_be_predicted_text=input("Enter your article text here: ")

Enter your article heading here: CONSERVATIVES FIGHT BACK Against Proposed “Obamacare Lite”…DEMAND Full Repeal Of Obamacare
Enter your article text here: CONSERVATIVES FIGHT BACK Against Proposed “Obamacare Lite”…DEMAND Full Repeal Of Obamacare


In [58]:
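# The stopwords corpus must be available locally; run nltk.download('stopwords') once if it is not.
from nltk.corpus import stopwords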

In [60]:
stopwords.words('english')

['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',
 "you're",
 "you've",
 "you'll",
 "you'd",
 'your',
 'yours',
 'yourself',
 'yourselves',
 'he',
 'him',
 'his',
 'himself',
 'she',
 "she's",
 'her',
 'hers',
 'herself',
 'it',
 "it's",
 'its',
 'itself',
 'they',
 'them',
 'their',
 'theirs',
 'themselves',
 'what',
 'which',
 'who',
 'whom',
 'this',
 'that',
 "that'll",
 'these',
 'those',
 'am',
 'is',
 'are',
 'was',
 'were',
 'be',
 'been',
 'being',
 'have',
 'has',
 'had',
 'having',
 'do',
 'does',
 'did',
 'doing',
 'a',
 'an',
 'the',
 'and',
 'but',
 'if',
 'or',
 'because',
 'as',
 'until',
 'while',
 'of',
 'at',
 'by',
 'for',
 'with',
 'about',
 'against',
 'between',
 'into',
 'through',
 'during',
 'before',
 'after',
 'above',
 'below',
 'to',
 'from',
 'up',
 'down',
 'in',
 'out',
 'on',
 'off',
 'over',
 'under',
 'again',
 'further',
 'then',
 'once',
 'here',
 'there',
 'when',
 'where',
 'why',
 'how',
 'all',
 'any',
 'both',
 'each

In [61]:
type(stopwords.words('english'))

list
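
Since `stopwords.words('english')` is a plain list, one plausible use is to filter common words out of a headline before tokenizing it. The notebook stops before doing this, so the sketch below is only an illustration. To stay self-contained it hard-codes a small subset of the words printed above instead of importing NLTK, and the helper name `remove_stopwords` is hypothetical.

```python
# Small subset of the NLTK English stopword list shown above.
STOPWORDS = {"the", "is", "and", "a", "an", "of", "to", "in"}

def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [word for word in text.lower().split() if word not in stopwords]

tokens = remove_stopwords("the weather today is bright and sunny")
print(tokens)  # ['weather', 'today', 'bright', 'sunny']
```

Converting the list to a `set` first keeps each membership test cheap when many headlines are processed.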