# Link to GitHub repo with code: https://github.com/prajwalseth/Advanced-ML-HW3/

## Can you use the following data to build...?
1. A model with an embedding layer and dense layers (but with no layers meant for sequential data)
2. A model using Conv1D layers
3. A model with one sequential layer (LSTM or GRU)
4. A model with stacked sequential layers (LSTM or GRU)
5. A model with bidirectional sequential layers

### After choosing a model, feed it some realistic tweets that are not from your training data to see if it returns meaningful/useful results.






Citation for dataset: Shahi, Gautam Kishore, Anne Dirkson, and Tim A. Majchrzak. "An exploratory study of covid-19 misinformation on twitter." Online Social Networks and Media 22 (2021): 100104.

## Import Data

In [167]:
#Source:Fighting an Infodemic: COVID-19 Fake News Dataset, https://github.com/diptamath/covid_fake_news,https://arxiv.org/abs/2011.03327 

import pandas as pd
trainingdata=pd.read_csv("https://raw.githubusercontent.com/diptamath/covid_fake_news/main/data/Constraint_Train.csv", usecols = ['tweet','label'])
testdata=pd.read_csv("https://raw.githubusercontent.com/diptamath/covid_fake_news/main/data/english_test_with_labels.csv", usecols = ['tweet','label'])

#trainingdata

## Present examples of tweets from the dataset that demonstrate real information or misinformation

In [168]:
real_tweets = list(trainingdata[trainingdata['label'] == 'real']['tweet'])
fake_tweets = list(trainingdata[trainingdata['label'] == 'fake']['tweet'])

### Examples of real tweets from the training dataset

In [169]:
real_tweets[0]

'The CDC currently reports 99031 deaths. In general the discrepancies in death counts between different sources are small and explicable. The death toll stands at roughly 100000 people today.'

In [170]:
real_tweets[1]

'States reported 1121 deaths a small rise from last Tuesday. Southern states reported 640 of those deaths. https://t.co/YASGRTT4ux'

In [171]:
real_tweets[2]

'#IndiaFightsCorona: We have 1524 #COVID testing laboratories in India and as on 25th August 2020 36827520 tests have been done : @ProfBhargava DG @ICMRDELHI #StaySafe #IndiaWillWin https://t.co/Yh3ZxknnhZ'

In [172]:
real_tweets[3]

'Populous states can generate large case counts but if you look at the new cases per million today 9 smaller states are showing more cases per million than California or Texas: AL AR ID KS KY LA MS NV and SC. https://t.co/1pYW6cWRaS'

In [173]:
real_tweets[4]

'Covid Act Now found "on average each person in Illinois with COVID-19 is infecting 1.11 other people. Data shows that the infection growth rate has declined over time this factors in the stay-at-home order and other restrictions put in place." https://t.co/hhigDd24fE'

### Examples of fake tweets from the training dataset

In [174]:
fake_tweets[0]

'Politically Correct Woman (Almost) Uses Pandemic as Excuse Not to Reuse Plastic Bag https://t.co/thF8GuNFPe #coronavirus #nashville'

In [175]:
fake_tweets[1]

'Obama Calls Trump’s Coronavirus Response A Chaotic Disaster https://t.co/DeDqZEhAsB'

In [176]:
fake_tweets[2]

'“Clearly, the Obama administration did not leave any kind of game plan for something like this.”'

In [177]:
fake_tweets[3]

'Retraction—Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis - The Lancet https://t.co/L5V2x6G9or'

In [178]:
fake_tweets[4]

'The NBA is poised to restart this month. In March we reported on how the Utah Jazz got 58 coronavirus tests in a matter of hours at a time when U.S. testing was sluggish. https://t.co/I8YjjrNoTh https://t.co/o0Nk6gpyos'

## Discuss the dataset in general terms and describe why building a predictive model using this data might be practically useful. Who could benefit from a model like this? Explain.


The training and test datasets contain 6,420 and 2,140 COVID-19-related tweets, respectively, each labeled as real (verified information) or fake (misinformation). A predictive model built on this data could help news organizations that need to fact-check tweets quickly before reporting on them; keeping misinformation out of news coverage helps maintain public trust. Such a model could also help researchers, for example in measuring which politicians retweet the most misinformation.

## Define Preprocessor

In [179]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

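# Build vocabulary from training text data (only words with index below 10,000, i.e. the most frequent ones, are kept)
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(trainingdata.tweet)

# preprocessor tokenizes words and pads/truncates every document to the same length
def preprocessor(data, maxlen=40, max_words=10000):
    # max_words is unused: the vocabulary size is already fixed by num_words=10000
    # on the tokenizer fitted above, which this function reads as a global
    sequences = tokenizer.texts_to_sequences(data)

    # pad_sequences pre-pads with zeros (and pre-truncates) to exactly maxlen tokens
    X = pad_sequences(sequences, maxlen=maxlen)

    return X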
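For intuition about what `preprocessor` produces, here is a simplified pure-Python approximation of `Tokenizer(num_words=...)` plus `pad_sequences(..., maxlen=...)` with their default settings: text is lowercased and stripped of punctuation, words are ranked by frequency (index 1 is the most frequent, and 0 is reserved for padding), words whose index is not below `num_words` are dropped, and each sequence is left-padded with zeros and truncated from the front to `maxlen`. The helpers `toy_fit`, `toy_texts_to_sequences` and `toy_pad` are hypothetical names for illustration only; the real Keras utilities support more options and edge cases.

```python
# Punctuation that Keras' Tokenizer filters out by default
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def _words(text):
    # Lowercase, replace filtered characters with spaces, then split on whitespace
    table = str.maketrans(FILTERS, " " * len(FILTERS))
    return text.lower().translate(table).split()


def toy_fit(texts):
    """Hypothetical helper: map each word to a frequency rank (1 = most frequent)."""
    counts = {}
    for text in texts:
        for w in _words(text):
            counts[w] = counts.get(w, 0) + 1
    # sorted() is stable even with reverse=True, so ties keep first-appearance order
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {w: i + 1 for i, w in enumerate(ranked)}


def toy_texts_to_sequences(texts, word_index, num_words):
    """Hypothetical helper: keep known words whose index is below num_words."""
    return [[word_index[w] for w in _words(t)
             if w in word_index and word_index[w] < num_words]
            for t in texts]


def toy_pad(sequences, maxlen):
    """Hypothetical helper: keep the last maxlen tokens, left-pad with zeros."""
    padded = []
    for s in sequences:
        s = s[-maxlen:]
        padded.append([0] * (maxlen - len(s)) + s)
    return padded


# Usage on made-up tweets
wi = toy_fit(["COVID cases rise", "covid tests rise again", "New COVID vaccine"])
seqs = toy_texts_to_sequences(["Covid cases rise!", "unknown words only"], wi, num_words=3)
padded = toy_pad(seqs, maxlen=4)
print(wi)
print(seqs, padded)
```

With `num_words=3`, only indices 1 and 2 survive, so "cases" (index 3) is dropped just as rare words are dropped by the 10,000-word tokenizer above.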

## Prepare Train and Test Data

In [180]:
# tokenize and pad X data
X_train = preprocessor(trainingdata.tweet, maxlen=40, max_words=10000)
X_test = preprocessor(testdata.tweet, maxlen=40, max_words=10000)

# one-hot encode Y data (pd.get_dummies orders the columns alphabetically: fake, real)
y_train = pd.get_dummies(trainingdata.label)
y_test = pd.get_dummies(testdata.label)
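Because `pd.get_dummies` sorts its columns, column 0 is `fake` and column 1 is `real`; that ordering is what lets `y_test.columns[i]` turn an `argmax` index back into a label in the prediction cells below. A minimal NumPy sketch of that round trip, using made-up labels and made-up softmax outputs rather than real model predictions:

```python
import numpy as np

labels = ["real", "fake", "fake", "real"]          # made-up example labels
classes = sorted(set(labels))                      # ['fake', 'real'], same order as get_dummies
one_hot = np.array([[1 if lab == c else 0 for c in classes] for lab in labels])

# Hypothetical softmax outputs for the same four tweets (each row sums to 1)
probs = np.array([[0.1, 0.9], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]])
pred_idx = probs.argmax(axis=1)                    # column with the highest probability
predicted = [classes[i] for i in pred_idx]         # index -> label, like y_test.columns[i]

print(one_hot.tolist())
print(predicted)
```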

In [181]:
print(X_train.shape)
print(X_test.shape)

(6420, 40)
(2140, 40)


## Model 0 (1 embedding layer and 1 dense layer but no layers meant for sequential data) 

In [442]:
from tensorflow.keras.layers import Dense, Embedding, Flatten
from tensorflow.keras.models import Sequential

# Model 0: embedding -> flatten -> dense, with no layers meant for sequential data
model = Sequential()
model.add(Embedding(10000, 16, input_length=40))
model.add(Flatten())
model.add(Dense(2, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

history = model.fit(X_train, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
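A quick check of Model 0's size: `Embedding(10000, 16)` stores one 16-dimensional vector per word index, `Flatten` turns the 40 x 16 output into 640 features, and the two-unit `Dense` layer needs one weight per feature per unit plus one bias per unit. The arithmetic below gives 161,282, the same count listed for the 3-layer Embedding/Flatten/Dense entries on the leaderboard; that those entries use exactly this architecture is an inference from the matching count, not something the leaderboard states.

```python
vocab, emb_dim, seq_len, n_classes = 10000, 16, 40, 2

embedding_params = vocab * emb_dim                     # one vector per word index
flat_features = seq_len * emb_dim                      # Flatten: (40, 16) -> 640
dense_params = flat_features * n_classes + n_classes   # weights + biases

total = embedding_params + dense_params                # Flatten has no parameters
print(embedding_params, flat_features, dense_params, total)  # 160000 640 1282 161282
```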


In [None]:
# format y_pred as labels 
y_pred = model.predict(X_test).argmax(axis=1)
predicted_labels = [y_test.columns[i] for i in y_pred]
predicted_labels[0:5]

['real', 'fake', 'fake', 'real', 'real']

## Submit Model 0

In [443]:
%%capture
# install aimodelshare library (%%capture must be the first line of the cell)
!pip install aimodelshare --upgrade --extra-index-url https://test.pypi.org/simple/

In [444]:
import aimodelshare as ai
from aimodelshare.aimsonnx import model_to_onnx

In [445]:
# save preprocessor
ai.export_preprocessor(preprocessor,"")

In [446]:
# save model in onnx format
onnx_model = model_to_onnx(model, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("onnx_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

INFO:tensorflow:Assets written to: /tmp/assets



In [447]:
# set credentials for modeltoapi function 
# make sure you have uploaded your credentials.txt file
from aimodelshare.aws import set_credentials
api_url = "https://wvr23l2z9i.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=api_url,credential_file="credentials.txt", type="submit_model", manual=False)

AI Model Share login credentials set successfully.
AWS credentials set successfully.


In [449]:
# submit model and predictions to competition
ai.submit_model("onnx_model.onnx",
                api_url,
                prediction_submission=predicted_labels,
                preprocessor="preprocessor.zip")

'Your model has been submitted as model version 59'

In [450]:
# check leaderboard
data=ai.get_leaderboard(api_url, verbose=3)
ai.leaderboard.stylize_leaderboard(data)

| rank | accuracy | f1_score | precision | recall | ml_framework | transfer_learning | deep_learning | model_type | depth | num_params | bidirectional_layers | conv1d_layers | dense_layers | embedding_layers | flatten_layers | globalmaxpooling1d_layers | lstm_layers | maxpooling1d_layers | simplernn_layers | relu_act | sigmoid_act | softmax_act | tanh_act | loss | optimizer | model_config | username | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 94.86% | 94.85% | 94.84% | 94.87% | keras | False | True | Sequential | 5 | 1035746 | | | 2 | 1 | | | 2.0 | | | 1.0 | | 1.0 | 2.0 | str | RMSprop | {'name': 'sequential_3', 'laye... | kagenlim | 19 |
| 1 | 94.49% | 94.47% | 94.47% | 94.48% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential', 'layers... | newusertest | 4 |
| 2 | 94.35% | 94.34% | 94.32% | 94.37% | keras | False | True | Sequential | 6 | 148066 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | 2.0 | | 1.0 | | str | RMSprop | {'name': 'sequential_72', 'lay... | prajseth | 40 |
| 3 | 94.25% | 94.24% | 94.24% | 94.24% | keras | False | True | Sequential | 3 | 98818 | | | 1 | 1 | | | 1.0 | | | | | 1.0 | 1.0 | str | RMSprop | {'name': 'sequential_78', 'lay... | prajseth | 41 |
| 4 | 94.21% | 94.19% | 94.20% | 94.19% | keras | False | True | Sequential | 6 | 32098786 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | | | 1.0 | | str | RMSprop | {'name': 'sequential_118', 'la... | prajseth | 51 |
| 5 | 94.11% | 94.10% | 94.09% | 94.11% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential', 'layers... | prajseth | 21 |
| 6 | 94.11% | 94.10% | 94.11% | 94.09% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential_4', 'laye... | newusertest | 5 |
| 7 | 94.11% | 94.10% | 94.12% | 94.08% | keras | False | True | Sequential | 6 | 32098786 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | | | 1.0 | | str | Adam | {'name': 'sequential_122', 'la... | prajseth | 53 |
| 8 | 94.07% | 94.06% | 94.04% | 94.11% | keras | False | True | Sequential | 6 | 200834 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | 2.0 | | 1.0 | | str | RMSprop | {'name': 'sequential_92', 'lay... | prajseth | 47 |
| 9 | 94.07% | 94.06% | 94.04% | 94.09% | keras | False | True | Sequential | 6 | 3214626 | | | 1 | 1 | | | 1.0 | | 3.0 | | | 1.0 | 4.0 | str | RMSprop | {'name': 'sequential_41', 'lay... | prajseth | 31 |


## Model 1 (two Conv1D layers)

In [387]:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import RMSprop
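from tensorflow.keras.layers import Dense, Embedding, Flatten, MaxPooling1D

model = Sequential()
# input_dim is far larger than needed: the tokenizer only emits indices below 10000,
# so most of these 1,000,000 embedding rows are never looked up
model.add(Embedding(1000000, 32, input_length=40))
model.add(layers.Conv1D(256, 5))
model.add(MaxPooling1D(pool_size=4))
model.add(layers.Conv1D(32, 7))
model.add(Flatten())
model.add(Dense(2, activation='softmax'))

# note: categorical_crossentropy is the usual loss for a softmax classifier;
# mse is kept here because it is what this run used
model.compile(optimizer='adam', loss='mse', metrics=['acc'])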


history = model.fit(X_train, y_train,
                    epochs=3,
                    batch_size=40,
                    validation_split=0.2)

Epoch 1/3
Epoch 2/3
Epoch 3/3
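With Keras' default `padding='valid'`, a Conv1D layer with kernel size k shortens the sequence by k - 1 steps, and `MaxPooling1D(pool_size=4)` (whose stride defaults to the pool size) divides the length by 4, rounding down. The small helpers below, written only for illustration, trace lengths and parameter counts for the Conv1D(256, 5) -> MaxPooling1D(4) -> Conv1D(32, 7) -> Flatten -> Dense(2) stack, once with 8-dimensional embeddings (as in my version 40 submission discussed later) and once with 32-dimensional embeddings (as in the cell above).

```python
def conv1d_out(length, kernel):
    # 'valid' padding, stride 1
    return length - kernel + 1


def conv1d_params(kernel, in_channels, filters):
    # one (kernel x in_channels) weight block per filter, plus one bias per filter
    return kernel * in_channels * filters + filters


def trace(seq_len=40, emb_dim=8):
    l1 = conv1d_out(seq_len, 5)             # Conv1D(256, 5)
    p1 = conv1d_params(5, emb_dim, 256)
    l2 = l1 // 4                            # MaxPooling1D(pool_size=4), no parameters
    l3 = conv1d_out(l2, 7)                  # Conv1D(32, 7)
    p2 = conv1d_params(7, 256, 32)
    flat = l3 * 32                          # Flatten, no parameters
    p3 = flat * 2 + 2                       # Dense(2): weights + biases
    return (l1, l2, l3, flat), (p1, p2, p3)


print(trace(emb_dim=8))    # ((36, 9, 3, 96), (10496, 57376, 194))
print(trace(emb_dim=32))   # ((36, 9, 3, 96), (41216, 57376, 194))
```

Only the first Conv1D layer depends on the embedding width; everything after the pooling layer is identical in both configurations.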


In [388]:
# format y_pred as labels 
y_pred = model.predict(X_test).argmax(axis=1)
predicted_labels = [y_test.columns[i] for i in y_pred]
predicted_labels[0:5]

['real', 'fake', 'fake', 'real', 'real']

## Submit Model 1

In [389]:
%%capture
# install aimodelshare library (%%capture must be the first line of the cell)
!pip install aimodelshare --upgrade --extra-index-url https://test.pypi.org/simple/

In [390]:
import aimodelshare as ai
from aimodelshare.aimsonnx import model_to_onnx

In [391]:
# save preprocessor
ai.export_preprocessor(preprocessor,"")

In [392]:
# save model in onnx format
onnx_model = model_to_onnx(model, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("onnx_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

INFO:tensorflow:Assets written to: /tmp/assets



In [393]:
# set credentials for modeltoapi function 
# make sure you have uploaded your credentials.txt file
from aimodelshare.aws import set_credentials
api_url = "https://wvr23l2z9i.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=api_url,credential_file="credentials.txt", type="submit_model", manual=False)

AI Model Share login credentials set successfully.
AWS credentials set successfully.


In [394]:
# submit model and predictions to competition
ai.submit_model("onnx_model.onnx",
                api_url,
                prediction_submission=predicted_labels,
                preprocessor="preprocessor.zip")

'Your model has been submitted as model version 53'

In [395]:
# check leaderboard
data=ai.get_leaderboard(api_url, verbose=3)
ai.leaderboard.stylize_leaderboard(data)

| rank | accuracy | f1_score | precision | recall | ml_framework | transfer_learning | deep_learning | model_type | depth | num_params | bidirectional_layers | conv1d_layers | dense_layers | embedding_layers | flatten_layers | globalmaxpooling1d_layers | lstm_layers | maxpooling1d_layers | simplernn_layers | relu_act | sigmoid_act | softmax_act | tanh_act | loss | optimizer | model_config | username | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 94.86% | 94.85% | 94.84% | 94.87% | keras | False | True | Sequential | 5 | 1035746 | | | 2 | 1 | | | 2.0 | | | 1.0 | | 1.0 | 2.0 | str | RMSprop | {'name': 'sequential_3', 'laye... | kagenlim | 19 |
| 1 | 94.49% | 94.47% | 94.47% | 94.48% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential', 'layers... | newusertest | 4 |
| 2 | 94.35% | 94.34% | 94.32% | 94.37% | keras | False | True | Sequential | 6 | 148066 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | 2.0 | | 1.0 | | str | RMSprop | {'name': 'sequential_72', 'lay... | prajseth | 40 |
| 3 | 94.25% | 94.24% | 94.24% | 94.24% | keras | False | True | Sequential | 3 | 98818 | | | 1 | 1 | | | 1.0 | | | | | 1.0 | 1.0 | str | RMSprop | {'name': 'sequential_78', 'lay... | prajseth | 41 |
| 4 | 94.21% | 94.19% | 94.20% | 94.19% | keras | False | True | Sequential | 6 | 32098786 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | | | 1.0 | | str | RMSprop | {'name': 'sequential_118', 'la... | prajseth | 51 |
| 5 | 94.11% | 94.10% | 94.09% | 94.11% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential', 'layers... | prajseth | 21 |
| 6 | 94.11% | 94.10% | 94.11% | 94.09% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential_4', 'laye... | newusertest | 5 |
| 7 | 94.11% | 94.10% | 94.12% | 94.08% | keras | False | True | Sequential | 6 | 32098786 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | | | 1.0 | | str | Adam | {'name': 'sequential_122', 'la... | prajseth | 53 |
| 8 | 94.07% | 94.06% | 94.04% | 94.11% | keras | False | True | Sequential | 6 | 200834 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | 2.0 | | 1.0 | | str | RMSprop | {'name': 'sequential_92', 'lay... | prajseth | 47 |
| 9 | 94.07% | 94.06% | 94.04% | 94.09% | keras | False | True | Sequential | 6 | 3214626 | | | 1 | 1 | | | 1.0 | | 3.0 | | | 1.0 | 4.0 | str | RMSprop | {'name': 'sequential_41', 'lay... | prajseth | 31 |


## Model 2 (one LSTM layer with dropout)

In [410]:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(10000, 8, input_length=40))
model.add(LSTM(32, dropout=0.2, recurrent_dropout=0.2)) 
model.add(Dense(2, activation='softmax'))


model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])


history = model.fit(X_train, y_train,
                    epochs=6,
                    batch_size=40,
                    validation_split=0.2)

Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
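An LSTM layer has four gates (input, forget, candidate cell, output), and each gate has an input weight matrix, a recurrent weight matrix and a bias, giving 4 * (units * (input_dim + units) + units) parameters. Dropout and recurrent dropout add no parameters. The arithmetic below applies this to Model 2 (8-dimensional embeddings, 32 units) and also reproduces the LSTM counts that appear in the model summaries later in this report.

```python
def lstm_params(input_dim, units):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * (input_dim + units) + units)


embedding = 10000 * 8          # Embedding(10000, 8)
lstm = lstm_params(8, 32)      # LSTM(32) on 8-dim inputs
dense = 32 * 2 + 2             # Dense(2): weights + biases
print(embedding, lstm, dense, embedding + lstm + dense)   # 80000 5248 66 85314

print(lstm_params(8, 64))      # 18688, the 64-unit LSTM in my version 41 summary
print(lstm_params(100, 32))    # 17024, first LSTM of the leaderboard's best model
print(lstm_params(32, 50))     # 16600, second LSTM of the leaderboard's best model
```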


In [411]:
# format y_pred as labels 
y_pred = model.predict(X_test).argmax(axis=1)
predicted_labels = [y_test.columns[i] for i in y_pred]
predicted_labels[0:5]

['real', 'fake', 'fake', 'real', 'real']

## Submit Model 2

In [412]:
%%capture
# install aimodelshare library (%%capture must be the first line of the cell)
!pip install aimodelshare --upgrade --extra-index-url https://test.pypi.org/simple/

In [413]:
import aimodelshare as ai
from aimodelshare.aimsonnx import model_to_onnx

In [414]:
# save preprocessor
ai.export_preprocessor(preprocessor,"")

In [415]:
# save model in onnx format
onnx_model = model_to_onnx(model, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("onnx_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

INFO:tensorflow:Assets written to: /tmp/assets



In [416]:
# set credentials for modeltoapi function 
# make sure you have uploaded your credentials.txt file
from aimodelshare.aws import set_credentials
api_url = "https://wvr23l2z9i.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=api_url,credential_file="credentials.txt", type="submit_model", manual=False)

AI Model Share login credentials set successfully.
AWS credentials set successfully.


In [417]:
# submit model and predictions to competition
ai.submit_model("onnx_model.onnx",
                api_url,
                prediction_submission=predicted_labels,
                preprocessor="preprocessor.zip")

'Your model has been submitted as model version 55'

In [418]:
# check leaderboard
data=ai.get_leaderboard(api_url, verbose=3)
ai.leaderboard.stylize_leaderboard(data)

| rank | accuracy | f1_score | precision | recall | ml_framework | transfer_learning | deep_learning | model_type | depth | num_params | bidirectional_layers | conv1d_layers | dense_layers | embedding_layers | flatten_layers | globalmaxpooling1d_layers | lstm_layers | maxpooling1d_layers | simplernn_layers | relu_act | sigmoid_act | softmax_act | tanh_act | loss | optimizer | model_config | username | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 94.86% | 94.85% | 94.84% | 94.87% | keras | False | True | Sequential | 5 | 1035746 | | | 2 | 1 | | | 2.0 | | | 1.0 | | 1.0 | 2.0 | str | RMSprop | {'name': 'sequential_3', 'laye... | kagenlim | 19 |
| 1 | 94.49% | 94.47% | 94.47% | 94.48% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential', 'layers... | newusertest | 4 |
| 2 | 94.35% | 94.34% | 94.32% | 94.37% | keras | False | True | Sequential | 6 | 148066 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | 2.0 | | 1.0 | | str | RMSprop | {'name': 'sequential_72', 'lay... | prajseth | 40 |
| 3 | 94.25% | 94.24% | 94.24% | 94.24% | keras | False | True | Sequential | 3 | 98818 | | | 1 | 1 | | | 1.0 | | | | | 1.0 | 1.0 | str | RMSprop | {'name': 'sequential_78', 'lay... | prajseth | 41 |
| 4 | 94.21% | 94.19% | 94.20% | 94.19% | keras | False | True | Sequential | 6 | 32098786 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | | | 1.0 | | str | RMSprop | {'name': 'sequential_118', 'la... | prajseth | 51 |
| 5 | 94.11% | 94.10% | 94.09% | 94.11% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential', 'layers... | prajseth | 21 |
| 6 | 94.11% | 94.10% | 94.11% | 94.09% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential_4', 'laye... | newusertest | 5 |
| 7 | 94.11% | 94.10% | 94.12% | 94.08% | keras | False | True | Sequential | 6 | 32098786 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | | | 1.0 | | str | Adam | {'name': 'sequential_122', 'la... | prajseth | 53 |
| 8 | 94.07% | 94.06% | 94.04% | 94.11% | keras | False | True | Sequential | 6 | 200834 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | 2.0 | | 1.0 | | str | RMSprop | {'name': 'sequential_92', 'lay... | prajseth | 47 |
| 9 | 94.07% | 94.06% | 94.04% | 94.09% | keras | False | True | Sequential | 6 | 3214626 | | | 1 | 1 | | | 1.0 | | 3.0 | | | 1.0 | 4.0 | str | RMSprop | {'name': 'sequential_41', 'lay... | prajseth | 31 |


## Model 3 (Stacked LSTM with dropout)

In [419]:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential()

model.add(Embedding(10000, 8, input_length=40))
# return_sequences=True passes all 40 hidden states to the second LSTM
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])


history = model.fit(X_train, y_train,
                    epochs=5,
                    batch_size=40,
                    validation_split=0.2)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
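The key setting in the stacked model is `return_sequences=True` on the first LSTM: it emits a hidden state at every one of the 40 time steps, so the second LSTM receives a sequence rather than a single vector. The NumPy sketch below is a from-scratch LSTM forward pass (random weights, gates laid out in the i, f, c, o order Keras uses for its kernels, no dropout) meant only to show the shapes involved and that the per-layer weight sizes match the Keras parameter counts; it is not Keras' actual implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(x, W, U, b, return_sequences=False):
    """x: (timesteps, input_dim); W: (input_dim, 4u); U: (u, 4u); b: (4u,)."""
    units = U.shape[0]
    h = np.zeros(units)
    c = np.zeros(units)
    outputs = []
    for x_t in x:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:units])                  # input gate
        f = sigmoid(z[units:2 * units])         # forget gate
        g = np.tanh(z[2 * units:3 * units])     # candidate cell state
        o = sigmoid(z[3 * units:])              # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    # all hidden states (timesteps, units) or only the last one (units,)
    return np.stack(outputs) if return_sequences else h


def init(input_dim, units, rng):
    return (rng.normal(scale=0.1, size=(input_dim, 4 * units)),
            rng.normal(scale=0.1, size=(units, 4 * units)),
            np.zeros(4 * units))


rng = np.random.default_rng(0)
x = rng.normal(size=(40, 8))        # one tweet: 40 steps of 8-dim embeddings
layer1 = init(8, 32, rng)
layer2 = init(32, 32, rng)

seq = lstm_forward(x, *layer1, return_sequences=True)    # (40, 32)
last = lstm_forward(seq, *layer2)                         # (32,)
print(seq.shape, last.shape)
print(sum(p.size for p in layer1), sum(p.size for p in layer2))  # 5248 8320
```

The last row of the full sequence is exactly what the same layer would return with `return_sequences=False`, which is why only the first LSTM in a stack needs the flag.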


In [420]:
# format y_pred as labels 
y_pred = model.predict(X_test).argmax(axis=1)
predicted_labels = [y_test.columns[i] for i in y_pred]
predicted_labels[0:5]

['real', 'fake', 'fake', 'real', 'real']

## Submit Model 3

In [421]:
%%capture
# install aimodelshare library (%%capture must be the first line of the cell)
!pip install aimodelshare --upgrade --extra-index-url https://test.pypi.org/simple/

In [422]:
import aimodelshare as ai
from aimodelshare.aimsonnx import model_to_onnx

In [423]:
# save preprocessor
ai.export_preprocessor(preprocessor,"")

In [424]:
# save model in onnx format
onnx_model = model_to_onnx(model, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("onnx_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())



INFO:tensorflow:Assets written to: /tmp/assets



In [425]:
# set credentials for modeltoapi function 
# make sure you have uploaded your credentials.txt file
from aimodelshare.aws import set_credentials
api_url = "https://wvr23l2z9i.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=api_url,credential_file="credentials.txt", type="submit_model", manual=False)

AI Model Share login credentials set successfully.
AWS credentials set successfully.


In [426]:
# submit model and predictions to competition
ai.submit_model("onnx_model.onnx",
                api_url,
                prediction_submission=predicted_labels,
                preprocessor="preprocessor.zip")

'Your model has been submitted as model version 56'

In [427]:
# check leaderboard
data=ai.get_leaderboard(api_url, verbose=3)
ai.leaderboard.stylize_leaderboard(data)

| rank | accuracy | f1_score | precision | recall | ml_framework | transfer_learning | deep_learning | model_type | depth | num_params | bidirectional_layers | conv1d_layers | dense_layers | embedding_layers | flatten_layers | globalmaxpooling1d_layers | lstm_layers | maxpooling1d_layers | simplernn_layers | relu_act | sigmoid_act | softmax_act | tanh_act | loss | optimizer | model_config | username | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 94.86% | 94.85% | 94.84% | 94.87% | keras | False | True | Sequential | 5 | 1035746 | | | 2 | 1 | | | 2.0 | | | 1.0 | | 1.0 | 2.0 | str | RMSprop | {'name': 'sequential_3', 'laye... | kagenlim | 19 |
| 1 | 94.49% | 94.47% | 94.47% | 94.48% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential', 'layers... | newusertest | 4 |
| 2 | 94.35% | 94.34% | 94.32% | 94.37% | keras | False | True | Sequential | 6 | 148066 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | 2.0 | | 1.0 | | str | RMSprop | {'name': 'sequential_72', 'lay... | prajseth | 40 |
| 3 | 94.25% | 94.24% | 94.24% | 94.24% | keras | False | True | Sequential | 3 | 98818 | | | 1 | 1 | | | 1.0 | | | | | 1.0 | 1.0 | str | RMSprop | {'name': 'sequential_78', 'lay... | prajseth | 41 |
| 4 | 94.21% | 94.19% | 94.20% | 94.19% | keras | False | True | Sequential | 6 | 32098786 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | | | 1.0 | | str | RMSprop | {'name': 'sequential_118', 'la... | prajseth | 51 |
| 5 | 94.11% | 94.10% | 94.09% | 94.11% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential', 'layers... | prajseth | 21 |
| 6 | 94.11% | 94.10% | 94.11% | 94.09% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential_4', 'laye... | newusertest | 5 |
| 7 | 94.11% | 94.10% | 94.12% | 94.08% | keras | False | True | Sequential | 6 | 32098786 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | | | 1.0 | | str | Adam | {'name': 'sequential_122', 'la... | prajseth | 53 |
| 8 | 94.07% | 94.06% | 94.04% | 94.11% | keras | False | True | Sequential | 6 | 200834 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | 2.0 | | 1.0 | | str | RMSprop | {'name': 'sequential_92', 'lay... | prajseth | 47 |
| 9 | 94.07% | 94.06% | 94.04% | 94.09% | keras | False | True | Sequential | 3 | 3208386 | | | 1 | 1 | | | 1.0 | | | | | 1.0 | 1.0 | str | RMSprop | {'name': 'sequential_40', 'lay... | prajseth | 30 |


## Model 4 (Bidirectional LSTM)

In [428]:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.layers import Embedding, Bidirectional
from tensorflow.keras.layers import LSTM

model = Sequential()
# input_dim is larger than needed: the tokenizer only emits indices below 10000
model.add(Embedding(100000, 32, input_length=40))
model.add(Bidirectional(LSTM(32, return_sequences=True)))
model.add(Bidirectional(LSTM(10)))
model.add(Dense(2, activation='softmax'))


model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])


history = model.fit(X_train, y_train,
                    epochs=10,
                    batch_size=40,
                    validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
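`Bidirectional` runs a second copy of the wrapped layer over the reversed sequence and, with the default `merge_mode='concat'`, concatenates the two results, so `Bidirectional(LSTM(32, return_sequences=True))` outputs 64 features per step and `Bidirectional(LSTM(10))` outputs 20. To keep the sketch short, it deliberately swaps in a plain tanh RNN cell for the LSTM; the point is the wrapping logic (run forward, run on the reversed input, re-reverse the per-step backward outputs so they line up in time, concatenate), not the cell itself.

```python
import numpy as np


def rnn_forward(x, W, U, b, return_sequences=False):
    """Simple tanh RNN (stand-in for an LSTM). x: (timesteps, input_dim)."""
    h = np.zeros(U.shape[0])
    outs = []
    for x_t in x:
        h = np.tanh(x_t @ W + h @ U + b)
        outs.append(h)
    return np.stack(outs) if return_sequences else h


def bidirectional(x, fwd, bwd, return_sequences=False):
    forward = rnn_forward(x, *fwd, return_sequences=return_sequences)
    backward = rnn_forward(x[::-1], *bwd, return_sequences=return_sequences)
    if return_sequences:
        backward = backward[::-1]   # realign backward outputs with the original time steps
    return np.concatenate([forward, backward], axis=-1)   # merge_mode='concat'


def init(input_dim, units, rng):
    return (rng.normal(scale=0.1, size=(input_dim, units)),
            rng.normal(scale=0.1, size=(units, units)),
            np.zeros(units))


rng = np.random.default_rng(0)
x = rng.normal(size=(40, 32))        # 40 steps of 32-dim embeddings, as in Model 4
seq = bidirectional(x, init(32, 32, rng), init(32, 32, rng), return_sequences=True)
vec = bidirectional(seq, init(64, 10, rng), init(64, 10, rng))
print(seq.shape, vec.shape)          # (40, 64) (20,)
```

Because each direction has its own weights, a `Bidirectional` layer has twice the parameters of the layer it wraps.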


In [279]:
# format y_pred as labels 
y_pred = model.predict(X_test).argmax(axis=1)
predicted_labels = [y_test.columns[i] for i in y_pred]
predicted_labels[0:5]

['real', 'fake', 'fake', 'real', 'real']

## Submit Model 4

In [429]:
%%capture
# install aimodelshare library (%%capture must be the first line of the cell)
!pip install aimodelshare --upgrade --extra-index-url https://test.pypi.org/simple/

In [430]:
import aimodelshare as ai
from aimodelshare.aimsonnx import model_to_onnx

In [431]:
# save preprocessor
ai.export_preprocessor(preprocessor,"")

In [432]:
# save model in onnx format
onnx_model = model_to_onnx(model, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("onnx_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())



INFO:tensorflow:Assets written to: /tmp/assets



In [433]:
# set credentials for modeltoapi function 
# make sure you have uploaded your credentials.txt file
from aimodelshare.aws import set_credentials
api_url = "https://wvr23l2z9i.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=api_url,credential_file="credentials.txt", type="submit_model", manual=False)

AI Model Share login credentials set successfully.
AWS credentials set successfully.


In [434]:
# submit model and predictions to competition
ai.submit_model("onnx_model.onnx",
                api_url,
                prediction_submission=predicted_labels,
                preprocessor="preprocessor.zip")

'Your model has been submitted as model version 57'

In [435]:
# check leaderboard
data=ai.get_leaderboard(api_url, verbose=3)
ai.leaderboard.stylize_leaderboard(data)

| rank | accuracy | f1_score | precision | recall | ml_framework | transfer_learning | deep_learning | model_type | depth | num_params | bidirectional_layers | conv1d_layers | dense_layers | embedding_layers | flatten_layers | globalmaxpooling1d_layers | lstm_layers | maxpooling1d_layers | simplernn_layers | relu_act | sigmoid_act | softmax_act | tanh_act | loss | optimizer | model_config | username | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 94.86% | 94.85% | 94.84% | 94.87% | keras | False | True | Sequential | 5 | 1035746 | | | 2 | 1 | | | 2.0 | | | 1.0 | | 1.0 | 2.0 | str | RMSprop | {'name': 'sequential_3', 'laye... | kagenlim | 19 |
| 1 | 94.49% | 94.47% | 94.47% | 94.48% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential', 'layers... | newusertest | 4 |
| 2 | 94.35% | 94.34% | 94.32% | 94.37% | keras | False | True | Sequential | 6 | 148066 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | 2.0 | | 1.0 | | str | RMSprop | {'name': 'sequential_72', 'lay... | prajseth | 40 |
| 3 | 94.25% | 94.24% | 94.24% | 94.24% | keras | False | True | Sequential | 3 | 98818 | | | 1 | 1 | | | 1.0 | | | | | 1.0 | 1.0 | str | RMSprop | {'name': 'sequential_78', 'lay... | prajseth | 41 |
| 4 | 94.21% | 94.19% | 94.20% | 94.19% | keras | False | True | Sequential | 6 | 32098786 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | | | 1.0 | | str | RMSprop | {'name': 'sequential_118', 'la... | prajseth | 51 |
| 5 | 94.11% | 94.10% | 94.09% | 94.11% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential', 'layers... | prajseth | 21 |
| 6 | 94.11% | 94.10% | 94.11% | 94.09% | keras | False | True | Sequential | 3 | 161282 | | | 1 | 1 | 1.0 | | | | | | | 1.0 | | str | RMSprop | {'name': 'sequential_4', 'laye... | newusertest | 5 |
| 7 | 94.11% | 94.10% | 94.12% | 94.08% | keras | False | True | Sequential | 6 | 32098786 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | | | 1.0 | | str | Adam | {'name': 'sequential_122', 'la... | prajseth | 53 |
| 8 | 94.07% | 94.06% | 94.04% | 94.11% | keras | False | True | Sequential | 6 | 200834 | | 2.0 | 1 | 1 | 1.0 | | | 1.0 | | 2.0 | | 1.0 | | str | RMSprop | {'name': 'sequential_92', 'lay... | prajseth | 47 |
| 9 | 94.07% | 94.06% | 94.04% | 94.09% | keras | False | True | Sequential | 3 | 3208386 | | | 1 | 1 | | | 1.0 | | | | | 1.0 | 1.0 | str | RMSprop | {'name': 'sequential_40', 'lay... | prajseth | 30 |


## Discuss which models performed better and point out relevant hyper-parameter values for successful models.


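Among my submissions, the model with 1 Embedding layer, 2 Conv1D layers, 1 MaxPooling1D layer, 1 Flatten layer and 1 Dense layer performed best (version 40, 94.35% leaderboard accuracy). Its key hyperparameters, which can be read off the summary below, are a 10,000-word vocabulary with 8-dimensional embeddings, a first Conv1D layer with 256 filters of width 5, max pooling with a pool size of 4, and a second Conv1D layer with 32 filters of width 7. The summary of the model is below.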

In [454]:
bestmodel_prajwal_1 = ai.aimsonnx.instantiate_model(api_url, version=40)

bestmodel_prajwal_1.summary()

Model: "sequential_72"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_68 (Embedding)     (None, 40, 8)             80000     
_________________________________________________________________
conv1d_109 (Conv1D)          (None, 36, 256)           10496     
_________________________________________________________________
max_pooling1d_39 (MaxPooling (None, 9, 256)            0         
_________________________________________________________________
conv1d_110 (Conv1D)          (None, 3, 32)             57376     
_________________________________________________________________
flatten_43 (Flatten)         (None, 96)                0         
_________________________________________________________________
dense_52 (Dense)             (None, 2)                 194       
Total params: 148,066
Trainable params: 148,066
Non-trainable params: 0
_______________________________________________

My second-best model (version 41, 94.25% leaderboard accuracy) had 1 Embedding layer, 1 LSTM layer and 1 Dense layer, with 8-dimensional embeddings feeding a 64-unit LSTM. Its summary is below.

In [455]:
bestmodel_prajwal_2 = ai.aimsonnx.instantiate_model(api_url, version=41)

bestmodel_prajwal_2.summary()

Model: "sequential_78"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_74 (Embedding)     (None, 40, 8)             80000     
_________________________________________________________________
lstm_14 (LSTM)               (None, 64)                18688     
_________________________________________________________________
dense_58 (Dense)             (None, 2)                 130       
Total params: 98,818
Trainable params: 98,818
Non-trainable params: 0
_________________________________________________________________


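My third-best model (version 51, 94.21% leaderboard accuracy) used the same layer sequence as my best model (1 Embedding, 2 Conv1D, 1 MaxPooling1D, 1 Flatten and 1 Dense layer), but with a much larger embedding: an input dimension of 1,000,000 and 32-dimensional vectors, for 32,000,000 embedding parameters versus 80,000. The wider embedding also raised the first Conv1D layer's parameter count (41,216 vs. 10,496), while the second Conv1D and Dense layers are unchanged. Since the tokenizer only produces word indices below 10,000, most of those embedding rows are never looked up, and roughly 217 times as many parameters did not improve accuracy (94.21% vs. 94.35%). Its summary is below.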

In [456]:
bestmodel_prajwal_3 = ai.aimsonnx.instantiate_model(api_url, version=51)

bestmodel_prajwal_3.summary()

Model: "sequential_118"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_114 (Embedding)    (None, 40, 32)            32000000  
_________________________________________________________________
conv1d_166 (Conv1D)          (None, 36, 256)           41216     
_________________________________________________________________
max_pooling1d_67 (MaxPooling (None, 9, 256)            0         
_________________________________________________________________
conv1d_167 (Conv1D)          (None, 3, 32)             57376     
_________________________________________________________________
flatten_71 (Flatten)         (None, 96)                0         
_________________________________________________________________
dense_103 (Dense)            (None, 2)                 194       
Total params: 32,098,786
Trainable params: 32,098,786
Non-trainable params: 0
________________________________________

## Import the best model from the leaderboard (whatever the best model is after your final submission). Visualize the model's structure using tf.keras's model.summary(). Explain how the model's structure is different from your best model.

In [457]:
# Get best model architecture and view model summary, change version arg as needed

bestmodel = ai.aimsonnx.instantiate_model(api_url, version=19)

bestmodel.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 40, 100)           1000000   
_________________________________________________________________
lstm_2 (LSTM)                (None, 40, 32)            17024     
_________________________________________________________________
lstm_3 (LSTM)                (None, 50)                16600     
_________________________________________________________________
dense_2 (Dense)              (None, 40)                2040      
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 82        
Total params: 1,035,746
Trainable params: 1,035,746
Non-trainable params: 0
_________________________________________________________________
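The parameter counts in this summary can be reproduced by hand. An LSTM layer has four gates, each with an input kernel, a recurrent kernel, and a bias, so it has 4 × (units × (input_dim + units) + units) parameters, and a Dense layer has input_dim × units + units. The sketch below checks those formulas against the summary. The vocabulary size of 10,000 is inferred from the embedding's 1,000,000 parameters divided by its output dimension of 100, so treat it as an assumption.

```python
def embedding_params(vocab_size: int, output_dim: int) -> int:
    # One trainable vector of length output_dim per vocabulary entry.
    return vocab_size * output_dim


def lstm_params(input_dim: int, units: int) -> int:
    # Four gates (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel, and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units


# Layer-by-layer counts for the leaderboard model (version 19).
# ASSUMPTION: vocab_size=10_000 is inferred from 1,000,000 / 100.
layers = {
    "embedding_1": embedding_params(vocab_size=10_000, output_dim=100),
    "lstm_2": lstm_params(input_dim=100, units=32),
    "lstm_3": lstm_params(input_dim=32, units=50),
    "dense_2": dense_params(input_dim=50, units=40),
    "dense_3": dense_params(input_dim=40, units=2),
}

for name, count in layers.items():
    print(f"{name:12s} {count:>9,d}")
print(f"{'total':12s} {sum(layers.values()):>9,d}")
```

Each printed count should match the `Param #` column of the summary above, which is a quick way to confirm how the layers are wired together.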


In [458]:
# Compare two model versions to see diffs
ai.aimsonnx.compare_models(api_url, version_list=[19,40]) 



| Layer # | Model 19 layer | Model 19 output shape | Model 19 params | Model 40 layer | Model 40 output shape | Model 40 params |
|---|---|---|---|---|---|---|
| 0 | Embedding | (None, 40, 100) | 1,000,000 | Embedding | (None, 40, 8) | 80,000 |
| 1 | LSTM | (None, 40, 32) | 17,024 | Conv1D | (None, 36, 256) | 10,496 |
| 2 | LSTM | (None, 50) | 16,600 | MaxPooling1D | (None, 9, 256) | 0 |
| 3 | Dense | (None, 40) | 2,040 | Conv1D | (None, 3, 32) | 57,376 |
| 4 | Dense | (None, 2) | 82 | Flatten | (None, 96) | 0 |
| 5 | | | | Dense | (None, 2) | 194 |


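The leaderboard's best model (version 19) consists of one Embedding layer, two stacked LSTM layers, and two Dense layers. My best model (version 40) consists of one Embedding layer, two Conv1D layers, one MaxPooling1D layer, one Flatten layer, and one Dense layer. The two architectures are fundamentally different: the leaderboard model reads the token sequence with two stacked LSTMs, whereas mine extracts local patterns over short windows of tokens with two Conv1D layers separated by max pooling. The leaderboard model also has roughly seven times as many parameters (1,035,746 vs. 148,066), and about 97% of them sit in its 100-dimensional embedding layer (1,000,000 parameters, compared with 80,000 for my 8-dimensional embedding).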
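The 148,066 figure for my model can be checked the same way. A Conv1D layer with `valid` padding and stride 1 shrinks the sequence length from L to L - k + 1 and has kernel_size × input_channels × filters + filters parameters, while MaxPooling1D divides the length by its pool size. The kernel sizes (5 and 7) and pool size (4) below are inferred from the output shapes and parameter counts in the comparison table, so they are assumptions rather than values read from the original model code.

```python
def conv1d_output(length: int, kernel_size: int) -> int:
    # 'valid' padding, stride 1: the kernel must fit entirely inside the sequence.
    return length - kernel_size + 1


def conv1d_params(kernel_size: int, in_channels: int, filters: int) -> int:
    # One kernel_size x in_channels kernel per filter, plus one bias per filter.
    return kernel_size * in_channels * filters + filters


def maxpool1d_output(length: int, pool_size: int) -> int:
    # Strides default to pool_size and padding is 'valid'.
    return (length - pool_size) // pool_size + 1


# ASSUMPTIONS: vocab inferred from 80,000 / 8; kernel sizes 5 and 7 and
# pool size 4 inferred from the shapes and counts in the table.
seq_len, embed_dim, vocab = 40, 8, 10_000

params = {"embedding": vocab * embed_dim}

length = conv1d_output(seq_len, kernel_size=5)  # 36
params["conv1d_a"] = conv1d_params(5, embed_dim, 256)

length = maxpool1d_output(length, pool_size=4)  # 9
params["max_pooling1d"] = 0

length = conv1d_output(length, kernel_size=7)  # 3
params["conv1d_b"] = conv1d_params(7, 256, 32)

flat = length * 32  # 96 features after Flatten
params["flatten"] = 0
params["dense"] = flat * 2 + 2  # two output units

mine = sum(params.values())
leaderboard = 1_035_746
print(f"my model: {mine:,d} params")
print(f"ratio:    {leaderboard / mine:.2f}x")
```

The Conv1D model is much smaller mainly because its embedding is 8-dimensional instead of 100-dimensional; the convolutional layers themselves hold fewer than 70,000 parameters.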

## Fit the best model from the leaderboard to training data and evaluate it on test data to complete your report.


In [459]:
bestmodel.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])


history = bestmodel.fit(X_train, y_train,
                    epochs=10,
                    batch_size=40,
                    validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
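With `validation_split=0.2`, Keras's `fit()` holds out the last 20% of `X_train`/`y_train` (taken before any shuffling) and trains on the remaining rows in batches of 40. The sketch below mirrors that bookkeeping for an arbitrary sample count; `split_sizes` is a hypothetical helper and `n_samples` is a placeholder, not the size of this notebook's training set. Because the split is positional, it is worth checking that the training rows are not ordered by label before calling `fit`.

```python
import math


def split_sizes(n_samples: int, validation_split: float, batch_size: int):
    """Sketch of how fit() carves out validation data from in-memory arrays.

    The first floor(n * (1 - validation_split)) rows are used for training and
    the remaining tail rows are used for validation.
    """
    split_at = math.floor(n_samples * (1.0 - validation_split))
    n_train, n_val = split_at, n_samples - split_at
    steps_per_epoch = math.ceil(n_train / batch_size)  # last batch may be partial
    return n_train, n_val, steps_per_epoch


n_samples = 1_000  # hypothetical; substitute len(X_train)
print(split_sizes(n_samples, validation_split=0.2, batch_size=40))
```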


In [460]:
# format y_pred as labels 
y_pred = bestmodel.predict(X_test).argmax(axis=1)
predicted_labels = [y_test.columns[i] for i in y_pred]
predicted_labels[0:5]

['real', 'fake', 'fake', 'real', 'real']
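The cell above only produces predicted labels; to report test performance they still have to be compared with the true labels. Assuming `y_test` is the one-hot label DataFrame whose columns are used above, the true labels can be recovered with `y_test.idxmax(axis=1)`, and `bestmodel.evaluate(X_test, y_test)` should also return the loss and the compiled `acc` metric directly. The stdlib sketch below computes accuracy and per-class precision and recall; `summarize_predictions` is a hypothetical helper, and the short label lists are made-up placeholders, not results from this notebook.

```python
from collections import Counter


def summarize_predictions(y_true, y_pred, labels=("fake", "real")):
    """Accuracy plus per-class precision and recall from two label lists."""
    assert len(y_true) == len(y_pred), "label lists must align"
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    accuracy = sum(pairs[(lab, lab)] for lab in labels) / len(y_true)
    report = {"accuracy": accuracy}
    for lab in labels:
        tp = pairs[(lab, lab)]
        predicted = sum(n for (_, p), n in pairs.items() if p == lab)
        actual = sum(n for (t, _), n in pairs.items() if t == lab)
        report[lab] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return report


# Placeholder labels for illustration only. In the notebook these would be
# y_true = list(y_test.idxmax(axis=1)) and y_pred = predicted_labels.
y_true = ["real", "fake", "fake", "real", "fake"]
y_pred = ["real", "fake", "real", "real", "fake"]
print(summarize_predictions(y_true, y_pred))
```

Reporting precision and recall for the `fake` class alongside accuracy makes it clearer how often misinformation slips through as `real`.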

## Complete your report by feeding your model some realistic tweets to see if it returns meaningful/useful results (these tweets can be found online or you can create them yourself).

In [522]:
# Each of these tweets, found while browsing Twitter, contains misinformation, so a useful model should label all of them 'fake'.
# Check y_test.columns to see which argmax index corresponds to 'fake'.

realistic_tweets = ["Let's not forget chloroquine/hydroxychloroquine, the efficacy of which has been well known to the NIH, Fauci et al. for the best part of SIXTEEN years now, but which, at roughly $20 per treatment per patient, apparently lacks suitable profit margins.",
                    'This is the same year Bill Gates said we would be a deadly pandemic. 2005!  What a coincidence!',
                    "While #Fauci intentionally lies. This man speaks. Dark to light",
                    "As the country starts opening up, #Fauci tries to explain away why states like #Florida & #Texas, without mask mandates and open to full capacity, have plummeting case rates.",
                    "#Fauci Funded the #Wuhan Research that Created Corona-19 https://paulcraigroberts.org/2021/04/08/fauci-funded-the-wuhan-research-that-created-corona-19/",
                    "SCAMDEMIC!!! CONFUSION: Hundreds Testing Positive for Covid After Being Fully Vaccinated! https://infowars.com/posts/confusion-hundreds-testing-positive-for-covid-after-being-fully-vaccinated/ #MAGA #Trump",
                    "No vaccine for HIV after almost 50 years of research. No vaccine for the common cold after over 100 years. But hey, COVID appears out of nowhere and within an year bam, We got a vaccine. I thought we shouldn’t peer pressure one another into trying experimental drugs. #maga",
                    "I noticed that too. The tracking chip has better cell and satellite coverage. #MAGA get your #Covid_19 tracking chip.",
                    ]
transform_tweets = preprocessor(realistic_tweets, maxlen=40, max_words=10000)
bestmodel_prajwal_1.predict(transform_tweets).argmax(axis=1)  # predictions from my best model

array([0, 0, 0, 1, 0, 0, 0, 1])
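One thing to verify before trusting these predictions is that `preprocessor` encodes new tweets with the vocabulary learned from the training tweets. If it refits its tokenizer on whatever text it is given, the word indices for these eight tweets would not line up with the embedding rows the models were trained on. The sketch below illustrates the idea with a hypothetical minimal tokenizer (`tokenize`, `fit_vocab`, and `encode` are illustrative names, not functions from this notebook): the vocabulary is built once from training text, and new text is mapped through it, with unknown words dropped and sequences truncated and pre-padded with zeros, which matches the default behaviour of Keras's `pad_sequences`.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase; keep runs of letters, digits, '#', '@' and apostrophes (simplified).
    return re.findall(r"[a-z0-9#@']+", text.lower())


def fit_vocab(texts, max_words=10_000):
    # Index 0 is reserved for padding; most frequent words get the smallest ids.
    counts = Counter(tok for t in texts for tok in tokenize(t))
    most_common = [w for w, _ in counts.most_common(max_words - 1)]
    return {w: i for i, w in enumerate(most_common, start=1)}


def encode(texts, vocab, maxlen=40):
    rows = []
    for t in texts:
        ids = [vocab[tok] for tok in tokenize(t) if tok in vocab]  # drop unknown words
        ids = ids[-maxlen:]  # keep the last maxlen tokens
        rows.append([0] * (maxlen - len(ids)) + ids)  # pre-pad with zeros
    return rows


train_texts = ["vaccines are safe", "masks reduce spread", "vaccines reduce deaths"]
vocab = fit_vocab(train_texts)  # fit ONCE, on training text only
new = encode(["vaccines contain tracking chips"], vocab, maxlen=8)
print(new)
```

Only "vaccines" is in the training vocabulary here, so the new tweet collapses to a single non-zero id. Many words in the realistic tweets (for example "scamdemic") are likely to be missing from the training vocabulary in the same way, which limits what either model can infer from them.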

In [523]:
bestmodel.predict(transform_tweets).argmax(axis=1)  # predictions from the leaderboard's best model on the realistic tweets

array([1, 0, 0, 1, 0, 0, 0, 0])
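To read the two arrays above as labels, the argmax indices have to be mapped through the same column order used earlier for `predicted_labels` (`y_test.columns`). The sketch below assumes that order is `['fake', 'real']`, which is what `pd.get_dummies` produces because it sorts categories alphabetically; confirm it by printing `y_test.columns` before relying on the counts. Under that assumption, it counts how many of the eight misinformation tweets each model labels as fake and where the two models disagree.

```python
# Argmax outputs copied from the two cells above.
my_model = [0, 0, 0, 1, 0, 0, 0, 1]
leaderboard_model = [1, 0, 0, 1, 0, 0, 0, 0]

# ASSUMPTION: column order of y_test is ['fake', 'real'] (alphabetical, as
# pd.get_dummies produces); verify with y_test.columns.
columns = ["fake", "real"]

for name, preds in [("my model", my_model), ("leaderboard model", leaderboard_model)]:
    labels = [columns[i] for i in preds]
    n_fake = labels.count("fake")
    print(f"{name}: {n_fake}/{len(labels)} flagged as fake -> {labels}")

# Tweets on which the two models disagree (0-based positions in realistic_tweets).
disagree = [i for i, (a, b) in enumerate(zip(my_model, leaderboard_model)) if a != b]
print("disagreements at positions:", disagree)
```

If the column order is confirmed, both models flag six of the eight tweets as fake but miss different ones: they disagree on the first (hydroxychloroquine) and last (tracking chip) tweets, and both label the fourth tweet (Florida and Texas case rates) as real.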