# Detecting COVID-19 misinformation in tweets
## Author: Garda Ramadhito
## Date: 4/19/2021


## Import Data

In [33]:
import numpy as np

In [1]:
# Source: Fighting an Infodemic: COVID-19 Fake News Dataset, https://github.com/diptamath/covid_fake_news, https://arxiv.org/abs/2011.03327

import pandas as pd
trainingdata=pd.read_csv("https://raw.githubusercontent.com/diptamath/covid_fake_news/main/data/Constraint_Train.csv", usecols = ['tweet','label'])
testdata=pd.read_csv("https://raw.githubusercontent.com/diptamath/covid_fake_news/main/data/english_test_with_labels.csv", usecols = ['tweet','label'])

trainingdata

| | tweet | label |
|---|---|---|
| 0 | The CDC currently reports 99031 deaths. In gen... | real |
| 1 | States reported 1121 deaths a small rise from ... | real |
| 2 | Politically Correct Woman (Almost) Uses Pandem... | fake |
| 3 | #IndiaFightsCorona: We have 1524 #COVID testin... | real |
| 4 | Populous states can generate large case counts... | real |
| ... | ... | ... |
| 6415 | A tiger tested positive for COVID-19 please st... | fake |
| 6416 | ???Autopsies prove that COVID-19 is??� a blood... | fake |
| 6417 | _A post claims a COVID-19 vaccine has already ... | fake |
| 6418 | Aamir Khan Donate 250 Cr. In PM Relief Cares Fund | fake |


### Present examples of tweets from the dataset that demonstrate real information or misinformation.

The tweets below are examples of real information.

In [2]:
#Real Tweets

realtweets_10 = trainingdata[trainingdata['label'] == "real"]["tweet"].sample(n=10, random_state=15)

for tweet in realtweets_10:
  print(tweet)

You must act now in Oklahoma. To prevent hospital overload our projections indicate a Stay at Home order must be implemented between May 13th and May 18th at the latest. The sooner you act the more lives you save. Learn more https://t.co/PNFRYte3x2 #actnow https://t.co/zTRAM9878J
Take daily precautions to help prevent the spread of respiratory illnesses like #COVID19. Learn how to protect yourself from coronavirus (COVID-19): https://t.co/uArGZTrH5L. https://t.co/K1HT3mTsQ8
CDC Briefing Room: Dr. Nancy Messonnier gives an update on #COVID19. For more info visit https://t.co/1ifchVQ9jm https://t.co/50P7N2OoV4
People within the health sector and care sector are of course getting access to testing and they have been prioritised. @PritiPatel responds to reports that a lack of #COVID19 testing is forcing some frontline staff to miss work. #KayBurley: https://t.co/BvKsZv1hCC https://t.co/OtMzyZiGBY
Older adults and people of all ages with severe chronic medical conditions are more likely to 

The tweets below are examples of misinformation.

In [3]:
#Misinformation Tweets

faketweets_10 = trainingdata[trainingdata['label'] == "fake"]["tweet"].sample(n=10, random_state=15)

for tweet in faketweets_10:
  print(tweet)

Guatemalan Ministry of Health is hiding a COVID-19 case in a hospital in the city of Amatitlán.
NEWS! Boris denies country moved to the ‘pretend everything is fine’ stage of coronavirus strategy too soon https://t.co/1gyAKEoLC5 https://t.co/wPOkewOnEJ
Nasal flu vaccine side effects do not cause covid
BCCI officially suspends IPL 2020 indefinitely in the wake of Coronavirus pandemic.
Coronavirus ‘still not as stressful as explaining Windows 10 upgrade to your parents’ https://t.co/c6JiLnWADx
18 nations including the US and the UK want Indian Prime Minister Narendra Modi to be the leader of the task force to combat the coronavirus pandemic.
Disney World reopens in hopes that inevitable parental deaths will inspire future films #Disney #Covid_19 https://t.co/UoCg7nQi4a
???CDC announces that students may be kept from parents overnight as Ohio sets up COVID-19 FEMA camps.??�
.@realDonaldTrump has claimed that the U.S. has "one of the lowest mortality rates in the world” from COVID-19. That'

### Discuss the dataset in general terms and describe why building a predictive model using this data might be practically useful. Who could benefit from a model like this? Explain.

This dataset contains tweets about COVID-19 from various sources, each manually labeled as either real or fake. A predictive model trained on this data could warn users when a tweet may contain misinformation. COVID-19 information is difficult to assess because research on the disease is still in progress and much of what is reported is not yet settled knowledge. For instance, the tweet 'vitamin D supplements can prevent a COVID-19 infection and reduce severity' may be based on a study finding that COVID-19 patients with higher vitamin D levels have less severe symptoms than those with lower levels. However, the tweet can still be misinformation because it states a tentative finding as a certainty and does not cite a source for the claim.

A model that warns the user that such a tweet may be misinformation can help distinguish fact from a baseless claim. The immediate beneficiary is the user, who can make better choices with better information. Public health officials also benefit, because flagging hoaxes can reduce their harmful effect on compliance with public health measures.

## Define Preprocessor

In [4]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Build vocabulary from training text data
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(trainingdata.tweet)

# preprocessor tokenizes words and pads or truncates every document to the same length
def preprocessor(data, maxlen, max_words):
    # max_words is kept for call compatibility; the vocabulary size is set
    # by the Tokenizer (num_words=10000) fitted above
    sequences = tokenizer.texts_to_sequences(data)
    X = pad_sequences(sequences, maxlen=maxlen)

    return X
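
To make the padding step concrete, here is a minimal pure-Python sketch of what `pad_sequences` does with its default settings: zeros are added at the front, and overlong sequences lose their earliest tokens. The helper `pad_left` and the toy sequences are hypothetical and only illustrate the behavior; the notebook itself relies on the Keras function.

```python
def pad_left(sequences, maxlen, value=0):
    """Mimic the pad_sequences defaults: pre-padding and pre-truncation."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

# Toy token ids standing in for tokenizer.texts_to_sequences output
toy_sequences = [[5, 12, 7], [3, 9, 4, 8, 2, 6]]
result = pad_left(toy_sequences, maxlen=4)
print(result)
```

With `maxlen=40`, as used in this notebook, a short tweet is left-padded with zeros and a long tweet keeps only its last 40 tokens.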

## Prepare Train and Test Data

In [5]:
# tokenize and pad X data
X_train = preprocessor(trainingdata.tweet, maxlen=40, max_words=10000)
X_test = preprocessor(testdata.tweet, maxlen=40, max_words=10000)

# one-hot encode y data (get_dummies orders the columns alphabetically: fake, real)
y_train = pd.get_dummies(trainingdata.label)
y_test = pd.get_dummies(testdata.label)

In [6]:
print(X_train.shape)
print(X_test.shape)

(6420, 40)
(2140, 40)


### Model 1: Embedding Layers with LSTM

In [11]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM, Flatten

In [52]:
model1 = Sequential()
model1.add(Embedding(input_dim=10000, output_dim =16, input_length=40))
model1.add(LSTM(16))
model1.add(Flatten())  # no effect here: LSTM(16) already returns a 2D (batch, 16) tensor
model1.add(Dense(2, activation='sigmoid'))

model1.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

history1 = model1.fit(X_train, y_train,
                    epochs=15,
                    batch_size=32,
                    validation_data=(X_test, y_test))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
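
Model 1 pairs two sigmoid outputs with `categorical_crossentropy`, a loss that is more commonly paired with a softmax output. The sketch below, which uses made-up logits and only the standard library, illustrates the difference: sigmoid scores are independent and need not sum to 1, while softmax scores form a probability distribution. Because both functions are monotonic in the logits, `argmax` selects the same class either way.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.2, -0.4]  # hypothetical pre-activation values for (fake, real)
sig = [sigmoid(z) for z in logits]
soft = softmax(logits)

print("sigmoid scores:", sig, "sum =", sum(sig))
print("softmax scores:", soft, "sum =", sum(soft))
print("argmax:", sig.index(max(sig)), soft.index(max(soft)))
```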


In [53]:
# format y_pred as labels 
y_pred1 = model1.predict(X_test).argmax(axis=1)
predicted_labels1 = [y_test.columns[i] for i in y_pred1]
predicted_labels1[0:5]

['real', 'fake', 'fake', 'real', 'real']
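
The label lookup works because `pd.get_dummies` orders its columns alphabetically, so column 0 is `fake` and column 1 is `real`. A small stand-alone sketch of that mapping, using hypothetical class scores and no dependency on pandas or Keras:

```python
labels = ["real", "real", "fake", "real"]  # toy labels
columns = sorted(set(labels))              # alphabetical, like get_dummies
one_hot = [[int(lab == col) for col in columns] for lab in labels]

# Hypothetical model outputs, one row of class scores per tweet
scores = [[0.1, 0.9], [0.8, 0.3], [0.7, 0.2]]
predicted = [columns[row.index(max(row))] for row in scores]

print(columns)
print(one_hot)
print(predicted)
```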

In [55]:
#Evaluation metrics

metrics = ['loss', 'acc', 'val_loss', 'val_acc']
eval_results = {}

for metric in metrics:
  eval_results[metric] = np.mean(history1.history[metric])

for k, v in eval_results.items():
  print(k,v)

loss 0.07573909057925145
acc 0.9703426798184712
val_loss 0.18896872401237488
val_acc 0.9397196253140767
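
Note that these figures are averages over all 15 epochs rather than the values reached at the end of training. The sketch below uses an invented validation-accuracy history to show how the epoch mean, the final-epoch value, and the best-epoch value can differ; `val_acc_history` is a hypothetical stand-in for `history1.history['val_acc']`.

```python
from statistics import mean

val_acc_history = [0.88, 0.91, 0.93, 0.94, 0.95, 0.94]  # made-up values

epoch_mean = mean(val_acc_history)           # what the cell above reports
final_value = val_acc_history[-1]            # accuracy after the last epoch
best_value = max(val_acc_history)            # best epoch during training
best_epoch = val_acc_history.index(best_value) + 1

print(round(epoch_mean, 4), final_value, best_value, best_epoch)
```

The epoch mean is pulled down by the early, less accurate epochs, so it tends to understate what the trained model achieves.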


### Model 2: Embedding Layer with Bidirectional LSTM

In [40]:
from tensorflow.keras.layers import Bidirectional

In [56]:
model2 = Sequential()
model2.add(Embedding(10000, 32, input_length=40))
model2.add(Bidirectional(LSTM(32)))
model2.add(Dense(2, activation="sigmoid"))

model2.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history2 = model2.fit(X_train, y_train,
                    epochs=15,
                    batch_size=32,
                    validation_data=(X_test, y_test))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [63]:
y_pred2 = model2.predict(X_test).argmax(axis=1)
predicted_labels2 = [y_test.columns[i] for i in y_pred2]
predicted_labels2[0:5]

['real', 'fake', 'fake', 'real', 'real']

In [58]:
#Evaluation metrics

metrics = ['loss', 'acc', 'val_loss', 'val_acc']
eval_results = {}

for metric in metrics:
  eval_results[metric] = np.mean(history2.history[metric])

for k, v in eval_results.items():
  print(k,v)

loss 0.0611059367035826
acc 0.9761889934539795
val_loss 0.2025528202454249
val_acc 0.941370713710785


### Model 3: Bidirectional LSTM with dropout regularization

In [62]:
model3 = Sequential()
model3.add(Embedding(input_dim=10000, output_dim =32, input_length=40))
model3.add(Bidirectional(LSTM(32, dropout=0.2, recurrent_dropout=0.2)))
model3.add(Dense(2, activation="sigmoid"))

model3.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history3 = model3.fit(X_train, y_train,
                    epochs=15,
                    batch_size=32,
                    validation_data=(X_test, y_test))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [64]:
y_pred3 = model3.predict(X_test).argmax(axis=1)
predicted_labels3 = [y_test.columns[i] for i in y_pred3]
predicted_labels3[0:5]

['real', 'fake', 'fake', 'real', 'real']

In [66]:
#Evaluation metrics

metrics = ['loss', 'acc', 'val_loss', 'val_acc']
eval_results = {}

for metric in metrics:
  eval_results[metric] = np.mean(history3.history[metric])

for k, v in eval_results.items():
  print(k,v)

loss 0.06984704211354256
acc 0.9728348890940348
val_loss 0.19773690005143482
val_acc 0.9425856749216716


### Model 4: Embedding layer with 1D Convolution layer

In [71]:
from tensorflow.keras.layers import Conv1D, MaxPooling1D, GlobalMaxPooling1D

In [72]:
model4 = Sequential()
model4.add(Embedding(10000, 32, input_length=40))
model4.add(Conv1D(32, 8, activation='relu'))
model4.add(MaxPooling1D(5))
model4.add(Conv1D(32, 4, activation='relu'))
model4.add(GlobalMaxPooling1D())
model4.add(Dense(2, activation="sigmoid"))

model4.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history4 = model4.fit(X_train, y_train,
                    epochs=15,
                    batch_size=32,
                    validation_data=(X_test, y_test))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
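
The layer sizes in Model 4 determine how quickly the 40-token sequence shrinks. Assuming the Keras defaults that the code leaves unchanged (`padding='valid'`, a convolution stride of 1, and a pooling stride equal to the pool size), the sequence length after each layer can be worked out as follows. The helper names are hypothetical.

```python
def conv1d_len(length, kernel_size, stride=1):
    # 'valid' padding: the kernel must fit entirely inside the sequence
    return (length - kernel_size) // stride + 1

def maxpool1d_len(length, pool_size):
    # non-overlapping windows; a trailing partial window is dropped
    return (length - pool_size) // pool_size + 1

after_conv1 = conv1d_len(40, 8)                 # Conv1D(32, 8)
after_pool = maxpool1d_len(after_conv1, 5)      # MaxPooling1D(5)
after_conv2 = conv1d_len(after_pool, 4)         # Conv1D(32, 4)

print(after_conv1, after_pool, after_conv2)
```

Under these assumptions only three positions remain before `GlobalMaxPooling1D` reduces them to a single 32-value vector.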


In [74]:
y_pred4 = model4.predict(X_test).argmax(axis=1)
predicted_labels4 = [y_test.columns[i] for i in y_pred4]
predicted_labels4[0:5]

['real', 'fake', 'fake', 'real', 'real']

In [75]:
#Evaluation metrics

metrics = ['loss', 'acc', 'val_loss', 'val_acc']
eval_results = {}

for metric in metrics:
  eval_results[metric] = np.mean(history4.history[metric])

for k, v in eval_results.items():
  print(k,v)

loss 0.07885276121087373
acc 0.9684008280436198
val_loss 0.29927953481674197
val_acc 0.9229283452033996


### Discuss which models performed better and point out relevant hyper-parameter values for successful models.

The models with bidirectional LSTM layers performed best. The most successful was the bidirectional LSTM with dropout regularization (Model 3), which achieved a validation accuracy of 0.94259 and a training accuracy of 0.97283, both averaged over the 15 epochs. The margins are small, however: on the same measure Model 2 reached 0.94137 and Model 1 reached 0.93972, while the convolutional Model 4 trailed at 0.92293. The hyper-parameter values used by the best model are:
- input_dim=10000, output_dim=32, input_length=40 for the Embedding layer
- units=32, dropout=0.2, recurrent_dropout=0.2 for the Bidirectional LSTM layer
- activation="sigmoid" for the Dense layer
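
Ranking the four models by the mean validation accuracy printed earlier makes the size of the gaps explicit. The values are copied from the notebook output; the dictionary and variable names are only for illustration.

```python
mean_val_acc = {
    "Model 1 (LSTM)": 0.9397196253140767,
    "Model 2 (BiLSTM)": 0.941370713710785,
    "Model 3 (BiLSTM + dropout)": 0.9425856749216716,
    "Model 4 (Conv1D)": 0.9229283452033996,
}

# Sort from highest to lowest mean validation accuracy
ranking = sorted(mean_val_acc.items(), key=lambda kv: kv[1], reverse=True)
best_name, best_score = ranking[0]

for name, score in ranking:
    print(f"{name}: {score:.5f} (gap to best: {best_score - score:.5f})")
```

The three LSTM-based models fall within about 0.003 of one another, so the ordering among them may not be stable across training runs.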

## Submit Best Model

In [2]:
%%capture
# install aimodelshare library (the %%capture cell magic must be the first line of the cell)
! pip install aimodelshare --upgrade --extra-index-url https://test.pypi.org/simple/

In [77]:
import aimodelshare as ai
from aimodelshare.aimsonnx import model_to_onnx

In [78]:
# save preprocessor
ai.export_preprocessor(preprocessor,"")

In [79]:
# Model 3 is the best model

# save model in onnx format
onnx_model3 = model_to_onnx(model3, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("onnx_model3.onnx", "wb") as f:
    f.write(onnx_model3.SerializeToString())

INFO:tensorflow:Assets written to: /tmp/assets


In [130]:
# set credentials for modeltoapi function 
# make sure you have uploaded your credentials.txt file
from aimodelshare.aws import set_credentials
api_url = "https://wvr23l2z9i.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=api_url,credential_file="credentials.txt", type="submit_model", manual=False)

AI Model Share login credentials set successfully.
AWS credentials set successfully.


In [81]:
# submit model and predictions to competition
ai.submit_model("onnx_model3.onnx",
                api_url,
                prediction_submission=predicted_labels3,
                preprocessor="preprocessor.zip")

'Your model has been submitted as model version 84'

In [82]:
# check leaderboard
data=ai.get_leaderboard(api_url, verbose=3)
ai.leaderboard.stylize_leaderboard(data)

Unnamed: 0,accuracy,f1_score,precision,recall,ml_framework,transfer_learning,deep_learning,model_type,depth,num_params,bidirectional_layers,conv1d_layers,dense_layers,embedding_layers,flatten_layers,globalmaxpooling1d_layers,lstm_layers,maxpooling1d_layers,simplernn_layers,relu_act,sigmoid_act,softmax_act,tanh_act,loss,optimizer,model_config,username,version
0,95.09%,95.09%,95.07%,95.12%,keras,False,True,Sequential,3,161922,,,1,1,1.0,,,,,,,1.0,,str,RMSprop,"{'name': 'sequential', 'layers...",hpeters,66
1,95.09%,95.09%,95.07%,95.12%,keras,False,True,Sequential,3,161922,,,1,1,1.0,,,,,,,1.0,,str,RMSprop,"{'name': 'sequential', 'layers...",hpeters,67
2,95.00%,94.99%,94.97%,95.02%,keras,False,True,Sequential,5,1081482,1.0,,2,1,,,1.0,,,1.0,,1.0,1.0,str,RMSprop,"{'name': 'sequential_29', 'lay...",kagenlim,61
3,94.86%,94.85%,94.84%,94.87%,keras,False,True,Sequential,5,1035746,,,2,1,,,2.0,,,1.0,,1.0,2.0,str,RMSprop,"{'name': 'sequential_3', 'laye...",kagenlim,19
4,94.77%,94.76%,94.74%,94.78%,keras,False,True,Sequential,9,1313030,,,2,1,1.0,,1.0,,4.0,,3.0,,4.0,str,RMSprop,"{'name': 'sequential_1', 'laye...",kka2120,69
5,94.58%,94.57%,94.57%,94.57%,keras,False,True,Sequential,5,1070202,,,2,1,,,2.0,,,1.0,,1.0,2.0,str,RMSprop,"{'name': 'sequential_4', 'laye...",kagenlim,60
6,94.58%,94.57%,94.58%,94.55%,keras,False,True,Sequential,3,336770,1.0,,1,1,,,,,,,1.0,,,str,RMSprop,"{'name': 'sequential_18', 'lay...",mr3536,84
7,94.49%,94.47%,94.47%,94.48%,keras,False,True,Sequential,3,161282,,,1,1,1.0,,,,,,,1.0,,str,RMSprop,"{'name': 'sequential', 'layers...",newusertest,4
8,94.35%,94.34%,94.32%,94.37%,keras,False,True,Sequential,6,148066,,2.0,1,1,1.0,,,1.0,,2.0,,1.0,,str,RMSprop,"{'name': 'sequential_72', 'lay...",prajseth,40
9,94.25%,94.24%,94.24%,94.24%,keras,False,True,Sequential,3,98818,,,1,1,,,1.0,,,,,1.0,1.0,str,RMSprop,"{'name': 'sequential_78', 'lay...",prajseth,41


### Import the best model from the leader board

In [131]:
# Get best model architecture and view model summary, change version arg as needed

bestmodel = ai.aimsonnx.instantiate_model(api_url, version=66)
bestmodel2 = ai.aimsonnx.instantiate_model(api_url, version=67)
# The top two models could not be fit to the training and test data, so the third-best model is used
bestmodel3 = ai.aimsonnx.instantiate_model(api_url, version=61)

bestmodel3.summary()

Model: "sequential_29"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_27 (Embedding)     (None, 40, 100)           1000000   
_________________________________________________________________
bidirectional_5 (Bidirection (None, 40, 80)            45120     
_________________________________________________________________
lstm_37 (LSTM)               (None, 60)                33840     
_________________________________________________________________
dense_43 (Dense)             (None, 40)                2440      
_________________________________________________________________
dense_44 (Dense)             (None, 2)                 82        
Total params: 1,081,482
Trainable params: 1,081,482
Non-trainable params: 0
_________________________________________________________________


In [132]:
# Compare two model versions to see diffs
ai.aimsonnx.compare_models(api_url, version_list=[61,84]) 



| | Model_61_Layer | Model_61_Shape | Model_61_Params | Model_84_Layer | Model_84_Shape | Model_84_Params |
|---|---|---|---|---|---|---|
| 0 | Embedding | (None, 40, 100) | 1000000 | Embedding | (None, 40, 32) | 320000.0 |
| 1 | Bidirectional | (None, 40, 80) | 45120 | Bidirectional | (None, 64) | 16640.0 |
| 2 | LSTM | (None, 60) | 33840 | Dense | (None, 2) | 130.0 |
| 3 | Dense | (None, 40) | 2440 | | | |
| 4 | Dense | (None, 2) | 82 | | | |


#### Explain how the model's structure is different from your best model.

Kagen Lim's model (version 61) differs from my best model (version 84) in several ways. First, the embedding layer's output dimension is 100 in Kagen's model and 32 in mine. Second, the Bidirectional LSTM layer in Kagen's model returns an 80-feature output (40 units per direction) at each of the 40 time steps, whereas mine returns a single 64-feature vector (32 units per direction) rather than a sequence. Third, Kagen's model feeds that sequence into an additional LSTM layer with 60 units. Fourth, Kagen's model has two Dense layers while mine has only one. As a result, Kagen's model has 1,081,482 parameters compared with 336,770 in mine.
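
The parameter counts in the comparison table can be reproduced by hand, which also shows that the bidirectional layers have 40 and 32 units per direction; their outputs are 80 and 64 wide because the two directions are concatenated. The sketch below assumes standard LSTM and Dense layers with bias terms, and the helper function names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim

def lstm_params(input_dim, units):
    # four gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    return input_dim * units + units

model_61 = (
    embedding_params(10000, 100)
    + 2 * lstm_params(100, 40)  # bidirectional: two LSTMs of 40 units each
    + lstm_params(80, 60)       # second LSTM reads the 80-wide sequence
    + dense_params(60, 40)
    + dense_params(40, 2)
)
model_84 = (
    embedding_params(10000, 32)
    + 2 * lstm_params(32, 32)   # bidirectional: two LSTMs of 32 units each
    + dense_params(64, 2)
)

print(model_61, model_84)
```

Both totals match the `num_params` values shown on the leaderboard for versions 61 and 84.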

### Fit the best model from the leader board to training data and evaluate it on test data to complete your report.


In [104]:
bestmodel3.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

In [105]:
bestmodel3.fit(X_train,y_train)



<tensorflow.python.keras.callbacks.History at 0x7f1933c10610>

In [106]:
score, acc = bestmodel3.evaluate(X_test, y_test)
print('Test score:', score)
print('Test accuracy:', acc)

Test score: 0.17195403575897217
Test accuracy: 0.9233644604682922


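Kagen's model reaches a test accuracy of 0.92336 here, which is lower than the 95.00% accuracy reported for it on the leaderboard. One likely reason is that `fit` was called with its default of a single epoch, whereas the leaderboard submission was presumably trained for longer; the original training settings are not known.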

### Complete your report by feeding your model some realistic tweets to see if it returns meaningful/useful results (these tweets can be found online or you can create them yourself).

In [119]:
#Manually select real and fake tweets

tweet = ['The U.S. State Department announced plans for new advisories urging Americans not to travel to nearly 80% of countries worldwide because of "unprecedented risks" from COVID-19 around the globe. https://t.co/dxTAgRRibO?amp=1',
         'Doesn’t Fauci have a connection with the wuhan lab too?  This guy is taking in the money and that’s why he keeps up this false BS about covid. How anyone believes one thing out of that guys mouth is beyond me. Disgraceful',
         '2 instructive reports on immunocompromised and covid today —93 patients with multiple myeloma: ~70% had a good Ab response to vaccinations https://thelancet.com/journals/lanhae/article/PIIS2352-3026(21)00110-1/fulltext —A reinfection in a patient after liver transplant with serial virus genomic sequencing https://acpjournals.org/doi/10.7326/L21-0108',
         'During the past year, front-line workers — especially health workers — have taken far greater risks each day, @MJGerson writes. "Many have paid with their lives. Many are paying with symptoms of post-traumatic stress disorder." https://t.co/mEdpn5xRdA?amp=1',
         '#JailFauci Video on gain of function funding https://video.foxnews.com/v/video-embed.html?video_id=6225847837001&ref=twitter.com Fauci made 9 million https://realrawnews.com/2021/04/deep-state-dr-fauci-mysteriously-earned-9000000-in-2020/ Dr. David Martin (video summary of crimes) https://brandnewtube.com/watch/dr-david-martin-releases-039-the-fauci-covid-19-dossier-quot-205-pages-22-years-of-research_NNqgRitCkgnPYUo.html Document of crimes https://sendfox.com/lp/1rl8j9 PDF of document above https://f.hubspotusercontent10.net/hubfs/8079569/The%20FauciCOVID-19%20Dossier.pdf',
         'My friends from @ArtWithAim created a brilliant guerrilla art video against #COVID-related lockdowns, censorship, and media propaganda. These 6 minutes are breath-taking and definitely worth your time. https://t.co/1Tuw4Vkosl?amp=1',
         "You can't get Covid from vaccination. Not possible.",
         "“94 patients with laboratory-confirmed COVID-19 and MODELED COVID-19 infectiousness profiles from a SEPARATE SAMPLE of 77 infector–infectee transmission pairs.” Were the results based on observation of asymptomatic spread or a COMPUTER MODEL that can be manipulated?",
         "I pray for the speedy recovery of Former PM Shri Manmohan Singh ji who is currently admitted at AIIMS after tested positive for COVID19. Prayers of every Indian is with you. Wishing speedy recovery. #ManmohanSingh",
         "Patients with #COVID19 who have been consistently physically inactive have a significantly higher risk of severe outcomes than patients who regularly met physical activity guidelines prior to the illness, researchers found. https://t.co/mfaNGfv1rC?amp=1"]

label = ['real','fake','real','real','fake', 'fake', 'real','fake', 'real',"real"]

In [120]:
newtweets = pd.DataFrame({
    "tweet": tweet,
    "label": label
})

newtweets

| | tweet | label |
|---|---|---|
| 0 | The U.S. State Department announced plans for ... | real |
| 1 | Doesn’t Fauci have a connection with the wuhan... | fake |
| 2 | 2 instructive reports on immunocompromised and... | real |
| 3 | During the past year, front-line workers — esp... | real |
| 4 | #JailFauci Video on gain of function funding h... | fake |
| 5 | My friends from @ArtWithAim created a brillian... | fake |
| 6 | You can't get Covid from vaccination. Not poss... | real |
| 7 | “94 patients with laboratory-confirmed COVID-1... | fake |
| 8 | I pray for the speedy recovery of Former PM Sh... | real |
| 9 | Patients with #COVID19 who have been consisten... | real |


In [121]:
#Preprocess

X_test_new = preprocessor(newtweets.tweet, maxlen=40, max_words=10000)
y_test_new = pd.get_dummies(newtweets.label)

In [123]:
#Generate predictions

y_pred_new = model3.predict(X_test_new).argmax(axis=1)
predicted_labels_new = [y_test_new.columns[i] for i in y_pred_new]
predicted_labels_new

['real',
 'fake',
 'real',
 'fake',
 'fake',
 'fake',
 'fake',
 'real',
 'fake',
 'real']

In [124]:
#Compare label with prediction

predicted_new = pd.DataFrame({'prediction': predicted_labels_new})

newtweets_pred = pd.concat([newtweets, predicted_new],axis=1)

newtweets_pred

| | tweet | label | prediction |
|---|---|---|---|
| 0 | The U.S. State Department announced plans for ... | real | real |
| 1 | Doesn’t Fauci have a connection with the wuhan... | fake | fake |
| 2 | 2 instructive reports on immunocompromised and... | real | real |
| 3 | During the past year, front-line workers — esp... | real | fake |
| 4 | #JailFauci Video on gain of function funding h... | fake | fake |
| 5 | My friends from @ArtWithAim created a brillian... | fake | fake |
| 6 | You can't get Covid from vaccination. Not poss... | real | fake |
| 7 | “94 patients with laboratory-confirmed COVID-1... | fake | real |
| 8 | I pray for the speedy recovery of Former PM Sh... | real | fake |
| 9 | Patients with #COVID19 who have been consisten... | real | real |


In [125]:
score, acc = model3.evaluate(X_test_new, y_test_new)
print('Test score:', score)
print('Test accuracy:', acc)

Test score: 2.8132853507995605
Test accuracy: 0.6000000238418579


After collecting ten tweets containing real information or misinformation and labeling them manually, I applied my best model to see whether it predicts the labels correctly. It does not do so reliably: it mislabeled three real tweets as fake and one fake tweet as real. The model's accuracy on this new test set is 60%, well below the accuracy it achieved on the original test data, although ten tweets are too few to support a firm conclusion.
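
The error breakdown can be checked directly from the label and prediction columns shown above. This is a small standard-library sketch; the two lists are copied from the notebook, and the variable names are only for illustration.

```python
from collections import Counter

label = ['real', 'fake', 'real', 'real', 'fake',
         'fake', 'real', 'fake', 'real', 'real']
prediction = ['real', 'fake', 'real', 'fake', 'fake',
              'fake', 'fake', 'real', 'fake', 'real']

# Keys are (true label, predicted label) pairs
confusion = Counter(zip(label, prediction))
accuracy = sum(t == p for t, p in zip(label, prediction)) / len(label)

print("real predicted as fake:", confusion[('real', 'fake')])
print("fake predicted as real:", confusion[('fake', 'real')])
print("accuracy:", accuracy)
```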

The GitHub repository for this project can be found [here](https://github.com/mramadhito/COVID-Misinformation).

