# Load Data and Libraries

In [1]:
! pip install delayed

Collecting delayed
  Downloading delayed-0.11.0b1-py2.py3-none-any.whl (19 kB)
Collecting hiredis
  Downloading hiredis-2.0.0-cp39-cp39-win_amd64.whl (18 kB)
Collecting redis
  Downloading redis-3.5.3-py2.py3-none-any.whl (72 kB)
Installing collected packages: redis, hiredis, delayed
Successfully installed delayed-0.11.0b1 hiredis-2.0.0 redis-3.5.3


In [2]:
#Source:Fighting an Infodemic: COVID-19 Fake News Dataset, https://github.com/diptamath/covid_fake_news,https://arxiv.org/abs/2011.03327 

import os
import pandas as pd
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

from tensorflow.keras.layers import Dense, Embedding,Flatten, Bidirectional, LSTM, Conv1D, MaxPooling1D, GlobalMaxPooling1D
from tensorflow.keras.models import Sequential
from sklearn.metrics import classification_report

trainingdata=pd.read_csv("https://raw.githubusercontent.com/diptamath/covid_fake_news/main/data/Constraint_Train.csv", usecols = ['tweet','label'])
testdata=pd.read_csv("https://raw.githubusercontent.com/diptamath/covid_fake_news/main/data/english_test_with_labels.csv", usecols = ['tweet','label'])

INFO:tensorflow:Enabling eager execution
INFO:tensorflow:Enabling v2 tensorshape
INFO:tensorflow:Enabling resource variables
INFO:tensorflow:Enabling tensor equality
INFO:tensorflow:Enabling control flow v2


# About the Data

**Present examples of tweets from the dataset that demonstrate real information or misinformation.**

**Discuss the dataset in general terms and describe why building a predictive model using this data might be practically useful.  Who could benefit from a model like this? Explain.**

This dataset contains tweets about COVID-19, each labeled as real or fake. A predictive model trained on it can help filter content for false information that could misinform the public. Social media companies, for example, could use such a model to moderate content and protect the health of their communities.

In [3]:
trainingdata.head()

Unnamed: 0,tweet,label
0,The CDC currently reports 99031 deaths. In gen...,real
1,States reported 1121 deaths a small rise from ...,real
2,Politically Correct Woman (Almost) Uses Pandem...,fake
3,#IndiaFightsCorona: We have 1524 #COVID testin...,real
4,Populous states can generate large case counts...,real


In [4]:
trainingdata['label'].value_counts()

real    3360
fake    3060
Name: label, dtype: int64

In [6]:
trainingdata[trainingdata.label == 'real'].sample(n=10)

Unnamed: 0,tweet,label
5597,Coronavirus: London could face fresh restricti...,real
795,To achieve major health priorities in times of...,real
4704,Check out what a huge difference it makes in t...,real
1394,For six weeks we've been tracking COVID race a...,real
5001,RT @drharshvardhan: I said we have ensured fin...,real
464,Coronavirus: Travel and hospitality stocks hit...,real
2300,We are tracking the number of people who have ...,real
1464,RT @MoHFW_INDIA: #IndiaFightsCorona Around 60%...,real
3995,Acc to @MoHFW_INDIA #COVID19 #RecoveryRate and...,real
661,The UK has a plan for reopening schools. https...,real


In [7]:
trainingdata[trainingdata.label == 'fake'].sample(n=10)

Unnamed: 0,tweet,label
5806,Trump Administration Collaborates with McKesso...,fake
3004,Doctor says in a video: We must all go out to ...,fake
1518,President Trump Says That He Now Knows Who Bro...,fake
1702,"???Man visited Albany, N.Y. days before dying ...",fake
1604,*JOB AT WORLD HEALTH ORGANISATION*Help us figh...,fake
2789,_The maker of the novel coronavirus has been a...,fake
5809,"The United States is ""recruiting"" doctors to c...",fake
3845,Chlorine dioxide has already proved efficaciou...,fake
897,Over 5000 students tested positive for COVID-1...,fake
1409,New York’s coronavirus hospitalizations fall t...,fake


# Modeling

Run at least four prediction models to try to predict real or fake tweets well.
* Use Embedding layers and at least one LSTM layer for at least one of these models
* Experiment with Bidirectional LSTMs, stacked LSTMs, and dropout regularization with at least two models.
* Use Embedding layers and at least one 1D Convolution layer for at least one of these models
* Discuss which models performed better and point out relevant hyper-parameter values for successful models.

I ran five models: a base model using source code from class, a stacked LSTM model, a bidirectional LSTM model, a model with a 1D convolution layer, and a final model that combines stacked LSTM layers with a 1D convolution layer. After experimenting with the hyperparameters, I found the best input length to be 140, which interestingly matches the old 140-character limit of a tweet, although here the length counts tokens rather than characters. For the embedding layers, an input dimension of 10,000 and an output dimension of 32 worked well. My final model reaches micro and macro averages of 93% and is consistent across both types of error: precision and recall fall between 0.92 and 0.95 for both classes.
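One way to sanity-check an input length such as 140 is to look at how many tokens the tweets actually contain. The sketch below is a rough illustration only: it splits on whitespace as a stand-in for the Keras `Tokenizer`, and `length_percentile` and `sample_tweets` are hypothetical names that do not appear in the notebook.

```python
import math

def length_percentile(texts, pct):
    """Return the token count at or below which pct percent of the texts fall."""
    lengths = sorted(len(t.split()) for t in texts)
    # Nearest-rank percentile: smallest length that covers pct% of the texts
    rank = max(1, math.ceil(pct / 100 * len(lengths)))
    return lengths[rank - 1]

# Hypothetical sample; in the notebook this would be trainingdata.tweet
sample_tweets = [
    "Wear a mask",
    "States reported 1121 deaths a small rise from last Tuesday",
    "The CDC currently reports 99031 deaths",
    "New cases are falling",
]

print(length_percentile(sample_tweets, 50))   # 4
print(length_percentile(sample_tweets, 100))  # 10
```

If a high percentile of the real token counts sits well below the chosen input length, most of each padded sequence is zeros, which may be worth considering when comparing input lengths.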

#### Data Preprocessing

In [8]:
# Build vocabulary from training text data
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(trainingdata.tweet)

# preprocessor tokenizes words and pads or truncates every document to the same length
def preprocessor(data, maxlen, max_words):
    # max_words is kept for the call signature only; the vocabulary size is
    # fixed by Tokenizer(num_words=10000) above
    sequences = tokenizer.texts_to_sequences(data)

    X = pad_sequences(sequences, maxlen=maxlen)

    return X

In [9]:
maxlen = 140

# tokenize and pad X data
X_train = preprocessor(trainingdata.tweet, maxlen=maxlen, max_words=10000)
X_test = preprocessor(testdata.tweet, maxlen=maxlen, max_words=10000)

# one-hot encode Y data
y_train = pd.get_dummies(trainingdata.label)
y_test = pd.get_dummies(testdata.label)

In [10]:
print(X_train.shape)
print(X_test.shape)

(6420, 140)
(2140, 140)
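Both arrays have 140 columns because `pad_sequences` forces every tweet to the same length. With its default settings it pads short sequences with zeros at the front and cuts long sequences from the front, keeping the last tokens. The pure-Python sketch below mimics that default behavior for illustration; `pad_pre` is a hypothetical helper, not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad and left-truncate integer sequences to exactly maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

print(pad_pre([[5, 8], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[0, 0, 5, 8], [3, 4, 5, 6]]
```

Because truncation drops the beginning of a long tweet, the end of the text is what the models see when a tweet exceeds the input length.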


#### Base model

In [10]:


# Baseline from class: embedding, flatten, softmax output
model = Sequential()
model.add(Embedding(10000, 16, input_length=maxlen))
model.add(Flatten())
model.add(Dense(2, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

history = model.fit(X_train, y_train,
                    epochs=5,
                    batch_size=32,
                    validation_split=0.2)

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Train on 5136 samples, validate on 1284 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [11]:
y_pred = model.predict(X_test).argmax(axis=1)

predicted_labels = [y_test.columns[i] for i in y_pred]

labels_pred = pd.get_dummies(predicted_labels)


print(classification_report(y_test, labels_pred))

              precision    recall  f1-score   support

           0       0.91      0.95      0.93      1020
           1       0.95      0.91      0.93      1120

   micro avg       0.93      0.93      0.93      2140
   macro avg       0.93      0.93      0.93      2140
weighted avg       0.93      0.93      0.93      2140
 samples avg       0.93      0.93      0.93      2140
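The rows labeled 0 and 1 in the report correspond to the columns of `y_test`. `pd.get_dummies` orders columns alphabetically, so row 0 should be `fake` and row 1 should be `real`. The sketch below shows, on made-up data, how argmax indices map back to label names and how per-class precision and recall are computed; `per_class_scores` and the sample lists are hypothetical and only illustrate the arithmetic behind the report.

```python
def per_class_scores(true_labels, predicted_labels, classes):
    """Return {class: (precision, recall)} for string labels."""
    scores = {}
    for cls in classes:
        pairs = list(zip(true_labels, predicted_labels))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        predicted = sum(1 for p in predicted_labels if p == cls)
        actual = sum(1 for t in true_labels if t == cls)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        scores[cls] = (precision, recall)
    return scores

classes = ["fake", "real"]          # assumed column order of y_test
argmax_indices = [0, 1, 1, 0, 1]    # hypothetical model output
predicted = [classes[i] for i in argmax_indices]
truth = ["fake", "real", "fake", "fake", "real"]

scores = per_class_scores(truth, predicted, classes)
print(scores)
```

Here `fake` has precision 1.0 and recall 2/3, while `real` has precision 2/3 and recall 1.0, which mirrors the kind of trade-off visible between the two rows of the report.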



#### LSTM model

In [12]:
model = Sequential()
model.add(Embedding(10000, 32, input_length=maxlen))
model.add(LSTM(64, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(32, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))
model.add(LSTM(32, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))
model.add(LSTM(16))
model.add(Dense(2, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

history = model.fit(X_train, y_train,
                    epochs=5,
                    batch_size=32,
                    validation_split=0.2)

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Train on 5136 samples, validate on 1284 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [13]:
y_pred = model.predict(X_test).argmax(axis=1)

predicted_labels = [y_test.columns[i] for i in y_pred]

labels_pred = pd.get_dummies(predicted_labels)

print(classification_report(y_test, labels_pred))

              precision    recall  f1-score   support

           0       0.95      0.90      0.93      1020
           1       0.92      0.95      0.93      1120

   micro avg       0.93      0.93      0.93      2140
   macro avg       0.93      0.93      0.93      2140
weighted avg       0.93      0.93      0.93      2140
 samples avg       0.93      0.93      0.93      2140



#### Bidirectional LSTM

In [14]:
model = Sequential()
model.add(Embedding(10000, 32, input_length=maxlen))
model.add(Bidirectional(LSTM(64, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)))
model.add(Bidirectional(LSTM(32, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)))
model.add(Bidirectional(LSTM(32, return_sequences=True, dropout=0.1, recurrent_dropout=0.1)))
model.add(Bidirectional(LSTM(16)))
model.add(Dense(2, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

history = model.fit(X_train, y_train,
                    epochs=5,
                    batch_size=32,
                    validation_split=0.2)

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Train on 5136 samples, validate on 1284 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [15]:
y_pred = model.predict(X_test).argmax(axis=1)

predicted_labels = [y_test.columns[i] for i in y_pred]

labels_pred = pd.get_dummies(predicted_labels)

print(classification_report(y_test, labels_pred))

              precision    recall  f1-score   support

           0       0.91      0.96      0.93      1020
           1       0.96      0.91      0.94      1120

   micro avg       0.94      0.94      0.94      2140
   macro avg       0.94      0.94      0.94      2140
weighted avg       0.94      0.94      0.94      2140
 samples avg       0.94      0.94      0.94      2140



#### With 1D Convolutional Layer

In [16]:
model = Sequential()
model.add(Embedding(10000, 16, input_length=maxlen))
model.add(Conv1D(32, 4, activation='relu')) 
model.add(GlobalMaxPooling1D())
model.add(Dense(2, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

history = model.fit(X_train, y_train,
                    epochs=5,
                    batch_size=32,
                    validation_split=0.2)

Train on 5136 samples, validate on 1284 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [17]:
y_pred = model.predict(X_test).argmax(axis=1)

predicted_labels = [y_test.columns[i] for i in y_pred]

labels_pred = pd.get_dummies(predicted_labels)

print(classification_report(y_test, labels_pred))

              precision    recall  f1-score   support

           0       0.92      0.94      0.93      1020
           1       0.94      0.92      0.93      1120

   micro avg       0.93      0.93      0.93      2140
   macro avg       0.93      0.93      0.93      2140
weighted avg       0.93      0.93      0.93      2140
 samples avg       0.93      0.93      0.93      2140



#### Final Model

In [11]:
final_model = Sequential()
final_model.add(Embedding(10000, 32, input_length=maxlen))
final_model.add(LSTM(64, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
final_model.add(LSTM(32, return_sequences=True, dropout=0.2, recurrent_dropout=0.1))
final_model.add(Conv1D(32, 4, activation='relu')) 
final_model.add(GlobalMaxPooling1D())
final_model.add(Dense(2, activation='softmax'))

final_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 140, 32)           320000    
_________________________________________________________________
lstm (LSTM)                  (None, 140, 64)           24832     
_________________________________________________________________
lstm_1 (LSTM)                (None, 140, 32)           12416     
_________________________________________________________________
conv1d (Conv1D)              (None, 137, 32)           4128      
_________________________________________________________________
global_max_pooling1d (Global (None, 32)                0         
_________________________________________________________________
dense (Dense)                (None, 2)                 66        
Total params: 361,442
Trainable params: 361,442
Non-trainable params: 0
__________________________________________________
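The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for each layer type (an LSTM has four gates, each with input weights, recurrent weights, and a bias); the helper names are hypothetical and the layer sizes are taken from the summary above.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * ((input_dim + units) * units + units)

def conv1d_params(in_channels, filters, kernel_size):
    return kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    return inputs * units + units

counts = [
    embedding_params(10000, 32),  # 320000
    lstm_params(32, 64),          # 24832
    lstm_params(64, 32),          # 12416
    conv1d_params(32, 32, 4),     # 4128
    dense_params(32, 2),          # 66
]
print(counts, sum(counts))        # total: 361442

# Conv1D with kernel size 4 and no padding shortens the sequence by 3
print(140 - 4 + 1)                # 137
```

Nearly 90% of the parameters sit in the embedding layer, so the vocabulary size and embedding dimension dominate the model size far more than the LSTM or convolution layers do.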

In [12]:
final_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

history = final_model.fit(X_train, y_train,
                    epochs=5,
                    batch_size=32,
                    validation_split=0.2)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [13]:
y_pred = final_model.predict(X_test).argmax(axis=1)

predicted_labels = [y_test.columns[i] for i in y_pred]

labels_pred = pd.get_dummies(predicted_labels)

print(classification_report(y_test, labels_pred))

              precision    recall  f1-score   support

           0       0.92      0.95      0.93      1020
           1       0.95      0.92      0.94      1120

   micro avg       0.93      0.93      0.93      2140
   macro avg       0.93      0.93      0.93      2140
weighted avg       0.93      0.93      0.93      2140
 samples avg       0.93      0.93      0.93      2140



# AIModelShare

* Submit your best model to the leader board for the Covid Misinformation AI Model Share competition.
* Import the best model from the leader board (whatever the best model is after your final submission)
* Note: Use the aimodelshare ai.aimsonnx.instantiate_model() function in same manner as example provided in hackathon notebook
* Visualize the model's structure using tf.keras's model.summary()
* Explain how the model's structure is different from your best model.
* Fit the best model from the leader board to training data and evaluate it on test data to complete your report.

I submitted my best model to the leaderboard, where it ranked mid-pack among the other models (40 / 89). The best model surprised me because it was also one of the simplest: one embedding layer, a flatten layer, and a dense output layer. My best model tried to incorporate many of the elements we had discussed for predictive modeling of text. This shows me that a simpler approach can work just as well or better on some datasets.

In [14]:
%%capture
! pip install aimodelshare --upgrade --extra-index-url https://test.pypi.org/simple/ 

In [18]:
!pip install dill pydot

Collecting pydot
  Downloading pydot-1.4.2-py2.py3-none-any.whl (21 kB)
Installing collected packages: pydot
Successfully installed pydot-1.4.2


In [23]:
import aimodelshare as ai

In [25]:
ai.export_preprocessor(preprocessor,"") #ignore error "can't pickle module objects"

In [19]:
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(final_model, framework='keras', transfer_learning=False, deep_learning=True, task_type= "classification")

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

INFO:tensorflow:Assets written to: C:\Users\ELLIOT~1\AppData\Local\Temp\assets


In [87]:
from aimodelshare.aws import set_credentials
api_url = "https://wvr23l2z9i.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=api_url,credential_file="credentials.txt", type="submit_model", manual=False)

AI Model Share login credentials set successfully.
AWS credentials set successfully.


In [27]:
ai.submit_model("model.onnx",
                api_url,
                prediction_submission=predicted_labels,
                preprocessor="preprocessor.zip")

'Your model has been submitted as model version 87'

In [88]:
data=ai.get_leaderboard(api_url, verbose=3)
ai.leaderboard.stylize_leaderboard(data.head())

Unnamed: 0,accuracy,f1_score,precision,recall,ml_framework,transfer_learning,deep_learning,model_type,depth,num_params,bidirectional_layers,conv1d_layers,dense_layers,embedding_layers,flatten_layers,globalmaxpooling1d_layers,lstm_layers,maxpooling1d_layers,simplernn_layers,relu_act,sigmoid_act,softmax_act,tanh_act,loss,optimizer,model_config,username,version
0,95.09%,95.09%,95.07%,95.12%,keras,False,True,Sequential,3,161922,,,1,1,1.0,,,,,,,1.0,,str,RMSprop,"{'name': 'sequential', 'layers...",hpeters,67
1,95.09%,95.09%,95.07%,95.12%,keras,False,True,Sequential,3,161922,,,1,1,1.0,,,,,,,1.0,,str,RMSprop,"{'name': 'sequential', 'layers...",hpeters,66
2,95.00%,94.99%,94.97%,95.02%,keras,False,True,Sequential,5,1081482,1.0,,2,1,,,1.0,,,1.0,,1.0,1.0,str,RMSprop,"{'name': 'sequential_29', 'lay...",kagenlim,61
3,94.86%,94.85%,94.84%,94.87%,keras,False,True,Sequential,5,1035746,,,2,1,,,2.0,,,1.0,,1.0,2.0,str,RMSprop,"{'name': 'sequential_3', 'laye...",kagenlim,19
4,94.77%,94.76%,94.74%,94.78%,keras,False,True,Sequential,9,1313030,,,2,1,1.0,,1.0,,4.0,,3.0,,4.0,str,RMSprop,"{'name': 'sequential_1', 'laye...",kka2120,69


In [52]:
# Get best model architecture and view model summary, change version arg as needed

bestmodel = ai.aimsonnx.instantiate_model(api_url, version = 1)

bestmodel.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 100, 16)           160000    
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0         
_________________________________________________________________
dense (Dense)                (None, 2)                 3202      
Total params: 163,202
Trainable params: 163,202
Non-trainable params: 0
_________________________________________________________________
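The summary above totals 163,202 parameters, while the top leaderboard rows list 161,922. For an embedding, flatten, dense architecture the total depends on the input length, so the count can hint at which input length a given entry used. The sketch below is a back-of-the-envelope check under that assumption; `flat_model_params` is a hypothetical helper.

```python
def flat_model_params(vocab_size, dim, input_length, classes=2):
    """Parameter count for Embedding -> Flatten -> Dense(classes)."""
    embedding = vocab_size * dim
    dense = input_length * dim * classes + classes
    return embedding + dense

print(flat_model_params(10000, 16, 100))  # 163202
print(flat_model_params(10000, 16, 60))   # 161922
```

An input length of 100 reproduces the summary above, and an input length of 60 reproduces the leaderboard figure, which suggests that the instantiated version and the top-ranked version may not be the same model.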


In [41]:
# Compare two model versions to see diffs
ai.aimsonnx.compare_models(api_url, version_list=[1,87]) 

Unnamed: 0,Model_1_Layer,Model_1_Shape,Model_1_Params,Model_87_Layer,Model_87_Shape,Model_87_Params
0,Embedding,"(None, 100, 16)",160000.0,Embedding,"(None, 140, 32)",320000
1,Flatten,"(None, 1600)",0.0,LSTM,"(None, 140, 64)",24832
2,Dense,"(None, 2)",3202.0,LSTM,"(None, 140, 32)",12416
3,,,,Conv1D,"(None, 137, 32)",4128
4,,,,GlobalMaxPooling1D,"(None, 32)",0
5,,,,Dense,"(None, 2)",66


In [89]:
data[data.username == 'eat2153']

Unnamed: 0,accuracy,f1_score,precision,recall,ml_framework,transfer_learning,deep_learning,model_type,depth,num_params,...,relu_act,sigmoid_act,softmax_act,tanh_act,loss,optimizer,model_config,username,timestamp,version
39,0.933645,0.933585,0.933413,0.934287,keras,False,True,Sequential,6,361442,...,1.0,,1.0,2.0,str,RMSprop,"{'name': 'sequential', 'layers': [{'class_name...",eat2153,2021-04-20 23:44:57.971864,87
40,0.933645,0.933585,0.933413,0.934287,keras,False,True,Sequential,6,361442,...,1.0,,1.0,2.0,str,RMSprop,"{'name': 'sequential', 'layers': [{'class_name...",eat2153,2021-04-20 23:43:16.462774,85
41,0.933645,0.933585,0.933413,0.934287,keras,False,True,Sequential,6,361442,...,1.0,,1.0,2.0,str,RMSprop,"{'name': 'sequential', 'layers': [{'class_name...",eat2153,2021-04-20 23:43:39.124540,86


In [51]:
data.loc[0]['model_config']

"{'name': 'sequential', 'layers': [{'class_name': 'InputLayer', 'config': {'batch_input_shape': (None, 60), 'dtype': 'float32', 'sparse': False, 'ragged': False, 'name': 'embedding_input'}}, {'class_name': 'Embedding', 'config': {'name': 'embedding', 'trainable': True, 'batch_input_shape': (None, 60), 'dtype': 'float32', 'input_dim': 10000, 'output_dim': 16, 'embeddings_initializer': {'class_name': 'RandomUniform', 'config': {'minval': -0.05, 'maxval': 0.05, 'seed': None}}, 'embeddings_regularizer': None, 'activity_regularizer': None, 'embeddings_constraint': None, 'mask_zero': False, 'input_length': 60}}, {'class_name': 'Flatten', 'config': {'name': 'flatten', 'trainable': True, 'dtype': 'float32', 'data_format': 'channels_last'}}, {'class_name': 'Dense', 'config': {'name': 'dense', 'trainable': True, 'dtype': 'float32', 'units': 2, 'activation': 'softmax', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {'seed': None}}, 'bias_initializer': {'class_na

In [77]:
maxlen = 100

X_train = preprocessor(trainingdata.tweet, maxlen=maxlen, max_words=10000)
X_test = preprocessor(testdata.tweet, maxlen=maxlen, max_words=10000)

y_train = pd.get_dummies(trainingdata.label)
y_test = pd.get_dummies(testdata.label)

In [78]:
bestmodel.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

history = bestmodel.fit(X_train, y_train,
                    epochs=5,
                    batch_size=32,
                    validation_split=0.2)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [79]:
y_pred = bestmodel.predict(X_test).argmax(axis=1)

predicted_labels = [y_test.columns[i] for i in y_pred]

labels_pred = pd.get_dummies(predicted_labels)

print(classification_report(y_test, labels_pred))

              precision    recall  f1-score   support

           0       0.94      0.93      0.94      1020
           1       0.93      0.95      0.94      1120

   micro avg       0.94      0.94      0.94      2140
   macro avg       0.94      0.94      0.94      2140
weighted avg       0.94      0.94      0.94      2140
 samples avg       0.94      0.94      0.94      2140



## Complete your report by feeding your model some realistic tweets to see if it returns meaningful/useful results (these tweets can be found online or you can create them yourself).

I generated realistic tweets for my best model with Markovify, a Markov chain text generator package. Using this package, I generated 100 "tweets" from the training text and fed them to the model. As seen below, this approach produced some plausible and some very nonsensical sentences. All 100 generated sentences are labeled fake, and the classification report below gives a recall of 0.29 for that class, so the model did not flag all of them.
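With a state size of 1, the setting used in the cells that follow, a Markov chain generator picks each next word based only on the current word. The sketch below illustrates that idea in plain Python. It is not Markovify's implementation, and `build_chain`, `generate`, and the tiny `corpus` are hypothetical.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"

def build_chain(sentences):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def generate(chain, rng, max_words=20):
    """Walk the chain from START until END or max_words is reached."""
    word, output = START, []
    while len(output) < max_words:
        word = rng.choice(chain[word])
        if word == END:
            break
        output.append(word)
    return " ".join(output)

# Hypothetical corpus; in the notebook this would be trainingdata['tweet']
corpus = [
    "wear masks to slow the spread",
    "masks slow the virus",
    "the virus spread to new states",
]
chain = build_chain(corpus)
sentence = generate(chain, random.Random(0))
print(sentence)
```

Because each step looks back only one word, the output is locally plausible but often loses the thread of the sentence, which matches the mix of sensible and nonsensical results described above.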


In [54]:
!pip install markovify
import markovify

Collecting markovify
  Downloading markovify-0.9.0.tar.gz (27 kB)
Collecting unidecode
  Downloading Unidecode-1.2.0-py2.py3-none-any.whl (241 kB)
Building wheels for collected packages: markovify
  Building wheel for markovify (setup.py): started
  Building wheel for markovify (setup.py): finished with status 'done'
  Created wheel for markovify: filename=markovify-0.9.0-py3-none-any.whl size=18476 sha256=e207b6183cd2b84825308644bd20f976b961665388b36ccc2ba7458e14fd4ecd
  Stored in directory: c:\users\elliotttran\appdata\local\pip\cache\wheels\5a\8b\a9\23dc9b10a5dfc0c20e6c9e1fe031d3db669bfb10a237c4f2f7
Successfully built markovify
Installing collected packages: unidecode, markovify
Successfully installed markovify-0.9.0 unidecode-1.2.0


In [56]:
training_text = trainingdata['tweet']

In [57]:
model_markov_baseline = markovify.Text(training_text, state_size=1)

In [58]:
sentence_markov_baseline = []
for _ in range(100):
    sentence = model_markov_baseline.make_short_sentence(max_chars = 140)
    if sentence is not None:
        sentence_markov_baseline.append(sentence)

In [59]:
sentence_markov_baseline

['The first task force vaccinate people.',
 'We explain here: https://t.co/BSgsPiqW0q. https://t.co/qGaJV8MTJ8',
 'There is false. https://t.co/Ldttt2FPwL',
 'NEW: Updated on #COVID19Nigeria',
 'When you believe everything they would cover most people stay at disaster@leo.gov. https://t.co/oJWSeGXMYm',
 'A big deal with coronavirus.',
 'On therapeutics include events at the charts here.',
 'Wear masks gowns has imposed on every morning: Take steps to sign a noticeable uptick.',
 '3 may, 2020 there are at Dhanbad Hospital.',
 'RT @MoHFW_INDIA: #CoronaVirusUpdates #coronavirus updates: https://t.co/7c8W5pWNmp https://t.co/QG8EJnQbWH',
 "@NateSilver538 It's extremely hot water and Covid Deaths, Bullet Holes Aside from state is published.",
 'A photo of the next week ending #COVID19 affects your hands often to discuss the therapeutic options for the cases pneumonia.',
 'Learn more: https://t.co/Avjf91j28V https://t.co/wVbi1O3jEr',
 'To Turn On Each habit adds uncertainty of active cases 11

In [60]:
df = pd.DataFrame(sentence_markov_baseline, columns= ['new_text'])

In [72]:
df['fake'] = 1

maxlen = 140

# tokenize and pad X data
X = preprocessor(df.new_text, maxlen=maxlen, max_words=10000)

# one-hot encode Y data
y_true = pd.get_dummies(df.fake)

In [73]:
from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred, digits=4))

              precision    recall  f1-score   support

           0     0.0000    0.0000    0.0000         0
           1     1.0000    0.2900    0.4496       100

    accuracy                         0.2900       100
   macro avg     0.5000    0.1450    0.2248       100
weighted avg     1.0000    0.2900    0.4496       100

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
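When every true label belongs to one class, as in the report above, recall for that class is simply the share of sentences the model assigned to it. The sketch below computes that share directly, assuming the predictions are argmax indices over columns ordered like `y_test` (`fake` first, `real` second). The function and sample indices are hypothetical. Under this column order index 0 means fake, whereas the cell above encodes fake as 1, so the two encodings need to agree before the numbers are compared.

```python
def share_flagged_fake(argmax_indices, columns=("fake", "real")):
    """Fraction of predictions whose argmax index points at the 'fake' column."""
    fake_index = columns.index("fake")
    flagged = sum(1 for i in argmax_indices if i == fake_index)
    return flagged / len(argmax_indices)

hypothetical_indices = [0, 0, 1, 0]  # three of four sentences flagged as fake
print(share_flagged_fake(hypothetical_indices))  # 0.75
```

Reporting this single share avoids the undefined-metric warnings that appear when a classification report is asked to score a class with no true samples.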


**Works Cited**

Shahi, Gautam Kishore, Anne Dirkson, and Tim A. Majchrzak. "An exploratory study of covid-19 misinformation on twitter." Online Social Networks and Media 22 (2021): 100104.