<p align="center"><img width="50%" src="https://aimodelsharecontent.s3.amazonaws.com/aimodshare_banner.jpg" /></p>


---

## Covid Tweet Misinformation Prediction Competition
Submit your models to a shared leaderboard so that we can collaborate and learn from one another's model experiments.

**Instructions:**
1. Get data in and set up X_train / X_test / y_train
2. Preprocess data using the Keras Tokenizer; write and save a preprocessor function
3. Fit a model on the preprocessed data; save the preprocessor function and the model
4. Generate predictions from X_test data and submit the model to the competition
5. Repeat the submission process to improve your place on the leaderboard



## 1. Get data in and set up X_train, X_test, y_train objects

In [None]:
#install aimodelshare library
! pip install aimodelshare --upgrade

In [None]:
# Get competition data
from aimodelshare import download_data
download_data('public.ecr.aws/y2e2a1d6/covid_tweet_competition_data-repository:latest') 


Data downloaded successfully.


In [None]:
# Set up X_train, X_test, and y_train_labels objects
import pandas as pd
# read_csv's squeeze=True argument was removed in pandas 2.0; squeeze the one-column frame instead
X_train=pd.read_csv("covid_tweet_competition_data/X_train.csv").squeeze("columns")
X_test=pd.read_csv("covid_tweet_competition_data/X_test.csv").squeeze("columns")

y_train_labels=pd.read_csv("covid_tweet_competition_data/y_train_labels.csv").squeeze("columns")

# One-hot encode the y labels
y_train = pd.get_dummies(y_train_labels)

X_train.head()

0    "[T]he label of the popular Lysol already show...
1    There were more deaths on the roads of France ...
2    250 new cases of #COVID19Nigeria; Plateau-69 F...
3    @XanderArmstrong Why was chloroquine described...
4    Our new Can Compare 'College' tags make it eas...
Name: tweet, dtype: object
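
As a small illustration of the one-hot step, the sketch below runs `pd.get_dummies` on made-up labels. The label names `real` and `fake` and the `toy_` variables are assumptions for the example only; the actual names come from `y_train_labels.csv`.

```python
import pandas as pd

# Hypothetical toy labels; the real label names come from y_train_labels.csv.
toy_labels = pd.Series(["real", "fake", "real", "fake", "fake"], name="label")

# get_dummies creates one indicator column per distinct label,
# with the columns ordered by sorted label value.
toy_y = pd.get_dummies(toy_labels)

print(list(toy_y.columns))                    # ['fake', 'real']
# astype(int) gives 0/1 whether pandas returns bool or uint8 indicators.
print(toy_y.astype(int).to_numpy().tolist())  # [[0, 1], [1, 0], [0, 1], [1, 0], [1, 0]]
```

The column order matters later: the position of each column in `y_train.columns` is what a predicted class index refers to.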

## 2. Preprocess data using keras Tokenizer / Write and Save Preprocessor function


In [None]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Build vocabulary from training text data
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(X_train)

# preprocessor tokenizes words and pads or truncates every document to the same length
def preprocessor(data, maxlen=40, max_words=10000):
    # max_words is kept for reference only; the vocabulary size is fixed by the Tokenizer above

    sequences = tokenizer.texts_to_sequences(data)
    X = pad_sequences(sequences, maxlen=maxlen)

    return X

print(preprocessor(X_train).shape)
print(preprocessor(X_test).shape)

(6505, 40)
(2055, 40)
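
To make the padding step concrete, here is a minimal NumPy sketch of what `pad_sequences` does with its default settings: zeros are added on the left, and sequences longer than `maxlen` lose ids from the front. `pad_pre` and `toy_sequences` are hypothetical names used only for illustration; the notebook itself keeps using the Keras function.

```python
import numpy as np

def pad_pre(sequences, maxlen):
    """Left-pad with 0 and keep the last `maxlen` ids of each sequence."""
    out = np.zeros((len(sequences), maxlen), dtype="int32")
    for row, seq in enumerate(sequences):
        if not seq:
            continue                  # an empty sequence stays all zeros
        kept = seq[-maxlen:]          # drop ids from the front if too long
        out[row, -len(kept):] = kept  # right-align, leaving zeros on the left
    return out

# Hypothetical token-id sequences, as texts_to_sequences might return them.
toy_sequences = [[5, 2], [7, 1, 4, 9, 3, 8]]
padded = pad_pre(toy_sequences, maxlen=4)
print(padded.tolist())  # [[0, 0, 5, 2], [4, 9, 3, 8]]
```

With `maxlen=40`, every tweet becomes a row of exactly 40 integers, which is why both printed shapes end in 40.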


## 3. Fit model on preprocessed data and save preprocessor function and model


In [None]:
from tensorflow.keras.layers import Dense, Embedding, Flatten
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Embedding(10000, 16, input_length=40))
model.add(Flatten())
model.add(Dense(2, activation='softmax'))
model.summary()

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

history = model.fit(preprocessor(X_train), y_train,
                    epochs=1,
                    batch_size=32,
                    validation_split=0.2)

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 40, 16)            160000    
                                                                 
 flatten (Flatten)           (None, 640)               0         
                                                                 
 dense (Dense)               (None, 2)                 1282      
                                                                 
Total params: 161,282
Trainable params: 161,282
Non-trainable params: 0
_________________________________________________________________
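
The parameter counts in the summary can be reproduced by hand. The sketch below is plain arithmetic with illustrative variable names; it does not call Keras or aimodelshare.

```python
# Sizes taken from the model definition above.
vocab_size, embed_dim, seq_len, n_classes = 10000, 16, 40, 2

embedding_params = vocab_size * embed_dim             # one 16-d vector per token id
flatten_units = seq_len * embed_dim                   # 40 x 16 reshaped to 640 values
dense_params = flatten_units * n_classes + n_classes  # weights plus one bias per class

total_params = embedding_params + dense_params
print(embedding_params, flatten_units, dense_params, total_params)
# 160000 640 1282 161282
```

Almost all of the parameters sit in the embedding layer, so the vocabulary size and embedding width are the main levers on model size.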


#### Save preprocessor function to local "preprocessor.zip" file

In [None]:
import aimodelshare as ai
ai.export_preprocessor(preprocessor,"") 

Your preprocessor is now saved to 'preprocessor.zip'


#### Save model to local ".onnx" file

In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(model, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

## 4. Generate predictions from X_test data and submit model to competition


In [None]:
#Set credentials using modelshare.org username/password

from aimodelshare.aws import set_credentials
    
apiurl='https://n8tentavl5.execute-api.us-east-1.amazonaws.com/prod/m' # Unique REST API URL that powers this Covid Tweet Playground

set_credentials(apiurl=apiurl)

AI Modelshare Username:··········
AI Modelshare Password:··········
AI Model Share login credentials set successfully.


In [None]:
#Instantiate Competition

mycompetition= ai.Competition(apiurl)

In [None]:
#Submit Model 1: 

#-- Generate predicted y values (Model 1)
# Note: Keras predict returns one probability per class for each row; argmax(axis=1) picks the column index of the most likely class
prediction_column_index=model.predict(preprocessor(X_test)).argmax(axis=1)

# Map each predicted column index back to its label name
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 1 to Competition Leaderboard
mycompetition.submit_model(model_filepath="model.onnx",
                           preprocessor_filepath="preprocessor.zip",
                           prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 2

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1572
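
The index-to-label step in the submission cell can be checked on made-up numbers. In this sketch `toy_probs` stands in for the output of `model.predict` and `toy_columns` for `y_train.columns`; both, including the label names, are assumptions for illustration.

```python
import numpy as np

# Hypothetical softmax output: three tweets, two classes, each row sums to 1.
toy_probs = np.array([[0.9, 0.1],
                      [0.2, 0.8],
                      [0.4, 0.6]])
toy_columns = ["fake", "real"]  # stands in for y_train.columns

# argmax(axis=1) returns the column index of the largest probability per row.
toy_index = toy_probs.argmax(axis=1)

# Look each index up in the column list to recover the label name.
toy_predictions = [toy_columns[i] for i in toy_index]
print(toy_predictions)  # ['fake', 'real', 'real']
```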


In [None]:
# Get leaderboard to explore current best model architectures

# Get raw data in pandas data frame
data = mycompetition.get_leaderboard()

# Stylize leaderboard data
mycompetition.stylize_leaderboard(data)

| | accuracy | f1_score | precision | recall | ml_framework | transfer_learning | deep_learning | model_type | depth | num_params | flatten_layers | dense_layers | embedding_layers | softmax_act | loss | optimizer | model_config | memory_size | username | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 81.31% | 81.22% | 81.49% | 81.17% | keras | False | True | Sequential | 3.0 | 161282.0 | 1.0 | 1.0 | 1.0 | 1.0 | str | RMSprop | {'name': 'sequential', 'layers... | 1413536.0 | hp2500test | 2 |
| 1 | 76.11% | 74.36% | 83.13% | 75.30% | sklearn | False | False | RandomForestClassifier | | | | | | | | | {'bootstrap': True, 'ccp_alpha... | | AdvProjectsinML | 1 |
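
Because the raw leaderboard is a pandas data frame, ordinary pandas operations can be used to explore it. The sketch below uses a toy frame with a few of the columns and the values shown above. It assumes the metric columns hold plain numbers rather than formatted percent strings, which may not match what `get_leaderboard()` actually returns.

```python
import pandas as pd

# Toy stand-in for the leaderboard frame; values copied from the output above.
# Assumption: metrics are stored as floats, not as strings such as "81.31%".
toy_board = pd.DataFrame({
    "accuracy": [81.31, 76.11],
    "f1_score": [81.22, 74.36],
    "precision": [81.49, 83.13],
    "recall": [81.17, 75.30],
    "ml_framework": ["keras", "sklearn"],
    "model_type": ["Sequential", "RandomForestClassifier"],
    "username": ["hp2500test", "AdvProjectsinML"],
    "version": [2, 1],
})

# Rank submissions by a metric other than accuracy.
by_precision = toy_board.sort_values("precision", ascending=False)

# Keep only models built with one framework.
keras_only = toy_board[toy_board["ml_framework"] == "keras"]

print(by_precision[["version", "precision"]])
print(keras_only[["version", "model_type"]])
```

Here the random forest ranks first on precision even though the Keras model leads on accuracy, which is the kind of trade-off the leaderboard makes visible.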


In [None]:
# Compare two or more models 
data=mycompetition.compare_models([1, 2], verbose=1)
mycompetition.stylize_compare(data)

| | param_name | default_value | model_version_1 |
|---|---|---|---|
| 0 | bootstrap | True | True |
| 1 | ccp_alpha | 0.000000 | 0.000000 |
| 2 | class_weight | | |
| 3 | criterion | gini | gini |
| 4 | max_depth | | 3 |
| 5 | max_features | auto | auto |
| 6 | max_leaf_nodes | | |
| 7 | max_samples | | |
| 8 | min_impurity_decrease | 0.000000 | 0.000000 |
| 9 | min_impurity_split | | |

| | Model_2_Layer | Model_2_Shape | Model_2_Params |
|---|---|---|---|
| 0 | Embedding | [None, 40, 16] | 160000 |
| 1 | Flatten | [None, 640] | 0 |
| 2 | Dense | [None, 2] | 1282 |
