<p align="center"><img width="50%" src="https://aimodelsharecontent.s3.amazonaws.com/aimodshare_banner.jpg" /></p>


---

## Stanford Sentiment Treebank - Movie Review Classification Competition
Let's share our models to a centralized leaderboard, so that we can collaborate and learn from the model experimentation process...

**Instructions:**
1. Get data in and set up X_train / X_test / y_train
2. Preprocess data using the Keras Tokenizer / write and save a preprocessor function
3. Fit model on preprocessed data and save preprocessor function and model
4. Generate predictions from X_test data and submit model to competition
5. Repeat submission process to improve place on leaderboard



## Link to Repo

#### https://github.com/ruiqixue16/Stanford-SST-Sentiment-Dataset

## 1. Get data in and set up X_train, X_test, y_train objects

In [2]:
#install aimodelshare library
! pip install aimodelshare==0.0.189

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting aimodelshare==0.0.189
  Downloading aimodelshare-0.0.189-py3-none-any.whl (967 kB)
Collecting docker==5.0.0
  Downloading docker-5.0.0-py2.py3-none-any.whl (146 kB)
Collecting onnxruntime>=1.7.0
  Downloading onnxruntime-1.14.1-cp39-cp39-manylinux_2_27_x86_64.whl (5.0 MB)
Collecting tf2onnx
  Downloading tf2onnx-1.14.0-py3-none-any.whl (451 kB)
Collecting importlib-resources==5.10.0
  Do

In [1]:
# Get competition data
from aimodelshare import download_data
download_data('public.ecr.aws/y2e2a1d6/sst2_competition_data-repository:latest') 


Data downloaded successfully.


In [2]:
# Set up X_train, X_test, and y_train_labels objects
import pandas as pd
import warnings
warnings.simplefilter(action='ignore', category=Warning)

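# read_csv no longer accepts squeeze=True in pandas 2.x, so squeeze the single column afterwards
X_train=pd.read_csv("sst2_competition_data/X_train.csv").squeeze("columns")
X_test=pd.read_csv("sst2_competition_data/X_test.csv").squeeze("columns")

y_train_labels=pd.read_csv("sst2_competition_data/y_train_labels.csv").squeeze("columns")

# One-hot encode the y labels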
y_train = pd.get_dummies(y_train_labels)

X_train.head()

0    The Rock is destined to be the 21st Century 's...
1    The gorgeously elaborate continuation of `` Th...
2    Singer/composer Bryan Adams contributes a slew...
3                 Yet the act is still charming here .
4    Whether or not you 're enlightened by any of D...
Name: text, dtype: object
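
The one-hot step above, and the way predictions are later mapped back to label names, can be illustrated with a small plain-Python sketch. This is not how pandas implements `get_dummies`; the sample labels and probabilities are made up, and the sorted column order is an assumption that matches the `Negative`, `Positive` columns used in this notebook.

```python
# Plain-Python sketch of one-hot encoding and decoding (illustrative only).
labels = ["Positive", "Negative", "Positive", "Positive"]  # hypothetical sample

# Assumed column order: sorted unique labels
columns = sorted(set(labels))  # ['Negative', 'Positive']

# One row per label, with a 1 in the column matching that label
one_hot = [[1 if label == col else 0 for col in columns] for label in labels]


def argmax(row):
    """Return the index of the largest value in a row."""
    return max(range(len(row)), key=lambda i: row[i])


# Hypothetical softmax outputs for two reviews
predicted_probs = [[0.2, 0.8], [0.9, 0.1]]

# Same idea as: [y_train.columns[i] for i in prediction_column_index]
predicted_labels = [columns[argmax(row)] for row in predicted_probs]

print(one_hot)           # [[0, 1], [1, 0], [0, 1], [0, 1]]
print(predicted_labels)  # ['Positive', 'Negative']
```

The decoding step only works if the model outputs one column per label, in the same order as the one-hot columns.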

## 2. Preprocess data using keras tokenizer / Write and Save Preprocessor function


In [3]:
# This preprocessor function makes use of the tf.keras tokenizer

from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import pad_sequences
import numpy as np

# Build vocabulary from training text data
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(X_train)

# preprocessor tokenizes words and pads/truncates every document to the same length
def preprocessor(data, maxlen=40, max_words=10000):
    # max_words is kept for reference only; the vocabulary size is fixed by Tokenizer(num_words=10000)
    sequences = tokenizer.texts_to_sequences(data)
    X = pad_sequences(sequences, maxlen=maxlen)

    return X

print(preprocessor(X_train).shape)
print(preprocessor(X_test).shape)

(6920, 40)
(1821, 40)
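
To make the padding step concrete, here is a minimal plain-Python sketch of what `pad_sequences` does with its default settings, which pad and truncate at the front of each sequence. `pad_pre` and the toy token ids are hypothetical names for illustration, not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last `maxlen` ids of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                           # truncate from the front
        padded.append([value] * (maxlen - len(seq)) + seq)  # pad at the front
    return padded


toy_sequences = [[5, 2, 9], [4, 8, 1, 7, 3, 6]]  # hypothetical token ids
print(pad_pre(toy_sequences, maxlen=4))
# [[0, 5, 2, 9], [1, 7, 3, 6]]
```

With `maxlen=40`, every review becomes a row of exactly 40 ids, which is why the shapes above are `(6920, 40)` and `(1821, 40)`.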


## Discuss the dataset in general terms and describe why building a predictive model using this data might be practically useful.  Who could benefit from a model like this? Explain.

In [11]:
X_train.head(10)

0    The Rock is destined to be the 21st Century 's...
1    The gorgeously elaborate continuation of `` Th...
2    Singer/composer Bryan Adams contributes a slew...
3                 Yet the act is still charming here .
4    Whether or not you 're enlightened by any of D...
5    Just the labour involved in creating the layer...
6    Part of the charm of Satin Rouge is that it av...
7    a screenplay more ingeniously constructed than...
8             `` Extreme Ops '' exceeds expectations .
9    Good fun , good action , good acting , good di...
Name: text, dtype: object

In [12]:
y_train.head(10)

Unnamed: 0,Negative,Positive
0,0,1
1,0,1
2,0,1
3,0,1
4,0,1
5,0,1
6,0,1
7,0,1
8,0,1
9,0,1


### The Stanford Sentiment Treebank (SST) is a collection of fully labeled parse trees that enables a thorough examination of the compositional effects of sentiment in language. The full treebank contains 215,154 unique phrases drawn from 11,855 movie review sentences, each with a sentiment label. SST2 is its binary (positive/negative) version, and this competition uses 6,920 training sentences and 1,821 test sentences. Building a predictive model with this dataset is useful for projects and industries that rely on sentiment analysis.

### For example, we can predict the sentiment of reviews for future movies from a given set of comments. Production companies can use those predictions to guide their production and marketing strategies. The approach also extends beyond the film industry: it can be applied to restaurants, TV shows, or any domain that collects reviews.



## Run at least three: Use an Embedding layer and LSTM layers in at least one model

In [None]:
# Train and submit model 1: stacked LSTM layers on top of an Embedding layer, using the preprocessor defined above.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM, Flatten

model1 = Sequential()
model1.add(Embedding(10000, 16, input_length=40))
model1.add(LSTM(32, return_sequences=True, dropout=0.2))
model1.add(LSTM(32, return_sequences=True, dropout=0.2))
model1.add(LSTM(32, return_sequences=True, dropout=0.2))
model1.add(LSTM(32, dropout=0.2))
model1.add(Flatten())
model1.add(Dense(2, activation='softmax'))

model1.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = model1.fit(preprocessor(X_train), y_train,
                    epochs=1,
                    batch_size=32,
                    validation_split=0.2)
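
As a rough check on the size of this network, the parameter counts can be worked out by hand. The sketch below assumes the standard formulas for Embedding, LSTM (four gates, with bias), and Dense layers; the helper names are hypothetical, and `model1.summary()` remains the authoritative count.

```python
def embedding_params(vocab_size, dim):
    """One vector of length `dim` per vocabulary entry."""
    return vocab_size * dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


total = (
    embedding_params(10000, 16)  # Embedding(10000, 16)
    + lstm_params(16, 32)        # first LSTM(32) reads 16-dim embeddings
    + 3 * lstm_params(32, 32)    # remaining three LSTM(32) layers read 32-dim outputs
    + dense_params(32, 2)        # Dense(2); Flatten adds no parameters
)
print(total)  # 191298
```

Most of the parameters sit in the embedding layer (160,000), so adding LSTM layers changes the model size only modestly.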




In [None]:
import aimodelshare as ai
ai.export_preprocessor(preprocessor,"") 

Your preprocessor is now saved to 'preprocessor.zip'


In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(model1, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model1.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [None]:
#Set credentials using modelshare.org username/password

from aimodelshare.aws import set_credentials
    
apiurl="https://rlxjxnoql9.execute-api.us-east-1.amazonaws.com/prod/m" #This is the unique rest api that powers this specific Playground

set_credentials(apiurl=apiurl)

AI Modelshare Username:··········
AI Modelshare Password:··········
AI Model Share login credentials set successfully.


In [None]:
#Instantiate Competition

mycompetition= ai.Competition(apiurl)

In [None]:
#Submit Model 1: 

#-- Generate predicted y values (Model 1)
prediction_column_index=model1.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 1 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model1.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): model1
Provide any useful notes about your model (optional): LSTM

Your model has been submitted as model version 313

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


## Run at least three: Use an Embedding layer and Conv1d layers in at least one model

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import RMSprop

maxlen = 40
model2 = Sequential()
model2.add(layers.Embedding(10000, 8, input_length=maxlen))
model2.add(layers.Conv1D(32, 7, activation='relu'))
model2.add(layers.MaxPooling1D(2))
model2.add(layers.Conv1D(32, 7, activation='relu'))
model2.add(layers.GlobalMaxPooling1D())
# Two softmax units to match the one-hot y_train columns (Negative, Positive);
# with a single output unit, argmax(axis=1) at prediction time would always return 0
model2.add(layers.Dense(2, activation='softmax'))

# learning_rate replaces the deprecated lr argument
model2.compile(optimizer=RMSprop(learning_rate=1e-4),
              loss='binary_crossentropy',
              metrics=['acc'])
history = model2.fit(preprocessor(X_train), y_train,
                    epochs=1,
                    batch_size=32,
                    validation_split=0.2)
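
To see how the sequence length shrinks through the convolution and pooling layers, the arithmetic can be sketched as below. It assumes the Keras defaults of `'valid'` padding, a convolution stride of 1, and a pooling stride equal to the pool size; the helper names are hypothetical.

```python
def conv1d_length(length, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def maxpool1d_length(length, pool_size):
    """Output length of 1D max pooling with 'valid' padding and stride == pool_size."""
    return (length - pool_size) // pool_size + 1


after_conv1 = conv1d_length(40, 7)           # Conv1D(32, 7) on 40 padded tokens
after_pool = maxpool1d_length(after_conv1, 2)  # MaxPooling1D(2)
after_conv2 = conv1d_length(after_pool, 7)   # second Conv1D(32, 7)

print(after_conv1, after_pool, after_conv2)  # 34 17 11
```

`GlobalMaxPooling1D` then takes the maximum over the remaining 11 steps, leaving one value per filter (32 in total) for the final dense layer.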



In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(model2, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model2.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [None]:
#Submit Model 2: 

#-- Generate predicted y values (Model 2)
prediction_column_index=model2.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 2 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model2.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): model2
Provide any useful notes about your model (optional): conv1d

Your model has been submitted as model version 319

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


## Run at least three: Use transfer learning with glove embeddings for at least one of these models


In [None]:
# What if we wanted to use a matrix of pretrained embeddings?  Same as transfer learning before, but now we are importing a pretrained Embedding matrix:
# Download Glove embedding matrix weights (Might take 10 mins or so!)
! wget http://nlp.stanford.edu/data/wordvecs/glove.6B.zip

--2023-04-17 22:42:39--  http://nlp.stanford.edu/data/wordvecs/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/wordvecs/glove.6B.zip [following]
--2023-04-17 22:42:39--  https://nlp.stanford.edu/data/wordvecs/glove.6B.zip
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://downloads.cs.stanford.edu/nlp/data/wordvecs/glove.6B.zip [following]
--2023-04-17 22:42:39--  https://downloads.cs.stanford.edu/nlp/data/wordvecs/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182753 (822M) [app

In [None]:
! unzip glove.6B.zip 

Archive:  glove.6B.zip
  inflating: glove.6B.100d.txt       
  inflating: glove.6B.200d.txt       
  inflating: glove.6B.300d.txt       
  inflating: glove.6B.50d.txt        


In [None]:
import os

In [None]:
# Extract embedding data for 100 feature embedding matrix
glove_dir = os.getcwd()

embeddings_index = {}
# Each line holds a word followed by its 100 embedding coefficients
with open(os.path.join(glove_dir, 'glove.6B.100d.txt'), encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

print('Found %s word vectors.' % len(embeddings_index))

Found 400001 word vectors.


In [None]:
# Build embedding matrix
embedding_dim = 100 # change if you use txt files using larger number of features
max_words = 10000 # vocabulary size; must match Tokenizer(num_words=10000)
word_index = tokenizer.word_index
# Rows for words not found in the GloVe index stay all-zeros
embedding_matrix = np.zeros((max_words, embedding_dim))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if i < max_words and embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
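
The same lookup logic can be followed on a toy vocabulary. This is a minimal sketch with made-up words, ids, and two-dimensional vectors; all `toy_*` names are hypothetical and plain lists stand in for the NumPy array.

```python
# Hypothetical toy inputs
toy_word_index = {"good": 1, "movie": 2, "zzyzx": 3, "rare": 4}  # word -> tokenizer id
toy_vectors = {"good": [0.1, 0.2], "movie": [0.3, 0.4], "rare": [0.5, 0.6]}
toy_max_words = 4  # only ids 0..3 get a row
toy_dim = 2

# Start from all zeros; row 0 is reserved for padding
toy_matrix = [[0.0] * toy_dim for _ in range(toy_max_words)]
for word, i in toy_word_index.items():
    vector = toy_vectors.get(word)
    # Skip ids beyond the vocabulary cutoff and words without a pretrained vector
    if i < toy_max_words and vector is not None:
        toy_matrix[i] = list(vector)

print(toy_matrix)
# [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.0, 0.0]]
```

Here "zzyzx" keeps a zero row because it has no pretrained vector, and "rare" is dropped because its id falls outside the vocabulary cutoff.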

In [None]:
# Set up same model architecture as before and then import Glove weights to Embedding layer:

model3 = Sequential()
model3.add(Embedding(max_words, embedding_dim, input_length=maxlen))
model3.add(Flatten())
model3.add(Dense(32, activation='relu'))
model3.add(Dense(2, activation='softmax'))
model3.summary()



Model: "sequential_25"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_21 (Embedding)    (None, 40, 100)           1000000   
                                                                 
 flatten_4 (Flatten)         (None, 4000)              0         
                                                                 
 dense_15 (Dense)            (None, 32)                128032    
                                                                 
 dense_16 (Dense)            (None, 2)                 66        
                                                                 
Total params: 1,128,098
Trainable params: 1,128,098
Non-trainable params: 0
_________________________________________________________________
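
The parameter counts in this summary can be reproduced by hand, which is a useful sanity check before loading the GloVe weights. The sketch below uses only the layer sizes shown above; the variable names are hypothetical.

```python
vocab_size, embedding_dim, maxlen = 10000, 100, 40

embedding = vocab_size * embedding_dim  # one 100-dim vector per word id
flattened = maxlen * embedding_dim      # Flatten: 40 tokens x 100 dims
dense_hidden = flattened * 32 + 32      # weights + biases for Dense(32)
dense_out = 32 * 2 + 2                  # weights + biases for Dense(2)

total = embedding + dense_hidden + dense_out
trainable_if_frozen = total - embedding  # once the Embedding layer is frozen

print(embedding, dense_hidden, dense_out)  # 1000000 128032 66
print(total, trainable_if_frozen)          # 1128098 128098
```

The summary lists all 1,128,098 parameters as trainable because it was printed before the embedding layer was frozen.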


In [None]:

# Add weights in same manner as transfer learning and turn off the trainable option before fitting the model to freeze the weights.
model3.layers[0].set_weights([embedding_matrix])
model3.layers[0].trainable = False



model3.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
history = model3.fit(preprocessor(X_train), y_train,
                    epochs=1,
                    batch_size=32,
                    validation_split=0.2)
model3.save_weights('pre_trained_glove_model.h5')

# Only one epoch is used here to speed up training. Increase epochs for a better fit.




In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(model3, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model3.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [None]:
#Submit Model 3: 

#-- Generate predicted y values (Model 3)
prediction_column_index=model3.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 3 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model3.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 摸的l
Provide any useful notes about your model (optional): Transfer learning

Your model has been submitted as model version 324

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


In [None]:
! pip install keras_tuner

In [None]:
#Separate validation data 
from sklearn.model_selection import train_test_split
x_train_split, x_val, y_train_split, y_val = train_test_split(
     X_train, y_train, test_size=0.2, random_state=42)

In [None]:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM, Flatten
import keras_tuner as kt

#Define model structure & parameter search space with function
def build_model(hp):
    model = keras.Sequential()
    model.add(Embedding(10000, 16, input_length=40))
    model.add(LSTM(units=hp.Int("units", min_value=32, max_value=512, step=32), #range 32-512 inclusive, minimum step between tested values is 32
                   return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
    model.add(Flatten())
    model.add(Dense(2, activation='softmax'))
    model.compile(
        optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"],
    )
    return model

#initialize the tuner (which will search through parameters)
tuner = kt.RandomSearch(
    hypermodel=build_model, 
    objective="val_accuracy", # objective to optimize
    max_trials=3, #max number of trials to run during search
    executions_per_trial=1, #higher number reduces variance of results; gauges model performance more accurately
    overwrite=True,
    directory="tuning_model",
    project_name="tuning_units",
)

tuner.search(preprocessor(x_train_split), y_train_split, epochs=1, validation_data=(preprocessor(x_val), y_val))


Trial 3 Complete [00h 00m 08s]
val_accuracy: 0.6748554706573486

Best val_accuracy So Far: 0.6849710941314697
Total elapsed time: 00h 01m 45s
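
To put the three trials in perspective, the size of the `units` search space can be listed directly. The sketch below only enumerates the candidate values implied by `min_value=32`, `max_value=512`, `step=32` and draws three of them with Python's `random` module; it is not how KerasTuner samples internally, and the seed is arbitrary.

```python
import random

# Candidate values for `units`: 32, 64, ..., 512 (both ends included)
candidate_units = list(range(32, 512 + 1, 32))
print(len(candidate_units))  # 16

# Illustrative draw of three distinct candidates, standing in for max_trials=3
rng = random.Random(0)  # arbitrary seed, for repeatability of this sketch only
trial_units = rng.sample(candidate_units, 3)
print(trial_units)
```

With 16 possible values and only 3 trials at one epoch each, the search covers a small part of the space, so the best value found is a rough guide rather than a firm optimum.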


In [None]:
# Build model with best hyperparameters

# Get the top 5 hyperparameter sets (best first).
best_hps = tuner.get_best_hyperparameters(5)
# Build the model with the best hp.
tuned_model = build_model(best_hps[0])
# Fit with the entire dataset.
tuned_model.fit(x=preprocessor(X_train), y=y_train, epochs=1)




<keras.callbacks.History at 0x7fa29638e040>

In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(tuned_model, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("tuned_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [None]:
#Submit tuned model: 

#-- Generate predicted y values (tuned model)
prediction_column_index=tuned_model.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit tuned model to Competition Leaderboard
mycompetition.submit_model(model_filepath = "tuned_model.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 3

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


## Discuss which models performed better and point out relevant hyper-parameter values for successful models.

#### Among the three models I submitted, the LSTM model performed best, with an accuracy score of 0.66. The model has four LSTM layers, each with 32 units. The embedding layer has an input dimension of 10000, an output dimension of 16, and an input length of 40.

#### However, this is still not ideal. I will tune further after sharing my notes with my teammates.

## Fit and submit up to three more models after learning from your team

#### After our team discussion, we all agreed that LSTM models performed best. One of my teammates reported better results with more epochs, so I will try that now.

### submit first model after discussion

In [13]:
# Train and submit model 4 using same preprocessor (note that you could save a new preprocessor, but we will use the same one for this example).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM, Flatten

model4 = Sequential()
model4.add(Embedding(10000, 50, input_length=40))
model4.add(LSTM(32, return_sequences=True, dropout=0.2))
model4.add(LSTM(32, return_sequences=True, dropout=0.2))
model4.add(LSTM(32, return_sequences=True, dropout=0.2))
model4.add(LSTM(32, dropout=0.2))
model4.add(Flatten())
model4.add(Dense(2, activation='softmax'))

model4.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = model4.fit(preprocessor(X_train), y_train,
                    epochs=5,
                    batch_size=32,
                    validation_split=0.2)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
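
Since the progress bars did not survive in the output above, it may help to work out how many samples each epoch actually trains on. The sketch below assumes Keras holds out the last `validation_split` fraction of the rows (taken before shuffling) and rounds the training count down; the exact rounding is an assumption worth checking against the Keras documentation.

```python
import math

n_samples = 6920        # rows in preprocessor(X_train)
validation_split = 0.2
batch_size = 32

n_train = int(n_samples * (1 - validation_split))  # rows used for weight updates
n_val = n_samples - n_train                        # last rows held out for validation
steps_per_epoch = math.ceil(n_train / batch_size)  # batches per epoch

print(n_train, n_val, steps_per_epoch)  # 5536 1384 173
```

Because the held-out rows are the last ones in the file, the validation accuracy depends on how the data is ordered; shuffling beforehand or using `train_test_split` avoids that dependence.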


In [14]:
import aimodelshare as ai
ai.export_preprocessor(preprocessor,"") 

Your preprocessor is now saved to 'preprocessor.zip'


In [15]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(model4, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model4.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [16]:
#Instantiate Competition

mycompetition= ai.Competition(apiurl)

In [17]:
#Submit Model 4: 

#-- Generate predicted y values (Model 4)
prediction_column_index=model4.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 4 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model4.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): model 4
Provide any useful notes about your model (optional): epoch 5

Your model has been submitted as model version 472

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


### submit second model after discussion

In [19]:
model5 = Sequential()
model5.add(Embedding(10000, 50, input_length=40))
model5.add(LSTM(32, return_sequences=True, dropout=0.2))
model5.add(LSTM(32, dropout=0.2))
model5.add(Flatten())
model5.add(Dense(2, activation='softmax'))

model5.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = model5.fit(preprocessor(X_train), y_train,
                    epochs=5,
                    batch_size=32,
                    validation_split=0.2)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [21]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(model5, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model5.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [22]:
#Submit Model 5: 

#-- Generate predicted y values (Model 5)
prediction_column_index=model5.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 5 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model5.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): model 5
Provide any useful notes about your model (optional): simple LSTM

Your model has been submitted as model version 473

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


### submit third model after discussion

In [26]:
model6 = Sequential()
model6.add(Embedding(10000, 50, input_length=40))
model6.add(LSTM(32, return_sequences=True, dropout=0.2))
model6.add(LSTM(32, return_sequences=True, dropout=0.2))
model6.add(LSTM(32, return_sequences=True, dropout=0.2))
model6.add(LSTM(32, return_sequences=True, dropout=0.2))
model6.add(LSTM(32, dropout=0.2))
model6.add(Flatten())
model6.add(Dense(2, activation='softmax'))

model6.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = model6.fit(preprocessor(X_train), y_train,
                    epochs=5,
                    batch_size=32,
                    validation_split=0.2)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [27]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(model6, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model6.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [28]:
#Submit Model 6: 

#-- Generate predicted y values (Model 6)
prediction_column_index=model6.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 6 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model6.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): model 6
Provide any useful notes about your model (optional): more LSTM layers

Your model has been submitted as model version 474

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


### submit fourth model after discussion

In [29]:
model7 = Sequential()
model7.add(Embedding(10000, 50, input_length=40))
model7.add(LSTM(32, return_sequences=True, dropout=0.2))
model7.add(LSTM(32, return_sequences=True, dropout=0.2))
model7.add(LSTM(32, return_sequences=True, dropout=0.2))
model7.add(LSTM(32, return_sequences=True, dropout=0.2))
model7.add(LSTM(32, return_sequences=True, dropout=0.2))
model7.add(LSTM(32, return_sequences=True, dropout=0.2))
model7.add(LSTM(32, dropout=0.2))
model7.add(Flatten())
model7.add(Dense(2, activation='softmax'))

model7.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = model7.fit(preprocessor(X_train), y_train,
                    epochs=5,
                    batch_size=32,
                    validation_split=0.2)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [30]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model = model_to_onnx(model7, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model7.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

In [31]:
#Submit Model 7: 

#-- Generate predicted y values (Model 7)
prediction_column_index=model7.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 7 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model7.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): model 7
Provide any useful notes about your model (optional): more LSTM

Your model has been submitted as model version 475

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


## Discuss results

#### Of the LSTM models I tried after the discussion, the second model (model 6) reached an accuracy score of 0.8. It has four LSTM layers, and the embedding layer has an input dimension of 10000, an output dimension of 50, and an input length of 40. I tried adding and removing layers, but neither showed improvement. However, all of these models performed better than my earlier ones.

## Discuss which models you tried and which models performed better and point out relevant hyper-parameter values for successful models.

#### In this assignment, I tried LSTM models, a Conv1D model, and a transfer learning model using GloVe embeddings. The LSTM models performed best in general, and the transfer learning model performed worst. My best LSTM model has an accuracy score of 0.8 with four LSTM layers, each with 32 units. The embedding layer has an input dimension of 10000, an output dimension of 50, and an input length of 40. The model was trained for 5 epochs.