<a href="https://colab.research.google.com/github/mintusf/Sentiment-analysis/blob/master/Pre_trained_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

To download the dataset and the GloVe embeddings, you need a Kaggle account.
Download kaggle.json from My Account -> API -> Create New API Token.

To run the notebook without crashing, I recommend setting the RAM to the maximum (25 GB).

**Downloading data**

In [0]:
from google.colab import files

Choose the kaggle.json file that you downloaded.

In [10]:
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"<redacted>","key":"<redacted>"}'}

Configure the file so that the Kaggle command-line tool can use it in Google Colab.

In [11]:
!ls -lha kaggle.json
!mkdir -p ~/.kaggle 
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

-rw-r--r-- 1 root root 67 Dec 11 13:45 kaggle.json


Download the 100-dimensional GloVe embeddings trained on 6 billion tokens.

In [12]:
!kaggle datasets download terenceliu4444/glove6b100dtxt
!unzip glove6b100dtxt

Downloading glove6b100dtxt.zip to /content
 93% 122M/131M [00:05<00:00, 14.7MB/s]
100% 131M/131M [00:05<00:00, 26.4MB/s]
Archive:  glove6b100dtxt.zip
  inflating: glove.6B.100d.txt       


A function that builds look-up dictionaries between words, indices, and embeddings.

In [0]:
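import numpy as np

def read_glove_vecs(glove_file):
    # Each line of the GloVe file holds a word followed by its vector components.
    words = set()
    word_to_vec_map = {}
    with open(glove_file, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip().split()
            curr_word = line[0]
            words.add(curr_word)
            word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)
    # Indices start at 1 and follow alphabetical order, so 0 stays free
    # for padding and unknown words.
    words_to_index = {}
    index_to_words = {}
    for i, w in enumerate(sorted(words), start=1):
        words_to_index[w] = i
        index_to_words[i] = w
    return words_to_index, index_to_words, word_to_vec_map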

In [0]:
import numpy as np
words_to_index, index_to_words, word_to_vec_map = read_glove_vecs('glove.6B.100d.txt')
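
To make the three returned dictionaries concrete, here is a minimal sketch on a tiny, made-up file in the GloVe text format. The file contents and the helper name `read_toy_glove` are assumptions for illustration only; the logic is meant to mirror `read_glove_vecs`, not replace it.

```python
import os
import tempfile

import numpy as np

# Three made-up 4-dimensional vectors in the GloVe text format:
# one word per line, followed by its space-separated components.
toy_lines = [
    "the 0.1 0.2 0.3 0.4",
    "cat 0.5 0.6 0.7 0.8",
    "sat 0.9 1.0 1.1 1.2",
]

def read_toy_glove(path):
    """Hypothetical stand-in that mirrors read_glove_vecs."""
    word_to_vec = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split()
            word_to_vec[parts[0]] = np.array(parts[1:], dtype=np.float64)
    # Indices start at 1 and follow alphabetical order; 0 stays free.
    word_to_idx = {w: i for i, w in enumerate(sorted(word_to_vec), start=1)}
    idx_to_word = {i: w for w, i in word_to_idx.items()}
    return word_to_idx, idx_to_word, word_to_vec

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_glove.txt")
    with open(toy_path, "w", encoding="utf-8") as f:
        f.write("\n".join(toy_lines) + "\n")
    toy_w2i, toy_i2w, toy_w2v = read_toy_glove(toy_path)

print(toy_w2i)               # {'cat': 1, 'sat': 2, 'the': 3}
print(toy_i2w[3])            # the
print(toy_w2v["cat"].shape)  # (4,)
```

Because the words are sorted before indices are assigned, the mapping is reproducible across runs, which matters when weights trained against one mapping are loaded later.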

**Model building**

In [15]:
from tensorflow.keras.layers import Embedding

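A function that creates an embedding layer sized for the pre-trained GloVe embeddings (vocabulary size 400001, 100 dimensions).
The weight matrix is initialized to zeros as a placeholder; the actual values come from the saved weights loaded further down.
The weights of the Embedding layer are set to be trainable.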

In [0]:
import numpy as np
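def build_embedding_layer():
  voc_size = 400001  # GloVe vocabulary plus index 0 for padding/unknown words
  emb_dim = 100      # dimensionality of the GloVe vectors
  # Zero placeholder; the real values come from the weight file loaded later.
  emb_matrix = np.zeros((voc_size, emb_dim))
  embedding_layer = Embedding(voc_size, emb_dim, trainable=True)
  # Build the layer first so that its weights exist before set_weights is called.
  embedding_layer.build((None,))
  embedding_layer.set_weights([emb_matrix])
  return embedding_layer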
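
The function above fills the matrix with zeros because the trained weights are loaded from a file later. If you wanted to start from the raw GloVe vectors instead, one common approach is to copy each word's vector into the row given by its index. The sketch below uses NumPy only and a made-up three-word vocabulary; `build_embedding_matrix` and the toy dictionaries are hypothetical names, not part of this notebook.

```python
import numpy as np

def build_embedding_matrix(words_to_index, word_to_vec_map, emb_dim):
    # One extra row because indices start at 1; row 0 stays all zeros
    # and serves padding and out-of-vocabulary tokens.
    voc_size = len(words_to_index) + 1
    emb_matrix = np.zeros((voc_size, emb_dim))
    for word, idx in words_to_index.items():
        emb_matrix[idx, :] = word_to_vec_map[word]
    return emb_matrix

# Made-up 2-dimensional vectors, for illustration only.
toy_index = {"cat": 1, "sat": 2, "the": 3}
toy_vectors = {
    "cat": np.array([0.5, 0.6]),
    "sat": np.array([0.9, 1.0]),
    "the": np.array([0.1, 0.2]),
}
toy_matrix = build_embedding_matrix(toy_index, toy_vectors, emb_dim=2)
print(toy_matrix.shape)  # (4, 2)
print(toy_matrix[0])     # [0. 0.]
```

Assuming the GloVe file holds 400,000 distinct words, the same function applied to the real dictionaries would give a (400001, 100) matrix, which matches `voc_size` and `emb_dim` in `build_embedding_layer` and could be passed to `set_weights` in place of the zero matrix.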

A function that builds an NLP model for sentiment analysis.

In [0]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input, Dropout, LSTM, Activation

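def build_sentiment_model(input_shape):
  # Each input is a fixed-length sequence of word indices.
  sentence_input = Input(shape=input_shape, dtype='int32')

  # Map word indices to 100-dimensional embedding vectors.
  embedding_layer = build_embedding_layer()
  X = embedding_layer(sentence_input)
  # First LSTM returns the full sequence so that the second LSTM can consume it.
  X = LSTM(256, return_sequences=True)(X)
  X = Dropout(0.1)(X)
  # Second LSTM returns only its final hidden state.
  X = LSTM(256, return_sequences=False)(X)
  X = Dropout(0.1)(X)
  # A single sigmoid unit outputs the probability of positive sentiment.
  X = Dense(1)(X)
  X = Activation('sigmoid')(X)

  model = Model(sentence_input, X)
  return model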

In [19]:
sentiment_model = build_sentiment_model((150,))


Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Instructions for updating:
If using Keras pass *_constraint arguments to layers.


In [20]:
sentiment_model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 150)]             0         
_________________________________________________________________
embedding (Embedding)        (None, 150, 100)          40000100  
_________________________________________________________________
lstm (LSTM)                  (None, 150, 256)          365568    
_________________________________________________________________
dropout (Dropout)            (None, 150, 256)          0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 256)               525312    
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense (Dense)                (None, 1)                 257   
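
The parameter counts in the summary above can be checked by hand. The sketch below uses the standard formulas for embedding, LSTM, and dense layers; the helper function names are made up for this check.

```python
def embedding_params(voc_size, emb_dim):
    # One emb_dim-dimensional vector per vocabulary index.
    return voc_size * emb_dim

def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit.
    return input_dim * units + units

print(embedding_params(400001, 100))  # 40000100
print(lstm_params(100, 256))          # 365568
print(lstm_params(256, 256))          # 525312
print(dense_params(256, 1))           # 257
```

Almost all of the parameters sit in the embedding matrix, which is why the RAM requirement is dominated by the GloVe vocabulary rather than by the LSTM layers.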

In [21]:
from tensorflow.keras.optimizers import Adam
sentiment_model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate = 0.0001), metrics=['accuracy'])

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


In [22]:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)


weight_file = drive.CreateFile({'id': '1DDr_sZZRuVwXqyhsOGJ_-l6gXPThEvfh'}) 
weight_file.GetContentFile('last_weights.mat')
sentiment_model.load_weights('last_weights.mat')
print('Weights loaded.')

Weights loaded.


Now you can check the sentiment of your own sentences.

In [0]:
def split_sentence(sentences, words_to_index, max_words):
  # Convert each sentence into a list of GloVe word indices.
  # max_words is unused here; padding and truncation happen in pad_sequences.
  trailing_punctuation = (',', '.', ':', '!', '?', "'")
  new_sentences = []
  for sentence in sentences:
    new_sentence = []
    for word in sentence.lower().split():
      # Strip a single trailing punctuation mark, e.g. "great!" -> "great".
      if word[-1] in trailing_punctuation:
        word = word[:-1]
      # Words missing from the vocabulary map to index 0.
      new_sentence.append(words_to_index.get(word, 0))
    new_sentences.append(new_sentence)
  return new_sentences

In [0]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
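def predict_sentiment(sentence):
  # `sentence` is a list holding a single string, e.g. ['Great book!'].
  X_test = split_sentence(sentence, words_to_index, 150)
  # Pad (or truncate) to the 150 tokens the model expects.
  X_test_padded = pad_sequences(X_test, maxlen=150)
  prediction = sentiment_model.predict(X_test_padded)
  # prediction has shape (1, 1); compare its single value with the 0.5 threshold.
  if prediction[0, 0] > 0.5: print('Positive!')
  else: print('Negative!')
  print(prediction)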
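
To show what happens to a sentence between tokenization and the model input, here is a minimal NumPy sketch of the default behaviour of `pad_sequences`: zeros are added on the left, and over-long sequences keep their last `maxlen` items. `pad_pre` and the two-word vocabulary are hypothetical stand-ins for illustration, not the Keras implementation.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    # Hypothetical stand-in for pad_sequences with its default settings
    # (padding on the left, truncation from the left).
    padded = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        kept = seq[-maxlen:]
        if kept:
            padded[row, -len(kept):] = kept
    return padded

# Made-up vocabulary; unknown words map to 0, as in split_sentence.
toy_index = {"great": 1, "book": 2}
tokens = [toy_index.get(w, 0) for w in "great unknown book".split()]
print(tokens)                       # [1, 0, 2]
print(pad_pre([tokens], maxlen=5))  # [[0 0 1 0 2]]
print(pad_pre([[1, 2, 3, 4]], 3))   # [[2 3 4]]
```

Note that unknown words and padding share index 0, so the model cannot tell them apart; for reviews longer than 150 tokens, only the last 150 reach the model.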

In [63]:
predict_sentiment(['One of the best game music soundtracks - for a game I did not really play: Despite the fact that I have only played a small portion of the game, the music I heard (plus the connection to Chrono Trigger which was great as well) led me to purchase the soundtrack, and it remains one of my favorite albums. There is an incredible mix of fun, epic, and emotional songs. Those sad and beautiful tracks I especially like, as there is not too many of those kinds of songs in my other video game soundtracks. I must admit that one of the songs (Life-A Distant Promise) has brought tears to my eyes on many occasions.My one complaint about this soundtrack is that they use guitar fretting effects in many of the songs, which I find distracting. But even if those were not included I would still consider the collection worth it'])
predict_sentiment(['Great book for travelling Europe: I currently live in Europe, and this is the book I recommend for my visitors. It covers many countries, colour pictures, and is a nice starter for before you go, and once you are there.'])
predict_sentiment(['Lousy RF range; ON worked from further away than OFF: The range of use was practically useless. I could get about 10 - 15 feet of range for the "ON" button, but had to put the remote right against the receiver to get the "OFF" to even work. I bought this particular model because it had discrete on and off buttons, to make it easier for my elderly mom to figure out (was supposed to be used to remotely reboot her cable modem located in the basement), but in the end it was too flaky to let her fuss with it. I switched to a different unit which had about 50 feet of consistent, reliable range and tossed this one in the junk drawer'])
predict_sentiment(['Item Pictured is not what is Shipped: I ordered 3 of these to replace 2 that I have had for nearly 15yrs. What I received were particularly cheaply made - poor effort knockoffs of the original CornerSpray that is pictured.The product shipped is not labeled in any way, nor does it appear to be capable of spraying in a square pattern for right angle corners either. Particularly lightweight unlike the originals, cheaply painted & not powder coated, plus the typical last only 1-2yrs quality plastic couplers.No Star rating.Ace Hardware can do much better than this.Returning tomorrow.5/19/11 Update. Fast response to my request to return, but it cost $12 shipping to return, plus the original $9 shipping when ordered.This all could have been avoided with some truth in advertising that consists on accurate picture of the actual product carried plus a more accurate description of this predictable Made in China product.'])

Positive!
[[0.98771405]]
Positive!
[[0.9935688]]
Negative!
[[0.00198411]]
Negative!
[[0.0010964]]
