#Attention-with-logits: a simple but effective approach for training sentence-level sentiment classifiers with document-level sentiment labels

Sentence-level sentiment classification is an important task in review mining. Training such classifiers requires datasets of review sentences annotated with ground-truth sentiment labels, but sentence-level labelling is very labor-intensive. A popular way to avoid the manual labelling effort is to use whole reviews, rather than review sentences, as the training data. In this approach, each review is treated as a single very long sentence, and the rating of the review is treated as the sentiment label of that long sentence.

However, this approach has two disadvantages:

1. The sentiments of individual sentences in a review are quite likely to differ from the sentiment indicated by the rating. Using the coarse-grained review-level rating as the training label may therefore misrepresent the sentiments of individual sentences during training.

2. A review may consist of hundreds of words. Some deep learning models, such as LSTM, GRU, or CNN, struggle with such long word sequences. In practice, we usually have to truncate long reviews to limit each training example to a length of 500, which can lead to severe loss of information.


We propose an end-to-end weakly supervised approach, called 'attention-with-logits', that takes individual sentences as inputs but uses the review-level ratings as the supervision signal. In the proposed approach, individual sentences are fed into a neural network model, and the resulting sentence-level logit vectors are combined into document-level logit vectors with an LSTM-based attention mechanism. The document-level logit vectors and the document-level ratings are then used to define the loss of the neural network model. The structure of the approach is shown in the accompanying paper.
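
The aggregation step can be sketched in plain NumPy. This is a minimal illustration rather than the notebook's implementation: the attention scores are simply passed in as an array (in the model they come from an LSTM followed by a dense layer), and the function name `aggregate_logits` and the toy numbers are assumptions made for the example.

```python
import numpy as np

def aggregate_logits(sen_logits, scores, mask):
    """Combine sentence-level logits into one document-level logit vector.

    sen_logits: (num_sentences, num_classes) logits from the base model
    scores:     (num_sentences,) unnormalised attention scores (assumed given)
    mask:       (num_sentences,) 1 for real sentences, 0 for dummy ones
    """
    weights = np.exp(scores) * mask       # dummy sentences get weight 0
    weights = weights / weights.sum()     # normalise over the real sentences
    doc_logits = (weights[:, None] * sen_logits).sum(axis=0)
    return weights, doc_logits

# Toy review: two real sentences and one dummy slot, three classes.
sen_logits = np.array([[2.0, 0.0, -1.0],
                       [0.0, 1.0, 0.0],
                       [5.0, 5.0, 5.0]])  # dummy row, must be ignored
scores = np.array([0.0, 0.0, 3.0])
mask = np.array([1.0, 1.0, 0.0])

weights, doc_logits = aggregate_logits(sen_logits, scores, mask)
print(weights)     # the two real sentences share the weight equally
print(doc_logits)  # weighted sum of the two real logit vectors
```

Because the document-level logits are a weighted sum of sentence-level logits, the gradient of a document-level loss flows back into the sentence-level model, which is what lets review ratings supervise a sentence classifier.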

In [0]:
!pip install ibm-cos-sdk


In [0]:
#import json
from keras.preprocessing.text import Tokenizer
#import os
#from nltk.tokenize import sent_tokenize
from keras.preprocessing.sequence import pad_sequences
import numpy as np 
import pickle
import tensorflow as tf
#import nltk
#import re
import collections
#import gensim
import pandas as pd
import io
import ibm_boto3
from ibm_botocore.client import Config
bucket_name = 'aclawl'
filename='w2v.csv'
credentials = {
  "apikey": "eOiK2XTQ9ryhfjzQBBYCz2bw7jASG00KD132KLtcjIIY",
  "cos_hmac_keys": {
    "access_key_id": "08598b29228a4a2bb8b3b16ab9c6449a",
    "secret_access_key": "ba0fdde59ab475b5484132512c77499f0feb8e6046732f46"
  },
  "endpoints": "https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints",
  "iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:cloud-object-storage:global:a/fb3c25e92a5a47319ba409b53b2c431e:0e6438c2-bd8f-41b2-a5a3-0d11f4b74253::",
  "iam_apikey_name": "auto-generated-apikey-08598b29-228a-4a2b-b8b3-b16ab9c6449a",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Manager",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/fb3c25e92a5a47319ba409b53b2c431e::serviceid:ServiceId-64f51fd7-2ced-4e80-a71a-9bfbea957ce6",
  "resource_instance_id": "crn:v1:bluemix:public:cloud-object-storage:global:a/fb3c25e92a5a47319ba409b53b2c431e:0e6438c2-bd8f-41b2-a5a3-0d11f4b74253::"
}
auth_endpoint = 'https://iam.bluemix.net/oidc/token'
service_endpoint = 'https://s3.ap-geo.objectstorage.softlayer.net'
#service_endpoint='https://s3.ap.cloud-object-storage.appdomain.cloud'
resource = ibm_boto3.resource('s3',
                      ibm_api_key_id=credentials['apikey'],
                      ibm_service_instance_id=credentials['resource_instance_id'],
                      ibm_auth_endpoint=auth_endpoint,
                      config=Config(signature_version='oauth'),
                      endpoint_url=service_endpoint)


In [0]:
def get_pickle_from_obj_storage(bucket_name, key):   # read the data from IBM cloud
    obj = resource.Object(bucket_name=bucket_name, key=key).get()
    return pickle.load(io.BytesIO(obj['Body'].read()))

#Part 1: The attention-with-logits approach

In [0]:
# Load the data for AWL. It has the same content as the data for the baselines,
# but is organised in a different structure
all_data=get_pickle_from_obj_storage(bucket_name,'data.pickle')  


In [0]:
max_rev_len=10 # the maximum number of sentences a review should have
max_sen_len=50 # the maximum number of words in a sentence
class_num=5 #number of target classes

### The class Atten_logits implements the proposed model. To create an instance, a base model has to be supplied. The parameters of the base model are learned by fitting the Atten_logits model

In [0]:
class Atten_logits(tf.keras.Model):
  def __init__(self, base_model):
    super(Atten_logits, self).__init__()
    self.base_model=base_model
    self.lstm_atten=tf.keras.layers.LSTM(30,return_sequences=True,name='lstm_atten') #LSTM for attention
    self.atten_score=tf.keras.layers.Dense(1,name='atten_score') # map the attention vec to a single score
    
  def call(self,x): 
    # x should be a list of 2 elements:
    #1.the token ids of each sentence. Shape=doc_num*max_rev_len*max_sen_len
    #2.the mask marking the real (1) and dummy (0) sentences of each review. Shape=doc_num*max_rev_len
    trng_x,trng_emp_sen_msk=x
    trng_x=tf.reshape(trng_x, [-1,max_sen_len])
    trng_emp_sen_msk=tf.reshape(trng_emp_sen_msk,[-1,max_rev_len,1])   
    
    base_model_output=self.base_model(trng_x)   #get the output of the base_model
    base_model_output = tf.reshape(base_model_output, [-1, max_rev_len, class_num],name='output')          
    
    #compute the attention weight each sentence should be given
    atten_vec = self.lstm_atten(base_model_output)    
    atten_score=self.atten_score(atten_vec)    
    atten_score= tf.exp(atten_score)*trng_emp_sen_msk
    atten_score_norm=tf.reshape(tf.reduce_sum(atten_score,axis=1),[-1,1,1])    
    attention_weights= atten_score/atten_score_norm  
    
    #aggregate sen-level logits into doc-level logits 
    atten_logit=attention_weights * base_model_output 
    atten_logit=tf.reduce_sum(atten_logit, axis=1)     
    
    return  tf.nn.softmax(atten_logit,axis=1)
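
Exponentiating raw scores can overflow when a score is large. One common remedy, shown here in NumPy as a sketch and not as a change to the model above, is to subtract each review's maximum score over its real sentences before exponentiating; the resulting weights are mathematically the same. The helper name `masked_softmax` is hypothetical, and the sketch assumes every review has at least one real sentence.

```python
import numpy as np

def masked_softmax(scores, mask):
    """Numerically stable softmax over the real sentences of each review.

    scores: (doc_num, max_rev_len) unnormalised attention scores
    mask:   (doc_num, max_rev_len) 1 for real sentences, 0 for dummy ones
    Assumes every row of mask contains at least one 1.
    """
    # Dummy slots are set to -inf so they neither affect the max nor get weight.
    masked = np.where(mask > 0, scores, -np.inf)
    shifted = masked - masked.max(axis=1, keepdims=True)
    exp = np.exp(shifted)                 # exp(-inf) evaluates to 0
    return exp / exp.sum(axis=1, keepdims=True)

# Scores this large would overflow with a plain np.exp(scores).
scores = np.array([[1000.0, 1001.0, 5.0]])
mask = np.array([[1, 1, 0]])
stable_weights = masked_softmax(scores, mask)
print(stable_weights)
```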

### Create base models. Any neural-network-based model can be used as the base model, as long as it outputs logit vectors

In [0]:
def base_mlp():
  new_model=tf.keras.Sequential()
  new_embedding=tf.keras.layers.Embedding(embedding_matrix.shape[0],
                              embedding_matrix.shape[1],
                              weights=[embedding_matrix],
                              input_length=max_sen_len,
                              trainable=False,name='embedding')
  new_pooling =tf.keras.layers.GlobalAveragePooling1D()
  new_dense=tf.keras.layers.Dense(500,activation='relu',name='dense',
                                 )
  dp_1=tf.keras.layers.Dropout(rate=0.20)
  new_dense_1=tf.keras.layers.Dense(50,activation='relu',name='dense_1'
                                   )
  dp_2=tf.keras.layers.Dropout(rate=0.15)
  new_dense_2=tf.keras.layers.Dense(5,name='dense_2')
  new_model.add(new_embedding)
  new_model.add(new_pooling)
  new_model.add(new_dense)
  new_model.add(dp_1)
  new_model.add(new_dense_1)
  new_model.add(dp_2)
  new_model.add(new_dense_2)
  new_model.compile(optimizer='adam',loss='categorical_crossentropy')
  return new_model

In [0]:
def base_seq(seq_type='LSTM'):
  new_model=tf.keras.Sequential()
  new_embedding=tf.keras.layers.Embedding(embedding_matrix.shape[0],
                              embedding_matrix.shape[1],
                              weights=[embedding_matrix],
                              input_length=max_sen_len,
                              trainable=False,name='embedding')
  if seq_type=='LSTM':
    new_rnn = tf.keras.layers.LSTM(300,
                             kernel_regularizer=tf.keras.regularizers.l2(0.0005),
                             #recurrent_regularizer=tf.keras.regularizers.l2(0.001),
                             dropout=0.2,
                             recurrent_dropout=0.2,name='lstm') 
  else:
    new_rnn = tf.keras.layers.GRU(200,
                             kernel_regularizer=tf.keras.regularizers.l2(0.001),
                             recurrent_regularizer=tf.keras.regularizers.l2(0.001),
                             dropout=0.1,
                             recurrent_dropout=0.1,name='lstm')  
  
  new_dense_1=tf.keras.layers.Dense(50,activation='tanh',name='dense_1')
  drop_1=tf.keras.layers.Dropout(rate=0.3)
  new_dense_2=tf.keras.layers.Dense(5,name='dense_2')
  new_model.add(new_embedding)
  new_model.add(new_rnn)
  new_model.add(new_dense_1)
  new_model.add(drop_1)
  
  new_model.add(new_dense_2)
  
  new_model.compile(optimizer='adam',loss='categorical_crossentropy')
  return new_model

In [0]:
trn_sen_content=all_data['trn_sen_content'] # get the sen tokens of each review in the training data. Shape=doc_num*max_rev_len*max_sen_len
# With max_rev_len=10, a review that has only 2 sentences is padded with 8 dummy sentences whose tokens are all 0
trn_rating_oh=all_data['trn_rating_oh']# get the rating of each document in the one-hot encoding format
trn_rev_len=all_data['trn_rev_len'] # get the actual number of sentences in each review

embedding_matrix=all_data['embedding_matrix']

val_sen_content=all_data['val_x'] 
val_rating_oh=all_data['val_y']
val_rev_len=all_data['val_len']

tst=all_data['tst']  # get the sen tokens for test
tst_labels=all_data['tst_labels']  # get the ground truth labels of the test sentences
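
The preprocessing that produced `trn_sen_content` is not part of this notebook. As a rough sketch of the fixed-shape layout described in the comments above, the following hypothetical helper `pad_review` packs one tokenised review into a `max_rev_len` by `max_sen_len` array. Padding at the end of each sentence and using token id 0 for padding are assumptions; the original data may have been padded differently.

```python
import numpy as np

def pad_review(sentences, max_rev_len, max_sen_len):
    """Pack a review into a (max_rev_len, max_sen_len) array of token ids.

    sentences: list of sentences, each a list of positive token ids.
    Extra sentences and extra tokens are dropped; missing ones stay 0.
    Returns the array and the number of real sentences kept.
    """
    out = np.zeros((max_rev_len, max_sen_len), dtype=np.int64)
    kept = sentences[:max_rev_len]
    for i, sen in enumerate(kept):
        sen = sen[:max_sen_len]
        out[i, :len(sen)] = sen
    return out, len(kept)

# A toy review with two sentences, packed into 3 sentences x 3 tokens.
padded, n_real = pad_review([[4, 7], [9, 2, 5, 1]], max_rev_len=3, max_sen_len=3)
print(padded)
print(n_real)
```

The second return value plays the role of `trn_rev_len`: it records how many rows are real sentences, so the remaining rows can be masked out later.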



In [0]:
# Create masks so that the dummy sentences play no role when computing the attention weights
def mask_based_len(rev_len):
  masks=[]
  for i in rev_len:
    masks.append([1]*i+[0]*(max_rev_len-i))
  return np.array(masks)

trng_epty_sen_mask=mask_based_len(trn_rev_len.squeeze())
val_epty_sen_mask=mask_based_len(val_rev_len.squeeze())
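
The same mask can also be built without a Python loop by comparing sentence positions against each review's length with NumPy broadcasting. This is only an alternative sketch; the function name `mask_from_lengths` does not appear in the notebook.

```python
import numpy as np

def mask_from_lengths(rev_len, max_rev_len):
    """Return a (doc_num, max_rev_len) 0/1 mask from per-review sentence counts."""
    rev_len = np.asarray(rev_len).reshape(-1, 1)       # (doc_num, 1)
    positions = np.arange(max_rev_len).reshape(1, -1)  # (1, max_rev_len)
    # Position j of review i is real exactly when j < rev_len[i].
    return (positions < rev_len).astype(np.int64)

example_mask = mask_from_lengths([2, 0, 4], max_rev_len=4)
print(example_mask)
```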

### Train an LSTM classifier with the proposed approach

In [0]:
tf.keras.backend.clear_session()
base_model=base_seq()
awl_model=Atten_logits(base_model)
awl_model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.005),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
#history=model.fit([trn_sen_content,trn_rev_len], trn_rating_oh,validation_data=([val_x,val_len],val_y), batch_size=128, epochs=80)
#my_model.predict([trn_sen_content[0:30],epty_sen_mask[0:30]])
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min',patience=10,verbose=1)
#my_model.fit(trn_sen_content, trn_rating_oh, validation_data=[val_x,val_y], callbacks=[es],batch_size=128, epochs=200,shuffle=False)
awl_model.fit([trn_sen_content,trng_epty_sen_mask],trn_rating_oh,
             validation_data=([val_sen_content,val_epty_sen_mask],val_rating_oh),
             batch_size=64,callbacks=[es],epochs=70,shuffle=False)

Train on 11535 samples, validate on 1500 samples
Epoch 1/70
Epoch 2/70
Epoch 3/70
Epoch 4/70
Epoch 5/70
Epoch 6/70
Epoch 7/70
Epoch 8/70
Epoch 9/70
Epoch 10/70
Epoch 11/70
Epoch 12/70
Epoch 13/70
Epoch 14/70
Epoch 15/70
Epoch 16/70
Epoch 17/70
Epoch 00017: early stopping


<tensorflow.python.keras.callbacks.History at 0x7f9ec185a908>

In [0]:
def evaluate(model,tst,tst_labels,name='awl_lstm'):
  # predict the 5-class rating index of each test sentence
  predictions=np.argmax(model.predict(tst),axis=1)
  predictions=np.squeeze(predictions).tolist()
  # collapse the 5 rating classes into the 3 classes used by the test labels:
  # indices 0-1 -> 0, index 2 -> 1, indices 3-4 -> 2
  converted_preds=[]
  for i in predictions:
    if i<2:
      converted_preds.append(0)
    elif i==2:
      converted_preds.append(1)
    else:
      converted_preds.append(2)

  converted_preds=np.array(converted_preds).reshape([-1,1])
  # name only labels the printed accuracy
  print('the accuracy of %s is %f' %(name,np.mean(tst_labels==converted_preds)))
  return pd.DataFrame(np.c_[tst_labels,converted_preds],columns=['ground-truth','prediction'])
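
The conversion from five rating classes to three test classes can also be written as a lookup table. The sketch below is an equivalent, vectorised version of the `if`/`elif` chain in `evaluate`; the names `RATING_TO_CLASS` and `collapse_ratings` are made up for this example.

```python
import numpy as np

# Lookup table: rating index (0..4) -> coarse class (0, 1 or 2).
RATING_TO_CLASS = np.array([0, 0, 1, 2, 2])

def collapse_ratings(rating_idx):
    """Map an array of 5-class rating indices to the 3 coarse classes."""
    return RATING_TO_CLASS[np.asarray(rating_idx)]

collapsed = collapse_ratings([0, 1, 2, 3, 4])
print(collapsed)  # [0 0 1 2 2]
```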
  

### Evaluate the performance of the LSTM classifier trained with the proposed approach

In [0]:
evaluate(base_model,tst,tst_labels)

the accuracy of awl_lstm is 0.707165


Unnamed: 0,ground-truth,prediction
0,2,2
1,2,2
2,2,2
3,2,2
4,2,2
5,2,0
6,2,2
7,2,2
8,2,2
9,2,2


### Train an MLP classifier with the proposed approach

In [0]:
tf.keras.backend.clear_session()
base_model=base_mlp()
#history=model.fit([trn_sen_content,trn_rev_len], trn_rating_oh,validation_data=([val_x,val_len],val_y), batch_size=128, epochs=80)
#awl_model.fit(trn_sen_content, trn_rating_oh,batch_size=128, epochs=80,shuffle=False)
awl_model=Atten_logits(base_model)
awl_model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.005),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
#history=model.fit([trn_sen_content,trn_rev_len], trn_rating_oh,validation_data=([val_x,val_len],val_y), batch_size=128, epochs=80)
#my_model.predict([trn_sen_content[0:30],epty_sen_mask[0:30]])
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min',patience=20,verbose=1)
#my_model.fit(trn_sen_content, trn_rating_oh, validation_data=[val_x,val_y], callbacks=[es],batch_size=128, epochs=200,shuffle=False)
awl_model.fit([trn_sen_content,trng_epty_sen_mask],trn_rating_oh,
             validation_data=([val_sen_content,val_epty_sen_mask],val_rating_oh),
             batch_size=64,callbacks=[es],epochs=70,shuffle=False)

Train on 11535 samples, validate on 1500 samples
Epoch 1/70
Epoch 2/70
Epoch 3/70
Epoch 4/70
Epoch 5/70
Epoch 6/70
Epoch 7/70
Epoch 8/70
Epoch 9/70
Epoch 10/70
Epoch 11/70
Epoch 12/70
Epoch 13/70
Epoch 14/70
Epoch 15/70
Epoch 16/70
Epoch 17/70
Epoch 18/70
Epoch 19/70
Epoch 20/70
Epoch 21/70
Epoch 22/70
Epoch 23/70
Epoch 24/70
Epoch 25/70
Epoch 26/70
Epoch 27/70
Epoch 28/70
Epoch 29/70
Epoch 30/70
Epoch 31/70
Epoch 32/70
Epoch 33/70
Epoch 34/70
Epoch 35/70
Epoch 36/70
Epoch 37/70
Epoch 38/70
Epoch 39/70
Epoch 40/70
Epoch 41/70
Epoch 42/70
Epoch 00042: early stopping


<tensorflow.python.keras.callbacks.History at 0x7f9e8da48780>

### Evaluate the performance of the MLP classifier trained with the proposed approach

In [0]:
evaluate(base_model,tst,tst_labels)

the accuracy of awl_mlp is 0.741433


Unnamed: 0,ground-truth,prediction
0,2,2
1,2,2
2,2,2
3,2,2
4,2,2
5,2,2
6,2,2
7,2,2
8,2,2
9,2,2


#Part 2: The documents-as-sens approach

In [0]:
# Read the data for the docs-as-sens approach.
# It has the same content as the data for AWL,
# but is organised in a different structure
baseline_max_sen_len=100
all_data=get_pickle_from_obj_storage(bucket_name,'base_data.pickle')  
trng_content=all_data['all_content']
trng_label=all_data['all_rating_oh']
val_content=all_data['test_content']
val_label=all_data['test_rating_oh']
tst=all_data['tst']
tst_labels=all_data['tst_labels']
embedding_matrix=all_data['embedding_matrix']
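
How `base_data.pickle` was built is not shown in this notebook. As an illustration of the docs-as-sens input format only, the hypothetical helper `review_as_sentence` below concatenates a review's sentences into one token sequence and truncates or pads it to a fixed length, reporting how many tokens were dropped. Padding with 0 at the end is an assumption.

```python
def review_as_sentence(sentences, max_len):
    """Flatten a review into a single sequence of exactly max_len token ids.

    sentences: list of sentences, each a list of positive token ids.
    Returns the fixed-length sequence and the number of tokens truncated away.
    """
    flat = [tok for sen in sentences for tok in sen]
    dropped = max(0, len(flat) - max_len)
    flat = flat[:max_len]
    return flat + [0] * (max_len - len(flat)), dropped

long_seq, long_dropped = review_as_sentence([[1, 2, 3], [4, 5]], max_len=4)
short_seq, short_dropped = review_as_sentence([[1, 2]], max_len=4)
print(long_seq, long_dropped)
print(short_seq, short_dropped)
```

Any tokens counted in the second return value never reach the model, which is the information loss this baseline is exposed to on long reviews.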





### Train an LSTM model with the docs-as-sens approach

In [0]:
tf.keras.backend.clear_session()
model = tf.keras.Sequential()
embedding_layer = tf.keras.layers.Embedding(embedding_matrix.shape[0],
                            embedding_matrix.shape[1],
                            weights=[embedding_matrix],
                            input_length=baseline_max_sen_len,
                            trainable=False)
model.add(embedding_layer)
model.add(tf.keras.layers.LSTM(128,dropout=0.2,
                               recurrent_dropout=0.2,
                               kernel_regularizer=tf.keras.regularizers.l2(0.001),
                               recurrent_regularizer=tf.keras.regularizers.l2(0.001)))                        
model.add(tf.keras.layers.Dense(200,activation='tanh'))
model.add(tf.keras.layers.Dropout(rate=0.20))
model.add(tf.keras.layers.Dense(50,activation='tanh'))
model.add(tf.keras.layers.Dense(5,activation='softmax'))


model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min',patience=20,verbose=1)
model.fit(trng_content,trng_label,batch_size=128,callbacks=[es],validation_data=(val_content,val_label),epochs=70)

Train on 10250 samples, validate on 1500 samples
Epoch 1/70
Epoch 2/70
Epoch 3/70
Epoch 4/70
Epoch 5/70
Epoch 6/70
Epoch 7/70
Epoch 8/70
Epoch 9/70
Epoch 10/70
Epoch 11/70
Epoch 12/70
Epoch 13/70
Epoch 14/70
Epoch 15/70
Epoch 16/70
Epoch 17/70
Epoch 18/70
Epoch 19/70
Epoch 20/70
Epoch 21/70
Epoch 22/70
Epoch 23/70
Epoch 24/70
Epoch 25/70
Epoch 26/70
Epoch 27/70
Epoch 28/70
Epoch 29/70
Epoch 30/70
Epoch 31/70
Epoch 32/70
Epoch 33/70
Epoch 34/70
Epoch 35/70
Epoch 36/70
Epoch 37/70
Epoch 38/70
Epoch 39/70
Epoch 40/70
Epoch 41/70
Epoch 42/70
Epoch 43/70
Epoch 44/70
Epoch 45/70
Epoch 46/70
Epoch 47/70
Epoch 48/70
Epoch 49/70
Epoch 50/70
Epoch 51/70
Epoch 52/70
Epoch 53/70
Epoch 54/70
Epoch 55/70
Epoch 56/70
Epoch 57/70
Epoch 58/70
Epoch 59/70
Epoch 60/70
Epoch 61/70
Epoch 62/70
Epoch 63/70
Epoch 64/70
Epoch 65/70
Epoch 66/70
Epoch 00066: early stopping


<tensorflow.python.keras.callbacks.History at 0x7f9e83ec44a8>

### Evaluate the performance of the LSTM classifier trained with the docs-as-sens approach

In [0]:
evaluate(model,tst,tst_labels)

the accuracy of awl_lstm is 0.657321


Unnamed: 0,ground-truth,prediction
0,2,2
1,2,2
2,2,2
3,2,2
4,2,2
5,2,2
6,2,1
7,2,2
8,2,2
9,2,2
