# Dialogue Act Tagging

Dialogue act (DA) tagging is an important step in the process of developing dialog systems. DA tagging is a problem usually solved by supervised machine learning approaches that all require large amounts of hand labeled data. A wide range of techniques have been investigated for DA tagging. In this lab, we explore two approaches to DA classification. We are using the Switchboard Dialog Act Corpus for training.
Corpus can be downloaded from http://compprag.christopherpotts.net/swda.html.


The downloaded dataset should be kept in a data folder in the same directory as this file. 

In [1]:
import pandas as pd
import glob
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
import numpy as np

import sklearn.metrics
import tensorflow as tf
import matplotlib.pyplot as plt
from tqdm import tqdm_notebook as tqdm

In [2]:
  from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
f = glob.glob("/content/drive/My Drive/Lab10/swda/sw*/sw*.csv")
frames = []
for i in range(0, len(f)):
    frames.append(pd.read_csv(f[i]))

result = pd.concat(frames, ignore_index=True)


In [4]:
print("Number of converations in the dataset:",len(result))


Number of converations in the dataset: 223606


The dataset has many different features, we are only using act_tag and text for this training.


In [5]:
reduced_df = result[['act_tag','text']]


Reduce down the number of tags to 43 - converting the combined tags to their generic classes:

In [6]:
# Imported from "https://github.com/cgpotts/swda"
# Convert the combination tags to the generic 43 tags

import re
def damsl_act_tag(input):
        """
        Seeks to duplicate the tag simplification described at the
        Coders' Manual: http://www.stanford.edu/~jurafsky/ws97/manual.august1.html
        """
        d_tags = []
        tags = re.split(r"\s*[,;]\s*", input)
        for tag in tags:
            if tag in ('qy^d', 'qw^d', 'b^m'): pass
            elif tag == 'nn^e': tag = 'ng'
            elif tag == 'ny^e': tag = 'na'
            else: 
                tag = re.sub(r'(.)\^.*', r'\1', tag)
                tag = re.sub(r'[\(\)@*]', '', tag)            
                if tag in ('qr', 'qy'):                         tag = 'qy'
                elif tag in ('fe', 'ba'):                       tag = 'ba'
                elif tag in ('oo', 'co', 'cc'):                 tag = 'oo_co_cc'
                elif tag in ('fx', 'sv'):                       tag = 'sv'
                elif tag in ('aap', 'am'):                      tag = 'aap_am'
                elif tag in ('arp', 'nd'):                      tag = 'arp_nd'
                elif tag in ('fo', 'o', 'fw', '"', 'by', 'bc'): tag = 'fo_o_fw_"_by_bc'            
            d_tags.append(tag)
        # Dan J says (p.c.) that it makes sense to take the first;
        # there are only a handful of examples with 2 tags here.
        return d_tags[0]

In [7]:
reduced_df["act_tag"] = reduced_df["act_tag"].apply(lambda x: damsl_act_tag(x))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


There are 43 tags in this dataset. Some of the tags are Yes-No-Question('qy'), Statement-non-opinion('sd') and Statement-opinion('sv'). Tags information can be found here http://compprag.christopherpotts.net/swda.html#tags. 


To get unique tags:

In [8]:
unique_tags = set()
for tag in reduced_df['act_tag']:
    unique_tags.add(tag)

In [9]:
one_hot_encoding_dic = pd.get_dummies(list(unique_tags))


In [10]:
tags_encoding = []
for i in range(0, len(reduced_df)):
    tags_encoding.append(one_hot_encoding_dic[reduced_df['act_tag'].iloc[i]])

The tags are one hot encoded.

To create utterance representations as sequences of words:

In [11]:
utterances = []
for i in range(0, len(reduced_df)):
    utterances.append(reduced_df['text'].iloc[i].split(" "))


In [12]:
wordvectors = {}
index = 1
for u in utterances:
    for w in u:
        if w not in wordvectors:
            wordvectors[w] = index
            index += 1

In [13]:
# Max length of 137
MAX_LENGTH = len(max(utterances, key=len))

In [14]:
utterance_embeddings = []
for u in utterances:
    utterance_emb = []
    for w in u:
        utterance_emb.append(wordvectors[w])
    utterance_embeddings.append(utterance_emb)


Then we split the dataset into test and train.

In [15]:
from sklearn.model_selection import train_test_split
import numpy as np
X_train, X_test, y_train, y_test = train_test_split(utterance_embeddings, np.array(tags_encoding))


And pad the utterances with zero to make all utterances of equal length.


In [16]:
MAX_LENGTH = 137

In [17]:
from keras.preprocessing.sequence import pad_sequences
 
train_utterances_X = pad_sequences(X_train, maxlen=MAX_LENGTH, padding='post')
test_utterances_X = pad_sequences(X_test, maxlen=MAX_LENGTH, padding='post')

Split Train into Train and Validation - about 10% into validation - In order to validate the model as it is training

In [18]:


train_input = train_utterances_X[:140000]
val_input = train_utterances_X[140000:]

train_labels = y_train[:140000]
val_labels = y_train[140000:]


# Model 1 - 

The first approach we'll try is to treat DA tagging as a standard multi-class text classification task, in the way you've done before with sentiment analysis and other tasks. Each utterance will be treated independently as a text to be classified with its DA tag label. This model has an architecture of:

- Embedding  
- BLSTM  
- Fully Connected Layer
- Softmax Activation

 The model architecture is as follows: Embedding Layer (to generate word embeddings) Next layer Bidirectional LSTM. Feed forward layer with number of neurons = number of tags. Softmax activation to get the probabilities.


In [19]:
VOCAB_SIZE = len(wordvectors) # 43,731
MAX_LENGTH = len(max(utterances, key=len))
EMBED_SIZE = 100 # arbitary
HIDDEN_SIZE = len(unique_tags) 

In [20]:
import keras

In [21]:
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import Dropout, InputLayer, Bidirectional, TimeDistributed, Activation, Embedding
from keras.optimizers import Adam

#Building the network

# Include 2 BLSTM layers, in order to capture both the forward and backward hidden states
model = keras.Sequential()
#model.add(InputLayer(MAX_LENGTH,))
model.add(Embedding(VOCAB_SIZE+1, EMBED_SIZE, input_length = MAX_LENGTH, name = 'embedding_1',
                    embeddings_initializer='glorot_uniform'))
# Embedding layer
model.add(Bidirectional(LSTM(HIDDEN_SIZE, return_sequences= True)))
model.add(Bidirectional(LSTM(HIDDEN_SIZE, return_sequences= False)))

model.add(Dense(HIDDEN_SIZE))
model.add(Activation('softmax'))
# Bidirectional 1
# Bidirectional 2
# Dense layer
# Activation

model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 137, 100)          4373200   
_________________________________________________________________
bidirectional (Bidirectional (None, 137, 86)           49536     
_________________________________________________________________
bidirectional_1 (Bidirection (None, 86)                44720     
_________________________________________________________________
dense (Dense)                (None, 43)                3741      
_________________________________________________________________
activation (Activation)      (None, 43)                0         
Total params: 4,471,197
Trainable params: 4,471,197
Non-trainable params: 0
_________________________________________________________________


The above model is build with one embedding layer, two biLSTM layers and one dense layers and softmax activation layer. For multi class classification adam optimizer is used with categorical _crossentropy loss. 

In [22]:
# Train the model - using validation 
model.fit(train_input, train_labels,
          validation_data = (val_input, val_labels),
          epochs = 3,
          batch_size = 512,
          verbose = 1)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7f00b01bf890>

Here the model is trained for 3 epcohs with a batch size of 512.

In [23]:
score = model.evaluate(test_utterances_X, y_test, batch_size=100)



In [24]:
print("Overall Accuracy:", score[1]*100)


Overall Accuracy: 69.22650337219238


Here we are getting the accuracy of 69%.

## Evaluation


The overall accuracy is 67%, an effective accuracy for this task.

In addition to overall accuracy, you need to look at the accuracy of some minority classes. Signal-non-understanding ('br') is a good indicator of "other-repair" or cases in which the other conversational participant attempts to repair the speaker's error. Summarize/reformulate ('bf') has been used in dialogue summarization. Report the accuracy for these classes and some frequent errors you notice the system makes in predicting them. What do you think the reasons are？

## Minority Classes

In [25]:
# Generate predictions for the test data
label_pred = model.predict(test_utterances_X, batch_size=100)

Predicting the labels

In [26]:
label_pred

array([[1.73706282e-03, 2.15499522e-03, 3.88589664e-03, ...,
        1.65701266e-02, 7.09115574e-03, 2.27407411e-01],
       [5.23861963e-05, 8.37792351e-04, 1.37821815e-04, ...,
        1.15930731e-03, 2.15349472e-04, 8.66751373e-01],
       [5.11357968e-04, 1.02662860e-04, 9.16923687e-04, ...,
        1.02535174e-04, 5.06404757e-01, 6.92125992e-04],
       ...,
       [6.92666435e-05, 3.99518409e-04, 1.86881400e-04, ...,
        1.88924751e-04, 2.40834153e-04, 1.05741218e-01],
       [1.04566815e-03, 2.74903979e-03, 1.91890856e-03, ...,
        2.48856694e-01, 5.96832251e-03, 2.80633867e-01],
       [2.03770702e-04, 1.15198316e-03, 6.81421894e-04, ...,
        7.79251195e-03, 1.09741639e-03, 8.59174609e-01]], dtype=float32)

In [27]:
# Build the confusion matrix off these predictions

matrix = sklearn.metrics.confusion_matrix(y_test.argmax(axis=1), label_pred.argmax(axis=1))


Building the confusion matrix of the model.

In [28]:
matrix

array([[    0,     0,     0, ...,     1,     4,     6],
       [    0,     0,     0, ...,     0,     0,    37],
       [    0,     0,     0, ...,     0,     0,     9],
       ...,
       [    0,     0,     0, ...,     2,     0,   288],
       [    0,     0,     0, ...,     0,  9118,    21],
       [    0,     0,     0, ...,     0,     9, 15666]])

In [29]:
acc_class = matrix.diagonal()/matrix.sum(axis=1)

index_br = list(one_hot_encoding_dic["br"][one_hot_encoding_dic["br"]==1].index)[0]
br_accuracy = acc_class[index_br]*100
print("br accuracy: {}".format(br_accuracy))

index_bf = list(one_hot_encoding_dic["bf"][one_hot_encoding_dic["bf"]==1].index)[0]
bf_accuracy = acc_class[index_bf]*100
print("bf accuracy: {}".format(bf_accuracy))

br accuracy: 0.0
bf accuracy: 0.0


Getting the accuracy of the minority classes.


Due to the reduced lack of training data for the minority classes, these minority classifiers will not be very confident in classification, as they have not been fully optimised. The frequent classifiers will be more optimised and will generate more confident scores for all examples, effectively crowding out the less confident minority classifiers. 




# Model 2 - Balanced Network


One thing we can do to try to improve performance is therefore to balance the data more sensibly. As the dataset is highly imbalanced, we can simply weight the loss function in training, to weight up the minority classes proportionally to their underrepresentation. 

In [30]:
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_integers = np.argmax(tags_encoding, axis=1)
class_weights = compute_class_weight('balanced', np.unique(y_integers), y_integers)
d_class_weights = dict(enumerate(class_weights))

## Define & Train the model

In [66]:
# Re-built the model for the balanced training
model_balanced = keras.Sequential()
model_balanced.add(Embedding(VOCAB_SIZE+1, EMBED_SIZE, input_length = MAX_LENGTH, name = 'embedding_1',
                    embeddings_initializer='glorot_uniform'))
model_balanced.add(Bidirectional(LSTM(43, return_sequences= True)))
model_balanced.add(Bidirectional(LSTM(43, return_sequences= False)))

model_balanced.add(Dense(HIDDEN_SIZE))
model_balanced.add(Activation('softmax'))
model_balanced.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])

model_balanced.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 137, 100)          4373200   
_________________________________________________________________
bidirectional_6 (Bidirection (None, 137, 86)           49536     
_________________________________________________________________
bidirectional_7 (Bidirection (None, 86)                44720     
_________________________________________________________________
dense_5 (Dense)              (None, 43)                3741      
_________________________________________________________________
activation_3 (Activation)    (None, 43)                0         
Total params: 4,471,197
Trainable params: 4,471,197
Non-trainable params: 0
_________________________________________________________________


In [67]:
# Train the balanced network -  takes  time to achieve good accuracy
# Train the model - using validation 
model_balanced.fit(train_input, train_labels,
          validation_data = (val_input, val_labels),
          epochs = 3,
          batch_size = 512,
          class_weight = d_class_weights,
          verbose = 1)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7f00064b6410>

Adding class_weight=d_class_weights for a balanced model. This adds a higher penalty for the missclassification of minority classes. This weights the loss function during training. It is used to pay more attention to the minority classes.

## Test the model

In [68]:
# Overall Accuracy
score = model_balanced.evaluate(test_utterances_X, y_test, batch_size=100)



In [34]:
print("Overall Accuracy:", score[1]*100)

Overall Accuracy: 34.67139005661011


Here we are getting arounf 34% if the accuracy for model2.

In [69]:
# Generate predictions for the test data
label_pred = model_balanced.predict(test_utterances_X, batch_size=100)

In [70]:
label_pred

array([[6.10516453e-03, 2.63903290e-02, 5.79506997e-03, ...,
        5.25612896e-03, 2.91061867e-03, 1.35663655e-02],
       [2.10255588e-04, 6.94551598e-03, 1.71627314e-03, ...,
        3.94339347e-03, 4.73002234e-04, 8.10729340e-02],
       [6.61013043e-03, 1.87285501e-03, 1.27577735e-02, ...,
        3.03130764e-05, 1.66977882e-01, 1.31256631e-04],
       ...,
       [5.43833303e-04, 1.98218096e-02, 8.89843679e-04, ...,
        2.95529906e-02, 1.77580077e-04, 2.07288191e-01],
       [1.09919300e-03, 9.79048479e-03, 8.33772356e-05, ...,
        6.16566122e-01, 6.51680864e-04, 3.69770452e-02],
       [1.92953006e-03, 4.25570495e-02, 3.18737933e-03, ...,
        6.11068541e-03, 4.38986666e-04, 4.02017720e-02]], dtype=float32)

Here we are predciting the labels.

## Balanced network evaluation

Report the overall accuracy and the accuracy of  'br' and 'bf'  classes. Suggest other ways to handle imbalanced classes.

In [37]:
matrix_balanced = sklearn.metrics.confusion_matrix(y_test.argmax(axis=1), label_pred.argmax(axis=1))
acc_class_balanced = matrix_balanced.diagonal()/matrix_balanced.sum(axis=1)

index_br = list(one_hot_encoding_dic["br"][one_hot_encoding_dic["br"]==1].index)[0]
br_accuracy = acc_class_balanced[index_br]*100
print("br accuracy: {}".format(br_accuracy))

index_bf = list(one_hot_encoding_dic["bf"][one_hot_encoding_dic["bf"]==1].index)[0]
bf_accuracy = acc_class_balanced[index_bf]*100
print("bf accuracy: {}".format(bf_accuracy))

br accuracy: 43.66197183098591
bf accuracy: 18.340611353711793


In [38]:
matrix_balanced

array([[   9,    0,    0, ...,    2,    1,    0],
       [   0,   19,    0, ...,    1,    0,    2],
       [   0,    1,   25, ...,    0,    0,    0],
       ...,
       [   0,    6,    0, ...,  217,    0,    2],
       [  15,    0,    0, ...,    1, 3985,    0],
       [  25, 1943,    7, ...,  197,    4, 3632]])

The overall accuracy of the balanced model is lower than the accuracy of the unbalanced model. But we can see that the individual classes have a much better accuracy in balanced model than the unbalanced model. In balanced model there is a higher penalty when the classification of minority classes were wrong, which results in the low overall accuracy.



### Accuracies



### Explanation


### Other ways to handle imbalanced classes


- Under-sampling: Under-sampling can be used to decrease the instances of majority classes untill it is comparable with the minority class. But as this method removes the data from dataset, some usseful information may be lost.

- Over-sampling: Over-sampling can be used to increase the isntances of minority classes on the training set by duplication. The advantage here is that in over-sampling there is no loss of information, whereas there is a chance that model becomes prone to overfitting.

Can we improve things by using context information?  Next we try to build a model which predicts DA tag from the sequence of 
previous DA tags, plus the utterance representation. 

# Using Context for Dialog Act Classification

The second approach we will try is a hierarchical approach to DA tagging. We expect there is valuable sequential information among the DA tags. So in this section we apply a BiLSTM on top of the utterance CNN representation. The CNN model learns textual information in each utterance for DA classification, acting like the text classifier from Model 1 above. Then we use a bidirectional-LSTM (BLSTM) above that to learn how to use the context before and after the current utterance to improve the output.

## Define the model

This model has an architecture of:

- Word Embedding
- CNN
- Bidirectional LSTM
- Fully-Connected output



## CNN


This is a classical CNN layer used to convolve over embedings tensor and gether useful information from it. The data is represented by hierarchy of features, which can be modelled using a CNN. We transform/reshape conv output to 2d matrix. Then we pass it to the max pooling layer that applies the max pool operation on windows of different sizes. 

In [39]:
from keras.layers import Input
from keras.layers import Reshape
from keras.layers import Dense, Conv2D, Flatten, MaxPool2D#concat
from keras.layers import concatenate
from keras.layers.normalization import BatchNormalization
from keras import Model 
filter_sizes = [3,4,5]
num_filters = 64
drop = 0.2
VOCAB_SIZE = len(wordvectors) # 43,731
MAX_LENGTH = len(max(utterances, key=len))
#MAX_LENGTH = len(max(sentences, key=len))

EMBED_SIZE = 100 # arbitary
HIDDEN_SIZE = len(unique_tags) 

# CNN model
inputs = Input(shape=(MAX_LENGTH, ), dtype='int32')
embedding = Embedding(input_dim=VOCAB_SIZE+1, output_dim=EMBED_SIZE, input_length=MAX_LENGTH)(inputs)
reshape = Reshape((MAX_LENGTH, EMBED_SIZE, 1))(embedding)

# 3 convolutions
conv_0 = Conv2D(num_filters, kernel_size=(filter_sizes[0], EMBED_SIZE), strides=1, padding='valid', kernel_initializer='normal', activation='relu')(reshape)
bn_0 = BatchNormalization()(conv_0)
conv_1 = Conv2D(num_filters, kernel_size=(filter_sizes[1], EMBED_SIZE), strides=1, padding='valid', kernel_initializer='normal', activation='relu')(reshape)
bn_1 = BatchNormalization()(conv_1)
conv_2 = Conv2D(num_filters, kernel_size=(filter_sizes[2], EMBED_SIZE), strides=1, padding='valid', kernel_initializer='normal', activation='relu')(reshape)
bn_2 = BatchNormalization()(conv_2)

# maxpool for 3 layers
maxpool_0 = MaxPool2D(pool_size=(MAX_LENGTH - filter_sizes[0] + 1, 1), padding='valid')(bn_0)
maxpool_1 = MaxPool2D(pool_size=(MAX_LENGTH - filter_sizes[1] + 1, 1), padding='valid')(bn_1)
maxpool_2 = MaxPool2D(pool_size=(MAX_LENGTH - filter_sizes[2] + 1, 1), padding='valid')(bn_2)

# concatenate tensors
merged_1 = concatenate([maxpool_0, maxpool_1, maxpool_2])

# flatten concatenated tensors
# applying time distributed layer so that cnn output is compatible with BiLSTM input
flat = TimeDistributed(Flatten())(merged_1)
# dense layer (dense_1)
dense_1 = Dense(HIDDEN_SIZE, activation='relu')(flat)
# dropout_1
dropout_1 = Dropout(drop)(dense_1)

Here I have concatenated the three maxpooling layer and then applied time distributed layer so that the output of the cnn is compatible with BiLSTM input .

If you want CNN layers to interact with the LSTM layer, they need to be distributed across time.

## BLSTM

This is used to create LSTM layers. The data we’re working with has temporal properties which we want to model as well — hence the use of a LSTM. You should create a BiLSTM. Try the output of cnn as the input for blstm.

In [40]:
biLSTM1 = Bidirectional(LSTM(HIDDEN_SIZE, return_sequences='true'))(dropout_1)
# Bidirectional 2
biLSTM2 = Bidirectional(LSTM(HIDDEN_SIZE))(biLSTM1)
# Dense layer (dense_2)
dense_2 = Dense(HIDDEN_SIZE, activation='relu')(biLSTM2)
# dropout_2
dropout_2 = Dropout(drop)(dense_2)


Here I have create a BiLSTM layer and the output of CNN is the input of the bilstm.

Concatenate 2 last layers and create the output layer. You need to concatenate the outputs of CNN and LSTM (dropout_1 and dropout_2)

In [41]:
# concatenate 2 final layers
# flatten the output of the CNN + dense + dropout so that it can be concatenated with the output of BiLSTM
dropout_flat = Flatten()(dropout_1)
# concatenating the output of CNN + dense + dropout with the output of BiLSTM + dense + dropout
merged_2 = concatenate([dropout_flat, dropout_2])
# merged_2 has the dimension of (None, 86)
# adding a dense layer
dense_3 = Dense(units=HIDDEN_SIZE, input_shape=(1,))(merged_2)
# adding softmax for multiclass classification
output = Activation('softmax')(dense_3)

optimizer = Adam()

model = Model(inputs=[inputs], outputs=[output])
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()


Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 137)]        0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 137, 100)     4373200     input_1[0][0]                    
__________________________________________________________________________________________________
reshape (Reshape)               (None, 137, 100, 1)  0           embedding[0][0]                  
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 135, 1, 64)   19264       reshape[0][0]                    
______________________________________________________________________________________________

In [42]:
model.fit(train_input,
                     train_labels,
                     epochs=3,
                     batch_size=512,
                     validation_data=(val_input, val_labels),verbose=1)


Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x7f009a65c810>

In [43]:
score = model.evaluate(test_utterances_X, y_test, batch_size=100)



In [44]:
print("Overall Accuracy:", score[1]*100)

Overall Accuracy: 71.16203308105469


Here for the CNN model we are getting arounf 71% accuracy.

In [60]:
# Generate predictions for the test data
label_cnn = model.predict(test_utterances_X, batch_size=100)

In [61]:
label_cnn

array([[7.23096542e-03, 5.23082446e-03, 6.40062382e-03, ...,
        4.10733372e-03, 1.16247825e-01, 7.28917494e-02],
       [1.16260730e-04, 6.15743455e-03, 1.83145439e-05, ...,
        1.23911304e-03, 2.97348568e-04, 7.70550787e-01],
       [5.68815041e-04, 8.16458414e-05, 8.90097581e-05, ...,
        3.98963348e-05, 5.17999470e-01, 3.64799082e-04],
       ...,
       [6.61671447e-06, 1.04867599e-04, 9.32865532e-07, ...,
        1.14181705e-04, 2.97188617e-05, 1.30002528e-01],
       [5.05816424e-05, 6.79959776e-04, 1.84000328e-05, ...,
        8.57532740e-01, 3.20229854e-04, 4.87545356e-02],
       [7.70159604e-05, 2.33231834e-03, 1.53257515e-05, ...,
        4.90286329e-04, 5.49002492e-04, 9.07488942e-01]], dtype=float32)

In [47]:
matrix_1 = sklearn.metrics.confusion_matrix(y_test.argmax(axis=1), label_pred.argmax(axis=1))
acc_class_balanced = matrix_1.diagonal()/matrix_1.sum(axis=1)

index_br = list(one_hot_encoding_dic["br"][one_hot_encoding_dic["br"]==1].index)[0]
br_accuracy = acc_class_balanced[index_br]*100
print("br accuracy: {}".format(br_accuracy))

index_bf = list(one_hot_encoding_dic["bf"][one_hot_encoding_dic["bf"]==1].index)[0]
bf_accuracy = acc_class_balanced[index_bf]*100
print("bf accuracy: {}".format(bf_accuracy))

br accuracy: 46.478873239436616
bf accuracy: 0.8733624454148471


Here the accuracy of the minority class br and bf are 46.4% and 0.87%, We can see that the accuracy if the br class has increased and the accuracy if the model bf has decreased compared to the previous model2.

Report your overall accuracy and the minority class accuracies. Discuss whether context helped disambiguate and better predict the minority classes ('br' and 'bf'). What are some frequent errors? Show one positive example where adding context changed the prediction.




IF the model initially predicted the wrong class (BiLSTM) but it later got rectified by CNN + BILSTM then it is considered as a positive change. The code below gets the index of all positive chanegs as well as negative changes.

In [85]:
index_pos_change = []
index_neg_change = []
for i in range(len(y_test)):
  true_label = y_test[i].argmax(axis=0)
  bLISTM_pred = label_pred[i].argmax(axis=0)
  cnn_pred = label_cnn[i].argmax(axis=0)
  if true_label == bLISTM_pred and true_label != cnn_pred:
    index_neg_change.append(i)
  elif true_label != bLISTM_pred and true_label == cnn_pred:
    index_pos_change.append(i)

In [86]:
reverse_word_index = dict([(value, key) for (key, value) in wordvectors.items()])
# method to decode the sentence from a list of IDs to a string
def decode_sentence(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])

Now we are printing the values of sentences which changed from biLSTM model to CNN + BiLSTM. 

In [87]:
index = 0
for i, val in enumerate(index_pos_change):
  index +=1
  print(decode_sentence(X_test[val]))
  if index ==5:
    break

{C So } the advice we gave to them --
{C and } # also, {F uh, } they do some of that in Wichita, Kansas. /
Oh, yeah.  /
Yeah. /
-- right now trying to keep abreast of, {F uh, } what's going on in Europe, {D you know, } with all the, U S S R's satellites breaking off, trying to become independent and, {D you know, } European community coming together. /


### Minority Classes

