# Midterm - AG News Data Classification with Adam
### Shailesh Patro

I implemented an Adam optimizer based on *Adam: A Method for Stochastic Optimization* by Diederik P. Kingma and Jimmy Lei Ba (https://arxiv.org/pdf/1412.6980v8.pdf).
The implementation subclasses the Keras `Optimizer` base class and overrides its common methods, so it is compatible with a Keras model. This compatibility makes it easy to test the optimizer quickly on several datasets.
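
The update rule from the paper can be sketched in plain Python for a single scalar parameter. This is a minimal illustration, not the Keras class used in this notebook: the function name `adam_minimize`, the quadratic objective, and the step counts are assumptions chosen for the example. It uses the efficient form from Section 2 of the paper, in which both bias corrections are folded into the step size.

```python
import math

def adam_minimize(grad_fn, x0, alpha=0.001, beta_1=0.9, beta_2=0.999,
                  epsilon=1e-8, steps=1000):
    """Run `steps` Adam updates on one scalar parameter (illustrative helper)."""
    x, m, v = x0, 0.0, 0.0  # parameter, first moment, second raw moment
    for t in range(1, steps + 1):
        g = grad_fn(x)
        # biased first and second raw moment estimates
        m = beta_1 * m + (1.0 - beta_1) * g
        v = beta_2 * v + (1.0 - beta_2) * g * g
        # both bias corrections folded into the step size
        alpha_t = alpha * math.sqrt(1.0 - beta_2 ** t) / (1.0 - beta_1 ** t)
        x = x - alpha_t * m / (math.sqrt(v) + epsilon)
    return x

# Hypothetical objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
def quadratic_grad(x):
    return 2.0 * (x - 3.0)

x_first = adam_minimize(quadratic_grad, x0=0.0, alpha=0.001, steps=1)
x_final = adam_minimize(quadratic_grad, x0=0.0, alpha=0.1, steps=1000)
print(x_first, x_final)
```

On the very first step the update has a magnitude close to `alpha` whatever the scale of the gradient, which matches the paper's observation that the effective step size is approximately bounded by the step-size setting.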

I tested the optimizer on three datasets:

1. Sine Curve Regression
2. MNIST Classification
3. AG News Data Classification


While working on this assignment, I followed the tutorial at https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/. The model in that tutorial uses word embeddings, which represent each word as a learned dense vector and keep word order, unlike a bag-of-words model.
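
To illustrate why a sequence representation keeps more information than a bag of words, here is a minimal pure-Python sketch of the kind of preprocessing an embedding layer expects: each word is hashed to an integer index and each sequence is padded to a fixed length. The helpers `hash_encode` and `pad_post` are hypothetical stand-ins written for this example, not Keras functions, and the md5-based hash is an assumption made so that the indices are reproducible across runs.

```python
import hashlib

def hash_encode(text, vocab_size):
    """Map each lowercased token to an index in [1, vocab_size - 1]; 0 is left for padding."""
    ids = []
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        ids.append(int(digest, 16) % (vocab_size - 1) + 1)
    return ids

def pad_post(ids, max_length):
    """Keep at most the last max_length ids, then right-pad with zeros."""
    kept = ids[-max_length:]
    return kept + [0] * (max_length - len(kept))

a = hash_encode("dog bites man", 50000)
b = hash_encode("man bites dog", 50000)

print(sorted(a) == sorted(b))  # compares the two sentences as bags of words
print(a == b[::-1])            # compares them as ordered sequences
print(pad_post(a, 8))          # fixed-length input for an embedding layer
```

Both sentences contain the same words, so a bag-of-words encoding cannot tell them apart, while the padded index sequences differ in order and can be distinguished by a model that reads them position by position.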

In [13]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [0]:
import pandas as pd
import keras
import numpy as np
import tensorflow as tf

from sklearn.model_selection import train_test_split
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Conv1D, BatchNormalization, Activation,Flatten
from keras.layers import Embedding, Input, Dense, Dropout, Lambda, MaxPooling1D
from keras.layers.advanced_activations import LeakyReLU, PReLU
from keras.optimizers import SGD
from keras.models import Model

from keras.optimizers import Optimizer


dataPath = '/content/gdrive/My Drive/Colab Notebooks/ag_news_csv/'

In [0]:
train = pd.read_csv(dataPath+'train.csv', names=["label", "title", "text"])
test = pd.read_csv(dataPath+'test.csv', names=["label","title", "text"])

In [16]:
x_train = train["title"] + " " + train["text"]
y_train = train["label"]

x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.1, random_state=123)
x_train
# y_train

54257     Pakistanis ban religious meetings Authorities ...
57940     Marks  amp; Spencer Says First-Half Earnings F...
19952     SFA DELAY MOLDOVA DECISION The Scottish Footba...
9864      Deutsche Bank hit again by phishing attack Aft...
116957    Top British Official Resigns Amid Scandal A ca...
8612      Japan to Deport Ex-Chess Champion Bobby Fische...
48105     Moats sinks Fresno St. RUSTON, La. -- Ryan Moa...
13701     40 injured in French motorway pileup Bordeaux,...
29994     Afghan President Escapes Attack Afghanistan Pr...
38872     Roddick Powers U.S. to Lead in Davis Semis CHA...
62115     Walsh in charge for Masconomet Myles Walsh sto...
92763     Wholesale prices jump in October by 1.7 percen...
3707      Sprint stars create new Greek tragedy ATHENS, ...
10502     Server sales rocket by almost a quarter HP and...
52501     Ethics Panel Rebukes DeLay Twice in a Week WAS...
106083    Barrera Proved he #39;s Better than Morales Th...
98416     Alien aims to multiply and thr

In [0]:
x_test = test["title"] + " " + test["text"]
y_test = test["label"]


In [18]:
num_class = 4
vocab = 50000 # size of vocabulary
max_length = 1024

encode_train = [one_hot(d,vocab) for d in x_train]
encode_val = [one_hot(d,vocab) for d in x_val]
encode_test = [one_hot(d,vocab) for d in x_test]
# pad_sequences already returns a 2-D NumPy array of shape (n_samples, max_length),
# so no extra np.array / flatten / reshape round trip is needed
pad_train = pad_sequences(encode_train, maxlen=max_length, padding='post')
pad_val = pad_sequences(encode_val, maxlen=max_length, padding='post')
pad_test = pad_sequences(encode_test, maxlen=max_length, padding='post')
trainLen = pad_train.shape[0]
valLen = pad_val.shape[0]
testLen = pad_test.shape[0]

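# labels run from 1 to 4; map class 4 to index 0 so that all labels
# fall in the range 0..3 expected by to_categorical with num_class = 4
y_train[y_train==4] = 0
y_val[y_val==4] = 0
y_test[y_test==4] = 0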
y_train = keras.utils.to_categorical(y_train, num_class)
y_val = keras.utils.to_categorical(y_val, num_class)
y_test = keras.utils.to_categorical(y_test, num_class)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
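
The label handling in the cell above can be illustrated without Keras. The labels run from 1 to 4, so the notebook maps class 4 to index 0 before one-hot encoding into `num_class = 4` columns. The sketch below is a plain-Python approximation of that step: `remap_label` and `one_hot_rows` are hypothetical names introduced here, the sample labels are made up, and `one_hot_rows` only mimics what `keras.utils.to_categorical` produces for integer labels.

```python
def remap_label(label, num_class=4):
    """Map labels 1..num_class to indices 0..num_class-1, sending num_class to 0."""
    return label % num_class

def one_hot_rows(indices, num_class=4):
    """Build one row per index with a single 1.0 in the column of that index."""
    rows = []
    for i in indices:
        row = [0.0] * num_class
        row[i] = 1.0
        rows.append(row)
    return rows

labels = [3, 4, 1, 2]  # made-up sample labels
indices = [remap_label(y) for y in labels]
print(indices)
print(one_hot_rows(indices))
```

With this mapping, column 0 of the one-hot matrix corresponds to original class 4, and columns 1 to 3 correspond to original classes 1 to 3.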


In [0]:
class AdamOptimizer(Optimizer):
  def __init__(self, alpha=0.001, beta_1=0.9,
               beta_2=0.999, epsilon=1e-08, 
               **kwargs):
    super(AdamOptimizer, self).__init__(**kwargs)
    with keras.backend.name_scope(self.__class__.__name__):
      self.iterations = keras.backend.variable(0, dtype='int64', name='iterations')
      # alpha is the stepsize/learning rate as described in the paper
      self.alpha = keras.backend.variable(alpha, name='alpha')
      # beta_1, beta_2 are the exponential decay rates for the moment estimates
      self.beta_1 = keras.backend.variable(beta_1, name='beta_1')
      self.beta_2 = keras.backend.variable(beta_2, name='beta_2')
      self.epsilon = epsilon
 


  def get_updates(self, loss, params):
    xs = params
    # get gradients with tensorflow's built in gradient function
    grads = tf.gradients(loss, xs, colocate_gradients_with_ops=True)
    self.updates = [tf.assign_add(self.iterations, 1)]
    # alpha is the learning rate as defined in the paper
    alpha = self.alpha
    # increment timestep by 1
    t = tf.cast(self.iterations, 'float32') + 1
    
    # Efficient form from Section 2 of the paper: fold both bias corrections into the step size
    alpha_t = alpha * (tf.sqrt(1. - tf.pow(self.beta_2, t)) / (1. - tf.pow(self.beta_1, t))) 
    
    # initialize m, v to zero
    ms = [keras.backend.zeros(x.shape, dtype=x.dtype.base_dtype.name) for x in xs]
    vs = [keras.backend.zeros(x.shape, dtype=x.dtype.base_dtype.name) for x in xs]
 
    self.weights = [self.iterations] + ms + vs
    
    for x, g, m, v in zip(xs, grads, ms, vs):
        # Update biased first moment estimate
        m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
        
        # Update biased second raw moment estimate 
        # also used tensorflow's elementwise square
        v_t = (self.beta_2 * v) + (1. - self.beta_2) * tf.square(g) 
        
        # Update Parameters
        x_t = x - alpha_t * m_t / (tf.sqrt(v_t) + self.epsilon)
        self.updates.append(tf.assign(m, m_t))
        self.updates.append(tf.assign(v, v_t))
        new_x = x_t

        self.updates.append(tf.assign(x, new_x))
    return self.updates

  
  
  def get_config(self):
    config = {'alpha': float(keras.backend.get_value(self.alpha)),
              'beta_1': float(keras.backend.get_value(self.beta_1)),
              'beta_2': float(keras.backend.get_value(self.beta_2)),
              'epsilon': self.epsilon}
    base_config = super(AdamOptimizer, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))
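
The step size `alpha_t` used in `get_updates` can be related to the bias-corrected form of the algorithm with a short derivation. The symbols follow the paper ($\theta$ for the parameters, written `x` in the code), and $\hat{\epsilon}$ is the rescaled epsilon that the paper introduces for this efficient form; treating the constant `epsilon` in the class as $\hat{\epsilon}$ is my reading of the code, not something stated in the notebook.

$$
\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
$$

Multiplying the numerator and denominator of the last fraction by $\sqrt{1-\beta_2^t}$ gives

$$
\theta_t = \theta_{t-1} - \alpha_t \, \frac{m_t}{\sqrt{v_t} + \hat{\epsilon}}, \qquad
\alpha_t = \alpha \, \frac{\sqrt{1-\beta_2^t}}{1-\beta_1^t}, \qquad
\hat{\epsilon} = \epsilon \sqrt{1-\beta_2^t}
$$

so the two forms coincide exactly when $\hat{\epsilon}$ is rescaled this way, and differ only slightly when a constant epsilon is used in both.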
  

    

In [8]:
model = Sequential()
model.add(Embedding(vocab, 512, input_length=max_length))
model.add(Flatten())

model.add(Dense(4, activation='softmax'))
adamopt = AdamOptimizer()
model.compile(optimizer=adamopt, loss='categorical_crossentropy', metrics=['accuracy'])
print(model.summary())
model.fit(pad_train,y_train, epochs=4, batch_size=1024,
    validation_data=(pad_val,y_val), verbose=1)

scores = model.evaluate(pad_test,y_test)
print('Test loss: ', scores[0])
print('Test accuracy:',scores[1])


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 1024, 512)         25600000  
_________________________________________________________________
flatten_1 (Flatten)          (None, 524288)            0         
_________________________________________________________________
dense_1 (Dense)              (None, 4)                 2097156   
=================================================================
Total params: 27,697,156
Trainable params: 27,697,156
Non-trainable params: 0
_________________________________________________________________
None
Train on 108000 samples, validate on 12000 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Test loss:  0.2705410565788809
Test accuracy: 0.9125


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_4 (Embedding)      (None, 1024, 512)         25600000  
_________________________________________________________________
flatten_4 (Flatten)          (None, 524288)            0         
_________________________________________________________________
dense_4 (Dense)              (None, 4)                 2097156   
=================================================================
Total params: 27,697,156
Trainable params: 27,697,156
Non-trainable params: 0
_________________________________________________________________
None
7600/7600 [==============================] - 1s 181us/step
Test loss:  0.26758113999900063
Test accuracy: 0.9169736842105263
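
The parameter counts in the model summary can be checked by hand. The short sketch below recomputes them from the values used in this notebook (`vocab = 50000`, embedding width 512, `max_length = 1024`, 4 output classes); the variable names `embed_dim`, `embedding_params`, `flatten_units`, and `dense_params` are chosen for this example.

```python
vocab = 50000        # vocabulary size passed to Embedding
embed_dim = 512      # embedding width
max_length = 1024    # padded sequence length
num_class = 4        # output units of the Dense layer

embedding_params = vocab * embed_dim                   # one vector per vocabulary index
flatten_units = max_length * embed_dim                 # Flatten output size per sample
dense_params = flatten_units * num_class + num_class   # weights plus biases

print(embedding_params, flatten_units, dense_params, embedding_params + dense_params)
```

Most of the trainable weights sit in the embedding table, so the vocabulary size and embedding width are the main levers on model size in this architecture.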