# **Character-Level Classification using CNNs**

For general background on this topic, check out the following link.

[Best Practices for Document Classification with Deep Learning](https://machinelearningmastery.com/best-practices-document-classification-deep-learning/)

I am implementing the network described in this [paper](https://arxiv.org/pdf/1606.01781.pdf). Someone else has also implemented it in [this kernel](https://www.kaggle.com/robwec/character-level-author-identification-with-cnns), which I suggest you check out as well. I also found [this kernel](https://www.kaggle.com/marijakekic/cnn-in-keras-with-pretrained-word2vec-weights), which takes a CNN approach too, helpful.

Overall, this is most likely not the best approach for this particular dataset, but it may be useful to others tackling larger datasets in the future. The authors of the paper linked above report that this method does better on larger datasets.

In [1]:
import numpy as np
import pandas as pd

In [2]:
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

In [3]:
train.head()

Unnamed: 0,id,text,author
0,id26305,"This process, however, afforded me no means of...",EAP
1,id17569,It never once occurred to me that the fumbling...,HPL
2,id11008,"In his left hand was a gold snuff box, from wh...",EAP
3,id27763,How lovely is spring As we looked from Windsor...,MWS
4,id12958,"Finding nothing else, not even gold, the Super...",HPL


In [4]:
x_train = train.iloc[:,1].values
y_train = train.iloc[:,2].values


In [5]:
print(len(x_train))

19579


The following block of code is adapted from [this repository](https://github.com/johnb30/py_crepe).

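This takes the sentences as input and turns each of them into a sparse (one-hot) character array, with one row per character position and one column per symbol in the alphabet. Spaces are removed first, and characters outside the alphabet become all-zero rows.

`maxlen` is the maximum number of characters kept per sentence: longer sentences are truncated and shorter ones are padded with all-zero rows. 250 should be more than enough, and you can most likely go even lower.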

In [6]:
import string

maxlen = 250
alphabet = (list(string.ascii_lowercase) + list(string.digits) +
                list(string.punctuation) + ['\n'])
vocab_size = len(alphabet)
check = set(alphabet)

vocab = {}
reverse_vocab = {}
for ix, t in enumerate(alphabet):
    vocab[t] = ix
    reverse_vocab[ix] = t

input_array = np.zeros((len(x_train), maxlen, vocab_size))
for i, sentence in enumerate(x_train):
    counter = 0
    sentence_array = np.zeros((maxlen, vocab_size))
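    # Lowercase and drop spaces; characters outside the alphabet become all-zero rows.
    chars = list(sentence.lower().replace(' ', ''))
    for c in chars:
        if counter >= maxlen:
            break  # truncate sentences longer than maxlen
        # Use the built-in int: the np.int alias was removed in NumPy 1.24.
        char_array = np.zeros(vocab_size, dtype=int)
        if c in check:
            ix = vocab[c]
            char_array[ix] = 1
        sentence_array[counter, :] = char_array
        counter += 1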
    input_array[i, :, :] = sentence_array

In [7]:
print(input_array)
print(np.shape(input_array))

[[[ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  ..., 
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]]

 [[ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  ..., 
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]]

 [[ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  ..., 
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]]

 ..., 
 [[ 0.  0.  0. ...,  0.  0.  0.]
  [ 1.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  ..., 
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]]

 [[ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  ..., 
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  

## **One-Hot Encoding of the Labels**

In [8]:
from sklearn.preprocessing import LabelBinarizer

one_hot = LabelBinarizer()
y_train = one_hot.fit_transform(y_train)
y_train

array([[1, 0, 0],
       [0, 1, 0],
       [1, 0, 0],
       ..., 
       [1, 0, 0],
       [1, 0, 0],
       [0, 1, 0]])
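
The column order of this matrix matters later, when the predictions are written out under author names. As a rough sketch of what the encoding amounts to, the standard-library snippet below builds the same kind of rows for a hypothetical four-label sample; `labels`, `classes`, and `one_hot_rows` are made-up names. `LabelBinarizer` likewise keeps its classes in sorted order, which you can inspect in the notebook through `one_hot.classes_`.

```python
# Standard-library sketch of label one-hot encoding (hypothetical mini sample).
labels = ['EAP', 'HPL', 'EAP', 'MWS']

# Sorted unique labels define the column order, as with LabelBinarizer.classes_.
classes = sorted(set(labels))

# One row per label, with a 1 in the column of its class.
one_hot_rows = [[int(label == c) for c in classes] for label in labels]

print(classes)
for row in one_hot_rows:
    print(row)
```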

The following is the architecture described in the paper linked at the beginning. The authors describe 9-layer, 17-layer, 29-layer, and 49-layer variations, but they all proceed in the same general manner as below.

Do note that this method utilizes batch normalization instead of dropout. 

In [9]:
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense, BatchNormalization, Activation
from keras.layers import Conv1D, MaxPooling1D, GlobalMaxPooling1D
#from keras.optimizers import SGD

model = Sequential()
model.add(Conv1D(filters=64, kernel_size=3, padding='same', input_shape=(250, 69)))
model.add(Conv1D(filters=64, kernel_size=3, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv1D(filters=64, kernel_size=3, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling1D(pool_size=3, strides=2))
model.add(Conv1D(filters=128, kernel_size=3, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv1D(filters=128, kernel_size=3, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling1D(pool_size=3, strides=2))
model.add(Conv1D(filters=256, kernel_size=3, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv1D(filters=256, kernel_size=3, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling1D(pool_size=3, strides=2))
model.add(Conv1D(filters=512, kernel_size=3, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv1D(filters=512, kernel_size=3, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(2048, activation='relu'))
model.add(Dense(2048, activation='relu'))
model.add(Dense(3, activation='softmax'))

model.summary()

Using TensorFlow backend.


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_1 (Conv1D)            (None, 250, 64)           13312     
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 250, 64)           12352     
_________________________________________________________________
batch_normalization_1 (Batch (None, 250, 64)           256       
_________________________________________________________________
activation_1 (Activation)    (None, 250, 64)           0         
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 250, 64)           12352     
_________________________________________________________________
batch_normalization_2 (Batch (None, 250, 64)           256       
_________________________________________________________________
activation_2 (Activation)    (None, 250, 64)           0         
__________
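
The parameter counts in the summary above can be checked by hand, and the same arithmetic gives the sequence length after each pooling layer. The sketch below uses only the standard library; the helper names are illustrative assumptions, and the pooling formula assumes the default `'valid'` padding of `MaxPooling1D`.

```python
def conv1d_params(kernel_size, in_channels, filters):
    # One weight per (kernel position, input channel, filter), plus one bias per filter.
    return kernel_size * in_channels * filters + filters

def batchnorm_params(channels):
    # gamma, beta, moving mean, and moving variance for each channel.
    return 4 * channels

def pooled_length(length, pool_size=3, strides=2):
    # Output length of max pooling with 'valid' padding.
    return (length - pool_size) // strides + 1

first_conv = conv1d_params(3, 69, 64)    # one-hot input with 69 channels
second_conv = conv1d_params(3, 64, 64)
bn = batchnorm_params(64)

# Sequence length after each of the three MaxPooling1D layers, starting from 250.
lengths = [250]
for _ in range(3):
    lengths.append(pooled_length(lengths[-1]))

print(first_conv, second_conv, bn)
print(lengths)
```

The first three values match the `Param #` column for `conv1d_1`, `conv1d_2`, and `batch_normalization_1`. The final sequence length is what `GlobalMaxPooling1D` reduces to a single value per filter.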

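The alternate model below follows the same pattern as the one above but is essentially doubled in depth: four convolutions per stage instead of two. Note that it expects inputs of length 400, so the text would have to be re-encoded with `maxlen = 400` before training it; only the first model is compiled and trained in this notebook.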
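
Counting the convolutional layers makes the relationship between the two models concrete. The sketch below lists the filter count of every `Conv1D` layer for a given number of convolutions per stage; `conv_layer_plan` is a hypothetical helper written for this illustration, not part of Keras or of the paper's code.

```python
def conv_layer_plan(convs_per_stage, stage_filters=(64, 128, 256, 512)):
    """List the filter count of every Conv1D layer, in order.

    The plan starts with one 64-filter convolution on the raw input,
    followed by `convs_per_stage` convolutions in each of the four stages.
    """
    plan = [64]
    for filters in stage_filters:
        plan.extend([filters] * convs_per_stage)
    return plan

small = conv_layer_plan(2)   # layout of the first model
large = conv_layer_plan(4)   # layout of the alternate model

print(len(small), len(large))
```

With two convolutions per stage the plan has 9 layers, and with four it has 17, which lines up with the two shallowest variations mentioned earlier.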

In [10]:
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense, BatchNormalization, Activation
from keras.layers import Conv1D, MaxPooling1D, GlobalMaxPooling1D

model2 = Sequential()
model2.add(Conv1D(filters=64, kernel_size=3, padding='same', input_shape=(400, 69)))
model2.add(Conv1D(filters=64, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(Conv1D(filters=64, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(Conv1D(filters=64, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(Conv1D(filters=64, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(MaxPooling1D(pool_size=3, strides=2))
model2.add(Conv1D(filters=128, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(Conv1D(filters=128, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(Conv1D(filters=128, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(Conv1D(filters=128, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(MaxPooling1D(pool_size=3, strides=2))
model2.add(Conv1D(filters=256, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(Conv1D(filters=256, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(Conv1D(filters=256, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(Conv1D(filters=256, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(MaxPooling1D(pool_size=3, strides=2))
model2.add(Conv1D(filters=512, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(Conv1D(filters=512, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(Conv1D(filters=512, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(Conv1D(filters=512, kernel_size=3, padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('relu'))
model2.add(GlobalMaxPooling1D())  # max over the sequence axis: one value per filter, so 512 features
model2.add(Dense(2048, activation='relu'))
model2.add(Dense(2048, activation='relu'))
model2.add(Dense(3, activation='softmax'))

model2.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv1d_10 (Conv1D)           (None, 400, 64)           13312     
_________________________________________________________________
conv1d_11 (Conv1D)           (None, 400, 64)           12352     
_________________________________________________________________
batch_normalization_9 (Batch (None, 400, 64)           256       
_________________________________________________________________
activation_9 (Activation)    (None, 400, 64)           0         
_________________________________________________________________
conv1d_12 (Conv1D)           (None, 400, 64)           12352     
_________________________________________________________________
batch_normalization_10 (Batc (None, 400, 64)           256       
_________________________________________________________________
activation_10 (Activation)   (None, 400, 64)           0         
__________

In [11]:
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

In [12]:
from keras.callbacks import EarlyStopping

earlystop = EarlyStopping(monitor='val_acc', min_delta=0.0001, patience=5, verbose=2, mode='auto')
callbacks_list = [earlystop]

model.fit(input_array, y_train, validation_split=0.2, epochs=60, callbacks=callbacks_list, batch_size=64, verbose=2)

Train on 15663 samples, validate on 3916 samples
Epoch 1/60
17s - loss: 1.1930 - acc: 0.4159 - val_loss: 1.1961 - val_acc: 0.3999
Epoch 2/60
9s - loss: 0.9588 - acc: 0.5349 - val_loss: 1.0099 - val_acc: 0.4612
Epoch 3/60
9s - loss: 0.8386 - acc: 0.6171 - val_loss: 0.9020 - val_acc: 0.5838
Epoch 4/60
9s - loss: 0.7272 - acc: 0.6819 - val_loss: 0.9003 - val_acc: 0.6121
Epoch 5/60
9s - loss: 0.6381 - acc: 0.7327 - val_loss: 1.0512 - val_acc: 0.5628
Epoch 6/60
9s - loss: 0.5541 - acc: 0.7716 - val_loss: 1.2110 - val_acc: 0.5365
Epoch 7/60
9s - loss: 0.4769 - acc: 0.8089 - val_loss: 1.5731 - val_acc: 0.5546
Epoch 8/60
9s - loss: 0.4118 - acc: 0.8353 - val_loss: 1.4723 - val_acc: 0.5429
Epoch 9/60
9s - loss: 0.3443 - acc: 0.8658 - val_loss: 1.5008 - val_acc: 0.5707
Epoch 10/60
9s - loss: 0.2866 - acc: 0.8888 - val_loss: 1.0211 - val_acc: 0.6359
Epoch 11/60
9s - loss: 0.2403 - acc: 0.9071 - val_loss: 1.4509 - val_acc: 0.6080
Epoch 12/60
9s - loss: 0.2122 - acc: 0.9183 - val_loss: 1.6534 - val

<keras.callbacks.History at 0x26373be6eb8>

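The log above is easier to read once the best validation epoch is pulled out. The sketch below copies the accuracy figures from the eleven complete log lines (the twelfth is cut off and is left out) and finds the epoch with the highest validation accuracy; `history` is a hand-typed list for illustration, not the Keras `History` object.

```python
# (epoch, training accuracy, validation accuracy) copied from the complete log lines.
history = [
    (1, 0.4159, 0.3999), (2, 0.5349, 0.4612), (3, 0.6171, 0.5838),
    (4, 0.6819, 0.6121), (5, 0.7327, 0.5628), (6, 0.7716, 0.5365),
    (7, 0.8089, 0.5546), (8, 0.8353, 0.5429), (9, 0.8658, 0.5707),
    (10, 0.8888, 0.6359), (11, 0.9071, 0.6080),
]

# Epoch with the highest validation accuracy.
best_epoch, best_acc, best_val = max(history, key=lambda row: row[2])

# Gap between training and validation accuracy at that epoch.
gap = best_acc - best_val

print(best_epoch, best_val, round(gap, 4))
```

Training accuracy keeps climbing while validation accuracy hovers around 0.55 to 0.64, so the gap at the best epoch is a sign that the model is overfitting this fairly small dataset.
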
## **Data Preparation of the Test Set**

In [13]:
test.head()

Unnamed: 0,id,text
0,id02310,"Still, as I urged our leaving Ireland with suc..."
1,id24541,"If a fire wanted fanning, it could readily be ..."
2,id00134,And when they had broken down the frail door t...
3,id27757,While I was thinking how I should possibly man...
4,id04081,I am not sure to what limit his knowledge may ...


In [14]:
x_test = test.iloc[:,1].values

In [15]:
print(x_test)

[ 'Still, as I urged our leaving Ireland with such inquietude and impatience, my father thought it best to yield.'
 'If a fire wanted fanning, it could readily be fanned with a newspaper, and as the government grew weaker, I have no doubt that leather and iron acquired durability in proportion, for, in a very short time, there was not a pair of bellows in all Rotterdam that ever stood in need of a stitch or required the assistance of a hammer.'
 'And when they had broken down the frail door they found only this: two cleanly picked human skeletons on the earthen floor, and a number of singular beetles crawling in the shadowy corners.'
 ...,
 'It is easily understood that what might improve a closely scrutinized detail, may at the same time injure a general or more distantly observed effect.'
 'Be this as it may, I now began to feel the inspiration of a burning hope, and at length nurtured in my secret thoughts a stern and desperate resolution that I would submit no longer to be enslaved

In [16]:
test_array = np.zeros((len(x_test), maxlen, vocab_size))
for i, sentence in enumerate(x_test):
    counter = 0
    sentence_array = np.zeros((maxlen, vocab_size))
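    # Same encoding as for the training set: lowercase, drop spaces, one-hot each character.
    chars = list(sentence.lower().replace(' ', ''))
    for c in chars:
        if counter >= maxlen:
            break  # truncate sentences longer than maxlen
        # Use the built-in int: the np.int alias was removed in NumPy 1.24.
        char_array = np.zeros(vocab_size, dtype=int)
        if c in check:
            ix = vocab[c]
            char_array[ix] = 1
        sentence_array[counter, :] = char_array
        counter += 1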
    test_array[i, :, :] = sentence_array

In [17]:
print(test_array)
print(np.shape(test_array))

[[[ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  ..., 
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]]

 [[ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 1.  0.  0. ...,  0.  0.  0.]
  ..., 
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 1.  0.  0. ...,  0.  0.  0.]]

 [[ 1.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  ..., 
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]]

 ..., 
 [[ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  ..., 
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]]

 [[ 0.  1.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  ..., 
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  

In [18]:
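# The softmax output layer already yields class probabilities, so predict() is enough;
# predict_proba() is not available on Sequential models in recent Keras releases.
y_test = model.predict(test_array)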



In [19]:
ids = test['id']

In [20]:
submission = pd.DataFrame(y_test, columns=['EAP', 'HPL', 'MWS'])
submission.insert(0, "id", ids)
submission.to_csv("my_submission.csv", index=False)