## Reuters Newswire Dataset 
This collection of newswire articles was assembled for text classification; a full description of the dataset can be found at the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection). Here the data is loaded into a Jupyter notebook with Keras.

In [67]:
import numpy as np
import pandas as pd
from collections import Counter

#### Load Data

In [68]:
from keras.datasets import reuters 

n = 10000  # top 10000 most common words

(train_data, train_label), (test_data, test_label) = reuters.load_data(num_words=n)
print('Number of training examples: ', train_data.shape[0])
print('Number of test examples: ', test_data.shape[0])

print('Example training data: ', train_data[0])
print('Example training data label: ', train_label[0])

Number of training examples:  8982
Number of test examples:  2246
Example training data:  [1, 2, 2, 8, 43, 10, 447, 5, 25, 207, 270, 5, 3095, 111, 16, 369, 186, 90, 67, 7, 89, 5, 19, 102, 6, 19, 124, 15, 90, 67, 84, 22, 482, 26, 7, 48, 4, 49, 8, 864, 39, 209, 154, 6, 151, 6, 83, 11, 15, 22, 155, 11, 15, 7, 48, 9, 4579, 1005, 504, 6, 258, 6, 272, 11, 15, 22, 134, 44, 11, 15, 16, 8, 197, 1245, 90, 67, 52, 29, 209, 30, 32, 132, 6, 109, 15, 17, 12]
Example training data label:  3


#### Decode Data to Newswire

In [69]:
def decode_newswire(example):
    """
        Args:
            List of word indices 
        Returns:
            List of words matched to given indices
    """
    word_to_index = reuters.get_word_index()
    index_to_word = {key: value for (value, key) in word_to_index.items()}
    words = [index_to_word.get(i-3, 'UNK') for i in example] #indices offset by 3
    return ' '.join(words)

In [70]:
# print one example newswire
decode_newswire(train_data[0])

'UNK UNK UNK said as a result of its december acquisition of space co it expects earnings per share in 1987 of 1 15 to 1 30 dlrs per share up from 70 cts in 1986 the company said pretax net should rise to nine to 10 mln dlrs from six mln dlrs in 1986 and rental operation revenues to 19 to 22 mln dlrs from 12 5 mln dlrs it said cash flow per share this year should be 2 50 to three dlrs reuter 3'
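
The `i-3` offset in `decode_newswire` exists because the Keras loader, with its default settings, shifts every word rank up by 3 and uses index 1 as the start-of-sequence marker and index 2 for out-of-vocabulary words. The sketch below illustrates this with a hypothetical three-word vocabulary (`toy_word_to_index` and `decode_toy` are illustrative names, not part of Keras):

```python
# Hypothetical three-word vocabulary standing in for reuters.get_word_index()
toy_word_to_index = {"the": 1, "of": 2, "said": 3}
toy_index_to_word = {index: word for word, index in toy_word_to_index.items()}

def decode_toy(example, offset=3):
    """Map encoded indices back to words; reserved or unknown indices become 'UNK'."""
    return " ".join(toy_index_to_word.get(i - offset, "UNK") for i in example)

# 1 and 2 are reserved markers; 6, 4, 5 are word ranks 3, 1, 2 shifted by the offset
decoded = decode_toy([1, 2, 6, 4, 5])
print(decoded)  # UNK UNK said the of
```

This is also why the decoded newswire above begins with `UNK`: its first three indices (1, 2, 2) are reserved markers rather than words.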

In [71]:
print('Number of examples for each topic label: ', Counter(train_label))

Number of examples for each topic label:  Counter({3: 3159, 4: 1949, 19: 549, 16: 444, 1: 432, 11: 390, 20: 269, 13: 172, 8: 139, 10: 124, 9: 101, 21: 100, 25: 92, 2: 74, 18: 66, 24: 62, 0: 55, 34: 50, 12: 49, 36: 49, 28: 48, 6: 48, 30: 45, 23: 41, 31: 39, 17: 39, 40: 36, 32: 32, 41: 30, 14: 26, 26: 24, 39: 24, 43: 21, 15: 20, 38: 19, 37: 19, 29: 19, 45: 18, 5: 17, 7: 16, 27: 15, 22: 15, 42: 13, 44: 12, 33: 11, 35: 10})


### 1. Data Preprocessing 

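All observations in the training dataset are lists of word indices. Because these lists vary in length, each one is converted into a fixed-length vector of size `n` that holds a 1 at every word index appearing in the newswire and 0 elsewhere.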

#### Construct Vectorized Input Data 

In [72]:
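def construct_input_vectors(X, N):
    """Vectorize newswire data: row i has a 1 at every word index found in X[i]."""
    vectors = np.zeros((len(X), N))  # avoid shadowing the built-in `input`
    for i, sequence in enumerate(X):
        vectors[i, sequence] = 1  # set all listed word indices at once
    return vectors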

In [73]:
X_train = construct_input_vectors(train_data, n)
y_train = train_label
X_test = construct_input_vectors(test_data, n)
y_test = test_label
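
To make the encoding concrete, here is a minimal sketch on two made-up sequences with a five-word vocabulary. The helper name `multi_hot` and the toy data are assumptions for illustration; the logic mirrors `construct_input_vectors`.

```python
import numpy as np

def multi_hot(sequences, num_words):
    """Return a (len(sequences), num_words) array with 1 where a word index occurs."""
    vectors = np.zeros((len(sequences), num_words))
    for row, sequence in enumerate(sequences):
        vectors[row, sequence] = 1
    return vectors

toy = [[1, 3, 3], [0, 4]]  # hypothetical encoded newswires
encoded = multi_hot(toy, num_words=5)
print(encoded)
# [[0. 1. 0. 1. 0.]
#  [1. 0. 0. 0. 1.]]
```

Note that a repeated index (the two 3s) still yields a single 1, and word order is discarded, so the representation only records which words are present.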

### 2. Construct Neural Network 

#### Cross Entropy Loss 
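For classification tasks, the cross-entropy loss function is often used. The cells below define the building blocks of the network: the ReLU activation, the softmax, the loss, and helpers for backpropagation and evaluation.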

In [75]:
def relu_activation(X, W, b):
    Z = np.maximum(np.dot(X, W) + b, 0) # ReLU: element-wise max of the pre-activation and 0
    return Z

In [76]:
def softmax(A):
    exps = np.exp(A - np.max(A, axis=1, keepdims=True)) # prevent overflow
    return exps / np.sum(exps, axis=1, keepdims=True) 

In [77]:
def cross_entropy_loss(model_output, target):
    # categorical cross entropy for softmax outputs: target is one-hot,
    # so only the log-probability of the true class contributes
    ce = -np.sum(target * np.log(model_output))
    return ce
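
As a quick sanity check of how softmax and cross entropy fit together, the sketch below computes the mean loss on two hand-picked rows of logits with integer class labels. The names `softmax_rows` and `mean_cross_entropy` and the toy values are assumptions for illustration, not part of the notebook's code.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)

def mean_cross_entropy(probs, labels):
    """Average negative log-probability assigned to the true class of each row."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked))

logits = np.array([[2.0, 0.0, 0.0],   # confident in class 0
                   [0.0, 0.0, 0.0]])  # uniform over three classes
labels = np.array([0, 2])

probs = softmax_rows(logits)
loss = mean_cross_entropy(probs, labels)
print(probs)
print(loss)
```

A uniform prediction over `k` classes costs `log(k)` per example, so an untrained 46-class model should start near `log(46)`, roughly 3.83.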

In [78]:
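def back_prop_relu(delta, A):
    """Pass the upstream gradient `delta` back through a ReLU whose output was `A`."""
    gradient = delta.copy()
    gradient[A <= 0] = 0  # no gradient flows where the ReLU output is zero
    return gradient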

In [79]:
def evaluate_accuracy(y, X, W1, b1, W2, b2):
    A1 = relu_activation(X, W1, b1)
    class_prob = np.dot(A1, W2) + b2
    pred = np.argmax(class_prob, axis=1)
    print('prediction accuracy: %.2f%% \n' % (100 * np.mean(pred == y)))

In [80]:
h = 64 # size of hidden layer
num_classes = 46 # number of classes
batch_size = 32 #X_train.shape[0]
num_batches = int(np.ceil(X_train.shape[0] / batch_size))
learning_rate = 1
epochs = 10

In [81]:
# initialize parameters 
np.random.seed(0)

W1 = 0.01 * np.random.randn(n, h)
b1 = np.zeros((1, h))
W2 = 0.01 * np.random.randn(h, num_classes)
b2 = np.zeros((1, num_classes))

In [82]:
# mini-batch gradient descent

for i in range(epochs):
    
    for j in range(num_batches):
        
        X_batch = X_train[j*batch_size : (j+1)*batch_size, :]
        y_batch = y_train[j*batch_size : (j+1)*batch_size]
        
        # forward propagation
        A1 = relu_activation(X_batch, W1, b1) 
        A2 = np.dot(A1, W2) + b2
        probs = softmax(A2)  

        # cross entropy loss 
        target_logprob = -np.log(probs[range(len(y_batch)), y_batch])
        loss = np.sum(target_logprob) / batch_size

        # backprop classification probs
        d2 = probs
        d2[range(len(y_batch)), y_batch] -= 1
        d2 /= batch_size

        dW2 = np.dot(A1.T, d2)
        db2 = np.sum(d2, axis=0, keepdims=True)

        # backprop into hidden layer
        d1 = np.dot(d2, W2.T)
        d1[A1 <= 0] = 0

        dW1 = np.dot(X_batch.T, d1)
        db1 = np.sum(d1, axis=0, keepdims=True)

        # update weights
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
        
    # the loss reported here is that of the last mini-batch in the epoch
    print("epoch {0}: loss {1}".format(i, loss)) 
    # evaluate training set accuracy
    evaluate_accuracy(y_train, X_train, W1, b1, W2, b2)

epoch 0: loss 0.9684661784874473
prediction accuracy: 72.44% 

epoch 1: loss 0.8889367327046472
prediction accuracy: 80.71% 

epoch 2: loss 0.423098092830655
prediction accuracy: 83.22% 

epoch 3: loss 0.44830396821304547
prediction accuracy: 83.81% 

epoch 4: loss 0.3772117712930138
prediction accuracy: 85.75% 

epoch 5: loss 0.5047915356294725
prediction accuracy: 89.74% 

epoch 6: loss 0.38096821102222045
prediction accuracy: 90.84% 

epoch 7: loss 0.3360112797655889
prediction accuracy: 89.89% 

epoch 8: loss 0.35608725570136124
prediction accuracy: 91.77% 

epoch 9: loss 0.3558634495363089
prediction accuracy: 89.71% 
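
The backprop step in the training loop relies on the gradient of the mean softmax cross-entropy with respect to the logits being `(probs - one_hot) / batch_size`. The sketch below checks that identity numerically on small random data using central finite differences; `loss_and_grad` and the toy shapes are assumptions for illustration only.

```python
import numpy as np

def loss_and_grad(logits, labels):
    """Mean softmax cross-entropy and its gradient with respect to the logits."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exps = np.exp(shifted)
    probs = exps / np.sum(exps, axis=1, keepdims=True)
    rows = np.arange(len(labels))
    loss = -np.mean(np.log(probs[rows, labels]))
    grad = probs.copy()
    grad[rows, labels] -= 1   # probabilities minus one-hot targets
    grad /= len(labels)       # average over the batch
    return loss, grad

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))   # 4 examples, 5 classes
labels = np.array([0, 3, 1, 4])
_, analytic = loss_and_grad(logits, labels)

# Central finite differences, perturbing one logit at a time
eps = 1e-6
numeric = np.zeros_like(logits)
for idx in np.ndindex(*logits.shape):
    up, down = logits.copy(), logits.copy()
    up[idx] += eps
    down[idx] -= eps
    numeric[idx] = (loss_and_grad(up, labels)[0]
                    - loss_and_grad(down, labels)[0]) / (2 * eps)

max_gap = np.max(np.abs(analytic - numeric))
print(max_gap)
```

If the analytic and numeric gradients agree to within a small tolerance, the same expression can be trusted inside the training loop; a check like this is cheap to run whenever a hand-written backward pass is changed.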



In [83]:
# evaluate test set accuracy
evaluate_accuracy(y_test, X_test, W1, b1, W2, b2)

prediction accuracy: 73.86% 

