In this assignment we will design a neural net language model. The model will learn to predict the next word given the previous three words. The network looks like this:

![Network](misc/proga2.png)

The starter code implements a basic framework for training neural nets with mini-batch gradient descent. Your job is to **write code to complete the implementation of forward and back propagation**. See the README file for a description of the dataset, starter code and how to run it.

The softmax output is
$$
    y_i = \frac{e^{z_i}}{\sum_{j}e^{z_j}}
$$
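
As a minimal sketch of this formula (the name `softmax_sketch` and the toy inputs are ours, not part of the starter code), the softmax can be computed column by column. Subtracting the column maximum before exponentiating is a common numerical safeguard and does not change the result, because the common factor cancels between numerator and denominator.

```octave
% Softmax over each column of z (one column per training case).
% softmax_sketch is a hypothetical helper, not a starter-code function.
softmax_sketch = @(z) exp(z - max(z, [], 1)) ./ sum(exp(z - max(z, [], 1)), 1);

% Toy logits: the second column is the first shifted by 999, which would
% overflow a naive exp(z) but gives the same softmax after the shift.
z = [1 1000; 2 1001; 3 1002];
y = softmax_sketch(z);

disp(sum(y, 1));   % each column of y sums to 1
```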

This implies, for the derivative of each output with respect to its own input:
$$
    \frac{\partial y_i}{\partial z_i} = y_i (1 - y_i)
$$

We denote the **cross-entropy** cost function $C$ as:
$$
    C = -\sum_j t_j \log y_j
$$
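Every output $y_j$ depends on $z_i$ through the softmax denominator, so the partial derivative w.r.t. the input $z_i$ of output unit $i$ sums over all outputs (assuming the targets sum to one, $\sum_j t_j = 1$):
$$
    \frac{\partial C}{\partial z_i} = \sum_j \frac{\partial C}{\partial y_j} \frac{\partial y_j}{\partial z_i} = y_i - t_i
$$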
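
The intermediate steps can be written out as follows. This is a sketch that assumes one-hot targets, so that $\sum_j t_j = 1$, and uses the indicator $\mathbb{1}[i = j]$ (1 when $i = j$, 0 otherwise), which is our notation rather than the assignment's. Differentiating the softmax and the cost separately gives
$$
    \frac{\partial y_j}{\partial z_i} = y_j \left(\mathbb{1}[i = j] - y_i\right), \qquad \frac{\partial C}{\partial y_j} = -\frac{t_j}{y_j}
$$
and summing over all outputs $j$ gives
$$
    \frac{\partial C}{\partial z_i} = \sum_j \left(-\frac{t_j}{y_j}\right) y_j \left(\mathbb{1}[i = j] - y_i\right) = -t_i + y_i \sum_j t_j = y_i - t_i
$$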

In the backpropagation algorithm, we want to compute the change (partial derivative) in the **cost function w.r.t. the weights at each node**, for every layer in the neural network.

$$
    \delta := \frac{\partial C}{\partial w_{ij}} = \frac{\partial C}{\partial z_i} \frac{\partial z_i}{\partial w_{ij}}
$$
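
To make the chain rule concrete, here is a small sketch for a single softmax layer. It assumes $z_i = \sum_j w_{ij} h_j + b_i$, where $h$ is the activity of the layer below (a symbol we introduce here), so that $\partial z_i / \partial w_{ij} = h_j$. All sizes and values are toy choices, and the weight matrix is stored as outputs by inputs to match the $w_{ij}$ indexing, which may differ from the layout in the starter code. The analytic gradient is compared against central finite differences.

```octave
% Toy layer: H inputs, K softmax outputs (illustrative sizes only).
H = 4; K = 3;
h = [0.2; -0.5; 0.1; 0.7];                    % activity of the layer below
W = reshape(linspace(-0.3, 0.3, K * H), K, H);
b = zeros(K, 1);
t = [0; 1; 0];                                % one-hot target

softmax = @(z) exp(z - max(z)) ./ sum(exp(z - max(z)));
cost = @(Wx) -sum(t .* log(softmax(Wx * h + b)));

% Analytic gradient: dC/dz = y - t, and dz_i/dw_ij = h_j.
y = softmax(W * h + b);
dC_dz = y - t;
dC_dW = dC_dz * h';

% Numerical gradient by central finite differences.
step = 1e-6;
numeric = zeros(size(W));
for k = 1:numel(W)
  Wp = W; Wp(k) = Wp(k) + step;
  Wm = W; Wm(k) = Wm(k) - step;
  numeric(k) = (cost(Wp) - cost(Wm)) / (2 * step);
end

max_err = max(abs(dC_dW(:) - numeric(:)));
printf("max abs difference: %g\n", max_err);
```

A check of this kind is a cheap way to catch sign or transpose mistakes before running a full training epoch.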

In [1]:
load programming_assignment_2/data.mat
fieldnames(data)

ans = 
{
  [1,1] = testData
  [2,1] = trainData
  [3,1] = validData
  [4,1] = vocab
}


`data.vocab` contains the vocabulary of 250 words. The training, validation and
test sets are in `data.trainData`, `data.validData` and `data.testData` respectively.

In [5]:
data.vocab(1:10)

ans = 
{
  [1,1] = all
  [1,2] = set
  [1,3] = just
  [1,4] = show
  [1,5] = being
  [1,6] = money
  [1,7] = over
  [1,8] = both
  [1,9] = years
  [1,10] = four
}


In [6]:
data.testData = data.testData';
data.validData = data.validData';
data.trainData = data.trainData';
size(data.trainData)

ans =

   372550        4



In [7]:
data.trainData(1:10, 1:4)

ans =

   28   26   90  144
  184   44  249  117
  183   32   76  122
  117  247  201  186
  223  190  249    6
   42   74   26   32
  242   32  223   32
  223   32  158  144
   74   32  221   32
   42  192   91   68



In [8]:
% words of the second training case (row 2 of data.trainData)
data.vocab([184, 44, 249, 117])

ans = 
{
  [1,1] = were
  [1,2] = not
  [1,3] = the
  [1,4] = first
}


`data.trainData` is a 372550 x 4 matrix: there are 372550 training cases with
4 words each. Each entry is an integer index of a word in the vocabulary, so
each row represents a sequence of 4 words. `data.validData` and `data.testData`
have the same layout and contain 46,568 4-grams each. **All three need to be
separated into inputs and targets, and the training set needs to be split into
mini-batches**. The file `load_data.m` provides code for doing that.
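
The split can be sketched on a handful of rows. This is not the code in `load_data.m`. It is an illustration that assumes the one-4-gram-per-row layout shown above, uses the first six rows printed earlier, and picks a toy batch size of 2. The variable names are ours.

```octave
% First six 4-grams from data.trainData as printed above (one per row).
ngrams = [ 28  26  90 144;
          184  44 249 117;
          183  32  76 122;
          117 247 201 186;
          223 190 249   6;
           42  74  26  32];
batch_size = 2;                       % toy value for illustration

inputs  = ngrams(:, 1:3);             % first three words: the context
targets = ngrams(:, 4);               % fourth word: the word to predict

% Split into equally sized mini-batches, dropping any incomplete remainder.
num_batches = floor(size(ngrams, 1) / batch_size);
for b = 1:num_batches
  rows = (b - 1) * batch_size + (1:batch_size);
  batch_x = inputs(rows, :);
  batch_t = targets(rows);
  printf("batch %d: %d cases\n", b, numel(batch_t));
end
```

By the same arithmetic, 372550 training cases with a mini-batch size of 100 would give 3725 full mini-batches per epoch.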

In [9]:
addpath("programming_assignment_2/")

In [10]:
[train_x, train_t, valid_x, valid_t, test_x, test_t, vocab] = load_data(100);

    load_data at line 16 column 1


In [13]:
%Output with the initial code
model = train(1);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 4.601
Batch 200 Train CE 4.498
Batch 300 Train CE 4.467
Batch 400 Train CE 4.476
Batch 500 Train CE 4.448
Batch 600 Train CE 4.493
Batch 700 Train CE 4.459
Batch 800 Train CE 4.421
Batch 900 Train CE 4.464
Batch 1000 Train CE 4.428
Running validation ... Validation CE 4.413
Batch 1100 Train CE 4.336
Batch 1200 Train CE 4.274
Batch 1300 Train CE 4.201
Batch 1400 Train CE 4.153
Batch 1500 Train CE 4.104
Batch 1600 Train CE 4.070
Batch 1700 Train CE 4.040
Batch 1800 Train CE 4.019
Batch 1900 Train CE 3.981
Batch 2000 Train CE 3.918
Running validation ... Validation CE 3.946
Batch 2100 Train CE 3.919
Batch 2200 Train CE 3.870
Batch 2300 Train CE 3.824
Batch 2400 Train CE 3.761
Batch 2500 Train CE 3.717
Batch 2600 Train CE 3.706
Batch 2700 Train CE 3.660
Batch 2800 Train CE 3.594
Batch 2900 Train CE 3.557
Batch 3000 Train CE 3.485
Running validation ... Validation CE 3.487
Batch 3100 Train CE 3.451
B

1) Train a model with a 50-dimensional embedding space, a 200-dimensional hidden layer and the default settings of all other hyperparameters. What is the average training set cross entropy reported by the training program after 10 epochs? Please provide a numeric answer (three decimal places).

In [14]:
model = train(10);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 4.603
Batch 200 Train CE 4.498
Batch 300 Train CE 4.467
Batch 400 Train CE 4.476
Batch 500 Train CE 4.448
Batch 600 Train CE 4.493
Batch 700 Train CE 4.459
Batch 800 Train CE 4.422
Batch 900 Train CE 4.468
Batch 1000 Train CE 4.444
Running validation ... Validation CE 4.444
Batch 1100 Train CE 4.377
Batch 1200 Train CE 4.325
Batch 1300 Train CE 4.248
Batch 1400 Train CE 4.183
Batch 1500 Train CE 4.118
Batch 1600 Train CE 4.084
Batch 1700 Train CE 4.053
Batch 1800 Train CE 4.034
Batch 1900 Train CE 4.000
Batch 2000 Train CE 3.941
Running validation ... Validation CE 3.972
Batch 2100 Train CE 3.947
Batch 2200 Train CE 3.897
Batch 2300 Train CE 3.848
Batch 2400 Train CE 3.775
Batch 2500 Train CE 3.724
Batch 2600 Train CE 3.711
Batch 2700 Train CE 3.664
Batch 2800 Train CE 3.602
Batch 2900 Train CE 3.567
Batch 3000 Train CE 3.494
Running validation ... Validation CE 3.499
Batch 3100 Train CE 3.463
B

Batch 700 Train CE 2.606
Batch 800 Train CE 2.584
Batch 900 Train CE 2.602
Batch 1000 Train CE 2.604
Running validation ... Validation CE 2.643
Batch 1100 Train CE 2.568
Batch 1200 Train CE 2.597
Batch 1300 Train CE 2.570
Batch 1400 Train CE 2.615
Batch 1500 Train CE 2.622
Batch 1600 Train CE 2.588
Batch 1700 Train CE 2.582
Batch 1800 Train CE 2.620
Batch 1900 Train CE 2.581
Batch 2000 Train CE 2.576
Running validation ... Validation CE 2.654
Batch 2100 Train CE 2.599
Batch 2200 Train CE 2.582
Batch 2300 Train CE 2.574
Batch 2400 Train CE 2.588
Batch 2500 Train CE 2.587
Batch 2600 Train CE 2.607
Batch 2700 Train CE 2.629
Batch 2800 Train CE 2.583
Batch 2900 Train CE 2.594
Batch 3000 Train CE 2.556
Running validation ... Validation CE 2.633
Batch 3100 Train CE 2.578
Batch 3200 Train CE 2.557
Batch 3300 Train CE 2.573
Batch 3400 Train CE 2.597
Batch 3500 Train CE 2.592
Batch 3600 Train CE 2.578
Batch 3700 Train CE 2.558
Average Training CE 2.590
Epoch 9
Batch 100 Train CE 2.047
Batch 200

2) Train a model for 10 epochs with a 50-dimensional embedding space, a 200-dimensional hidden layer, a learning rate of 100.0 and the default settings of all other hyperparameters. What do you observe?

In [15]:
% learning rate set to 100 (learning_rate = 100;)
model = train(10);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 26.962
Batch 200 Train CE 27.618
Batch 300 Train CE 27.437
Batch 400 Train CE 27.361
Batch 500 Train CE 27.119
Batch 600 Train CE 27.291
Batch 700 Train CE 26.967
Batch 800 Train CE 27.203
Batch 900 Train CE 27.083
Batch 1000 Train CE 26.960
Running validation ... Validation CE 28.448
Batch 1100 Train CE 27.141
Batch 1200 Train CE 26.972
Batch 1300 Train CE 26.989
Batch 1400 Train CE 27.221
Batch 1500 Train CE 26.922
Batch 1600 Train CE 27.044
Batch 1700 Train CE 26.877
Batch 1800 Train CE 27.045
Batch 1900 Train CE 26.980
Batch 2000 Train CE 26.934
Running validation ... Validation CE 27.802
Batch 2100 Train CE 26.957
Batch 2200 Train CE 26.901
Batch 2300 Train CE 26.874
Batch 2400 Train CE 26.763
Batch 2500 Train CE 27.166
Batch 2600 Train CE 27.339
Batch 2700 Train CE 27.051
Batch 2800 Train CE 26.902
Batch 2900 Train CE 26.816
Batch 3000 Train CE 26.721
Running validation ... Validation CE 2

Batch 3400 Train CE 26.667
Batch 3500 Train CE 26.602
Batch 3600 Train CE 26.567
Batch 3700 Train CE 26.633
Average Training CE 26.693
Epoch 8
Batch 100 Train CE 21.362
Batch 200 Train CE 26.691
Batch 300 Train CE 26.671
Batch 400 Train CE 26.624
Batch 500 Train CE 26.460
Batch 600 Train CE 26.811
Batch 700 Train CE 26.691
Batch 800 Train CE 26.687
Batch 900 Train CE 26.633
Batch 1000 Train CE 26.715
Running validation ... Validation CE 26.175
Batch 1100 Train CE 26.720
Batch 1200 Train CE 26.594
Batch 1300 Train CE 26.516
Batch 1400 Train CE 26.875
Batch 1500 Train CE 26.614
Batch 1600 Train CE 26.799
Batch 1700 Train CE 26.564
Batch 1800 Train CE 26.751
Batch 1900 Train CE 26.849
Batch 2000 Train CE 26.661
Running validation ... Validation CE 25.106
Batch 2100 Train CE 26.755
Batch 2200 Train CE 26.549
Batch 2300 Train CE 26.632
Batch 2400 Train CE 26.801
Batch 2500 Train CE 26.626
Batch 2600 Train CE 26.931
Batch 2700 Train CE 26.613
Batch 2800 Train CE 26.597
Batch 2900 Train CE 26

3) If all weights and biases in this network were set to zero and no training were performed, what would be the average cross entropy on the training set? Please provide a numeric answer (three decimal places).
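
This case can also be worked out by hand. If every weight and bias is zero, every input to the softmax is zero, so the softmax is uniform over the 250 vocabulary words and the cross entropy of any training case is $-\log(1/250) = \log 250$. A short sketch of that arithmetic (variable names are ours):

```octave
vocab_size = 250;                  % size of data.vocab
z = zeros(vocab_size, 1);          % all softmax inputs are zero
y = exp(z) ./ sum(exp(z));         % uniform distribution: every entry is 1/250
target = 7;                        % any target index gives the same cost
ce = -log(y(target));
printf("%.3f\n", ce);              % prints 5.521
```

Note that freezing randomly initialised weights with a learning rate of 0 is not the same as zeroing them, so a run of that kind need not reproduce this number exactly.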

In [17]:
% learning rate set to 0 (learning_rate = 0;)
model = train(1);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 5.497
Batch 200 Train CE 5.497
Batch 300 Train CE 5.496
Batch 400 Train CE 5.497
Batch 500 Train CE 5.497
Batch 600 Train CE 5.498
Batch 700 Train CE 5.498
Batch 800 Train CE 5.496
Batch 900 Train CE 5.497
Batch 1000 Train CE 5.497
Running validation ... Validation CE 5.497
Batch 1100 Train CE 5.497
Batch 1200 Train CE 5.498
Batch 1300 Train CE 5.498
Batch 1400 Train CE 5.499
Batch 1500 Train CE 5.497
Batch 1600 Train CE 5.497
Batch 1700 Train CE 5.498
Batch 1800 Train CE 5.497
Batch 1900 Train CE 5.498
Batch 2000 Train CE 5.497
Running validation ... Validation CE 5.497
Batch 2100 Train CE 5.498
Batch 2200 Train CE 5.497
Batch 2300 Train CE 5.496
Batch 2400 Train CE 5.496
Batch 2500 Train CE 5.497
Batch 2600 Train CE 5.499
Batch 2700 Train CE 5.498
Batch 2800 Train CE 5.497
Batch 2900 Train CE 5.497
Batch 3000 Train CE 5.498
Running validation ... Validation CE 5.497
Batch 3100 Train CE 5.497
B

4) Train three models each with 50 dimensional embedding space, 200 dimensional hidden layer.

* Model A: Learning rate = 0.001
* Model B: Learning rate = 0.1
* Model C: Learning rate = 10.0

Use a momentum of 0.5 and default settings for all other hyperparameters. Which model gives the lowest training set cross entropy after 1 epoch?

In [18]:
% learning rate set to 0.001 (learning_rate = 0.001)
model = train(1);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 5.359
Batch 200 Train CE 5.054
Batch 300 Train CE 4.872
Batch 400 Train CE 4.804
Batch 500 Train CE 4.731
Batch 600 Train CE 4.726
Batch 700 Train CE 4.651
Batch 800 Train CE 4.621
Batch 900 Train CE 4.634
Batch 1000 Train CE 4.586
Running validation ... Validation CE 4.584
Batch 1100 Train CE 4.563
Batch 1200 Train CE 4.566
Batch 1300 Train CE 4.562
Batch 1400 Train CE 4.559
Batch 1500 Train CE 4.506
Batch 1600 Train CE 4.527
Batch 1700 Train CE 4.507
Batch 1800 Train CE 4.502
Batch 1900 Train CE 4.499
Batch 2000 Train CE 4.474
Running validation ... Validation CE 4.486
Batch 2100 Train CE 4.487
Batch 2200 Train CE 4.469
Batch 2300 Train CE 4.488
Batch 2400 Train CE 4.461
Batch 2500 Train CE 4.469
Batch 2600 Train CE 4.486
Batch 2700 Train CE 4.479
Batch 2800 Train CE 4.457
Batch 2900 Train CE 4.449
Batch 3000 Train CE 4.432
Running validation ... Validation CE 4.447
Batch 3100 Train CE 4.432
B

In [19]:
% learning rate set to 0.1 (learning_rate = 0.1)
model = train(1);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 4.522
Batch 200 Train CE 4.425
Batch 300 Train CE 4.394
Batch 400 Train CE 4.405
Batch 500 Train CE 4.377
Batch 600 Train CE 4.426
Batch 700 Train CE 4.383
Batch 800 Train CE 4.372
Batch 900 Train CE 4.429
Batch 1000 Train CE 4.399
Running validation ... Validation CE 4.402
Batch 1100 Train CE 4.394
Batch 1200 Train CE 4.418
Batch 1300 Train CE 4.415
Batch 1400 Train CE 4.424
Batch 1500 Train CE 4.399
Batch 1600 Train CE 4.412
Batch 1700 Train CE 4.399
Batch 1800 Train CE 4.402
Batch 1900 Train CE 4.407
Batch 2000 Train CE 4.385
Running validation ... Validation CE 4.405
Batch 2100 Train CE 4.410
Batch 2200 Train CE 4.401
Batch 2300 Train CE 4.413
Batch 2400 Train CE 4.396
Batch 2500 Train CE 4.406
Batch 2600 Train CE 4.430
Batch 2700 Train CE 4.433
Batch 2800 Train CE 4.402
Batch 2900 Train CE 4.398
Batch 3000 Train CE 4.380
Running validation ... Validation CE 4.408
Batch 3100 Train CE 4.395
B

In [20]:
% learning rate set to 10 (learning_rate = 10)
model = train(1);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 5.038
Batch 200 Train CE 4.335
Batch 300 Train CE 4.286
Batch 400 Train CE 4.285
Batch 500 Train CE 4.155
Batch 600 Train CE 4.077
Batch 700 Train CE 3.931
Batch 800 Train CE 3.842
Batch 900 Train CE 3.832
Batch 1000 Train CE 3.756
Running validation ... Validation CE 3.726
Batch 1100 Train CE 3.672
Batch 1200 Train CE 3.647
Batch 1300 Train CE 3.598
Batch 1400 Train CE 3.594
Batch 1500 Train CE 3.543
Batch 1600 Train CE 3.521
Batch 1700 Train CE 3.494
Batch 1800 Train CE 3.510
Batch 1900 Train CE 3.472
Batch 2000 Train CE 3.455
Running validation ... Validation CE 3.477
Batch 2100 Train CE 3.480
Batch 2200 Train CE 3.486
Batch 2300 Train CE 3.483
Batch 2400 Train CE 3.489
Batch 2500 Train CE 3.466
Batch 2600 Train CE 3.503
Batch 2700 Train CE 3.526
Batch 2800 Train CE 3.466
Batch 2900 Train CE 3.484
Batch 3000 Train CE 3.395
Running validation ... Validation CE 3.411
Batch 3100 Train CE 3.395
B

5) Train three models each with 50 dimensional embedding space, 200 dimensional hidden layer.

* Model A: Learning rate = 0.001
* Model B: Learning rate = 0.1
* Model C: Learning rate = 10.0

Use a momentum of 0.5 and default settings for all other hyperparameters. Which model gives the lowest training set cross entropy after 10 epochs?

In [21]:
% learning rate set to 0.001 (learning_rate = 0.001)
model = train(10);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 5.366
Batch 200 Train CE 5.059
Batch 300 Train CE 4.873
Batch 400 Train CE 4.804
Batch 500 Train CE 4.730
Batch 600 Train CE 4.726
Batch 700 Train CE 4.651
Batch 800 Train CE 4.620
Batch 900 Train CE 4.634
Batch 1000 Train CE 4.586
Running validation ... Validation CE 4.584
Batch 1100 Train CE 4.563
Batch 1200 Train CE 4.566
Batch 1300 Train CE 4.562
Batch 1400 Train CE 4.559
Batch 1500 Train CE 4.506
Batch 1600 Train CE 4.527
Batch 1700 Train CE 4.507
Batch 1800 Train CE 4.502
Batch 1900 Train CE 4.499
Batch 2000 Train CE 4.474
Running validation ... Validation CE 4.486
Batch 2100 Train CE 4.487
Batch 2200 Train CE 4.470
Batch 2300 Train CE 4.487
Batch 2400 Train CE 4.461
Batch 2500 Train CE 4.469
Batch 2600 Train CE 4.486
Batch 2700 Train CE 4.479
Batch 2800 Train CE 4.457
Batch 2900 Train CE 4.449
Batch 3000 Train CE 4.432
Running validation ... Validation CE 4.448
Batch 3100 Train CE 4.432
B

Batch 700 Train CE 4.359
Batch 800 Train CE 4.351
Batch 900 Train CE 4.404
Batch 1000 Train CE 4.376
Running validation ... Validation CE 4.383
Batch 1100 Train CE 4.371
Batch 1200 Train CE 4.395
Batch 1300 Train CE 4.394
Batch 1400 Train CE 4.405
Batch 1500 Train CE 4.375
Batch 1600 Train CE 4.391
Batch 1700 Train CE 4.377
Batch 1800 Train CE 4.380
Batch 1900 Train CE 4.386
Batch 2000 Train CE 4.365
Running validation ... Validation CE 4.383
Batch 2100 Train CE 4.390
Batch 2200 Train CE 4.381
Batch 2300 Train CE 4.390
Batch 2400 Train CE 4.373
Batch 2500 Train CE 4.385
Batch 2600 Train CE 4.405
Batch 2700 Train CE 4.410
Batch 2800 Train CE 4.383
Batch 2900 Train CE 4.380
Batch 3000 Train CE 4.362
Running validation ... Validation CE 4.383
Batch 3100 Train CE 4.372
Batch 3200 Train CE 4.376
Batch 3300 Train CE 4.355
Batch 3400 Train CE 4.381
Batch 3500 Train CE 4.368
Batch 3600 Train CE 4.382
Batch 3700 Train CE 4.363
Average Training CE 4.381
Epoch 9
Batch 100 Train CE 3.525
Batch 200

In [24]:
% learning rate set to 0.1 (learning_rate = 0.1)
model = train(10);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 4.522
Batch 200 Train CE 4.425
Batch 300 Train CE 4.394
Batch 400 Train CE 4.405
Batch 500 Train CE 4.377
Batch 600 Train CE 4.427
Batch 700 Train CE 4.383
Batch 800 Train CE 4.372
Batch 900 Train CE 4.429
Batch 1000 Train CE 4.399
Running validation ... Validation CE 4.402
Batch 1100 Train CE 4.394
Batch 1200 Train CE 4.418
Batch 1300 Train CE 4.415
Batch 1400 Train CE 4.424
Batch 1500 Train CE 4.399
Batch 1600 Train CE 4.412
Batch 1700 Train CE 4.399
Batch 1800 Train CE 4.402
Batch 1900 Train CE 4.407
Batch 2000 Train CE 4.385
Running validation ... Validation CE 4.405
Batch 2100 Train CE 4.410
Batch 2200 Train CE 4.401
Batch 2300 Train CE 4.413
Batch 2400 Train CE 4.396
Batch 2500 Train CE 4.406
Batch 2600 Train CE 4.430
Batch 2700 Train CE 4.433
Batch 2800 Train CE 4.402
Batch 2900 Train CE 4.399
Batch 3000 Train CE 4.381
Running validation ... Validation CE 4.408
Batch 3100 Train CE 4.395
B

Batch 700 Train CE 3.073
Batch 800 Train CE 3.028
Batch 900 Train CE 3.062
Batch 1000 Train CE 3.062
Running validation ... Validation CE 3.058
Batch 1100 Train CE 3.028
Batch 1200 Train CE 3.050
Batch 1300 Train CE 3.021
Batch 1400 Train CE 3.080
Batch 1500 Train CE 3.051
Batch 1600 Train CE 3.049
Batch 1700 Train CE 3.029
Batch 1800 Train CE 3.061
Batch 1900 Train CE 3.042
Batch 2000 Train CE 3.014
Running validation ... Validation CE 3.056
Batch 2100 Train CE 3.038
Batch 2200 Train CE 3.046
Batch 2300 Train CE 3.025
Batch 2400 Train CE 3.035
Batch 2500 Train CE 3.029
Batch 2600 Train CE 3.056
Batch 2700 Train CE 3.072
Batch 2800 Train CE 3.037
Batch 2900 Train CE 3.046
Batch 3000 Train CE 3.008
Running validation ... Validation CE 3.027
Batch 3100 Train CE 3.011
Batch 3200 Train CE 3.000
Batch 3300 Train CE 2.990
Batch 3400 Train CE 3.026
Batch 3500 Train CE 3.005
Batch 3600 Train CE 3.025
Batch 3700 Train CE 2.988
Average Training CE 3.040
Epoch 9
Batch 100 Train CE 2.403
Batch 200

In [25]:
% learning rate set to 10 (learning_rate = 10)
model = train(10);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 5.027
Batch 200 Train CE 4.329
Batch 300 Train CE 4.284
Batch 400 Train CE 4.284
Batch 500 Train CE 4.166
Batch 600 Train CE 4.065
Batch 700 Train CE 3.896
Batch 800 Train CE 3.816
Batch 900 Train CE 3.824
Batch 1000 Train CE 3.770
Running validation ... Validation CE 3.733
Batch 1100 Train CE 3.705
Batch 1200 Train CE 3.681
Batch 1300 Train CE 3.633
Batch 1400 Train CE 3.607
Batch 1500 Train CE 3.543
Batch 1600 Train CE 3.538
Batch 1700 Train CE 3.501
Batch 1800 Train CE 3.539
Batch 1900 Train CE 3.485
Batch 2000 Train CE 3.474
Running validation ... Validation CE 3.522
Batch 2100 Train CE 3.465
Batch 2200 Train CE 3.504
Batch 2300 Train CE 3.481
Batch 2400 Train CE 3.476
Batch 2500 Train CE 3.486
Batch 2600 Train CE 3.501
Batch 2700 Train CE 3.493
Batch 2800 Train CE 3.448
Batch 2900 Train CE 3.458
Batch 3000 Train CE 3.408
Running validation ... Validation CE 3.457
Batch 3100 Train CE 3.434
B

Batch 700 Train CE 3.338
Batch 800 Train CE 3.317
Batch 900 Train CE 3.350
Batch 1000 Train CE 3.332
Running validation ... Validation CE 3.342
Batch 1100 Train CE 3.301
Batch 1200 Train CE 3.332
Batch 1300 Train CE 3.318
Batch 1400 Train CE 3.353
Batch 1500 Train CE 3.366
Batch 1600 Train CE 3.348
Batch 1700 Train CE 3.322
Batch 1800 Train CE 3.386
Batch 1900 Train CE 3.321
Batch 2000 Train CE 3.336
Running validation ... Validation CE 3.357
Batch 2100 Train CE 3.320
Batch 2200 Train CE 3.358
Batch 2300 Train CE 3.329
Batch 2400 Train CE 3.343
Batch 2500 Train CE 3.340
Batch 2600 Train CE 3.376
Batch 2700 Train CE 3.390
Batch 2800 Train CE 3.346
Batch 2900 Train CE 3.355
Batch 3000 Train CE 3.314
Running validation ... Validation CE 3.365
Batch 3100 Train CE 3.318
Batch 3200 Train CE 3.306
Batch 3300 Train CE 3.311
Batch 3400 Train CE 3.341
Batch 3500 Train CE 3.335
Batch 3600 Train CE 3.333
Batch 3700 Train CE 3.289
Average Training CE 3.334
Epoch 9
Batch 100 Train CE 2.643
Batch 200

Train each of following models:

* Model A: 5 dimensional embedding, 100 dimensional hidden layer
* Model B: 50 dimensional embedding, 10 dimensional hidden layer
* Model C: 50 dimensional embedding, 200 dimensional hidden layer
* Model D: 100 dimensional embedding, 5 dimensional hidden layer

Use default values for all other hyperparameters.

Which model gives the best training set cross entropy after 10 epochs of training?
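
One way to compare the four configurations before training is to count their parameters. The sketch below assumes a particular layout: an embedding table shared by the three context words, one hidden layer fed by the three concatenated embeddings, and a softmax output layer, with biases on the hidden and output layers. That layout is an assumption about the starter code, not something stated above.

```octave
V = 250;                          % vocabulary size
numwords = 3;                     % context words per case
names = {"A", "B", "C", "D"};
D = [5 50 50 100];                % embedding dimension of each model
H = [100 10 200 5];               % hidden layer size of each model

% embedding table + embedding-to-hidden weights + hidden biases
% + hidden-to-output weights + output biases
params = V .* D + numwords .* D .* H + H + H .* V + V;

for k = 1:numel(names)
  printf("Model %s: %d parameters\n", names{k}, params(k));
end
```

Under these assumptions Model C has by far the most parameters (92950), while Models B and D squeeze everything through a very narrow hidden layer. The count only describes capacity. It does not by itself say which model trains best.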

In [26]:
% Model A: numhid1 = 5 & numhid2 = 100
model = train(10);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 4.546
Batch 200 Train CE 4.457
Batch 300 Train CE 4.422
Batch 400 Train CE 4.432
Batch 500 Train CE 4.401
Batch 600 Train CE 4.446
Batch 700 Train CE 4.406
Batch 800 Train CE 4.386
Batch 900 Train CE 4.432
Batch 1000 Train CE 4.402
Running validation ... Validation CE 4.393
Batch 1100 Train CE 4.360
Batch 1200 Train CE 4.328
Batch 1300 Train CE 4.283
Batch 1400 Train CE 4.247
Batch 1500 Train CE 4.168
Batch 1600 Train CE 4.132
Batch 1700 Train CE 4.077
Batch 1800 Train CE 4.057
Batch 1900 Train CE 4.038
Batch 2000 Train CE 3.990
Running validation ... Validation CE 4.037
Batch 2100 Train CE 4.006
Batch 2200 Train CE 3.978
Batch 2300 Train CE 3.953
Batch 2400 Train CE 3.894
Batch 2500 Train CE 3.866
Batch 2600 Train CE 3.860
Batch 2700 Train CE 3.816
Batch 2800 Train CE 3.744
Batch 2900 Train CE 3.700
Batch 3000 Train CE 3.627
Running validation ... Validation CE 3.632
Batch 3100 Train CE 3.593
B

Batch 700 Train CE 2.854
Batch 800 Train CE 2.814
Batch 900 Train CE 2.847
Batch 1000 Train CE 2.852
Running validation ... Validation CE 2.856
Batch 1100 Train CE 2.820
Batch 1200 Train CE 2.835
Batch 1300 Train CE 2.805
Batch 1400 Train CE 2.855
Batch 1500 Train CE 2.848
Batch 1600 Train CE 2.840
Batch 1700 Train CE 2.826
Batch 1800 Train CE 2.858
Batch 1900 Train CE 2.829
Batch 2000 Train CE 2.812
Running validation ... Validation CE 2.886
Batch 2100 Train CE 2.841
Batch 2200 Train CE 2.840
Batch 2300 Train CE 2.817
Batch 2400 Train CE 2.834
Batch 2500 Train CE 2.825
Batch 2600 Train CE 2.852
Batch 2700 Train CE 2.881
Batch 2800 Train CE 2.820
Batch 2900 Train CE 2.850
Batch 3000 Train CE 2.809
Running validation ... Validation CE 2.851
Batch 3100 Train CE 2.838
Batch 3200 Train CE 2.809
Batch 3300 Train CE 2.824
Batch 3400 Train CE 2.843
Batch 3500 Train CE 2.837
Batch 3600 Train CE 2.839
Batch 3700 Train CE 2.804
Average Training CE 2.834
Epoch 9
Batch 100 Train CE 2.254
Batch 200

In [27]:
% Model B: numhid1 = 50 & numhid2 = 10
model = train(10);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 4.665
Batch 200 Train CE 4.427
Batch 300 Train CE 4.386
Batch 400 Train CE 4.393
Batch 500 Train CE 4.353
Batch 600 Train CE 4.365
Batch 700 Train CE 4.269
Batch 800 Train CE 4.223
Batch 900 Train CE 4.253
Batch 1000 Train CE 4.197
Running validation ... Validation CE 4.195
Batch 1100 Train CE 4.159
Batch 1200 Train CE 4.175
Batch 1300 Train CE 4.165
Batch 1400 Train CE 4.161
Batch 1500 Train CE 4.106
Batch 1600 Train CE 4.107
Batch 1700 Train CE 4.081
Batch 1800 Train CE 4.077
Batch 1900 Train CE 4.058
Batch 2000 Train CE 4.015
Running validation ... Validation CE 4.050
Batch 2100 Train CE 4.040
Batch 2200 Train CE 4.011
Batch 2300 Train CE 4.003
Batch 2400 Train CE 3.962
Batch 2500 Train CE 3.941
Batch 2600 Train CE 3.945
Batch 2700 Train CE 3.914
Batch 2800 Train CE 3.853
Batch 2900 Train CE 3.830
Batch 3000 Train CE 3.750
Running validation ... Validation CE 3.767
Batch 3100 Train CE 3.732
B

Batch 700 Train CE 3.032
Batch 800 Train CE 3.013
Batch 900 Train CE 3.040
Batch 1000 Train CE 3.036
Running validation ... Validation CE 3.033
Batch 1100 Train CE 2.997
Batch 1200 Train CE 3.035
Batch 1300 Train CE 3.002
Batch 1400 Train CE 3.062
Batch 1500 Train CE 3.034
Batch 1600 Train CE 3.023
Batch 1700 Train CE 3.009
Batch 1800 Train CE 3.048
Batch 1900 Train CE 3.011
Batch 2000 Train CE 2.999
Running validation ... Validation CE 3.038
Batch 2100 Train CE 3.031
Batch 2200 Train CE 3.042
Batch 2300 Train CE 3.005
Batch 2400 Train CE 3.025
Batch 2500 Train CE 3.033
Batch 2600 Train CE 3.048
Batch 2700 Train CE 3.062
Batch 2800 Train CE 3.026
Batch 2900 Train CE 3.039
Batch 3000 Train CE 3.000
Running validation ... Validation CE 3.035
Batch 3100 Train CE 3.015
Batch 3200 Train CE 2.996
Batch 3300 Train CE 2.998
Batch 3400 Train CE 3.039
Batch 3500 Train CE 3.017
Batch 3600 Train CE 3.019
Batch 3700 Train CE 2.998
Average Training CE 3.024
Epoch 9
Batch 100 Train CE 2.415
Batch 200

In [28]:
% Model C: numhid1 = 50 & numhid2 = 200
model = train(10);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 4.602
Batch 200 Train CE 4.498
Batch 300 Train CE 4.467
Batch 400 Train CE 4.476
Batch 500 Train CE 4.448
Batch 600 Train CE 4.492
Batch 700 Train CE 4.458
Batch 800 Train CE 4.420
Batch 900 Train CE 4.461
Batch 1000 Train CE 4.419
Running validation ... Validation CE 4.396
Batch 1100 Train CE 4.319
Batch 1200 Train CE 4.257
Batch 1300 Train CE 4.191
Batch 1400 Train CE 4.148
Batch 1500 Train CE 4.101
Batch 1600 Train CE 4.068
Batch 1700 Train CE 4.038
Batch 1800 Train CE 4.017
Batch 1900 Train CE 3.979
Batch 2000 Train CE 3.917
Running validation ... Validation CE 3.947
Batch 2100 Train CE 3.921
Batch 2200 Train CE 3.872
Batch 2300 Train CE 3.825
Batch 2400 Train CE 3.759
Batch 2500 Train CE 3.716
Batch 2600 Train CE 3.704
Batch 2700 Train CE 3.660
Batch 2800 Train CE 3.598
Batch 2900 Train CE 3.558
Batch 3000 Train CE 3.482
Running validation ... Validation CE 3.484
Batch 3100 Train CE 3.450
B

Batch 700 Train CE 2.605
Batch 800 Train CE 2.579
Batch 900 Train CE 2.605
Batch 1000 Train CE 2.599
Running validation ... Validation CE 2.645
Batch 1100 Train CE 2.571
Batch 1200 Train CE 2.594
Batch 1300 Train CE 2.561
Batch 1400 Train CE 2.610
Batch 1500 Train CE 2.607
Batch 1600 Train CE 2.581
Batch 1700 Train CE 2.582
Batch 1800 Train CE 2.620
Batch 1900 Train CE 2.580
Batch 2000 Train CE 2.573
Running validation ... Validation CE 2.660
Batch 2100 Train CE 2.602
Batch 2200 Train CE 2.586
Batch 2300 Train CE 2.573
Batch 2400 Train CE 2.590
Batch 2500 Train CE 2.583
Batch 2600 Train CE 2.606
Batch 2700 Train CE 2.618
Batch 2800 Train CE 2.581
Batch 2900 Train CE 2.586
Batch 3000 Train CE 2.551
Running validation ... Validation CE 2.636
Batch 3100 Train CE 2.579
Batch 3200 Train CE 2.546
Batch 3300 Train CE 2.570
Batch 3400 Train CE 2.597
Batch 3500 Train CE 2.594
Batch 3600 Train CE 2.573
Batch 3700 Train CE 2.555
Average Training CE 2.587
Epoch 9
Batch 100 Train CE 2.045
Batch 200

In [29]:
% Model D: numhid1 = 100 & numhid2 = 5
model = train(10);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 4.720
Batch 200 Train CE 4.438
Batch 300 Train CE 4.389
Batch 400 Train CE 4.392
Batch 500 Train CE 4.348
Batch 600 Train CE 4.360
Batch 700 Train CE 4.264
Batch 800 Train CE 4.220
Batch 900 Train CE 4.247
Batch 1000 Train CE 4.191
Running validation ... Validation CE 4.187
Batch 1100 Train CE 4.151
Batch 1200 Train CE 4.166
Batch 1300 Train CE 4.156
Batch 1400 Train CE 4.151
Batch 1500 Train CE 4.096
Batch 1600 Train CE 4.095
Batch 1700 Train CE 4.068
Batch 1800 Train CE 4.069
Batch 1900 Train CE 4.054
Batch 2000 Train CE 4.014
Running validation ... Validation CE 4.041
Batch 2100 Train CE 4.041
Batch 2200 Train CE 4.020
Batch 2300 Train CE 4.021
Batch 2400 Train CE 3.989
Batch 2500 Train CE 3.988
Batch 2600 Train CE 4.013
Batch 2700 Train CE 4.000
Batch 2800 Train CE 3.956
Batch 2900 Train CE 3.953
Batch 3000 Train CE 3.889
Running validation ... Validation CE 3.918
Batch 3100 Train CE 3.888
B

Batch 700 Train CE 3.232
Batch 800 Train CE 3.228
Batch 900 Train CE 3.233
Batch 1000 Train CE 3.251
Running validation ... Validation CE 3.240
Batch 1100 Train CE 3.218
Batch 1200 Train CE 3.233
Batch 1300 Train CE 3.219
Batch 1400 Train CE 3.277
Batch 1500 Train CE 3.242
Batch 1600 Train CE 3.235
Batch 1700 Train CE 3.211
Batch 1800 Train CE 3.248
Batch 1900 Train CE 3.230
Batch 2000 Train CE 3.212
Running validation ... Validation CE 3.244
Batch 2100 Train CE 3.237
Batch 2200 Train CE 3.240
Batch 2300 Train CE 3.210
Batch 2400 Train CE 3.225
Batch 2500 Train CE 3.227
Batch 2600 Train CE 3.254
Batch 2700 Train CE 3.270
Batch 2800 Train CE 3.234
Batch 2900 Train CE 3.262
Batch 3000 Train CE 3.216
Running validation ... Validation CE 3.246
Batch 3100 Train CE 3.212
Batch 3200 Train CE 3.215
Batch 3300 Train CE 3.195
Batch 3400 Train CE 3.239
Batch 3500 Train CE 3.217
Batch 3600 Train CE 3.237
Batch 3700 Train CE 3.198
Average Training CE 3.232
Epoch 9
Batch 100 Train CE 2.591
Batch 200

Train three models each with 50 dimensional embedding space, 200 dimensional hidden layer.

* Model A: Momentum = 0.0
* Model B: Momentum = 0.5
* Model C: Momentum = 0.9

Use the default settings for all other hyperparameters. Which model gives the lowest training set cross entropy after 5 epochs?
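
To see what the momentum setting does, here is a sketch on a toy one-dimensional problem, $f(w) = \frac{1}{2} w^2$, whose gradient is $w$. It assumes the usual form of the update, in which a velocity term accumulates gradients and the weight moves by the learning rate times that velocity. Whether `train.m` applies momentum in exactly this form is an assumption, and the variable names and toy learning rate are ours.

```octave
learning_rate = 0.1;              % toy value, not the assignment default
momenta = [0.0 0.5 0.9];          % the three settings compared above
num_steps = 20;
final_w = zeros(size(momenta));

for m = 1:numel(momenta)
  w = 1.0;                        % starting weight
  delta = 0.0;                    % velocity, starts at rest
  for step = 1:num_steps
    grad = w;                                 % gradient of 0.5 * w^2
    delta = momenta(m) * delta + grad;        % accumulate velocity
    w = w - learning_rate * delta;            % move along the velocity
  end
  final_w(m) = w;
end

disp(final_w);   % final weight for each momentum value
```

With momentum 0 this reduces to plain gradient descent. Larger values make each step depend more on the history of gradients, which can speed progress along consistent directions but can also overshoot.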

In [30]:
% Model A: momentum = 0.0
model = train(5);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 4.569
Batch 200 Train CE 4.427
Batch 300 Train CE 4.389
Batch 400 Train CE 4.397
Batch 500 Train CE 4.369
Batch 600 Train CE 4.417
Batch 700 Train CE 4.372
Batch 800 Train CE 4.362
Batch 900 Train CE 4.418
Batch 1000 Train CE 4.387
Running validation ... Validation CE 4.391
Batch 1100 Train CE 4.384
Batch 1200 Train CE 4.407
Batch 1300 Train CE 4.404
Batch 1400 Train CE 4.414
Batch 1500 Train CE 4.387
Batch 1600 Train CE 4.402
Batch 1700 Train CE 4.389
Batch 1800 Train CE 4.392
Batch 1900 Train CE 4.396
Batch 2000 Train CE 4.373
Running validation ... Validation CE 4.403
Batch 2100 Train CE 4.400
Batch 2200 Train CE 4.391
Batch 2300 Train CE 4.402
Batch 2400 Train CE 4.385
Batch 2500 Train CE 4.396
Batch 2600 Train CE 4.419
Batch 2700 Train CE 4.422
Batch 2800 Train CE 4.392
Batch 2900 Train CE 4.389
Batch 3000 Train CE 4.371
Running validation ... Validation CE 4.396
Batch 3100 Train CE 4.385
B

In [31]:
% Model B: momentum = 0.5
model = train(5);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 4.523
Batch 200 Train CE 4.425
Batch 300 Train CE 4.394
Batch 400 Train CE 4.405
Batch 500 Train CE 4.377
Batch 600 Train CE 4.427
Batch 700 Train CE 4.383
Batch 800 Train CE 4.372
Batch 900 Train CE 4.429
Batch 1000 Train CE 4.399
Running validation ... Validation CE 4.402
Batch 1100 Train CE 4.394
Batch 1200 Train CE 4.418
Batch 1300 Train CE 4.415
Batch 1400 Train CE 4.425
Batch 1500 Train CE 4.399
Batch 1600 Train CE 4.412
Batch 1700 Train CE 4.399
Batch 1800 Train CE 4.402
Batch 1900 Train CE 4.407
Batch 2000 Train CE 4.385
Running validation ... Validation CE 4.405
Batch 2100 Train CE 4.410
Batch 2200 Train CE 4.401
Batch 2300 Train CE 4.413
Batch 2400 Train CE 4.396
Batch 2500 Train CE 4.406
Batch 2600 Train CE 4.431
Batch 2700 Train CE 4.433
Batch 2800 Train CE 4.402
Batch 2900 Train CE 4.399
Batch 3000 Train CE 4.381
Running validation ... Validation CE 4.408
Batch 3100 Train CE 4.395
B

In [32]:
% Model C: momentum = 0.9
model = train(5);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 4.601
Batch 200 Train CE 4.498
Batch 300 Train CE 4.467
Batch 400 Train CE 4.476
Batch 500 Train CE 4.448
Batch 600 Train CE 4.493
Batch 700 Train CE 4.459
Batch 800 Train CE 4.421
Batch 900 Train CE 4.466
Batch 1000 Train CE 4.436
Running validation ... Validation CE 4.430
Batch 1100 Train CE 4.356
Batch 1200 Train CE 4.301
Batch 1300 Train CE 4.229
Batch 1400 Train CE 4.169
Batch 1500 Train CE 4.110
Batch 1600 Train CE 4.077
Batch 1700 Train CE 4.047
Batch 1800 Train CE 4.028
Batch 1900 Train CE 3.993
Batch 2000 Train CE 3.932
Running validation ... Validation CE 3.962
Batch 2100 Train CE 3.935
Batch 2200 Train CE 3.883
Batch 2300 Train CE 3.834
Batch 2400 Train CE 3.766
Batch 2500 Train CE 3.718
Batch 2600 Train CE 3.707
Batch 2700 Train CE 3.658
Batch 2800 Train CE 3.594
Batch 2900 Train CE 3.557
Batch 3000 Train CE 3.483
Running validation ... Validation CE 3.487
Batch 3100 Train CE 3.451
B

Train a model with 50 dimensional embedding layer and 200 dimensional hidden layer for 10 epochs. Use default values for all other hyperparameters.

Which words are among the 10 closest words to the word 'day'?
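
"Closest" here can be read as smallest distance between rows of the learned embedding matrix. The sketch below illustrates the idea with Euclidean distance on a made-up five-word vocabulary and made-up two-dimensional embeddings. The values are toy numbers, not taken from a trained model, and the use of Euclidean distance is an assumption about how the starter code ranks words.

```octave
vocab = {"day", "week", "year", "dr.", "percent"};   % toy vocabulary
E = [ 1.0  0.2;                                      % toy embeddings, one row per word
      0.9  0.3;
      0.8  0.1;
     -1.0  1.5;
     -0.2 -1.0];

query = "day";
q = find(strcmp(vocab, query));

% Euclidean distance from the query's embedding to every row.
dist = sqrt(sum((E - E(q, :)) .^ 2, 2));
dist(q) = Inf;                                       % exclude the query itself
[sorted_dist, order] = sort(dist);

k = 2;                                               % number of neighbours to show
for i = 1:k
  printf("%s %.2f\n", vocab{order(i)}, sorted_dist(i));
end
```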

In [33]:
model = train(10);

    load_data at line 16 column 1
    train at line 32 column 30
Epoch 1
Batch 100 Train CE 4.603
Batch 200 Train CE 4.498
Batch 300 Train CE 4.467
Batch 400 Train CE 4.476
Batch 500 Train CE 4.448
Batch 600 Train CE 4.493
Batch 700 Train CE 4.458
Batch 800 Train CE 4.421
Batch 900 Train CE 4.464
Batch 1000 Train CE 4.432
Running validation ... Validation CE 4.422
Batch 1100 Train CE 4.347
Batch 1200 Train CE 4.289
Batch 1300 Train CE 4.217
Batch 1400 Train CE 4.163
Batch 1500 Train CE 4.110
Batch 1600 Train CE 4.077
Batch 1700 Train CE 4.047
Batch 1800 Train CE 4.027
Batch 1900 Train CE 3.991
Batch 2000 Train CE 3.929
Running validation ... Validation CE 3.959
Batch 2100 Train CE 3.932
Batch 2200 Train CE 3.881
Batch 2300 Train CE 3.834
Batch 2400 Train CE 3.768
Batch 2500 Train CE 3.722
Batch 2600 Train CE 3.711
Batch 2700 Train CE 3.664
Batch 2800 Train CE 3.602
Batch 2900 Train CE 3.566
Batch 3000 Train CE 3.493
Running validation ... Validation CE 3.497
Batch 3100 Train CE 3.461
B

Batch 700 Train CE 2.602
Batch 800 Train CE 2.583
Batch 900 Train CE 2.596
Batch 1000 Train CE 2.600
Running validation ... Validation CE 2.642
Batch 1100 Train CE 2.565
Batch 1200 Train CE 2.592
Batch 1300 Train CE 2.559
Batch 1400 Train CE 2.606
Batch 1500 Train CE 2.608
Batch 1600 Train CE 2.590
Batch 1700 Train CE 2.574
Batch 1800 Train CE 2.611
Batch 1900 Train CE 2.577
Batch 2000 Train CE 2.566
Running validation ... Validation CE 2.649
Batch 2100 Train CE 2.603
Batch 2200 Train CE 2.583
Batch 2300 Train CE 2.565
Batch 2400 Train CE 2.580
Batch 2500 Train CE 2.580
Batch 2600 Train CE 2.605
Batch 2700 Train CE 2.618
Batch 2800 Train CE 2.581
Batch 2900 Train CE 2.592
Batch 3000 Train CE 2.545
Running validation ... Validation CE 2.629
Batch 3100 Train CE 2.576
Batch 3200 Train CE 2.544
Batch 3300 Train CE 2.565
Batch 3400 Train CE 2.593
Batch 3500 Train CE 2.588
Batch 3600 Train CE 2.574
Batch 3700 Train CE 2.560
Average Training CE 2.584
Epoch 9
Batch 100 Train CE 2.044
Batch 200

In [36]:
fieldnames(model)

ans = 
{
  [1,1] = word_embedding_weights
  [2,1] = embed_to_hid_weights
  [3,1] = hid_to_output_weights
  [4,1] = hid_bias
  [5,1] = output_bias
  [6,1] = vocab
}
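
These fields are enough to run a forward pass. The sketch below builds a tiny stand-in model with the same field names and pushes two cases through it. The array shapes, the logistic hidden units and the toy sizes are assumptions made for illustration. The trained `model` may store its weights differently.

```octave
V = 6; D = 2; H = 3; numwords = 3;     % toy sizes, not the assignment's
m.word_embedding_weights = reshape(linspace(-0.5, 0.5, V * D), V, D);
m.embed_to_hid_weights   = reshape(linspace(-0.3, 0.3, numwords * D * H), numwords * D, H);
m.hid_to_output_weights  = reshape(linspace(-0.2, 0.2, H * V), H, V);
m.hid_bias    = zeros(H, 1);
m.output_bias = zeros(V, 1);

input_batch = [1 4; 2 5; 3 6];         % numwords x batchsize word indices
batchsize = size(input_batch, 2);

% 1. Look up the embedding of each context word and stack the three
%    embeddings of a case into one column.
emb = m.word_embedding_weights(input_batch(:), :)';
emb = reshape(emb, numwords * D, batchsize);

% 2. Hidden layer (logistic units assumed).
hid = 1 ./ (1 + exp(-(m.embed_to_hid_weights' * emb + m.hid_bias)));

% 3. Softmax over the vocabulary.
z = m.hid_to_output_weights' * hid + m.output_bias;
z = z - max(z, [], 1);
y = exp(z) ./ sum(exp(z), 1);

disp(size(y));                         % V x batchsize
```

Each column of `y` is a probability distribution over the toy vocabulary for one case.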


In [46]:
display_nearest_words("today", model, 10)

yesterday 1.62
ago 2.21
though 2.22
music 2.61
season 2.69
? 2.72
states 2.77
) 2.82
now 2.83
company 2.83


In the model trained in Question 10, why is the word 'percent' close to 'dr.' even though they have very different contexts and are not expected to be close in word embedding space? 

In [45]:
display_nearest_words("year", model, 10)

week 1.85
days 2.09
years 2.20
day 2.33
season 2.38
ago 2.44
yesterday 2.54
times 2.56
game 2.60
case 2.62
