<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Reading-the-data" data-toc-modified-id="Reading-the-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Reading the data</a></span></li><li><span><a href="#Process-data-to-determine-number-of-categories" data-toc-modified-id="Process-data-to-determine-number-of-categories-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Process data to determine number of categories</a></span></li><li><span><a href="#Creating-a-number-of-features-x-number-of-samples-data-matrix" data-toc-modified-id="Creating-a-number-of-features-x-number-of-samples-data-matrix-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Creating a number of features x number of samples data matrix</a></span></li><li><span><a href="#Designing-a-machine-learning-algorithm-for-supervised-learning" data-toc-modified-id="Designing-a-machine-learning-algorithm-for-supervised-learning-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Designing a machine learning algorithm for supervised learning</a></span><ul class="toc-item"><li><span><a href="#Training-a-logistic-network-or-logistic-regression-model" data-toc-modified-id="Training-a-logistic-network-or-logistic-regression-model-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Training a logistic network or logistic regression model</a></span></li><li><span><a href="#Training-a-deep-learning-neural-network" data-toc-modified-id="Training-a-deep-learning-neural-network-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Training a deep learning neural network</a></span></li></ul></li><li><span><a href="#Creating-a-test-dataset-so-we-can-test-our-algorithm" data-toc-modified-id="Creating-a-test-dataset-so-we-can-test-our-algorithm-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Creating a test dataset so we can test our algorithm</a></span><ul class="toc-item"><li><span><a href="#Using-the-trained-algorithm-to-classify-a-new-label" 
data-toc-modified-id="Using-the-trained-algorithm-to-classify-a-new-label-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Using the trained algorithm to classify a new label</a></span></li><li><span><a href="#Determining-the-accuracy-of-the-algorithm-on-the-entire-dataset" data-toc-modified-id="Determining-the-accuracy-of-the-algorithm-on-the-entire-dataset-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Determining the accuracy of the algorithm on the <em>entire</em> dataset</a></span></li></ul></li><li><span><a href="#Examining-the-output-of-the-model-for-samples-that-incorrectly-classify" data-toc-modified-id="Examining-the-output-of-the-model-for-samples-that-incorrectly-classify-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Examining the output of the model for samples that incorrectly classify</a></span></li></ul></div>

# Reading the data

We will use the [`CSV.jl`](https://github.com/JuliaData/CSV.jl) package to load the data, which is stored as a `.csv` file.

In [1]:
using DataFrames, CSV

We now load the data in the next code cell. 

In [2]:
wine_data = CSV.read("winetrain.csv") ## enter file name

| Row | fixedacidity | volatileacidity | citricacid | residualsugar | chlorides | freesulfurdioxide |
|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 |
| 2 | 6.2 | 0.25 | 0.54 | 7.0 | 0.046 | 58.0 |
| 3 | 6.4 | 0.16 | 0.42 | 1.0 | 0.036 | 29.0 |
| 4 | 7.3 | 0.69 | 0.32 | 2.2 | 0.069 | 35.0 |
| 5 | 7.9 | 0.37 | 0.23 | 1.8 | 0.077 | 23.0 |
| 6 | 6.8 | 0.34 | 0.44 | 6.6 | 0.052 | 28.0 |
| 7 | 8.6 | 0.47 | 0.47 | 2.4 | 0.074 | 7.0 |
| 8 | 6.0 | 0.28 | 0.27 | 2.3 | 0.051 | 23.0 |
| 9 | 6.6 | 0.24 | 0.27 | 15.8 | 0.035 | 46.0 |
| 10 | 6.4 | 0.29 | 0.18 | 15.0 | 0.04 | 21.0 |


**Question**: How many features are measured for each wine? How many columns are returned? Why are we not counting the leftmost column?  

12 features are measured for each wine. 13 columns are returned. We do not count the leftmost column because it is only the row index of the wine, which carries no information about the wine itself. 

This dataset was downloaded from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/wine+quality).

We are using the first 11 of the 12 attributes listed. We are not using the last column because it contains the label we are trying to predict.

**Question**: What are the 11 attributes of wine that we are using? Do you expect them to be predictive?


The 11 attributes of wine that we are using are: (1) fixed acidity, (2) volatile acidity, (3) citric acid, (4) residual sugar, (5) chlorides, (6) free sulfur dioxide, (7) total sulfur dioxide, (8) density, (9) pH, (10) sulphates, and (11) alcohol. I expect them to be predictive.

# Process data to determine number of categories 

Note that the first eleven columns of the `wine_data` matrix correspond to numeric values. The last column corresponds to a label. We would like to automatically determine the number of categories (or labels) in the dataset and extract out only the numeric portions so we can train our machine learning algorithm.

In other words, we would like to design a machine learning algorithm that takes as its **input** the eleven features corresponding to the numeric values associated with the various chemical parameters of wine and uses them to predict whether the wine is red or white.  

We can extract the header information using the command in the next cell. 

In [3]:
names(wine_data)

12-element Array{Symbol,1}:
 :fixedacidity      
 :volatileacidity   
 :citricacid        
 :residualsugar     
 :chlorides         
 :freesulfurdioxide 
 :totalsulfurdioxide
 :density           
 :pH                
 :sulphates         
 :alcohol           
 :Color             

**Question**: 

Why would a machine learning algorithm trained with input data corresponding to all twelve elements above not be valid when given a new test vector?

Hint: What do we expect the dimension of the test vector to be? Why is the dimension of each row of the training data set not the same? 

The dimension of the test vector is 11. One of the twelve columns of the training data is the label, which is what we want to predict at test time and is therefore not available then.  

The command `unique` returns the unique elements of a collection, and we use that below to determine the unique categories from the 12th column of the data. 

In [4]:
wine_data_matrix = convert(Matrix, wine_data[:,1:11]) ## convert to Matrix
categories = wine_data[:,12] ## which column has the categories? 
#wine_data[:,12] #takes  ALL rows and the 12th column of wine_data 
unique_categories = unique(categories)

2-element Array{String,1}:
 "Red"  
 "White"

**Question**:

What are the categories? 

There are two categories: "Red" and "White".

The command `length` computes the number of elements (or length) of a vector. We can use this to determine the number of categories.

In [7]:
@show num_categories = length(unique_categories);

num_categories = length(unique_categories) = 2


In [8]:
println("Number of categories equals $(num_categories)")

Number of categories equals 2


**Question**: How many unique categories are there as listed by the `unique_categories` vector? Does this equal the number we have computed and stored in `num_categories`? 

There are 2 unique categories as listed by the unique_categories vector, which equal the number we have computed and stored in num_categories. 
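The two commands used above can be tried on a small example. The following is a minimal sketch on a made-up label vector (the names `toy_labels`, `toy_categories` and `toy_num_categories` are hypothetical and are not part of the notebook's data).

```julia
# Toy label vector standing in for the `Color` column (hypothetical data).
toy_labels = ["Red", "White", "White", "Red", "White"]

# `unique` returns the distinct values, in order of first appearance.
toy_categories = unique(toy_labels)

# `length` counts how many distinct values there are.
toy_num_categories = length(toy_categories)

@show toy_categories
@show toy_num_categories
```

Note that `unique` returns the labels themselves, and it is `length` that turns them into a count.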

Each row of the `wine_data_matrix` contains the numeric values corresponding to the features. 

In [9]:
size(wine_data_matrix)

(5848, 11)

**Question**: How many rows are there?  What do the rows correspond to? What do the columns correspond to? 

There are 5848 rows, one for each wine sample. The columns correspond to the features of the wine. 

# Creating a number of features x number of samples data matrix

The `wine_data_matrix` is a $5848 \times 11$ matrix where the number of features equals 11 and the number of samples equals 5848. We would like to create a matrix of size number of features x number of samples so that we can feed it into the machine learning algorithm one column (or sample) at a time.

We do so via the  `X = transpose(wine_data_matrix)` command which creates an $11 \times 5848 $ matrix where every row of the `wine_data_matrix` is a column of `X`.


In [10]:
X = transpose(wine_data_matrix)

11×5848 LinearAlgebra.Transpose{Float64,Array{Float64,2}}:
  7.4       6.2        6.4       7.3      …    6.7       7.0      9.8   
  0.7       0.25       0.16      0.69          0.41      0.25     0.51  
  0.0       0.54       0.42      0.32          0.24      0.32     0.19  
  1.9       7.0        1.0       2.2           5.4       9.0      3.2   
  0.076     0.046      0.036     0.069         0.035     0.046    0.081 
 11.0      58.0       29.0      35.0      …   33.0      56.0      8.0   
 34.0     176.0      113.0     104.0         115.0     245.0     30.0   
  0.9978    0.99454    0.9908    0.99632       0.9901    0.9955   0.9984
  3.51      3.19       3.18      3.33          3.12      3.25     3.23  
  0.56      0.7        0.52      0.51          0.44      0.5      0.58  
  9.4      10.4       11.0       9.5      …   12.8933   10.4     10.5   

In [11]:
X[:,1] ## This is the command to inspect the first column of X 

11-element Array{Float64,1}:
  7.4   
  0.7   
  0.0   
  1.9   
  0.076 
 11.0   
 34.0   
  0.9978
  3.51  
  0.56  
  9.4   

In [12]:
wine_data_matrix[1,:] ## this is the command to inspect the first row of wine_data_matrix

11-element Array{Float64,1}:
  7.4   
  0.7   
  0.0   
  1.9   
  0.076 
 11.0   
 34.0   
  0.9978
  3.51  
  0.56  
  9.4   

**Question**: Are they equal? 

They are equal: both contain the same values, because transposing turns the rows of `wine_data_matrix` into the columns of `X`.
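The relationship between rows and columns under `transpose` can be checked on a small matrix. This is a minimal sketch with a made-up 3 x 2 matrix (`A`, `X_lazy` and `X_copy` are hypothetical names). It also shows `permutedims`, which is one way to obtain an ordinary `Matrix` instead of the lazy `Transpose` wrapper that `transpose` returns.

```julia
# Hypothetical matrix with 3 samples (rows) and 2 features (columns).
A = [1.0 2.0;
     3.0 4.0;
     5.0 6.0]

X_lazy = transpose(A)     # lazy wrapper around A, no data is copied
X_copy = permutedims(A)   # a new, materialized 2 x 3 Matrix

@show size(X_lazy)
@show X_lazy[:, 1] == A[1, :]   # first column of the transpose vs first row of A
@show X_copy == X_lazy
```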

The notation `X[:,10]` denotes the 10th column of `X`.

The notation `wine_data_matrix[10,:]` denotes the 10th row of `wine_data_matrix`.

**Question**: Are they equal if `X = transpose(wine_data_matrix)`? 

In [13]:
X[:,10] 

11-element Array{Float64,1}:
   6.4    
   0.29   
   0.18   
  15.0    
   0.04   
  21.0    
 116.0    
   0.99736
   3.14   
   0.5    
   9.2    

In [14]:
wine_data_matrix[10,:]

11-element Array{Float64,1}:
   6.4    
   0.29   
   0.18   
  15.0    
   0.04   
  21.0    
 116.0    
   0.99736
   3.14   
   0.5    
   9.2    

The command `size(X)` in `Julia` returns as its output the number of rows and the number of columns of the argument `X`.

In [15]:
size(X)

(11, 5848)

**Question**: What is the size of the `X` matrix?

Note that each column of `X` represents a vector of features (how many are there?) for one of the 5848 different training examples.

The size of the `X` matrix is (11, 5848). There are 11 different features.

# Designing a machine learning algorithm for supervised learning 

We will use the `Flux.jl` package for this.

In [16]:
using Flux
using Flux: onehotbatch, throttle, crossentropy

This code in the next line converts the labels into `onehot` encoded vectors.

In [17]:
Y = onehotbatch(wine_data[:,12], unique_categories) ## column 12 contains the labels

2×5848 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
  true  false  false   true   true  …  false  false  false  false   true
 false   true   true  false  false      true   true   true   true  false
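To see what a one-hot encoding contains, it can be built by hand with broadcasting. This is only a sketch on made-up labels (`toy_labels`, `toy_categories` and `toy_onehot` are hypothetical names); it produces a plain Boolean matrix and is not how `onehotbatch` is implemented.

```julia
toy_labels = ["Red", "White", "White", "Red"]
toy_categories = ["Red", "White"]

# Broadcasting a column of categories against a row of labels gives a
# num_categories x num_samples Boolean matrix: entry (i, j) is true
# exactly when sample j belongs to category i.
toy_onehot = toy_categories .== permutedims(toy_labels)

@show toy_onehot
@show sum(toy_onehot, dims=1)   # every column contains exactly one `true`
```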

## Training a logistic network or logistic regression model

The following lines of code train a softmax classifier (or equivalently perform logistic regression) which takes as its input the 11 features and tries to match as its output the onehot encoded vectors.

In [18]:
# Declare a model taking the features as inputs and outputting as many probabilities as there are categories,
# one for each wine color.
num_features = 11  ## How many features are there?
model = Chain(Dense(num_features, num_categories),softmax) ## this is called a logistic regression model

loss_fn = crossentropy
loss(x, y) = loss_fn(model(x), y)
opt = ADAM() 
evalcb = () -> @show([loss(X,Y)])



#3 (generic function with 1 method)
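The model above is an affine map followed by `softmax`, scored with a cross-entropy loss. The sketch below writes both out by hand for a single sample, assuming the usual definitions $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$ and $-\sum_i y_i \log p_i$. The weights, bias and input are made up, and `toy_softmax` and `toy_crossentropy` are hypothetical helper names, not Flux functions.

```julia
# Numerically stable softmax for a vector of scores.
function toy_softmax(z)
    e = exp.(z .- maximum(z))
    return e ./ sum(e)
end

# Cross-entropy between predicted probabilities p and a one-hot target y.
toy_crossentropy(p, y) = -sum(y .* log.(p))

# Hypothetical parameters for 3 features and 2 categories.
W = [ 0.5 -1.0  0.2;
     -0.5  1.0 -0.2]
b = [0.1, -0.1]
x = [1.0, 2.0, 3.0]
y = [false, true]          # one-hot target: the sample belongs to category 2

p = toy_softmax(W * x .+ b)
@show p
@show toy_crossentropy(p, y)
```

The loss is small when the probability assigned to the true category is close to 1, which is what training tries to achieve on average over all samples.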

In [19]:
using Flux:shuffle
using Base.Iterators: repeated, partition

epochs = 20000 ## run for as many epochs as needed to get a small enough training error
batch_size = 500 ## change batch size if you get a Loss is Infinite error  

for epoch_idx in 1:epochs
    dataset = [(X[:, i], Y[:, i]) for i in partition(shuffle(1:size(X, 2)), batch_size)]
    Flux.train!(loss, params(model),dataset, opt)
    if rem(epoch_idx,1000) == 0
        println("Training loss is $(Tracker.data(loss(X,Y)))")
    end
end

Training loss is 0.06494047
Training loss is 0.057232805
Training loss is 0.054624766
Training loss is 0.052585527
Training loss is 0.05121835
Training loss is 0.05005693
Training loss is 0.04919568
Training loss is 0.048414815
Training loss is 0.047771342
Training loss is 0.047297828
Training loss is 0.046927877
Training loss is 0.046805847
Training loss is 0.046655588
Training loss is 0.046237577
Training loss is 0.04613518
Training loss is 0.04597516
Training loss is 0.04597536
Training loss is 0.045829777
Training loss is 0.045702934
Training loss is 0.04566455
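The training loop builds its mini-batches by shuffling the sample indices and cutting them into chunks of `batch_size`. The following sketch isolates that step using only the standard library, with a made-up sample count of 10 and batch size of 4 (`num_samples`, `toy_batch_size` and `batches` are hypothetical names; the fixed seed is only there to make the sketch repeatable).

```julia
using Random
using Base.Iterators: partition

rng = MersenneTwister(0)   # fixed seed, only so the sketch is repeatable
num_samples = 10           # hypothetical; the notebook has 5848 samples
toy_batch_size = 4

# Shuffle the indices, then split them into consecutive chunks.
batches = collect(partition(shuffle(rng, 1:num_samples), toy_batch_size))

@show length.(batches)     # the last batch holds the remainder
@show sort(vcat(batches...)) == collect(1:num_samples)
```

Every index appears in exactly one batch, so one pass over `batches` visits each sample once, in a random order.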


In [20]:
println("Training loss is $(Tracker.data(loss(X,Y)))")

Training loss is 0.04566455


**Question**: How low does the training loss get? Write down your answer. Does it decrease if you run it for longer by running previous cell again? 

The training loss I get is 0.045661207. It decreases a little (from 0.045661207 to 0.04536579) if I run it for longer by running the previous cell again.

## Training a deep learning neural network

In [21]:
model2 = Chain(Dense(num_features, num_categories,σ),Dense(num_categories,num_categories),softmax) 
## this is a deep network because we are chaining or gluing multiple Dense layers together

loss_fn = crossentropy
loss2(x, y) = loss_fn(model2(x), y)
opt = ADAM()
evalcb = () -> @show([loss2(X,Y)])

#7 (generic function with 1 method)
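Chaining two `Dense` layers means that the output of the first layer, after its `σ` nonlinearity, becomes the input of the second. The sketch below spells out that forward pass by hand for one sample, with made-up parameters for 3 features, 2 hidden units and 2 categories (`sigmoid`, `toy_softmax`, `toy_model2` and all weights are hypothetical, not taken from the trained model).

```julia
sigmoid(z) = 1 / (1 + exp(-z))

function toy_softmax(z)
    e = exp.(z .- maximum(z))
    return e ./ sum(e)
end

# Hypothetical parameters: 3 features -> 2 hidden units -> 2 categories.
W1 = [ 0.4 -0.3 0.1;
      -0.2  0.5 0.3]
b1 = [0.0, 0.1]
W2 = [ 1.0 -1.0;
      -1.0  1.0]
b2 = [0.0, 0.0]

# First layer with sigmoid, second layer without activation, then softmax.
toy_model2(x) = toy_softmax(W2 * sigmoid.(W1 * x .+ b1) .+ b2)

x = [1.0, 2.0, 3.0]
hidden = sigmoid.(W1 * x .+ b1)
output = toy_model2(x)
@show hidden
@show output
```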

In [22]:
using Flux:shuffle
using Base.Iterators: repeated, partition

epochs = 20000
batch_size = 500

for epoch_idx in 1:epochs
    dataset = [(X[:, i], Y[:, i]) for i in partition(shuffle(1:size(X, 2)), batch_size)]
    Flux.train!(loss2, params(model2),dataset, opt)
    if rem(epoch_idx,500) == 0
        println("Training loss is $(Tracker.data(loss2(X,Y)))")
    end
end

Training loss is 0.058872506
Training loss is 0.049194474
Training loss is 0.0467646
Training loss is 0.044715617
Training loss is 0.043265454
Training loss is 0.041811097
Training loss is 0.040547933
Training loss is 0.039479718
Training loss is 0.0386773
Training loss is 0.03795004
Training loss is 0.037402045
Training loss is 0.036843285
Training loss is 0.036916636
Training loss is 0.036251795
Training loss is 0.03598608
Training loss is 0.035877734
Training loss is 0.036077056
Training loss is 0.036170866
Training loss is 0.036370024
Training loss is 0.035406068
Training loss is 0.03581641
Training loss is 0.035310097
Training loss is 0.035200402
Training loss is 0.03630146
Training loss is 0.036773186
Training loss is 0.03527789
Training loss is 0.035363942
Training loss is 0.03527546
Training loss is 0.03492837
Training loss is 0.03495665
Training loss is 0.03690005
Training loss is 0.03490976
Training loss is 0.035485573
Training loss is 0.034846827
Training loss is 0.034886755

In [23]:
println("Training loss is $(Tracker.data(loss2(X,Y)))")

Training loss is 0.034906816


**Question**: 

-- How low does the training loss get? 

-- Does it get lower than it does for the logistic regression case? 

-- Does it decrease if you run it for longer by running previous cell again? 

The training loss I get is 0.034416985.

It gets lower than it does for the logistic regression case.

It decreases a little (from 0.034416985 to 0.033921164) if I run it for longer by running the previous cell again.

# Creating a test dataset so we can test our algorithm

We now test our algorithms using test data that is independent of the data we trained our models on.

In [24]:
wine_test_data = CSV.read("winetest.csv") 

| Row | fixedacidity | volatileacidity | citricacid | residualsugar | chlorides | freesulfurdioxide |
|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 |
| 2 | 6.6 | 0.425 | 0.25 | 2.35 | 0.034 | 23.0 |
| 3 | 7.2 | 0.25 | 0.19 | 8.0 | 0.044 | 51.0 |
| 4 | 6.9 | 0.25 | 0.29 | 2.4 | 0.038 | 28.0 |
| 5 | 7.7 | 0.275 | 0.3 | 1.0 | 0.039 | 19.0 |
| 6 | 7.1 | 0.26 | 0.32 | 16.2 | 0.044 | 31.0 |
| 7 | 10.6 | 0.36 | 0.57 | 2.3 | 0.087 | 6.0 |
| 8 | 7.5 | 0.58 | 0.56 | 3.1 | 0.153 | 5.0 |
| 9 | 6.0 | 0.24 | 0.32 | 6.3 | 0.03 | 34.0 |
| 10 | 6.9 | 0.25 | 0.26 | 5.2 | 0.024 | 36.0 |


In [25]:
Xtest = transpose(convert(Matrix,wine_test_data[:,1:11])) 
# We are doing this because the first 11 columns contain the numeric data. 
Ytest = onehotbatch(wine_test_data[:,12], unique_categories) ## the 12th column contains the labels

2×651 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
  true  false  false  false  false  …  false  false  false  false  false
 false   true   true   true   true      true   true   true   true   true

## Using the trained algorithm to classify a new label

We now test the predictions made by our algorithm(s).

In [26]:
model(Xtest[:,1])

Tracked 2-element Array{Float32,1}:
 0.99987686f0   
 0.00012311297f0

**Question**: 

Which element of `model(Xtest[:,1])` is the largest -- the first or second element?  

Does this suggest that the first test vector is "red" or "white" wine? 

Why does the command `unique_categories[onecold(model(Xtest[:,1]))]` return the predicted category? 

The first element of `model(Xtest[:,1])` is the largest.

This suggests that the first test vector is "Red" wine.

`onecold(model(Xtest[:,1]))` returns the index of the largest element, which is 1 here, and `unique_categories[1]` is "Red".

The predicted category by the logistic network is given by the command below.

In [27]:
using Flux:onecold
unique_categories[onecold(model(Xtest[:,1]))]

"Red"

The array `Ytest` contains the ground truth labels for the test vectors. The correct answer is given by the element in each column of `Ytest` that equals `true`.

In [28]:
Ytest[:,1]

2-element Flux.OneHotVector:
  true
 false

The `onecold` function in the `Flux` package returns the index of the largest element as illustrated next.

In [29]:
onecold(Ytest[:,1])

1

The `onecold` function returns 1 here because the first element of the `Ytest[:,1]` vector is the largest (since `true` corresponds to `1` and `false` corresponds to `0`). So we can check to see whether the model prediction is correct using the command in the next cell. 

In [30]:
@show onecold(model(Xtest[:,1])) == onecold(Ytest[:,1]);

onecold(model(Xtest[:, 1])) == onecold(Ytest[:, 1]) = true
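For a single vector, picking the index of the largest element is what Base's `argmax` does, so the prediction step can be sketched without Flux. The probabilities below are copied from the `model(Xtest[:,1])` output above; `toy_categories`, `probs` and `truth` are hypothetical names.

```julia
toy_categories = ["Red", "White"]

# Values copied from the `model(Xtest[:,1])` output shown earlier.
probs = [0.99987686, 0.00012311297]
truth = [true, false]              # one-hot ground truth for the same sample

predicted_index = argmax(probs)    # index of the largest probability
true_index = argmax(truth)         # index of the `true` entry

@show toy_categories[predicted_index]
@show predicted_index == true_index
```

A correctness check should compare two indices, as here. Comparing an index with a single Boolean entry only agrees by coincidence for category 1, because in Julia `1 == true` holds while `2 == true` does not.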


We now repeat the computation for the deep network.

In [31]:
model2(Xtest[:,1])

Tracked 2-element Array{Float32,1}:
 0.9999999f0 
 1.0837769f-7

**Question**:

How does the output of the command `model2(Xtest[:,1])` help predict which category of wine we are predicting based on the input features? Why?

The command `model2(Xtest[:,1])` passes the feature vector `Xtest[:,1]` through the weights learned during training and returns one score per category. This is effectively a pattern-matching process: the features of `Xtest[:,1]` match the "Red" pattern most closely, so the model returns the highest score for category 1. 

We make the prediction algorithmically using the command below. 

In [32]:
unique_categories[onecold(model2(Xtest[:,1]))]

"Red"

Finally, we check to see if our prediction matches the ground truth label.

In [33]:
onecold(model2(Xtest[:,1])) == onecold(Ytest[:,1])

true

Let us try it with another example.

**Question**:

Did both predict the correct label? 

Are the outputs the same? Does having the same prediction imply that the outputs are the same? Why or why not?

Both predict the correct label.

The outputs of `model(Xtest[:,1])` and `model2(Xtest[:,1])` are not the same, but both imply that `Xtest[:,1]` is "Red". Having the same prediction therefore does not imply that the outputs are the same: the prediction depends only on which element is largest, not on its exact value.

We now repeat the computation with another example.

In [34]:
@show model(Xtest[:,10]);
@show model2(Xtest[:,10]);


model(Xtest[:, 10]) = Float32[0.001614, 0.998386] (tracked)
model2(Xtest[:, 10]) = Float32[0.00198709, 0.998013] (tracked)


**Question**:

What are the predictions made by `model` and `model2`?

Both the predictions made by model and model2 are "White".

In [35]:
Ytest[:,10]

2-element Flux.OneHotVector:
 false
  true

**Question**: 

Do both models return an output that matches the true label? 

Which one returns higher "confidence" values? 

Are the confidence levels about the same or dramatically different?

Both models return an output that matches the true label.

`model(Xtest[:,10])` returns the higher "confidence" value.

The confidence levels are about the same.

## Determining the accuracy of the algorithm on the *entire* dataset

In the previous section we tested single examples one at a time. We would now like to determine how accurate the algorithm is in a systematic way. To do this, we use the test data set (which contains data that we did not train our algorithm on) and compare the predictions our model makes from the 11 features of each wine with the known test labels of whether it is red wine or white wine. We will report the accuracy as the proportion of wines that are correctly classified. A proportion of 1.0 corresponds to 100% accuracy. A proportion of 0.9 corresponds to 90% accuracy and so on.

To that end we load two functions from the `Flux` and `Statistics` packages next.

In [36]:
using Flux:onecold
using Statistics:mean

We now compute the accuracy. The `mean` function computes the average. The `.==` operator compares, element by element, the output of `onecold(model(Xtest))` (the model's predictions) with the correct test labels in the `onecold(Ytest)` vector. 

In [37]:
@show model_accuracy = mean(onecold(model(Xtest)) .== onecold(Ytest))

model_accuracy = mean(onecold(model(Xtest)) .== onecold(Ytest)) = 0.9861751152073732


0.9861751152073732

In [38]:
println("Logistic network test accuracy is $(model_accuracy)")

Logistic network test accuracy is 0.9861751152073732
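The accuracy computation can be reproduced on a small example without Flux. This is a sketch on a made-up matrix of class probabilities with one column per sample (`P`, `true_classes`, `predicted_classes` and `accuracy` are hypothetical names), taking `argmax` over each column in place of `onecold`.

```julia
using Statistics: mean

# Hypothetical 2 x 4 matrix of class probabilities, one column per sample.
P = [0.9 0.2 0.6 0.1;
     0.1 0.8 0.4 0.9]
true_classes = [1, 2, 2, 2]        # hypothetical ground-truth indices

# Predicted class of each sample = row index of the largest entry in its column.
predicted_classes = [argmax(col) for col in eachcol(P)]

# `.==` gives a Boolean per sample; `mean` turns that into a proportion.
accuracy = mean(predicted_classes .== true_classes)

@show predicted_classes
@show accuracy
```

Here three of the four samples are classified correctly, so the accuracy is 0.75.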


We now evaluate the test accuracy of the deep network.

In [39]:
@show model2_accuracy = mean(onecold(model2(Xtest)) .== onecold(Ytest)); 
## if it does not pass cell below train the deep network for longer

model2_accuracy = mean(onecold(model2(Xtest)) .== onecold(Ytest)) = 0.9907834101382489


In [40]:
println("Deep network test accuracy is $(model2_accuracy)")

Deep network test accuracy is 0.9907834101382489


**Exercise**:

Keep training the deep network till it attains a higher accuracy than the logistic network.

In [41]:
if model2_accuracy > model_accuracy
    println("Deep network has a higher test accuracy than logistic network")
end

Deep network has a higher test accuracy than logistic network


# Examining the output of the model for samples that incorrectly classify

The algorithm works well but not perfectly. Let us determine which samples it misidentifies using the code below.

Tip: The `.!=` is a way of checking which samples do not match.

In [42]:
model_wrong_samples = findall(onecold(model(Xtest)) .!= onecold(Ytest))

9-element Array{Int64,1}:
  26
  45
  95
 264
 275
 322
 324
 502
 549

We now examine which samples the deep model gets wrong.

In [43]:
model2_wrong_samples = findall(onecold(model2(Xtest)) .!= onecold(Ytest))

6-element Array{Int64,1}:
  26
 275
 322
 324
 502
 565
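As a consistency check, the two lists of misclassified samples should agree with the accuracies reported earlier. The sketch below redoes that arithmetic from the counts visible in the outputs above: 651 test samples (the number of columns of `Ytest`), 9 errors for `model` and 6 for `model2`. The variable names are hypothetical.

```julia
num_test = 651            # number of columns of Ytest reported above
wrong_logistic = 9        # length of model_wrong_samples
wrong_deep = 6            # length of model2_wrong_samples

# Accuracy = proportion of test samples that are NOT misclassified.
acc_logistic = (num_test - wrong_logistic) / num_test
acc_deep = (num_test - wrong_deep) / num_test

@show acc_logistic
@show acc_deep
```

Both values agree with `model_accuracy` and `model2_accuracy` from the previous section.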

We notice that both models misidentify a common set of samples. We would like to see which ones `model2` correctly classifies for which `model` fails and vice versa. The command `setdiff` does this below.

In [44]:
?setdiff

search: setdiff setdiff! selectdim setrounding searchsortedfirst



```
setdiff(s, itrs...)
```

Construct the set of elements in `s` but not in any of the iterables in `itrs`. Maintain order with arrays.

# Examples

```jldoctest
julia> setdiff([1,2,3], [3,4,5])
2-element Array{Int64,1}:
 1
 2
```

---

```
setdiff(ss1, ss2)
```

The two arguments are sorted sets with the same key and order type. This operation computes the difference, i.e., a sorted set containing entries that in are in `ss1` but not `ss2`. Time: O(*cn* log *n*), where *n* is the total size of the two containers.


The `samples_model_gets_wrong` variable stores which samples `model` gets wrong that `model2` does not.

In [45]:
samples_model_gets_wrong = setdiff(model_wrong_samples,model2_wrong_samples)


4-element Array{Int64,1}:
  45
  95
 264
 549
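The same comparison can be run directly on the two index lists printed above, which also makes it easy to see the samples that both models get wrong. This is a small sketch using only Base functions; the lists are copied from the outputs above and the variable names are hypothetical.

```julia
# Index lists copied from the two `findall` outputs above.
logistic_wrong = [26, 45, 95, 264, 275, 322, 324, 502, 549]
deep_wrong = [26, 275, 322, 324, 502, 565]

only_logistic_wrong = setdiff(logistic_wrong, deep_wrong)   # wrong for model only
only_deep_wrong = setdiff(deep_wrong, logistic_wrong)       # wrong for model2 only
both_wrong = intersect(logistic_wrong, deep_wrong)          # wrong for both

@show only_logistic_wrong
@show only_deep_wrong
@show both_wrong
```

Note that `setdiff` is not symmetric: swapping its arguments answers the opposite question.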

We now examine the predictions (class 1 or 2) and the output of the model (the numerical values before we decide what the largest element is).

In [46]:
@show onecold(model(Xtest[:,samples_model_gets_wrong]))
model(Xtest[:,samples_model_gets_wrong])

onecold(model(Xtest[:, samples_model_gets_wrong])) = [1, 1, 2, 2]


Tracked 2×4 Array{Float32,2}:
 0.598059  0.835461  0.485785  0.499444
 0.401941  0.164539  0.514215  0.500556

We now compare it to the prediction of `model2` and the output of `model2`. These are samples that `model2` correctly classifies.

In [47]:
@show onecold(model2(Xtest[:,samples_model_gets_wrong]))
model2(Xtest[:,samples_model_gets_wrong])

onecold(model2(Xtest[:, samples_model_gets_wrong])) = [2, 2, 1, 1]


Tracked 2×4 Array{Float32,2}:
 0.455251  0.144984  0.554659  0.601809
 0.544749  0.855016  0.445341  0.398191

**Question**: 

Does `model2` correctly predict the categories for these samples in a way that `model` does not?

Is it significantly more confident about these predictions? 

`model2` correctly predicts the categories for these samples, which `model` does not. It is somewhat more confident about these predictions, but not significantly so. 
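One way to put a number on "confidence" is to take the largest probability in each column and average it over the samples. The sketch below does this for the two output matrices printed above; treating the largest probability as the confidence is an assumption of this sketch, and the variable names are hypothetical.

```julia
using Statistics: mean

# Outputs copied from the two cells above (columns are samples 45, 95, 264, 549).
model_out = [0.598059 0.835461 0.485785 0.499444;
             0.401941 0.164539 0.514215 0.500556]
model2_out = [0.455251 0.144984 0.554659 0.601809;
              0.544749 0.855016 0.445341 0.398191]

# Confidence of a prediction = largest probability in its column.
model_conf = vec(maximum(model_out, dims=1))
model2_conf = vec(maximum(model2_out, dims=1))

@show model_conf
@show model2_conf
@show mean(model_conf)
@show mean(model2_conf)
```

On these four samples the average confidence of `model2` is only slightly higher than that of `model`, even though `model2` is right and `model` is wrong.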

We now repeat the same analysis for samples that `model2` gets wrong that `model` gets right, if there are any such samples.


In [48]:
samples_model2_gets_wrong = setdiff(model2_wrong_samples,model_wrong_samples)
if !isempty(samples_model2_gets_wrong)
    @show onecold(model2(Xtest[:,samples_model2_gets_wrong]))
    @show model2(Xtest[:,samples_model2_gets_wrong])
    @show onecold(model(Xtest[:,samples_model2_gets_wrong]))
    @show model(Xtest[:,samples_model2_gets_wrong])
end

onecold(model2(Xtest[:, samples_model2_gets_wrong])) = [2]
model2(Xtest[:, samples_model2_gets_wrong]) = Float32[0.258891; 0.741109] (tracked)
onecold(model(Xtest[:, samples_model2_gets_wrong])) = [1]
model(Xtest[:, samples_model2_gets_wrong]) = Float32[0.745761; 0.254239] (tracked)


Tracked 2×1 Array{Float32,2}:
 0.7457609f0 
 0.25423908f0

**Question**: 

Does `model` correctly predict the categories for the samples in `samples_model2_gets_wrong` in a way that `model2` does not?

Is it significantly more confident about these predictions? 

`model` correctly predicts the categories for the samples in `samples_model2_gets_wrong`, which `model2` does not. I do not think it is significantly more confident about these predictions.