# COMP541 | Deep Learning Lab 1

## First lab session practice!

### 15 June 2020

* In this exercise, you will preprocess the **Boston Housing Dataset** so that it can be used in machine learning models such as linear regression later.

* The dataset contains housing-related information for 506 neighborhoods in Boston from 1978. Each neighborhood is described by **13 attributes**, such as crime rate or distance to employment centers. The goal is to predict the median house value, given in units of $1000.


### EXERCISE 0

To use the functions we need, we first have to import some modules. Insert the following line as the first line or cell:
* *using DelimitedFiles, Statistics, Random*

**Statistics** provides statistical functions such as *mean* and *std*, **DelimitedFiles** provides the data-reading function *readdlm*, and **Random** provides random number utilities (*rand*, *Random.seed!*, etc.).


In [62]:
using DelimitedFiles, Statistics, Random

### EXERCISE 1

* First download the file, then read it.
* Download the data from within the Julia notebook (see Julia's *download* and *readdlm* functions; you can view their documentation by typing e.g. `@doc download`).
* If you look at the data, you will see 506 lines in total, each holding 14 whitespace-separated values: the 13 attributes plus the median price. The dataset URL is the one passed to *download* in the cell below.
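
The reading step can be tried without network access. The sketch below is only an illustration: it assumes a tiny hypothetical two-line sample (the names `sample` and `mini` are made up here) in place of the real file, and relies on `readdlm` accepting an IO stream and splitting on whitespace by default.

```julia
using DelimitedFiles

# Hypothetical sample: two lines, three whitespace-separated numbers each
# (the real file has 506 lines with 14 values per line).
sample = "0.1 18.0 2.3\n0.2  0.0 7.1\n"

# With no delimiter given, readdlm splits on runs of whitespace;
# purely numeric input is parsed into a Float64 matrix.
mini = readdlm(IOBuffer(sample))

println(size(mini))   # rows × columns of the parsed matrix
```

Passing the file name returned by *download* instead of the `IOBuffer` reads the real dataset in the same way.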

In [63]:
@doc download

```
download(url::AbstractString, [localfile::AbstractString])
```

Download a file from the given url, optionally renaming it to the given local file name. If no filename is given this will download into a randomly-named file in your temp directory. Note that this function relies on the availability of external tools such as `curl`, `wget` or `fetch` to download the file and is provided for convenience. For production use or situations in which more options are needed, please use a package that provides the desired functionality instead.

Returns the filename of the downloaded file.


In [64]:
file = download("https://raw.githubusercontent.com/ilkerkesen/ufldl-tutorial/master/ex1/housing.data") # download the file

"C:\\Users\\FATIHB~1\\AppData\\Local\\Temp\\jl_AE2F.tmp"

In [65]:
# now we need to read the data
data = readdlm(file)

506×14 Array{Float64,2}:
 0.00632  18.0   2.31  0.0  0.538  6.575  …  296.0  15.3  396.9    4.98  24.0
 0.02731   0.0   7.07  0.0  0.469  6.421     242.0  17.8  396.9    9.14  21.6
 0.02729   0.0   7.07  0.0  0.469  7.185     242.0  17.8  392.83   4.03  34.7
 0.03237   0.0   2.18  0.0  0.458  6.998     222.0  18.7  394.63   2.94  33.4
 0.06905   0.0   2.18  0.0  0.458  7.147     222.0  18.7  396.9    5.33  36.2
 0.02985   0.0   2.18  0.0  0.458  6.43   …  222.0  18.7  394.12   5.21  28.7
 0.08829  12.5   7.87  0.0  0.524  6.012     311.0  15.2  395.6   12.43  22.9
 0.14455  12.5   7.87  0.0  0.524  6.172     311.0  15.2  396.9   19.15  27.1
 0.21124  12.5   7.87  0.0  0.524  5.631     311.0  15.2  386.63  29.93  16.5
 0.17004  12.5   7.87  0.0  0.524  6.004     311.0  15.2  386.71  17.1   18.9
 0.22489  12.5   7.87  0.0  0.524  6.377  …  311.0  15.2  392.52  20.45  15.0
 0.11747  12.5   7.87  0.0  0.524  6.009     311.0  15.2  396.9   13.27  18.9
 0.09378  12.5   7.87  0.0  0.524  5.88

**The result is a matrix of size 506×14.**

### EXERCISE 2

* The resulting data matrix should have 506 rows representing neighborhoods and 14 columns: the 13 attributes plus the target.
* **The last column is the median house price to be predicted, so let’s separate it.**
* Also, transpose the data matrix so that it matches common mathematical notation (deep learning people mostly represent instances/samples as column vectors).

We will use Julia’s array indexing to split the data array into input x and output y. (Hint: you may want to reshape the y array into a matrix of size 1×506; use the *reshape* function for this.)
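
As a rough illustration of the split, here is a minimal sketch on a hypothetical 4×3 toy matrix (two attributes plus a target column). It uses `permutedims` rather than `'`, which is a choice made for this sketch: `permutedims` returns a plain `Array` copy, whereas `'` returns a lazy `Adjoint` wrapper.

```julia
# Hypothetical toy data: 4 samples (rows), 2 attributes + 1 target (columns).
toy = [1.0 2.0 10.0;
       3.0 4.0 20.0;
       5.0 6.0 30.0;
       7.0 8.0 40.0]

x = permutedims(toy[:, 1:end-1])   # 2×4: one column per sample
y = reshape(toy[:, end], 1, :)     # 1×4: the `:` lets reshape infer the length

println(size(x), " ", size(y))
```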


In [66]:
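target = data[:,end]        # last column: median house price (506-element vector)
input = data[:,1:end-1]'    # first 13 columns, transposed to 13×506 (lazy Adjoint wrapper)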

13×506 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
   0.00632    0.02731    0.02729  …    0.06076    0.10959    0.04741
  18.0        0.0        0.0           0.0        0.0        0.0    
   2.31       7.07       7.07         11.93      11.93      11.93   
   0.0        0.0        0.0           0.0        0.0        0.0    
   0.538      0.469      0.469         0.573      0.573      0.573  
   6.575      6.421      7.185    …    6.976      6.794      6.03   
  65.2       78.9       61.1          91.0       89.3       80.8    
   4.09       4.9671     4.9671        2.1675     2.3889     2.505  
   1.0        2.0        2.0           1.0        1.0        1.0    
 296.0      242.0      242.0         273.0      273.0      273.0    
  15.3       17.8       17.8      …   21.0       21.0       21.0    
 396.9      396.9      392.83        396.9      393.45     396.9    
   4.98       9.14       4.03          5.64       6.48       7.88   

In [67]:
@doc reshape

```
reshape(A, dims...) -> AbstractArray
reshape(A, dims) -> AbstractArray
```

Return an array with the same data as `A`, but with different dimension sizes or number of dimensions. The two arrays share the same underlying data, so that the result is mutable if and only if `A` is mutable, and setting elements of one alters the values of the other.

The new dimensions may be specified either as a list of arguments or as a shape tuple. At most one dimension may be specified with a `:`, in which case its length is computed such that its product with all the specified dimensions is equal to the length of the original array `A`. The total number of elements must not change.

# Examples

```jldoctest
julia> A = Vector(1:16)
16-element Array{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16

julia> reshape(A, (4, 4))
4×4 Array{Int64,2}:
 1  5   9  13
 2  6  10  14
 3  7  11  15
 4  8  12  16

julia> reshape(A, 2, :)
2×8 Array{Int64,2}:
 1  3  5  7   9  11  13  15
 2  4  6  8  10  12  14  16

julia> reshape(1:6, 2, 3)
2×3 reshape(::UnitRange{Int64}, 2, 3) with eltype Int64:
 1  3  5
 2  4  6
```


In [68]:
target = reshape(target,(1,506))

1×506 Array{Float64,2}:
 24.0  21.6  34.7  33.4  36.2  28.7  …  16.8  22.4  20.6  23.9  22.0  11.9

### EXERCISE 3

As you can see, the input attributes have different ranges.
* We need to normalize each attribute by **subtracting its mean and then dividing by its standard deviation** (hint: take the mean and standard deviation of each row vector, since each row holds one attribute).
* The ***mean*** and ***std*** functions compute the mean and standard deviation of their argument. Compute both for every attribute, then normalize the input data.
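
A minimal sketch of per-attribute normalization, assuming a hypothetical 2×4 toy matrix whose rows are attributes and whose columns are samples. The `dims=2` keyword makes *mean* and *std* reduce along each row, and broadcasting then applies the row statistics to every column.

```julia
using Statistics

# Hypothetical toy input: 2 attributes (rows) × 4 samples (columns).
x = [ 1.0  2.0  3.0  4.0;
     10.0 20.0 30.0 40.0]

mu    = mean(x, dims=2)   # 2×1: one mean per attribute
sigma = std(x, dims=2)    # 2×1: one standard deviation per attribute

xnorm = (x .- mu) ./ sigma   # broadcasting subtracts/divides row by row

println(mean(xnorm, dims=2))   # each row mean is (numerically) zero
println(std(xnorm, dims=2))    # each row std is one
```

In this toy example the second attribute is ten times the first, so both rows normalize to the same values, which shows that the differing ranges have been removed.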


In [69]:
input

13×506 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
   0.00632    0.02731    0.02729  …    0.06076    0.10959    0.04741
  18.0        0.0        0.0           0.0        0.0        0.0    
   2.31       7.07       7.07         11.93      11.93      11.93   
   0.0        0.0        0.0           0.0        0.0        0.0    
   0.538      0.469      0.469         0.573      0.573      0.573  
   6.575      6.421      7.185    …    6.976      6.794      6.03   
  65.2       78.9       61.1          91.0       89.3       80.8    
   4.09       4.9671     4.9671        2.1675     2.3889     2.505  
   1.0        2.0        2.0           1.0        1.0        1.0    
 296.0      242.0      242.0         273.0      273.0      273.0    
  15.3       17.8       17.8      …   21.0       21.0       21.0    
 396.9      396.9      392.83        396.9      393.45     396.9    
   4.98       9.14       4.03          5.64       6.48       7.88   

In [70]:
input[1:1,:]

1×506 Array{Float64,2}:
 0.00632  0.02731  0.02729  0.03237  …  0.04527  0.06076  0.10959  0.04741

In [71]:
std(input[1,:]),mean(input[1,:])

(8.60154510533249, 3.6135235573122535)

In [72]:
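# NOTE: this loop iterates over the 506 columns, so each neighborhood (column) is
# normalized across its 13 attributes. The exercise asks for per-attribute statistics,
# i.e. the mean and std of each of the 13 rows.
for column_index in 1:506
    column = input[:,column_index]
    s = std(column)
    m = mean(column)

    input[:,column_index] = (column.-m)./s
end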

In [73]:
input

13×506 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
 -0.483768   -0.483736  -0.470793   …  -0.494178  -0.494467  -0.489905
 -0.344203   -0.48396   -0.471019      -0.49466   -0.495343  -0.490282
 -0.4659     -0.425868  -0.412476      -0.399977  -0.400032  -0.395432
 -0.483817   -0.48396   -0.471019      -0.49466   -0.495343  -0.490282
 -0.479644   -0.480107  -0.467135      -0.490112  -0.490765  -0.485726
 -0.432819   -0.431201  -0.411523   …  -0.439294  -0.441065  -0.44234 
  0.0218972   0.164338   0.0349192      0.227567   0.218088   0.152121
 -0.452094   -0.443147  -0.429889      -0.477457  -0.476258  -0.470366
 -0.476061   -0.467527  -0.454458      -0.486723  -0.487354  -0.482331
  1.81206     1.50448    1.53286        1.67202    1.68569    1.68021 
 -0.365145   -0.337703  -0.323626   …  -0.327992  -0.327571  -0.323321
  2.59468     2.77725    2.78181        2.65536    2.64799    2.66528 
 -0.44519    -0.40886   -0.437648      -0.449898  -0.443573  -0.427632

### EXERCISE 4

It is necessary to split our dataset into training and test subsets so that we can estimate how well our model will perform on unseen data.

There are 506 neighborhoods in our dataset. Let’s pick **400** of them at random and use them as training data. The rest will be test data.

**In the end, you will have 4 different arrays: xtrn, ytrn, xtst and ytst.**
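
One way to do such a split is sketched below on hypothetical toy arrays with 10 samples, keeping 8 for training. The sizes and the helper names `idx`, `trn` and `tst` are assumptions for illustration; the real exercise uses 506 and 400.

```julia
using Random

Random.seed!(1)          # fix the seed so the shuffle is repeatable
n, ntrn = 10, 8          # toy sizes (the exercise uses 506 and 400)

x = reshape(collect(1.0:30.0), 3, n)   # 3 attributes × 10 samples
y = reshape(collect(1.0:10.0), 1, n)   # 1 × 10 targets

idx = randperm(n)                        # random ordering of 1:n
trn, tst = idx[1:ntrn], idx[ntrn+1:end]  # disjoint index sets

# Index inputs and targets with the same indices so pairs stay aligned.
xtrn, ytrn = x[:, trn], y[:, trn]
xtst, ytst = x[:, tst], y[:, tst]

println(size(xtrn), " ", size(xtst))
```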

In [93]:
@doc Random.seed!

```
seed!([rng=GLOBAL_RNG], seed) -> rng
seed!([rng=GLOBAL_RNG]) -> rng
```

Reseed the random number generator: `rng` will give a reproducible sequence of numbers if and only if a `seed` is provided. Some RNGs don't accept a seed, like `RandomDevice`. After the call to `seed!`, `rng` is equivalent to a newly created object initialized with the same seed.

If `rng` is not specified, it defaults to seeding the state of the shared thread-local generator.

# Examples

```julia-repl
julia> Random.seed!(1234);

julia> x1 = rand(2)
2-element Array{Float64,1}:
 0.590845
 0.766797

julia> Random.seed!(1234);

julia> x2 = rand(2)
2-element Array{Float64,1}:
 0.590845
 0.766797

julia> x1 == x2
true

julia> rng = MersenneTwister(1234); rand(rng, 2) == x1
true

julia> MersenneTwister(1) == Random.seed!(rng, 1)
true

julia> rand(Random.seed!(rng), Bool) # not reproducible
true

julia> rand(Random.seed!(rng), Bool)
false

julia> rand(MersenneTwister(), Bool) # not reproducible either
true
```


In [100]:
Random.seed!(1)

MersenneTwister(UInt32[0x00000001], Random.DSFMT.DSFMT_state(Int32[1749029653, 1072851681, 1610647787, 1072862326, 1841712345, 1073426746, -198061126, 1073322060, -156153802, 1073567984  …  1977574422, 1073209915, 278919868, 1072835605, 1290372147, 18858467, 1815133874, -1716870370, 382, 0]), …

In [101]:
@doc randperm

```
randperm([rng=GLOBAL_RNG,] n::Integer)
```

Construct a random permutation of length `n`. The optional `rng` argument specifies a random number generator (see [Random Numbers](@ref)). The element type of the result is the same as the type of `n`.

To randomly permute an arbitrary vector, see [`shuffle`](@ref) or [`shuffle!`](@ref).

!!! compat "Julia 1.1"
    In Julia 1.1 `randperm` returns a vector `v` with `eltype(v) == typeof(n)` while in Julia 1.0 `eltype(v) == Int`.


# Examples

```jldoctest
julia> randperm(MersenneTwister(1234), 4)
4-element Array{Int64,1}:
 2
 1
 4
 3
```


In [108]:
Random.seed!(1)
indexes = randperm(506)

trn, tst = indexes[1:400], indexes[401:end]   # first 400 shuffled indices for training, the rest for testing
xtrn, ytrn = input[:,trn], target[:,trn]
xtst, ytst = input[:,tst], target[:,tst]

xtrn

13×400 Array{Float64,2}:
 -0.480773  -0.45548   -0.482986  …  -0.485203   -0.499224  -0.48524  
 -0.481495  -0.477348  -0.212065     -0.0786389  -0.500116  -0.485902 
 -0.460114  -0.395201  -0.4364       -0.474229   -0.440443  -0.420256 
 -0.481495  -0.477348  -0.483375     -0.485695   -0.500116  -0.485902 
 -0.477253  -0.472834  -0.47998      -0.482906   -0.496491  -0.482315 
 -0.432789  -0.436084  -0.439144  …  -0.445776   -0.454985  -0.441864 
  0.298975  -0.163695  -0.263227     -0.360186    0.176787   0.0492357
 -0.455526  -0.456443  -0.431898     -0.413033   -0.483156  -0.468343 
 -0.455421  -0.444157  -0.475623     -0.458558   -0.46526   -0.446553 
  1.19591    2.04515    1.87314       2.30264     2.17682    2.34718  
 -0.326791  -0.324671  -0.352371  …  -0.361543   -0.354418  -0.369168 
  2.91677    2.43058    2.57095       2.17598     2.25518    2.1084   
 -0.359992  -0.372465  -0.387021     -0.432845   -0.41458   -0.419272 

In [109]:
ytrn

1×400 Array{Float64,2}:
 26.4  16.1  17.1  19.0  21.7  17.4  …  23.9  18.3  8.8  18.6  19.8  22.8

In [110]:
xtst

13×106 Array{Float64,2}:
 -0.500386  -0.362399  -0.491272  …  -0.441134   -0.485236   -0.479284
 -0.501035  -0.441211  -0.236055     -0.469329   -0.486341   -0.166252
 -0.268878  -0.342412  -0.445776     -0.379669   -0.377197   -0.456165
 -0.501035  -0.441211  -0.491538     -0.469329   -0.486341   -0.480162
 -0.495776  -0.437952  -0.488284     -0.466441   -0.482915   -0.477113
 -0.446693  -0.403809  -0.442019  …  -0.439043   -0.440943   -0.434429
  0.260149   0.104638  -0.187965     -0.0740346  -0.0315732  -0.277167
 -0.481146  -0.433211  -0.450272     -0.451764   -0.436787   -0.448305
 -0.482933  -0.310207  -0.438938     -0.350443   -0.454978   -0.445283
  1.20054    3.19414    1.98063       2.82974     1.77966     2.2962  
 -0.328162  -0.330949  -0.370559  …  -0.369267   -0.360888   -0.37413 
  2.91723    0.537824   2.4822        1.47583     2.62568     2.19044 
 -0.371878  -0.333242  -0.420153     -0.395124   -0.362143   -0.448352

In [111]:
ytst

1×106 Array{Float64,2}:
 20.3  27.5  22.0  30.7  19.4  24.5  …  24.8  42.3  16.3  19.1  20.3  29.8

### EXERCISE 5

**Our data is ready to be used. This week, we will not train a model, but let’s look at how well a randomly initialized linear regression model performs on our processed data.**

* Basically, we need weights by which we multiply the attributes of each neighborhood in order to predict its house price. Neighborhoods are represented with 13 attributes and we need to predict the price, which is a single number. We therefore need a **weight matrix of size 1×13.** We also use a **bias value, which is 0.**

* To create the weight matrix, we will sample from a normal distribution with zero mean and a small standard deviation.
* In this tutorial, the standard deviation is **0.1**. Use the *randn* function to create a random weight matrix whose values are sampled from the standard normal distribution (mean=0, standard deviation=1).
* Multiply the weight matrix by 0.1, which is our desired standard deviation. We will not use a bias term in this tutorial.
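
The steps above can be sketched in one line. This is one possible formulation, assuming `randn(1, 13)` is used to draw the matrix directly in its final shape; scaling standard normal samples by 0.1 gives samples with standard deviation 0.1.

```julia
using Random

Random.seed!(1)            # optional: makes the draw reproducible
w = 0.1 .* randn(1, 13)    # 1×13 matrix with entries drawn from Normal(0, 0.1^2)

println(size(w))
```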

In [112]:
@doc randn

```
randn([rng=GLOBAL_RNG], [T=Float64], [dims...])
```

Generate a normally-distributed random number of type `T` with mean 0 and standard deviation 1. Optionally generate an array of normally-distributed random numbers. The `Base` module currently provides an implementation for the types [`Float16`](@ref), [`Float32`](@ref), and [`Float64`](@ref) (the default), and their [`Complex`](@ref) counterparts. When the type argument is complex, the values are drawn from the circularly symmetric complex normal distribution of variance 1 (corresponding to real and imaginary part having independent normal distribution with mean zero and variance `1/2`).

# Examples

```jldoctest
julia> using Random

julia> rng = MersenneTwister(1234);

julia> randn(rng, ComplexF64)
0.6133070881429037 - 0.6376291670853887im

julia> randn(rng, ComplexF32, (2, 3))
2×3 Array{Complex{Float32},2}:
 -0.349649-0.638457im  0.376756-0.192146im  -0.396334-0.0136413im
  0.611224+1.56403im   0.355204-0.365563im  0.0905552+1.31012im
```


In [183]:
w = reshape(randn(13).*(0.1),(1,13))

1×13 Array{Float64,2}:
 -0.0197354  0.0737622  0.0479149  …  0.0148911  -0.0221678  0.0131052

### EXERCISE 6

**Now, we have input and weights.** 
**Let’s write a function to predict price.** 

* Implement a function that takes the weight matrix and the neighborhood attributes as input and outputs the house price prediction, a single value per neighborhood.


* **Simply perform a matrix multiplication inside this function and return the output vector. We call this the predict function.**


* Call the predict function and store the output as ypred.


* ypred is a 1×400 array/matrix. Each value in this array is the model’s price prediction for an average house in the corresponding neighborhood.


In [184]:
# xtrn is 13×400 and the weights are 1×13, so w*x is well-defined and gives a 1×400 result
predict(w, x) = w * x

ypred = predict(w, xtrn)

1×400 Array{Float64,2}:
 -0.0650817  -0.057373  -0.0310276  …  -0.0231113  -0.0740885  -0.0724072

### EXERCISE 7

* **Let’s implement a loss function called Mean Squared Error (MSE).**

* In this function we calculate J, our loss value: the average of the squared differences between the real prices and the predicted prices.


* Implement an MSE loss function that takes the weight matrix, the input matrix (xtrn or xtst) and the ground truth prices (ytrn or ytst).


* Make the weight matrix the first parameter of the loss function. It’s not crucial, but make it a habit.


* Helpful functions: **sum, mean, size, abs2** and the broadcasting operator `.*`. You don’t have to use all of them. If you use abs2, apply it with dot syntax, as in abs2.(x). Calculate the loss value for both splits with your MSE loss function.
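
Written out, the loss described above is the following, where the symbols are notation introduced here for illustration: $N$ is the number of samples (columns) in the split, $x_i$ is the $i$-th input column (a 13-vector), $y_i$ is its true price, and $w$ is the 1×13 weight matrix with the bias taken as 0.

```latex
J(w) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - w x_i \right)^2
```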


In [185]:
function loss(weights,inputs,targets)
    preds = weights*inputs                           # 1×N predictions
    J = sum(abs2.(targets-preds))/size(inputs,2)     # mean of squared errors over the N samples
    return J
end

loss (generic function with 1 method)

In [186]:
loss(w,xtrn,ytrn)

593.5506213087656

#### This is considerably higher than the example in the practice document. Could the reason be that I did not normalize the targets?

#### No, that wasn't it... indexing starts at 1 in Julia.

### EXERCISE 8

Lastly, let’s count the neighborhoods for which the model predicts the price with an error smaller than the average error. Measure the absolute difference between the predicted price and the correct price for each neighborhood, and compare those differences with the square root of the loss value calculated in the previous exercise. Use the sqrt function (with dot syntax, e.g. sqrt.(x), when applying it to an array) to take square roots. Perform this step for the training set only. The result should be 108.
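
The counting step can be illustrated on hypothetical toy values. The arrays and the names `rmse` and `nbelow` are made up for this sketch; only the comparison pattern carries over to the exercise.

```julia
# Hypothetical toy prices and predictions (1×4, like ytrn and ypred but tiny).
y     = [10.0 20.0 30.0 40.0]
ypred = [12.0 19.0 20.0 41.0]

# Root of the mean squared error: squared errors are 4, 1, 100, 1.
rmse = sqrt(sum(abs2.(y .- ypred)) / length(y))

# Count the samples whose absolute error is below the RMSE.
nbelow = count(abs.(y .- ypred) .< rmse)

println(nbelow)   # 3: only the third sample (error 10) exceeds the RMSE
```

Taking `abs.` before the comparison matters in general: without it, every large negative error would also be counted as "small".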

In [190]:
SE = sqrt(loss(w,xtrn,ytrn))

24.362894354094415

In [197]:
sum(abs.(ytrn-w*xtrn) .< SE)   # count absolute errors smaller than the root of the MSE

283

### There is something wrong! But what?