# Machine Learning Exercise 1
From Week 2 of the Coursera course Machine Learning by Andrew Ng: https://www.coursera.org/learn/machine-learning/.

Eric Nam, https://github.com/eric-nam, 2020

# Make a 5x5 identity matrix
This is just a warm-up to get familiar with the environment: create a 5x5 identity matrix.

## The most idiomatic Julia way, using the LinearAlgebra package

In [None]:
using LinearAlgebra
Matrix{Int}(I, 5, 5)
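
As a small aside, `I` from `LinearAlgebra` is a size-less identity (a `UniformScaling`) that only becomes a concrete matrix when asked to. The sketch below, with a made-up 2x2 matrix `a`, shows both uses; it is an illustration, not part of the exercise.

```julia
using LinearAlgebra

# Materialize the identity with a chosen element type
dense_int = Matrix{Int}(I, 5, 5)
dense_float = Matrix{Float64}(I, 5, 5)

# I adapts to the size of the matrix it is combined with
a = [1 2; 3 4]          # hypothetical example matrix
shifted = a + 2I        # adds 2 to each diagonal entry
```

When the identity is only needed inside an expression, using `I` directly avoids allocating a full matrix.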

# Linear Regression
Linear regression is probably the simplest model for making predictions from data. The exercise provides a data set with one independent variable and uses the gradient descent method to find a solution.

## Read data from `ex1data1.txt`

In [None]:
import CSV
using DataFrames
fpath_csv = "ex1data1.txt"
df_data1 = CSV.read(fpath_csv, DataFrame, header=false);  # `DataFrame!` is no longer available in recent DataFrames.jl

## Plot the data
Importing `Plots` for the first time takes a while: more than 20 seconds on my machine, and sometimes up to a minute.

In [None]:
using Plots
# Column1 is the city population and Column2 is the profit
xlabel = "Population of City in 10,000s"
ylabel = "Profit in \$10,000s"
scatter(df_data1.Column1, df_data1.Column2, xlabel=xlabel, ylabel=ylabel, legend=false)

## Define the cost function
The cost (or penalty) function in the exercise is half the mean of the squared errors: the sum of the squared errors divided by twice the number of samples.
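
Written out for $n$ examples with the hypothesis $h_\theta(x_i) = \theta_0 + \theta_1 x_i$, the cost takes the form below. The matrix $X$ (the data with a leading column of ones) is notation introduced here for illustration. The factor of one half is a convention that cancels when the cost is differentiated.

$$
J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(x_i) - y_i \right)^2 = \frac{1}{2n} (X\theta - y)^\top (X\theta - y)
$$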

In [None]:
"""
    compute_cost(x, y, theta)

Compute the cost: the sum of the squared errors divided by twice the number of examples.

# Arguments
- `x::Array{Number,2}` : the independent variable matrix. The rows are examples, the columns features.
- `y::Array{Number,1}` : the dependent vector.
- `theta::Array{Number,1}` : the parameter vector (``\\theta_0 + \\theta_1 x_i``)

# Return
`::Number` : Cost value
"""
function compute_cost(x, y, theta)
    n = size(x, 1)                 # number of examples
    x1 = hcat(ones(n), x)          # prepend the intercept column
    residual = x1 * theta - y
    sum(abs2, residual) / (2n)     # half the mean squared error
end
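
As a quick sanity check, the cost can be worked out by hand on a tiny data set. The sketch below uses made-up data (`x_toy`, `y_toy`) and a hypothetical one-line helper `toy_cost` that repeats the same formula so the cell runs on its own.

```julia
# Toy data (made up for illustration): the points lie exactly on y = 1 + 2x
x_toy = reshape([1.0, 2.0, 3.0], :, 1)
y_toy = [3.0, 5.0, 7.0]

# Hypothetical helper: half the mean squared error, as in compute_cost
toy_cost(x, y, theta) =
    sum(abs2, hcat(ones(size(x, 1)), x) * theta - y) / (2 * size(x, 1))

cost_perfect = toy_cost(x_toy, y_toy, [1.0, 2.0])  # residuals are all zero
cost_zero = toy_cost(x_toy, y_toy, [0.0, 0.0])     # (9 + 25 + 49) / 6
```

With all-zero parameters the residuals are simply `-y`, so the cost is the sum of the squared targets divided by `2n`.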

### Two examples of the cost calculation 
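The two cells below test the cost function. The exercise's reference script lists expected values of approximately 32.07 and 54.24 for these two parameter vectors.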

In [None]:
theta = [0.; 0.]
x = reshape(df_data1.Column1, :, 1)  # n×1 matrix: rows are examples
y = df_data1.Column2
compute_cost(x, y, theta)

In [None]:
theta = [-1.; 2.]
compute_cost(x, y, theta)

## Define a cost gradient function
Now define the gradient of the cost function, which will be used by the gradient descent method.

In [None]:
"""
    cost_gradient(x, y, theta)

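Compute the gradient of the cost function, summed over the examples.
The result is not divided by the number of examples; `gradient_descent` applies that factor.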

# Arguments
- `x::Array{Number,2}` : the independent variables
- `y::Array{Number,1}` : the dependent variables
- `theta::Array{Number,1}` : the parameter vector (``\\theta_0 + \\theta_1 x_i``)

# Return
`::Array{Number, 1}` : Gradient of the cost function  
"""
function cost_gradient(x, y, theta)
    x1 = hcat(ones(size(x)[1]), x)
    reshape(x1' * (x1 * theta - y), :)
end
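
One way to gain confidence in an analytic gradient is to compare it with a central finite difference of the cost. The sketch below does this on made-up data; `add_bias`, `toy_cost`, `toy_gradient`, and `numerical_gradient` are hypothetical names that mirror the functions above so the cell is self-contained.

```julia
# Hypothetical helpers mirroring compute_cost and cost_gradient
add_bias(x) = hcat(ones(size(x, 1)), x)
toy_cost(x, y, theta) = sum(abs2, add_bias(x) * theta - y) / (2 * size(x, 1))
toy_gradient(x, y, theta) = add_bias(x)' * (add_bias(x) * theta - y)  # not divided by n

# Central finite difference of f at theta, one coordinate at a time
function numerical_gradient(f, theta; h=1e-6)
    g = similar(theta)
    for j in eachindex(theta)
        step = zeros(length(theta))
        step[j] = h
        g[j] = (f(theta + step) - f(theta - step)) / (2h)
    end
    g
end

# Made-up data and parameters
x_toy = reshape([1.0, 2.0, 3.0], :, 1)
y_toy = [2.0, 4.5, 7.0]
theta_toy = [0.5, -1.0]

# Divide by n so the analytic value is the gradient of the cost itself
analytic = toy_gradient(x_toy, y_toy, theta_toy) ./ size(x_toy, 1)
numeric = numerical_gradient(t -> toy_cost(x_toy, y_toy, t), theta_toy)
gradients_agree = isapprox(analytic, numeric; atol=1e-6)
```

Because the cost is quadratic in `theta`, the central difference is exact up to rounding, so the two vectors should agree closely.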

### An example of the cost gradient

In [None]:
cost_gradient(x, y, theta)

## Define the gradient descent function

In [None]:
function gradient_descent(x, y, theta, alpha, num_iters)
    n = size(x)[1]
    for i in 1:num_iters
        theta -= cost_gradient(x, y, theta) * alpha ./ n
    end
    theta
end
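
A common way to check that the step size is reasonable is to record the cost after every update and confirm that it never increases. The sketch below is a hypothetical variant, `gradient_descent_history`, run on made-up data; it is not the exercise's implementation.

```julia
# Hypothetical variant that also records the cost after each update
function gradient_descent_history(x, y, theta, alpha, num_iters)
    n = size(x, 1)
    x1 = hcat(ones(n), x)
    history = zeros(num_iters)
    for i in 1:num_iters
        theta = theta - alpha / n * (x1' * (x1 * theta - y))
        history[i] = sum(abs2, x1 * theta - y) / (2n)
    end
    theta, history
end

# Made-up data lying exactly on y = 1 + 2x
x_toy = reshape([1.0, 2.0, 3.0, 4.0], :, 1)
y_toy = [3.0, 5.0, 7.0, 9.0]

theta_toy, history = gradient_descent_history(x_toy, y_toy, zeros(2), 0.05, 2000)
cost_never_increases = issorted(history, rev=true)
```

If the history ever goes up, the step size `alpha` is likely too large for the data.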

## Calculate the minimum using the gradient descent method
This gradient descent simply runs for the given number of iterations with a fixed step size; there is no convergence check.

In [None]:
num_iters = 1500
alpha = 0.01

In [None]:
theta_init = [0.; 0.]  # Initial guess
theta = gradient_descent(x, y, theta_init, alpha, num_iters)

# Two predictions using the solution above

In [None]:
predict1 = [1 3.5] * theta

In [None]:
predict2 = [1 7] * theta

# Plot the landscape of the cost
Plot the cost function over a grid of parameter values to get a sense of the surface that gradient descent moves across.

## Create a grid of $\theta$ values and calculate the costs

In [None]:
theta0 = range(-10, stop=10, length=50)
theta1 = range(-1, stop=4, length=50)
thetas = Iterators.product(theta0, theta1)

cost_space = map(t -> compute_cost(x, y, collect(t)), thetas);
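
The shape of the result matters for plotting. The small sketch below, using made-up ranges `a` and `b`, shows that mapping over `Iterators.product` returns a matrix whose first dimension follows the first iterator, which is why the cost matrix may need transposing before it is passed to a plotting function that expects rows to follow the y-axis.

```julia
a = 1:3          # plays the role of theta0
b = 10:10:20     # plays the role of theta1

# Collecting the product gives a 3×2 matrix of tuples
grid = collect(Iterators.product(a, b))
grid_size = size(grid)
second_entry = grid[2, 1]

# map keeps the same shape: rows follow `a`, columns follow `b`
sums = map(t -> t[1] + t[2], Iterators.product(a, b))
```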

## Plot the surface of the cost

In [None]:
using LaTeXStrings
pyplot()
plot(theta0, theta1, cost_space', st=:surface, camera=(-40, 40),
     xlabel=L"\theta_0", ylabel=L"\theta_1")

## Plot a contour and the solution from above
This is the same cost function but in a contour plot. The dot shows the solution obtained by the gradient descent above.

In [None]:
plot(theta0, theta1, cost_space', st=:contour,
     xlabel=L"\theta_0", ylabel=L"\theta_1")
scatter!((theta[1], theta[2]), legend=false)

# Regression with multiple variables
The functions above are written in matrix form, so they already handle multiple variables; there is no need to define separate functions.

## Read data from `ex1data2.txt`

In [None]:
# load data
fpath_csv = "ex1data2.txt"
df_data2 = CSV.read(fpath_csv, DataFrame, header=false);  # `DataFrame!` is no longer available in recent DataFrames.jl

In [None]:
# The first two columns are the features, the third is the target
x = Matrix(df_data2[:, 1:2])
y = df_data2[:, 3];

## Compute the cost for the multi-variable problem

In [None]:
theta = zeros(3)
compute_cost(x, y, theta)

## Define a normalizing function

In [None]:
# Use Julia's Statistics standard library for the mean and standard deviation
using Statistics

"""
    feature_normalize(x)

Normalize each feature (column) to zero mean and unit standard deviation.

# Argument
- `x::Array{Number, 2}` : the independent variable matrix. The rows are examples, the columns features.

# Returns
- `::Array{Number, 2}`: normalized independent variables
- `::Array{Number, 1}`: mean of each feature
- `::Array{Number, 1}`: standard deviation of each feature
"""
function feature_normalize(x)
    mu_x = [mean(col) for col = eachcol(x)]
    std_x = [std(col) for col = eachcol(x)]
    x_norm = (x' .- mu_x) ./ std_x
    x_norm', mu_x, std_x
end
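
The same normalization can also be sketched with the `dims` keyword of `mean` and `std`, which avoids the transposes. The example below uses a made-up matrix `x_toy` and checks that each column ends up with mean 0 and standard deviation 1; it is an alternative formulation, not the function defined above.

```julia
using Statistics

# Made-up data: 3 examples, 2 features on very different scales
x_toy = [1.0 10.0; 2.0 20.0; 3.0 60.0]

mu = mean(x_toy, dims=1)       # 1×2 row of column means
sigma = std(x_toy, dims=1)     # 1×2 row of column standard deviations
x_scaled = (x_toy .- mu) ./ sigma

col_means = mean(x_scaled, dims=1)
col_stds = std(x_scaled, dims=1)
```

Note that `mu` and `sigma` are 1×2 matrices here rather than vectors, so they broadcast along rows directly.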

In [None]:
x_norm, mu_x, std_x = feature_normalize(x);

## Solve using the gradient descent

In [None]:
alpha = 0.1
num_iters = 400
theta_init = zeros(3);

In [None]:
theta = gradient_descent(x_norm, y, theta_init, alpha, num_iters)

## Try to predict with the theta above

In [None]:
x1 = [1650.; 3.]
# Normalize the new input with the training means and standard deviations before predicting
theta' * vcat(1, (x1 .- mu_x) ./ std_x)

## Normal Equation to solve the linear regression

In [None]:
"""
    normal_eqn(x, y)

Solve the linear regression in closed form using the normal equation.

# Arguments
- `x::Array{Number,2}` : the independent variables
- `y::Array{Number,1}` : the dependent variables

# Return
`::Array{Number, 1}` : coefficients of the linear regression
"""
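function normal_eqn(x, y)
    n = size(x, 1)
    x1 = hcat(ones(n), x)  # prepend the intercept column
    # Parenthesize x1' * y so that one linear system with a vector right-hand side is solved
    (x1' * x1) \ (x1' * y)
end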
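
In Julia, the backslash operator applied to a tall matrix returns the least-squares solution directly, so it can serve as a cross-check of the normal equation. The sketch below compares the two on made-up data (`x_toy`, `y_toy`); the variable names are illustrative only.

```julia
# Made-up data: 4 examples, 2 features
x_toy = [1.0 2.0; 2.0 1.0; 3.0 5.0; 4.0 3.0]
y_toy = [1.0, 2.0, 4.0, 3.5]
x1_toy = hcat(ones(size(x_toy, 1)), x_toy)   # prepend the intercept column

# Normal equation: solve (X'X) theta = X'y
theta_normal = (x1_toy' * x1_toy) \ (x1_toy' * y_toy)

# Least squares via backslash on the rectangular matrix
theta_lsq = x1_toy \ y_toy

solutions_agree = isapprox(theta_normal, theta_lsq; atol=1e-8)
```

The backslash form is generally the more numerically robust of the two, because it does not form `X'X` explicitly.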

In [None]:
theta_norm = normal_eqn(x, y)

In [None]:
theta_norm' * vcat(1., x1)