# Machine Learning Exercise 1
From Week 2 of the Coursera course Machine Learning by Andrew Ng: https://www.coursera.org/learn/machine-learning/.

Eric Nam, https://github.com/eric-nam, 2020

# Make a 5x5 identity matrix
This is a warm-up to get familiar with the environment: create a 5x5 identity matrix.

## The idiomatic Julia way, using the LinearAlgebra package

In [None]:
using LinearAlgebra
Matrix{Int}(I, 5, 5)
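
As a side note, `I` from `LinearAlgebra` is a lazy `UniformScaling` object rather than a stored matrix, so it often does not need to be materialized at all. The short sketch below uses a made-up 2x2 matrix `A` purely for illustration.

```julia
using LinearAlgebra

# A is a hypothetical matrix used only for this illustration
A = [1 2; 3 4]

# I adapts to the size of A, so no explicit identity matrix is needed
shifted = A + 2I

# Materialize a dense identity only when an actual array is required
dense = Matrix{Int}(I, 5, 5)
```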

# Linear Regression
Linear regression is probably the simplest model for making predictions from data. The exercise provides a data set with one independent variable and uses the gradient descent method to fit it.

## Read data from `ex1data1.txt`

In [None]:
import CSV
using DataFrames
fpath_csv = "ex1data1.txt"
# `DataFrame!` was removed in DataFrames 1.0; the plain constructor works in old and new versions
df_data1 = CSV.File(fpath_csv, header=false) |> DataFrame

# Plot the data
Importing `Plots` for the first time takes a while: more than 20 seconds on my machine, and sometimes a full minute.

In [None]:
using Plots
# Column 1 is the city population (x), column 2 is the profit (y)
xlabel = "Population of City in 10,000s"
ylabel = "Profit in \$10,000s"
x = df_data1.Column1
y = df_data1.Column2
scatter(x, y, xlabel=xlabel, ylabel=ylabel, legend=false)

## Define the cost function
The cost (or penalty) function in the exercise is the sum of the squared errors divided by twice the number of samples.
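
Written out for $m$ samples and the linear hypothesis $\theta_0 + \theta_1 x_i$, the quantity that `cost` computes is (the symbols $J$ and $m$ are notation introduced here, following the usual course convention):

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(\theta_0 + \theta_1 x_i - y_i\right)^2$$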

In [None]:
"""
    cost(x, y, theta)

Compute the cost: the sum of the squared errors divided by twice the number of samples.

# Arguments
- `x::Array` : the independent variables
- `y::Array` : the dependent variables
- `theta::Array` : the parameter vector (``\\theta_0 + \\theta_1 x_i``)
"""
function cost(x, y, theta)
    x1 = hcat(ones(size(x)), x)
    sum((x1 * theta .- y) .^ 2) / (2 * size(x, 1))
end

### Two examples of the cost calculation
The two cells below test the cost function. The exercise handout gives approximately 32.07 and 54.24 as the expected values.

In [None]:
theta = [0, 0]
cost(x, y, theta)

In [None]:
theta = [-1, 2]
cost(x, y, theta)

## Define a cost gradient function
Now define the gradient of the cost function, which the gradient descent method will use.
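
In matrix form, with $X$ the data matrix whose first column is all ones and $m$ the number of samples (notation introduced here for illustration), the gradient of the cost is

$$\nabla_\theta J(\theta) = \frac{1}{m} X^{\top} \left(X\theta - y\right)$$

Note that `cost_gradient` as written returns only $X^{\top}(X\theta - y)$; the division by $m$ happens later, inside the gradient descent loop.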

In [None]:
"""
    cost_gradient(x, y, theta)

Compute the gradient of the cost function, without the `1/m` normalization (the caller divides by the number of samples).

# Arguments
- `x::Array` : the independent variables
- `y::Array` : the dependent variables
- `theta::Array` : the parameter vector (``\\theta_0 + \\theta_1 x_i``)
"""
function cost_gradient(x, y, theta)
    x1 = hcat(ones(size(x)), x)
    x1' * (x1 * theta - y)
end

### An example of the cost gradient

In [None]:
cost_gradient(x, y, theta)
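
One way to gain confidence in an analytic gradient is to compare it with central finite differences of the cost. The sketch below is self-contained: it redefines `cost` and `cost_gradient` in the same form as above and uses small made-up data (`x_demo`, `y_demo`, `theta_demo` are hypothetical names and values, not the exercise data).

```julia
# Same definitions as in the notebook, repeated so this cell runs on its own
function cost(x, y, theta)
    x1 = hcat(ones(length(x)), x)
    sum((x1 * theta .- y) .^ 2) / (2 * length(x))
end

function cost_gradient(x, y, theta)
    x1 = hcat(ones(length(x)), x)
    x1' * (x1 * theta - y)   # not yet divided by the number of samples
end

# Made-up data for illustration only
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [2.0, 4.1, 5.9, 8.2]
theta_demo = [0.5, 1.5]
m = length(x_demo)

# Central finite differences of the cost, one parameter at a time
h = 1e-6
numeric = map(1:2) do j
    e = zeros(2)
    e[j] = h
    (cost(x_demo, y_demo, theta_demo + e) - cost(x_demo, y_demo, theta_demo - e)) / (2h)
end

# Divide by m so the analytic value matches the gradient of the normalized cost
analytic = cost_gradient(x_demo, y_demo, theta_demo) ./ m

gradients_agree = isapprox(numeric, analytic; atol=1e-6)
```

If `gradients_agree` is `false`, the usual suspects are a missing normalization factor or a sign error in the residual.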

## Calculate the minimum using the gradient descent method
This gradient descent simply iterates a given number of times with a fixed step size.

In [None]:
iterations = 1500
alpha = 0.01

In [None]:
theta = [0, 0]  # Initial guess
n = size(x)[1]
for i in 1:iterations
    theta -= cost_gradient(x, y, theta) * alpha ./ n
end

theta
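
Each pass of the loop applies the update $\theta \leftarrow \theta - \frac{\alpha}{m} X^{\top}(X\theta - y)$. As a sanity check, the result of such a loop can be compared with the least-squares solution that Julia's backslash operator returns. The sketch below does this on synthetic data generated from $y = 1 + 2x$; the helper `descend`, the data, and the step size are assumptions chosen for illustration, not part of the exercise.

```julia
# Synthetic, noise-free data from y = 1 + 2x (illustration only)
x_demo = collect(0.0:0.5:5.0)
y_demo = 1 .+ 2 .* x_demo

# Data matrix with a leading column of ones for the intercept
x1 = hcat(ones(length(x_demo)), x_demo)

# Hypothetical helper: fixed-step gradient descent with the same update rule
function descend(x1, y, alpha, iterations)
    theta = zeros(size(x1, 2))
    m = length(y)
    for _ in 1:iterations
        theta -= alpha / m * (x1' * (x1 * theta - y))
    end
    theta
end

theta_gd = descend(x1, y_demo, 0.05, 5000)

# Least-squares solution via the backslash operator
theta_ls = x1 \ y_demo

solutions_agree = isapprox(theta_gd, theta_ls; atol=1e-6)
```

On real data with a poorly scaled feature, a fixed step size may need far more iterations to reach the same agreement, which is one reason the direct solve is a convenient reference.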

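# Two predictions using the solution above
The inputs 3.5 and 7 correspond to cities of 35,000 and 70,000 people; the predicted profit is in units of 10,000 dollars.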

In [None]:
predict1 = [1, 3.5]' * theta

In [None]:
predict2 = [1, 7]' * theta

# Plot the landscape of the cost
Plot the cost function over the whole parameter space to get a sense of what the gradient descent method is doing.

## Create a grid of $\theta$s and calculate costs

In [None]:
theta0 = range(-10, stop=10, length=50)
theta1 = range(-1, stop=4, length=50)
thetas = Iterators.product(theta0, theta1)

cost_space = map(t -> cost(x, y, collect(t)), thetas)
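
The plotting calls in the next cells pass `cost_space'` rather than `cost_space`. The reason, as far as I understand the Plots convention, is that `map` over `Iterators.product(a, b)` returns a matrix indexed as `[i, j]` for `(a[i], b[j])`, while surface and contour plots expect rows to follow the y axis and columns the x axis. The toy grid below (hypothetical ranges `a` and `b`) shows the shape.

```julia
# Toy ranges standing in for theta0 and theta1
a = 1:3
b = 1:2

# Encode the pair (a[i], b[j]) as a two-digit number so the layout is visible
grid = map(t -> 10 * t[1] + t[2], Iterators.product(a, b))

size(grid)    # (3, 2): first index follows a, second index follows b
size(grid')   # (2, 3): after transposing, rows follow b and columns follow a
```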

## Plot the surface of the cost

In [None]:
using LaTeXStrings
pyplot()
plot(theta0, theta1, cost_space', st=:surface, camera=(-40, 40),
     xlabel=L"\theta_0", ylabel=L"\theta_1")

## Plot a contour and the solution from above

In [None]:
plot(theta0, theta1, cost_space', st=:contour,
     xlabel=L"\theta_0", ylabel=L"\theta_1")
scatter!((theta[1], theta[2]), legend=false)