# Gradient Descent

$$
{\color{Green}\mathit{w}}^{t+1} = \mathit{w}^t + 2 \alpha \dots
$$

The loss function in logistic regression is called cross-entropy.

## [1. Load Julia modules](https://jihongzhang.org/posts/2021-08-30-gradient-descent-via-julia/#load-julia-modules)

In [2]:
using RDatasets
using DataFrames

In [3]:
mtcars = dataset("datasets", "mtcars")

| Row | Model | MPG | Cyl | Disp | HP | DRat | WT | QSec |
|---|---|---|---|---|---|---|---|---|
| | String31 | Float64 | Int64 | Float64 | Int64 | Float64 | Float64 | Float64 |
| 1 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 |
| 2 | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 |
| 3 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 |
| 4 | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 |
| 5 | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 |
| 6 | Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.46 | 20.22 |
| 7 | Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.57 | 15.84 |
| 8 | Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.19 | 20.0 |
| 9 | Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.15 | 22.9 |
| 10 | Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.44 | 18.3 |


## [2. Julia function for gradient descent](https://jihongzhang.org/posts/2021-08-30-gradient-descent-via-julia)

- learn_rate: the magnitude of the steps the algorithm takes along the slope of the MSE function
- conv_threshold: threshold for convergence of gradient descent
- n: number of observations (the sums in the code are divided by n; the call below passes 32)
- max_iter: maximum number of iterations before the algorithm stops
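
The update rules used in `gradientDesc` below follow from differentiating the MSE with respect to the intercept and the slope. A short derivation, writing $\eta$ for `learn_rate` (the symbol $\eta$ is notation introduced here, not the notebook's):

$$
\begin{aligned}
\mathrm{MSE}(\alpha,\beta) &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2, \qquad \hat{y}_i = \alpha + \beta x_i \\
\frac{\partial\,\mathrm{MSE}}{\partial \alpha} &= \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i), \qquad
\frac{\partial\,\mathrm{MSE}}{\partial \beta} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_i \\
\alpha &\leftarrow \alpha - \eta\,\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i), \qquad
\beta \leftarrow \beta - \eta\,\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_i
\end{aligned}
$$

The code drops the constant factor of 2 from the gradients, which is equivalent to rescaling the learning rate.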

In [4]:
function gradientDesc(x, y, learn_rate, conv_threshold, n, max_iter)
    # Random starting values in [0, 1) for the slope (β) and the intercept (α)
    β = rand()
    α = rand()
    ŷ = α .+ β .* x
    MSE = sum((y .- ŷ).^2)/n
    converged = false
    iterations = 0

    while !converged
        # Gradient step: both updates use the ŷ computed before the step
        β_new = β - learn_rate*((1/n)*(sum((ŷ .- y) .* x)))
        α_new = α - learn_rate*((1/n)*(sum(ŷ .- y)))
        α = α_new
        β = β_new
        ŷ = β.*x .+ α
        MSE_new = sum((y.-ŷ).^2)/n
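        # Decide whether it has converged. Note: MSE is never reassigned inside
        # the loop, so this compares against the initial MSE, not the previous step's.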
        if (MSE - MSE_new) <= conv_threshold
            converged = true
            println("Optimal intercept: $α; Optimal slope: $β")
        end
        iterations += 1
        if iterations > max_iter
            converged = true
            println("Optimal intercept: $α; Optimal slope: $β")
        end
    end
end

gradientDesc (generic function with 1 method)

In [5]:
gradientDesc(mtcars[:,:Disp], mtcars[:,:MPG], 0.0000293, 0.001, 32, 2500000)

Optimal intercept: 29.599851490343465; Optimal slope: -0.04121510890036521


## [3. Compared to linear regression](https://jihongzhang.org/posts/2021-08-30-gradient-descent-via-julia)

In [6]:
using GLM
linearRegressor = lm(@formula(MPG ~ Disp), mtcars)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

MPG ~ 1 + Disp

Coefficients:
───────────────────────────────────────────────────────────────────────────
                  Coef.  Std. Error      t  Pr(>|t|)  Lower 95%   Upper 95%
───────────────────────────────────────────────────────────────────────────
(Intercept)  29.5999     1.22972     24.07    <1e-20  27.0884    32.1113
Disp         -0.0412151  0.00471183  -8.75    <1e-09  -0.050838  -0.0315923
───────────────────────────────────────────────────────────────────────────
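
For a single predictor, the least-squares coefficients that `lm` reports also have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). That gives a quick sanity check on a gradient-descent result without fitting a model object. A minimal sketch using only the `Statistics` standard library; the helper name `olsClosedForm` and the toy data are made up for illustration and are not part of the notebook.

```julia
using Statistics

# Hypothetical helper: closed-form simple linear regression (one predictor).
function olsClosedForm(x, y)
    slope = cov(x, y) / var(x)              # both use the same n-1 correction, so it cancels
    intercept = mean(y) - slope * mean(x)   # the fitted line passes through (mean(x), mean(y))
    return intercept, slope
end

# Toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
intercept, slope = olsClosedForm(x, y)
println("intercept = $intercept, slope = $slope")
```

On the notebook's data the call would presumably be `olsClosedForm(mtcars[:, :Disp], mtcars[:, :MPG])`.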

# References
- [ ] [Gradient Descent Algorithm via julia](https://jihongzhang.org/posts/2021-08-30-gradient-descent-via-julia/)

Let's ask ChatGPT to write a recursive version of the procedural code above:

In [12]:
function recursiveGradientDesc(x, y, learn_rate, conv_threshold, n, max_iter)
    β = rand(Float64, 1)[1]
    α = rand(Float64, 1)[1]
    ŷ = α .+ β .* x
    MSE = sum((y .- ŷ).^2)/n
    iterations = 0

    function gradientStep(α, β, ŷ, MSE)
        β_new = β - learn_rate * ((1/n) * (sum((ŷ .- y) .* x)))
        α_new = α - learn_rate * ((1/n) * (sum(ŷ .- y)))
        ŷ = β_new .* x .+ α_new
        MSE_new = sum((y .- ŷ).^2)/n

        if (MSE - MSE_new) <= conv_threshold
            println("Optimal intercept: $α_new; Optimal slope: $β_new")
            return α_new, β_new, ŷ, MSE_new, true
        end

        if iterations >= max_iter
            println("Optimal intercept: $α_new; Optimal slope: $β_new")
            return α_new, β_new, ŷ, MSE_new, true
        end

        return gradientStep(α_new, β_new, ŷ, MSE_new)
    end

    # Call the recursive function
    α, β, ŷ, MSE, converged = gradientStep(α, β, ŷ, MSE)
end


recursiveGradientDesc (generic function with 1 method)

In [10]:
recursiveGradientDesc(mtcars[:,:Disp], mtcars[:,:MPG], 0.0000293, 0.001, 32, 2500000)

LoadError: StackOverflowError:
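
The error is plausible given how the recursion is written: Julia does not guarantee tail-call elimination, so every `gradientStep` call adds a stack frame, and since `iterations` is never incremented inside `gradientStep`, the `max_iter` check cannot cut the recursion short. A minimal loop-based sketch that avoids the deep call stack is shown here. The name `gradientDescLoop`, the keyword defaults, the fixed starting values and the synthetic data are all assumptions for illustration; unlike `gradientDesc`, it compares each MSE with the previous iteration's and returns the estimates instead of printing them.

```julia
# Loop-based sketch; `gradientDescLoop` is a hypothetical name, not from the notebook.
function gradientDescLoop(x, y; learn_rate=0.1, conv_threshold=1e-12, max_iter=100_000)
    n = length(y)
    α, β = 0.0, 0.0                       # fixed start (assumption) instead of rand()
    mse = sum((y .- (α .+ β .* x)).^2) / n
    for iter in 1:max_iter
        ŷ = α .+ β .* x
        # Compute both updates from the same ŷ, then assign
        α_new = α - learn_rate * sum(ŷ .- y) / n
        β_new = β - learn_rate * sum((ŷ .- y) .* x) / n
        α, β = α_new, β_new
        mse_new = sum((y .- (α .+ β .* x)).^2) / n
        # Stop when the improvement over the previous iteration is small
        if mse - mse_new <= conv_threshold
            return (intercept = α, slope = β, iterations = iter)
        end
        mse = mse_new
    end
    return (intercept = α, slope = β, iterations = max_iter)
end

# Synthetic, noise-free data generated from y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = 1.0 .+ 2.0 .* x
fit = gradientDescLoop(x, y)
println(fit)
```

The learning rate here suits the small synthetic `x` values only; data on the scale of `Disp` would need a much smaller step, as in the calls above.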