In [1]:
import Pkg
Pkg.activate(".")

[32m[1m  Activating[22m[39m project at `~/2024fall/BME574/Homework`


In [2]:
Pkg.add(["Plots", "Optimization", "OptimizationOptimJL", "ForwardDiff", "Optim"])

[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/2024fall/BME574/Homework/Project.toml`
[32m[1m  No Changes[22m[39m to `~/2024fall/BME574/Homework/Manifest.toml`


In [3]:
using Plots
using Optimization, OptimizationOptimJL, OptimizationManopt
using ForwardDiff
using Optim

In [4]:
?OptimizationFunction

search: OptimizationFunction MultiObjectiveOptimizationFunction



```julia
struct OptimizationFunction{iip, AD, F, G, FG, H, FGH, HV, C, CJ, CJV, CVJ, CH, HP, CJP, CHP, O, EX, CEX, SYS, LH, LHP, HCV, CJCV, CHCV, LHCV} <: SciMLBase.AbstractOptimizationFunction{iip}
```

A representation of an objective function `f`, defined by:

$$
\min_{u} f(u,p)
$$

and all of its related functions, such as the gradient of `f`, its Hessian, and more. For all cases, `u` is the state which in this case are the optimization variables and `p` are the fixed parameters or data.

## Constructor

```julia
OptimizationFunction{iip}(f, adtype::AbstractADType = NoAD();
                          grad = nothing, hess = nothing, hv = nothing,
                          cons = nothing, cons_j = nothing, cons_jvp = nothing,
                          cons_vjp = nothing, cons_h = nothing,
                          hess_prototype = nothing,
                          cons_jac_prototype = nothing,
                          cons_hess_prototype = nothing,
                          observed = __has_observed(f) ? f.observed : DEFAULT_OBSERVED_NO_TIME,
                          lag_h = nothing,
                          hess_colorvec = __has_colorvec(f) ? f.colorvec : nothing,
                          cons_jac_colorvec = __has_colorvec(f) ? f.colorvec : nothing,
                          cons_hess_colorvec = __has_colorvec(f) ? f.colorvec : nothing,
                          lag_hess_colorvec = nothing,
                          sys = __has_sys(f) ? f.sys : nothing)
```

## Positional Arguments

  * `f(u,p)`: the function to optimize. `u` are the optimization variables and `p` are fixed parameters or data used in the objective. Even if no such parameters are used in the objective, `p` should still be an argument of the function. For minibatching, `p` can be used to pass in a minibatch; see the tutorial [here](https://docs.sciml.ai/Optimization/stable/tutorials/minibatch/) for how to do it. The function should return a scalar, the loss value.

  * `adtype`: see the Defining Optimization Functions via AD section below.

## Keyword Arguments

  * `grad(G,u,p)` or `G=grad(u,p)`: the gradient of `f` with respect to `u`.
  * `hess(H,u,p)` or `H=hess(u,p)`: the Hessian of `f` with respect to `u`.
  * `hv(Hv,u,v,p)` or `Hv=hv(u,v,p)`: the Hessian-vector product $(d^2 f / du^2) v$.
  * `cons(res,u,p)` or `res=cons(u,p)` : the constraints function, should mutate the passed `res` array   with value of the `i`th constraint, evaluated at the current values of variables   inside the optimization routine. This takes just the function evaluations   and the equality or inequality assertion is applied by the solver based on the constraint   bounds passed as `lcons` and `ucons` to [`OptimizationProblem`](@ref), in case of equality   constraints `lcons` and `ucons` should be passed equal values.
  * `cons_j(J,u,p)` or `J=cons_j(u,p)`: the Jacobian of the constraints.
  * `cons_jvp(Jv,u,v,p)` or `Jv=cons_jvp(u,v,p)`: the Jacobian-vector product of the constraints.
  * `cons_vjp(Jv,u,v,p)` or `Jv=cons_vjp(u,v,p)`: the vector-Jacobian product of the constraints.
  * `cons_h(H,u,p)` or `H=cons_h(u,p)`: the Hessian of the constraints, provided as  an array of Hessians with `res[i]` being the Hessian with respect to the `i`th output on `cons`.
  * `hess_prototype`: a prototype matrix matching the type that matches the Hessian. For example, if the Hessian is tridiagonal, then an appropriately sized `Hessian` matrix can be used as the prototype and optimization solvers will specialize on this structure where possible. Non-structured sparsity patterns should use a `SparseMatrixCSC` with a correct sparsity pattern for the Hessian. The default is `nothing`, which means a dense Hessian.
  * `cons_jac_prototype`: a prototype matrix matching the type that matches the constraint Jacobian. The default is `nothing`, which means a dense constraint Jacobian.
  * `cons_hess_prototype`: a prototype matrix matching the type that matches the constraint Hessian. This is defined as an array of matrices, where `hess[i]` is the Hessian w.r.t. the `i`th output. For example, if the Hessian is sparse, then `hess` is a `Vector{SparseMatrixCSC}`. The default is `nothing`, which means a dense constraint Hessian.
  * `lag_h(res,u,sigma,mu,p)` or `res=lag_h(u,sigma,mu,p)`: the Hessian of the Lagrangian, where `sigma` is a multiplier of the cost function and `mu` are the Lagrange multipliers multiplying the constraints. This can be provided instead of `hess` and `cons_h` to solvers that directly use the Hessian of the Lagrangian.
  * `hess_colorvec`: a color vector according to the SparseDiffTools.jl definition for the sparsity pattern of the `hess_prototype`. This specializes the Hessian construction when using finite differences and automatic differentiation to be computed in an accelerated manner based on the sparsity pattern. Defaults to `nothing`, which means a color vector will be internally computed on demand when required. The cost of this operation is highly dependent on the sparsity pattern.
  * `cons_jac_colorvec`: a color vector according to the SparseDiffTools.jl definition for the sparsity pattern of the `cons_jac_prototype`.
  * `cons_hess_colorvec`: an array of color vector according to the SparseDiffTools.jl definition for the sparsity pattern of the `cons_hess_prototype`.

When [Symbolic Problem Building with ModelingToolkit](https://docs.sciml.ai/Optimization/stable/tutorials/symbolic/) interface is used the following arguments are also relevant:

  * `observed`: an algebraic combination of optimization variables that is of interest to the user   which will be available in the solution. This can be single or multiple expressions.
  * `sys`: field that stores the `OptimizationSystem`.

## Defining Optimization Functions via AD

While using the keyword arguments gives the user control over defining all of the possible functions, the simplest way to handle the generation of an `OptimizationFunction` is by specifying an option from ADTypes.jl which lets the user choose the Automatic Differentiation backend to use for automatically filling in all of the extra functions. For example,

```julia
OptimizationFunction(f,AutoForwardDiff())
```

will use [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl) to define all of the necessary functions. Note that if any functions are defined directly, the auto-AD definition does not overwrite the user's choice.

Each of the AD-based constructors are documented separately via their own dispatches below in the [Automatic Differentiation Construction Choice Recommendations](@ref ad) section.

## iip: In-Place vs Out-Of-Place

For more details on this argument, see the ODEFunction documentation.

## specialize: Controlling Compilation and Specialization

For more details on this argument, see the ODEFunction documentation.

## Fields

The fields of the OptimizationFunction type directly match the names of the inputs.
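
As a minimal sketch of the in-place signatures `grad(G,u,p)` and `hess(H,u,p)` listed in the docstring, the derivatives of the objective `x^2 + 3y^2` used later in this notebook could be written by hand as shown here. The names `grad!` and `hess!` are hypothetical, and the commented-out constructor call only illustrates where the `grad` and `hess` keywords would go; it is an assumption and is not run here.

```julia
# Hand-written derivatives of f(u) = u[1]^2 + 3u[2]^2, using the in-place
# forms grad(G, u, p) and hess(H, u, p) described in the docstring.
function grad!(G, u, p)
    G[1] = 2u[1]   # df/dx
    G[2] = 6u[2]   # df/dy
    return nothing
end

function hess!(H, u, p)
    H[1, 1] = 2.0; H[1, 2] = 0.0
    H[2, 1] = 0.0; H[2, 2] = 6.0
    return nothing
end

# Evaluate both at the notebook's starting point (3, 2); p is unused
G = zeros(2)
H = zeros(2, 2)
grad!(G, [3.0, 2.0], nothing)
hess!(H, [3.0, 2.0], nothing)

# Assumed usage, following the constructor shown in the docstring:
# optf = OptimizationFunction(obj; grad = grad!, hess = hess!)
```

Supplying derivatives this way is only worthwhile when they are cheap to write down; otherwise an AD backend can fill them in.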


In [5]:
?AutoForwardDiff

search: AutoForwardDiff AutoSparseForwardDiff AutoPolyesterForwardDiff



```
AutoForwardDiff{chunksize,T}
```

Struct used to select the [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl) backend for automatic differentiation.

Defined by [ADTypes.jl](https://github.com/SciML/ADTypes.jl).

# Constructors

```
AutoForwardDiff(; chunksize=nothing, tag=nothing)
```

# Type parameters

  * `chunksize`: the preferred [chunk size](https://juliadiff.org/ForwardDiff.jl/stable/user/advanced/#Configuring-Chunk-Size) to evaluate several derivatives at once

# Fields

  * `tag::T`: a [custom tag](https://juliadiff.org/ForwardDiff.jl/release-0.10/user/advanced.html#Custom-tags-and-tag-checking-1) to handle nested differentiation calls (usually not necessary)

---

```
AutoForwardDiff{chunksize} <: AbstractADType
```

An AbstractADType choice for use in OptimizationFunction for automatically generating the unspecified derivative functions. Usage:

```julia
OptimizationFunction(f, AutoForwardDiff(); kwargs...)
```

This uses the [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl) package. It is the fastest choice for small systems, especially with heavy scalar interactions. It is easy to use and compatible with most Julia functions which have loose type restrictions. However, because it's forward-mode, it scales poorly in comparison to other AD choices. Hessian construction is suboptimal as it uses the forward-over-forward approach.

  * Compatible with GPUs
  * Compatible with Hessian-based optimization
  * Compatible with Hv-based optimization
  * Compatible with constraints

Note that only the unspecified derivative functions are defined. For example, if a `hess` function is supplied to the `OptimizationFunction`, then the Hessian is not defined via ForwardDiff.
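
To make the "forward-mode" remark concrete, here is a toy dual-number sketch. It is not ForwardDiff.jl's implementation; the `Dual` type and `f_dual` function are hypothetical names introduced only for illustration. Each evaluation carries one directional derivative, so a gradient with `n` inputs needs `n` passes, which is why forward mode scales poorly as the number of variables grows.

```julia
# A toy dual number: a value together with one directional derivative
struct Dual
    val::Float64
    der::Float64
end

# Sum rule and product rule, plus scaling by a plain real constant
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:*(c::Real, a::Dual) = Dual(c * a.val, c * a.der)

# Same function as the notebook's objective, written with * only
f_dual(x, y) = x * x + 3 * (y * y)

# One pass per input: seed der = 1 for the variable being differentiated
dfdx = f_dual(Dual(3.0, 1.0), Dual(2.0, 0.0)).der
dfdy = f_dual(Dual(3.0, 0.0), Dual(2.0, 1.0)).der
println("gradient at (3, 2) = ($dfdx, $dfdy)")
```

The two passes recover the analytic gradient `(2x, 6y)` at the point `(3, 2)`.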


In [6]:
# Define the objective function
f(x, y) = x^2 + 3y^2

f (generic function with 1 method)

In [7]:
# Wrap the objective in the f(u, p) form that OptimizationFunction expects
function obj(u, p) # p is unused here, but the f(u, p) signature is still required
    x, y = u
    return f(x, y)
end

obj (generic function with 1 method)

In [8]:
x0 = [3.0, 2.0]  # Starting point
optf = OptimizationFunction(obj, AutoForwardDiff()) # AutoForwardDiff supplies the gradient via ForwardDiff.jl


(::OptimizationFunction{true, AutoForwardDiff{nothing, Nothing}, typeof(obj), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}) (generic function with 1 method)

In [9]:
# Set up the optimization problem
prob = OptimizationProblem(optf, x0)

OptimizationProblem. In-place: true
u0: 2-element Vector{Float64}:
 3.0
 2.0

In [10]:
?GradientDescent

search: GradientDescent GradientDescentState GradientDescentOptimizer



# Gradient Descent

## Constructor

```julia
GradientDescent(; alphaguess = LineSearches.InitialHagerZhang(),
linesearch = LineSearches.HagerZhang(),
P = nothing,
precondprep = (P, x) -> nothing)
```

Keywords are used to control choice of line search, and preconditioning.

## Description

The `GradientDescent` method is a simple gradient descent algorithm, that is the search direction is simply the negative gradient at the current iterate, and then a line search step is used to compute the final step. See Nocedal and Wright (ch. 2.2, 1999) for an explanation of the approach.

## References

  * Nocedal, J. and Wright, S. J. (1999), Numerical optimization. Springer Science 35.67-68: 7.
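
The description above (negative gradient as search direction, followed by a line search) can be sketched in plain Julia for this notebook's objective `x^2 + 3y^2`. This sketch swaps in an exact line search, which has a closed form for quadratics, instead of the Hager-Zhang line search that `GradientDescent` uses by default. The names `grad_f`, `A`, and `steepest_descent` are hypothetical.

```julia
using LinearAlgebra

f(x, y) = x^2 + 3y^2
grad_f(u) = [2u[1], 6u[2]]      # analytic gradient of f
const A = [2.0 0.0; 0.0 6.0]    # Hessian of f, so f(u) = u' * A * u / 2

# Steepest descent with exact line search on a quadratic:
# the step length minimizing f(u - alpha * g) is alpha = (g'g) / (g'Ag).
function steepest_descent(u0; iters = 3)
    u = copy(u0)
    history = [copy(u)]
    for _ in 1:iters
        g = grad_f(u)
        alpha = dot(g, g) / dot(g, A * g)
        u = u - alpha * g
        push!(history, copy(u))
    end
    return history
end

history = steepest_descent([3.0, 2.0])
for (k, u) in enumerate(history)
    println("iter $(k - 1): u = $u, f = $(f(u...))")
end
```

For this particular quadratic, the first iterates agree with the points printed by the callback run later in the notebook, up to rounding. That agreement should not be expected for general objectives.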


In [11]:
# Solve the problem using gradient descent
sol = solve(prob, GradientDescent(), maxiters=100)

retcode: Success
u: 2-element Vector{Float64}:
  1.6114282515494203e-9
 -2.685713752582376e-10

In [12]:
?solve

search: solve solve! issolverstepclock get_solver_result DebugSolverState



```julia
CommonSolve.solve(args...; kwargs...)
```

Solves an equation or other mathematical problem using the algorithm specified in the arguments. Generally, the interface is:

```julia
CommonSolve.solve(prob::ProblemType,alg::SolverType; kwargs...)::SolutionType
```

where the keyword arguments are uniform across all choices of algorithms.

By default, `solve` defaults to using `solve!` on the iterator form, i.e.:

```julia
solve(args...; kwargs...) = solve!(init(args...; kwargs...))
```

---

```julia
solve(prob::OptimizationProblem, alg::AbstractOptimizationAlgorithm, args...; kwargs...)
```

## Keyword Arguments

The arguments to `solve` are common across all of the optimizers. These common arguments are:

  * `maxiters`: the maximum number of iterations
  * `maxtime`: the maximum amount of time (typically in seconds) the optimization runs for
  * `abstol`: absolute tolerance in changes of the objective value
  * `reltol`: relative tolerance  in changes of the objective value
  * `callback`: a callback function

Some optimizer algorithms have special keyword arguments documented in the solver portion of the documentation and their respective documentation. These arguments can be passed as `kwargs...` to `solve`. Similarly, the special keyword arguments for the `local_method` of a global optimizer are passed as a `NamedTuple` to `local_options`.

Over time, we hope to cover more of these keyword arguments under the common interface.

If a common argument is not implemented for an optimizer, a warning will be shown.

## Callback Functions

The callback function `callback` is a function which is called after every optimizer step. Its signature is:

```julia
callback = (state, loss_val) -> false
```

where `state` is a `OptimizationState` and stores information for the current iteration of the solver and `loss_val` is loss/objective value. For more information about the fields of the `state` look at the `OptimizationState` documentation. The callback should return a Boolean value, and the default should be `false`, such that the optimization gets stopped if it returns `true`.

### Callback Example

Here we show an example of a callback function that plots the prediction at the current value of the optimization variables. The loss function here returns the loss and the prediction, i.e. the solution of the `ODEProblem` `prob`, so we can use the prediction in the callback.

```julia
function predict(u)
    Array(solve(prob, Tsit5(), p = u))
end

function loss(u, p)
    pred = predict(u)
    sum(abs2, batch .- pred), pred
end

callback = function (state, l; doplot = false) #callback function to observe training
    display(l)
    # plot current prediction against data
    if doplot
        pred = predict(state.u)
        pl = scatter(t, ode_data[1, :], label = "data")
        scatter!(pl, t, pred[1, :], label = "prediction")
        display(plot(pl))
    end
    return false
end
```

If the chosen method is a global optimizer that employs a local optimization method, a similar set of common local optimizer arguments exists. Look at `MLSL` or `AUGLAG` from NLopt for an example. The common local optimizer arguments are:

  * `local_method`: optimizer used for local optimization in global method
  * `local_maxiters`: the maximum number of iterations
  * `local_maxtime`: the maximum amount of time (in seconds) the optimization runs for
  * `local_abstol`: absolute tolerance in changes of the objective value
  * `local_reltol`: relative tolerance  in changes of the objective value
  * `local_options`: `NamedTuple` of keyword arguments for local optimizer


In [13]:
f(x0[1], x0[2])

21.0

In [14]:
prev_objective = f(x0...)

21.0

The `...` operator in Julia is called the "splat" operator. It unpacks the elements of an array or tuple into individual arguments of a function call, so `f(x0...)` is equivalent to `f(x0[1], x0[2])` for the two-element `x0` above.
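
A short, self-contained illustration of splatting with the same `f` and `x0` as in the cells above; the tuple `t` is added only to show that tuples work the same way as arrays.

```julia
f(x, y) = x^2 + 3y^2
x0 = [3.0, 2.0]

a = f(x0[1], x0[2])   # explicit indexing
b = f(x0...)          # splat an array into two arguments

t = (3.0, 2.0)
c = f(t...)           # splat a tuple into two arguments

println((a, b, c))
```

All three calls pass the same two arguments to `f`.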

In [15]:
function cb(state, loss_val)  # second argument is the current loss value (unused here)
    global prev_objective  # Use the global variable

    current_u = state.u
    current_objective = state.objective

    # Calculate the difference in objective value
    obj_difference = current_objective - prev_objective

    println("Iteration: $(state.iter)")
    println("Current point: x = $(current_u[1]), y = $(current_u[2])")
    println("Current f(x,y) = $current_objective")
    println("Difference from last iteration: $obj_difference")
    println("---")

    prev_objective = current_objective

    # Returning false continues the optimization; returning true would halt it.
    # A test such as abs(obj_difference) < 1e-6 could be returned instead, but the
    # solver also applies its own convergence criteria (AutoForwardDiff only supplies derivatives).
    return false
end

cb (generic function with 1 method)

In [16]:
# function cb(state, flag)
#     current_u = state.u
#     current_f_value = state.objective
#     println("Current point: x = $(current_u[1]), y = $(current_u[2])")
#     println("Current f(x,y) = $current_f_value")
#     return false # continue optimization
# end

# Solve the problem
sol = solve(prob, GradientDescent(), maxiters=100, callback=cb)


Iteration: 0
Current point: x = 3.0, y = 2.0
Current f(x,y) = 21.0
Difference from last iteration: 0.0
---
Iteration: 1
Current point: x = 1.846153846153846, y = -0.30769230769230793
Current f(x,y) = 3.6923076923076925
Difference from last iteration: -17.307692307692307
---
Iteration: 2
Current point: x = 0.527472527472528, y = 0.3516483516483516
Current f(x,y) = 0.6491969568892652
Difference from last iteration: -3.0431107354184275
---
Iteration: 3
Current point: x = 0.3245984784446326, y = -0.05409974640743881
Current f(x,y) = 0.11414451989261816
Difference from last iteration: -0.535052436996647
---
Iteration: 4
Current point: x = 0.09274242241275227, y = 0.06182828160850144
Current f(x,y) = 0.02006936613496585
Difference from last iteration: -0.09407515375765231
---
Iteration: 5
Current point: x = 0.05707225994630907, y = -0.00951204332438485
Current f(x,y) = 0.003528679759993998
Difference from last iteration: -0.01654068637497185
---
Iteration: 6
Current point: x = 0.016306359984

retcode: Success
u: 2-element Vector{Float64}:
  1.6114282515494203e-9
 -2.685713752582376e-10

In [17]:
function cb_try(state, loss_val)  # second argument is the current loss value (unused here)
    global prev_objective  # Still holds the final objective of the previous run

    current_u = state.u
    current_objective = state.objective

    # Calculate the difference in objective value
    obj_difference = current_objective - prev_objective

    println("Iteration: $(state.iter)")
    println("Current point: x = $(current_u[1]), y = $(current_u[2])")
    println("Current f(x,y) = $current_objective")
    println("Difference from last iteration: $obj_difference")
    println("---")

    prev_objective = current_objective

    # Returning true halts the optimization; false lets it continue
    return abs(obj_difference) < 1e-6
end

# retcode: Failure below, most likely because the callback halts the solver before its own convergence test is met
sol_try = solve(prob, GradientDescent(), maxiters=100, callback=cb_try)

Iteration: 0
Current point: x = 3.0, y = 2.0
Current f(x,y) = 21.0
Difference from last iteration: 21.0
---
Iteration: 1
Current point: x = 1.846153846153846, y = -0.30769230769230793
Current f(x,y) = 3.6923076923076925
Difference from last iteration: -17.307692307692307
---
Iteration: 2
Current point: x = 0.527472527472528, y = 0.3516483516483516
Current f(x,y) = 0.6491969568892652
Difference from last iteration: -3.0431107354184275
---
Iteration: 3
Current point: x = 0.3245984784446326, y = -0.05409974640743881
Current f(x,y) = 0.11414451989261816
Difference from last iteration: -0.535052436996647
---
Iteration: 4
Current point: x = 0.09274242241275227, y = 0.06182828160850144
Current f(x,y) = 0.02006936613496585
Difference from last iteration: -0.09407515375765231
---
Iteration: 5
Current point: x = 0.05707225994630907, y = -0.00951204332438485
Current f(x,y) = 0.003528679759993998
Difference from last iteration: -0.01654068637497185
---
Iteration: 6
Current point: x = 0.01630635998

retcode: Failure
u: 2-element Vector{Float64}:
  0.00031021360527419805
 -5.170226754569983e-5
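
Because `prev_objective` is a global, its value carries over between runs, which is why the first printed difference above is 21.0 and not 0.0. One possible alternative, sketched here under stated assumptions, keeps the previous objective inside a closure. `make_callback` is a hypothetical helper, and the named tuples are stand-ins for the solver's state object that carry only the `iter`, `u`, and `objective` fields used in this notebook. The last objective value is deliberately repeated to trigger the stop; it is not solver output.

```julia
# Build a callback that remembers the previous objective in a closure,
# so each call to make_callback starts from a fresh state.
function make_callback(tol)
    prev = Ref(Inf)   # Inf guarantees the first call never stops the run
    return function (state, loss_val)
        delta = abs(state.objective - prev[])
        prev[] = state.objective
        return delta < tol   # true halts the optimization
    end
end

cb_closure = make_callback(1e-6)

# Hypothetical stand-ins for the solver state; the final value is a made-up repeat
objectives = [21.0, 3.6923076923076925, 0.6491969568892652, 0.6491969568892652]
mock_states = [(iter = i - 1, u = [0.0, 0.0], objective = v)
               for (i, v) in enumerate(objectives)]

results = [cb_closure(s, s.objective) for s in mock_states]
println(results)
```

Under these assumptions, a fresh callback would be passed as `callback = make_callback(1e-6)` on every `solve` call. As in the run above, a solve halted by its callback may still be reported with `retcode: Failure`.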