# Gradient-Based Optimization


* The method of steepest descent (gradient descent) proposes a new point:


$$
x'= x - \epsilon \nabla_{x} f(x)
$$

where $\epsilon$ is the learning rate, a positive scalar determining the size of the step. We can choose $\epsilon$ in several different ways.

* A common approach is to set $\epsilon$ to a small constant.

* Another approach is to evaluate $f(x - \epsilon \nabla_{x} f(x))$ for several values of $\epsilon$ and choose the one that results in the smallest objective function value. This strategy is known as a line search.
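The line search idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the objective is the quadratic $f(x) = (x+10)^{2}$ used later in this section, and the helper name `line_search_step` and the list of candidate rates are hypothetical choices, not part of any library.

```python
def f(x):
    # Objective assumed for illustration: f(x) = (x + 10)^2
    return (x + 10) ** 2

def grad_f(x):
    # Derivative of the assumed objective
    return 2 * (x + 10)

def line_search_step(x, candidates):
    """Try each candidate learning rate and keep the one giving the smallest f."""
    g = grad_f(x)
    best_eps = min(candidates, key=lambda eps: f(x - eps * g))
    return best_eps, x - best_eps * g

# Hypothetical grid of candidate learning rates
candidate_rates = [0.001, 0.01, 0.1, 0.5, 1.0]
best_eps, new_x = line_search_step(0.0, candidate_rates)
print("chosen learning rate:", best_eps)
print("new point:", new_x)
```

For this particular quadratic and starting point, the candidate $\epsilon = 0.5$ lands exactly on $x = -10$, so it is the one selected. On a general function the best candidate changes from step to step, which is why the search is repeated at every iteration.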

* Convergence: steepest descent converges when every element of the gradient is zero (or, in practice, very close to zero). In some cases, we can avoid running the iterative algorithm altogether and jump directly to the critical point by solving the equation $\nabla_{x} f(x) = 0$ for $x$.
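Both routes to the critical point can be compared on a small two-variable problem. The sketch below assumes a hypothetical objective $f(x, y) = (x-1)^{2} + 2(y+3)^{2}$, chosen only because setting its gradient to zero gives $x = 1$, $y = -3$ by inspection; the learning rate and tolerance are likewise illustrative values.

```python
def grad(x, y):
    """Gradient of the assumed example f(x, y) = (x - 1)**2 + 2*(y + 3)**2."""
    return 2 * (x - 1), 4 * (y + 3)

learn_rate = 0.1    # illustrative constant learning rate
tolerance = 1e-8    # "very close to zero" threshold for each gradient element
x, y = 0.0, 0.0     # starting point

for iteration in range(10_000):
    gx, gy = grad(x, y)
    # Stop once every element of the gradient is (nearly) zero
    if max(abs(gx), abs(gy)) < tolerance:
        break
    x, y = x - learn_rate * gx, y - learn_rate * gy

# Solving grad f = 0 directly: 2(x - 1) = 0 and 4(y + 3) = 0
exact_x, exact_y = 1.0, -3.0

print("gradient descent:", (x, y), "after", iteration, "iterations")
print("closed form     :", (exact_x, exact_y))
```

The iterative answer agrees with the closed-form one up to the chosen tolerance, which is the sense in which the method "converges" in practice.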


* As an example, we consider the function $f(x) = (x+10)^{2}$, whose derivative is $f'(x) = 2(x+10)$.

* Step 1: We initialize $x = 0$ and evaluate the gradient at $x = 0$, which is $f'(0) = 2(0 + 10) = 20$.

* Step 2: We specify the learning rate $\epsilon = 0.01$ and perform the iteration:

$$
x_{n+1} = x_{n} - \epsilon \times f'(x_{n}) = x_{n} - 0.01 \times f'(x_{n})
$$
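Because this example is quadratic, the iteration can also be unrolled by hand. The short derivation below is a side sketch that uses only $f'(x) = 2(x+10)$ and leaves $\epsilon$ general:

$$
x_{n+1} + 10 = (x_{n} + 10) - 2\epsilon (x_{n} + 10) = (1 - 2\epsilon)(x_{n} + 10)
\quad\Longrightarrow\quad
x_{n} + 10 = (1 - 2\epsilon)^{n} (x_{0} + 10)
$$

The distance to the minimizer $x = -10$ therefore shrinks by the factor $|1 - 2\epsilon|$ at every step, and the iteration converges for this function whenever $0 < \epsilon < 1$. A smaller $\epsilon$ gives a factor closer to 1 and hence slower convergence.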


In [11]:
## Specify the learning rate
learn_rate = 0.01

## Derivative of f(x) = (x + 10)**2
derf = lambda x: 2*(x + 10)

## Stopping parameters
precision = 0.000000001     # stop once |x_{n+1} - x_n| <= precision
iter_count = 0
max_iteration = 100000000   # safety cap on the number of iterations
cur_x = 0                   # starting point
prev_step_size = 1          # any value > precision, so the loop starts

while prev_step_size > precision and iter_count < max_iteration:
    prev_x = cur_x
    cur_x = cur_x - learn_rate * derf(prev_x)   # gradient descent update
    prev_step_size = abs(cur_x - prev_x)
    iter_count += 1

print("The local minimum occurs at x =", cur_x)

The local minimum occurs at x = -9.999999951880648


* Using the method of steepest descent, the minimizer of $f(x) = (x+10)^{2}$ is found to be $x \approx -9.999999951880648$. Solving $f'(x) = 2(x+10) = 0$ analytically gives the exact minimizer $x = -10$, where $f(-10) = 0$.


* Now we consider another function: $f(x) = 2x^{3} - 60x$, whose derivative is $f'(x) = 6(x^{2} - 10)$. It has a local minimum at $x = \sqrt{10} \approx 3.1623$.

In [17]:
## Specify the learning rate
learn_rate = 0.0001

## Derivative of f(x) = 2*x**3 - 60*x
derf = lambda x: 6*(x*x - 10)

## Stopping parameters
precision = 0.0001   # stop once |x_{n+1} - x_n| <= precision
iter_count = 0
max_iter = 100000    # safety cap on the number of iterations

cur_x = -2           # starting point
prevstep_size = 1    # any value > precision, so the loop starts

### Convergence test: |x_{n+1} - x_{n}| < tolerance (precision)

while prevstep_size > precision and iter_count < max_iter:
    prev_x = cur_x
    cur_x = cur_x - learn_rate*derf(prev_x)   # gradient descent update
    prevstep_size = abs(cur_x - prev_x)
    iter_count = iter_count + 1

print("The local minimum occurs at x =", cur_x)

The local minimum occurs at x = 3.1359958525475453
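The value printed above stops about $0.026$ short of $\sqrt{10} \approx 3.1623$, the point where the derivative $6(x^{2} - 10)$ used in the cell vanishes. The reason is the stopping rule: the step size is $\epsilon\,|f'(x)|$, so with $\epsilon = 0.0001$ and a tolerance of $0.0001$ the loop ends as soon as $|f'(x)| \le 1$, which is not yet close to zero. The sketch below wraps the same loop in a hypothetical helper, `gradient_descent` (a name introduced here for illustration), and reruns it with a tighter tolerance for comparison.

```python
import math

def gradient_descent(derf, x0, learn_rate, precision, max_iter=1_000_000):
    """Iterate x <- x - learn_rate * derf(x) until the step size is <= precision."""
    cur_x = x0
    for iter_count in range(1, max_iter + 1):
        prev_x = cur_x
        cur_x = prev_x - learn_rate * derf(prev_x)
        if abs(cur_x - prev_x) <= precision:
            break
    return cur_x, iter_count

# Same derivative as in the cell above
derf = lambda x: 6 * (x * x - 10)

# Same settings as the cell above, then a much tighter tolerance
loose_x, loose_iters = gradient_descent(derf, x0=-2, learn_rate=0.0001, precision=1e-4)
tight_x, tight_iters = gradient_descent(derf, x0=-2, learn_rate=0.0001, precision=1e-10)

print("loose tolerance:", loose_x, "after", loose_iters, "iterations")
print("tight tolerance:", tight_x, "after", tight_iters, "iterations")
print("sqrt(10)       :", math.sqrt(10))
```

With a small learning rate, a small step does not by itself mean a small gradient, so the tolerance has to be chosen relative to $\epsilon$, or the test can be placed on $|f'(x)|$ directly.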
