# Exercise 3: Two-dimensional gradient descent

In [12]:
# 1-D gradient descent (adapted from Slides 8)
def one_dim_gd(f, delta, learning_rate, guess, iterations):
    fprime = lambda x: (f(x + delta) - f(x)) / delta
    for _ in range(iterations):
        print(guess)
        guess = guess - learning_rate * fprime(guess)

In [14]:
# Test 1-D gradient descent
g = lambda x: (x ** 4 / 4 - 2 * x ** 3 / 3 - x ** 2 / 2 + 2 * x + 2) # A function g(x)
f = lambda x: g(x - 2) # A composite function f(x)
delta = 1e-4 # Finite-difference step size (h)
learning_rate = 0.1 # Learning rate (gamma)
guess = 7 # Initial guess
iterations = 10 # Number of iterations

one_dim_gd(f, delta, learning_rate, guess, iterations)

7
-0.20027000431855413
1.4130210876611642
1.2434441591772032
1.1255487044080574
1.0578788670328958
1.024779984532536
1.0101887274510606
1.004097799830344
1.001617715333463
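
As a sanity check on the output above, the derivative of `g` can be worked out by hand: $g'(y) = y^3 - 2y^2 - y + 2 = (y - 2)(y - 1)(y + 1)$, so $f'(x) = (x - 4)(x - 3)(x - 1)$. The sketch below compares this hand-derived derivative with the forward-difference estimate used in `one_dim_gd`. The names `f_prime_exact`, `f_second_exact`, and `f_prime_fd` are introduced here for illustration and are not part of the original notebook.

```python
g = lambda x: x ** 4 / 4 - 2 * x ** 3 / 3 - x ** 2 / 2 + 2 * x + 2
f = lambda x: g(x - 2)

# Hand-derived derivatives of f (assumption: written out for this check only)
f_prime_exact = lambda x: (x - 4) * (x - 3) * (x - 1)
f_second_exact = lambda x: 3 * (x - 2) ** 2 - 4 * (x - 2) - 1

# Forward-difference estimate, same formula as in one_dim_gd
delta = 1e-4
f_prime_fd = lambda x: (f(x + delta) - f(x)) / delta

# Stationary points are x = 1, 3, 4; the sign of f'' tells minima from maxima
for x in (1.0, 3.0, 4.0, 7.0):
    print(x, f_prime_exact(x), f_prime_fd(x), f_second_exact(x))
```

The exact derivative at the initial guess is $f'(7) = 72$, which explains the large first step from 7 to roughly $7 - 0.1 \cdot 72 = -0.2$. The iterates then settle near $x = 1$, where $f'(1) = 0$ and $f''(1) = 6 > 0$, so the run found a local minimum. A second local minimum sits at $x = 4$.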


In [34]:
# 2-D gradient descent (modified for black-box optimization)
import requests

user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
url = "http://ramcdougal.com/cgi-bin/error_function.py"

def get_error(a, b):
    # Query the black-box error function at (a, b)
    response = requests.get(url, params = {'a': a, 'b': b}, headers = {"User-Agent": user_agent})
    response.raise_for_status() # Fail on an HTTP error instead of parsing an error page
    return float(response.text)

def two_dim_gd(delta, learning_rate, guess_a, guess_b, iterations):
    for i in range(iterations):
        # Retrieve initial error for current guess_a and guess_b
        error = get_error(guess_a, guess_b)

        # Estimate the gradient using forward finite differences
        error_a_delta = get_error(guess_a + delta, guess_b)
        error_b_delta = get_error(guess_a, guess_b + delta)

        grad_a = (error_a_delta - error) / delta
        grad_b = (error_b_delta - error) / delta

        # Update guess_a and guess_b
        guess_a = guess_a - learning_rate * grad_a
        guess_b = guess_b - learning_rate * grad_b

        # Monitor convergence (error is the value before this iteration's update)
        print(f"Iteration {i + 1}: Error = {error}")

    # Return final optimized values
    return guess_a, guess_b

**Explanation:** The function we want to minimize, the error function, is a black box: we can evaluate it but have no formula for it, so we cannot compute its gradient analytically. Instead, we approximate the gradient with forward finite differences. For a small step $h$ (called `delta` in the code), we evaluate the function at the current point and at a point shifted by $h$, subtract the original value from the shifted value, and divide by $h$. In exact arithmetic this estimate approaches the true derivative as $h$ tends to zero. To extend this to two dimensions, we perform the operation once per parameter, shifting $a$ and $b$ separately, so each iteration costs three function evaluations.

The stopping criterion is visual confirmation that the error has converged. After each iteration of gradient descent, I print the error at the current guess and decide manually when it has stopped changing.

The numerical choices are $h$, the learning rate, and the number of iterations. $h$ should be small enough to approximate the gradient closely, but not so small that round-off dominates: subtracting two nearly equal function values loses precision, and the errors returned here are printed to only about 12 significant digits. The learning rate needs a similar balance. A smaller learning rate is inefficient and requires more iterations, while a larger one might overshoot the minimum, oscillate around it, or diverge. The number of iterations should be large enough for the error to settle.
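
The manual stopping rule described above could be automated with a tolerance on the change in error between iterations. The following is a minimal sketch under stated assumptions, not the method used in this notebook: `toy_error` is a made-up local stand-in for the remote error function (its minimum location and value are invented for illustration), and `two_dim_gd_tol`, `tol`, and `max_iterations` are hypothetical names.

```python
def toy_error(a, b):
    # Hypothetical stand-in for the black-box error function (assumption):
    # a quadratic bowl with its minimum at (0.3, 0.8) and minimum value 1.1
    return (a - 0.3) ** 2 + (b - 0.8) ** 2 + 1.1

def two_dim_gd_tol(error_fn, delta, learning_rate, a, b, max_iterations=1000, tol=1e-8):
    """Forward-difference gradient descent that stops once the error
    changes by less than tol between consecutive iterations."""
    prev_error = None
    for i in range(max_iterations):
        error = error_fn(a, b)

        # Stopping criterion: error has effectively stopped changing
        if prev_error is not None and abs(prev_error - error) < tol:
            return a, b, error, i

        # Forward finite differences, one shifted evaluation per parameter
        grad_a = (error_fn(a + delta, b) - error) / delta
        grad_b = (error_fn(a, b + delta) - error) / delta

        a = a - learning_rate * grad_a
        b = b - learning_rate * grad_b
        prev_error = error

    # Iteration budget exhausted without meeting the tolerance
    return a, b, error_fn(a, b), max_iterations

a_opt, b_opt, final_error, n_iter = two_dim_gd_tol(toy_error, 1e-4, 0.1, 0.5, 0.5)
print(f"Stopped after {n_iter} iterations: a = {a_opt}, b = {b_opt}, error = {final_error}")
```

With a real remote function, `tol` should not be set below the precision of the returned values, since differences smaller than that are noise. Passing the error function as an argument also makes it possible to test the optimizer locally before spending network requests.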

In [35]:
# Minimize the black-box error function
delta = 1e-4 # Finite-difference step size (h)
learning_rate = 0.1 # Learning rate (gamma)
guess_a, guess_b = 0.5, 0.5 # Initial guesses
iterations = 50 # Number of iterations

a, b = two_dim_gd(delta, learning_rate, guess_a, guess_b, iterations)
print(f"Optimized a: {a}, b: {b}")

Iteration 1: Error = 1.216377
Iteration 2: Error = 1.1744797602
Iteration 3: Error = 1.14766583105
Iteration 4: Error = 1.13050515985
Iteration 5: Error = 1.11952252504
Iteration 6: Error = 1.11249379433
Iteration 7: Error = 1.10799553138
Iteration 8: Error = 1.10511674237
Iteration 9: Error = 1.10327439756
Iteration 10: Error = 1.10209536131
Iteration 11: Error = 1.10134082865
Iteration 12: Error = 1.10085796859
Iteration 13: Error = 1.10054897118
Iteration 14: Error = 1.10035123887
Iteration 15: Error = 1.10022471098
Iteration 16: Error = 1.10014374993
Iteration 17: Error = 1.1000919481
Iteration 18: Error = 1.10005880577
Iteration 19: Error = 1.10003760326
Iteration 20: Error = 1.10002404048
Iteration 21: Error = 1.10001536578
Iteration 22: Error = 1.10000981837
Iteration 23: Error = 1.10000627153
Iteration 24: Error = 1.10000400434
Iteration 25: Error = 1.1000025556
Iteration 26: Error = 1.10000163019
Iteration 27: Error = 1.10000103937
Iteration 28: Error = 1.1000006624
Iteration 
