  Now that we understand the steps of gradient descent, let us work through an example.
  A new amusement park is about to open in Jeddah. The engineers would like to build a fun, wiggly ride that shows you all the cool places in the park. The ride they came up with follows this route:

  $$g(x,t)=t \sin{x}+3t \cos (2x).$$
  They also know that the cool places are located at the points $$(x,y)\in\{(1,-3),(3,6),(5,-7),(7,2),(9,5),(11,-8)\}.$$
  Unfortunately, they do not know which value of $t$ makes the route pass closest to the cool places. Can you help them?
  Play with this graph to better understand the problem:
  https://www.desmos.com/calculator/jyp7hitvh2
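
The same exploration can be done in code. The sketch below evaluates the route at each cool place for a few trial values of $t$ and prints how far the ride misses each one. The helper names `route` and `misses` are hypothetical and are not used elsewhere in this notebook.

```python
import math

# The cool places from the problem statement.
points = [(1, -3), (3, 6), (5, -7), (7, 2), (9, 5), (11, -8)]

def route(x, t):
    # The ride's route g(x, t) = t*sin(x) + 3*t*cos(2x).
    return t * math.sin(x) + 3 * t * math.cos(2 * x)

def misses(t, points):
    # Signed vertical gap between the route and each cool place.
    return [route(x, t) - y for x, y in points]

# Try a few values of t by hand, as you would with the slider in the graph.
for t in (1, 2, 3):
    print('t=', t, 'misses=', [round(m, 2) for m in misses(t, points)])
```

A good $t$ is one for which these gaps are small overall.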

In [None]:
import math

We want to find the best $t$ for the engineers. Can you think of a loss function, depending only on $t$, that we would like to minimize?

In these cases, we usually use the *squared loss function*, which comes in the following form:
$$loss(t)=(g(x_1,t)-y_1)^2+(g(x_2,t)-y_2)^2+(g(x_3,t)-y_3)^2+(g(x_4,t)-y_4)^2+(g(x_5,t)-y_5)^2+(g(x_6,t)-y_6)^2$$
where $(x_1,y_1)$ stands for the first location $(1,-3)$, $(x_2,y_2)=(3,6)$, and so on.

To minimize this function we need to calculate its derivative. Can you do that?
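
One way to organize the calculation is to notice that every term of the route is proportional to $t$. The shorthand $a_i$ below is introduced here only for convenience; it is not part of the original problem statement.
$$g(x_i,t)=t\,a_i,\qquad a_i=\sin(x_i)+3\cos(2x_i).$$
Applying the chain rule to each squared term then gives
$$\frac{d\,loss}{dt}=\sum_{i=1}^{6}2\,\big(t\,a_i-y_i\big)\,a_i.$$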

In [None]:
points=[(1,-3),(3,6),(5,-7),(7,2),(9,5),(11,-8)]

def Loss(t,points):
  # Squared loss: sum over all points of (g(x,t)-y)^2.
  loss=0
  for x,y in points:
    loss+=(t*math.sin(x)+3*t*math.cos(2*x)-y)**2
  return loss

def Loss_derivative(t,points):
  # Derivative of the loss with respect to t (chain rule):
  # 2*(g(x,t)-y) times dg/dt=sin(x)+3*cos(2x), summed over all points.
  loss_derivative=0
  for x,y in points:
    loss_derivative+=2*(t*math.sin(x)+3*t*math.cos(2*x)-y)*(math.sin(x)+3*math.cos(2*x))
  return loss_derivative

The code above defines the loss function and the derivative of the loss function. What are the next steps of gradient descent?
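
Before running gradient descent, it can help to check a hand-derived derivative numerically. The sketch below compares the analytic derivative with a central finite difference; the helper `numerical_derivative` and the step size `h` are assumptions made for this illustration, and the two loss functions are repeated so the cell runs on its own.

```python
import math

points = [(1, -3), (3, 6), (5, -7), (7, 2), (9, 5), (11, -8)]

def Loss(t, points):
    # Squared loss of the one-parameter route.
    return sum((t * math.sin(x) + 3 * t * math.cos(2 * x) - y) ** 2 for x, y in points)

def Loss_derivative(t, points):
    # Analytic derivative of Loss with respect to t.
    return sum(
        2 * (t * math.sin(x) + 3 * t * math.cos(2 * x) - y)
        * (math.sin(x) + 3 * math.cos(2 * x))
        for x, y in points
    )

def numerical_derivative(t, points, h=1e-5):
    # Central finite difference: (Loss(t+h) - Loss(t-h)) / (2h).
    return (Loss(t + h, points) - Loss(t - h, points)) / (2 * h)

for t in (7, 2, -1):
    print('t=', t, 'analytic=', Loss_derivative(t, points),
          'numerical=', numerical_derivative(t, points))
```

If the two columns disagree noticeably, the derivative formula probably contains a mistake.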


In [None]:
learning_rate=0.01
t=7
for i in range(50):
  t=t-learning_rate*Loss_derivative(t,points)
  print('t=',t,'Loss=',Loss(t,points))

t= 2.6014751833810434 Loss= 19.296234176131783
t= 2.0951248947961356 Loss= 5.000942050646581
t= 2.0368347576616816 Loss= 4.811497959022216
t= 2.0301245016121605 Loss= 4.808987407748666
t= 2.0293520289698974 Loss= 4.808954137417335
t= 2.029263103304671 Loss= 4.8089536965122015
t= 2.0292528663415688 Loss= 4.808953690669237
t= 2.0292516878807354 Loss= 4.8089536905918076
t= 2.029251552218438 Loss= 4.808953690590778
t= 2.0292515366012376 Loss= 4.808953690590765
t= 2.0292515348034135 Loss= 4.808953690590765
t= 2.029251534596451 Loss= 4.808953690590766
t= 2.029251534572626 Loss= 4.808953690590767
t= 2.0292515345698834 Loss= 4.808953690590766
t= 2.0292515345695676 Loss= 4.808953690590766
t= 2.0292515345695312 Loss= 4.808953690590767
t= 2.029251534569527 Loss= 4.808953690590767
t= 2.0292515345695263 Loss= 4.808953690590766
t= 2.0292515345695263 Loss= 4.808953690590766
t= 2.0292515345695263 Loss= 4.808953690590766
t= 2.0292515345695263 Loss= 4.808953690590766
t= 2.0292515345695263 Loss= 4.808953
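
Because this particular loss is a quadratic function of $t$, its minimizer can also be computed directly, which gives a way to check the value that gradient descent settles on. The sketch below is only a cross-check under that observation, not a replacement for gradient descent; the names `a`, `sum_aa`, `sum_ay`, and `t_star` are introduced here for illustration.

```python
import math

points = [(1, -3), (3, 6), (5, -7), (7, 2), (9, 5), (11, -8)]

# a_i = sin(x_i) + 3*cos(2*x_i), so that g(x_i, t) = t * a_i.
a = [math.sin(x) + 3 * math.cos(2 * x) for x, _ in points]
y = [y_i for _, y_i in points]

sum_aa = sum(a_i * a_i for a_i in a)
sum_ay = sum(a_i * y_i for a_i, y_i in zip(a, y))

# Setting the derivative 2*(t*sum_aa - sum_ay) to zero gives the minimizer.
t_star = sum_ay / sum_aa
print('closed-form t =', t_star)

# Each update maps (t - t_star) to (1 - 2*learning_rate*sum_aa)*(t - t_star),
# so the error shrinks only if that factor has magnitude below 1.
learning_rate = 0.01
print('contraction factor =', 1 - 2 * learning_rate * sum_aa)
print('learning rates below this value converge:', 1 / sum_aa)
```

The contraction factor also explains how quickly the printed values of `t` stop changing.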

Now, what if the engineers had a different route function that looked like this:
$$g(x,t_1,t_2)=t_1 \sin x +3t_2 \cos (2x)$$

1. Can you write the loss function?
2. Can you think of a way to write its derivative?
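
With two parameters, the derivative becomes a pair of partial derivatives, one per parameter. As a sketch of where the chain rule leads (treating the other parameter as a constant each time):
$$\frac{\partial\,loss}{\partial t_1}=\sum_{i}2\,\big(t_1\sin(x_i)+3t_2\cos(2x_i)-y_i\big)\,\sin(x_i),$$
$$\frac{\partial\,loss}{\partial t_2}=\sum_{i}2\,\big(t_1\sin(x_i)+3t_2\cos(2x_i)-y_i\big)\,3\cos(2x_i).$$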

In [None]:
def Multi_Loss(t1,t2,points):

  loss=0

  for point in points:
    loss+=(t1*math.sin(point[0])+3*t2*math.cos(2*point[0])-point[1])**2

  return loss

def Multi_Loss_derivative(t1,t2,points):

  t1_loss_derivative=0
  t2_loss_derivative=0
  for point in points:
    t1_loss_derivative+=2*(t1*math.sin(point[0])+3*t2*math.cos(2*point[0])-point[1])*(math.sin(point[0]))
    t2_loss_derivative+=2*(t1*math.sin(point[0])+3*t2*math.cos(2*point[0])-point[1])*(3*math.cos(2*point[0]))
  return t1_loss_derivative,t2_loss_derivative


In [None]:
learning_rate=0.01
t1=7
t2=10
for i in range(200):
  # Evaluate both partial derivatives once, at the current (t1,t2),
  # then update the two parameters together.
  grad_t1,grad_t2=Multi_Loss_derivative(t1,t2,points)
  t1=t1-learning_rate*grad_t1
  t2=t2-learning_rate*grad_t2
  print('t1=',t1,'t2=',t2, 'Multi_Loss=',Multi_Loss(t1,t2,points))

t1= 5.702221987860843 t2= 4.790878399534884 Multi_Loss= 394.5327076916003
t1= 5.098816596913342 t2= 2.784623726046976 Multi_Loss= 88.84728870046133
t1= 4.769573995349589 t2= 2.024059397620877 Multi_Loss= 40.38576553137664
t1= 4.550795608832342 t2= 1.7475009238420722 Multi_Loss= 31.054773866430455
t1= 4.378621284604312 t2= 1.6585387586894524 Multi_Loss= 27.806400223063104
t1= 4.228053186986757 t2= 1.6418434854215889 Multi_Loss= 25.622709962054316
t1= 4.089224130784962 t2= 1.6525594427441792 Multi_Loss= 23.732435641885846
t1= 3.9581611714795435 t2= 1.6732576282712481 Multi_Loss= 22.008429882624075
t1= 3.833190646407523 t2= 1.697182148451542 Multi_Loss= 20.421916977371172
t1= 3.7135388338032413 t2= 1.7217299116259874 Multi_Loss= 18.95976504997006
t1= 3.598787137736083 t2= 1.7459130183959901 Multi_Loss= 17.611897143899963
t1= 3.48865985598626 t2= 1.7693711852643057 Multi_Loss= 16.369330835255955
t1= 3.3829413943728266 t2= 1.791987427580736 Multi_Loss= 15.223831937036856
t1= 3.2814438680754
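
Since this route is linear in $t_1$ and $t_2$, the loss is a quadratic bowl, and the point where both partial derivatives vanish can be found by solving a small linear system. The sketch below does that as a cross-check on where gradient descent is heading; it is an alternative technique, not the method used in this notebook, and the helper names are hypothetical.

```python
import math

points = [(1, -3), (3, 6), (5, -7), (7, 2), (9, 5), (11, -8)]

def two_parameter_gradient(t1, t2, points):
    # Partial derivatives of the two-parameter squared loss.
    g1 = g2 = 0.0
    for x, y in points:
        residual = t1 * math.sin(x) + 3 * t2 * math.cos(2 * x) - y
        g1 += 2 * residual * math.sin(x)
        g2 += 2 * residual * 3 * math.cos(2 * x)
    return g1, g2

def solve_two_parameter_fit(points):
    # Build the 2x2 system obtained by setting both partial derivatives to zero.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x, y in points:
        u = math.sin(x)          # coefficient of t1 in the route
        v = 3 * math.cos(2 * x)  # coefficient of t2 in the route
        a11 += u * u
        a12 += u * v
        a22 += v * v
        b1 += u * y
        b2 += v * y
    # Solve with Cramer's rule.
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

t1_star, t2_star = solve_two_parameter_fit(points)
print('t1=', t1_star, 't2=', t2_star)
print('gradient there=', two_parameter_gradient(t1_star, t2_star, points))
```

Comparing these values with the last lines printed by the loop shows how far the iteration still has to go.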

The engineers think it is a good idea to try a new route that looks like this:
$$g(x,t_1,t_2,t_3)=t_1 \sin x +3t_2 \cos (2t_3x).$$
Let us compute the loss function and come up with its derivative.
The loss for a single point $(x_i,y_i)$ is:
$$f_i(t_1,t_2,t_3)=(t_1 \sin(x_i)+3t_2\cos (2t_3x_i)-y_i)^2$$

So if we take the first point $(1,-3)$, the loss for that point is:
$$f_1(t_1,t_2,t_3)=(t_1 \sin(1)+3t_2\cos (2t_3\cdot 1)-(-3))^2$$
To get the total loss, we sum the losses over all of the points.



In [None]:
def Multi_Loss_oscilation(t1,t2,t3,points):

  loss=0
  #this is the computation for the loss function.
  for point in points:
    loss+=(t1*math.sin(point[0])+3*t2*math.cos(2*t3*point[0])-point[1])**2

  return loss

For the derivative of the loss function, we need to calculate the gradient of the total loss $f=\sum_i f_i$, that is, the partial derivatives $\big(\frac{\partial f}{\partial t_1},\frac{\partial f}{\partial t_2},\frac{\partial f}{\partial t_3}\big)$. For the first partial derivative we have:
$$\frac{\partial f}{\partial t_1}=\sum_i\bigg[2(t_1\sin(x_i)+3t_2\cos(2t_3x_i)-y_i)\times\sin(x_i)\bigg].$$
Can you calculate the rest?
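
For comparison with your own answer, here is a sketch of the remaining two partial derivatives. The shorthand $r_i$ for the signed error at point $i$ is introduced here only to keep the formulas short.
$$r_i=t_1\sin(x_i)+3t_2\cos(2t_3x_i)-y_i,$$
$$\frac{\partial f}{\partial t_2}=\sum_i 2\,r_i\cdot 3\cos(2t_3x_i),\qquad \frac{\partial f}{\partial t_3}=\sum_i 2\,r_i\cdot\big(-6\,t_2\,x_i\sin(2t_3x_i)\big).$$
The factor $2x_i$ in the last expression comes from differentiating the inside of the cosine with respect to $t_3$.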

In [None]:
def Multi_Loss_derivative_Oscilation(t1,t2,t3,points):

  t1_loss_derivative=0
  t2_loss_derivative=0
  t3_loss_derivative=0
  for point in points:
    t1_loss_derivative+=2*(t1*math.sin(point[0])+3*t2*math.cos(2*point[0]*t3)-point[1])*(math.sin(point[0]))
    t2_loss_derivative+=2*(t1*math.sin(point[0])+3*t2*math.cos(2*point[0]*t3)-point[1])*(3*math.cos(2*point[0]*t3))
    t3_loss_derivative+=2*(t1*math.sin(point[0])+3*t2*math.cos(2*point[0]*t3)-point[1])*(-3*t2*math.sin(2*point[0]*t3)*2*point[0])

  return t1_loss_derivative,t2_loss_derivative,t3_loss_derivative

Now that we have the loss derivatives, we can apply gradient descent!

In [None]:
learning_rate=0.001
t1=7
t2=10
t3=1
for i in range(200):
  # Evaluate all three partial derivatives once, at the current (t1,t2,t3),
  # then update the three parameters together.
  grad_t1,grad_t2,grad_t3=Multi_Loss_derivative_Oscilation(t1,t2,t3,points)
  t1=t1-learning_rate*grad_t1
  t2=t2-learning_rate*grad_t2
  t3=t3-learning_rate*grad_t3
  # Report the three-parameter loss; Multi_Loss would ignore t3.
  print('t1=',t1,'t2=',t2, 't3=',t3, 'Multi_Loss=',Multi_Loss_oscilation(t1,t2,t3,points))

t1= 6.8702221987860845 t2= 9.479087839953488 t3= -2.0993136707304805 Multi_Loss= 2117.7879050108304
t1= 6.80381242500952 t2= 9.036553191432583 t3= 10.633179031549734 Multi_Loss= 1899.3760366033928
t1= 6.9090058998922705 t2= 8.63468312851039 t3= 10.064316316143035 Multi_Loss= 1730.237793516303
t1= 6.962647854164794 t2= 8.193736509860111 t3= 5.9914196740738985 Multi_Loss= 1547.7713626264062
t1= 6.928247045815875 t2= 7.673150130555803 t3= 7.359831120999823 Multi_Loss= 1336.313262625689
t1= 6.934623751700064 t2= 7.377844492996604 t3= 7.4741292199055485 Multi_Loss= 1226.159741327423
t1= 7.032336069146034 t2= 7.049757418914286 t3= 10.430877639656718 Multi_Loss= 1118.3388064782275
t1= 6.940070588943987 t2= 6.698758365307682 t3= 11.578780912890034 Multi_Loss= 991.2245148464278
t1= 6.846314611120345 t2= 6.384014741726507 t3= 13.875919247183742 Multi_Loss= 883.1153993615997
t1= 6.8393190681741105 t2= 6.0764024498731875 t3= 18.57666936101492 Multi_Loss= 790.7683480136013
t1= 6.820496379960784 t2=

What do you think is happening? Try different initial values and learning rates, and see if you get better results. Go back to the slides (slide 26) for some insight, and see if you can explain what is happening here.
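
To make such experiments easier, the loop can be wrapped in a function that takes the starting point and the learning rate as arguments. The sketch below is one possible way to do this; `oscillation_loss`, `oscillation_gradient`, and `run_descent` are hypothetical names that repeat the formulas above so the cell runs on its own, and the learning rates tried are arbitrary choices.

```python
import math

points = [(1, -3), (3, 6), (5, -7), (7, 2), (9, 5), (11, -8)]

def oscillation_loss(t1, t2, t3, points):
    # Squared loss of the three-parameter route.
    return sum((t1 * math.sin(x) + 3 * t2 * math.cos(2 * t3 * x) - y) ** 2
               for x, y in points)

def oscillation_gradient(t1, t2, t3, points):
    # Partial derivatives of the loss with respect to t1, t2 and t3.
    g1 = g2 = g3 = 0.0
    for x, y in points:
        residual = t1 * math.sin(x) + 3 * t2 * math.cos(2 * t3 * x) - y
        g1 += 2 * residual * math.sin(x)
        g2 += 2 * residual * 3 * math.cos(2 * t3 * x)
        g3 += 2 * residual * (-6 * t2 * x * math.sin(2 * t3 * x))
    return g1, g2, g3

def run_descent(t1, t2, t3, learning_rate, steps, points):
    # Plain gradient descent; records (t1, t2, t3, loss) after every step.
    history = []
    for _ in range(steps):
        g1, g2, g3 = oscillation_gradient(t1, t2, t3, points)
        t1 = t1 - learning_rate * g1
        t2 = t2 - learning_rate * g2
        t3 = t3 - learning_rate * g3
        history.append((t1, t2, t3, oscillation_loss(t1, t2, t3, points)))
    return history

# Compare the size of the three partial derivatives at the starting point.
print('gradient at (7, 10, 1):', oscillation_gradient(7, 10, 1, points))

# Run the same experiment with several learning rates.
for learning_rate in (0.001, 0.0001, 0.00001):
    t1, t2, t3, loss = run_descent(7, 10, 1, learning_rate, 200, points)[-1]
    print('learning_rate=', learning_rate, 't1=', t1, 't2=', t2, 't3=', t3, 'loss=', loss)
```

Looking at the whole `history` list, rather than only its last entry, shows whether the loss goes down steadily or jumps around.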

Use the rest of the notebook to do your own gradient descent problem!