Sascha Spors,
Professorship Signal Theory and Digital Signal Processing,
Institute of Communications Engineering (INT),
Faculty of Computer Science and Electrical Engineering (IEF),
University of Rostock,
Germany

# Data Driven Audio Signal Processing - A Tutorial with Computational Examples

Master Course #24512

- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture
- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise

Feel free to contact lecturer frank.schultz@uni-rostock.de

# Gradient Descent

- a nice 2D loss surface is discussed with Fig. 4.4(b) in the highly recommended textbook https://doi.org/10.1007/978-3-030-40344-7 (page 150)
- this loss function has one global minimum, three local minima, one local maximum and four saddle points
- although this is still a toy example with a comparably simple surface, different gradient descent behaviors can be studied by varying
    - starting point
    - learning rate
    - stop criterion
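
The counts of minima, maxima and saddle points stated above can be checked by hand. This is a short sketch of the derivation, assuming the loss is exactly the one coded in the cell that defines `J` further down. The loss is separable in $w_1$ and $w_2$:

$$J(w_1,w_2) = \frac{w_1^4 + w_2^4}{4} - \frac{w_1^3 + w_2^3}{3} - w_1^2 - w_2^2 + 4$$

$$\frac{\partial J}{\partial w_i} = w_i^3 - w_i^2 - 2 w_i = w_i\,(w_i+1)\,(w_i-2), \qquad \frac{\partial^2 J}{\partial w_i^2} = 3 w_i^2 - 2 w_i - 2, \qquad \frac{\partial^2 J}{\partial w_1 \partial w_2} = 0$$

- the gradient vanishes for $w_i \in \{-1, 0, 2\}$, which yields $3 \times 3 = 9$ stationary points
- the Hessian is diagonal with entries $3$ (at $w_i=-1$), $-2$ (at $w_i=0$) and $6$ (at $w_i=2$)
- both entries positive: four minima at (-1,-1), (-1,2), (2,-1), (2,2); both negative: one maximum at (0,0); mixed signs: four saddle points at (2,0), (0,-1), (-1,0), (0,2)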

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

In [None]:
matplotlib_widget_flag = True

In [None]:
if matplotlib_widget_flag:
    %matplotlib widget

In [None]:
w1 = np.linspace(-2, 3, 1000, endpoint=False)
w2 = np.linspace(-2, 3, 1000, endpoint=False)
W1, W2 = np.meshgrid(w1, w2, indexing='xy')
# cf. Fig. 4.4(b) from https://doi.org/10.1007/978-3-030-40344-7 
J = (W1**4 + W2**4) / 4 - (W1**3 + W2**3) / 3 - W1**2 - W2**2 + 4

In [None]:
# local maximum at (0,0) -> J(0,0) = 4
J[W1==0][w2==0]

In [None]:
# local minimum at (2,-1) -> J(2,-1) = 11/12 = 0.91666667
J[W1==2][w2==-1]

In [None]:
# local minimum at (-1,-1) -> J(-1,-1) = 19/6 = 3.16666667
J[W1==-1][w2==-1]

In [None]:
# local minimum at (-1,2) -> J(-1,2) = 11/12 = 0.91666667
J[W1==-1][w2==2]

In [None]:
# global minimum at (2,2) -> J(2,2) = -4/3 = -1.33333333
np.min(J), J[W1==2][w2==2], W1[np.min(J) == J], W2[np.min(J) == J]

In [None]:
# saddle points at
# (2,0); (0,-1); (-1,0); (0,2)
# J = 4/3, 43/12, 43/12, 4/3 = 1.33333333, 3.58333333, 3.58333333, 1.33333333
J[W1==2][w2==0], J[W1==0][w2==-1], J[W1==-1][w2==0], J[W1==0][w2==2]
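
The grid lookups above rely on the grid containing the points -1, 0 and 2 exactly as floating point numbers. As a cross-check that does not depend on the grid, the following minimal sketch evaluates the loss and the diagonal Hessian entries directly at the nine stationary points. The helper names `loss`, `d2` and `critical_points` are hypothetical and are not part of the original notebook.

```python
import itertools


def loss(w1, w2):
    """Loss J(w1, w2), same formula as used for the grid J above."""
    return (w1**4 + w2**4) / 4 - (w1**3 + w2**3) / 3 - w1**2 - w2**2 + 4


def d2(w):
    """Second derivative of J w.r.t. one weight (diagonal Hessian entry)."""
    return 3 * w**2 - 2 * w - 2


# roots of the per-axis gradient w**3 - w**2 - 2*w = w*(w+1)*(w-2)
roots = [-1.0, 0.0, 2.0]

critical_points = {}
for w1c, w2c in itertools.product(roots, roots):
    h1, h2 = d2(w1c), d2(w2c)
    if h1 > 0 and h2 > 0:
        kind = 'minimum'
    elif h1 < 0 and h2 < 0:
        kind = 'maximum'
    else:  # Hessian entries with mixed signs
        kind = 'saddle'
    critical_points[(w1c, w2c)] = (kind, loss(w1c, w2c))
    print(f'({w1c:+.0f}, {w2c:+.0f}): {kind:8s} J = {loss(w1c, w2c):.8f}')
```

The classification uses only the signs of the two diagonal Hessian entries, which is sufficient here because the mixed second derivative of this separable loss is zero.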

## Loss Surface

In [None]:
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
surf = ax.plot_surface(W1, W2, J,
                       cmap=cm.magma_r,
                       rstride=10, cstride=10,
                       linewidth=0, antialiased=False)
ax.plot([2], [2], [-4/3], 'o')
ax.set_zlim(-2, 10)
ax.set_xlabel(r'$w_1$')
ax.set_ylabel(r'$w_2$')
ax.set_zlabel(r'$J(w_1,w_2)$')
ax.view_init(elev=65, azim=-135, roll=0)
fig.colorbar(surf, shrink=0.67, aspect=20)

## Gradient Descent

With the chosen parameters

- `w_act = np.array([[3], [0+1e-3]])`
- `step_size = 1e-2`
- `N = 2**10`

the gradient descent has a delicate outcome: it first approaches the saddle point (2,0) comparably fast. Because the start is slightly offset with $w_2 = 10^{-3}$, GD does not get stuck at the saddle point; instead it makes a sharp turn close to it and then proceeds (comparably slowly) to the global minimum (2,2).

1. Set the initial values such that GD ends in a saddle point.
2. Which initial values of $w_2$ let the GD path arrive at the local minimum (2,-1)?
3. With the given starting parameters and the plain gradient descent algorithm, is there any chance that the GD path finds its way to the local minima (-1,-1) or (-1,2)?
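
To make the turn near the saddle point more tangible, the following sketch tracks the distance of the GD path to the saddle point (2,0) for the parameters given above. It is only an illustration under the assumption that the loss is the separable polynomial used throughout this notebook. The names `saddle`, `dist` and `i_closest` are hypothetical.

```python
import numpy as np

w = np.array([3.0, 1e-3])  # same starting point as given in the text
step_size = 1e-2
N = 2**10
saddle = np.array([2.0, 0.0])

dist = np.zeros(N)
for i in range(N):
    # separable gradient of J: dJ/dw_i = w_i**3 - w_i**2 - 2*w_i
    w = w - step_size * (w**3 - w**2 - 2 * w)
    dist[i] = np.linalg.norm(w - saddle)

i_closest = int(np.argmin(dist))
print(f'closest approach to (2,0): {dist[i_closest]:.4f} at iteration {i_closest}')
print(f'final point after {N} iterations: {w}')
```

Plotting `dist` over the iteration index (for example with `plt.semilogy(dist)`) shows the fast approach towards the saddle point followed by the slow departure from it.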

In [None]:
w_act = np.array([[3], [0+1e-3]])
step_size = 1e-2
N = 2**10

# gradient descent
w1w2J = np.zeros([3, N])
for i in range(N):
    # calc gradient
    grad_J_to_w = np.array([[w_act[0, 0]**3 - w_act[0, 0]**2 - 2*w_act[0, 0]],
                            [w_act[1, 0]**3 - w_act[1, 0]**2 - 2*w_act[1, 0]]])
    # GD update
    w_act = w_act - step_size * grad_J_to_w
    # calc cost with current weights
    J_tmp = (w_act[0, 0]**4+w_act[1, 0]**4)/4 -\
        (w_act[0, 0]**3 + w_act[1, 0]**3)/3 -\
        w_act[0, 0]**2 - w_act[1, 0]**2 + 4
    # store the path for plotting
    w1w2J[0:2, i] = np.squeeze(w_act)
    w1w2J[2, i] = J_tmp
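
The loop above always runs for a fixed number `N` of iterations. To experiment with the starting point, the learning rate and a stop criterion, the same update can be wrapped into a function. This is a minimal sketch, not the notebook's reference implementation: the names `grad_loss` and `gradient_descent` and the gradient-norm threshold `grad_tol` are hypothetical additions.

```python
import numpy as np


def grad_loss(w):
    """Gradient of J for w = [w1, w2]; the loss is separable per axis."""
    return w**3 - w**2 - 2 * w


def gradient_descent(w_init, step_size=1e-2, max_iter=2**10, grad_tol=None):
    """Plain GD; stops after max_iter steps or once ||grad|| < grad_tol.

    grad_tol is a hypothetical stop criterion (None disables it).
    Returns the path with shape (n_steps + 1, 2), including the start.
    """
    w = np.asarray(w_init, dtype=float)
    path = [w.copy()]
    for _ in range(max_iter):
        g = grad_loss(w)
        if grad_tol is not None and np.linalg.norm(g) < grad_tol:
            break  # gradient is (almost) zero: stationary point reached
        w = w - step_size * g
        path.append(w.copy())
    return np.array(path)


# same start as above, and a second start in the lower left corner
path_a = gradient_descent([3.0, 1e-3], grad_tol=1e-6, max_iter=2**12)
path_b = gradient_descent([-2.0, -2.0], grad_tol=1e-6, max_iter=2**12)
print('end point:', path_a[-1], 'after', len(path_a) - 1, 'steps')
print('end point:', path_b[-1], 'after', len(path_b) - 1, 'steps')
```

Note that a small gradient norm only indicates a stationary point. It cannot distinguish a minimum from a saddle point, which is worth keeping in mind when choosing the starting point.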

## Plot Loss Surface and Gradient Descent Path

In [None]:
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
surf = ax.plot_surface(W1, W2, J,
                       cmap=cm.magma_r,
                       rstride=10, cstride=10,
                       linewidth=0, antialiased=False)
ax.plot(w1w2J[0,:], w1w2J[1,:], w1w2J[2,:],
        'C0x-', ms=1, zorder=3)
ax.set_zlim(-2, 10)
ax.set_xlabel(r'$w_1$')
ax.set_ylabel(r'$w_2$')
ax.set_zlabel(r'$J(w_1,w_2)$')
ax.view_init(elev=65, azim=-135, roll=0)
fig.colorbar(surf, shrink=0.67, aspect=20)

w1w2J[:,-1]

## Copyright

- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)
- feel free to use the notebooks for your own purposes
- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)
- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)
- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year.