A Gaussian kernel is given by: $$ \langle x^{(i)}, x^{(j)} \rangle = \exp\left(-\frac{\| x^{(i)} - x^{(j)} \|^2}{\gamma}\right) $$

Since $x$ is one-dimensional, this simplifies to: $$ \langle x^{(i)}, x^{(j)} \rangle = \exp\left(-\frac{( x^{(i)} - x^{(j)})^2}{\gamma}\right) $$

In [None]:
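import numpy as np
import matplotlib.pyplot as plt


def kernel_function(x, z):
    # Gaussian kernel with a fixed bandwidth parameter gamma.
    # np.sum makes the result a scalar even when x and z are length-1 arrays.
    gamma = 100
    return np.exp(-np.sum((x - z) ** 2) / gamma)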

To avoid recomputing kernel values during training, we can precompute a matrix $K$ containing $ \langle x^{(i)}, x^{(j)} \rangle $ for every pair $ x^{(i)}, x^{(j)} $ in the training set.

In [None]:
def generate_kernel_matrix(x):
    # generate the n x n kernel matrix, where n is the number of training points
    n = x.shape[0]
    k = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            k[i, j] = kernel_function(x[i, :], x[j, :])
    return k
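
The double loop above can also be expressed with NumPy broadcasting. The following is a minimal sketch, not part of the original notebook: the name `generate_kernel_matrix_vectorized` and the `gamma` keyword argument are assumptions made for illustration, and `x` is assumed to have shape `(n, d)`.

```python
import numpy as np


def generate_kernel_matrix_vectorized(x, gamma=100):
    # Hypothetical helper: x has shape (n, d), so the pairwise
    # differences below have shape (n, n, d).
    diff = x[:, None, :] - x[None, :, :]
    # Squared Euclidean distance between every pair of points, shape (n, n).
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / gamma)


x_demo = np.array([[0.0], [5.0], [10.0]])
k_demo = generate_kernel_matrix_vectorized(x_demo)
```

The resulting matrix is symmetric with ones on the diagonal, since every point has zero distance to itself.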

The main function `train()` fits the regression curve and plots it.

In [None]:
def train(x, y, learning_rate, steps):
    beta = np.zeros((x.shape[0], 1))
    kernel_matrix = generate_kernel_matrix(x)
    labels = y.reshape(len(y), 1)

For 'normal' linear regression, the update rule would be:
$$ \Theta \ += \alpha \sum_{i=1}^{n} (y^{(i)} - \Theta^T x^{(i)})  \ x^{(i)} $$
or
$$ \Theta  \ += \alpha \sum_{i=1}^{n} (y^{(i)} - \Theta^T \phi(x^{(i)}))  \ \phi(x^{(i)}) $$
where $\phi(x)$ is a feature mapping.

A kernel $ \langle x, z \rangle $ is defined as $ \phi(x)^T \phi(z) $

Assume that $ \Theta $ can be written as a linear combination of the mapped training points:
$$ \Theta = \sum_{i=1}^{n} \beta _i \phi(x^{(i)}) $$

Then the hypothesis can be rewritten as:

$$ \Theta^T \phi(x) = \sum_{i=1}^{n} \beta _i \phi(x^{(i)})^T \phi(x) = \sum_{i=1}^{n} \beta _i \langle x^{(i)}, x \rangle$$
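
As an illustration of this rewritten hypothesis, a prediction for a new point needs only the coefficients $ \beta $ and the kernel values against the training points. The sketch below is an assumption-laden example, not the notebook's own code: the `predict` helper and its `gamma` argument are hypothetical names.

```python
import numpy as np


def predict(x_train, beta, x_new, gamma=100):
    # Hypothetical helper. x_train has shape (n, d), x_new has shape (d,).
    # Kernel value between x_new and every training point, shape (n,).
    k = np.exp(-np.sum((x_train - x_new) ** 2, axis=1) / gamma)
    # Hypothesis: sum_i beta_i * <x^(i), x_new>
    return float(k @ beta.ravel())


x_train = np.array([[0.0], [10.0]])
beta = np.array([[1.0], [2.0]])
pred = predict(x_train, beta, np.array([0.0]))
```

Here the new point coincides with the first training point, so its kernel value is 1, and the second training point contributes with weight $ \exp(-1) $.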

So if we take $ \beta $ as the parameters instead of $ \Theta $, the update rule can be rewritten as:
$$ \beta_j \ += \alpha (y^{(j)} - \sum_{i=1}^{n} \beta _i \langle x^{(j)}, x^{(i)} \rangle ) $$
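
One way to see this, sketched under the assumption that $ \Theta = \sum_{i=1}^{n} \beta_i \phi(x^{(i)}) $ holds before the update, is to substitute that expression into the feature-mapped update rule (with the sum index renamed to $ j $):

$$ \sum_{j=1}^{n} \beta_j \phi(x^{(j)}) \ += \alpha \sum_{j=1}^{n} \Big( y^{(j)} - \sum_{i=1}^{n} \beta_i \langle x^{(i)}, x^{(j)} \rangle \Big) \ \phi(x^{(j)}) $$

Matching the coefficient of each $ \phi(x^{(j)}) $ on both sides gives the update for $ \beta_j $, using the symmetry $ \langle x^{(i)}, x^{(j)} \rangle = \langle x^{(j)}, x^{(i)} \rangle $. The updated $ \Theta $ is again a linear combination of the $ \phi(x^{(i)}) $, so the assumption continues to hold after every step.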

Or, using the kernel matrix $ K $ (defined above), in vectorized form:

$$\beta \ +=  \ \alpha (y - K\beta) $$
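
As a quick sanity check that the matrix form matches the per-coordinate rule, the sketch below applies one step of each to the same random starting point. The data, the seed, and all variable names are made up for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
alpha = 0.1
gamma = 100

x = rng.uniform(0, 20, size=(n, 1))   # illustrative one-dimensional inputs
y = rng.normal(size=(n, 1))           # illustrative targets
K = np.exp(-(x - x.T) ** 2 / gamma)   # (n, 1) - (1, n) broadcasts to (n, n)
beta = rng.normal(size=(n, 1))        # arbitrary starting coefficients

# Per-coordinate rule: every beta_j is updated from the *old* beta.
beta_loop = beta.copy()
for j in range(n):
    s = sum(beta[i, 0] * K[j, i] for i in range(n))
    beta_loop[j, 0] += alpha * (y[j, 0] - s)

# Vectorized rule: beta += alpha * (y - K beta)
beta_vec = beta + alpha * (y - K @ beta)
```

Both `beta_loop` and `beta_vec` should agree up to floating-point rounding, which can be confirmed with `np.allclose(beta_loop, beta_vec)`.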

In [None]:
    for i in range(steps):
        beta += learning_rate * (labels - kernel_matrix.dot(beta))

Finally, plot the training data together with the fitted curve.

In [1]:
    plt.scatter(x, y)

    axes = plt.gca()
    (x_min, x_max) = axes.get_xlim()

    y_a = np.empty(0)
    x_a = range(int(x_min), int(x_max))
    # Theta.transpose * feature_mapping(x) = sum beta_i * K(x_i, x)
    for i in x_a:
        s = 0
        for j in range(x.shape[0]):
            s += beta[j] * kernel_function(x[j], i)
        y_a = np.append(y_a, s)

    # plot approximated resulting curve as straight lines between segments
    for i in range(len(y_a) - 1):
        plt.plot([x_a[i], x_a[i]+1], [y_a[i], y_a[i+1]], color='r')

    plt.show()


![plot](plot.png)
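
The training steps above can be tried without plotting. The following is a hedged, self-contained sketch: `fit_beta`, the synthetic sine data, and the learning-rate choice are assumptions for illustration and are not taken from the original notebook. Because every entry of $ K $ is at most 1, its largest eigenvalue is at most $ n $, so a learning rate of $ 1/n $ is a conservative choice that keeps the update stable.

```python
import numpy as np


def fit_beta(x, y, learning_rate, steps, gamma=100):
    # Hypothetical non-plotting variant of train(); x has shape (n, 1).
    K = np.exp(-(x - x.T) ** 2 / gamma)   # n x n Gaussian kernel matrix
    labels = y.reshape(-1, 1)
    beta = np.zeros((x.shape[0], 1))
    for _ in range(steps):
        beta += learning_rate * (labels - K.dot(beta))
    return beta, K


# Synthetic data, for illustration only.
x = np.linspace(0, 100, 50).reshape(-1, 1)
y = np.sin(x[:, 0] / 10.0)

beta, K = fit_beta(x, y, learning_rate=1.0 / x.shape[0], steps=500)

# Residual norm before training (beta = 0) and after training.
initial_error = np.linalg.norm(y)
final_error = np.linalg.norm(y.reshape(-1, 1) - K.dot(beta))
```

Comparing `final_error` with `initial_error` gives a rough indication of whether the chosen learning rate and number of steps were enough for the fit to make progress.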