# Radial Basis Function (RBF) Kernel

This process lifts the points off the line onto a "mountain range," splits them there with a single cut, and then moves them back to their original positions, where that one cut turns into multiple cut points.

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)

# The technique

Build a "mountain" on top of every point. The technical term for these mountains is radial basis functions.
![image.png](attachment:image.png)
We can multiply the mountain over the red point by -1, which flips it upside down, and then add the three functions together.
![image-2.png](attachment:image-2.png)
At every point, just **add the three heights.** 
![image-3.png](attachment:image-3.png)
Next, move each point up or down to the height given by this sum.
![image-4.png](attachment:image-4.png)
After we do this, we can easily draw a horizontal line that splits the points in two.
![image-5.png](attachment:image-5.png)

How do we find these weights?

1. Place one mountain (function) on top of each point. Under each point, record the value of each function, that is, how tall each mountain is at that point.
2. Repeat this for every point and every function.
3. Each point's vector of heights will contain one value of 1, since the mountain centered on that point has height 1 there by construction. In general, the other values will be small.

![image-6.png](attachment:image-6.png)
![image-7.png](attachment:image-7.png)
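To make the height vectors concrete, here is a minimal sketch with NumPy. The point coordinates (0, 1, 2) and the width parameter `gamma=1.0` are hypothetical, since the figures don't give exact values, and the mountain is assumed to be the standard RBF shape $e^{-\gamma (x - c)^2}$, which has height exactly 1 at its center $c$.

```python
import numpy as np

def rbf(x, center, gamma=1.0):
    """One 'mountain': height 1 at its center, decaying with squared distance."""
    return np.exp(-gamma * (x - center) ** 2)

# Hypothetical positions of the three points on the line.
points = np.array([0.0, 1.0, 2.0])

# Row i holds the heights of all three mountains measured under point i.
heights = np.array([[rbf(p, c) for c in points] for p in points])
print(np.round(heights, 3))
```

Each row has a 1 on the diagonal (the point's own mountain), and the other entries shrink as the points get farther apart.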

Take the three height vectors and plot them in a three-dimensional space.
![image-8.png](attachment:image-8.png)
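Since we have as many dimensions as points, we'll be able to separate our points well: for distinct points, these height vectors are linearly independent, so a plane can split them no matter how they are labeled.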
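A quick numerical check of this claim, as a sketch under assumptions: the coordinates, `gamma`, and the labels (+1 for blue, -1 for the red middle point) are hypothetical. Solving the linear system below is plain linear algebra used only to show that a separating plane exists; it is not the SVM, which would instead choose the maximum-margin plane.

```python
import numpy as np

points = np.array([0.0, 1.0, 2.0])   # hypothetical coordinates
labels = np.array([1.0, -1.0, 1.0])  # hypothetical: middle point red (-1), outer points blue (+1)
gamma = 1.0

# Height vectors: row i holds the heights of every mountain under point i.
H = np.exp(-gamma * (points[:, None] - points[None, :]) ** 2)

# For distinct points the Gaussian height matrix has full rank,
# so the three height vectors are linearly independent.
print(np.linalg.matrix_rank(H))  # 3

# Solve H @ w = labels: the plane w . v = 0 then puts every point on its labeled side.
w = np.linalg.solve(H, labels)
print(np.sign(H @ w))
```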

Applying the SVM algorithm we already know to these height vectors, we can separate the red and blue points with the plane $2x - 4y + 1z = -1$.
![image-9.png](attachment:image-9.png)
The coefficients of this plane's equation become the weights of our model.

The first mountain has a weight of 2, the second -4, and the third has a weight of 1. 
![image-10.png](attachment:image-10.png)
The line that separates these points is the horizontal line at height -1.
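Here is a minimal sketch of the resulting model. Only the weights (2, -4, 1) and the cut height (-1) come from the text; the point coordinates and `gamma=1.0` are hypothetical. The last lines scan the line for places where the weighted sum crosses -1, which recovers the multiple cut points on the original line.

```python
import numpy as np

def mountain(x, center, gamma=1.0):
    """Standard RBF bump: height 1 at `center`."""
    return np.exp(-gamma * (x - center) ** 2)

centers = np.array([0.0, 1.0, 2.0])   # hypothetical coordinates
weights = np.array([2.0, -4.0, 1.0])  # weights from the plane 2x - 4y + 1z = -1
threshold = -1.0                      # the cut height

def combined_height(x):
    """Weighted sum of the three mountains at position(s) x."""
    return sum(w * mountain(x, c) for w, c in zip(weights, centers))

for x in centers:
    side = "above" if combined_height(x) > threshold else "below"
    print(f"x={x}: height={combined_height(x):.3f} ({side} the cut at {threshold})")

# Back on the original line: find where the weighted sum crosses the cut height.
xs = np.linspace(-1.0, 3.0, 4001)
above = combined_height(xs) > threshold
cuts = xs[1:][above[1:] != above[:-1]]
print(np.round(cuts, 2))
```

With these assumed positions, the flipped middle point ends up below -1 and the outer points above it, so the single horizontal cut becomes two cut points on the line, one on each side of the middle point.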

![image-11.png](attachment:image-11.png)

# RBF Kernel in higher dimensions
Here, we still want to draw a mountain at every point.
![image-4.png](attachment:image-4.png)
When the points lie in a plane, each mountain is a two-dimensional Gaussian: a bell-shaped surface that lifts the points into three dimensions.

![image.png](attachment:image.png)
If we want to separate one point from the rest, we can cut its mountain with a horizontal plane.
![image-2.png](attachment:image-2.png)
The plane intersects the Gaussian surface in a circle, and this circle becomes our boundary.
![image-3.png](attachment:image-3.png)
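Why a circle: a radially symmetric bump $e^{-\gamma \lVert p - c \rVert^2}$ equals a cut height $h$ exactly when $\lVert p - c \rVert = \sqrt{\ln(1/h)/\gamma}$, which is a fixed radius around the center. The sketch below checks this numerically; `gamma`, the center, and the cut height `h` are hypothetical values.

```python
import numpy as np

gamma = 0.5                   # hypothetical width parameter
center = np.array([0.0, 0.0]) # hypothetical point location
h = 0.5                       # hypothetical height of the horizontal cutting plane

def mountain_2d(p):
    """2D Gaussian bump: height 1 at `center`, falling off with squared distance."""
    return np.exp(-gamma * np.sum((p - center) ** 2, axis=-1))

# exp(-gamma * r**2) = h  <=>  r = sqrt(ln(1/h) / gamma)
radius = np.sqrt(np.log(1.0 / h) / gamma)

# Sample points on the circle of that radius; they should all sit at height h.
angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
circle = center + radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
print(radius, mountain_2d(circle))
```

Points inside the circle are above the plane and points outside are below it, which is why projecting the cut down gives a circular boundary.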

With more points, we use a similar method to find the weights for the combination of mountains that lifts most of the red points up while keeping most of the blue points down. Then we cut the resulting surface with a plane. When we project down, the curves where the plane intersects the surface give us the boundaries that split our data.
![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
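A sketch of the multi-point case in 2D, under stated assumptions: the data and `gamma` are hypothetical, red points are labeled +1 and blue points -1, and the weights come from exact interpolation (solving a linear system so red points land at +1 and blue at -1). That is a simplification chosen for brevity; an SVM would pick the weights differently, but the "sum the mountains, then cut" structure is the same.

```python
import numpy as np

def rbf_matrix(A, B, gamma):
    """Heights of mountains centered at the rows of B, measured at the rows of A."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

# Hypothetical data: red (+1) points in the middle, blue (-1) points around them.
X = np.array([[0.0, 0.0], [0.5, 0.0],
              [2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, -1.0, -1.0])
gamma = 1.0

# Weights that lift red points to +1 and push blue points to -1 (exact interpolation, not SVM).
w = np.linalg.solve(rbf_matrix(X, X, gamma), y)

def surface(P):
    """Weighted sum of all mountains at the query points P."""
    return rbf_matrix(P, X, gamma) @ w

# Cutting the surface with the plane at height 0 and projecting down gives the boundary.
print(np.sign(surface(X)))
print(surface(np.array([[0.2, 0.1], [2.2, 0.0]])))
```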

# Gamma Parameter in RBF Kernel

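We can make the mountains very wide or very narrow.

Their width is controlled by a hyperparameter called the gamma parameter ($\gamma$). Because it is a hyperparameter, it is not learned from the data directly; we tune it, for example by comparing performance on validation data.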

- A large gamma gives a narrow curve
- A small gamma gives a wide curve
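To put numbers on "narrow" and "wide," this sketch uses the standard RBF shape $e^{-\gamma x^2}$ and computes how far from the center the curve drops to half its peak height. The gamma values are arbitrary examples.

```python
import numpy as np

def half_height_width(gamma):
    """Distance from the center at which exp(-gamma * x**2) drops to 1/2."""
    return np.sqrt(np.log(2.0) / gamma)

for gamma in [0.1, 1.0, 10.0]:
    print(f"gamma={gamma:>5}: half-height at |x| = {half_height_width(gamma):.3f}")
```

Multiplying gamma by 100 shrinks this distance by a factor of 10, since the width scales like $1/\sqrt{\gamma}$.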

![image.png](attachment:image.png)

The same holds in higher dimensions:

- A large gamma gives pointy mountains
- A small gamma gives wide mountains

![image-2.png](attachment:image-2.png)

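The choice of gamma matters a lot.

Large values of gamma tend to overfit: each narrow mountain only affects points very close to its center, so the boundary can wrap tightly around individual training points. Small values tend to underfit: the wide mountains overlap so much that the boundary becomes too smooth to follow the data.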
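One way to see both extremes is to look at the table of mountain heights under each point (the height vectors from earlier) for a very large and a very small gamma. The training points below are hypothetical.

```python
import numpy as np

points = np.array([0.0, 1.0, 2.0, 3.0])  # hypothetical 1D training points

def heights(gamma):
    """Row i: heights of every point's mountain measured under point i."""
    return np.exp(-gamma * (points[:, None] - points[None, :]) ** 2)

# Large gamma: each mountain is a narrow spike, so every point only "sees" itself.
print(np.round(heights(100.0), 3))
# Small gamma: the mountains are so wide that every point sees nearly the same heights.
print(np.round(heights(0.001), 3))
```

With large gamma the table is essentially the identity matrix, so the model can memorize each training point separately (overfitting). With tiny gamma every entry is close to 1, so the height vectors barely distinguish the points (underfitting).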

![image-3.png](attachment:image-3.png)

## What is $\gamma$?

Here's where we define what these radial basis functions are. We'll use the Gaussian, or normal, distribution for this. The Gaussian is a widely used function in statistics; in its standard form it has this formula:

$$ y=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$$

![image.png](attachment:image.png)

In the general case, where $\mu$ is the center of the curve and $\sigma$ (the standard deviation) controls its width, the formula is

$$ y=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{{(x - \mu)}^2}{2\sigma^2}}$$

and we have these rules of thumb:

- If $\sigma$ is large, then the curve is very wide
- If $\sigma$ is small, then the curve is very narrow

![image-2.png](attachment:image-2.png)

To define gamma, we rewrite the exponent $-\frac{(x - \mu)^2}{2\sigma^2}$ as $-\gamma (x - \mu)^2$, which means

$$ \gamma = \frac{1}{ 2\sigma^2}$$

Keep in mind:

- If $\gamma$ is large, then $\sigma$ is small, so the curve is narrow
- If $\gamma$ is small, then $\sigma$ is large, so the curve is wide
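A small check of the relationship $\gamma = \frac{1}{2\sigma^2}$: the sketch compares the normal density, rescaled so its peak is 1, with the RBF "mountain" $e^{-\gamma (x - \mu)^2}$. Dropping the $\frac{1}{\sigma\sqrt{2\pi}}$ factor is what gives each mountain height 1 at its own point. The value `sigma = 0.5` and the sample `x` values are arbitrary.

```python
import math

def gaussian_density(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def rbf_mountain(x, mu, gamma):
    """RBF 'mountain': the same bell shape with peak height 1."""
    return math.exp(-gamma * (x - mu) ** 2)

sigma = 0.5
gamma = 1 / (2 * sigma ** 2)  # gamma = 2.0 here

for x in [0.0, 0.3, 1.0]:
    # Undo the normalization constant so the density peaks at 1.
    scaled = gaussian_density(x, 0.0, sigma) * sigma * math.sqrt(2 * math.pi)
    print(x, scaled, rbf_mountain(x, 0.0, gamma))
```

The two columns agree at every `x`, confirming that the mountain is just a rescaled Gaussian with $\gamma$ standing in for $\frac{1}{2\sigma^2}$.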

![image.png](attachment:image.png)

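In the higher-dimensional case, the general Gaussian formula becomes a little more complicated. The RBF kernel, however, uses the radially symmetric version, in which the squared distance $(x - \mu)^2$ is replaced by the squared Euclidean distance: $e^{-\gamma \lVert x - \mu \rVert^2}$.

Either way, as long as we think of gamma as a parameter that is inversely related to the width of the curve, we have grasped the concept of the gamma parameter and the RBF kernel.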
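A minimal sketch of the RBF kernel for vectors of any dimension, using the form $e^{-\gamma \lVert x - z \rVert^2}$ (the same form scikit-learn documents for its RBF kernel). The two 3D vectors are arbitrary examples; as gamma grows, the kernel value between the same two vectors shrinks, matching the "narrower mountain" picture.

```python
import numpy as np

def rbf_kernel(x, z, gamma):
    """RBF kernel for vectors of any dimension: exp(-gamma * ||x - z||^2)."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))

x = [1.0, 2.0, 3.0]
z = [1.5, 2.0, 2.0]   # squared distance = 0.25 + 0 + 1 = 1.25
for gamma in [0.1, 1.0, 10.0]:
    print(gamma, rbf_kernel(x, z, gamma))
```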

![image.png](attachment:image.png)