## How KNN Works as a Regressor

In classification, KNN determines the class of a point by majority voting among its k-nearest neighbors. In regression, the approach is similar, but instead of voting, it averages the target values of the k-nearest neighbors.

1. Calculate distances: Compute the distance (e.g., Euclidean distance) between the query point and all points in the training data.
2. Identify k-nearest neighbors: Select the k points that are closest to the query point.
3. Aggregate outputs: Take the average of the target values of these neighbors to predict the output.
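
The three steps above can be sketched in plain Python. This is a minimal illustration rather than how any library implements KNN: the function name `knn_regress`, the single-feature restriction, the tie-breaking rule, and the toy experience/salary data are all assumptions made for the example.

```python
def knn_regress(x_train, y_train, x_query, k=3):
    """Predict a target for x_query by averaging its k nearest neighbors (1-D feature)."""
    # Step 1: distance from the query to every training point.
    # With a single feature, Euclidean distance reduces to an absolute difference.
    distances = [abs(x - x_query) for x in x_train]

    # Step 2: indices of the k smallest distances.
    # sorted() is stable, so ties are broken by training order (a choice of this sketch).
    nearest = sorted(range(len(x_train)), key=lambda i: distances[i])[:k]

    # Step 3: average the neighbors' target values.
    return sum(y_train[i] for i in nearest) / k


# Toy data, assumed for illustration: years of experience and salary in $1000s.
experience = [1, 2, 3, 4, 5]
salary = [30, 40, 45, 50, 55]

print(knn_regress(experience, salary, 3.5, k=3))  # 45.0
```

For a query of 3.5, the points at 3 and 4 years are the closest, and the points at 2 and 5 years tie for third place; this sketch keeps the one that appears first in the training data.
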

Example:

Suppose you want to predict the salary of an employee with 3.5 years of experience. Using KNN with k=3, the algorithm finds the three closest data points:

*  Neighbor 1: 3 years of experience → $45,000 
*  Neighbor 2: 4 years of experience → $50,000
*  Neighbor 3: 3.8 years of experience → $48,000

Prediction = ($45,000 + $50,000 + $48,000) / 3 ≈ $47,666.67

In [8]:
from sklearn.neighbors import KNeighborsRegressor

X = [[1], [2], [3], [4], [5]]  # Experience in years
y = [30, 40, 45, 50, 55]  # Salary in $1000s

# Initialize and fit the KNN regressor
knn_regressor = KNeighborsRegressor(n_neighbors=3)
knn_regressor.fit(X, y)

# Predict salary for 3.5 years of experience
print(knn_regressor.predict([[3.5]]))

[45.]


## Support Vector Machine (SVM) Regression
Working:

SVM for regression, known as Support Vector Regression (SVR), predicts a continuous output by fitting a function (a hyperplane in the feature space) so that as many training points as possible lie within a margin of tolerance around it.

1. Define margin $\epsilon$: Identify a margin of tolerance where predictions within this margin are not penalized.
2. Find optimal hyperplane: Fit the flattest function that keeps as many training points as possible inside the margin. The support vectors, the points lying on or outside the margin boundary, determine the fit.
3. Kernel trick: Extend to non-linear data using kernel functions such as RBF or polynomial.

For example, if a point lies within the $\epsilon$-tube, no penalty is applied. Points outside the tube contribute to the error in proportion to how far outside they fall.
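
The penalty rule described above is commonly written as the epsilon-insensitive loss. The sketch below shows it for a single prediction; the function name `epsilon_insensitive_loss` and the sample values are assumptions for illustration, and the code covers only the per-point penalty, not the full SVR optimization.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Penalty for one prediction: zero inside the tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical values: one prediction inside the tube, one outside it.
inside = epsilon_insensitive_loss(3.0, 3.05)  # error of 0.05 is within epsilon
outside = epsilon_insensitive_loss(3.0, 3.5)  # error of 0.5 exceeds epsilon

print(inside, outside)
```

Only the part of the error that exceeds `epsilon` is penalized, which is why small deviations leave the fitted function unchanged.
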

In [12]:
from sklearn.svm import SVR

X = [[1], [2], [3], [4], [5]]
y = [2.1, 2.9, 3.7, 4.5, 5.3]

# Initialize and fit SVR
svr = SVR(kernel='rbf', C=1, epsilon=0.1)
svr.fit(X, y)

# Predict
print(svr.predict([[2.5]]))

[3.35181503]


## Decision Tree Regression
Working:

Decision Tree Regression splits the data into smaller subsets based on feature values to minimize the variance in each subset.

1. Choose the best split: Evaluate all possible splits and choose the one that reduces the variance in target values the most.
2. Create branches: Split the data into branches based on the selected feature value.
3. Repeat recursively: Continue splitting until a stopping condition is met (e.g., maximum depth or minimum samples per leaf).

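For instance, if splitting on experience at 2.5 years reduces the variance the most, the data is divided into two branches (experience ≤ 2.5 and experience > 2.5), and the prediction for each branch is the average of the training targets that fall into it.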
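
The search for the best split can be sketched in plain Python as follows. This is a simplified illustration for a single feature, not the algorithm used by any particular library: the helper names `sse` and `best_split` are hypothetical, and the data reuses the experience/salary toy values from the KNN section. Minimizing the summed squared error of the two branches is equivalent to maximizing the variance reduction.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (variance times the sample count)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the split that minimizes within-branch SSE."""
    best_threshold, best_sse = None, float("inf")
    points = sorted(set(x))
    # Candidate thresholds: midpoints between consecutive distinct feature values.
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_threshold, best_sse = threshold, total
    return best_threshold, best_sse


experience = [1, 2, 3, 4, 5]      # years
salary = [30, 40, 45, 50, 55]     # in $1000s

threshold, total = best_split(experience, salary)
print(threshold)  # 2.5

# Each branch predicts the mean target of the training points it contains.
left_prediction = mean(s for e, s in zip(experience, salary) if e <= threshold)
right_prediction = mean(s for e, s in zip(experience, salary) if e > threshold)
print(left_prediction, right_prediction)
```

A full tree would apply `best_split` again inside each branch until a stopping condition such as a maximum depth is reached.
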

In [18]:
from sklearn.tree import DecisionTreeRegressor

X = [[1], [2], [3], [4], [5]]
y = [2.1, 2.9, 3.7, 4.5, 5.3]

# Initialize and fit the regressor
dt_regressor = DecisionTreeRegressor(max_depth=3)
dt_regressor.fit(X, y)

# Predict
print(dt_regressor.predict([[2.5]]))

[2.9]


## Random Forest Regression
Working:

Random Forest Regression combines predictions from multiple decision trees to improve accuracy and reduce overfitting.

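1. Build multiple trees: Train each decision tree on a different bootstrap sample of the training data (drawn with replacement), typically considering a random subset of features at each split.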
2. Aggregate predictions: Take the average of all tree predictions to get the final output.

For example, if three trees predict the values [3.0, 3.2, 3.5], the output is:

Prediction = (3.0 + 3.2 + 3.5) / 3 ≈ 3.23
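
The two steps above can be sketched in plain Python. This is a deliberately simplified illustration, not a full random forest: each "tree" is a depth-1 stump, random feature selection is omitted because the toy data has a single feature, and the names `fit_stump`, `fit_forest`, and `predict_forest` are hypothetical.

```python
import random
from statistics import mean


def fit_stump(x, y):
    """Fit a depth-1 regression tree: one threshold and a mean prediction per side."""
    best = None  # (total squared error, threshold, left mean, right mean)
    points = sorted(set(x))
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((v - left_mean) ** 2 for v in left)
                 + sum((v - right_mean) ** 2 for v in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # The bootstrap sample held a single distinct x value: predict its mean target.
        constant = mean(y)
        return lambda query: constant
    _, threshold, left_mean, right_mean = best
    return lambda query: left_mean if query <= threshold else right_mean


def fit_forest(x, y, n_trees=10, seed=42):
    """Step 1: train each tree on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(x)) for _ in range(len(x))]
        trees.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return trees


def predict_forest(trees, query):
    """Step 2: average the individual tree predictions."""
    return mean(tree(query) for tree in trees)


x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 3.7, 4.5, 5.3]

forest = fit_forest(x, y)
print(predict_forest(forest, 2.5))

# The averaging step on its own, using the three tree outputs from the example above.
print(round(mean([3.0, 3.2, 3.5]), 2))
```

Because every tree sees a slightly different sample, the individual predictions differ, and averaging them smooths out the errors of any single tree.
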

In [23]:
from sklearn.ensemble import RandomForestRegressor

X = [[1], [2], [3], [4], [5]]
y = [2.1, 2.9, 3.7, 4.5, 5.3]

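# Initialize and fit a forest of 10 trees (random_state fixes the seed for reproducibility)
rf_regressor = RandomForestRegressor(n_estimators=10, random_state=42)
rf_regressor.fit(X, y)

# Predict
print(rf_regressor.predict([[2.5]]))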

[3.06]
