# Regression

Regression is a statistical technique used to model the relationship between a dependent variable and one or more independent variables. The aim of regression analysis is to find a mathematical equation that can predict the value of the dependent variable based on the values of the independent variables.

## Linear Regression

Linear regression is a type of regression analysis where the relationship between the dependent variable and the independent variable(s) is assumed to be linear. The linear regression model assumes that the dependent variable y is a linear function of one or more independent variables x, plus some random error ε:

$$y = w_0 + w_1x_1 + w_2x_2 + ... + w_nx_n + ε$$

where $w_0$ is the intercept or constant term, $w_1$, $w_2$, ..., $w_n$ are the coefficients or slopes corresponding to the independent variables $x_1$, $x_2$, ..., $x_n$, and ε represents the random error term.
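
To make the notation concrete, here is a minimal sketch that simulates data from this model with two independent variables. The coefficient values, noise level, and sample size are illustrative assumptions, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Assumed (illustrative) parameters: intercept w_0 and slopes w_1, w_2
w0 = 1.0
w = np.array([2.0, -0.5])

# 200 observations of the two independent variables x_1 and x_2
X = rng.uniform(0, 10, size=(200, 2))

# Random error term epsilon, drawn here from a normal distribution
eps = rng.normal(0, 1.0, size=200)

# y = w_0 + w_1*x_1 + w_2*x_2 + epsilon
y = w0 + X @ w + eps

print(X.shape, y.shape)
```

Writing the weighted sum as the matrix product `X @ w` keeps the code identical for any number of independent variables.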

The goal of linear regression is to estimate the values of the coefficients $w_0$, $w_1$, $w_2$, ..., $w_n$ that minimize the sum of squared errors between the predicted and the actual values of y in the training data. In other words, we want to find the line (or, with more than one independent variable, the plane or hyperplane) that best fits the data.

To estimate the coefficients, we use a technique called ordinary least squares (OLS) regression. OLS regression finds the values of the coefficients that minimize the sum of squared errors, also called the cost function, where the sum runs over all training examples:

$$E = \sum (y - w_0 - w_1x_1 - w_2x_2 - \dots - w_nx_n)^2$$
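
As a rough illustration of this cost function, the sketch below evaluates E for two candidate sets of coefficients. The helper name `sum_squared_errors` and the tiny data set are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

def sum_squared_errors(X, y, w0, w):
    """Sum of squared errors E for intercept w0 and coefficient vector w."""
    residuals = y - (w0 + X @ w)
    return float(np.sum(residuals ** 2))

# Tiny hand-made data set (hypothetical values) with one independent variable
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # exactly y = 1 + 2x

print(sum_squared_errors(X, y, 1.0, np.array([2.0])))  # 0.0, a perfect fit
print(sum_squared_errors(X, y, 0.0, np.array([2.0])))  # 4.0, every residual is 1
```

OLS searches for the coefficients that make this number as small as possible.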

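We can find the values of the coefficients that minimize this expression using calculus, by setting the partial derivative of E with respect to each coefficient to zero. For simple linear regression, with a single independent variable x, the resulting equations for the coefficients are: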

$$w_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$

$$w_0 = \bar{y} - w_1\bar{x}$$

where $x_i$ and $y_i$ are the observed values of the independent and dependent variables, respectively, and $\bar{x}$ and $\bar{y}$ are their mean values.
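
For reference, here is a short sketch of how these two formulas arise for a single independent variable, with sums taken over the training examples $i$. Setting the partial derivative with respect to $w_0$ to zero gives the intercept:

$$\frac{\partial E}{\partial w_0} = -2\sum_i (y_i - w_0 - w_1x_i) = 0 \quad\Rightarrow\quad w_0 = \bar{y} - w_1\bar{x}$$

Setting the partial derivative with respect to $w_1$ to zero and substituting this $w_0$ gives:

$$\frac{\partial E}{\partial w_1} = -2\sum_i x_i(y_i - w_0 - w_1x_i) = 0 \quad\Rightarrow\quad \sum_i x_i\left[(y_i - \bar{y}) - w_1(x_i - \bar{x})\right] = 0$$

Because $\sum_i \left[(y_i - \bar{y}) - w_1(x_i - \bar{x})\right] = 0$, the factor $x_i$ can be replaced by $(x_i - \bar{x})$ without changing the sum, so:

$$\sum_i (x_i - \bar{x})(y_i - \bar{y}) = w_1\sum_i (x_i - \bar{x})^2$$

Dividing both sides by $\sum_i (x_i - \bar{x})^2$ yields the expression for $w_1$.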

Once we have estimated the coefficients, we can use them to predict the value of the dependent variable y for new values of the independent variable(s) x.
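
The estimation and prediction steps can be sketched directly from the closed-form formulas. The function name `fit_simple_ols` and the data values below are hypothetical. The final lines cross-check the result against NumPy's `np.polyfit`, which also performs a least-squares fit.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form OLS estimates (w0, w1) for one independent variable."""
    x_mean, y_mean = np.mean(x), np.mean(y)
    w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    w0 = y_mean - w1 * x_mean
    return w0, w1

# Hypothetical training data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

w0, w1 = fit_simple_ols(x, y)

# Predict y for new, unseen values of x
x_new = np.array([6.0, 7.0])
y_new = w0 + w1 * x_new
print(w0, w1, y_new)

# Cross-check: a degree-1 polynomial fit returns (slope, intercept)
slope_np, intercept_np = np.polyfit(x, y, 1)
print(slope_np, intercept_np)
```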

In [3]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interactive, FloatSlider
from scipy import stats

# Generate some random data
#np.random.seed(0)
x_l = np.linspace(0, 10, 100)
y_l = 2 * x_l + 1 + np.random.normal(0, 2, size=100)

# Closed-form least-squares fit from SciPy, used as a reference
slope, intercept, r, p, std_err = stats.linregress(x_l, y_l)

# Define the linear regression model
def linear_regression(x, w, b):
    return w * x + b

# Define the R^2 function
def r_squared(y_pred, y_true):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - (ss_res / ss_tot)

# Define the gradient of the loss 0.5 * mean((y_pred - y_true)^2) with respect to w and b
def gradient(x, y_true, y_pred):
    dw = np.mean((y_pred - y_true) * x)
    db = np.mean(y_pred - y_true)
    return dw, db

# Define the training function
def train(x, y, learning_rate, num_iterations):
    w = 0
    b = 0
    r2_values = []
    for i in range(num_iterations):
        y_pred = linear_regression(x, w, b)
        dw, db = gradient(x, y, y_pred)
        w -= learning_rate * dw
        b -= learning_rate * db
        r2 = r_squared(y_pred, y)
        r2_values.append(r2)
    return w, b, r2_values

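# Train the model with gradient descent
# learning_rate must stay below roughly 0.058 for this data (x in [0, 10]),
# otherwise the updates diverge
w, b, r2_values = train(x_l, y_l, learning_rate=0.05, num_iterations=1000)

# Print the reference slope and intercept computed by stats.linregress
print(slope,intercept)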

# Define a function to visualize the linear regression function for different values of w and b
def plot_regression(w, b):
    y_pred = linear_regression(x_l, w, b)
    r2 = r_squared(y_pred, y_l)
    plt.scatter(x_l, y_l, alpha=0.5)
    plt.plot(x_l, linear_regression(x_l, w, b), color='red')
    plt.title('Linear Regression')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.text(1.5, 20, "R^2 = {:.2f}".format(r2), fontsize=12)
    plt.show()

# Create an interactive panel to explore the linear regression function
interactive_plot = interactive(plot_regression, w=FloatSlider(min=-2, max=6, step=0.01, value=1), b=FloatSlider(min=-2, max=6, step=0.01, value=0.5))
output = interactive_plot.children[-1]
output.layout.height = '500px'
interactive_plot

2.047074696174265 0.49472725880171176



## Nonlinear Regression
We have seen how a linear function can be used in regression to model our data, but a linear model is not the right choice for every dataset. Another family of models is called nonlinear regression. The difference is that, instead of using $y = w_0 + w_1x_1 + w_2x_2 + ... + w_nx_n + ε$ as the model function, we use a function that is nonlinear in x, such as the polynomial:

$$y = w_0 + w_1x^1 + w_2x^2 + \dots + w_nx^n$$

to find the best relationship between x and y. Note that this polynomial model is nonlinear in x but still linear in the coefficients $w_0$, ..., $w_n$, so it can be fitted with the same least-squares approach. The corresponding cost function is:

$$E = \sum (y - w_0 - w_1x^1 - w_2x^2 - \dots - w_nx^n)^2$$

As in linear regression, our goal is to minimize the cost so that the model fits the data well.
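
A minimal sketch of minimizing this cost for a polynomial model follows. It assumes NumPy's `np.vander` to build the powers of x and `np.linalg.lstsq` to solve the least-squares problem. The helper names and the noise-free sample data are hypothetical.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares coefficients [w_0, ..., w_n] for a degree-n polynomial."""
    # Columns of X are x^0, x^1, ..., x^degree
    X = np.vander(x, degree + 1, increasing=True)
    # lstsq finds the w that minimizes the sum of squared errors ||X w - y||^2
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def polynomial_cost(x, y, w):
    """Sum of squared errors E for polynomial coefficients w."""
    X = np.vander(x, len(w), increasing=True)
    return float(np.sum((y - X @ w) ** 2))

# Hypothetical noise-free data generated from y = 1 - 2x + 0.5x^2
x = np.linspace(-3, 3, 50)
y = 1 - 2 * x + 0.5 * x ** 2

w = fit_polynomial(x, y, degree=2)
print(np.round(w, 3))            # close to [1, -2, 0.5]
print(polynomial_cost(x, y, w))  # close to 0 for noise-free data
```

With noisy data the minimized cost stays above zero, and the fitted coefficients only approximate the ones that generated the data.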
The interactive panel below allows students to explore polynomial regression of different degrees. The lowest degree available on the slider, 2, corresponds to the function: $$y = w_1x^2 + w_2x + b$$

### Interactive Panel
Students can adjust the value of n (the polynomial degree) using the slider and see the resulting regression function and R-squared value in real time. The R-squared value is a measure of how well the fitted model explains the variability in the data, with values closer to 1 indicating a better fit.
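
For readers without a live notebook, the effect of the degree on R-squared can be approximated with a non-interactive sketch. It assumes data generated the same way as in the panel (a noisy sine curve) but with a fixed seed, and it uses `np.polynomial.polynomial.polyfit` and `polyval` instead of the panel's own fitting function.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def r_squared(y_pred, y_true):
    """Fraction of the variance in y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(0)  # fixed seed (an assumption, for reproducibility)
x = np.linspace(-5, 5, 500)
y = 2 * np.sin(x) + rng.normal(0, 0.5, size=500)

r2_by_degree = {}
for degree in (1, 3, 5, 7):
    coeffs = P.polyfit(x, y, degree)  # coefficients ordered w_0, ..., w_n
    y_pred = P.polyval(x, coeffs)     # evaluate the fitted polynomial
    r2_by_degree[degree] = r_squared(y_pred, y)
    print(degree, round(r2_by_degree[degree], 3))
```

On the training data, R-squared cannot decrease as the degree grows, because each higher-degree polynomial contains the lower-degree ones as special cases. A high R-squared on the training data alone therefore does not show that the model will predict new data well.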

To use the interactive panel, students can adjust the slider `degree` and observe how the regression function changes to fit the data. They can also observe how the R-squared value changes as they adjust the degree.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interactive, FloatSlider

# Generate sample data
x_nl = np.linspace(-5, 5, 500)
y_nl = 2*np.sin(x_nl) + np.random.normal(0, 0.5, 500)

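# Define nonlinear (polynomial) regression function
def nonlinear_regression(x, y, degree):
    # Design matrix with columns x^0, x^1, ..., x^degree
    X = np.vander(x, degree + 1, increasing=True)
    # Solve the least-squares problem directly; explicitly inverting X^T X
    # is numerically unstable for high degrees
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta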

# Plot data and regression curve (r_squared is defined in the linear regression cell above)
def plot_regression(degree):
    degree = int(degree)
    beta = nonlinear_regression(x_nl, y_nl, degree)
    y_pred = np.zeros(len(x_nl))
    for i in range(degree+1):
        y_pred += beta[i]*x_nl**i
    plt.figure(figsize=(8,6))
    plt.scatter(x_nl, y_nl, s=10, label='Data')
    plt.plot(x_nl, y_pred, color='r', label='Regression')
    plt.text(0.2, 0.8, f'R-squared: {r_squared(y_pred, y_nl):.2f}', transform=plt.gca().transAxes)
    plt.legend()
    plt.show()

# Interactive panel
interactive_plot = interactive(plot_regression, degree=FloatSlider(min=2, max=20, step=1, value=3))
output = interactive_plot.children[-1]
output.layout.height = '500px'
interactive_plot
