A linear regression machine learning algorithm is a type of supervised learning algorithm that predicts a continuous outcome variable from one or more input predictor variables. Its goal is to find the line of best fit (a hyperplane, when there are several predictors) that describes the relationship between the predictor variables and the outcome variable. The fit is usually chosen to minimize the sum of squared differences between the predicted and observed outcomes.
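For a single predictor, the least-squares line of best fit has a closed form. The slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. A minimal sketch follows. The helper name `fit_line` and the toy data, which lie exactly on y = 2x + 1, are illustrative and not part of the original notebook:

```python
import numpy as np

def fit_line(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Least-squares slope and intercept for a single predictor (hypothetical helper)."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through (mean of x, mean of y)
    intercept = y_mean - slope * x_mean
    return float(slope), float(intercept)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
slope, intercept = fit_line(x, y)
print(slope, intercept)  # 2.0 1.0
```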

In [1]:
import numpy as np

class LinearRegression:
    def __init__(self):
        # Weights (bias first) are learned by fit(); None until the model is fitted
        self.weights = None
    
    def fit(self, X, y):
        # Add a column of ones to the input data to account for the bias term
        X = np.hstack([np.ones((X.shape[0], 1)), X])
        # Calculate the weights by solving the normal equation (X^T X) w = X^T y;
        # np.linalg.solve avoids explicitly inverting X^T X, which is less stable
        self.weights = np.linalg.solve(X.T @ X, X.T @ y)
    
    def predict(self, X):
        # Add a column of ones to the input data to account for the bias term
        X = np.hstack([np.ones((X.shape[0], 1)), X])
        # Calculate and return the predictions using the weights
        return X @ self.weights

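The `fit` method computes the weight vector $w$ from the normal equation $(X^\top X)\,w = X^\top y$. The first entry of $w$ is the bias. As a hedged cross-check, the sketch below solves the same least-squares problem two ways on synthetic data whose true weights are arbitrary. The first way applies `np.linalg.solve` to the normal equation. The second uses `np.linalg.lstsq`, which works directly on the design matrix and also copes with a singular $X^\top X$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 3))
# Prepend a column of ones so the first weight acts as the bias term
X_b = np.hstack([np.ones((X.shape[0], 1)), X])
# Noise-free labels from arbitrary illustrative weights [bias, w1, w2, w3]
y = X_b @ np.array([0.5, 1.0, -2.0, 3.0])

# Normal equation, solved without forming an explicit inverse
w_normal = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

# SVD-based least squares; lstsq also returns residuals, rank, and singular values
w_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(np.allclose(w_normal, w_lstsq))  # True
```

Forming $X^\top X$ squares the condition number of the problem. For ill-conditioned or rank-deficient data, `lstsq` is therefore generally the safer choice.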

To use this class, you would first create an instance of it:

In [2]:
regressor = LinearRegression()

The following code creates a sample of data to test the model:

In [3]:
# Generate some random input data
X = np.random.rand(100, 5)

# Generate labels as a linear function of the inputs: random weights plus one random constant bias (no noise)
y = X @ np.random.rand(5) + np.random.rand(1)

This sample data consists of 100 examples, each with 5 features. The output labels are an exact linear combination of the input features plus a random constant offset (the bias term). Because `np.random.rand(1)` is drawn only once, no per-example noise is added.

To evaluate the model, we split the data into a training set and a test set, fit the model on the training set, and then measure its performance on the test set. We use the mean squared error (MSE), which is given by the following formula:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_{\text{true},i} - y_{\text{pred},i}\right)^2$$

where *y_true* is the vector of true output labels, *y_pred* is the vector of predicted output labels, and *n* is the number of examples. A lower MSE indicates a better fit of the model to the data.
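In code, the formula becomes a mean over squared residuals. In Python, `^` is bitwise XOR, so exponentiation needs `**`. Below is a minimal sketch with a hypothetical helper name. scikit-learn provides a comparable `sklearn.metrics.mean_squared_error`.

```python
import numpy as np

def mean_squared_error(y_true, y_pred) -> float:
    """Average squared difference between true and predicted labels (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square each residual with ** (not ^) and average over the n examples
    return float(np.mean((y_true - y_pred) ** 2))

print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 1.3333333333333333
```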

Here is an example of how you could evaluate the performance of the linear regression model on this sample data:

In [4]:
from sklearn.model_selection import train_test_split

# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit the model on the training set
regressor.fit(X_train, y_train)

# Evaluate the performance of the model on the test set
y_pred = regressor.predict(X_test)
mse = np.mean((y_test - y_pred)**2)
print("Mean squared error:", mse)

Mean squared error: 2.953298013921163e-30


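This code splits the sample data into a training set and a test set, trains the linear regression model on the training set, and then evaluates it on the test set using the MSE. The labels were generated without noise, so the model can recover the generating weights and bias exactly. The reported MSE is therefore essentially zero, on the order of floating-point round-off.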
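The near-zero MSE comes from noise-free labels. It does not show how well the model would generalize to real data. The sketch below is a variation on the original experiment that adds Gaussian noise with standard deviation 0.1. It makes a manual 80/20 split with a random permutation instead of `train_test_split` and fits with `np.linalg.lstsq` instead of the class above. The test MSE should then land near the noise variance (0.01) rather than near zero. The sample size, seed, and noise level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, noise_std = 1000, 5, 0.1  # arbitrary illustrative settings
X = rng.random((n, p))
# Linear signal plus a constant bias plus zero-mean Gaussian noise per example
y = X @ rng.random(p) + rng.random() + rng.normal(0.0, noise_std, size=n)

# Manual 80/20 split using a random permutation of row indices
idx = rng.permutation(n)
train, test = idx[:800], idx[800:]

def add_bias(A: np.ndarray) -> np.ndarray:
    # Prepend a column of ones so the first weight is the bias term
    return np.hstack([np.ones((A.shape[0], 1)), A])

w, *_ = np.linalg.lstsq(add_bias(X[train]), y[train], rcond=None)
mse = np.mean((y[test] - add_bias(X[test]) @ w) ** 2)
print(f"test MSE = {mse:.4f}, noise variance = {noise_std**2:.4f}")
```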

More generally, linear regression is a statistical technique for modeling the relationship between a dependent variable and one or more independent variables, and it is widely used for both data analysis and prediction. Some general use cases for linear regression include:

- *Forecasting time series data*: Linear regression can be used to model and forecast time series data, for example by regressing a series on time or on its own past values. Applications include stock prices, weather patterns, and demand for goods and services.

- *Evaluating the impact of changes*: A change might be the introduction of a new policy or the launch of a new product. By fitting a linear regression model to data from before and after such a change, with a variable marking the period each observation belongs to, it is possible to estimate the impact of that change on some outcome.

- *Testing hypotheses*: Linear regression can be used to test hypotheses about the relationship between different variables. For example, a researcher might use linear regression to test whether there is a relationship between income and happiness.

- *Inferring causality*: Under strong assumptions, regression coefficients can be interpreted causally. Such assumptions include randomized assignment of the predictor or adequate control for confounding variables. For example, a linear regression model might show that more exercise is associated with lower blood pressure. Concluding that exercise *causes* lower blood pressure also requires ruling out confounders such as diet or age. An association on its own does not establish causation.
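One small regression can illustrate both the *evaluating the impact of changes* and *testing hypotheses* use cases: an intercept plus a 0/1 indicator for observations after a change. The sketch below uses simulated data with hypothetical baseline, effect, and noise values. The indicator's coefficient estimates the before/after difference. Dividing that coefficient by its classical (equal-variance) standard error gives a t-statistic for testing whether the effect is zero. Reading the coefficient as the *causal* effect of the change still assumes that nothing else shifted at the same time.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Indicator: 0 for observations before the change, 1 for those after it
after = (np.arange(n) >= n // 2).astype(float)
baseline, effect, noise_std = 10.0, 2.5, 1.0  # hypothetical simulation values
y = baseline + effect * after + rng.normal(0.0, noise_std, size=n)

# Design matrix: intercept column plus the indicator
X = np.column_stack([np.ones(n), after])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1 = coef

# With only an intercept and one indicator, b0 is the "before" mean and
# b1 is the difference between the "after" and "before" means
print(np.isclose(b1, y[after == 1].mean() - y[after == 0].mean()))  # True

# Classical standard error of b1, assuming independent, equal-variance errors
residuals = y - X @ coef
sigma2 = residuals @ residuals / (n - X.shape[1])
se_b1 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = b1 / se_b1
print(f"estimated effect = {b1:.3f}, t = {t_stat:.1f}")
```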

Overall, linear regression is a versatile and widely used method for analyzing and making predictions based on data. It is often used as a starting point for more complex modeling and analysis, and it can provide valuable insights into the relationships between different variables in a dataset.