
---
**Calculating Errors in Machine Learning and Linear Regression**

---



---
**Introduction to Error Calculation**

---

In Machine Learning, error calculation is a crucial step as it provides a quantifiable measure of how well the model is performing. The error of a model is the difference between the actual data and the data predicted by the model. The smaller the error, the better the model is at predicting the data.

There are several ways to calculate the error, and one of the most common methods is using the Mean Squared Error (MSE).
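
To make the definition of error concrete, here is a minimal sketch with made-up numbers (the arrays `actual` and `predicted` are illustrative and do not come from this notebook). It suggests why the raw differences are not simply averaged: positive and negative errors can cancel each other out.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([4.0, 4.0, 8.0, 8.0])

# Error for each data point: actual minus predicted
errors = actual - predicted

# The signed errors cancel, hiding the fact that every prediction is off by 1
mean_error = errors.mean()

# Squaring each error before averaging prevents that cancellation
mean_squared = (errors ** 2).mean()

print(errors)
print(mean_error)
print(mean_squared)
```

Here every prediction misses by exactly one unit, yet the plain average of the errors is zero, while the average of the squared errors is one.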

---
**Understanding Mean Squared Error (MSE)**

---

Mean Squared Error (MSE) is a common method used to calculate the error of a model. It is the average of the squared differences between the actual and predicted values. Here's the formula for MSE:

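$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\text{actual}_i - \text{prediction}_i)^2$$

where:
- $n$ is the total number of data points
- $\text{actual}_i$ is the actual value of the $i$-th data point
- $\text{prediction}_i$ is the value the model predicts for the $i$-th data point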

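Squaring makes every term non-negative, so positive and negative differences cannot cancel each other out. It also gives more weight to larger differences: an error of 4 contributes 16 to the sum, while an error of 1 contributes only 1. The name 'mean squared error' describes the computation directly: it is the mean of the squared errors.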
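
The formula can be sketched in a few lines of NumPy. The helper name `mse` and the sample values are assumptions made for this illustration; the second call changes a single prediction to show how one large error dominates the result.

```python
import numpy as np

def mse(actual, prediction):
    """Mean of the squared differences between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return np.mean((actual - prediction) ** 2)

# Hypothetical data, for illustration only
actual = [3.0, -0.5, 2.0, 7.0]

# Squared errors: 0.25, 0.25, 0, 1 -> mean 0.375
base = mse(actual, [2.5, 0.0, 2.0, 8.0])

# Same predictions, but the last one is now 4 units off:
# squared errors 0.25, 0.25, 0, 16 -> mean 4.125
with_large_error = mse(actual, [2.5, 0.0, 2.0, 11.0])

print(base, with_large_error)
```

The last error grew by a factor of 4, but its contribution to the sum grew by a factor of 16.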

---
**Introduction to Linear Regression**

---

Linear Regression is a statistical method that allows us to study relationships between two continuous (quantitative) variables:

- One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
- The other variable, denoted y, is regarded as the response, outcome, or dependent variable.

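The goal of linear regression is to find the best-fitting line through the data points, usually the line that makes the squared differences between the observed and predicted y values as small as possible.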

---
**Understanding Ordinary Least Squares**

---

Ordinary Least Squares (OLS) is a linear least squares method for estimating the unknown parameters of a linear regression model. It chooses the parameters of a linear function of the explanatory variables by the principle of least squares: minimizing the sum of the squared differences between the observed values of the dependent variable and the values predicted by the linear function.

In other words, it minimizes the sum of squared errors (SSE) between the target variable (y) and the predicted output over all samples in the dataset. Because the mean squared error (MSE) is just the SSE divided by the number of samples, minimizing one also minimizes the other.
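
For a single predictor, this objective can be written out explicitly. The formulation below is the standard textbook one rather than something taken from this notebook; the symbols $\beta_0$ (intercept), $\beta_1$ (slope), and $\bar{x}$, $\bar{y}$ (sample means) are notation introduced here.

$$
\text{SSE}(\beta_0, \beta_1) = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2, \qquad \text{MSE} = \frac{1}{n}\,\text{SSE}
$$

Setting the partial derivatives with respect to $\beta_0$ and $\beta_1$ to zero gives the minimizers:

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
$$

Since $1/n$ is a positive constant, the same parameters minimize both SSE and MSE.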

OLS is used widely in data analysis and econometrics.
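
With a single predictor, the least-squares line has a well-known closed form: the slope is the sum of the products of the centered `x` and `y` values divided by the sum of the squared centered `x` values, and the intercept follows from the two means. The sketch below implements that form in NumPy; the function name `fit_simple_ols` and the sample data are assumptions made for illustration.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared errors."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of centered cross-products over sum of squared centered x values
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Noise-free hypothetical data lying exactly on the line y = 2 + 3x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 + 3 * x

intercept, slope = fit_simple_ols(x, y)
print(intercept, slope)
```

Because the sample points lie exactly on a line, the fit recovers an intercept of 2 and a slope of 3, up to floating-point rounding.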

---

**Simple Linear Regression with MSE Evaluation**

---

```python
# Importing necessary libraries
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Creating a simple dataset
np.random.seed(0)
x = np.random.rand(100, 1)
y = 2 + 3 * x + np.random.rand(100, 1)

# Fitting a Linear Regression model
model = LinearRegression()
model.fit(x, y)

# Making predictions
y_pred = model.predict(x)

# Calculating Mean Squared Error
mse = mean_squared_error(y, y_pred)
mse
```

```
0.07623324582875007
```



The code fits a linear regression model to a synthetic dataset and evaluates it using the Mean Squared Error (MSE), computed on the same data the model was trained on.

---

**Code Explanation:**

1. **Setting up the Environment and Libraries:**
    - Necessary components such as `numpy` and relevant modules from `sklearn` are imported.

2. **Generating the Hypothetical Dataset:**

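    A seed is set for reproducibility. `x` holds 100 values drawn uniformly from [0, 1). `y` is computed as `2 + 3 * x` plus noise that `np.random.rand` also draws uniformly from [0, 1). Because that noise has a mean of 0.5 rather than 0, the fitted intercept is expected to land near 2.5, not 2.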

3. **Model Training:**

    A linear regression model is initialized and trained using the hypothetical dataset.

4. **Making Predictions:**
   
    The trained model is then utilized to predict the `y` values for the input data `x`.

5. **Performance Evaluation (Mean Squared Error):**
    
    Mean Squared Error (MSE) is a commonly used metric to quantify the difference between the actual and predicted values. The closer the MSE is to zero, the better the model's performance. In this code, the MSE is calculated using the true `y` values and the predicted `y_pred` values.

6. **Output:**
    The calculated MSE value is the output of the last code line.
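
The steps above can be cross-checked without scikit-learn. The sketch below regenerates the same data and fits the line with `np.linalg.lstsq`, used here as a stand-in for `LinearRegression` on the assumption that both yield the same least-squares line. The variable names `X`, `coef`, `intercept`, and `slope` are introduced for this illustration.

```python
import numpy as np

# Same data-generating steps as the cell above
np.random.seed(0)
x = np.random.rand(100, 1)
y = 2 + 3 * x + np.random.rand(100, 1)

# Design matrix: a column of ones (intercept) next to the x values
X = np.hstack([np.ones_like(x), x])

# Least-squares solution of X @ coef ~= y
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef.ravel()

# MSE computed by hand from the fitted line
y_pred = X @ coef
mse = np.mean((y - y_pred) ** 2)

print(intercept, slope, mse)
```

The slope should come out near 3, and the intercept near 2.5 rather than 2, because noise from `np.random.rand` is uniform on [0, 1) and therefore shifts `y` upward by 0.5 on average.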

---



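As we can see, the Mean Squared Error of our model is approximately 0.076. Because MSE is measured in squared units of `y`, this does not mean the predictions are 0.076 units off. Taking the square root gives a root mean squared error (RMSE) of about 0.276, which is the typical distance between a prediction and the actual value. That is small compared with the range of `y` (roughly 2 to 6), and the MSE is close to the variance of the added noise (1/12 ≈ 0.083 for uniform noise on [0, 1)), so the model captures the underlying linear trend about as well as the noise allows.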
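
Since MSE is expressed in squared units, its square root (the root mean squared error, RMSE) is the more natural figure to read as a typical distance in the units of `y`. A quick check using the MSE value reported by the cell above:

```python
import math

# MSE value reported in the notebook output above
mse = 0.07623324582875007

# RMSE: square root of the MSE, expressed in the same units as y
rmse = math.sqrt(mse)

print(round(rmse, 3))  # 0.276
```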


Understanding how to calculate errors and evaluate models is a crucial skill for any Data Scientist. It allows us to determine how well our model is performing and identify areas for improvement. Remember, the goal is to minimize the error to make our model as accurate as possible.