# Understanding Simple Linear Regression

In this notebook, we will explore the concept of Simple Linear Regression, a fundamental statistical and machine learning method used for predicting a quantitative dependent variable based on a single independent variable. We will go through the theory, assumptions, and implementation of Simple Linear Regression.

## Table of Contents

1. [Introduction to Simple Linear Regression](#section1)
2. [Assumptions of Simple Linear Regression](#section2)
3. [Implementing Simple Linear Regression](#section3)
4. [Interpreting the Results](#section4)
5. [Conclusion](#section5)

<a id='section1'></a>
## 1. Introduction to Simple Linear Regression

Simple Linear Regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables:

1. One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
2. The other variable, denoted y, is regarded as the response, outcome, or dependent variable.

The simple linear regression model is expressed as:

y = β0 + β1x + ε

where:

- y is the dependent variable.
- x is the independent variable.
- β0 is the y-intercept.
- β1 is the slope.
- ε is the error term.

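The goal of simple linear regression is to estimate β0 and β1 by choosing the line that minimizes the sum of squared residuals, where a residual is the difference between an observed value of y and the value the line predicts for it. This is the ordinary least squares (OLS) criterion. The residuals are the observable counterparts of the unobserved error term ε.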
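For a single predictor, the least-squares problem described above has a well-known closed-form solution. The following is a minimal sketch of it in NumPy. The function name `fit_simple_ols` and the four-point demo data are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals for y ~ b0 + b1 * x."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by the sum of squared deviations of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: chosen so the line passes through the point of means
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical demo data lying exactly on the line y = 1 + 2x
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])
b0, b1 = fit_simple_ols(x_demo, y_demo)
print(b0, b1)  # intercept 1 and slope 2, up to floating-point rounding
```

Because the demo points lie exactly on a line, every residual is zero and the estimates recover the generating intercept and slope. With noisy data the same formulas give the line with the smallest possible sum of squared residuals.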

<a id='section2'></a>
## 2. Assumptions of Simple Linear Regression

Simple Linear Regression makes several assumptions:

- **Linearity**: The relationship between x and the mean of y is linear.
- **Homoscedasticity**: The variance of the residuals is the same for any value of x.
- **Independence**: Observations are independent of each other.
- **Normality**: For any fixed value of x, y is normally distributed. Equivalently, the errors are normally distributed.

When these assumptions are violated, the predicted values become less reliable, and so do the conclusions drawn from the model.
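Some of these assumptions can be checked roughly from the residuals of a fitted line. The sketch below is one informal way to do so, under stated assumptions: the synthetic data, the use of `np.polyfit`, and the split into "low x" and "high x" halves are illustrative choices, not a formal test.

```python
import numpy as np

# Synthetic data for illustration only (assumption: uniform x and uniform noise)
rng = np.random.default_rng(0)
x = rng.random(100)
y = 2 + 3 * x + rng.random(100)

# Fit a straight line; np.polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Least-squares residuals from a model with an intercept average to about zero
mean_residual = residuals.mean()

# Rough homoscedasticity check: compare residual spread for small and large x
order = np.argsort(x)
low_half = residuals[order[:50]]
high_half = residuals[order[50:]]
spread_ratio = high_half.std() / low_half.std()

print(f"mean residual: {mean_residual:.2e}")
print(f"residual spread ratio (high x / low x): {spread_ratio:.2f}")
```

A spread ratio far from 1 would hint at heteroscedasticity. Plotting the residuals against x is a common complement, since visible curvature there suggests the linearity assumption is violated. Normality is usually judged with a Q-Q plot or a test such as Shapiro-Wilk (for example `scipy.stats.shapiro`).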

<a id='section3'></a>
## 3. Implementing Simple Linear Regression

In this section, we will implement Simple Linear Regression using Python's Scikit-Learn library. We will use a hypothetical dataset for this purpose. Let's start by importing the necessary libraries.

In [None]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

# This allows plots to appear directly in the notebook
%matplotlib inline

Now, let's create a hypothetical dataset with one independent variable and one dependent variable.

In [None]:
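# Creating a hypothetical dataset: y = 2 + 3x + noise
np.random.seed(0)  # fix the seed so the results are reproducible
x = np.random.rand(100, 1)  # 100 values drawn uniformly from [0, 1)
# The noise is uniform on [0, 1), so its mean of 0.5 shifts the effective intercept to about 2.5
y = 2 + 3 * x + np.random.rand(100, 1)

# Plotting the dataset
plt.scatter(x, y, s=10)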
plt.xlabel('x')
plt.ylabel('y')
plt.show()

Next, we will split our dataset into training and testing sets. The training set will be used to train the linear regression model, while the testing set will be used to evaluate the model's performance.

In [None]:
# Splitting the dataset into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

Now, we will train our Simple Linear Regression model on the training set using Scikit-Learn's `LinearRegression` class.

In [None]:
# Training the Simple Linear Regression model on the training set
regressor = LinearRegression()
regressor.fit(x_train, y_train)

After training the model, we can obtain the slope and intercept of the regression line.

In [None]:
# Getting parameters
intercept = regressor.intercept_[0]
slope = regressor.coef_[0][0]

print(f'The intercept of the regression line is: {intercept}')
print(f'The slope of the regression line is: {slope}')

Now, let's use our trained model to make predictions on the testing set and visualize the regression line.

In [None]:
# Making predictions on the test set
y_pred = regressor.predict(x_test)

# Visualizing the training set results
plt.scatter(x_train, y_train, color='red')
plt.plot(x_train, regressor.predict(x_train), color='blue')
plt.title('Training set')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

# Visualizing the test set results against the same fitted line
plt.scatter(x_test, y_test, color='red')
# The line is drawn from the training inputs; it is the line learned during training
plt.plot(x_train, regressor.predict(x_train), color='blue')
plt.title('Test set')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

<a id='section4'></a>
## 4. Interpreting the Results

The slope and intercept of the regression line are known as the model coefficients or parameters. In our case:

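- The y-intercept (β0) is around 2.58. This means that when x (the independent variable) is zero, the expected value of y (the dependent variable) is about 2.58. The estimate is close to 2.5 rather than 2 because the uniform noise added to y has a mean of 0.5.
- The slope (β1) is around 2.91. This means that for each one-unit increase in x, the expected value of y increases by about 2.91, which is close to the slope of 3 used to generate the data.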
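To make this interpretation concrete, the sketch below checks that a prediction is just the intercept plus the slope times x. It regenerates the hypothetical data so that it runs on its own, and as a simplifying assumption it fits on all 100 points rather than on the training split, so its coefficients will differ slightly from those reported above. The names `model` and `x_new` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated the same way as in the notebook
np.random.seed(0)
x = np.random.rand(100, 1)
y = 2 + 3 * x + np.random.rand(100, 1)

# Assumption for brevity: fit on the full dataset instead of a training split
model = LinearRegression().fit(x, y)
intercept = model.intercept_[0]
slope = model.coef_[0][0]

# Illustrative inputs spanning the observed range of x
x_new = np.array([[0.0], [0.5], [1.0]])
manual = intercept + slope * x_new  # prediction computed by hand
predicted = model.predict(x_new)    # prediction computed by scikit-learn

# Moving from x = 0 to x = 1 changes the prediction by exactly the slope
step = predicted[2, 0] - predicted[0, 0]

print(np.allclose(manual, predicted))
print(f"slope: {slope:.3f}, change in prediction per unit of x: {step:.3f}")
```

The prediction at x = 0 equals the intercept, and the difference between the predictions at x = 1 and x = 0 equals the slope, which is what the two bullet points above describe.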

The blue line in each plot above is the fitted regression line, the least-squares best fit to the training data. In both plots the points lie in a narrow band around the line, so the model appears to fit the data well.
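A visual impression of fit can be complemented with numeric scores on the test set. The sketch below recreates the notebook's data, split, and model so that it runs on its own, then scores the test predictions with functions from `sklearn.metrics`. The choice of these three metrics is an assumption about what is useful here, not something the notebook prescribes.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Recreate the hypothetical dataset and the 80/20 split used in the notebook
np.random.seed(0)
x = np.random.rand(100, 1)
y = 2 + 3 * x + np.random.rand(100, 1)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

regressor = LinearRegression().fit(x_train, y_train)
y_pred = regressor.predict(x_test)

# Mean absolute error: average size of the prediction errors
mae = metrics.mean_absolute_error(y_test, y_pred)
# Root mean squared error: like MAE, but penalizes large errors more heavily
rmse = np.sqrt(metrics.mean_squared_error(y_test, y_pred))
# R^2: share of the variance in y_test explained by the model
r2 = metrics.r2_score(y_test, y_pred)

print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"R^2:  {r2:.3f}")
```

MAE and RMSE are in the same units as y, so they can be read directly against the scale of the data. An R^2 close to 1 indicates that most of the variation in the test responses is captured by the fitted line.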

<a id='section5'></a>
## 5. Conclusion

In this notebook, we have gone through the basics of Simple Linear Regression, its assumptions, and its implementation using Python's Scikit-Learn library. We have also interpreted the results of our model.

Simple Linear Regression is a powerful tool for understanding the linear relationships between two variables, and it's a fundamental technique in statistical learning and machine learning. However, it's important to remember that it makes several assumptions about the data, and if these assumptions are violated, the results may not be reliable.

In the real world, data is often more complex and may require more sophisticated models. But understanding Simple Linear Regression is a good starting point for diving into more complex regression models and machine learning techniques.