## Mechanics of Linear Regression

### Equation of a Simple Linear Regression

For a simple linear regression with one independent variable (X) and one dependent variable (Y), the equation takes the form:

Y = β₀ + β₁X + ε

where:

- Y is the dependent variable.
- X is the independent variable.
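- β₀ is the intercept (the value of Y at which the line crosses the y-axis, that is, where X is zero).
- β₁ is the slope (how much Y changes, on average, when X increases by one unit).
- ε is the error term (the deviation of an observed Y from the line β₀ + β₁X). Its counterpart in a fitted model, the difference between the observed Y and the predicted Y, is called the residual.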
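To make the roles of the three terms concrete, the following minimal sketch generates data from the equation above. The parameter values (`BETA_0`, `BETA_1`, `NOISE_SD`) are hypothetical, chosen only for illustration, and the error term is assumed to be normally distributed.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical "true" parameters, chosen only for this illustration.
BETA_0 = 1.0    # intercept
BETA_1 = 2.0    # slope
NOISE_SD = 0.5  # standard deviation of the error term

xs = [float(i) for i in range(10)]
# One random error per observation, drawn from a normal distribution.
errors = [random.gauss(0.0, NOISE_SD) for _ in xs]
# Each observed Y is the value on the line plus its error.
ys = [BETA_0 + BETA_1 * x + e for x, e in zip(xs, errors)]

for x, y, e in zip(xs, ys, errors):
    print(f"X = {x:4.1f}   line = {BETA_0 + BETA_1 * x:5.2f}   error = {e:+.2f}   Y = {y:5.2f}")
```

In real data only X and Y are observed; β₀, β₁ and the errors are unknown and have to be estimated.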

### Fitting the Line

The goal of linear regression is to find the line of best fit: the values of β₀ and β₁ that make the predicted Y values as close as possible to the observed ones. The usual criterion is ordinary least squares (OLS), which minimizes the sum of the squared differences between the observed Y values and the values predicted by the linear equation.
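For a single predictor, the least-squares minimum has a well-known closed form: the slope is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and the intercept follows from the means. The sketch below applies it to a small made-up data set; the helper names `fit_simple_ols` and `sum_squared_residuals` are illustrative and not part of any library.

```python
# Toy data, made up for illustration (not the dataset used later).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted Y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


b0, b1 = fit_simple_ols(xs, ys)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"sum of squared residuals at the fit: {sum_squared_residuals(xs, ys, b0, b1):.4f}")
# Any other line gives a larger sum of squared residuals.
print(f"with the slope nudged by 0.1:        {sum_squared_residuals(xs, ys, b0, b1 + 0.1):.4f}")
```

Library routines such as scikit-learn's `LinearRegression` solve the same minimization, generalized to several predictors.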

### Interpreting Coefficients

The coefficients (β₀ and β₁) quantify the relationship between the independent and dependent variables.

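- β₀ represents the expected value of Y when X is zero. This is only meaningful when X = 0 lies within, or close to, the range of the observed data.
- β₁ represents the expected change in Y for a one-unit increase in X.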

### Assumptions

Linear regression assumes:

- A linear relationship between the variables.
- Normally distributed residuals with constant variance (homoscedasticity).
- Independence of observations (no autocorrelation between residuals).

## Interpreting Results

### Coefficient Sign and Magnitude

- A positive β₁ indicates a positive relationship between X and Y, while a negative β₁ indicates an inverse relationship.
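- The magnitude of β₁ gives the steepness of the line: the larger its absolute value, the more Y changes per unit of X. Because the magnitude depends on the units of X and Y, it should be read together with those units.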

### R-squared (R²) Value

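- R-squared measures the proportion of the variation in the dependent variable that is explained by the independent variable (or variables, in multiple regression).
- It ranges from 0 to 1 for a least-squares fit with an intercept, and higher values indicate that the model fits the data more closely.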
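R² can be computed directly from its definition, 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up data and a hypothetical fitted line (ŷ = 0.09 + 1.97x); the function name `r_squared` is illustrative.

```python
def r_squared(ys, predictions):
    """R² = 1 - SS_res / SS_tot."""
    y_mean = sum(ys) / len(ys)
    # Variation left unexplained by the model.
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    # Total variation of Y around its mean.
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy data and a hypothetical fitted line, both made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
predictions = [0.09 + 1.97 * x for x in xs]

print(f"R² of the fitted line:   {r_squared(ys, predictions):.4f}")

# Predicting the mean of Y for every point explains none of the variation.
mean_only = [sum(ys) / len(ys)] * len(ys)
print(f"R² of a mean-only model: {r_squared(ys, mean_only):.4f}")
```

The mean-only comparison shows what R² = 0 means: the model does no better than ignoring X altogether.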

### P-values and Confidence Intervals

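- The p-value for a coefficient tests the null hypothesis that its true value is zero. A p-value below a chosen threshold (conventionally 0.05) is treated as statistically significant, and smaller p-values indicate stronger evidence against the null hypothesis.
- A confidence interval gives a range of plausible values for the true population parameter. A 95% interval, for example, comes from a procedure that captures the true value in about 95% of repeated samples.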
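As a rough illustration, SciPy's `scipy.stats.linregress` reports the slope, its standard error, and a p-value for the null hypothesis that the slope is zero. A 95% confidence interval for the slope can then be built from the t distribution with n − 2 degrees of freedom. The data below are made up for this sketch.

```python
from scipy import stats

# Toy data, made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

result = stats.linregress(xs, ys)
n = len(xs)

# Two-sided 95% critical value of the t distribution with n - 2 degrees of freedom.
t_crit = stats.t.ppf(0.975, df=n - 2)
ci_low = result.slope - t_crit * result.stderr
ci_high = result.slope + t_crit * result.stderr

print(f"slope   = {result.slope:.3f}")
print(f"p-value = {result.pvalue:.2e}  (null hypothesis: slope is zero)")
print(f"95% CI for the slope: [{ci_low:.3f}, {ci_high:.3f}]")
```

When the interval excludes zero, the two-sided test at the matching level rejects the hypothesis that the slope is zero, so the two summaries agree.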

### Residual Analysis

- Examining residuals helps verify assumptions. Patterns in residuals against predicted values or independent variables may indicate violations of assumptions.
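The following sketch shows what such a pattern can look like. It fits a straight line to made-up data that actually follow a curve, then lists the residuals against the fitted values. The helper `fit_line` is illustrative, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit; returns (intercept, slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return y_mean - slope * x_mean, slope


# Made-up data with a quadratic relationship, so the linearity
# assumption is deliberately violated.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

for f, r in zip(fitted, residuals):
    print(f"fitted = {f:6.1f}   residual = {r:+5.1f}")
```

The residuals are positive at both ends and negative in the middle. A systematic U-shape like this, rather than a random scatter around zero, suggests that a straight line is the wrong functional form.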

Understanding the mechanics and interpreting the results of linear regression are crucial for drawing meaningful insights from data and making predictions based on the relationships between variables.


#### Import packages

```python
import math
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
```

```python
# Read the heights/weights table from the SOCR wiki page
# (index 1 selects the second table that pandas parses from the page).
url = 'http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights'
height_weight_df = pd.read_html(url)[1][['Height(Inches)', 'Weight(Pounds)']]
```

```python
num_records = len(height_weight_df)
print(num_records)
```

```
200
```

```python
# scikit-learn expects 2-D arrays of shape (n_samples, n_features).
x = height_weight_df['Height(Inches)'].values.reshape(-1, 1)
y = height_weight_df['Weight(Pounds)'].values.reshape(-1, 1)
```

```python
# Fit weight as a linear function of height by ordinary least squares.
model = linear_model.LinearRegression().fit(x, y)
```

```python
# Print the fitted equation: intercept (β₀) plus slope (β₁) times x₁.
print(f"ŷ = {model.intercept_[0]} + {model.coef_[0][0]} x₁")
```

```
ŷ = -106.02770644878126 + 3.432676129271628 x₁
```
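To read the fitted equation, the sketch below plugs a few heights into it. The coefficients are copied (rounded) from the printed output above. The function name `predict_weight` and the example heights are illustrative, and the heights are assumed to lie within the range of the data.

```python
# Coefficients copied (rounded) from the fitted equation printed above.
INTERCEPT = -106.0277  # β₀, in pounds
SLOPE = 3.4327         # β₁, in pounds per inch


def predict_weight(height_inches):
    """Predicted weight in pounds for a given height in inches."""
    return INTERCEPT + SLOPE * height_inches


for height in (65, 68, 71):
    print(f"height = {height} in  ->  predicted weight = {predict_weight(height):.1f} lb")

# Moving up one inch changes the prediction by exactly the slope.
print(f"difference per inch: {predict_weight(69) - predict_weight(68):.4f} lb")
```

Each additional inch of height is associated with roughly 3.43 more pounds of predicted weight. The intercept of about −106 pounds is the prediction at a height of zero, which is far outside any observed height and has no physical meaning here. It only positions the line.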
