## Linear Regression
Linear regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables.
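
For reference, the model is conventionally written as follows. Here $y_i$ is the dependent variable for observation $i$, $x_{i1}, \dots, x_{ip}$ are the $p$ independent variables, the $\beta_j$ are the coefficients to be estimated, and $\varepsilon_i$ is a random error term. This notation is a common convention chosen for illustration.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \dots, n
```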

### Linear Regression Assumptions:
1. Linearity: The relationship between the dependent and independent variables should be linear. This means that the expected change in the dependent variable for a one-unit change in an independent variable is constant, regardless of that variable's starting value.

2. Independence: The observations in the dataset should be independent of each other. This means that knowing the value (or the error) of one observation should tell you nothing about the value or error of another observation.

3. Normality: The residuals (the differences between the observed values and the predicted values) should be normally distributed. This matters mainly for inference: the t-tests and F-tests on the coefficients, and the confidence intervals built from them, assume normally distributed errors, although computing the coefficient estimates themselves does not require it.

4. Homoscedasticity: The residuals should have constant variance across all values of the independent variables. If their spread grows or shrinks with the predicted value, the standard errors of the coefficients, and therefore the tests and confidence intervals built on them, become unreliable.
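
These assumptions are usually checked on the residuals of a fitted model. The sketch below is a rough, NumPy-only illustration under stated assumptions: it simulates house-price-like data, fits a line with `np.polyfit`, and computes three simple summaries of the residuals: their mean, their lag-1 autocorrelation (a crude independence check for data in collection order), and their skewness (a crude normality check). The helper `residual_diagnostics` and the simulated data are hypothetical; residual plots and formal tests are more informative in practice.

```python
import numpy as np

def residual_diagnostics(x, y):
    """Hypothetical helper: fit y = b0 + b1*x by least squares and summarize the residuals."""
    # np.polyfit with deg=1 returns [slope, intercept] (highest degree first)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)

    # OLS residuals average to ~0 whenever an intercept is fitted
    resid_mean = residuals.mean()

    # Lag-1 autocorrelation in collection order: values far from 0 hint at dependent errors
    lag1_autocorr = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

    # Sample skewness: roughly 0 for normally distributed residuals
    centered = residuals - resid_mean
    skewness = np.mean(centered**3) / np.mean(centered**2) ** 1.5

    return {"mean": resid_mean, "lag1_autocorr": lag1_autocorr, "skewness": skewness}

# Simulated data that satisfies the assumptions (linear trend, independent normal errors, constant variance)
rng = np.random.default_rng(0)
x = rng.integers(1000, 3000, size=500).astype(float)
y = 100000 + 500 * x + rng.normal(0, 10000, size=500)

diagnostics = residual_diagnostics(x, y)
print(diagnostics)
```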

## Implementation From Scratch
Suppose we want to predict the price of a house based on its square footage. We can use linear regression to find the straight line that best fits the data in the least-squares sense, and then use that line to predict the price of a new house from its square footage.
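
With a single independent variable, the fitted line and its ordinary least-squares coefficients have the standard closed form below, taking $x$ as the square footage and $y$ as the price (notation chosen here for illustration). It explains why the steps that follow compute means, variances, and a covariance: the slope is the covariance of $x$ and $y$ divided by the variance of $x$, and the intercept makes the line pass through $(\bar{x}, \bar{y})$.

```latex
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x, \qquad
\hat{\beta}_1 = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
             = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

The ratio form only holds if $\operatorname{Cov}$ and $\operatorname{Var}$ use the same normalization (both divide by $n$, or both by $n-1$), so that the factor cancels.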

```python
import numpy as np
import pandas as pd

# Generate simulated data on the square footage and price of houses.
# No random seed is set, so the values differ on every run.
square_footage = np.random.randint(1000, 3000, size=100)
# True relationship: price = 100,000 + 500 * square_footage, plus Gaussian noise (sd = 10,000)
price = 100000 + 500 * square_footage + np.random.randn(100) * 10000

# Create a DataFrame to store the data
data = pd.DataFrame({
    'square_footage': square_footage,
    'price': price
})

data
```

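|     | square_footage | price        |
|-----|----------------|--------------|
| 0   | 2253           | 1.219895e+06 |
| 1   | 1523           | 8.676320e+05 |
| 2   | 2472           | 1.338827e+06 |
| 3   | 2440           | 1.323748e+06 |
| 4   | 2211           | 1.223805e+06 |
| ... | ...            | ...          |
| 95  | 2381           | 1.285492e+06 |
| 96  | 1907           | 1.051109e+06 |
| 97  | 1507           | 8.740639e+05 |
| 98  | 2479           | 1.335870e+06 |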


#### 1. Calculate the mean and variance of the square footage and price data

```python
def calculate_mean_and_variance(data):
    """Calculates the mean and variance of a NumPy array (or pandas Series).

    Args:
        data: A NumPy array or pandas Series of numbers.
    Returns:
        A tuple containing the mean and the population variance (ddof=0) of the data.
    """
    mean = np.mean(data)
    variance = np.var(data)  # np.var divides by n by default (population variance)
    return mean, variance

square_footage_mean, square_footage_variance = calculate_mean_and_variance(data['square_footage'])
price_mean, price_variance = calculate_mean_and_variance(data['price'])
```
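
Since this section builds regression from scratch, here is a minimal sketch of the same two statistics written out with their formulas, $\bar{x} = \frac{1}{n}\sum_i x_i$ and $\operatorname{Var}(x) = \frac{1}{n}\sum_i (x_i - \bar{x})^2$, and checked against NumPy. The helper `mean_and_variance_by_hand` and the sample values are hypothetical; dividing by $n$ rather than $n-1$ is chosen to match the `np.var` default.

```python
import numpy as np

def mean_and_variance_by_hand(values):
    """Hypothetical helper: mean and population variance (divide by n) with plain Python."""
    n = len(values)
    mean = sum(values) / n
    # Average squared deviation from the mean
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance

sample = [1200, 1500, 1800, 2100, 2400]  # hypothetical square-footage values
mean, variance = mean_and_variance_by_hand(sample)
print(mean, variance)  # prints: 1800.0 180000.0

# Should agree with NumPy's defaults (np.var uses ddof=0, i.e. divides by n)
print(np.isclose(mean, np.mean(sample)), np.isclose(variance, np.var(sample)))  # prints: True True
```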

#### 2. Calculate the covariance between the square footage and price data

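```python
def calculate_covariance(data_1, data_2):
    """Calculates the covariance between two NumPy arrays.

    Args:
        data_1: A NumPy array.
        data_2: A NumPy array of the same length as data_1.
    Returns:
        The population covariance (divided by n) of data_1 and data_2.
    """
    # np.cov returns the 2x2 covariance matrix; entry [0, 1] is cov(data_1, data_2).
    # bias=True divides by n, matching np.var in calculate_mean_and_variance,
    # so the two can be divided to get the least-squares slope without an n/(n-1) error.
    covariance = np.cov(data_1, data_2, bias=True)[0, 1]
    return covariance
```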
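
With means, a variance, and a covariance in hand, the slope and intercept follow from the closed-form least-squares formulas $\hat{\beta}_1 = \operatorname{Cov}(x, y) / \operatorname{Var}(x)$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. The sketch below is a self-contained illustration under that assumption: it computes the covariance explicitly, derives both coefficients, and compares them with `np.polyfit`. The helpers `covariance_by_hand` and `fit_simple_linear_regression` and the simulated data are hypothetical.

```python
import numpy as np

def covariance_by_hand(x, y):
    """Hypothetical helper: population covariance (divide by n) of two equal-length arrays."""
    x_mean, y_mean = np.mean(x), np.mean(y)
    return np.sum((x - x_mean) * (y - y_mean)) / len(x)

def fit_simple_linear_regression(x, y):
    """Hypothetical helper: closed-form OLS coefficients for y = b0 + b1 * x."""
    # Covariance and np.var both divide by n, so the factor cancels in the ratio
    b1 = covariance_by_hand(x, y) / np.var(x)
    b0 = np.mean(y) - b1 * np.mean(x)
    return b0, b1

# Simulated data with the same true coefficients as the example above (intercept 100,000, slope 500)
rng = np.random.default_rng(42)
x = rng.integers(1000, 3000, size=100).astype(float)
y = 100000 + 500 * x + rng.normal(0, 10000, size=100)

b0, b1 = fit_simple_linear_regression(x, y)
slope_np, intercept_np = np.polyfit(x, y, deg=1)  # highest degree first: [slope, intercept]
print(b0, b1)
print(np.isclose(b0, intercept_np), np.isclose(b1, slope_np))
```

The estimated slope should land near the true value of 500; the intercept is estimated far less precisely, because the observed square footages lie well away from zero.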

