Linear regression makes several assumptions about the data:

1. **Linearity:** The relationship between the independent and dependent variables is linear.
2. **Independence:** The observations are independent of each other.
3. **Homoscedasticity:** The variance of the errors is constant across all levels of the independent variables.
4. **Normality:** The errors are normally distributed with a mean of zero.
5. **No multicollinearity:** The independent variables are not highly correlated with each other.
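A few of these assumptions can be checked numerically once a model is fitted. The sketch below is only an illustration on synthetic data, not part of the housing analysis: the predictors `x1`, `x2`, `x3`, the 0.9 correlation threshold, and the low/high split of the fitted values are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical predictors: x2 is almost a copy of x1, so the two are highly correlated
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Synthetic target: linear in the predictors plus constant-variance normal noise
y = 3.0 + 2.0 * x1 - 1.0 * x3 + rng.normal(scale=0.5, size=n)

# Ordinary least squares with an explicit intercept column
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta
residuals = y - fitted

# With an intercept in the model, OLS residuals average to (numerically) zero
residual_mean = residuals.mean()

# Rough homoscedasticity check: residual spread for low vs. high fitted values
order = np.argsort(fitted)
low_half = residuals[order[: n // 2]]
high_half = residuals[order[n // 2:]]
spread_ratio = high_half.std() / low_half.std()

# Multicollinearity check: pairwise correlations between predictors
corr = np.corrcoef(X, rowvar=False)
high_pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)
              if abs(corr[i, j]) > 0.9]

print(f"residual mean: {residual_mean:.2e}")
print(f"spread ratio (high/low fitted): {spread_ratio:.2f}")
print(f"highly correlated predictor pairs: {high_pairs}")
```

A spread ratio far from 1 would hint at heteroscedasticity, and any pair listed in `high_pairs` would be a candidate for removal or combination before fitting.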

In [10]:
# Import necessary libraries
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import pandas as pd
from pathlib import Path

# Read the dataset from a CSV file
df = pd.read_csv(Path().resolve().parent / "4. DataFrame" / "sample-data" / "kc_housingdata.csv")

In [11]:
# Drop the 'date', 'id' and 'zipcode' columns from the DataFrame
df1 = df.drop(['date', 'id', 'zipcode'], axis=1)

In [12]:
# Convert any categorical variables into dummy/indicator variables
df1 = pd.get_dummies(df1)

# Define the independent variables (x) and the dependent variable (y)
x = df1.drop(['price'], axis=1)
y = df1['price']

# Split the dataset into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=400)

In [13]:
# Create a Linear Regression model
reg = LinearRegression()
# Train the model using the training set
reg.fit(x_train, y_train)
# Evaluate the model on the test set; score() returns the coefficient of determination (R^2)
reg.score(x_test, y_test)

0.699131159635666
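The number returned by `reg.score` is the coefficient of determination, R^2 = 1 - SS_res / SS_tot, so the value above means the model explains roughly 70% of the variance of `price` in the test set. As a minimal sketch of that formula, the helper `r_squared` and the toy values below are hypothetical and only meant to show the two reference points: a perfect prediction scores 1, and always predicting the mean scores 0.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical target values, for illustration only
actual = [100.0, 200.0, 300.0, 400.0]
perfect = list(actual)       # predictions identical to the targets
mean_only = [250.0] * 4      # always predict the mean of the targets

print(r_squared(actual, perfect))    # 1.0
print(r_squared(actual, mean_only))  # 0.0
```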

In [14]:
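# Coefficients of the model (one per feature)
reg.coef_

# Intercept of the model
reg.intercept_

# Singular values of the training feature matrix.
# Only this last expression is displayed by the notebook;
# wrap the earlier ones in print() to see them as well.
reg.singular_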

array([6.15022001e+06, 2.20190910e+06, 1.70495966e+05, 6.96478823e+04,
       5.40880768e+04, 5.07143858e+04, 3.35447897e+03, 1.06762120e+02,
       9.12800925e+01, 8.61443165e+01, 7.78519903e+01, 6.12090162e+01,
       4.67294949e+01, 1.71650134e+01, 1.50288076e+01, 1.03604809e+01,
       4.03326499e-11])
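The last singular value above is many orders of magnitude smaller than the rest. One plausible reading, which the notebook itself does not confirm, is that some feature columns are (almost) exact linear combinations of others, which would conflict with the no-multicollinearity assumption. The sketch below reproduces the effect on synthetic data; the column names `part_a`, `part_b`, and `total` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical columns: `total` is exactly the sum of the two parts
part_a = rng.uniform(500, 3000, size=n)
part_b = rng.uniform(0, 1500, size=n)
total = part_a + part_b
X = np.column_stack([total, part_a, part_b])

# Centre the columns, then compute singular values (returned in descending order)
X_centered = X - X.mean(axis=0)
singular_values = np.linalg.svd(X_centered, compute_uv=False)

# A near-zero smallest singular value signals a linearly dependent column
ratio = singular_values[-1] / singular_values[0]
rank = np.linalg.matrix_rank(X_centered)

print(singular_values)
print(f"smallest / largest: {ratio:.2e}")
print(f"numerical rank: {rank} of {X.shape[1]} columns")
```

When the ratio is this small, individual coefficients are not uniquely determined, even though the predictions themselves can still be stable.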

In [17]:
# Predict the dependent variable (price) for the test data
y_predicted = pd.DataFrame(reg.predict(x_test))
y_predicted

                  0
0     426955.818751
1     357916.962132
2     576195.325171
3     513480.278815
4     387855.156609
...             ...
4318  837905.317280
4319  901186.172316
4320  275405.903983
4321  740276.647227
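Predictions like these are usually compared against the true test values to see how large the errors are in the units of the target. The following is a minimal sketch of two common error measures; the helper functions and the toy numbers are hypothetical and are not taken from the housing data.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(yt - yp) for yt, yp in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    mse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical actual and predicted values, for illustration only
actual = [300.0, 500.0, 700.0]
predicted = [320.0, 470.0, 710.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```

In the notebook, the same functions could be applied to `y_test` and the output of `reg.predict(x_test)`, assuming both are converted to plain sequences of equal length.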
