# Step 1: Import Necessary Libraries
In this step, we import the Python libraries used throughout the notebook: `numpy`, `pandas`, `matplotlib`, and `sklearn` (the import name for scikit-learn). Together they cover data handling, plotting, and fitting the linear regression model.

The libraries include:
- `numpy`: for numerical operations
- `pandas`: for data manipulation
- `matplotlib.pyplot`: for visualizing the data
- `sklearn.linear_model`: for linear regression
- `sklearn.model_selection`: for splitting the data

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Step 2: Create Sample Data
Next, we will create a small dataset that includes the independent variable `Square_Feet` (representing the size of houses) and the dependent variable `Price` (representing the price of houses).

We will use the `pandas` library to store the data in a DataFrame.

This dataset will serve as our example for performing linear regression to predict house prices based on square footage.

In [2]:
# Sample Data
data = {'Square_Feet': [1500, 1800, 2400, 3000, 3500, 4000, 4500, 5000],
        'Price': [400000, 450000, 500000, 600000, 650000, 700000, 750000, 800000]}
df = pd.DataFrame(data)

# Display the first five rows (head() defaults to n=5; the DataFrame has 8 rows)
df.head()

# Step 3: Split the Data into Training and Testing Sets
In this step, we split the data into a training set, used to fit the model, and a testing set, used to evaluate it on rows the model has not seen.

We will use the `train_test_split` function from `sklearn.model_selection` to divide our data. `test_size=0.2` reserves 20% of the rows for testing, and `random_state=42` fixes the shuffle so the split is reproducible.

We'll use `Square_Feet` as the independent variable (feature) and `Price` as the dependent variable (target).

In [3]:
# Splitting the data into training and test sets
X = df[['Square_Feet']]  # Independent variable (feature)
y = df['Price']  # Dependent variable (target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display the shape of training and testing sets
X_train.shape, X_test.shape, y_train.shape, y_test.shape
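
With 8 rows and `test_size=0.2`, the split holds out 2 rows and keeps 6 for training, so the shapes above should be `(6, 1)`, `(2, 1)`, `(6,)`, and `(2,)`. The sketch below illustrates the idea of a shuffled split in plain Python. `simple_split` is a hypothetical helper written for illustration; it is not scikit-learn's implementation and will not pick the same rows that `random_state=42` selects.

```python
import math
import random

square_feet = [1500, 1800, 2400, 3000, 3500, 4000, 4500, 5000]
price = [400000, 450000, 500000, 600000, 650000, 700000, 750000, 800000]

def simple_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows, then hold out ceil(test_size * n) rows for testing."""
    n_test = math.ceil(test_size * len(rows))
    shuffled = rows[:]                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is repeatable
    return shuffled[n_test:], shuffled[:n_test]

rows = list(zip(square_feet, price))
train_rows, test_rows = simple_split(rows)
print(len(train_rows), len(test_rows))  # 6 2
```

Every row ends up in exactly one of the two sets, which is what keeps the test rows unseen during training.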

# Step 4: Create and Train the Linear Regression Model
Now that the data is split into training and testing sets, we will create a linear regression model and train it using the training data.

The model will attempt to find the best-fitting line that represents the relationship between square footage and house price.

In [4]:
# Create the model and fit it to the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Inspect the fitted intercept and slope (coef_ holds one value per feature)
model.intercept_, model.coef_
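
For a single feature, the "best-fitting line" is the one that minimizes the sum of squared differences between actual and predicted prices, and it has a closed-form solution. The following is a minimal sketch of that calculation, assuming ordinary least squares with an intercept. `fit_line` is a hypothetical helper, and it is run here on all 8 rows rather than on the training split, so its numbers will differ somewhat from `model.intercept_` and `model.coef_`.

```python
square_feet = [1500, 1800, 2400, 3000, 3500, 4000, 4500, 5000]
price = [400000, 450000, 500000, 600000, 650000, 700000, 750000, 800000]

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(square_feet, price)
print(f"intercept: {intercept:.2f}, slope: {slope:.2f} per square foot")
```

The slope can be read as the estimated change in price for each additional square foot.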

# Step 5: Make Predictions and Plot Results
After training the model, we can use it to make predictions on the testing data.

We will then compare the actual house prices with the predicted values by plotting them on a graph. This will help us visually assess how well the model has performed.

In [5]:
# Making predictions
y_pred = model.predict(X_test)

# Plotting the results
plt.scatter(X_test, y_test, color='blue', label='Actual Prices')
plt.plot(X_test, y_pred, color='red', label='Predicted Prices')
plt.title('Linear Regression: Actual vs Predicted Prices')
plt.xlabel('Square Feet')
plt.ylabel('Price')
plt.legend()
plt.show()
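
For a single-feature model, `model.predict` amounts to evaluating the fitted line at each test input: intercept plus slope times square footage. A small sketch of that arithmetic follows. `predict_price` is a hypothetical helper, and the intercept, slope, and house sizes are placeholder values chosen for illustration, not the fitted `model.intercept_` and `model.coef_[0]` or the actual test rows.

```python
def predict_price(square_feet, intercept, slope):
    """Evaluate the line intercept + slope * square_feet."""
    return intercept + slope * square_feet

# Placeholder parameters for illustration only (not fitted values)
example_intercept = 200000.0
example_slope = 120.0

for sqft in (2000, 3000):
    predicted = predict_price(sqft, example_intercept, example_slope)
    print(f"{sqft} sq ft -> predicted price {predicted:.0f}")
```

The vertical gap between each blue point and the red line in the plot is the prediction error for that house.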

# Step 6: Evaluate Model Performance
Finally, we will evaluate the model on the test data with two regression metrics: mean squared error (MSE), the average squared difference between actual and predicted prices, and R-squared, the proportion of variance in the actual prices that the model explains. A lower MSE and an R-squared closer to 1 indicate a better fit.

In [6]:
# Model Evaluation
from sklearn.metrics import mean_squared_error, r2_score

# Calculate Mean Squared Error and R^2 Score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')
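
To make the two metrics concrete, here is a sketch that computes them from their standard definitions in plain Python. The function names are hypothetical, and the `actual` and `predicted` lists are made-up illustrative numbers, not output from the model above.

```python
import math

def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score_manual(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares).

    Assumes y_true is not constant; otherwise the denominator is zero.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Illustrative values only
actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 7.5]

example_mse = mean_squared_error_manual(actual, predicted)
example_r2 = r2_score_manual(actual, predicted)
example_rmse = math.sqrt(example_mse)  # same units as the target

print(f"MSE: {example_mse:.4f}")    # MSE: 0.1667
print(f"RMSE: {example_rmse:.4f}")  # RMSE: 0.4082
print(f"R-squared: {example_r2}")   # R-squared: 0.9375
```

Because MSE is in squared units (squared dollars for house prices), its square root, the RMSE, is often easier to interpret.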

# Step 7: Conclusion
In this project, we used a linear regression model to predict house prices based on square footage. We:
- Created a dataset with house sizes and prices
- Split the data into training and testing sets
- Trained a linear regression model
- Evaluated the model's performance

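The fitted line describes the roughly linear relationship between size and price in this small sample and can be used to estimate a price from square footage. With only eight rows and two held-out test points, however, the reported MSE and R-squared are rough indications rather than reliable estimates of predictive accuracy; a larger dataset would be needed to judge how well the model generalizes.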