In [3]:
# Test date: 21 Jan 2025
# Code from Microsoft Copilot generated using the following prompt
"""
Attached is a csv file called houses99.txt and it is delimited
by ",".  The first row is a header.  The remaining rows contain 
the numerical data.  The first four columns contain the input
features, X_train, which are for predicting the house prices.
The fifth column contains the house prices in units of 
1000's of dollars, y_train. 
We wish to fit a linear model y = w.X + b, where w are the weights,
b is the bias value and X is the input feature set and 
y is the output house price in dollars.
Please give a python code to determine the linear model 
for X_train and y_train using sklearn and the SGDRegressor.  
Use scaling for X_train.  
Please also include the code for reading X_train and y_train 
from the houses99.txt file.  
Using this code, determine the weights and bias and show the model.  
Calculate the weights and the bias using this code, and give the model.  
Print the mean and standard deviation, for each column in X_train.  
Finally, predict the house price for a new feature set [1200, 3, 1, 40].
Give the scaled values for this feature set.  
Also, provide the python code listing and let the 
print statements for numbers be to 8 decimal places.
"""

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

# Step 1: Read X_train and y_train from the houses99.txt file
data = pd.read_csv('./data/houses99.txt')

# Extract input features and target values
X_train = data.iloc[:, :4].values
y_train = data.iloc[:, 4].values  # Prices stay in units of $1000 (no conversion to dollars is applied)

# Step 2: Scale X_train
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Step 3: Fit a linear model using SGDRegressor
# (no random_state is set, so the fitted weights and bias vary slightly between runs)
model = SGDRegressor(max_iter=1000, tol=1e-3)
model.fit(X_train_scaled, y_train)

# Step 4: Calculate and print weights, bias, mean, and standard deviation
weights = model.coef_
bias = model.intercept_

print(f'Weights: {weights}')
print(f'Bias: {bias[0]:.8f}')

mean = scaler.mean_
std_dev = scaler.scale_

print('Mean and Standard Deviation for each column in X_train:')
for i in range(len(mean)):
    print(f'Column {i+1}: Mean = {mean[i]:.8f}, Std Dev = {std_dev[i]:.8f}')

# Step 5: Predict the house price for a new feature set [1200, 3, 1, 40]
new_features = [[1200, 3, 1, 40]]
new_features_scaled = scaler.transform(new_features)
predicted_price = model.predict(new_features_scaled)

print(f'Scaled values for the new feature set: {new_features_scaled[0]}')
# y_train is in units of $1000, so the value printed after "$" is in thousands of dollars
print(f'Predicted house price: ${predicted_price[0]:.8f}')


Weights: [110.31295658 -21.14286925 -32.57414498 -38.00091904]
Bias: 363.16367372
Mean and Standard Deviation for each column in X_train:
Column 1: Mean = 1418.37373737, Std Dev = 411.61562893
Column 2: Mean = 2.71717172, Std Dev = 0.65196523
Column 3: Mean = 1.38383838, Std Dev = 0.48631932
Column 4: Mean = 38.38383838, Std Dev = 25.77788069
Scaled values for the new feature set: [-0.53052829  0.43380884 -0.78927234  0.06269567]
Predicted house price: $318.79494369
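
The printed output can be checked by hand. The sketch below is not part of the Copilot-generated code. It copies the rounded weights, bias, means, and standard deviations from the output above, re-applies the standard scaling z = (x - mean) / std, and evaluates y = w.z + b. It also derives equivalent coefficients for the unscaled features. The names `weights_raw` and `bias_raw` are introduced here for illustration only. Because the inputs are rounded to 8 decimal places, small differences in the last digits are expected.

```python
import numpy as np

# Values copied from the printed output above (rounded to 8 decimal places)
weights = np.array([110.31295658, -21.14286925, -32.57414498, -38.00091904])
bias = 363.16367372
mean = np.array([1418.37373737, 2.71717172, 1.38383838, 38.38383838])
std_dev = np.array([411.61562893, 0.65196523, 0.48631932, 25.77788069])

new_features = np.array([1200, 3, 1, 40], dtype=float)

# Standard scaling, as StandardScaler.transform does: z = (x - mean) / std
scaled = (new_features - mean) / std_dev

# Linear model evaluated on the scaled features: y = w.z + b
predicted = float(weights @ scaled + bias)

# Equivalent model on the unscaled features (illustrative names):
# y = sum(w_i / std_i * x_i) + (b - sum(w_i * mean_i / std_i))
weights_raw = weights / std_dev
bias_raw = bias - float(weights_raw @ mean)
predicted_raw = float(weights_raw @ new_features + bias_raw)

print(f'Scaled features: {scaled}')
print(f'Prediction from scaled features (in $1000s): {predicted:.8f}')
print(f'Weights for unscaled features: {weights_raw}')
print(f'Bias for unscaled features: {bias_raw:.8f}')
print(f'Prediction from unscaled features (in $1000s): {predicted_raw:.8f}')
```

Both predictions should agree with the value reported by `model.predict` to within rounding. Since the prices in the file are in units of $1000, the result corresponds to roughly $318,795.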
