# Task
Implement a linear regression model to predict house prices from the "train.csv" and "test.csv" files, using 'GrLivArea', 'BedroomAbvGr', and the bathroom counts as features. The original task names the bathroom feature 'BathAbvGr'; the code below uses the 'FullBath' and 'HalfBath' columns instead. Handle missing values, train the model, predict on the test data, and generate a submission file in the format of "sample_submission.csv".

## Load data

Load the `train.csv` and `test.csv` files into pandas DataFrames.


In [1]:
import pandas as pd

train_df = pd.read_csv('/content/train.csv')
test_df = pd.read_csv('/content/test.csv')

## Select features

Select the features 'GrLivArea', 'BedroomAbvGr', 'FullBath', and 'HalfBath' from the training and testing data. Also select the target variable 'SalePrice' from the training data.


In [3]:
features = ['GrLivArea', 'BedroomAbvGr', 'FullBath', 'HalfBath']
X_train = train_df[features]
y_train = train_df['SalePrice']
X_test = test_df[features]

display(X_train.head())
display(y_train.head())
display(X_test.head())

|   | GrLivArea | BedroomAbvGr | FullBath | HalfBath |
|---|---|---|---|---|
| 0 | 1710 | 3 | 2 | 1 |
| 1 | 1262 | 3 | 2 | 0 |
| 2 | 1786 | 3 | 2 | 1 |
| 3 | 1717 | 3 | 1 | 0 |
| 4 | 2198 | 4 | 2 | 1 |


|   | SalePrice |
|---|---|
| 0 | 208500 |
| 1 | 181500 |
| 2 | 223500 |
| 3 | 140000 |
| 4 | 250000 |


|   | GrLivArea | BedroomAbvGr | FullBath | HalfBath |
|---|---|---|---|---|
| 0 | 896 | 2 | 1 | 0 |
| 1 | 1329 | 3 | 1 | 1 |
| 2 | 1629 | 3 | 2 | 1 |
| 3 | 1604 | 3 | 2 | 1 |
| 4 | 1280 | 2 | 2 | 0 |
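
Because the task statement and the code name different bathroom columns, it can help to confirm that every requested column exists before indexing. The following is a minimal sketch on a made-up one-row frame; `toy_df` and `requested` are hypothetical names used only for illustration, not part of the notebook.

```python
import pandas as pd

# Made-up stand-in for train_df with the columns the notebook actually uses
toy_df = pd.DataFrame(
    {"GrLivArea": [1710], "BedroomAbvGr": [3], "FullBath": [2], "HalfBath": [1]}
)

# Feature names as written in the task statement
requested = ["GrLivArea", "BedroomAbvGr", "BathAbvGr"]

# Collect any requested names that are not columns of the frame
missing = [col for col in requested if col not in toy_df.columns]
if missing:
    print(f"Columns not found: {missing}")
```

Running the same check against `train_df.columns` before selecting features would surface a mismatched name as a readable message rather than a `KeyError`.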


## Handle missing values

Check for and handle any missing values in the selected features and the target variable.


In [4]:
print("Missing values in X_train:")
print(X_train.isnull().sum())

print("\nMissing values in y_train:")
print(y_train.isnull().sum())

print("\nMissing values in X_test:")
print(X_test.isnull().sum())

# Fill missing feature values with the training-set column means.
# Assigning the result back avoids calling fillna(inplace=True) on a column of a
# sliced DataFrame, which raises SettingWithCopyWarning and may not modify the data.
train_means = X_train.mean()
X_train = X_train.fillna(train_means)

# Fill the test set with the same training means so both sets are treated consistently
X_test = X_test.fillna(train_means)

# Handle missing values in y_train by removing the corresponding rows
if y_train.isnull().sum() > 0:
    target_present = y_train.notnull()
    X_train = X_train[target_present]
    y_train = y_train[target_present]

print("\nMissing values after handling:")
print("Missing values in X_train:")
print(X_train.isnull().sum())

print("\nMissing values in y_train:")
print(y_train.isnull().sum())

print("\nMissing values in X_test:")
print(X_test.isnull().sum())

Missing values in X_train:
GrLivArea       0
BedroomAbvGr    0
FullBath        0
HalfBath        0
dtype: int64

Missing values in y_train:
0

Missing values in X_test:
GrLivArea       0
BedroomAbvGr    0
FullBath        0
HalfBath        0
dtype: int64

Missing values after handling:
Missing values in X_train:
GrLivArea       0
BedroomAbvGr    0
FullBath        0
HalfBath        0
dtype: int64

Missing values in y_train:
0

Missing values in X_test:
GrLivArea       0
BedroomAbvGr    0
FullBath        0
HalfBath        0
dtype: int64
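
The selected columns contain no missing values here, so the fill step is never exercised. As an illustration of how mean imputation behaves when gaps do exist, the sketch below uses scikit-learn's `SimpleImputer` on made-up data. The frames `toy_train` and `toy_test` and their values are assumptions for demonstration, not taken from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up stand-ins for X_train / X_test, each with one or more gaps
toy_train = pd.DataFrame(
    {"GrLivArea": [1000.0, 2000.0, np.nan], "FullBath": [1.0, 2.0, 3.0]}
)
toy_test = pd.DataFrame(
    {"GrLivArea": [np.nan, 1800.0], "FullBath": [2.0, np.nan]}
)

# Learn the column means from the training data only
imputer = SimpleImputer(strategy="mean")
toy_train_filled = pd.DataFrame(
    imputer.fit_transform(toy_train),
    columns=toy_train.columns,
    index=toy_train.index,
)

# Reuse the training means when filling the test data
toy_test_filled = pd.DataFrame(
    imputer.transform(toy_test),
    columns=toy_test.columns,
    index=toy_test.index,
)

print(toy_test_filled)
```

Fitting the imputer on the training data and only transforming the test data keeps the two sets on the same fill values, which is one common convention rather than the only valid choice.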


## Train model

Train a linear regression model using the selected features from the training data and the target variable.


In [5]:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
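
The notebook fits the model on all training rows, so it never reports how well the fit generalizes. One common way to get a rough estimate is to hold out part of the training data. The sketch below uses synthetic data (`toy_X` and `toy_y`, invented for illustration) rather than `train.csv`; with the real data, `X_train` and `y_train` would take their place.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic features with the same column names as the notebook (values are made up)
rng = np.random.default_rng(0)
n = 200
toy_X = pd.DataFrame(
    {
        "GrLivArea": rng.uniform(500, 3000, n),
        "BedroomAbvGr": rng.integers(1, 6, n),
        "FullBath": rng.integers(1, 4, n),
        "HalfBath": rng.integers(0, 3, n),
    }
)
# Synthetic target: a linear signal plus Gaussian noise
toy_y = 100 * toy_X["GrLivArea"] + 5000 * toy_X["FullBath"] + rng.normal(0, 10000, n)

# Hold out 20% of the rows for validation
X_fit, X_val, y_fit, y_val = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=0
)

val_model = LinearRegression().fit(X_fit, y_fit)

# Root mean squared error on the held-out rows
val_rmse = np.sqrt(mean_squared_error(y_val, val_model.predict(X_val)))
print(f"Validation RMSE: {val_rmse:,.0f}")
```

After checking the validation error, the model can be refit on the full training set before predicting on the test data, as the notebook does.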

## Predict on test data

Use the trained model to predict house prices on the selected features from the testing data.


In [6]:
predictions = model.predict(X_test)

## Prepare submission file

Create a submission file in the format of `sample_submission.csv` with the predicted prices for the test data.


In [7]:
sample_submission_df = pd.read_csv('/content/sample_submission.csv')
sample_submission_df['SalePrice'] = predictions
sample_submission_df.to_csv('submission.csv', index=False)
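
Overwriting the `SalePrice` column of the sample file assumes its rows are in the same order as `test_df`. An alternative is to build the submission from the test identifiers directly and check its shape before saving. The sketch below assumes the test data carries an `Id` column, which the notebook does not show; `toy_test_df` and `toy_predictions` are made-up stand-ins.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for test_df and the model's predictions
toy_test_df = pd.DataFrame({"Id": [1, 2, 3]})
toy_predictions = np.array([150000.0, 175000.0, 200000.0])

# Pair each identifier with its prediction (the "Id" column name is an assumption)
toy_submission = pd.DataFrame(
    {"Id": toy_test_df["Id"], "SalePrice": toy_predictions}
)

# Basic sanity checks before writing the file
if len(toy_submission) != len(toy_test_df):
    raise ValueError("Submission and test data have different row counts")
if toy_submission["SalePrice"].isnull().any():
    raise ValueError("Submission contains missing predictions")

print(toy_submission)
```

With the real data, `toy_submission.to_csv('submission.csv', index=False)` would write the file in the same way as the cell above.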

## Predict a house price from user input

In [8]:
# Get user input for the features
grlivarea = float(input("Enter the value for GrLivArea: "))
bedroomabvgr = int(input("Enter the value for BedroomAbvGr: "))
fullbath = int(input("Enter the value for FullBath: "))
halfbath = int(input("Enter the value for HalfBath: "))

# Create a DataFrame from the user input, ensuring the correct order of columns
user_input_df = pd.DataFrame([[grlivarea, bedroomabvgr, fullbath, halfbath]],
                             columns=['GrLivArea', 'BedroomAbvGr', 'FullBath', 'HalfBath'])

# Predict the price using the trained model
predicted_price = model.predict(user_input_df)

print(f"\nPredicted house price: ${predicted_price[0]:,.2f}")

Enter the value for GrLivArea: 1500
Enter the value for BedroomAbvGr: 2
Enter the value for FullBath: 2
Enter the value for HalfBath: 1

Predicted house price: $218,858.31
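
The interactive cell above cannot be reused or tested without typing values at the prompt. One option is to wrap the prediction in a function that takes the feature values as arguments. The sketch below is a hypothetical helper (`predict_price` is not part of the notebook) demonstrated on a toy model fitted to made-up data, so its output is unrelated to the price printed above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["GrLivArea", "BedroomAbvGr", "FullBath", "HalfBath"]


def predict_price(model, grlivarea, bedroomabvgr, fullbath, halfbath):
    """Return the predicted price for one house as a float."""
    # Build a one-row frame with the same column order used for training
    row = pd.DataFrame(
        [[grlivarea, bedroomabvgr, fullbath, halfbath]], columns=FEATURES
    )
    return float(model.predict(row)[0])


# Toy training data (made up); the target depends only on GrLivArea
toy_X = pd.DataFrame(
    [
        [1000, 2, 1, 0],
        [1500, 3, 2, 0],
        [2000, 3, 2, 1],
        [2500, 4, 2, 1],
        [1200, 2, 2, 1],
        [1800, 4, 3, 0],
    ],
    columns=FEATURES,
)
toy_y = 100 * toy_X["GrLivArea"]
toy_model = LinearRegression().fit(toy_X, toy_y)

price = predict_price(toy_model, 1800, 3, 2, 1)
print(f"Predicted house price: ${price:,.2f}")
```

In the notebook, the trained `model` would be passed in place of `toy_model`, and the `input()` calls could feed their parsed values into the same function.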
