In [1]:
!pip install pandas scikit-learn



**pandas:** A data manipulation library to handle and analyze data in tabular form (like CSV files).
**scikit-learn:** A machine learning library used to train models, preprocess data, and evaluate performance.

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

**import pandas as pd**: Imports the pandas library under the conventional alias pd, so its functions for working with tabular data can be called as pd.read_csv, pd.DataFrame, and so on.

**from sklearn.model_selection import train_test_split**: Imports the train_test_split function, which is used to split the dataset into training and testing sets.

**from sklearn.linear_model import LinearRegression**: Imports the LinearRegression model from scikit-learn, which is used to train a model that predicts continuous values (like house prices).

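**from sklearn.preprocessing import StandardScaler**: Imports StandardScaler, which standardizes the features (scaling each one to have a mean of 0 and a standard deviation of 1). Ordinary least-squares linear regression makes the same predictions with or without scaling, but standardizing puts the coefficients on a comparable scale, and it matters for regularized or gradient-based models.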
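
To make the standardization concrete, here is a minimal sketch on a single made-up column (the square_feet values from the sample rows later in this notebook). The names X_demo and scaler_demo are illustrative and not part of the notebook's pipeline.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One illustrative feature column (square footage).
X_demo = np.array([[1500.0], [1800.0], [2400.0], [3000.0], [3500.0]])

scaler_demo = StandardScaler()
X_demo_scaled = scaler_demo.fit_transform(X_demo)

# After scaling, the column has mean ~0 and (population) standard deviation ~1.
print("mean:", X_demo_scaled.mean(axis=0))
print("std: ", X_demo_scaled.std(axis=0))

# The same result by hand: (x - mean) / std, with NumPy's default ddof=0.
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
print(np.allclose(manual, X_demo_scaled))
```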


---




In [3]:
df = pd.read_csv('house_prices.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'house_prices.csv'



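**pd.read_csv('house_prices.csv')**: Reads the CSV file (house_prices.csv) containing house price data and loads it into a pandas DataFrame (df), which stores the data in tabular format. The file must be in the notebook's working directory (or be given as a full path); otherwise pandas raises the FileNotFoundError shown above.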
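
One way to avoid the FileNotFoundError is to check for the file before reading it. The sketch below is an assumption about how you might want to handle a missing file, not part of the original notebook: it falls back to an inline copy of the five rows that df.head() prints further down, and the names csv_path, SAMPLE_CSV, and df_demo are hypothetical.

```python
from io import StringIO
from pathlib import Path

import pandas as pd

csv_path = Path("house_prices.csv")

# Hypothetical fallback: the five rows printed by df.head() in this notebook,
# used only so the later cells can run without the real file.
SAMPLE_CSV = """square_feet,num_bedrooms,num_bathrooms,price
1500,3,2,400000
1800,4,3,500000
2400,4,3,600000
3000,5,4,700000
3500,5,4,800000
"""

if csv_path.exists():
    df_demo = pd.read_csv(csv_path)
else:
    print(f"{csv_path} not found in {Path.cwd()}; using the inline sample instead.")
    df_demo = pd.read_csv(StringIO(SAMPLE_CSV))

print(df_demo.shape)
```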

---

In [None]:
print("First few rows of the dataset:")
print(df.head())

First few rows of the dataset:
   square_feet  num_bedrooms  num_bathrooms   price
0         1500             3              2  400000
1         1800             4              3  500000
2         2400             4              3  600000
3         3000             5              4  700000
4         3500             5              4  800000



**df.head():** Displays the first 5 rows of the dataset to give a quick overview of the data. It's useful for understanding the structure of the dataset (features and target).

---

In [None]:
print("\nMissing values in the dataset:")
print(df.isnull().sum())


Missing values in the dataset:
square_feet      0
num_bedrooms     0
num_bathrooms    0
price            0
dtype: int64



**df.isnull().sum():** Counts the missing values (NaN) in each column. The isnull() method returns a boolean DataFrame, and sum() counts the number of True values (i.e., missing values) per column.
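
Because the dataset above has no gaps, the output is all zeros. The following sketch uses a small made-up frame with two missing values to show what the two steps return; the frame and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up frame with two gaps, for illustration only.
demo = pd.DataFrame({
    "square_feet": [1500, np.nan, 2400],
    "num_bedrooms": [3, 4, np.nan],
    "price": [400000, 500000, 600000],
})

mask = demo.isnull()             # boolean DataFrame, True where a value is missing
missing_per_column = mask.sum()  # True counts as 1, so this counts the gaps
print(missing_per_column)
print("Any missing values at all?", bool(mask.values.any()))
```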

---

In [None]:
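df = df.fillna(df.median(numeric_only=True))

**df.fillna(df.median(numeric_only=True)):** Fills missing values with the median of each numeric column; numeric_only=True keeps the call from failing if the file also contains text columns. The median is often preferred over the mean for this because it is more robust to outliers. In this dataset the previous cell found no missing values, so the call leaves df unchanged.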
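
The robustness claim is easy to check on a few made-up prices with one extreme value and one gap. This is only an illustration; the numbers are not from house_prices.csv.

```python
import numpy as np
import pandas as pd

# Made-up prices with one extreme value and one missing entry.
prices = pd.Series([400000, 500000, 600000, 5000000, np.nan], name="price")

print("mean:  ", prices.mean())    # pulled upward by the 5,000,000 outlier
print("median:", prices.median())  # stays near the typical values

# Fill the gap with the median, as the cell above does per column.
filled = prices.fillna(prices.median())
print(filled.tolist())
```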

---

In [None]:
X = df[['square_feet', 'num_bedrooms', 'num_bathrooms']]  # Example feature columns
y = df['price']  # Target variable (house price)

**X = df[['square_feet', 'num_bedrooms', 'num_bathrooms']]:** This selects the feature columns (square_feet, num_bedrooms, num_bathrooms) from the DataFrame and stores them in X. These are the independent variables that will be used to predict the target.
**y = df['price']:** This selects the price column, which is the target variable (dependent variable) that we want to predict.

---


In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

**train_test_split(X, y, test_size=0.2, random_state=42):** Splits the dataset into training and testing sets:
**X_train, X_test:** Feature sets for training and testing.
**y_train, y_test:** Target sets for training and testing.
**test_size=0.2:** 20% of the data will be used for testing, and 80% will be used for training.
**random_state=42:** Sets the seed for the random number generator so that the split is reproducible: the same rows land in the training and testing sets each time the code runs.
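
A short sketch of both parameters on ten made-up samples: test_size=0.2 puts two rows in the test set, and repeating the call with the same random_state gives an identical split. The arrays here are illustrative stand-ins for X and y.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples, for illustration only.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

X_train_a, X_test_a, y_train_a, y_test_a = split_a
print("train rows:", len(X_train_a), "test rows:", len(X_test_a))

# Same seed, same split: every returned array matches its counterpart.
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same_split)
```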

---


In [None]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

**scaler = StandardScaler()**: Initializes the StandardScaler object, which will be used to scale the features.
**X_train_scaled = scaler.fit_transform(X_train)**: This scales the training data (X_train). The fit_transform() method computes the mean and standard deviation of the training set and scales it accordingly.
**X_test_scaled = scaler.transform(X_test)**: This scales the testing data (X_test) using the mean and standard deviation computed from the training set (to avoid data leakage).
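
The sketch below, on made-up single-feature data, shows what "computed from the training set" means in practice: fit_transform() stores the training mean and scale on the scaler, and transform() reuses them for the test row without recomputing anything. Variable names ending in _demo are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature data for illustration.
X_train_demo = np.array([[1500.0], [1800.0], [2400.0], [3000.0]])
X_test_demo = np.array([[3500.0]])

scaler_demo = StandardScaler()
X_train_demo_scaled = scaler_demo.fit_transform(X_train_demo)  # learns mean_ and scale_
X_test_demo_scaled = scaler_demo.transform(X_test_demo)        # reuses them unchanged

print("learned mean: ", scaler_demo.mean_)
print("learned scale:", scaler_demo.scale_)

# The test row is scaled with the training statistics, so it need not be near 0.
expected = (X_test_demo - X_train_demo.mean(axis=0)) / X_train_demo.std(axis=0)
print(np.allclose(X_test_demo_scaled, expected))
```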

---

In [None]:
model = LinearRegression()
model.fit(X_train_scaled, y_train)

**model = LinearRegression()**: Initializes a Linear Regression model object.
**model.fit(X_train_scaled, y_train)**: Trains the model using the scaled training data (X_train_scaled) and the target variable (y_train). The model learns the relationship between the features and the target variable during this step.
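
To see what "learning the relationship" amounts to, this sketch fits the same kind of model on made-up data generated from a known rule (price = 200 * square_feet + 100000, an assumption chosen for illustration). Least squares recovers the slope and intercept of that rule.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data generated from a known rule: price = 200 * square_feet + 100000.
square_feet_demo = np.array([[1000.0], [1500.0], [2000.0], [2500.0]])
price_demo = 200.0 * square_feet_demo.ravel() + 100000.0

model_demo = LinearRegression()
model_demo.fit(square_feet_demo, price_demo)

# The fitted slope and intercept match the generating rule.
print("slope:    ", model_demo.coef_[0])
print("intercept:", model_demo.intercept_)
print("R^2 on the training data:", model_demo.score(square_feet_demo, price_demo))
```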

---

In [None]:
print("\nModel coefficients:", model.coef_)
print("Intercept:", model.intercept_)


Model coefficients: [91191.73057328 62884.88241002 62884.88241002]
Intercept: 612500.0


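**model.coef_**: Displays one coefficient per feature, in the order of the columns in X. Because the features were standardized, each coefficient is the change in predicted price for a one-standard-deviation increase in that feature with the others held fixed, not the change per square foot or per room. The identical values for num_bedrooms and num_bathrooms suggest that the two columns are perfectly correlated in the training data, in which case the model cannot separate their individual effects.
**model.intercept_:** Displays the intercept (constant) of the regression model. This is the predicted value when all scaled feature values are zero, that is, when every feature sits at its training-set mean, so it equals the average price in the training set.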
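
Since the model was trained on standardized features, its coefficients and intercept are expressed in that scaled space. The sketch below uses made-up data to illustrate two consequences: the intercept equals the mean training target, and dividing each coefficient by the scaler's scale_ converts it to a change per original unit. All names ending in _demo, plus raw_model, are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Made-up data for illustration only (two features, five houses).
X_demo = np.array([[1500.0, 3.0], [1800.0, 4.0], [2400.0, 3.0],
                   [3000.0, 5.0], [3500.0, 4.0]])
y_demo = np.array([400000.0, 500000.0, 600000.0, 700000.0, 800000.0])

scaler_demo = StandardScaler()
model_demo = LinearRegression().fit(scaler_demo.fit_transform(X_demo), y_demo)

# With standardized features, the intercept equals the mean training target.
print(model_demo.intercept_, y_demo.mean())

# Dividing by the per-feature scale gives the change in price per one
# original unit (per square foot, per bedroom).
coef_original_units = model_demo.coef_ / scaler_demo.scale_
print(coef_original_units)

# Cross-check against a model fitted directly on the unscaled features.
raw_model = LinearRegression().fit(X_demo, y_demo)
print(np.allclose(coef_original_units, raw_model.coef_))
```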

---

In [None]:
square_feet = float(input("Enter square feet of the house: "))
num_bedrooms = float(input("Enter number of bedrooms: "))
num_bathrooms = float(input("Enter number of bathrooms: "))

Enter square feet of the house: 1320
Enter number of bedrooms: 5
Enter number of bathrooms: 2


**input()**: Takes input from the user. The input() function returns a string, so we convert it with float() because the model expects numerical features. If the text is not a valid number, float() raises a ValueError.
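
If you want to guard against bad input, one option is a small parsing helper. The function below is a hypothetical sketch, not part of the notebook; it is exercised with fixed strings rather than live input() so that it runs unattended.

```python
import math


def parse_non_negative_float(text: str) -> float:
    """Convert user text to a finite, non-negative float, or raise ValueError."""
    value = float(text.strip())  # float() raises ValueError for text like "abc"
    if not math.isfinite(value) or value < 0:
        raise ValueError(f"expected a non-negative number, got {text!r}")
    return value


# Example with fixed strings instead of live input():
for raw in ["1320", " 5 ", "abc", "-2"]:
    try:
        print(repr(raw), "->", parse_non_negative_float(raw))
    except ValueError as err:
        print(repr(raw), "-> rejected:", err)
```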

---

In [None]:
user_input = pd.DataFrame([[square_feet, num_bedrooms, num_bathrooms]], columns=['square_feet', 'num_bedrooms', 'num_bathrooms'])

**pd.DataFrame(...)**: Creates a new pandas DataFrame from the user's input with columns square_feet, num_bedrooms, and num_bathrooms. This ensures the input matches the format of the features used to train the model.

---

In [None]:
user_input_scaled = scaler.transform(user_input)

**scaler.transform(user_input)**: Standardizes the user input using the same scaling parameters (mean and standard deviation) as the training data. This ensures that the model receives input in the same scale as it was trained on.

---

In [None]:
predicted_price = model.predict(user_input_scaled)

**model.predict(user_input_scaled)**: Uses the trained model to predict the house price based on the standardized user input features.
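
The scale-then-predict sequence can also be bundled so that the two steps cannot drift apart. The following is a sketch of an alternative arrangement using scikit-learn's make_pipeline, not what this notebook does; it trains on the five sample rows only, so its prediction will not match the notebook's output.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

feature_columns = ["square_feet", "num_bedrooms", "num_bathrooms"]

# The five rows shown by df.head(), used as stand-in training data.
train_demo = pd.DataFrame(
    [[1500, 3, 2, 400000], [1800, 4, 3, 500000], [2400, 4, 3, 600000],
     [3000, 5, 4, 700000], [3500, 5, 4, 800000]],
    columns=feature_columns + ["price"],
)

# The pipeline scales and then fits, so predict() applies both steps in order.
pipeline_demo = make_pipeline(StandardScaler(), LinearRegression())
pipeline_demo.fit(train_demo[feature_columns], train_demo["price"])

new_house = pd.DataFrame([[1320, 5, 2]], columns=feature_columns)
prediction = pipeline_demo.predict(new_house)
print(prediction.shape)
```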

---

In [None]:
print(f"The predicted price of the house is: ${predicted_price[0]:,.2f}")

The predicted price of the house is: $495,964.13


**print(f"....")**: Prints the predicted house price with a dollar sign. The format specifier :,.2f adds commas as thousands separators and rounds to two decimal places. The [0] accesses the first (and only) value in the predicted price array.
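
For reference, here is how a few variations of the format specifier behave on a made-up value chosen to round to the price printed above.

```python
price = 495964.1299  # made-up value for illustration

with_commas = f"${price:,.2f}"    # thousands separators, two decimals
without_commas = f"${price:.2f}"  # two decimals, no separators
whole_dollars = f"${price:,.0f}"  # rounded to whole dollars

print(with_commas)     # $495,964.13
print(without_commas)  # $495964.13
print(whole_dollars)   # $495,964
```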