# Simple Linear Regression Project: Weight vs Height


---

## Dataset

We have a small dataset of **24 records** with two columns:  

- **Weight** (independent feature, X)  
- **Height** (dependent feature, Y)  

Our goal: Predict height based on weight using **simple linear regression**.

---

## Step 1: Import Libraries

```python
%matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
```

Always keep the `%matplotlib inline` magic when working in a Jupyter notebook so that plots render inline.

---

## Step 2: Load the Dataset

```python
df = pd.read_csv("height-weight.csv")
df.head()
```


---

## Step 3: Visualize Data

```python
plt.scatter(df['Weight'], df['Height'])
plt.xlabel("Weight")
plt.ylabel("Height")
plt.show()
```

The scatter plot shows a positive linear relationship. Check the correlation:

```python
df.corr()
```

Weight vs Height correlation: 0.93 (strong positive correlation)
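As a rough illustration of what `df.corr()` reports by default, the Pearson coefficient can be computed by hand. The sketch below uses made-up weight and height values (not the project's dataset), and the variable names are hypothetical.

```python
import numpy as np

# Made-up sample values for illustration only (not the project's dataset)
weight = np.array([45.0, 58.0, 62.0, 70.0, 78.0, 85.0])
height = np.array([120.0, 135.0, 140.0, 155.0, 162.0, 170.0])

# Pearson r = sum of co-deviations / sqrt(product of squared deviations)
w_dev = weight - weight.mean()
h_dev = height - height.mean()
r_manual = (w_dev * h_dev).sum() / np.sqrt((w_dev ** 2).sum() * (h_dev ** 2).sum())

# NumPy's built-in computation, for comparison
r_numpy = np.corrcoef(weight, height)[0, 1]

print("Manual r:", r_manual)
print("NumPy r: ", r_numpy)
```

Both values should agree up to floating-point rounding. A value close to 1 indicates a strong positive linear relationship.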

Optional: Use Seaborn for pairplots:

```python
import seaborn as sns
sns.pairplot(df)
```


---

## Step 4: Define Features and Labels

```python
# Independent feature (X)
X = df[['Weight']]  # 2D DataFrame

# Dependent feature (Y)
Y = df['Height']    # 1D Series
```


---

## Step 5: Split Data into Train and Test Sets

```python
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=42
)
```


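---

## Step 6: Standardize Features

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean and std from the training set
X_test = scaler.transform(X_test)        # reuse the training mean and std
```

Note: Standardize the independent features only, not the target. The main benefit is faster convergence for gradient descent.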
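To make the fit/transform distinction concrete, here is a minimal sketch of standardization written out in NumPy. The train and test values are made up for illustration, and the variable names are hypothetical. The key point is that the mean and standard deviation come from the training set only.

```python
import numpy as np

# Made-up train/test weights for illustration only
train = np.array([[50.0], [60.0], [70.0], [80.0]])
test = np.array([[65.0], [90.0]])

# Statistics are learned from the training set only
mu = train.mean(axis=0)
sigma = train.std(axis=0)  # population std (ddof=0), which StandardScaler also uses

train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma  # reuse the training mean and std

print("Train mean after scaling:", train_scaled.mean())
print("Train std after scaling: ", train_scaled.std())
print("Scaled test values:", test_scaled.ravel())
```

The scaled training data has mean 0 and standard deviation 1, while the scaled test data generally does not, because it is measured against the training statistics.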

---

## Step 7: Apply Linear Regression (sklearn)

```python
from sklearn.linear_model import LinearRegression

regression = LinearRegression()
regression.fit(X_train, Y_train)
```

Get slope and intercept:

```python
print("Slope:", regression.coef_[0])
print("Intercept:", regression.intercept_)
```
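With a single feature, the least-squares slope and intercept have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept is `mean(y) - slope * mean(x)`. The sketch below checks this against `np.polyfit` on made-up values (not the project's dataset). The variable names are hypothetical.

```python
import numpy as np

# Made-up standardized feature and target values, for illustration only
x = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
y = np.array([130.0, 148.0, 155.0, 161.0, 181.0])

# Closed-form simple linear regression
x_dev = x - x.mean()
slope = (x_dev * (y - y.mean())).sum() / (x_dev ** 2).sum()
intercept = y.mean() - slope * x.mean()

# np.polyfit with degree 1 returns [slope, intercept]
slope_np, intercept_np = np.polyfit(x, y, 1)

print("Closed form:", slope, intercept)
print("np.polyfit: ", slope_np, intercept_np)
```

Because a standardized feature has mean zero on the training set, the intercept in that case equals the mean of the training target.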


---

## Step 8: Visualize Best Fit Line

```python
plt.scatter(X_train, Y_train)
plt.plot(X_train, regression.predict(X_train), color='red')
plt.xlabel("Weight (standardized)")
plt.ylabel("Height")
plt.show()
```


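---

## Step 9: Predictions on Test Data

```python
Y_pred = regression.predict(X_test)

# Predict for a new weight; scale it with the scaler fitted on the training data.
# A DataFrame with the same column name avoids a feature-name warning from the scaler.
new_weight = pd.DataFrame({'Weight': [72]})
new_weight_scaled = scaler.transform(new_weight)
predicted_height = regression.predict(new_weight_scaled)
print("Predicted height for weight 72:", predicted_height[0])
```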



---

## Step 10: Model Performance Metrics

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

mse = mean_squared_error(Y_test, Y_pred)
mae = mean_absolute_error(Y_test, Y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(Y_test, Y_pred)

print("MSE:", mse)
print("MAE:", mae)
print("RMSE:", rmse)
print("R2 Score:", r2)
```

Optional: Adjusted R²

```python
n = len(Y_test)        # number of test samples
k = X_test.shape[1]    # number of features
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print("Adjusted R²:", adjusted_r2)
```
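The adjusted R² formula divides by `n - k - 1`, so it breaks down when the test set is very small. A minimal sketch of a reusable helper with that guard is shown below. The function name `adjusted_r2_score` is hypothetical (it is not part of scikit-learn), and the example values are made up.

```python
def adjusted_r2_score(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R² for a model with n_features predictors scored on n_samples rows."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("adjusted R² needs n_samples > n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Hypothetical values: R² = 0.80 scored on 6 rows with 1 feature
print(adjusted_r2_score(0.80, n_samples=6, n_features=1))
```

With 24 records and `test_size=0.25`, the test set has only 6 rows, so the adjustment is large and the value should be read with caution.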



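---

## Step 11: OLS Implementation

```python
import statsmodels.api as sm

# sm.OLS fits no intercept unless a constant column is added explicitly
X_train_const = sm.add_constant(X_train, has_constant='add')
X_test_const = sm.add_constant(X_test, has_constant='add')

ols_model = sm.OLS(Y_train, X_train_const).fit()
Y_pred_ols = ols_model.predict(X_test_const)

print(ols_model.summary())
```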
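`sm.OLS` fits exactly the design matrix it is given and does not add an intercept by itself. To see what a constant column changes, the sketch below fits least squares with and without a column of ones, using NumPy on made-up data (not the project's dataset). The variable names are hypothetical.

```python
import numpy as np

# Made-up standardized feature and target values, for illustration only
x = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
y = np.array([130.0, 148.0, 155.0, 161.0, 181.0])

# Without a constant column the fitted line is forced through the origin
X_no_const = x.reshape(-1, 1)
coef_no_const, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)

# With a leading column of ones, the first coefficient is the intercept
X_const = np.column_stack([np.ones_like(x), x])
coef_const, *_ = np.linalg.lstsq(X_const, y, rcond=None)

mae_no_const = np.abs(y - X_no_const @ coef_no_const).mean()
mae_const = np.abs(y - X_const @ coef_const).mean()

print("No constant:   coef =", coef_no_const, " MAE =", mae_no_const)
print("With constant: coef =", coef_const, " MAE =", mae_const)
```

On a standardized feature the slope is the same in both fits, but without the constant every prediction is off by roughly the mean of the target, which is why the error is so much larger.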


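---

## Key Notes

- Always standardize independent features before using gradient descent. scikit-learn's `LinearRegression` solves least squares directly, so scaling is optional for it.
- Ensure X is 2D and Y is 1D.
- A train/test split lets you measure performance on unseen data and detect overfitting. Fitting the scaler on the training set only avoids data leakage.
- statsmodels OLS needs an explicit constant column to fit an intercept. With it, OLS and sklearn `LinearRegression` produce matching results.
- For new predictions, apply the same scaling as used for training.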