
# 5.8 Exercises — Linear Regression Applications
Student: Hiran  
Notebook: `5.8_Exercises_Hiran.ipynb`

Tasks:
1. Predict taxi fare in Chicago using regression.  
2. Predict body weight using linear regression (height, age, gender).  

Note: Run all cells, ensure outputs are visible, and save to GitHub/Colab before submitting.


## Question 1 — Taxi Fare Prediction

**Problem:** Train a model to predict the fare of a taxi ride in Chicago, Illinois, using the City of Chicago Taxi Trips dataset.

### Theory
Taxi fare is a continuous outcome, so predicting it is a regression problem. Common predictors include trip distance, trip duration, and pickup and dropoff location. Linear regression is used here; tree-based models are an alternative when the relationship between predictors and fare is not linear.
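
For the linear case with distance and duration as the two predictors, the model can be written as below. The symbols $\beta_0, \beta_1, \beta_2$ and $\varepsilon$ are notation introduced here for illustration.

$$
\text{fare} = \beta_0 + \beta_1 \cdot \text{distance} + \beta_2 \cdot \text{time} + \varepsilon
$$

Here $\beta_0$ plays the role of a base fare, $\beta_1$ and $\beta_2$ are the charges per unit of distance and time, and $\varepsilon$ is the part of the fare the predictors do not explain.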

### Algorithm
1. Load the dataset (Chicago Taxi Trips, available on data.cityofchicago.org).  
2. Select predictors (trip distance, duration, pickup time, etc.).  
3. Clean missing values/outliers.  
4. Split data into train and test sets.  
5. Train linear regression.  
6. Evaluate with R² and RMSE.  
7. Interpret coefficients.
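
Step 3 (cleaning) is not exercised by the simulated data used in this notebook, so here is a minimal sketch of what it could look like. The sample rows, the helper name `clean_trips`, and the `max_distance` threshold are all assumptions made for illustration; real trip records would need thresholds chosen from the data.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical sample standing in for raw trip records;
# the column names mirror the simulated data used in this notebook.
raw = pd.DataFrame({
    "distance": [2.1, 0.0, 5.4, np.nan, 3.3, 250.0],
    "time":     [9.0, 4.0, 21.0, 12.0, np.nan, 15.0],
    "fare":     [8.5, 3.2, 17.0, 11.0, 10.5, 14.0],
})

def clean_trips(df, max_distance=100.0):
    """Drop rows with missing values, non-positive measurements,
    or a distance above an (assumed) plausibility threshold."""
    out = df.dropna(subset=["distance", "time", "fare"])
    keep = (
        (out["distance"] > 0)
        & (out["distance"] <= max_distance)
        & (out["time"] > 0)
        & (out["fare"] > 0)
    )
    return out[keep].reset_index(drop=True)

cleaned = clean_trips(raw)
print(cleaned)
```

Only the rows that are complete, positive, and within the distance threshold survive; the cleaned frame can then be passed to `train_test_split` as in step 4.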


In [2]:
# Q1 - Taxi Fare Prediction (Linear Regression)
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Simulate taxi dataset (replace with actual Chicago dataset if available)
np.random.seed(42)
n = 500
trip_distance = np.random.exponential(5, n)   # km
trip_time = trip_distance * np.random.normal(4, 1, n)  # minutes
base_fare = 3
fare = base_fare + 2.5*trip_distance + 0.05*trip_time + np.random.normal(0, 2, n)

df = pd.DataFrame({
    'distance': trip_distance,
    'time': trip_time,
    'fare': fare
})

# Train-test split
X = df[['distance','time']]
y = df['fare']
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train,y_train)

# Predictions
y_pred = model.predict(X_test)

# Results
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print("R2 Score:", r2_score(y_test, y_pred))

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print("RMSE:", rmse)


Coefficients: [2.51111345 0.05402055]
Intercept: 2.962986812828321
R2 Score: 0.9816000944002149
RMSE: 1.839193804574633


### Inference (Q1)
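- The model was fitted to simulated data, so these results show that linear regression recovers the generating parameters, not how it performs on real Chicago trips.  
- Coefficients: each additional km adds about 2.51 to the fare and each additional minute about 0.054, with an intercept (base fare) of about 2.96. These are close to the values used in the simulation (2.5, 0.05, and 3).  
- R² = 0.98: the model explains about 98% of the variance in fare on the test set.  
- RMSE = 1.84: the typical prediction error on the test set, in fare units, which is close to the simulated noise standard deviation of 2.  
- The model can be extended with further features such as time of day, surge pricing, or pickup area.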
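
The extension with a categorical feature can be sketched as follows. The `period` column, its three levels, and the surcharges used to generate the data are hypothetical and exist only to make the example runnable; `pd.get_dummies` is one of several ways to one-hot encode.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 300

# Simulated trips with a hypothetical 'period' column (not in the original data).
distance = rng.exponential(5, n)
time = distance * rng.normal(4, 1, n)
period = rng.choice(["day", "evening", "night"], size=n)

# Assumed surcharges per period, used only to generate illustrative fares.
surcharge = pd.Series(period).map({"day": 0.0, "evening": 1.0, "night": 2.5}).to_numpy()
fare = 3 + 2.5 * distance + 0.05 * time + surcharge + rng.normal(0, 2, n)

trips = pd.DataFrame({"distance": distance, "time": time, "period": period})

# One-hot encode 'period'; drop_first=True makes 'day' the baseline category
# and avoids perfectly collinear dummy columns.
X_cat = pd.get_dummies(trips, columns=["period"], drop_first=True, dtype=float)
model_cat = LinearRegression().fit(X_cat, fare)

for name, coef in zip(X_cat.columns, model_cat.coef_):
    print(f"{name}: {coef:.3f}")
```

Each dummy coefficient is read as the expected fare difference relative to the baseline period, at the same distance and time.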


## Question 2 — Body Weight Prediction

**Problem:** Conduct linear regression to predict body weight using height, age, and gender.

### Theory
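Multiple linear regression models weight (the dependent variable) as a linear function of several independent variables (height, age, gender). Gender is categorical, so it is encoded as a 0/1 indicator (Male=0, Female=1); its coefficient is then the expected weight difference for females relative to males of the same height and age.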

### Algorithm
1. Input dataset (weight, height, age, gender).  
2. Encode gender (Male=0, Female=1).  
3. Define predictors (X = height, age, gender).  
4. Fit LinearRegression model.  
5. Evaluate R² and RMSE.  
6. Interpret coefficients.


In [3]:
# Q2 - Predict Body Weight using Linear Regression
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Dataset given in question
data = {
    'Weight': [79,69,73,95,82,55,69,71,64,69],
    'Height': [1.80,1.68,1.82,1.70,1.87,1.55,1.50,1.78,1.67,1.64],
    'Age': [35,39,25,60,27,18,89,42,16,52],
    'Gender': ['Male','Male','Male','Male','Male','Female','Female','Female','Female','Female']
}
df = pd.DataFrame(data)

# Encode gender: Male=0, Female=1
df['Gender'] = df['Gender'].map({'Male':0,'Female':1})

# Features & target
X = df[['Height','Age','Gender']]
y = df['Weight']

# Train model
model = LinearRegression()
model.fit(X,y)

# Predictions on the training data (no held-out test set: only 10 rows)
y_pred = model.predict(X)

# Results
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print("R2 Score:", r2_score(y,y_pred))

mse = mean_squared_error(y,y_pred)
rmse = np.sqrt(mse)
print("RMSE:", rmse)


Coefficients: [47.37930785  0.29668529 -8.92206984]
Intercept: -15.487584849467979
R2 Score: 0.7535455935615611
RMSE: 5.102493850673089
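
To make the coefficients concrete, the printed values can be plugged into the fitted equation by hand. The sketch below copies the numbers from the output above; the helper `predict_weight` and the example person (height 1.75, age 30) are hypothetical and not part of the exercise data.

```python
import numpy as np

# Values copied from the printed output above.
intercept = -15.487584849467979
coef = np.array([47.37930785, 0.29668529, -8.92206984])  # Height, Age, Gender

def predict_weight(height, age, female):
    """Apply the fitted equation; `female` is 1 for Female and 0 for Male."""
    return intercept + coef @ np.array([height, age, female])

# Hypothetical person, not in the dataset.
male_pred = predict_weight(1.75, 30, 0)
female_pred = predict_weight(1.75, 30, 1)
print(f"Male:   {male_pred:.2f}")    # 76.33
print(f"Female: {female_pred:.2f}")  # 67.40
```

The two predictions differ by exactly the gender coefficient, since height and age are held fixed.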


### Inference (Q2)
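- Height and age have positive coefficients: about 47.4 per unit of height and 0.30 per year of age, holding the other predictors fixed.  
- The gender coefficient (-8.92) means that a female is predicted to weigh about 8.9 less than a male of the same height and age.  
- R² = 0.75 and RMSE = 5.10 are computed on the same 10 rows used for fitting, so they describe in-sample fit and are likely optimistic.  
- With only 10 observations and 3 predictors, the coefficient estimates are unstable; more data would give more reliable estimates and allow a held-out test set.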
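
Because the model is scored on the rows it was fitted to, a rough out-of-sample check is useful even with ten rows. The sketch below uses leave-one-out cross-validation from scikit-learn as one possible approach; it is not part of the original exercise, and the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Same ten rows as in the exercise, with Gender already encoded (Male=0, Female=1).
people = pd.DataFrame({
    "Weight": [79, 69, 73, 95, 82, 55, 69, 71, 64, 69],
    "Height": [1.80, 1.68, 1.82, 1.70, 1.87, 1.55, 1.50, 1.78, 1.67, 1.64],
    "Age": [35, 39, 25, 60, 27, 18, 89, 42, 16, 52],
    "Gender": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
})
X_all = people[["Height", "Age", "Gender"]]
y_all = people["Weight"]

# In-sample error: the model is scored on the rows it was fitted to.
fitted = LinearRegression().fit(X_all, y_all)
train_rmse = np.sqrt(mean_squared_error(y_all, fitted.predict(X_all)))

# Leave-one-out: each row is predicted by a model fitted to the other nine.
loo_pred = cross_val_predict(LinearRegression(), X_all, y_all, cv=LeaveOneOut())
loo_rmse = np.sqrt(mean_squared_error(y_all, loo_pred))

print(f"In-sample RMSE:     {train_rmse:.2f}")
print(f"Leave-one-out RMSE: {loo_rmse:.2f}")
```

For ordinary least squares the leave-one-out error is never smaller than the in-sample error, so the gap between the two numbers gives a sense of how optimistic the in-sample figure is.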
