Practice Exercise: Predicting Insurance Claim Amount
Problem Statement

An insurance company wants to predict the claim amount (charges) for a policyholder based on:

age (years)

bmi (body mass index)

children (number of dependents)

smoker (yes/no)

You need to build a Linear Regression model to predict charges.

Sample Dataset (tiny for practice)

| age | bmi  | children | smoker | charges |
| --- | ---- | -------- | ------ | ------- |
| 25  | 28.5 | 0        | no     | 3200    |
| 32  | 31.2 | 1        | no     | 4300    |
| 45  | 24.0 | 2        | yes    | 16800   |
| 52  | 36.5 | 3        | yes    | 24000   |
| 28  | 26.2 | 0        | no     | 3700    |
| 36  | 29.8 | 2        | no     | 4800    |
| 50  | 30.1 | 1        | yes    | 19500   |

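Written out, the linear model the problem statement asks for has the following form. This is a sketch of the standard multiple linear regression equation, assuming `smoker` is encoded as 0 for "no" and 1 for "yes"; the symbols $\beta_0, \dots, \beta_4$ are notation introduced here for illustration.

$$
\widehat{\text{charges}} = \beta_0 + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{bmi} + \beta_3 \cdot \text{children} + \beta_4 \cdot \text{smoker}
$$

In scikit-learn terms, $\beta_0$ corresponds to `model.intercept_` and $\beta_1, \dots, \beta_4$ to the entries of `model.coef_`, in the same order as the feature columns.
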

In [None]:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [None]:
# Step 1: Create dataset
data = {
    "age": [25, 32, 45, 52, 28, 36, 50],
    "bmi": [28.5, 31.2, 24.0, 36.5, 26.2, 29.8, 30.1],
    "children": [0, 1, 2, 3, 0, 2, 1],
    "smoker": ["no", "no", "yes", "yes", "no", "no", "yes"],
    "charges": [3200, 4300, 16800, 24000, 3700, 4800, 19500]
}

df = pd.DataFrame(data)

# Step 2: Convert categorical variable 'smoker' to numeric
df["smoker"] = df["smoker"].map({"no": 0, "yes": 1})

# Step 3: Features (X) and target (y)
X = df[["age", "bmi", "children", "smoker"]]
y = df["charges"]

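# Step 4: Train-test split
# Note: with 7 rows, test_size=0.3 leaves only 4 rows for training and 3 for testing,
# so the fit is underdetermined (5 parameters) and the metrics below are illustrative only.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)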

# Step 5: Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Step 6: Predictions
y_pred = model.predict(X_test)

# Step 7: Evaluation
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print("MAE:", mean_absolute_error(y_test, y_pred))
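# RMSE is the square root of MSE (computed manually; the `squared=False` argument is deprecated in newer scikit-learn)
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)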
print("R2 Score:", r2_score(y_test, y_pred))
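
Once fitted, a model like this can be applied to a policyholder it has not seen. The sketch below is self-contained: it rebuilds the seven sample rows, fits on all of them rather than on the training split, and scores one made-up policyholder. The helper name `predict_charges` and the values in `new_policyholder` are illustrative assumptions, not part of the exercise. With so few rows, the result shows how the API is used and should not be read as a credible estimate.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["age", "bmi", "children", "smoker"]

# Same seven rows as the sample dataset
df = pd.DataFrame({
    "age": [25, 32, 45, 52, 28, 36, 50],
    "bmi": [28.5, 31.2, 24.0, 36.5, 26.2, 29.8, 30.1],
    "children": [0, 1, 2, 3, 0, 2, 1],
    "smoker": ["no", "no", "yes", "yes", "no", "no", "yes"],
    "charges": [3200, 4300, 16800, 24000, 3700, 4800, 19500],
})
# Encode the categorical column as 0/1
df["smoker"] = df["smoker"].map({"no": 0, "yes": 1})

# Fit on all available rows
model = LinearRegression().fit(df[FEATURES], df["charges"])


def predict_charges(model, age, bmi, children, smoker):
    """Hypothetical helper: predict charges for a single policyholder.

    `smoker` is the string "yes" or "no", as in the raw table.
    """
    row = pd.DataFrame(
        [[age, bmi, children, 1 if smoker == "yes" else 0]],
        columns=FEATURES,
    )
    return float(model.predict(row)[0])


# Illustrative (made-up) policyholder
new_policyholder = {"age": 40, "bmi": 27.0, "children": 1, "smoker": "no"}
predicted = predict_charges(model, **new_policyholder)
print(f"Predicted charges: {predicted:.2f}")

# Pair each coefficient with its feature name for easier reading
for name, coef in zip(FEATURES, model.coef_):
    print(f"{name}: {coef:.2f}")
```

Building the input as a one-row DataFrame with the same column names used during fitting keeps the feature order explicit and avoids scikit-learn's feature-name mismatch warning.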