### 📘 Step 1: Load the Dataset
Let’s start by loading the Cars dataset from the provided URL into a pandas DataFrame.

> 🔗 The dataset contains the following columns:
> `name, year, selling_price, km_driven, fuel, seller_type, transmission, owner`

In [None]:
# Load the dataset from the provided URL (or replace it with a local path)
import pandas as pd

data_url = "https://storage.googleapis.com/edulabs-public-datasets/CAR%20DETAILS%20FROM%20CAR%20DEKHO.csv"
# Example: df = pd.read_csv(data_url)

### 📘 Step 2: Explore the Data
Preview the dataset: look at the first few rows and check the column types.
How many samples and features are there? Any missing values?

In [None]:
# Display first rows, summary info, and check for missing values
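As a minimal sketch of the checks above, the snippet below uses a tiny made-up frame in place of the real dataset (the rows and the missing value are invented for illustration). In the notebook, the same calls would run on the `df` loaded in Step 1.

```python
import pandas as pd

# Tiny stand-in frame with a few of the dataset's columns (values are made up).
df = pd.DataFrame({
    "name": ["Maruti 800 AC", "Hyundai Verna 1.6 SX", "Honda Amaze VX"],
    "year": [2007, 2012, 2014],
    "selling_price": [60000, 600000, 450000],
    "km_driven": [70000, 100000, None],
    "fuel": ["Petrol", "Diesel", "Diesel"],
})

print(df.head())             # first rows
df.info()                    # column dtypes and non-null counts

n_rows, n_cols = df.shape    # samples and columns (target included)
missing = df.isna().sum()    # missing values per column
print(n_rows, n_cols)
print(missing)
```

Note that `df.shape` counts the target column too, so the number of features is one less than the number of columns.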

### 📘 Step 3: Preprocess the Data
We'll use `selling_price` as the target variable.

Now:
- Create a `manufacturer` column (extract the manufacturer from `name`)
- Drop irrelevant columns (e.g., `name`)
- Convert categorical columns using `pd.get_dummies()`
- Split features and target (`X`, `y`)

In [None]:
# Preprocess dataset
# Extract manufacturer, drop 'name', one-hot encode categorical features, define X and y
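One possible way to carry out these steps is sketched below on a small made-up frame. It assumes the manufacturer is the first word of `name`, which should be verified against the real data before relying on it.

```python
import pandas as pd

# Made-up rows standing in for the real dataset.
df = pd.DataFrame({
    "name": ["Maruti 800 AC", "Hyundai Verna 1.6 SX", "Honda Amaze VX", "Maruti Swift VDI"],
    "year": [2007, 2012, 2014, 2016],
    "selling_price": [60000, 600000, 450000, 500000],
    "km_driven": [70000, 100000, 141000, 50000],
    "fuel": ["Petrol", "Diesel", "Diesel", "Diesel"],
})

# Assumption: the manufacturer is the first whitespace-separated word of `name`.
df["manufacturer"] = df["name"].str.split().str[0]
df = df.drop(columns=["name"])

# Target first, then one-hot encode the remaining categorical columns.
y = df["selling_price"]
X = pd.get_dummies(df.drop(columns=["selling_price"]))
print(X.columns.tolist())
```

Numeric columns such as `year` pass through `pd.get_dummies()` unchanged; only the text columns are expanded into indicator columns.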

### 📘 Step 4: Split the Data into Train and Test Sets
Use `train_test_split` from `sklearn.model_selection`.
Recommended split: **80% train / 20% test**.

In [None]:
# Split data into training and testing sets
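A minimal sketch of the 80/20 split, using random stand-in data so it runs on its own. The fixed `random_state=42` is an arbitrary choice that makes the split reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Random stand-in for the preprocessed X and y from Step 3.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "year": rng.integers(2000, 2021, size=100),
    "km_driven": rng.integers(1_000, 200_000, size=100),
})
y = pd.Series(rng.integers(50_000, 1_000_000, size=100), name="selling_price")

# test_size=0.2 keeps 80% of the rows for training and 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```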

### 📘 Step 5: Train a Basic RandomForestRegressor
Train a `RandomForestRegressor` using default parameters.
Evaluate it on the **test set** using **R² score** and **MAE**.

In [None]:
# Train a basic Random Forest regressor and evaluate on the test set
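The baseline could look roughly like the sketch below. The data is synthetic (a made-up price formula plus noise), so the scores it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: the price formula is invented for illustration.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "year": rng.integers(2000, 2021, size=n),
    "km_driven": rng.integers(1_000, 200_000, size=n),
})
y = (200_000 + 20_000 * (X["year"] - 2000)
     - 0.5 * X["km_driven"] + rng.normal(0, 10_000, size=n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Default hyperparameters; random_state only fixes the randomness.
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R2: {r2:.3f}  MAE: {mae:,.0f}")
```

MAE is expressed in the same units as `selling_price`, which makes it easier to interpret than R² alone.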

### 📘 Step 6: Feature Importance Plot
Visualize the top 10 most important features using **Plotly**.

In [None]:
# Plot feature importances using Plotly

### 📘 Step 7: Cross-Validation with Hyperparameter Tuning
Use `cross_val_score` to evaluate how well different parameter combinations work.

#### 🔧 Suggested Parameters to Try:

| Parameter           | Values to Try          |
|---------------------|------------------------|
| `n_estimators`      | 50, 100, 200           |
| `max_depth`         | None, 5, 10, 20        |
| `min_samples_split` | 2, 5, 10               |
| `max_features`      | 'log2', 'sqrt', 0.5    |

Use **5-fold cross-validation** and print the average R², MAE, and MAPE for each configuration.

In [None]:
# Perform cross-validation with different hyperparameter settings
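The loop could be sketched as follows. Two deliberate departures are worth flagging: it uses `cross_validate`, a sibling of `cross_val_score` that accepts several scorers at once, and it runs a reduced grid on synthetic data to stay fast. The full table of suggested values can be substituted into `ParameterGrid`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ParameterGrid, cross_validate

# Synthetic stand-in for the training set (invented price formula).
rng = np.random.default_rng(0)
n = 200
X_train = pd.DataFrame({
    "year": rng.integers(2000, 2021, size=n),
    "km_driven": rng.integers(1_000, 200_000, size=n),
})
y_train = (200_000 + 20_000 * (X_train["year"] - 2000)
           - 0.5 * X_train["km_driven"] + rng.normal(0, 10_000, size=n))

# Reduced grid for speed; extend with the full set of suggested values.
param_grid = ParameterGrid({
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "max_features": ["sqrt", 0.5],
})

# Error metrics are exposed as negated scores so that higher is always better.
scoring = {
    "r2": "r2",
    "mae": "neg_mean_absolute_error",
    "mape": "neg_mean_absolute_percentage_error",
}

results = []
for params in param_grid:
    model = RandomForestRegressor(random_state=42, **params)
    cv = cross_validate(model, X_train, y_train, cv=5, scoring=scoring)
    results.append({
        **params,
        "r2": cv["test_r2"].mean(),
        "mae": -cv["test_mae"].mean(),    # flip the sign back to a positive error
        "mape": -cv["test_mape"].mean(),
    })

results_df = pd.DataFrame(results).sort_values("r2", ascending=False)
print(results_df)
```

Running the search on the training set only keeps the test set untouched for the final evaluation.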

### 📘 Step 8: Final Model with Best Parameters
After testing, retrain the model using the **best parameters** on the full training set.
Evaluate it again on the **test set** (R², MAE).

In [None]:
# Train final model with best parameters and evaluate it
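A short sketch of the retraining step follows. The `best_params` values are hypothetical placeholders, not results from the real dataset; they should be replaced with whichever configuration scored best in Step 7. The data is synthetic again.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (invented price formula).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "year": rng.integers(2000, 2021, size=n),
    "km_driven": rng.integers(1_000, 200_000, size=n),
})
y = (200_000 + 20_000 * (X["year"] - 2000)
     - 0.5 * X["km_driven"] + rng.normal(0, 10_000, size=n))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical winner of the cross-validation; substitute your own.
best_params = {"n_estimators": 200, "max_depth": 10,
               "min_samples_split": 2, "max_features": 0.5}

# Refit on the full training set, then score once on the held-out test set.
final_model = RandomForestRegressor(random_state=42, **best_params)
final_model.fit(X_train, y_train)

y_pred = final_model.predict(X_test)
final_r2 = r2_score(y_test, y_pred)
final_mae = mean_absolute_error(y_test, y_pred)
print(f"R2: {final_r2:.3f}  MAE: {final_mae:,.0f}")
```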

### 📘 Step 9: Predict and Plot Actual vs. Predicted Prices
Create a **scatter plot** comparing actual vs. predicted car prices on the test set.
Use Plotly for visualization.

In [None]:
# Plot actual vs. predicted values using Plotly

### 📘 Step 10: Summary – What Did You Learn?

- How well did the Random Forest perform?
- Which features were most important?
- Which parameter changes improved performance most?

In [None]:
# Write your summary here