# 🔀 Understanding train_test_split in Machine Learning
When building machine learning models, we need to evaluate how well they perform on unseen data. That’s where `train_test_split` comes in. It’s a function from scikit-learn (`sklearn.model_selection`) that divides your dataset into training and testing subsets.

## 📌 What It Does

* **Training set:** Used to fit (train) the model.
* **Testing set:** Used to evaluate the model’s performance on data it hasn’t seen before.
* **train_test_split** shuffles the rows (by default) and splits your dataset into these two parts.

## ⚡ Syntax

``` python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

```


* **X** → Features (input variables)
* **y** → Target (output variable)
* **test_size** → Fraction of the data to hold out for testing (e.g., 0.2 = 20%)
* **random_state** → Seed for the shuffle; setting it gives the same split on every run
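As a quick check of how `test_size` and `random_state` behave, here is a minimal sketch on a toy array. The names `X_demo` and `y_demo` and their contents are made up for illustration; only `train_test_split` itself comes from scikit-learn.

``` python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# test_size=0.2 holds out 20% of the 10 rows
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)

# Repeating the call with the same random_state gives the same split
_, X_te_again, _, y_te_again = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
same_split = np.array_equal(X_te, X_te_again)
print(same_split)  # True

# Every sample ends up in exactly one of the two subsets
all_used = sorted(np.concatenate([y_tr, y_te]).tolist()) == list(range(10))
print(all_used)  # True
```

Leaving `random_state` unset would still produce an 8/2 split, but which rows land in each subset could change from run to run.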


## 🧩 Example

``` python
import pandas as pd
from sklearn.model_selection import train_test_split

# Sample dataset
data = {
    "SquareFootage": [1400, 1600, 1700, 1800, 2000],
    "Rooms": [3, 4, 3, 4, 5],
    "Price": [250000, 300000, 280000, 320000, 350000]
}
df = pd.DataFrame(data)

X = df[["SquareFootage", "Rooms"]]  # features
y = df["Price"]                     # target

# Split into train and test sets (40% of the rows go to the test set)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

print("Training Features:\n", X_train)
print("\nTesting Features:\n", X_test)
```


```
Training Features:
    SquareFootage  Rooms
4           2000      5
0           1400      3
3           1800      4

Testing Features:
    SquareFootage  Rooms
2           1700      3
1           1600      4
```
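The targets are shuffled together with the features, so each row of `X_train` still lines up with its price in `y_train`. The sketch below rebuilds the same small DataFrame so it can run on its own, then checks that alignment along with the subset sizes.

``` python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same sample dataset as in the example above
df = pd.DataFrame({
    "SquareFootage": [1400, 1600, 1700, 1800, 2000],
    "Rooms": [3, 4, 3, 4, 5],
    "Price": [250000, 300000, 280000, 320000, 350000],
})
X = df[["SquareFootage", "Rooms"]]
y = df["Price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1
)

# Features and targets keep matching row labels after the split
train_aligned = X_train.index.equals(y_train.index)
test_aligned = X_test.index.equals(y_test.index)
print(train_aligned, test_aligned)  # True True

# 40% of 5 rows gives 2 test rows, leaving 3 for training
print(len(X_train), len(X_test))  # 3 2
```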


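## 🎯 Why It’s Important

* **Helps detect overfitting:** A model that memorizes the training data instead of generalizing will score well on the training set but noticeably worse on the test set.
* **Provides a fair evaluation:** Performance is measured on data that played no part in fitting the model.
* **Mimics real-world use:** Once deployed, the model encounters data it has not seen before.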
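To make the overfitting point concrete, here is a rough sketch that compares training and test scores for a model that can memorize its training data. The synthetic dataset and the choice of `DecisionTreeRegressor` are assumptions made for illustration, not part of the example above.

``` python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumed for illustration): a noisy linear relationship
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 10, size=(200, 1))
y_syn = 3.0 * X_syn[:, 0] + rng.normal(0, 5.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42
)

# A tree with no depth limit can fit the training rows almost exactly
model = DecisionTreeRegressor(random_state=0)
model.fit(X_tr, y_tr)

train_r2 = model.score(X_tr, y_tr)  # R^2 on data the model was fit on
test_r2 = model.score(X_te, y_te)   # R^2 on held-out data
print(f"train R^2: {train_r2:.3f}, test R^2: {test_r2:.3f}")
```

A large gap between the two scores suggests the model has fit noise in the training set. Without the held-out test set, only the optimistic training score would be visible.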