## Train - Test Split
    Train–Test Split is a technique used in machine learning and statistics to divide a dataset into two separate parts:
    - Training set → Used to train (fit) the machine learning model. 
                     The model learns patterns, relationships, and structure from this data.
    - Testing set →  Used to evaluate the trained model. 
                     Helps check how well the model performs on new, unseen data.
                     
    The core idea is to simulate real-world performance by testing the model on data it has never seen before.

    -> Why Do We Need Train–Test Split?
    If we evaluate a model on the same data it was trained on:
    - The model may memorize the data
    - Accuracy will be misleadingly high
    - The model may fail in real-world scenarios
    
    Train–test split helps to:
    - Detect overfitting
    - Measure generalization ability
    - Provide honest performance evaluation
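
The gap between training and testing performance can be seen in a small experiment. The sketch below is not part of the original notebook: it assumes a synthetic dataset from `make_classification` with some label noise, and an unpruned `DecisionTreeClassifier`, chosen only because such a tree can memorize its training data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration); flip_y adds label noise
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# An unpruned tree keeps splitting until it fits the training labels
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # scored on data the model has seen
test_acc = model.score(X_test, y_test)     # scored on held-out data

print(f"Train accuracy: {train_acc:.2f}")
print(f"Test accuracy:  {test_acc:.2f}")
```

If only the training score were reported, the memorization would go unnoticed; the held-out score is the one that reflects generalization.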
     
    -> Typical Split Ratios  
        | Training | Testing |
        | -------- | ------- |
        | 80%      | 20%     |
        | 75%      | 25%     |
        | 70%      | 30%     |
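
To see what these ratios mean in rows, the sketch below applies each `test_size` to 150 placeholder rows (the same size as the iris dataset used later). The array `X` is a stand-in created for illustration, not real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 150 placeholder rows standing in for a real feature matrix
X = np.arange(150).reshape(-1, 1)

sizes = {}
for test_size in (0.20, 0.25, 0.30):
    X_train, X_test = train_test_split(X, test_size=test_size, random_state=42)
    sizes[test_size] = (len(X_train), len(X_test))
    print(f"test_size={test_size:.2f} -> train rows: {len(X_train)}, test rows: {len(X_test)}")
```

Every row lands in exactly one of the two parts, so the train and test counts always add up to the original 150.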


### Importing Essential Libraries

In [14]:
import numpy as np
import pandas as pd
import seaborn as sns 
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

### Load the iris dataset from seaborn
    -> Key Characteristics:
    Contains 150 samples; each sample represents one iris flower; used mainly for multiclass classification
    -> Features (Input Variables):
    Sepal Length (cm), Sepal Width (cm), Petal Length (cm), Petal Width (cm)
    -> Target Variable:
    Species of iris flower
    - Iris-setosa
    - Iris-versicolor
    - Iris-virginica

In [16]:
data = sns.load_dataset('iris')
data['species'] = data['species'].map({"setosa":1,"versicolor":2,"virginica":3})
data.sample(2)

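|     | sepal_length | sepal_width | petal_length | petal_width | species |
| --- | ------------ | ----------- | ------------ | ----------- | ------- |
| 26  | 5.0          | 3.4         | 1.6          | 0.4         | 1       |
| 92  | 5.8          | 2.6         | 4.0          | 1.2         | 2       |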


### Check for null values 

In [18]:
# Count nulls across every column, not only 'species'
print("Number of null values in dataset ::", data.isnull().sum().sum())

Number of null values in dataset :: 0
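
When a dataset does contain missing values, a per-column count shows where they are. The sketch below uses a small hypothetical DataFrame (not the iris data) with a few `NaN` entries inserted on purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values added for illustration
df = pd.DataFrame({
    "sepal_length": [5.1, np.nan, 6.3],
    "sepal_width": [3.5, 3.0, np.nan],
    "species": [1, 2, 3],
})

nulls_per_column = df.isnull().sum()        # one count per column
total_nulls = int(nulls_per_column.sum())   # single number for the whole frame

print(nulls_per_column)
print("Total null values:", total_nulls)
```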


### Dividing the dataset into input and output features
     X -> Input features
     y -> Output feature (target)

In [20]:
X = data[['sepal_length','sepal_width','petal_length','petal_width']]
X.sample(2)

|     | sepal_length | sepal_width | petal_length | petal_width |
| --- | ------------ | ----------- | ------------ | ----------- |
| 141 | 6.9          | 3.1         | 5.1          | 2.3         |
| 49  | 5.0          | 3.3         | 1.4          | 0.2         |


In [21]:
y = data[['species']]
y.sample(2)

|     | species |
| --- | ------- |
| 114 | 3       |
| 20  | 1       |
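
Double brackets, as in `data[['species']]`, return a one-column DataFrame, whereas single brackets return a Series. The sketch below uses a tiny hypothetical frame to show the difference in shape. This matters later because some scikit-learn estimators expect a 1-D target and may warn when given a column vector.

```python
import pandas as pd

# Hypothetical three-row frame, for illustration only
df = pd.DataFrame({"petal_length": [1.4, 4.0, 5.1], "species": [1, 2, 3]})

y_frame = df[["species"]]   # DataFrame, 2-D
y_series = df["species"]    # Series, 1-D

print(type(y_frame).__name__, y_frame.shape)
print(type(y_series).__name__, y_series.shape)

# Flatten a one-column DataFrame to 1-D when an estimator asks for it
y_flat = y_frame.to_numpy().ravel()
print(y_flat.shape)
```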


### Splitting data into Train - Test data
     80 % -> Training set
     20 % -> Testing set

In [23]:
X_train,X_test, y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)
print('Shape of X_train : ',X_train.shape)
print('Shape of X_test : ',X_test.shape)
print('Shape of y_train : ',y_train.shape)
print('Shape of y_test : ',y_test.shape)

Shape of X_train :  (120, 4)
Shape of X_test :  (30, 4)
Shape of y_train :  (120, 1)
Shape of y_test :  (30, 1)
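
Two parameters of `train_test_split` are worth a closer look: `random_state`, used in the cell above, makes the shuffle repeatable, and `stratify` (not used in this notebook) keeps the class proportions the same in both parts. The sketch below assumes scikit-learn's bundled `load_iris` instead of the seaborn copy so that it runs without a download; its labels are 0, 1, 2 rather than 1, 2, 3.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 3 classes

# Same random_state -> the same rows end up in the test set on every call
_, X_test_a, _, y_test_plain = train_test_split(X, y, test_size=0.2, random_state=42)
_, X_test_b, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
same_split = np.array_equal(X_test_a, X_test_b)
print("Identical test sets:", same_split)

# stratify=y keeps the class proportions of y in both parts
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

plain_counts = np.bincount(y_test_plain, minlength=3)
strat_counts = np.bincount(y_test_strat, minlength=3)
print("Test-set class counts without stratify:", plain_counts)
print("Test-set class counts with stratify:   ", strat_counts)
```

With only 30 test rows, an unstratified shuffle can leave one species under-represented, which is why stratifying is a common choice for classification data.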


- Now we have 4 different datasets
- X_train & y_train - used for training the model
- X_test & y_test - used for testing/evaluating the model performance
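
How the four pieces fit together can be sketched as follows. The notebook does not show this step, so the choice of `LogisticRegression` as the classifier and of scikit-learn's bundled `load_iris` as the data source are assumptions made for illustration. The scaler is fitted on the training set only, so no information from the test set leaks into training.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Learn the scaling parameters from the training set only
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuse the training-set statistics

# Train on (X_train, y_train), evaluate on (X_test, y_test)
model = LogisticRegression(max_iter=1000).fit(X_train_scaled, y_train)

train_acc = model.score(X_train_scaled, y_train)
test_acc = model.score(X_test_scaled, y_test)
print(f"Train accuracy: {train_acc:.2f}")
print(f"Test accuracy:  {test_acc:.2f}")
```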