**Supervised Learning**
Supervised learning is a type of machine learning where a model is trained on labeled data, meaning each input has a corresponding correct output.

**Tasks**
1. Classification
2. Regression
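
A minimal sketch of the two tasks, using made-up toy numbers (not from any dataset). A regressor predicts a continuous value, while a classifier predicts one of the labels it saw during training. `LinearRegression` and `KNeighborsClassifier` are simply convenient stand-ins for the two model families.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Toy labeled data: every input x has a known correct output
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y_regression = np.array([2.0, 4.0, 6.0, 8.0])   # continuous targets (y = 2x)
y_classification = np.array([0, 0, 1, 1])       # discrete class labels

reg = LinearRegression().fit(X, y_regression)
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y_classification)

reg_pred = reg.predict([[5.0]])          # regression: a real number (about 10.0)
clf_pred = clf.predict([[1.2], [3.9]])   # classification: labels drawn from {0, 1}
print(reg_pred, clf_pred)
```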

# **Classification**

Classification is a supervised learning task where a model learns to assign input data to predefined categories or classes. It is commonly used in applications like spam detection, medical diagnosis, and image recognition.

# **Types of Classification**
1. **Binary Classification** – Classifies data into two categories (e.g., spam or not spam, pass or fail).

2. **Multiclass Classification** – Involves more than two categories, but each instance belongs to only one class (e.g., classifying handwritten digits from 0 to 9).

3. **Multilabel Classification** – Assigns multiple labels to a single instance (e.g., a news article classified as both "sports" and "politics").

# **Binary Classification**

| Patient_ID | Age | Blood Pressure (mmHg) | Cholesterol (mg/dL) | Smoking (Yes/No) | Heart Disease (Label) |
|------------|----|----------------------|---------------------|----------------|------------------|
| 1          | 45 | 130                  | 220                 | Yes            | 1 (Has Disease)  |
| 2          | 50 | 120                  | 180                 | No             | 0 (No Disease)   |
| 3          | 60 | 140                  | 250                 | Yes            | 1 (Has Disease)  |
| 4          | 35 | 110                  | 160                 | No             | 0 (No Disease)   |


# **Multiclass Classification**

| Patient_ID | Age | Blood Pressure (mmHg) | Cholesterol (mg/dL) | Smoking (Yes/No) | Heart Disease (Severity) |
|------------|----|----------------------|---------------------|----------------|--------------------------|
| 1          | 45 | 130                  | 220                 | Yes            | 1 (Mild)                 |
| 2          | 50 | 120                  | 180                 | No             | 0 (No Disease)           |
| 3          | 60 | 140                  | 250                 | Yes            | 2 (Severe)               |
| 4          | 35 | 110                  | 160                 | No             | 0 (No Disease)           |


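# **Multilabel Classification**

Here a patient can carry several labels at once: each of the last four columns (HD, HBP, HC, SRR) is a separate 0/1 label.
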
| Patient_ID | Age | Blood Pressure (mmHg) | Cholesterol (mg/dL) | Smoking (Yes/No) | HD | HBP | HC | SRR |
|------------|----|----------------------|---------------------|----------------|----|----|----|----|
| 1          | 45 | 130                  | 220                 | Yes            | 1  | 1  | 1  | 1  |
| 2          | 50 | 120                  | 180                 | No             | 0  | 0  | 1  | 0  |
| 3          | 60 | 140                  | 250                 | Yes            | 1  | 1  | 1  | 1  |
| 4          | 35 | 110                  | 160                 | No             | 0  | 0  | 0  | 0  |
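
The three tables differ mainly in the shape of their label columns. A short sketch, assuming scikit-learn is installed, that copies those label columns and asks `sklearn.utils.multiclass.type_of_target` how it interprets each one:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Label columns copied from the three example tables
binary_labels = np.array([1, 0, 1, 0])        # Heart Disease (Label)
multiclass_labels = np.array([1, 0, 2, 0])    # Heart Disease (Severity)
multilabel_labels = np.array([                # HD, HBP, HC, SRR
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])

for name, labels in [("binary", binary_labels),
                     ("multiclass", multiclass_labels),
                     ("multilabel", multilabel_labels)]:
    # 1-D with two values, 1-D with more values, 2-D 0/1 indicator matrix
    print(name, labels.shape, type_of_target(labels))
```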

---

# Performing **Binary Classification** (On Heart Disease Dataset)

In [None]:
# Importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


In [None]:
#Step 2: Load the Kaggle Heart Disease Dataset
df = pd.read_csv("heart.csv")
df.head(3)  # Display the first few rows


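|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 52 | 1 | 0 | 125 | 212 | 0 | 1 | 168 | 0 | 1.0 | 2 | 2 | 3 | 0 |
| 1 | 53 | 1 | 0 | 140 | 203 | 1 | 0 | 155 | 1 | 3.1 | 0 | 0 | 3 | 0 |
| 2 | 70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.6 | 0 | 0 | 3 | 0 |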


**Perform EDA**

1. Load the dataset
2. Check the first few rows (`head()`)
3. Check the last few rows (`tail()`)
4. Check dataset info (`info()`)
5. Check for missing values (`isnull().sum()`)
6. Check for duplicate rows (`duplicated().sum()`)
7. Summary statistics (`describe()`)
8. Check column names and data types (`dtypes`)
9. Value counts of categorical variables (`value_counts()`)
10. Distribution of numerical features
11. Check correlation between features (`corr()`)
12. Visualize missing data
13. Outlier detection
14. Pairplots of feature relationships, etc.
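
Most of this checklist maps onto one-line pandas calls. A minimal sketch on a small made-up frame (hypothetical values standing in for `heart.csv`, with one missing value and one duplicated row planted on purpose); plots such as pairplots are left out so it runs without a display, and the 1.5 × IQR outlier rule is just one common choice:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for heart.csv
df = pd.DataFrame({
    "age":    [45, 50, 60, 35, 50, 52],
    "chol":   [220, 180, 250, 160, 180, np.nan],
    "sex":    [1, 0, 1, 0, 0, 1],
    "target": [1, 0, 1, 0, 0, 1],
})

print(df.head())                       # first rows
print(df.tail())                       # last rows
df.info()                              # dtypes and non-null counts
missing = df.isnull().sum()            # missing values per column
duplicates = df.duplicated().sum()     # number of fully duplicated rows
print(missing, duplicates)
print(df.describe())                   # summary statistics of numeric columns
print(df["target"].value_counts())     # class balance
print(df.corr())                       # pairwise Pearson correlations

# IQR rule for outliers in one numeric column (NaN rows never match)
q1, q3 = df["chol"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["chol"] < q1 - 1.5 * iqr) | (df["chol"] > q3 + 1.5 * iqr)]
print(outliers)
```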


**Perform all necessary preprocessing**

1. Handle missing values
2. Remove duplicate rows
3. Convert categorical variables into numerical format (one-hot encoding, label encoding)
4. Normalize/standardize numerical features
5. Handle outliers
6. Feature selection (drop irrelevant or highly correlated features)
7. Split the dataset into training and testing sets
8. Balance the dataset (if necessary) using techniques such as SMOTE or undersampling
9. Handle skewness in the data (log transformation, Box-Cox transformation)
10. Convert date/time columns into useful features (if applicable)
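
A sketch of several of these steps on a hypothetical toy frame (the columns `age`, `chol`, `cp`, `target` and their values are illustrative only). It drops duplicates, imputes a missing value with the median, reduces skew with `np.log1p`, one-hot encodes a categorical column, then splits with `stratify` so both sets keep the class ratio. The scaler is fit on the training split only, so test-set statistics do not leak into training. SMOTE lives in the separate `imbalanced-learn` package and is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data
df = pd.DataFrame({
    "age":    [45, 50, 60, 35, 50, 52, 41, 66, 58, 39],
    "chol":   [220, 180, 250, 160, 180, np.nan, 199, 310, 240, 170],
    "cp":     ["typical", "atypical", "typical", "none", "atypical",
               "none", "typical", "none", "atypical", "typical"],
    "target": [1, 0, 1, 0, 0, 1, 0, 1, 1, 0],
})

df = df.drop_duplicates()                                 # remove duplicate rows
df["chol"] = df["chol"].fillna(df["chol"].median())       # impute missing values
df["chol"] = np.log1p(df["chol"])                         # damp right skew
df = pd.get_dummies(df, columns=["cp"], drop_first=True, dtype=int)  # one-hot encode

X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean/std on train only
X_test_scaled = scaler.transform(X_test)         # reuse them on the test set
```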



**Split Data into X and Y**

In [None]:
#Split Data into X and Y
y=df.pop('target')
x=df
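
`DataFrame.pop` removes the column from `df` in place and returns it, so after this cell `x` and `df` are the same object, now without the target. A tiny sketch on toy data showing that, next to a non-mutating alternative with `drop`:

```python
import pandas as pd

df = pd.DataFrame({"age": [45, 50], "chol": [220, 180], "target": [1, 0]})

# Non-mutating: df still has "target" afterwards
X_alt = df.drop(columns="target")
y_alt = df["target"]

# Mutating: pop removes "target" from df itself
y = df.pop("target")
x = df

print(list(X_alt.columns), list(x.columns), x is df)
```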

In [None]:
# Split data into training and test sets
xtrain,xtest,ytrain,ytest=train_test_split(x,y,test_size=0.2)
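
Without `random_state` the split changes on every run, so the accuracies printed below will not reproduce exactly. A sketch on hypothetical imbalanced labels showing `random_state` for repeatability and `stratify` for keeping the class ratio the same in both sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 80 negatives and 20 positives
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Same seed, same data -> the identical split
xtrain2, xtest2, _, _ = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(len(xtest), ytest.sum())   # 20 test rows, 4 of them positive (same 20% ratio)
```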


In [None]:
# Step 5: Train a machine learning model using any binary classifier (Logistic Regression, SVM, KNN, etc.)

from sklearn.neighbors import KNeighborsClassifier
KNN = KNeighborsClassifier()   # default: k = 5 neighbors
KNN.fit(xtrain, ytrain)
PredictionKNN = KNN.predict(xtest)

# ===================== ACCURACY =====================
print("===================== KNN Training Accuracy =====================")
trainingAccKNN = KNN.score(xtrain, ytrain) * 100
print(trainingAccKNN)
print("===================== KNN Testing Accuracy =====================")
testingAccKNN = accuracy_score(ytest, PredictionKNN) * 100
print(testingAccKNN)

print(classification_report(ytest, PredictionKNN))
confusion_matrix(ytest, PredictionKNN)


93.29268292682927
77.5609756097561
              precision    recall  f1-score   support

           0       0.76      0.78      0.77        98
           1       0.79      0.78      0.78       107

    accuracy                           0.78       205
   macro avg       0.78      0.78      0.78       205
weighted avg       0.78      0.78      0.78       205



array([[76, 22],
       [24, 83]])
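
The report can be checked by hand from the confusion matrix, whose rows are true classes and columns are predicted classes in scikit-learn's convention. The arithmetic below uses exactly the counts shown above. The training score is also optimistic for KNN: when the model scores the data it was fit on, each training point is its own nearest neighbor, which is likely part of why training accuracy (93.3%) sits well above test accuracy (77.6%).

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class
cm = np.array([[76, 22],
               [24, 83]])

tn, fp, fn, tp = cm.ravel()
accuracy = (tn + tp) / cm.sum()     # 159 / 205
precision_1 = tp / (tp + fp)        # 83 / 105
recall_1 = tp / (tp + fn)           # 83 / 107
precision_0 = tn / (tn + fn)        # 76 / 100
recall_0 = tn / (tn + fp)           # 76 / 98
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(accuracy, 2), round(precision_1, 2), round(recall_1, 2),
      round(precision_0, 2), round(recall_0, 2), round(f1_1, 2))
```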

# Performing **Multiclass Classification** (On Iris Dataset)

In [None]:
# Importing libraries (reusing the imports from the binary classification section)


In [None]:
# Step 2: Load the Iris dataset
df = pd.read_csv("Iris.csv")
df.head(3)  # Display the first few rows


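|   | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---------------|--------------|---------------|--------------|---------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |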


**Perform EDA**

1. Load the dataset
2. Check the first few rows (`head()`)
3. Check the last few rows (`tail()`)
4. Check dataset info (`info()`)
5. Check for missing values (`isnull().sum()`)
6. Check for duplicate rows (`duplicated().sum()`)
7. Summary statistics (`describe()`)
8. Check column names and data types (`dtypes`)
9. Value counts of categorical variables (`value_counts()`)
10. Distribution of numerical features
11. Check correlation between features (`corr()`)
12. Visualize missing data
13. Outlier detection
14. Pairplots of feature relationships, etc.


**Perform all necessary preprocessing**

1. Handle missing values
2. Remove duplicate rows
3. Convert categorical variables into numerical format (one-hot encoding, label encoding)
4. Normalize/standardize numerical features
5. Handle outliers
6. Feature selection (drop irrelevant or highly correlated features)
7. Split the dataset into training and testing sets
8. Balance the dataset (if necessary) using techniques such as SMOTE or undersampling
9. Handle skewness in the data (log transformation, Box-Cox transformation)
10. Convert date/time columns into useful features (if applicable)



In [None]:
#converting target variable into numeric form
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
df['Species']=le.fit_transform(df['Species'])
df.head()

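|   | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---------------|--------------|---------------|--------------|---------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |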
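
`LabelEncoder` assigns integers in the sorted order of the class names and keeps that mapping in `classes_`, so predictions can be turned back into species names with `inverse_transform`. A small sketch; only `Iris-setosa` appears in the output above, so the other two names here assume the usual UCI naming:

```python
from sklearn.preprocessing import LabelEncoder

# Assumed species strings (UCI-style naming)
species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]

le = LabelEncoder()
codes = le.fit_transform(species)

print(list(le.classes_))                    # sorted class names; position = code
print(list(codes))                          # [0, 1, 2, 0]
print(list(le.inverse_transform([2, 0])))   # integer codes back to names
```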


**Split Data into X and Y**

In [None]:
#Split Data into X and Y
y=df.pop('Species')
x=df

In [None]:
# Split data into training and test sets
xtrain,xtest,ytrain,ytest=train_test_split(x,y,test_size=0.2)


In [None]:
# Step 5: Train a machine learning model using any multiclass classifier (Logistic Regression, SVM, KNN, etc.)

from sklearn.neighbors import KNeighborsClassifier
KNN = KNeighborsClassifier()   # default: k = 5 neighbors
KNN.fit(xtrain, ytrain)
PredictionKNN = KNN.predict(xtest)

# ===================== ACCURACY =====================
print("===================== KNN Training Accuracy =====================")
trainingAccKNN = KNN.score(xtrain, ytrain) * 100
print(trainingAccKNN)
print("===================== KNN Testing Accuracy =====================")
testingAccKNN = accuracy_score(ytest, PredictionKNN) * 100
print(testingAccKNN)

print(classification_report(ytest, PredictionKNN))
confusion_matrix(ytest, PredictionKNN)
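
KNN is distance-based, so it is usually run on standardized features, and with three classes a stratified split keeps every species represented in the test set. A sketch using scikit-learn's bundled copy of Iris (`sklearn.datasets.load_iris`, standing in for `Iris.csv` so the example runs without the file). The scaler sits inside a `Pipeline`, so it is fit on the training split only:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)   # 150 samples, 3 classes (0, 1, 2)

xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scaler and classifier as one estimator; fit() scales using training data only
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(xtrain, ytrain)
pred = model.predict(xtest)

print(accuracy_score(ytest, pred))
print(classification_report(ytest, pred))
cm = confusion_matrix(ytest, pred)   # 3 x 3: one row/column per species
print(cm)
```

With three classes the confusion matrix is 3 × 3 and the report has one row per species, rather than the two rows of the binary case.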


# Performing **Multilabel Classification** (On Flags Dataset)

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.multioutput import MultiOutputClassifier
from sklearn.datasets import fetch_openml


In [None]:
# Fetch dataset from OpenML
flags = fetch_openml(name="flags", version=1, as_frame=True)
df = flags.frame

# Display first few rows
df.head()


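|   | 1landmass | 2zone | 3area | population | language | religion | bars | stripes | colours | red | ... | saltires | quarters | sunstars | crescent | triangle | icon | animate | text | topleft | botright |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 1 | 648 | 16 | 10 | 2 | 0 | 3 | 5 | 1 | ... | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | black | green |
| 1 | 3 | 1 | 29 | 3 | 6 | 6 | 0 | 0 | 3 | 1 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | red | red |
| 2 | 4 | 1 | 2388 | 20 | 8 | 2 | 2 | 0 | 3 | 1 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | green | white |
| 3 | 6 | 3 | 0 | 0 | 1 | 1 | 0 | 0 | 5 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | blue | red |
| 4 | 3 | 1 | 0 | 0 | 6 | 0 | 3 | 0 | 3 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | blue | red |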


In [None]:
df.dtypes

| Column | dtype |
|--------|-------|
| 1landmass | category |
| 2zone | category |
| 3area | int64 |
| population | int64 |
| language | category |
| religion | category |
| bars | category |
| stripes | category |
| colours | category |
| red | category |


In [None]:
# Dictionary holding one fitted LabelEncoder per categorical column
label_encoders = {}

# Loop through categorical columns and apply Label Encoding
for col in df.select_dtypes(include=['category']).columns:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le  # Store encoder for inverse transformation

print("\nEncoded DataFrame:")
print(df)



Encoded DataFrame:
     1landmass  2zone  3area  population  language  religion  bars  stripes  \
0            4      0    648          16         1         2     0        6   
1            2      0     29           3         6         6     0        0   
2            3      0   2388          20         8         2     2        0   
3            5      2      0           0         0         1     0        0   
4            2      0      0           0         6         0     3        0   
..         ...    ...    ...         ...       ...       ...   ...      ...   
189          5      2      3           0         0         1     0        0   
190          2      0    256          22         6         6     0        6   
191          3      1    905          28         1         5     0        0   
192          3      1    753           6         1         5     3        0   
193          3      1    391           8         1         5     0       10   

     colours  red  ...  saltire
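
One thing to watch in the encoded output: `stripes` is a count, yet row 0 had 3 stripes in the raw frame and shows 6 after encoding, and `language` 10 became 1. That is consistent with the category values being strings, which `LabelEncoder` sorts as text, so `'11'` lands before `'2'`. A distance-based model such as KNN then sees a scrambled ordering. A small sketch with hypothetical stripe counts stored as strings, plus converting to numbers first as one way to keep the real order:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stripe counts stored as string categories
stripes = pd.Series(["0", "3", "11", "2", "3"], dtype="category")

# Label encoding sorts the strings as text: '0' < '11' < '2' < '3'
text_codes = LabelEncoder().fit_transform(stripes)

# Converting to numbers keeps the numeric order: 0 < 2 < 3 < 11
numeric_values = pd.to_numeric(stripes.astype(str))

print(list(text_codes), list(numeric_values))
```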

In [None]:
df.dtypes

| Column | dtype |
|--------|-------|
| 1landmass | int64 |
| 2zone | int64 |
| 3area | int64 |
| population | int64 |
| language | int64 |
| religion | int64 |
| bars | int64 |
| stripes | int64 |
| colours | int64 |
| red | int64 |


In [None]:
# Define features (X) and target labels (y)
X = df.iloc[:, :-7]  # all columns except the last 7
y = df.iloc[:, -7:]  # last 7 columns (crescent ... botright) as labels

# One-hot encode any remaining categorical columns; after the label encoding
# above every column is already integer-typed, so this leaves X unchanged
X = pd.get_dummies(X, drop_first=True)

# Standardize features (the scaler is fit on all rows, i.e. before the split)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
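
In the cell above the scaler sees every row before the split, so test-set statistics leak into training. A sketch of a leakage-free variant: `make_multilabel_classification` generates hypothetical stand-in data (194 rows like the flags table, 7 binary labels) so it runs without the OpenML download, and the scaler is wrapped in a `Pipeline` with the same `MultiOutputClassifier(KNeighborsClassifier(n_neighbors=5))` model, so `fit` learns the scaling from the training split only.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 194 rows, 20 features, 7 binary labels
X, Y = make_multilabel_classification(
    n_samples=194, n_features=20, n_classes=7, random_state=42)

# Split first; the pipeline then fits the scaler on the training rows only
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42)

model = make_pipeline(
    StandardScaler(),
    MultiOutputClassifier(KNeighborsClassifier(n_neighbors=5)),
)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)
print(Y_pred.shape)   # (number of test rows, 7)
```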


In [None]:
# Initialize KNN model
knn = KNeighborsClassifier(n_neighbors=5)

# Multi-label classification wrapper
multi_knn = MultiOutputClassifier(knn)

# Train the model
multi_knn.fit(X_train, y_train)


In [None]:
# Predict on test set
y_pred = multi_knn.predict(X_test)


In [None]:
# Accuracy score (for each label)
for i, col in enumerate(y.columns):
    acc = accuracy_score(y_test[col], y_pred[:, i])
    print(f"Accuracy for {col}: {acc:.2f}")



Accuracy for crescent: 0.97
Accuracy for triangle: 0.87
Accuracy for icon: 0.79
Accuracy for animate: 0.87
Accuracy for text: 0.95
Accuracy for topleft: 0.54
Accuracy for botright: 0.67
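
Per-label accuracy hides how often the whole set of outputs is right at once. Two common summaries can be computed directly with NumPy: exact-match (subset) accuracy, the share of rows where every output is correct, and a Hamming-style error, the share of individual entries that are wrong. Computing them by hand also sidesteps the fact that `topleft` and `botright` are multiclass colour codes rather than 0/1 flags, a mix that some scikit-learn multilabel metrics reject. A sketch with made-up arrays (hypothetical values):

```python
import numpy as np

# Hypothetical true and predicted outputs: 4 rows x 3 outputs
# (first two columns are binary flags, the last a multiclass colour code)
y_true = np.array([[1, 0, 3],
                   [0, 0, 1],
                   [1, 1, 2],
                   [0, 1, 0]])
y_pred = np.array([[1, 0, 3],
                   [0, 1, 1],
                   [1, 1, 2],
                   [0, 1, 4]])

per_output_acc = (y_true == y_pred).mean(axis=0)      # like the per-label loop
exact_match = (y_true == y_pred).all(axis=1).mean()   # whole row correct
hamming_error = (y_true != y_pred).mean()             # wrong entries / all entries

print(per_output_acc, exact_match, hamming_error)
```

With the notebook's own variables the exact-match line would read `(y_test.to_numpy() == y_pred).all(axis=1).mean()`.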
