# Introduction to scikit-learn | Baseline Classifiers
🦊 `Notebook by` [Md.Samiul Alim](https://github.com/sami0055)

😋  `Machine Learning Source Codes` [GitHub](https://github.com/sami0055/Machine-Learning)

📌 `Related` [PPTX slide](https://docs.google.com/presentation/d/1gir8tFBx8T4ZXhNtN4HHZ2-o6kxG1qg_/edit?usp=sharing&ouid=114227978762288850531&rtpof=true&sd=true)

# ZeroR Classifier
Read the scikit-learn [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html) for `DummyClassifier`.

ZeroR is one of the simplest machine learning models. It is used as a benchmark, or baseline, for comparison in many experiments. Despite its simplicity, understanding ZeroR is fundamental because it provides the reference point against which more complex models are evaluated.

The ZeroR algorithm predicts a single constant, computed from the training dataset, for every instance in the test dataset. In classification tasks, that constant is the most frequent class (the majority class); in regression tasks, it is the mean or median of the target variable across the training data.
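For regression, scikit-learn's counterpart is `DummyRegressor`, whose `strategy` parameter accepts `"mean"` and `"median"` (among other options). The sketch below uses a small made-up target that is not part of this notebook, plus an all-zero placeholder feature matrix, since the baseline never reads the features. Every prediction is the same constant.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Made-up regression target for illustration only.
y_train = np.array([1.0, 2.0, 3.0, 10.0])
X_train = np.zeros((len(y_train), 1))  # placeholder features; the baseline ignores them
X_new = np.zeros((2, 1))               # two "unseen" instances

mean_model = DummyRegressor(strategy="mean").fit(X_train, y_train)
median_model = DummyRegressor(strategy="median").fit(X_train, y_train)

print(mean_model.predict(X_new))    # [4. 4.]
print(median_model.predict(X_new))  # [2.5 2.5]
```

Note how the single outlier (10.0) pulls the mean up but leaves the median unaffected, which is one reason to prefer the median strategy on skewed targets.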

Here's how ZeroR works in more detail:

1. **Training Phase**: 
   - ZeroR doesn't learn anything from the input features. The only "training" it does is to analyze the distribution of the target variable in the training dataset.
   - For classification tasks, it identifies the most frequent class label.
   - For regression tasks, it calculates the mean or median of the target variable.

2. **Prediction Phase**:
   - Once the constant (the most frequent class, or the mean or median value) has been determined during the training phase, ZeroR uses it to make predictions on unseen data.
   - In classification tasks, it assigns the most frequent class label to all instances in the test dataset.
   - In regression tasks, it assigns the calculated mean or median value to all instances in the test dataset.
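The two phases above can be sketched from scratch in a few lines. This is a hypothetical `ZeroR` helper written for illustration, not a scikit-learn class. It assumes plain Python lists for the labels or numeric targets, and it breaks ties between equally frequent classes by first occurrence (via `Counter.most_common`), which may differ from how scikit-learn breaks ties.

```python
from collections import Counter
from statistics import mean, median


class ZeroR:
    """Minimal ZeroR baseline (hypothetical helper, not a library class)."""

    def __init__(self, task="classification", statistic="mean"):
        self.task = task
        self.statistic = statistic  # regression only: "mean" or "median"

    def fit(self, X, y):
        # Training phase: X is ignored; only the target distribution is summarized.
        if self.task == "classification":
            self.constant_ = Counter(y).most_common(1)[0][0]  # majority class
        elif self.statistic == "median":
            self.constant_ = median(y)
        else:
            self.constant_ = mean(y)
        return self

    def predict(self, X):
        # Prediction phase: the same constant for every instance.
        return [self.constant_ for _ in X]


labels = ["No", "No", "Yes", "Yes", "Yes"]
clf = ZeroR().fit(X=[[0]] * len(labels), y=labels)
print(clf.predict([[1], [2], [3]]))  # ['Yes', 'Yes', 'Yes']

reg = ZeroR(task="regression", statistic="median").fit(X=[[0]] * 4, y=[1, 2, 3, 10])
print(reg.predict([[5]]))  # [2.5]
```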

While ZeroR serves as a straightforward baseline, it has several limitations:
- It doesn't take into account any features or attributes of the data, making it oblivious to any patterns or relationships.
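- Every instance receives the same prediction, so it is wrong for every instance outside the majority class (or, in regression, for every instance far from the mean or median).
- ZeroR's predictive power is therefore limited: in classification its accuracy equals the proportion of the majority class in the evaluated data, which is usually poor compared to more sophisticated models.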
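Both points can be checked directly. In this sketch (synthetic data with an assumed 90/10 class imbalance, not from this notebook) the predictions do not change when the features are replaced by zeros, and accuracy looks high only because it equals the majority-class share, while recall on the minority class is zero.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)

# Synthetic, imbalanced labels: 90 negatives (0) and 10 positives (1).
y = np.array([0] * 90 + [1] * 10)
X = rng.normal(size=(100, 3))  # random features; ZeroR never looks at them

model = DummyClassifier(strategy="most_frequent").fit(X, y)

# Completely different inputs produce exactly the same predictions.
same = np.array_equal(model.predict(X), model.predict(np.zeros_like(X)))
print(same)  # True

y_pred = model.predict(X)
acc = accuracy_score(y, y_pred)
rec = recall_score(y, y_pred, pos_label=1)
print(acc)  # 0.9
print(rec)  # 0.0
```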

Despite these limitations, ZeroR is valuable in machine learning for the following reasons:
- It provides a baseline performance metric for comparing more complex models. Any model that performs worse than ZeroR is considered ineffective, since it does worse than a rule that ignores the features entirely.
- It serves as a quick and simple approach to establish a minimal level of performance expectation.
- It highlights the importance of feature selection and model evaluation by demonstrating the consequences of ignoring data characteristics.
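To use ZeroR as a baseline in practice, score it with the same evaluation procedure as the candidate model. The sketch below is one way to do this; it assumes scikit-learn's bundled breast-cancer dataset, a `DecisionTreeClassifier` as an arbitrary candidate, and 5-fold cross-validated accuracy. None of these choices come from this notebook, and the printed scores will depend on your scikit-learn version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

baseline = DummyClassifier(strategy="most_frequent")
candidate = DecisionTreeClassifier(random_state=0)

# 5-fold cross-validated accuracy for both models on the same (stratified) splits.
baseline_acc = cross_val_score(baseline, X, y, cv=5).mean()
candidate_acc = cross_val_score(candidate, X, y, cv=5).mean()

print(f"ZeroR baseline: {baseline_acc:.3f}")
print(f"Decision tree:  {candidate_acc:.3f}")

# A candidate that cannot beat the baseline has learned nothing useful.
if candidate_acc <= baseline_acc:
    print("Candidate does not beat ZeroR")
```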

In summary, ZeroR is a fundamental concept in machine learning, emphasizing the importance of model evaluation and the necessity of surpassing basic benchmarks to create effective predictive models.

## Load Libraries

```python
import numpy as np
import pandas as pd
```

```python
data = {
    'Outlook': ["Rainy", "Rainy", "Overcast", "Sunny", "Sunny", "Sunny",
               "Overcast", "Rainy", "Rainy", "Sunny", "Rainy", "Overcast",
               "Overcast", "Sunny"],

    'Temp': ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool", "Mild",
            "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],

    'Humidity': ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],

    'Windy': ["False", "True", "False", "False", "False", "True", "True",
             "False", "False", "False", "True", "True", "False", "True"],

    'Play Golf': ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes",
                 "Yes", "Yes", "Yes", "Yes", "No"]
}
```

```python
df = pd.DataFrame(data)
df
```

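|    | Outlook  | Temp | Humidity | Windy | Play Golf |
|---:|----------|------|----------|-------|-----------|
| 0  | Rainy    | Hot  | High     | False | No        |
| 1  | Rainy    | Hot  | High     | True  | No        |
| 2  | Overcast | Hot  | High     | False | Yes       |
| 3  | Sunny    | Mild | High     | False | Yes       |
| 4  | Sunny    | Cool | Normal   | False | Yes       |
| 5  | Sunny    | Cool | Normal   | True  | No        |
| 6  | Overcast | Cool | Normal   | True  | Yes       |
| 7  | Rainy    | Mild | High     | False | No        |
| 8  | Rainy    | Cool | Normal   | False | Yes       |
| 9  | Sunny    | Mild | Normal   | False | Yes       |
| 10 | Rainy    | Mild | Normal   | True  | Yes       |
| 11 | Overcast | Mild | High     | True  | Yes       |
| 12 | Overcast | Hot  | Normal   | False | Yes       |
| 13 | Sunny    | Mild | High     | True  | No        |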

```python
df['Play Golf'].value_counts()
```

```
Yes    9
No     5
Name: Play Golf, dtype: int64
```

```python
test_data = {
    'Outlook': ["Rainy", "Overcast", "Rainy"],
    'Temp': ["Hot", "Hot", "Hot"],
    'Humidity': ["Normal", "Normal", "Normal"],
    'Windy': ["True", "True", "False"],

    'Play Golf': ["No", "No", "Yes"]
}
test = pd.DataFrame(test_data)
test
```

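|   | Outlook  | Temp | Humidity | Windy | Play Golf |
|--:|----------|------|----------|-------|-----------|
| 0 | Rainy    | Hot  | Normal   | True  | No        |
| 1 | Overcast | Hot  | Normal   | True  | No        |
| 2 | Rainy    | Hot  | Normal   | False | Yes       |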


```python
test['Play Golf'].value_counts()
```

```
No     2
Yes    1
Name: Play Golf, dtype: int64
```
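Before fitting anything, the two value counts already determine what ZeroR will score: it predicts the training majority (`Yes`, 9 of 14) for every row, so its accuracy on any set is simply the share of `Yes` in that set. A quick check with pandas, using the label columns copied from the tables above:

```python
import pandas as pd

# Labels copied from the training and test tables above.
y_train = pd.Series(["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes",
                     "Yes", "Yes", "Yes", "Yes", "No"])
y_test = pd.Series(["No", "No", "Yes"])

# ZeroR's single prediction is the majority label of the training set.
majority = y_train.value_counts().idxmax()
print(majority)  # Yes

# Its accuracy on any set is the share of that label in the set.
train_acc = (y_train == majority).mean()
test_acc = (y_test == majority).mean()
print(train_acc)  # 0.6428571428571429
print(test_acc)   # 0.3333333333333333
```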

```python
x_train = df.drop(columns='Play Golf')
y_train = df['Play Golf']
x_test = test.drop(columns='Play Golf')
y_test = test['Play Golf']
```

# ZeroR Classifier using scikit-learn

```python
# Step-1: Import the model from sklearn
from sklearn.dummy import DummyClassifier
# Step-2: Create an instance (object) of the model
model = DummyClassifier(strategy='most_frequent')
# Step-3: Fit the model on the training set
model.fit(x_train, y_train)
```
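After fitting, `DummyClassifier` exposes the class labels it saw (`classes_`) and their observed frequencies (`class_prior_`); with `strategy='most_frequent'` the prediction is the class with the largest prior. This sketch refits on the same golf labels but with an all-zero placeholder feature matrix, an assumption made for brevity since the features are ignored anyway.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Same training labels as the golf table; the features are placeholders.
y_train = np.array(["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes",
                    "Yes", "Yes", "Yes", "Yes", "No"])
X_placeholder = np.zeros((len(y_train), 1))

model = DummyClassifier(strategy="most_frequent").fit(X_placeholder, y_train)

print(list(model.classes_))           # ['No', 'Yes']
print(model.class_prior_.round(3))    # [0.357 0.643]
# The label ZeroR will predict for every instance:
print(model.classes_[model.class_prior_.argmax()])  # Yes
```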

```python
# Step-4: Test the model
y_pred = model.predict(x_test)
```

```python
# Step-5: Evaluate the model
from sklearn.metrics import accuracy_score
```

```python
print('Test accuracy: ', accuracy_score(y_test, y_pred))
print('Train accuracy: ', accuracy_score(y_train, model.predict(x_train)))
```

```
Test accuracy:  0.3333333333333333
Train accuracy:  0.6428571428571429
```
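The low test accuracy follows from the fact that ZeroR never predicts `No`, which a confusion matrix makes explicit. The sketch also fits `DummyClassifier(strategy='constant', constant='No')` as a second trivial baseline. It scores higher here only because this three-row test set happens to be mostly `No`, so treat it as an illustration of how tiny test sets can mislead, not as a better model. Features are again replaced by all-zero placeholders (an assumption; the dummy strategies ignore them).

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Labels from the notebook's train and test tables; features are placeholders.
y_train = np.array(["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes",
                    "Yes", "Yes", "Yes", "Yes", "No"])
y_test = np.array(["No", "No", "Yes"])
X_train = np.zeros((len(y_train), 1))
X_test = np.zeros((len(y_test), 1))

zero_r = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
y_pred = zero_r.predict(X_test)

# Rows = true label, columns = predicted label, both ordered ["No", "Yes"].
cm = confusion_matrix(y_test, y_pred, labels=["No", "Yes"])
print(cm)
# [[0 2]
#  [0 1]]

# A different trivial baseline: always predict "No".
always_no = DummyClassifier(strategy="constant", constant="No").fit(X_train, y_train)
acc_no = accuracy_score(y_test, always_no.predict(X_test))
print(acc_no)  # 0.6666666666666666
```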


## Thank You