### What Is a Dummy Classifier?
A dummy classifier is a simple machine learning model that makes predictions using basic rules, without learning any relationship between the input features and the target. It serves as a baseline: a more complex model is only useful if it clearly outperforms it.
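
To make the idea concrete, here is a minimal hand-written sketch of a majority-class baseline. The class name `MajorityBaseline` and the toy inputs are hypothetical and are not part of scikit-learn; the sketch only illustrates that the features play no role in the prediction.

```python
from collections import Counter


class MajorityBaseline:
    """Hypothetical illustration: always predict the most common training label."""

    def fit(self, X, y):
        # The features X are ignored; only the label counts matter.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Return the same label once per input row.
        return [self.majority_ for _ in X]


baseline = MajorityBaseline().fit([[1], [2], [3]], ["Yes", "No", "Yes"])
baseline_preds = baseline.predict([[10], [20]])
print(baseline_preds)  # ['Yes', 'Yes']
```

Whatever the test rows contain, the prediction is the label that was most common during fitting.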

In [1]:
# Import libraries
from sklearn.model_selection import train_test_split
import pandas as pd

In [7]:
# Make a dataset
dataset_dict = {
    'Outlook': ['sunny', 'sunny', 'overcast', 'rain', 'rain', 'rain', 'overcast', 'sunny', 'sunny', 'rain', 'sunny', 'overcast', 'overcast', 'rain', 'sunny', 'overcast', 'rain', 'sunny', 'sunny', 'rain', 'overcast', 'rain', 'sunny', 'overcast', 'sunny', 'overcast', 'rain', 'overcast'],
    'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0, 72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0, 88.0, 77.0, 79.0, 80.0, 66.0, 84.0],
    'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
    'Wind': [False, True, False, False, False, True, True, False, False, False, True, True, False, True, True, False, False, True, False, True, True, False, True, False, False, True, False, False],
    'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)

In [8]:
df.head()

,Outlook,Temperature,Humidity,Wind,Play
0,sunny,85.0,85.0,False,No
1,sunny,80.0,90.0,True,No
2,overcast,83.0,78.0,False,Yes
3,rain,70.0,96.0,False,Yes
4,rain,68.0,80.0,False,Yes


In [9]:
# One-hot Encode 'Outlook' Column
df = pd.get_dummies(df, columns=['Outlook'],  prefix='', prefix_sep='', dtype=int)

In [10]:
df.head()

,Temperature,Humidity,Wind,Play,overcast,rain,sunny
0,85.0,85.0,False,No,0,0,1
1,80.0,90.0,True,No,0,0,1
2,83.0,78.0,False,Yes,1,0,0
3,70.0,96.0,False,Yes,0,1,0
4,68.0,80.0,False,Yes,0,1,0


In [11]:
# Convert 'Wind' (bool) and 'Play' ('Yes'/'No') columns to 0 and 1
df['Wind'] = df['Wind'].astype(int)
df['Play'] = (df['Play'] == 'Yes').astype(int)

In [12]:
# Set feature matrix X and target vector y
X, y = df.drop(columns='Play'), df['Play']

In [13]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)

### Training Steps
The “training” process for a dummy classifier is quite simple and doesn’t involve the usual learning algorithms. Here’s a general outline:

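### 1. Select a Strategy
Choose the rule the classifier will use to make predictions:
-   Stratified: Makes random guesses that follow the class distribution of the training labels.
-   Most Frequent: Always predicts the most common class in the training labels.
-   Uniform: Picks each class at random with equal probability.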
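
The three strategies can be compared side by side with a short sketch. The toy arrays `X_toy` and `y_toy` are made up for illustration and are not the dataset used in this notebook; `random_state=0` is set only so the random strategies are repeatable.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy data (illustrative only): every strategy ignores the feature values.
X_toy = np.zeros((10, 1))
y_toy = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])  # 70% ones, 30% zeros

strategy_preds = {}
for name in ["stratified", "most_frequent", "uniform"]:
    clf = DummyClassifier(strategy=name, random_state=0)
    clf.fit(X_toy, y_toy)
    strategy_preds[name] = clf.predict(X_toy)
    print(f"{name:>13}: {strategy_preds[name].tolist()}")
```

Only `most_frequent` is deterministic: it predicts 1 for every row. The other two strategies draw random labels, so their output depends on the seed.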



In [14]:
from sklearn.dummy import DummyClassifier
# Choose a strategy for your DummyClassifier (e.g., 'most_frequent', 'stratified', etc.)
strategy = 'most_frequent'

### 2. Collect Training Labels
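Fitting only looks at the class labels of the training set: the classifier records which classes exist and how often each occurs, which is all the chosen strategy needs. The feature values are ignored.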

In [15]:
# Initialize the DummyClassifier
dummy_clf = DummyClassifier(strategy=strategy)
# "Train" the DummyClassifier (although no real training happens)
dummy_clf.fit(X_train, y_train)
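
To see what fitting actually stores, the sketch below inspects the `classes_` and `class_prior_` attributes of a fitted `DummyClassifier`. The names `y_train_demo`, `X_train_demo`, and `demo_clf` are introduced here for illustration; the labels are copied from the first 14 rows of the `Play` column so the block can run on its own.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Training labels copied from the first half of the 'Play' column (1 = Yes, 0 = No).
y_train_demo = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])
X_train_demo = np.zeros((len(y_train_demo), 1))  # placeholder features; they are ignored

demo_clf = DummyClassifier(strategy="most_frequent").fit(X_train_demo, y_train_demo)

majority_label = demo_clf.classes_[np.argmax(demo_clf.class_prior_)]
print("Classes:     ", demo_clf.classes_)
print("Class priors:", demo_clf.class_prior_)
print("Majority:    ", majority_label)
```

With 9 of the 14 training labels equal to 1, the stored prior for class 1 is 9/14, which is why every prediction is 1.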

### 3. Apply Strategy to Test Data
Use the chosen strategy to generate a list of predicted labels for your test data.


In [16]:
# Use the DummyClassifier to make predictions
y_pred = dummy_clf.predict(X_test)
print("Label     :",list(y_test))
print("Prediction:",list(y_pred))

Label     : [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1]
Prediction: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]


### Evaluate the Model

In [17]:
# Evaluate the DummyClassifier's accuracy
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
print(f"Dummy Classifier Accuracy: {accuracy:.2%}")

Dummy Classifier Accuracy: 64.29%
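
This number can be checked by hand: because the classifier predicts the same class for every row, its accuracy is simply the share of test labels equal to that class. The sketch below assumes the test labels printed earlier and uses the illustrative names `y_test_demo` and `majority_class`.

```python
# Test labels copied from the printed output above (1 = Yes, 0 = No).
y_test_demo = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1]
majority_class = 1  # the class that 'most_frequent' predicted for every row

# Accuracy of a constant prediction is the fraction of test labels equal to it.
n_correct = sum(label == majority_class for label in y_test_demo)
manual_accuracy = n_correct / len(y_test_demo)
print(f"Manual accuracy: {n_correct}/{len(y_test_demo)} = {manual_accuracy:.2%}")
```

Here 9 of the 14 test labels are 1, giving 64.29%. Any real model trained on this data should be judged against that figure rather than against 50%.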
