# Introduction to scikit-learn | Baseline Classifiers

🤖 `Notebook by` [Ihsanul Haque](https://www.linkedin.com/in/ihsanul09/)

✅ `Machine Learning Source Codes` [GitHub](https://github.com/ihsanulcode/ML-Batch-2)

📌 `Machine Learning from Scratch` [Course Outline](https://docs.google.com/document/d/15mGNTUSlWQsy4TzcLZUdYedpCMO5KiVq1USaDprHaIc/edit?usp=sharing)

📌 `Related` [PPTX slide](https://docs.google.com/presentation/d/1gir8tFBx8T4ZXhNtN4HHZ2-o6kxG1qg_/edit?usp=sharing&ouid=114227978762288850531&rtpof=true&sd=true)

# ZeroR Classifier
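ZeroR is the simplest baseline classifier: it ignores every feature and always predicts the most frequent class in the training labels. In scikit-learn it is available as `DummyClassifier` with `strategy='most_frequent'`. See the scikit-learn [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html) for details.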
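
The idea behind ZeroR (always predict the most frequent training label, ignoring the features) can be sketched in plain Python. This is only an illustration: the class name `ZeroR`, the toy labels, and the placeholder features are assumptions made for this example and are not part of scikit-learn.

```python
from collections import Counter


class ZeroR:
    """Minimal ZeroR sketch: always predict the most frequent training label."""

    def fit(self, X, y):
        # The features in X are ignored; only the label counts matter.
        # Counter.most_common(1) returns [(label, count)] for the top label.
        self.majority_, _ = Counter(y).most_common(1)[0]
        return self

    def predict(self, X):
        # Return the stored majority label once per input row.
        return [self.majority_ for _ in X]


# Toy labels (hypothetical, for illustration only)
labels = ["Yes", "Yes", "No", "Yes", "No"]
features = [[0], [1], [2], [3], [4]]  # placeholder features, never used

zero_r = ZeroR().fit(features, labels)
predictions = zero_r.predict([[10], [11]])
print(predictions)  # ['Yes', 'Yes']
```

Because the features never enter the computation, any model that cannot beat this score has learned nothing useful from them.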

# Load Libraries

In [None]:
import numpy as np
import pandas as pd

In [None]:
# Create train dataset

data = {
    'Outlook': ["Rainy", "Rainy", "Overcast", "Sunny", "Sunny", "Sunny",
               "Overcast", "Rainy", "Rainy", "Sunny", "Rainy", "Overcast",
               "Overcast", "Sunny"],

    'Temp': ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool", "Mild",
            "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],

    'Humidity': ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],

    'Windy': ["False", "True", "False", "False", "False", "True", "True",
             "False", "False", "False", "True", "True", "False", "True"],

    'Play Golf': ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes",
                 "Yes", "Yes", "Yes", "Yes", "No"]
}


df = pd.DataFrame(data)
df

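| | Outlook | Temp | Humidity | Windy | Play Golf |
|---|---|---|---|---|---|
| 0 | Rainy | Hot | High | False | No |
| 1 | Rainy | Hot | High | True | No |
| 2 | Overcast | Hot | High | False | Yes |
| 3 | Sunny | Mild | High | False | Yes |
| 4 | Sunny | Cool | Normal | False | Yes |
| 5 | Sunny | Cool | Normal | True | No |
| 6 | Overcast | Cool | Normal | True | Yes |
| 7 | Rainy | Mild | High | False | No |
| 8 | Rainy | Cool | Normal | False | Yes |
| 9 | Sunny | Mild | Normal | False | Yes |
| 10 | Rainy | Mild | Normal | True | Yes |
| 11 | Overcast | Mild | High | True | Yes |
| 12 | Overcast | Hot | Normal | False | Yes |
| 13 | Sunny | Mild | High | True | No |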


In [None]:
df['Play Golf'].value_counts()

Yes    9
No     5
Name: Play Golf, dtype: int64

In [None]:
# Create test dataset

test_data = {
    'Outlook': ["Rainy", "Overcast", "Rainy"],
    'Temp': ["Hot", "Hot", "Hot"],
    'Humidity': ["Normal", "Normal", "Normal"],
    'Windy': ["True", "True", "False"],

    'Play Golf': ["No", "No", "Yes"]
}
test = pd.DataFrame(test_data)
test

| | Outlook | Temp | Humidity | Windy | Play Golf |
|---|---|---|---|---|---|
| 0 | Rainy | Hot | Normal | True | No |
| 1 | Overcast | Hot | Normal | True | No |
| 2 | Rainy | Hot | Normal | False | Yes |


In [None]:
test['Play Golf'].value_counts()

No     2
Yes    1
Name: Play Golf, dtype: int64

# Separate Features and Target

In [None]:
x_train = df.drop(columns='Play Golf')
y_train = df['Play Golf']

x_test = test.drop(columns='Play Golf')
y_test = test['Play Golf']

# ZeroR Classifier using Sklearn

In [None]:
# Step-1 : Import the model from sklearn
from sklearn.dummy import DummyClassifier

# Step-2 : Create an instance or object of the model
model = DummyClassifier(strategy='most_frequent')
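
`most_frequent` is one of several strategies that `DummyClassifier` accepts. The sketch below compares it with `prior`, assuming a recent scikit-learn release: both predict the majority class, but their `predict_proba` outputs differ. The variable names (`X`, `y`, `most_frequent`, `prior`) and the all-zero placeholder features are assumptions for this example; the labels are the 14 training labels from this notebook.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Same 14 training labels as in the notebook; the features are placeholders
# because DummyClassifier never looks at them.
y = np.array(["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes",
              "Yes", "Yes", "Yes", "Yes", "No"])
X = np.zeros((len(y), 1))

most_frequent = DummyClassifier(strategy="most_frequent").fit(X, y)
prior = DummyClassifier(strategy="prior").fit(X, y)

# Both strategies predict the majority class for every row
pred_most_frequent = most_frequent.predict(X[:3])
pred_prior = prior.predict(X[:3])

# Their probability estimates differ:
# most_frequent puts all mass on the majority class,
# prior returns the empirical class distribution of the training labels.
proba_most_frequent = most_frequent.predict_proba(X[:1])
proba_prior = prior.predict_proba(X[:1])

print("Classes:", most_frequent.classes_)
print("most_frequent:", pred_most_frequent, proba_most_frequent)
print("prior:", pred_prior, proba_prior)
```

The distinction matters only for metrics that use probabilities, such as log loss or ROC AUC; for accuracy the two strategies behave identically.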

In [None]:
# Step-3 : Fit the model on the training set
model.fit(x_train, y_train)

In [None]:
# Step-4 : Predict labels for the test set
y_pred = model.predict(x_test)

In [None]:
# Step-5 : Evaluate the model
from sklearn.metrics import accuracy_score

print("Test Accuracy: ", accuracy_score(y_test, y_pred))
print("Train Accuracy: ", accuracy_score(y_train, model.predict(x_train)))

Test Accuracy:  0.3333333333333333
Train Accuracy:  0.6428571428571429
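
Both scores can be checked by hand. ZeroR predicts the training majority class for every row, so its accuracy on any set equals the share of that class in the set. The sketch below repeats the calculation in plain Python; the helper name `majority_accuracy` is hypothetical, and the labels are copied from the train and test datasets above.

```python
from collections import Counter

# Labels copied from the train and test DataFrames in this notebook
y_train_labels = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes",
                  "Yes", "Yes", "Yes", "Yes", "No"]
y_test_labels = ["No", "No", "Yes"]

# The majority class is learned from the training labels only
majority = Counter(y_train_labels).most_common(1)[0][0]


def majority_accuracy(labels, predicted_label):
    """Fraction of labels equal to the single predicted label."""
    return sum(label == predicted_label for label in labels) / len(labels)


train_acc = majority_accuracy(y_train_labels, majority)
test_acc = majority_accuracy(y_test_labels, majority)

print("Majority class:", majority)
print("Train accuracy:", train_acc)  # 9 of 14 training rows are "Yes"
print("Test accuracy:", test_acc)    # 1 of 3 test rows is "Yes"
```

The test score is lower than the train score because the test set has a different class balance ("No" is the majority there), not because the model overfits. Any real classifier trained on this data should be compared against these baseline numbers.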


# Thank you
© [Dataque Academy](https://www.facebook.com/dataque.academy)