# ZeroR and OneR ##

We use the diabetes dataset to try two simple approaches to classification. The aim of this project is to understand these basic algorithms rather than to achieve high accuracy. The algorithms covered are:

1. [ZeroR Classifier](#ZeroR-Classifier)
2. [OneR Classifier](#OneR-Classifier)

Let's get started

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
data = pd.read_csv("../input/diabetes-dataset/diabetes2.csv")
data.head()

## ZeroR Classifier ##

ZeroR, or the zero-rule classifier, is a naive approach to classification. It is based purely on the target: it always predicts the most frequent class and ignores all the independent attributes.

Application: The ZeroR classifier is used as a baseline for other classifiers. Other classifiers should do better than this to prove their capabilities.

Let's see an example of this.

In [None]:
data['Outcome'].value_counts()

In [None]:
# distplot is deprecated in recent seaborn releases; countplot suits a binary target
sns.countplot(x='Outcome', data=data)
plt.title("Analysing ZeroR")
plt.xticks([0,1])
plt.show()

Now, suppose I have a single rule to classify the dataset. The rule is:

`Outcome = 0`

How accurate is it? Note that the outcome is 0 in 500 rows and 1 in 268 rows.

    = 500 / (500+268)
    
    = 500 / 768

    = 0.651

Thus, the accuracy is 65.1%. Other classifiers will have to do better than this.

**Note**: I have not split the dataset into training and test sets, to keep everything simple and easy to understand.
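
The calculation above can also be done in code. The following is a minimal sketch: it rebuilds a stand-in for the `Outcome` column from the counts quoted above (500 zeros and 268 ones) so that it runs without the CSV file. The names `outcome`, `majority_class` and `zero_r_accuracy` are only illustrative.

```python
import pandas as pd

# Stand-in for data['Outcome'], built from the counts quoted above
# (500 zeros, 268 ones) so the snippet runs without the CSV file.
outcome = pd.Series([0] * 500 + [1] * 268, name="Outcome")

# ZeroR: always predict the most frequent class of the target.
majority_class = outcome.value_counts().idxmax()

# Accuracy is the share of rows that already have that class.
zero_r_accuracy = (outcome == majority_class).mean()

print(majority_class, round(zero_r_accuracy, 3))
```

With the real dataset loaded, replacing `outcome` with `data['Outcome']` should give the same 65.1%.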

## OneR Classifier ##



Unlike ZeroR, OneR considers each of the attributes. It makes a set of rules for each attribute and selects the attribute whose rules give the highest accuracy. The algorithm is:

            
    For each predictor,

         For each value of that predictor, make a rule as follows;

               Count how often each value of target (class) appears

               Find the most frequent class

               Make the rule assign that class to this value of the predictor

         Calculate the total error of the rules of each predictor
     
    Choose the predictor with the smallest total error


[Algorithm Source: Saed Sayad OneR](https://www.saedsayad.com/oner.htm)

`It basically generates a one-level decision tree.`
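
The pseudocode above can be sketched in pandas roughly as follows. This is one possible reading of the algorithm, not a reference implementation: the function name `one_r` and the toy table are hypothetical, and it assumes that every predictor is already categorical and that there are no missing values.

```python
import pandas as pd


def one_r(df, predictors, target):
    """Return (best_predictor, rules, accuracy) for categorical predictors."""
    best = None
    for col in predictors:
        # Count how often each class appears for each value of this predictor.
        counts = pd.crosstab(df[col], df[target])
        # Rule: each predictor value maps to its most frequent class.
        rules = counts.idxmax(axis=1).to_dict()
        # Rows that do not belong to their value's majority class are errors.
        errors = int(counts.sum().sum() - counts.max(axis=1).sum())
        if best is None or errors < best[2]:
            best = (col, rules, errors)
    col, rules, errors = best
    # Assumes no missing values, so every row was counted above.
    return col, rules, 1 - errors / len(df)


# Hypothetical toy data, not taken from the diabetes dataset.
toy = pd.DataFrame({
    "outlook": ["sunny", "sunny", "rainy", "rainy", "overcast", "overcast"],
    "windy":   ["yes",   "no",    "yes",   "no",    "yes",      "no"],
    "play":    ["no",    "no",    "yes",   "yes",   "yes",      "yes"],
})

best_col, best_rules, acc = one_r(toy, ["outlook", "windy"], "play")
print(best_col, best_rules, acc)
```

On this toy table, `outlook` makes no errors while `windy` makes two, so `outlook` is the predictor that gets chosen.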

Let's see it in action:

**Note**:
1. As we have `numerical data` over a wide range, there can be many possible rules. For the attribute 'Age', for example, a rule could be age<17, age<12, age<24, age<34, and so on. Thus, we will bin it into 3 categories:

    0 = young, 1 = mid, 2 = old
    
2. Running OneR on every attribute would make the code complex and unnecessarily lengthy, so we use only the `Age` attribute.

In [None]:
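# Our criteria:
# 0 = young (age < 25), 1 = mid (25 < age < 45), 2 = old (everything else)
# Note: age == 25 matches neither of the first two conditions,
# so it falls through to category 2 as written.

column_age = []

for age in data['Age']:
    if age < 25:
        column_age.append("0")
    elif age > 25 and age < 45:
        column_age.append("1")
    else:
        column_age.append("2")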

# adding a new column
data["Age_Categorical"] = column_age        
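
As an aside, the same kind of binning can be written with `pd.cut`. The sketch below is an alternative, not the code used in this notebook: the sample ages are hypothetical, and the bin edges are an assumption that places age 25 in category 1, whereas the loop above sends it to category 2. The counts later in this notebook would therefore change if this version were swapped in.

```python
import numpy as np
import pandas as pd

# Hypothetical sample ages chosen to sit on and around the bin edges.
ages = pd.Series([21, 24, 25, 44, 45, 81], name="Age")

# right=False makes each bin closed on the left: [0, 25), [25, 45), [45, inf).
age_category = pd.cut(
    ages,
    bins=[0, 25, 45, np.inf],
    right=False,
    labels=["0", "1", "2"],
)

print(age_category.tolist())
```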

In [None]:
data.head()

In [None]:
for i in range(3):
    in_category = data['Age_Categorical'] == str(i)
    for outcome in (0, 1):
        count = (in_category & (data['Outcome'] == outcome)).sum()
        print(f"If Age Category: {i}, number of outcomes({outcome}): {count}")
    print()

So, if we have a set of rules for the attribute `Age_Categorical` only, which says:

    if Age_Categorical = 0 then Outcome = 0
    if Age_Categorical = 1 then Outcome = 0
    if Age_Categorical = 2 then Outcome = 0

the accuracy of the model will be:

    = (188 + 211 + 101) / (188 + 31 + 211 + 157 + 101 + 80)
    = 500/768
    = 0.651
    
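The accuracy will be 65.1%, the same as ZeroR. Hmmmm. This happens because outcome 0 is the most frequent class in every age category (188 vs 31, 211 vs 157, 101 vs 80), so every rule predicts 0 and OneR on this attribute makes exactly the same predictions as ZeroR.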
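
The same arithmetic can be checked in code. This is a small sketch that types in the six counts from the calculation above as a table, so it does not need the dataset; the names `counts`, `rules` and `accuracy` are only illustrative.

```python
import pandas as pd

# Counts copied from the calculation above:
# rows are Age_Categorical, columns are Outcome.
counts = pd.DataFrame(
    {0: [188, 211, 101], 1: [31, 157, 80]},
    index=["0", "1", "2"],
)

# Majority class per age category: this is the OneR rule for this attribute.
rules = counts.idxmax(axis=1)

# Correct predictions are the majority counts; divide by the number of rows.
accuracy = counts.max(axis=1).sum() / counts.values.sum()

print(rules.to_dict(), round(accuracy, 3))
```

With the real dataset loaded, `pd.crosstab(data['Age_Categorical'], data['Outcome'])` should produce the `counts` table directly.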

# References

1. [Saed Sayad: ZeroR](https://www.saedsayad.com/zeror.htm)
2. [Machine Learning Mastery: How To Estimate A Baseline Performance For Your Machine Learning Models in Weka](https://machinelearningmastery.com/estimate-baseline-performance-machine-learning-models-weka/)
3. [Saed Sayad: OneR](https://www.saedsayad.com/oner.htm)
4. [YouTube: MLCollab OneR](https://www.youtube.com/watch?v=Bhc838MCSDY)
