# OneR Algorithm for Classification

## One Sentence Summary

The OneR algorithm for classification is one of the simplest rule-based classifiers, selecting the feature with the lowest classification error and assigning the most frequent class within each feature value to make predictions.

## General Summary

The OneR classifier is one of the simplest machine learning algorithms for classification with categorical data. It works by evaluating each feature using contingency tables to find the feature that separates classes with the least error. Once the best feature is identified, the majority class in each group of the best feature is used as the rule for classifying. This makes it ideal for situations where interpretability is important, but it may sacrifice some accuracy compared to more complex models. Its appeal lies in its ability to quickly highlight the most important feature in a dataset.

## Full Discussion

The OneR classifier (short for one rule) is a simple yet effective rule-based algorithm designed for classification tasks, particularly when working with **categorical data**. At its core, OneR operates on the principle that a single feature can often provide sufficient information to make accurate predictions, provided that feature is chosen carefully. This simplicity is what makes OneR a popular choice for scenarios where interpretability, transparency, and ease of explainability are prioritized over complex modeling. By relying on a single, well-chosen rule, OneR ensures that its decisions are easy to understand in any application where model explainability is essential.

The algorithm works in two main phases: feature selection and rule creation. During the fitting phase, OneR evaluates each feature in the dataset by constructing a **contingency table**, which counts how often each class occurs for each value of the feature. For each feature, the algorithm groups the data by the feature's unique values and finds the **majority class** within each group. It then counts how many misclassifications would occur if every sample in a group were assigned that group's majority class. The feature with the fewest errors is selected as the best feature, and its value-to-class mapping becomes the rule. This keeps the model simple while grounding it in the data: the chosen rule is the single-feature rule with the lowest error on the training set.
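The error count described above can be sketched in a few lines. This is a minimal illustration, not the implementation used later in this document: the `toy` DataFrame and its column names are hypothetical, and `pd.crosstab` is used here as one convenient way to build the contingency tables.

```python
import pandas as pd

# Hypothetical toy data: two categorical features and a binary target.
toy = pd.DataFrame({
    "outlook": ["sunny", "sunny", "rain", "rain", "overcast", "overcast"],
    "windy":   ["yes", "no", "yes", "no", "yes", "no"],
    "play":    ["no", "no", "no", "yes", "yes", "yes"],
})

errors = {}
for feature in ["outlook", "windy"]:
    # Contingency table: rows are feature values, columns are classes.
    table = pd.crosstab(toy[feature], toy["play"])
    # Each row contributes every count except its majority-class count.
    errors[feature] = int((table.sum(axis=1) - table.max(axis=1)).sum())

# The feature with the fewest misclassifications wins.
best_feature = min(errors, key=errors.get)
print(errors, best_feature)
```

In this toy data `outlook` misclassifies one row and `windy` misclassifies two, so `outlook` would be chosen.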

Once the best feature is identified, the prediction phase becomes straightforward: for any new input, the model simply looks up the value of the best feature and applies the corresponding rule to assign a class label. This approach eliminates the need for complex calculations or probabilistic reasoning, making OneR one of the most computationally efficient classifiers available. However, this simplicity comes with a trade-off: while OneR excels in interpretability and ease of use, it may sacrifice accuracy compared to more sophisticated models that leverage multiple features or nonlinear relationships. Despite this limitation, OneR remains a valuable tool for exploratory analysis, feature selection, and scenarios where a quick, interpretable baseline is needed before deploying more complex models.
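As a rough sketch of the lookup step, a fitted rule can be treated as a plain dictionary from feature values to class labels. The `rule`, `default_class`, and `new_values` names below are hypothetical, and the fallback for values that never appeared during fitting is an assumption made for this illustration.

```python
import pandas as pd

# Hypothetical rule for one feature: feature value -> predicted class.
rule = {"sunny": "no", "rain": "no", "overcast": "yes"}
# Assumed fallback label for feature values not seen during fitting.
default_class = "no"

new_values = pd.Series(["sunny", "overcast", "snow"])
# map() looks each value up in the rule; unseen values become NaN,
# which fillna() replaces with the fallback label.
predictions = new_values.map(rule).fillna(default_class).tolist()
print(predictions)
```

The value `"snow"` is not in the rule, so it receives the fallback label; how to handle such values is a design choice the rule itself does not settle.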

The following implementation of OneR has been created to operate like a sklearn classifier, with methods such as `fit`, `predict`, and `score` that align with the standard machine learning workflow.

Following this implementation of `OneR` we will apply it to a subset of the popular machine learning *Titanic* toy dataset.

In [63]:
import numpy as np
import pandas as pd


class OneR:
    def __init__(self):
        self.best_feature = None
        self.rule = {}
        self.classes = None
        self.ftables = {}

    def fit(self, X, y):
        """
        Fit the OneR model according to the training data.

        Parameters:
        -----------
        X : pd.DataFrame
            Feature matrix (training data).
        y : pd.Series or np.ndarray
            Target vector (class labels).

        Returns:
        --------
        self : object
            Returns self.
        """
        if not isinstance(X, pd.DataFrame):
            X = pd.DataFrame(X)
        if not isinstance(y, pd.Series):
            # Reuse X's index so a plain array lines up with X row by row
            y = pd.Series(y, index=X.index)

        self.classes = np.unique(y)

        best_error = float('inf')
        best_rule = {}
        best_feature = None

        for feature in X.columns:
            # Create a temporary DataFrame with the feature and target
            temp = pd.concat([X[feature], y], axis=1)
            grouped = temp.groupby(feature)

            rule = {}
            total_error = 0

            for name, group in grouped:
                # Get the majority class in this group
                majority_class = group.iloc[:, 1].mode()[0]
                rule[name] = majority_class
                maj_total = (group.iloc[:, 1] == majority_class).sum()
                # Count misclassifications
                total_error += len(group) - maj_total

            # Add frequency (contingency) table to ftables; crosstab also
            # works when y has no name (e.g. it was passed as an array)
            freq_table = pd.crosstab(X[feature], y)
            self.ftables[feature] = freq_table

            # Check if this feature is better
            if total_error < best_error:
                best_error = total_error
                best_rule = rule
                best_feature = feature

        self.best_feature = best_feature
        self.rule = best_rule

        return self

    def predict(self, X):
        """
        Predict class labels for samples in X.

        Parameters:
        -----------
        X : pd.DataFrame or np.ndarray
            Feature matrix (test data).

        Returns:
        --------
        y_pred : np.ndarray
            Predicted class labels.
        """
        if self.best_feature is None:
            raise ValueError("Model not fitted yet. Call 'fit' first.")

        if not isinstance(X, pd.DataFrame):
            X = pd.DataFrame(X)

        predictions = []

        for _, row in X.iterrows():
            feature_value = row[self.best_feature]
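            # Use the best rule to predict the class; values not seen during
            # fitting fall back to the first known class label rather than a
            # hard-coded 0, which may not be a valid label
            pred = self.rule.get(feature_value, self.classes[0])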
            predictions.append(pred)

        return np.array(predictions)

    def score(self, X, y):
        """
        Return the mean accuracy on the given test data and labels.

        Parameters:
        -----------
        X : pd.DataFrame or np.ndarray
            Feature matrix (test data).
        y : pd.Series or np.ndarray
            True labels for X.

        Returns:
        --------
        accuracy : float
            Mean accuracy of the classifier.
        """
        y_pred = self.predict(X)
        return np.mean(y_pred == y)

## Example

The following code demonstrates how the OneR classifier can be applied to a subset of categorical features from the popular machine learning Titanic dataset. The dataset contains information about passengers aboard the Titanic, including the following categorical features we'll use with OneR:

 - Sex
    - Female
    - Male
 - Pclass (passenger class)
    - 1 (1st class, most expensive)
    - 2 (2nd class)
    - 3 (3rd class, cheapest)
 - Embarked (port of embarkation)
    - C (Cherbourg)
    - Q (Queenstown)
    - S (Southampton)

These features are set to `X` in the following code. The target column "Survived" is set to `y` and is either 1 (the passenger survived the Titanic) or 0 (the passenger did not).

As per standard practice, we split the data with `train_test_split` from `sklearn`. Then we create a `OneR` object, fit it with the training data, and finally predict and score on the testing data.

In [72]:
from sklearn.model_selection import train_test_split


df = pd.read_csv("datasets/titanic/train.csv")

X = df[["Sex", "Pclass", "Embarked"]]
y = df["Survived"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = OneR()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)

print(f"Accuracy: {accuracy:.3f}\n")
print(f"Best Feature: Rule\n{model.best_feature}: {model.rule}")

Accuracy: 0.785

Best Feature: Rule
Sex: {'female': 1, 'male': 0}


After training and testing `OneR`, we can see that the best feature for classification is *Sex*, and the rule is that females are predicted to survive and males are not. This rule is correct for 78.5% of the passengers in the test set.

To better understand this let's examine the contingency tables.

In [73]:
for feature in model.ftables.keys():
    print(f"{model.ftables[feature]}\n")

Survived    0    1
Sex               
female     59  170
male      356   83

Survived    0   1
Pclass           
1          59  93
2          73  68
3         283  92

Survived    0    1
Embarked          
C          52   65
Q          37   21
S         326  165



The first table for the *Sex* feature shows 170 females survived, while 356 males did not survive, and as stated previously this was identified as the best feature of the dataset. The rule was therefore established as females survive (classify as 1), males do not survive (classify as 0). With this rule established we can see that 356 males and 170 females would be correctly classified, leaving $59+83=142$ misclassifications.

If we follow the same logic and use the mode (majority) of each row of the other respective tables to establish rules for both features, we get that for *Pclass* only class 1 passengers would be classified as survivors, while for the *Embarked* feature only passengers that embarked from Cherbourg (C) would be classified as survivors. That would leave $59+68+92=219$ misclassifications for *Pclass*, and $52+21+165=238$ misclassifications for *Embarked*. Since the *Sex* feature had the fewest misclassifications on the training data, it was selected as the best feature of the dataset. Scoring `OneR` on the testing data gave $0.785$; because *Sex* was the best single feature on the training data, we would expect it to give the highest test accuracy of these three features as well, although that is not guaranteed.
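The arithmetic above can be checked directly from the printed tables. The snippet below is only a verification sketch: the counts are copied by hand from the contingency tables shown earlier, and the `tables` and `misclassified` names are introduced here for illustration.

```python
import numpy as np

# Counts copied from the contingency tables printed above
# (rows are feature values, columns are Survived = 0 and Survived = 1).
tables = {
    "Sex": np.array([[59, 170], [356, 83]]),
    "Pclass": np.array([[59, 93], [73, 68], [283, 92]]),
    "Embarked": np.array([[52, 65], [37, 21], [326, 165]]),
}

misclassified = {}
for feature, counts in tables.items():
    # Every count in a row except the row's majority count is an error.
    misclassified[feature] = int((counts.sum(axis=1) - counts.max(axis=1)).sum())

print(misclassified)
```

This reproduces the 142, 219, and 238 misclassifications quoted in the text. Note that the *Embarked* counts sum to 666 while the other two tables sum to 668, presumably because rows with a missing *Embarked* value are left out of that table.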

## Conclusion

OneR's strengths in interpretability and speed make it a practical choice for quick prototyping or initial data exploration. While it may not match the predictive accuracy of more complex classifiers, its simplicity and computational efficiency make it well suited to exploratory analysis or baseline benchmarking. By balancing simplicity with utility, OneR highlights the value of pragmatic approaches to classification. Sometimes the best solution can be the simplest one.