## Single or Multiclass Label Dumb Classifier

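Predicts labels for any dataset by sampling at random from a set of values `pred_vals`, where each value is drawn with probability equal to its relative frequency `p` in the training labels. The input features are never used.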

Concretely, the training and inference steps are:
- Training:
    - Initialize the `DumbClassifier` class, optionally providing `pred_vals`. If `pred_vals` is not provided, the model infers the possible values to predict from the training labels.
    - Fit the model, which infers `p`. Given a training label set `y`, `fit()` counts the occurrences of each label and normalizes the counts into probabilities.
    
    
- Inference:
    - To predict, provide any test set `X`; `predict()` draws one value from `pred_vals` per row of `X`, choosing each value with its probability of appearance in the training set, `p`.
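
The two steps above can be sketched with plain NumPy, independently of the class defined in this notebook. This is only an illustrative sketch: the toy labels, the placeholder `X_test`, and the seeded generator `rng` are assumptions made for the example and are not part of the original code.

```python
import numpy as np

# Hypothetical toy labels, for illustration only.
y_train = np.array(["cat", "dog", "cat", "cat", "bird", "dog"])

# Training step: count each label and normalize the counts into probabilities.
# np.unique returns the distinct labels in sorted order with their counts.
pred_vals, counts = np.unique(y_train, return_counts=True)
p = counts / counts.sum()

# Inference step: draw one label per test row, weighted by p.
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
X_test = np.zeros((5, 3))       # feature values are never inspected
preds = rng.choice(pred_vals, size=X_test.shape[0], p=p)

print(dict(zip(pred_vals, p)))
print(preds)
```

Here `cat` appears in half of the training labels, so it is drawn with probability 0.5 regardless of what `X_test` contains.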
    
    
### Next Steps / TODO

* 

In [None]:
import numpy as np


class DumbClassifier:
    """Classifier that predicts labels based on their appearance in the training set"""
    
    def __init__(self, pred_vals=None):
        self.p = None  # label probabilities, set by fit()
        self.pred_vals = pred_vals
        self.fitted = False
        
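    def fit(self, X, y, override_vals=False):
        # X is ignored; only the label frequencies in y are used.
        y = np.asarray(y)
        vals, counts = np.unique(y, return_counts=True)
        self.p = counts / y.shape[0]
        # Infer the values to predict when none were given (or on request).
        # User-supplied pred_vals must line up with the sorted unique labels.
        if override_vals or self.pred_vals is None:
            self.pred_vals = vals
        
        self.fitted = True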
        
    def predict(self, X):
        if self.fitted:
            preds = np.random.choice(self.pred_vals, size=(X.shape[0],), p=self.p)
        else:
            raise ValueError("Classifier has not been fitted yet. Consider calling .fit() function")
        return preds
    
    def __repr__(self):
        return "DumbClassifier"

## Multilabel Dumb Classifier

The Multilabel Dumb Classifier takes a one-vs-rest approach: it fits N independent `DumbClassifier` instances, where N is the total number of labels. Each `DumbClassifier` follows the same steps as described in [Single or Multiclass Label Dumb Classifier](#Single-or-Multiclass-Label-Dumb-Classifier). One-vs-rest builds N binary classifiers, where the n-th classifier predicts `label / other` for label n of N.
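
As a rough sketch of the one-vs-rest idea, the snippet below treats each column of a binarized label matrix as its own binary problem and samples predictions column by column. It uses plain NumPy rather than the classes in this notebook; the toy matrix `y`, the test size `n_test`, and the seeded generator `rng` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical binarized label matrix: 4 samples, 3 labels.
y = np.array([[1, 0, 1],
              [0, 0, 1],
              [1, 1, 1],
              [1, 0, 1]])

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
n_test = 6                      # number of test rows to predict

columns = []
for i in range(y.shape[1]):
    # One binary problem per label: estimate P(0) and P(1) from column i.
    vals, counts = np.unique(y[:, i], return_counts=True)
    p = counts / counts.sum()
    # Sample this label independently of all the other labels.
    columns.append(rng.choice(vals, size=n_test, p=p))

# Stack the per-label predictions into shape (n_test, n_labels).
preds = np.column_stack(columns)
print(preds)
```

Because the labels are sampled independently, any correlation between labels in the training set is lost. A label that takes a single value in training (the last column here) is always predicted with that value.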

### Next Steps / TODO

- 

In [None]:
class MultilabelDumbClassifier:
    """One-vs-rest wrapper that fits one DumbClassifier per label"""
    
    def __init__(self, class_names):
        self.class_names = class_names
        self.individual_clfs = {}
        self.fitted = False
        
    def fit(self, X, y):
        """
        - X : training set of shape (n_samples, n_features)
        - y : Binarized multilabelled y set of shape (n_samples, n_classes)
        """
        for i, class_name in enumerate(self.class_names):
            labels = y[:, i]
            clf = DumbClassifier()
            clf.fit(X, labels)
            self.individual_clfs[class_name] = clf
        self.fitted = True
        
    def predict(self, X):
        if self.fitted:
            preds = np.asarray([self.individual_clfs[class_name].predict(X) 
                                for class_name in self.class_names])
            preds = np.transpose(preds)
        else:
            raise ValueError("Classifier has not been fitted yet. Consider calling .fit() function")
        
        return preds
    
    def debug_classifier(self):
        if self.fitted:
            print("{:<40} {:<20} {:<10}".format("Class Name", "Pred Values", "Probas"))
            for class_name in self.class_names:
                clf = self.individual_clfs[class_name]
                print("{:<40} {:<20} {:<10}".format(class_name, 
                                                    str(clf.pred_vals), 
                                                    str(clf.p)))
        else:
            raise ValueError("Classifier has not been fitted yet. Consider calling .fit() function")
          
    def __repr__(self):
        return "MultilabelDumbClassifier"
