# Ordinal Classification

The paper ["Ordinal Regression by Extended Binary Classification" by L. Li and H.T. Lin](https://papers.nips.cc/paper/2006/file/019f8b946a256d9357eadc5ace2c8678-Paper.pdf)
introduces a general framework for decomposing an ordinal classification problem into a set of binary problems. Read the paper carefully (you may skip Section 4 on generalization bounds), then answer the following questions:

## Problem (A) Loss Function

The approach relies on a cost matrix $\mathcal{C}$ that is assumed to be V-shaped. V-shaped cost matrices cover the $0/1$ loss and the $L_1$ loss as special cases.
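In the paper, a cost matrix is V-shaped when every row is zero at the true class and the cost never decreases as the prediction moves away from it in either direction. The following is a minimal sketch of a check for that property; the helper name `is_v_shaped` and the counterexample `not_v` are illustrative assumptions and not part of the exercise.

```python
import numpy as np

def is_v_shaped(cost_matrix) -> bool:
    """Return True if every row is 0 on the diagonal, non-increasing up to it
    and non-decreasing after it."""
    C = np.asarray(cost_matrix)
    if C.ndim != 2 or C.shape[0] != C.shape[1] or np.any(np.diag(C) != 0):
        return False
    for y in range(C.shape[0]):
        left = C[y, : y + 1]   # costs of predicting a class at or below the true one
        right = C[y, y:]       # costs of predicting a class at or above the true one
        if np.any(np.diff(left) > 0) or np.any(np.diff(right) < 0):
            return False
    return True

zero_one = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
l1 = np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
# Hypothetical counterexample: the first row rises to 2 and then drops to 1.
not_v = np.array([[0, 2, 1], [1, 0, 1], [2, 1, 0]])

print(is_v_shaped(zero_one), is_v_shaped(l1), is_v_shaped(not_v))  # True True False
```

Both special cases pass the check, while the counterexample fails because its first row is not monotone to the right of the diagonal.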

Create two functions, `accuracy_loss` and `l1_loss`, that each take the number of classes $K$ as input and return the corresponding cost matrix as an `np.ndarray`.

In [1]:
import warnings
warnings.filterwarnings("ignore")

import numpy as np

def accuracy_loss(n_classes: int):
    """Return the 0/1 cost matrix: 0 on the diagonal, 1 everywhere else."""
    if n_classes < 1:
        return None

    return np.ones((n_classes, n_classes), dtype=int) - np.eye(n_classes, dtype=int)

def l1_loss(n_classes: int):
    """Return the L1 cost matrix with entries |i - j|."""
    if n_classes < 1:
        return None

    ranks = np.arange(n_classes)
    # Broadcasting a column of ranks against a row of ranks yields all pairwise |i - j|.
    return np.abs(ranks[:, None] - ranks[None, :])

print("Accuracy Loss:\n {}\n\n L1 Loss:\n {}".format(accuracy_loss(4), l1_loss(4)))

Accuracy Loss:
 [[0 1 1 1]
 [1 0 1 1]
 [1 1 0 1]
 [1 1 1 0]]

 L1 Loss:
 [[0 1 2 3]
 [1 0 1 2]
 [2 1 0 1]
 [3 2 1 0]]


## Problem (B) Ordinal Binary Framework
For illustration, suppose the following training data are given:
$$
\begin{array}{ccc}
x_1 & x_2 & y \\
\hline
0 & 1 & c_1 \\
1 & 2 & c_1 \\
3 & 5 & c_2 \\
2 & 2 & c_2 \\
4 & 2 & c_3 \\
6 & 0 & c_3 \\
\end{array}
$$
Moreover, let the cost matrix be given as
$$
\mathcal{C} = \left(
\begin{array}{ccc}
0 & 2 & 2 \\
1 & 0 & 2 \\
4 & 1 & 0 
\end{array}
\right) \enspace .
$$
What do the training data sets for all (weighted) binary classification problems generated by the framework look like?
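In the paper's reduction, each example $(x, y)$ is extended to $K-1$ binary examples $((x, k), z)$ for $k = 1, \dots, K-1$, with label $z = 2[k < y] - 1$ and weight $w = |\mathcal{C}_{y,k} - \mathcal{C}_{y,k+1}|$. The sketch below tabulates these labels and weights per class for the cost matrix above; the helper names `threshold_weights` and `threshold_labels` are assumptions made for illustration.

```python
import numpy as np

def threshold_weights(cost_matrix):
    """Entry (y, k) is |C[y, k] - C[y, k+1]|, the weight of threshold k for class y."""
    C = np.asarray(cost_matrix)
    return np.abs(C[:, :-1] - C[:, 1:])

def threshold_labels(n_classes):
    """Entry (y, k) is +1 if threshold k lies below rank y, otherwise -1."""
    ranks = np.arange(1, n_classes + 1)[:, None]     # y = 1, ..., K
    thresholds = np.arange(1, n_classes)[None, :]    # k = 1, ..., K-1
    return np.where(thresholds < ranks, 1, -1)

C = np.array([[0, 2, 2], [1, 0, 2], [4, 1, 0]])
W = threshold_weights(C)
Z = threshold_labels(3)

for y in range(3):
    for k in range(2):
        print(f"y = c_{y + 1}, k = {k + 1}: z = {int(Z[y, k]):+d}, w = {int(W[y, k])}")
```

Every row of the training table is therefore copied twice, once per threshold, with the label and weight looked up by its class. This gives 12 weighted binary examples in total.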

In [9]:
import numpy as np
import pandas as pd

class OrdinalBinaryTransformer:
    """
    This is not a proper transformer[1] since the transformation returns not only the transformed X but also y and w.
    
    [1] https://scikit-learn.org/stable/glossary.html#term-transform
    """
    def __init__(self, cost_matrix):
        self.n_classes = len(cost_matrix)
        # Keep the matrix on the instance so the transformation never reads a global variable.
        self.cost_matrix = np.asarray(cost_matrix)
    
    def fit_transform(self, X, y):
        """
        Transform the features and labels according to the ordinal framework in the given paper.

        Parameters
        ----------
        X: Features (pandas DataFrame)
        y: Class labels (pandas Series with integer ranks 1, ..., K)

        Returns
        -------
        Xt: Transformed features, extended by the threshold column x3
        yt: Transformed binary class labels in {-1, +1}
        w: Weights of the transformed examples
        """
        # Repeat every example K-1 times, once per threshold k = 1, ..., K-1.
        y_dupl = pd.DataFrame(np.repeat(y.values, self.n_classes - 1, axis=0))
        x_dupl = pd.DataFrame(np.repeat(X.values, self.n_classes - 1, axis=0), columns=X.columns)
        x_dupl['x3'] = list(range(1, self.n_classes)) * X.shape[0]
        
        # Debug output of the intermediate frames
        print(y_dupl)
        print(x_dupl)
        
        # Binary label: +1 if the true rank exceeds threshold k, otherwise -1.
        yt = 2 * (x_dupl['x3'] < y_dupl[0]) - 1
        # Weight |C[y, k] - C[y, k+1]|; the paper indexes from 1, hence the -1 offsets.
        C = self.cost_matrix
        x_assign = x_dupl.assign(w=np.abs(C[y_dupl[0] - 1, x_dupl['x3'] - 1] - C[y_dupl[0] - 1, x_dupl['x3']]))
        
        print(x_assign)
        
        Xt = x_assign.iloc[:,:-1]
        w = x_assign.iloc[:,-1]
        return Xt, yt, w

The following test cells help you check whether your implementation reproduces the binary data sets prescribed by the framework.

In [10]:
# TEST CELL 1
from io import StringIO
import pandas as pd

# Create a dataset
data = pd.read_csv(StringIO(
    """
    x1,x2,target
    0,1,1
    1,2,1
    3,5,2
    2,2,2
    4,2,3
    6,0,3
    """
  )
)

# Separate the features and labels
X = data.iloc[:,:-1]
y = data.iloc[:,-1]

# Create the Cost matrix
cost_matrix = np.array([0, 2, 2, 1, 0, 2, 4, 1, 0]).reshape((3,3)) 

transformer = OrdinalBinaryTransformer(cost_matrix)
Xt, yt, w = transformer.fit_transform(X, y)

print(Xt)
print(yt)
print(w)

dataT = set([tuple(np.array(e,dtype=int)) for e in pd.concat([Xt, yt, w], axis=1).itertuples(index=False)])

# Xt, yt and w must have the same length. Every example is replicated K-1 times,
# so with more than two classes the transformed data set is at least as long as
# the original one (6 rows).
assert len(Xt) == len(yt) == len(w) >= 6

# Desired results
print(dataT == 
      {
          (0, 1, 1, -1, 2),
          (0, 1, 2, -1, 0),
          (1, 2, 1, -1, 2),
          (1, 2, 2, -1, 0),
          (2, 2, 1, 1, 1),
          (2, 2, 2, -1, 2),
          (3, 5, 1, 1, 1),
          (3, 5, 2, -1, 2),
          (4, 2, 1, 1, 3),
          (4, 2, 2, 1, 1),
          (6, 0, 1, 1, 3),
          (6, 0, 2, 1, 1)
          }
      )


    0
0   1
1   1
2   1
3   1
4   2
5   2
6   2
7   2
8   3
9   3
10  3
11  3
        x1  x2  x3
0        0   1   1
1        0   1   2
2        1   2   1
3        1   2   2
4        3   5   1
5        3   5   2
6        2   2   1
7        2   2   2
8        4   2   1
9        4   2   2
10       6   0   1
11       6   0   2
        x1  x2  x3  w
0        0   1   1  2
1        0   1   2  0
2        1   2   1  2
3        1   2   2  0
4        3   5   1  1
5        3   5   2  2
6        2   2   1  1
7        2   2   2  2
8        4   2   1  3
9        4   2   2  1
10       6   0   1  3
11       6   0   2  1
        x1  x2  x3
0        0   1   1
1        0   1   2
2        1   2   1
3        1   2   2
4        3   5   1
5        3   5   2
6        2   2   1
7        2   2   2
8        4   2   1
9        4   2   2
10       6   0   1
11       6   0   2
0    -1
1    -1
2    -1
3    -1
4     1
5    -1
6     1
7    -1
8     1
9     1
10    1
11    1
dtype: int32
0     2
1     0
2     2
3     0
4

In [11]:
# TEST CELL 2
def test_accuracy_loss(n_classes: int):
    return np.ones((n_classes, n_classes)) - np.eye(n_classes)

# Create a dataset
data = pd.read_csv(StringIO(
    """
    x1,x2,target
    0,1,1
    1,2,1
    3,5,2
    2,2,2
    4,2,3
    6,0,3
    6,1,4
    8,-1,4
    """
  )
)

# Separate the features and labels
X = data.iloc[:,:-1]
y = data.iloc[:,-1]

# Create the Cost matrix from the accuracy loss function
cost_matrix = test_accuracy_loss(4)

transformer = OrdinalBinaryTransformer(cost_matrix)
Xt, yt, w = transformer.fit_transform(X, y)
dataT = set([tuple(np.array(e, dtype=int)) for e in pd.concat([Xt, yt, w], axis=1).itertuples(index=False)])

# Desired results
print(dataT == 
      {
          (0, 1, 1, -1, 1),
          (0, 1, 2, -1, 0),
          (0, 1, 3, -1, 0),
          (1, 2, 1, -1, 1),
          (1, 2, 2, -1, 0),
          (1, 2, 3, -1, 0),
          (2, 2, 1, 1, 1),
          (2, 2, 2, -1, 1),
          (2, 2, 3, -1, 0),
          (3, 5, 1, 1, 1),
          (3, 5, 2, -1, 1),
          (3, 5, 3, -1, 0),
          (4, 2, 1, 1, 0),
          (4, 2, 2, 1, 1),
          (4, 2, 3, -1, 1),
          (6, 0, 1, 1, 0),
          (6, 0, 2, 1, 1),
          (6, 0, 3, -1, 1),
          (6, 1, 1, 1, 0),
          (6, 1, 2, 1, 0),
          (6, 1, 3, 1, 1),
          (8, -1, 1, 1, 0),
          (8, -1, 2, 1, 0),
          (8, -1, 3, 1, 1)
          }
      )

    0
0   1
1   1
2   1
3   1
4   1
5   1
6   2
7   2
8   2
9   2
10  2
11  2
12  3
13  3
14  3
15  3
16  3
17  3
18  4
19  4
20  4
21  4
22  4
23  4
        x1  x2  x3
0        0   1   1
1        0   1   2
2        0   1   3
3        1   2   1
4        1   2   2
5        1   2   3
6        3   5   1
7        3   5   2
8        3   5   3
9        2   2   1
10       2   2   2
11       2   2   3
12       4   2   1
13       4   2   2
14       4   2   3
15       6   0   1
16       6   0   2
17       6   0   3
18       6   1   1
19       6   1   2
20       6   1   3
21       8  -1   1
22       8  -1   2
23       8  -1   3
        x1  x2  x3    w
0        0   1   1  1.0
1        0   1   2  0.0
2        0   1   3  0.0
3        1   2   1  1.0
4        1   2   2  0.0
5        1   2   3  0.0
6        3   5   1  1.0
7        3   5   2  1.0
8        3   5   3  0.0
9        2   2   1  1.0
10       2   2   2  1.0
11       2   2   3  0.0
12       4   2   1  0.0
13       4   2   2  1.0
14       4   2 

## Problem (C) Ordinal Classification

Let's put everything together. In this part of the exercise, perform the following tasks:

- Read the training and test data from `train_pasture.csv` and `test_pasture.csv`, respectively, using the `read` function.
- Use `accuracy_loss` or `l1_loss` from part (A) to create the cost matrix.
- Transform the training and test data sets using `OrdinalBinaryTransformer`.
- Fit a binary classifier (e.g. [`DecisionTreeClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)) as the base learner and predict on the test set.
- Calculate the weighted test accuracy with respect to the cost matrix.
- Transform the predictions back into an ordinal classification.
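For the last step, the paper builds the ranking rule by counting how many thresholds the binary classifier places below the example: $r(x) = 1 + \sum_{k=1}^{K-1} [g(x, k) > 0]$. Below is a minimal sketch of that decoding. The helper name `decode_ranks` is an assumption, and the sketch assumes that the binary predictions are ordered by example first and by threshold second, which is the order `OrdinalBinaryTransformer` produces.

```python
import numpy as np

def decode_ranks(binary_pred, n_classes):
    """Map stacked binary predictions in {-1, +1} back to ranks 1, ..., K."""
    # One row per original example, one column per threshold k = 1, ..., K-1.
    votes = np.asarray(binary_pred).reshape(-1, n_classes - 1)
    # The rank is one plus the number of thresholds answered with "rank is greater than k".
    return 1 + np.sum(votes > 0, axis=1)

# Toy predictions for two examples and K = 3 (illustrative values only).
pred = np.array([1, -1, 1, 1])
print(decode_ranks(pred, 3))  # [2 3]
```

Counting positive answers still yields a valid rank when the binary predictions for one example are not monotone in $k$.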

In [None]:
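import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

def read(filename):
    # The files are space-separated without a header row; the last column holds the class label.
    df = pd.read_csv(filename, sep=' ', header=None)
    X = df.iloc[:,:-1]
    y = df.iloc[:,-1]
    return X, y

X_train, y_train = read(filename='train_pasture.csv')
X_test, y_test = read(filename='test_pasture.csv')

# One row and one column of the cost matrix per distinct class in the training labels.
cost_matrix = l1_loss(y_train.nunique())
transformer = OrdinalBinaryTransformer(cost_matrix)

# The transformation is stateless, so fit_transform can be applied to both splits.
X_train_transform, y_train_transform, w_train_transform = transformer.fit_transform(X_train, y_train)
X_test_transform, y_test_transform, w_test_transform = transformer.fit_transform(X_test, y_test)

dt = DecisionTreeClassifier(max_depth=3)
# Passing NumPy arrays sidesteps scikit-learn's feature-name check, which rejects
# mixed integer/string column names in recent versions.
dt.fit(X_train_transform.to_numpy(), y_train_transform)
y_pred = dt.predict(X_test_transform.to_numpy())

# Unweighted accuracy on the extended binary examples, not yet on the ordinal labels.
print("Accuracy:", metrics.accuracy_score(y_test_transform, y_pred))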

Accuracy: 0.9444444444444444
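
The accuracy printed above counts every extended binary example equally. Both `DecisionTreeClassifier.fit` and `metrics.accuracy_score` accept a `sample_weight` argument, so the weights returned by the transformer could be passed there. The following NumPy-only sketch shows what the weighted score computes, together with the average cost of decoded ranks under the cost matrix. The helper names and the toy values are assumptions for illustration and are not results from the pasture data.

```python
import numpy as np

def weighted_accuracy(y_true, y_pred, weights):
    """Share of the total weight that falls on correctly classified binary examples."""
    y_true, y_pred, weights = map(np.asarray, (y_true, y_pred, weights))
    return float(np.sum(weights * (y_true == y_pred)) / np.sum(weights))

def mean_cost(cost_matrix, true_ranks, pred_ranks):
    """Average cost C[y, r] over all examples, for ranks given as 1, ..., K."""
    C = np.asarray(cost_matrix)
    return float(np.mean(C[np.asarray(true_ranks) - 1, np.asarray(pred_ranks) - 1]))

# Toy values for illustration only.
z_true = [1, -1, 1, 1]
z_pred = [1, 1, 1, -1]
w = [1, 2, 3, 1]
print(weighted_accuracy(z_true, z_pred, w))

l1 = np.abs(np.subtract.outer(np.arange(3), np.arange(3)))
print(mean_cost(l1, [1, 2, 3], [1, 3, 1]))
```

The weighted accuracy evaluates the binary subproblems, whereas the mean cost evaluates the final ordinal predictions, so the two numbers answer different questions.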
