<a href="https://colab.research.google.com/github/lawsonk16/Metrics/blob/main/Hierarchical_Metrics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hierarchical Performance Metrics
In this notebook, I'll be investigating hierarchical precision, recall, and F1 score as described in [this article](https://towardsdatascience.com/hierarchical-performance-metrics-and-where-to-find-them-7090aaa07183).

In [None]:
import json
import os
import numpy as np
import pandas as pd

## Category Implementation

The article's author demonstrates the calculation of the metrics using the following class structure:
![image.png](https://miro.medium.com/max/1400/1*Z68UG1eIK9GASUEb1nSbzg.webp)
I prefer to use COCO datasets for my machine learning efforts, so here is that class tree represented as a COCO-style list of categories.


In [None]:
cats = [
    {'id': 1, 'name': 'cat', 'supercategory': 'pet'},
    {'id': 2, 'name': 'dog', 'supercategory': 'pet'},
    {'id': 3, 'name': 'unicorn', 'supercategory': 'pet'},

    {'id': 4, 'name': 'siamese', 'supercategory': 'cat'},
    {'id': 5, 'name': 'persian', 'supercategory': 'cat'},
    {'id': 6, 'name': 'sphynx', 'supercategory': 'cat'},

    {'id': 7, 'name': 'poodle', 'supercategory': 'dog'},
    {'id': 8, 'name': 'french bulldog', 'supercategory': 'dog'},
    {'id': 9, 'name': 'dalmation', 'supercategory': 'dog'},
    {'id': 10, 'name': 'labrador', 'supercategory': 'dog'},

    {'id': 11, 'name': 'pegasus', 'supercategory': 'unicorn'},
    {'id': 12, 'name': 'rainbow unicorn', 'supercategory': 'unicorn'},
    {'id': 13, 'name': 'narwhal', 'supercategory': 'unicorn'},

    # {'id': 14, 'name': 'pet', 'supercategory': None}
]

Because array-wise operations are generally more efficient, and DataFrames make that style of analysis easy, let's convert the categories to a DataFrame.

In [None]:
cats_df = pd.DataFrame(cats)
cats_df

,id,name,supercategory
0,1,cat,pet
1,2,dog,pet
2,3,unicorn,pet
3,4,siamese,cat
4,5,persian,cat
5,6,sphynx,cat
6,7,poodle,dog
7,8,french bulldog,dog
8,9,dalmation,dog
9,10,labrador,dog
10,11,pegasus,unicorn
11,12,rainbow unicorn,unicorn
12,13,narwhal,unicorn


The metrics as described in the article are calculated as follows.
![image.png](https://miro.medium.com/max/4800/1*2hhRmQx9KnUB3muTBqE5iA.webp)

The author gives the following example table for calculating the metrics relative to the Dalmatian class (`dalmation`, id 9, in the categories above):
![image.png](https://miro.medium.com/max/4800/1*LhX_q5UtpljO4vo8F8Veew.webp)
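
For reference, here are the definitions of the hierarchical metrics as they are usually written. I'm assuming these match the formula image above, so treat the notation as mine rather than the article's. $\hat{P}_i$ is the set containing the predicted class for sample $i$ and all of its ancestors, and $\hat{T}_i$ is the same set built from the true class.

$$
hP = \frac{\sum_i |\hat{P}_i \cap \hat{T}_i|}{\sum_i |\hat{P}_i|}, \qquad
hR = \frac{\sum_i |\hat{P}_i \cap \hat{T}_i|}{\sum_i |\hat{T}_i|}, \qquad
hF = \frac{2 \cdot hP \cdot hR}{hP + hR}
$$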

Ideally, we could do these calculations on an array-wise basis.

In [None]:
metric_id = [[9],[2]]

predictions = np.array([[9,9,9,8,12,4],
                       [2,2,2,2,3,1]])
true_labels = np.array([[10,13,9,9,9,5],
                       [2,3,2,2,2,1]])

t_i = true_labels==metric_id
p_i = predictions==metric_id
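
The comparison above relies on NumPy broadcasting. `metric_id` has one id per hierarchy level (shape `(2, 1)`), and it is stretched across the six samples so that each row is compared against its own level's id. A minimal sketch that makes the shapes explicit, using the same toy values:

```python
import numpy as np

# Row 0 holds the leaf ids, row 1 holds the parent ids.
predictions = np.array([[9, 9, 9, 8, 12, 4],
                        [2, 2, 2, 2, 3, 1]])
metric_id = np.array([[9], [2]])  # shape (2, 1): one id per hierarchy level

# The (2, 1) column is broadcast across the 6 samples, so row 0 is
# compared with 9 and row 1 is compared with 2.
p_i = predictions == metric_id

print(metric_id.shape, predictions.shape, p_i.shape)
print(p_i.astype(int))
```

Summing `p_i` down each column then counts how many levels of the dalmation hierarchy each prediction hits.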

In [None]:
sum(p_i),sum(t_i)

(array([2, 2, 2, 1, 0, 0]), array([1, 0, 2, 2, 2, 0]))

In [None]:
t_and_p = np.logical_and(t_i, p_i)
sum(t_and_p)

array([1, 0, 2, 1, 0, 0])

In [None]:
# per-sample recall: overlap divided by the number of true-label matches
sum(t_and_p)/sum(t_i)

array([1. , nan, 1. , 0.5, 0. , nan])
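
The `nan` entries come from dividing zero by zero, which also triggers a NumPy runtime warning. One way to keep the same result without the warning is to divide only where the denominator is nonzero. This is a sketch, and `safe_ratio` is a helper name I made up for illustration:

```python
import numpy as np

def safe_ratio(numerator, denominator):
    """Element-wise numerator / denominator, with NaN where denominator is 0."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    out = np.full(numerator.shape, np.nan)
    # Only positions with a nonzero denominator are computed; the rest stay NaN.
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out

overlap = np.array([1, 0, 2, 1, 0, 0])  # sum(t_and_p) from above
n_true = np.array([1, 0, 2, 2, 2, 0])   # sum(t_i) from above
recall = safe_ratio(overlap, n_true)
print(recall)
```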

In [None]:
# per-sample precision: overlap divided by the number of prediction matches
sum(t_and_p)/sum(p_i)

array([0.5, 0. , 1. , 1. , nan, nan])
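
The ratios above are per sample. To get a single score for the class, the counts can be summed over all samples before dividing, which also avoids the `nan` values. I'm assuming here that the article's formulas sum over samples in this way, which is how hierarchical precision and recall are usually defined, so treat this as a sketch rather than the article's exact procedure:

```python
import numpy as np

metric_id = np.array([[9], [2]])
predictions = np.array([[9, 9, 9, 8, 12, 4],
                        [2, 2, 2, 2, 3, 1]])
true_labels = np.array([[10, 13, 9, 9, 9, 5],
                        [2, 3, 2, 2, 2, 1]])

p_i = predictions == metric_id
t_i = true_labels == metric_id
t_and_p = np.logical_and(t_i, p_i)

# Sum over every level and every sample, then divide once.
overlap = t_and_p.sum()
h_precision = overlap / p_i.sum()
h_recall = overlap / t_i.sum()
h_f1 = 2 * h_precision * h_recall / (h_precision + h_recall)

print(h_precision, h_recall, h_f1)
```

With these toy values the overlap is 4 and both denominators are 7, so all three scores come out to 4/7.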

## Automating the categories
Above, I entered the two-tiered category arrays manually and made the calculations, but we need to be able to automate the creation of those arrays.

In [None]:
def is_supercategory(cats_df):
  """Takes in cats_df (a DataFrame with COCO-style categories).
  Returns a list of booleans, usable as a mask, indicating
  whether each category is a supercategory of another category in the list.
  """
  # a category is a supercategory if its name appears in the supercategory column
  return cats_df['name'].isin(cats_df['supercategory']).tolist()

In [None]:
def add_supercategories(cats_df):
  '''
  in: cats_df, pandas DataFrame with COCO-style categories
  out: cats_df with new columns appended for the supercategory id and for the
       class's full category hierarchy (the class id followed by its ancestors)
  '''
  # add supercategory ids to the dataframe
  super_cats = cats_df[is_supercategory(cats_df)]
  map_dict = super_cats.set_index('name').to_dict()['id']
  cats_df['supercategory_id'] = cats_df['supercategory'].map(map_dict)

  # walk up the tree from each class, collecting the class id and the id of
  # each ancestor until a class with no known supercategory is reached
  cat_supers = cats_df.set_index('id').to_dict()['supercategory_id']
  id_lists = []
  for i in cat_supers:
    id_list = [i]
    parent = cat_supers.get(i)
    while parent is not None and not pd.isna(parent):
      id_list.append(int(parent))
      parent = cat_supers.get(id_list[-1])
    id_lists.append(id_list)

  cats_df['category_hierarchy'] = id_lists

  return cats_df

In [None]:
cats_df = add_supercategories(cats_df)
cats_df

,id,name,supercategory,supercategory_id,category_hierarchy
0,1,cat,pet,,[1]
1,2,dog,pet,,[2]
2,3,unicorn,pet,,[3]
3,4,siamese,cat,1.0,"[4, 1]"
4,5,persian,cat,1.0,"[5, 1]"
5,6,sphynx,cat,1.0,"[6, 1]"
6,7,poodle,dog,2.0,"[7, 2]"
7,8,french bulldog,dog,2.0,"[8, 2]"
8,9,dalmation,dog,2.0,"[9, 2]"
9,10,labrador,dog,2.0,"[10, 2]"
10,11,pegasus,unicorn,3.0,"[11, 3]"
11,12,rainbow unicorn,unicorn,3.0,"[12, 3]"
12,13,narwhal,unicorn,3.0,"[13, 3]"


In [None]:
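def get_nd_cats(values, col_name, cats_df):
  """Maps an array of category ids to their category hierarchies.
  Returns an array with one row per hierarchy level and one column per value.
  Assumes every id in values has a hierarchy of the same length.
  """
  hierarchy_map = cats_df.set_index('id').to_dict()['category_hierarchy']
  df = pd.DataFrame(values, columns = [col_name])
  df['cat_hierarchy'] = df[col_name].map(hierarchy_map)

  return np.array(df['cat_hierarchy'].to_list()).T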
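
One caveat: `get_nd_cats` only produces a clean 2-D array when every id has a hierarchy of the same length. If a top-level class such as dog (`[2]`) shows up next to a leaf class (`[9, 2]`), the lists are ragged, and recent NumPy versions refuse to build a regular array from them. Below is one possible workaround, not something from the article: store hierarchies root-first and pad the missing levels with `NaN`, which never compares equal to anything. `pad_hierarchies` is a hypothetical helper name, and the root-first ordering is my own assumption (the arrays above are leaf-first).

```python
import numpy as np

def pad_hierarchies(hierarchies):
    """Turn [class, ..., root] lists into a (depth, n_samples) float array.

    Each list is reversed to root-first order so that equal levels line up,
    and shorter hierarchies are padded with NaN.
    """
    depth = max(len(h) for h in hierarchies)
    out = np.full((len(hierarchies), depth), np.nan)
    for row, h in enumerate(hierarchies):
        out[row, :len(h)] = h[::-1]
    return out.T

# dalmation, dog (a top-level class), rainbow unicorn
preds = pad_hierarchies([[9, 2], [2], [12, 3]])
target = pad_hierarchies([[9, 2]])  # column vector for the dalmation class

# NaN padding never matches, so "dog" overlaps with dalmation on one level only.
matches = preds == target
print(matches.sum(axis=0))
```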

In [None]:
p = np.array([9,9,9,8,12,4])
t = np.array([10,13,9,9,9,5])
cat_id = 9

In [None]:
predictions = get_nd_cats(p, 'predictions', cats_df)
predictions

array([[ 9,  9,  9,  8, 12,  4],
       [ 2,  2,  2,  2,  3,  1]])

In [None]:
true_labels = get_nd_cats(t, 'true_labels', cats_df)
true_labels

array([[10, 13,  9,  9,  9,  5],
       [ 2,  3,  2,  2,  2,  1]])

In [None]:
metric_id = np.array([cats_df.set_index('id').to_dict()['category_hierarchy'][cat_id]]).T
metric_id

array([[9],
       [2]])

In [None]:
t_i = true_labels==metric_id
p_i = predictions==metric_id
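
Everything above scores predictions against the hierarchy of a single class (`cat_id = 9`). As a cross-check, here is a sketch of the overall, set-based form of the metrics, in which each sample's predicted and true hierarchies are compared directly and the counts are summed over all samples. This is my reading of the usual definition rather than code from the article. The `hierarchy` dict is hard-coded for illustration and mirrors the `category_hierarchy` column for the ids used here.

```python
# Hypothetical lookup: id -> [class, ancestors...], copied from the
# category_hierarchy column for the ids that appear in this example.
hierarchy = {
    4: [4, 1], 5: [5, 1], 8: [8, 2], 9: [9, 2],
    10: [10, 2], 12: [12, 3], 13: [13, 3],
}

def hierarchical_scores(pred_ids, true_ids, hierarchy):
    """Overall hierarchical precision, recall, and F1 over all samples."""
    overlap = n_pred = n_true = 0
    for p, t in zip(pred_ids, true_ids):
        p_set, t_set = set(hierarchy[p]), set(hierarchy[t])
        overlap += len(p_set & t_set)  # shared classes and ancestors
        n_pred += len(p_set)
        n_true += len(t_set)
    h_p = overlap / n_pred
    h_r = overlap / n_true
    h_f = 2 * h_p * h_r / (h_p + h_r) if (h_p + h_r) > 0 else 0.0
    return h_p, h_r, h_f

p = [9, 9, 9, 8, 12, 4]
t = [10, 13, 9, 9, 9, 5]
h_p, h_r, h_f = hierarchical_scores(p, t, hierarchy)
print(h_p, h_r, h_f)
```

Because it works on sets, this version does not need the hierarchies to have equal depth. For the six samples here the overlap is 5 out of 12 predicted and 12 true labels, so all three scores are 5/12.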

### Backup