# Zero Shot Analysis

GLIP, a zero-shot model, performed poorly on RF100. Because it relies on language to identify object instances, we expect it to fail badly when class names carry little meaning. RF100 was crowdsourced, which means some class names are either very domain specific (e.g. bacteria names) or not descriptive (e.g. different types of damaged vehicles named from `'0'` to `'10'`).

In this notebook, we will find out on which datasets and classes GLIP performed poorly, did something, or did well, based on the mAP reported in our experiments.

Let's start by loading some packages

In [17]:
import pandas as pd
from pathlib import Path
from typing import Dict, List



`datasets_stats` contains the results for each dataset

In [3]:
df = pd.read_csv("../metadata/datasets_stats.csv", index_col=0)

df.head()

| dataset | category | train | test | valid | size | num_classes | yolov5 | yolov7 | glip | num_datasets |
|---|---|---|---|---|---|---|---|---|---|---|
| aerial-pool | aerial | 673 | 96 | 177 | 946 | 7.0 | 0.513 | 0.791 | 0.013 | 1 |
| secondary-chains | aerial | 103 | 16 | 43 | 162 | 1.0 | 0.341 | 0.312 | 0.0 | 1 |
| aerial-spheres | aerial | 318 | 51 | 104 | 473 | 6.0 | 0.993 | 0.539 | 0.0 | 1 |
| soccer-players-5fuqs | aerial | 114 | 16 | 33 | 163 | 3.0 | 0.66 | 0.399 | 0.065 | 1 |
| weed-crop-aerial | aerial | 823 | 118 | 235 | 1176 | 2.0 | 0.82 | 0.615 | 0.027 | 1 |
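
To make the gap between the models concrete, here is a small sketch that averages the reported mAP over the five preview rows only. It uses the standard library so it runs without the CSV file; the variable names (`PREVIEW_CSV`, `mean_map`, `zero_glip`) are illustrative and not part of the original notebook, and the averages say nothing about the full benchmark.

```python
import csv
import io
from statistics import mean

# The five preview rows, copied from the df.head() output above.
PREVIEW_CSV = """dataset,category,train,test,valid,size,num_classes,yolov5,yolov7,glip,num_datasets
aerial-pool,aerial,673,96,177,946,7.0,0.513,0.791,0.013,1
secondary-chains,aerial,103,16,43,162,1.0,0.341,0.312,0.0,1
aerial-spheres,aerial,318,51,104,473,6.0,0.993,0.539,0.0,1
soccer-players-5fuqs,aerial,114,16,33,163,3.0,0.66,0.399,0.065,1
weed-crop-aerial,aerial,823,118,235,1176,2.0,0.82,0.615,0.027,1
"""

rows = list(csv.DictReader(io.StringIO(PREVIEW_CSV)))

# Mean mAP per model over these five datasets only (not the full benchmark).
mean_map = {
    model: mean(float(row[model]) for row in rows)
    for model in ("yolov5", "yolov7", "glip")
}
for model, value in mean_map.items():
    print(f"{model}: {value:.3f}")

# Preview datasets where GLIP's reported mAP is exactly zero.
zero_glip = [row["dataset"] for row in rows if float(row["glip"]) == 0.0]
print(zero_glip)
```

On these five rows, GLIP's average mAP is more than an order of magnitude below either YOLO column, and two of the five datasets have a reported GLIP mAP of exactly zero.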


Then, we need to load the label names. We will create a dict of the form `<dataset_name>: [<class_0>, <class_1>]`.

In [19]:
import json

def get_dataset2classes(annotations_file_name: Path) -> Dict[str, List[str]]:
    # Map each dataset name to the list of its class names
    with annotations_file_name.open("r") as f:
        data = json.load(f)
    return {item["name"]: list(item["classes"].keys()) for item in data}

def filter_dataset2classes_by_datasets(datasets: List[str], dataset2classes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Keep only the entries for the requested datasets
    return {dataset: dataset2classes[dataset] for dataset in datasets}

dataset2classes = get_dataset2classes(Path('../metadata/labels_names.json'))

dataset2classes

{'hand-gestures-jps7z': ['0',
  '1',
  '10',
  '11',
  '12',
  '13',
  '2',
  '3',
  '4',
  '5',
  '6',
  '7',
  '8',
  '9'],
 'smoke-uvylj': ['smoke'],
 'wall-damage': ['Minorrotation', 'Moderaterotation', 'Severerotation'],
 'corrosion-bi3q3': ['Slippage', 'corrosion', 'crack'],
 'excavators-czvg9': ['EXCAVATORS', 'dump truck', 'wheel loader'],
 'chess-pieces-mjzgj': ['bishop',
  'black-bishop',
  'black-king',
  'black-knight',
  'black-pawn',
  'black-queen',
  'black-rook',
  'white-bishop',
  'white-king',
  'white-knight',
  'white-pawn',
  'white-queen',
  'white-rook'],
 'road-signs-6ih4y': ['bus_stop',
  'do_not_enter',
  'do_not_stop',
  'do_not_turn_l',
  'do_not_turn_r',
  'do_not_u_turn',
  'enter_left_lane',
  'green_light',
  'left_right_lane',
  'no_parking',
  'parking',
  'ped_crossing',
  'ped_zebra_cross',
  'railway_crossing',
  'red_light',
  'stop',
  't_intersection_l',
  'traffic_light',
  'u_turn',
  'yellow_light'],
 'street-work': ['Cone',
  'Coverall',
  '
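
The loader above implies a particular JSON layout: a list of objects, each with a `name` and a `classes` mapping whose keys are the class names. The sketch below builds a tiny stand-in file to illustrate that shape. The file contents are hypothetical; in particular, the values stored under each class name (integer ids here) are an assumption, since the loader only reads the keys.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, minimal stand-in for labels_names.json. Only the "name" and
# "classes" keys are implied by the loader; the integer values are assumed.
sample = [
    {"name": "smoke-uvylj", "classes": {"smoke": 0}},
    {"name": "trail-camera", "classes": {"Deer": 0, "Hog": 1}},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "labels_names.json"
    path.write_text(json.dumps(sample))
    with path.open("r") as f:
        data = json.load(f)

# Same comprehension as in get_dataset2classes: keep only the class names.
dataset2classes_demo = {
    item["name"]: list(item["classes"].keys()) for item in data
}
print(dataset2classes_demo)
```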

### Where GLIP did poorly

Here we look at the datasets and class names for which GLIP's mAP was less than `0.01`. At any point, feel free to head over to [`rf100`](https://universe.roboflow.com/roboflow-100), hosted on our platform, and search for a dataset.

In [22]:
filter_dataset2classes_by_datasets(df[df["glip"] < 0.01].index, dataset2classes)

{'secondary-chains': ['chain'],
 'aerial-spheres': ['green_sphero',
  'orange-sphero',
  'orange_sphero',
  'purple_sphero',
  'red_sphero',
  'yellow_sphero'],
 'cloud-types': ['Fish', 'Flower', 'Gravel', 'Sugar'],
 'team-fight-tactics': ['Akali',
  'Blitzcrank',
  'Braum',
  'Caitlyn',
  'Camille',
  'Cho-Gath',
  'Darius',
  'Dr- Mundo',
  'Ekko',
  'Ezreal',
  'Fiora',
  'Galio',
  'Gankplank',
  'Garen',
  'Graves',
  'Heimerdinger',
  'Illaoi',
  'Janna',
  'Jayce',
  'Jhin',
  'Jinx',
  'Kai-Sa',
  'Kassadin',
  'Katarina',
  'Kog-Maw',
  'Leona',
  'Lissandra',
  'Lulu',
  'Lux',
  'Malzahar',
  'Miss Fortune',
  'Orianna',
  'Poppy',
  'Quinn',
  'Samira',
  'Seraphine',
  'Shaco',
  'Singed',
  'Sion',
  'Swain',
  'Tahm Kench',
  'Talon',
  'Taric',
  'Tristana',
  'Trundle',
  'Twisted Fate',
  'Twitch',
  'Urgot',
  'Veigar',
  'Vex',
  'Vi',
  'Viktor',
  'Warwick',
  'Yone',
  'Yuumi',
  'Zac',
  'Ziggs',
  'Zilean',
  'Zyra'],
 'robomasters-285km': ['armor',
  'base',
 

Here we can see class names that come from domain-specific datasets, for example:
- [`cloud-types`](https://universe.roboflow.com/roboflow-100/cloud-types) is a dataset for cloud type detection; the cloud types are named `'Fish', 'Flower', 'Gravel', 'Sugar'`. Obviously, the model failed completely, since it has no way of knowing that a cloud type may be called `Fish`.
- [`sedimentary-features-9eosf`](https://universe.roboflow.com/roboflow-100/sedimentary-features-9eosf), like the previous one, has very domain-specific class names: `'Cross bedding', 'Low angle', 'Massive', 'Parallel lamination', 'mud drape'`.
- [`team-fight-tactics`](https://universe.roboflow.com/roboflow-100/team-fight-tactics) contains annotated instances from a popular strategy video game; the class names are character names such as `'Akali'` and `'Blitzcrank'`.

Another reason is that the class names are arbitrary, e.g. `'0', '1'` or single letters. For these datasets, it is impossible for `GLIP` to identify the objects based on its language capabilities.

- [`paragraphs-co84b`](https://universe.roboflow.com/roboflow-100/paragraphs-co84b) uses `'-', 'g', 'g1', 'g3', 'h', 'm', 'n'` as class names
- [`gynecology-mri`](https://universe.roboflow.com/roboflow-100/gynecology-mri) uses `'6W', '7W', 'EH'`
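
As a rough illustration of what "arbitrary" could mean in practice, the sketch below flags class names that are purely numeric or at most two characters long. This heuristic is an assumption made for illustration and is not part of the original analysis; the helper names `looks_arbitrary` and `arbitrary_fraction` are hypothetical. It will produce false positives (short but meaningful names) and will miss domain-specific names such as `'Fish'` for a cloud type.

```python
import re
from typing import List


def looks_arbitrary(class_name: str) -> bool:
    """Heuristic: purely numeric names, or names of at most two characters."""
    name = class_name.strip()
    return bool(re.fullmatch(r"\d+", name)) or len(name) <= 2


def arbitrary_fraction(classes: List[str]) -> float:
    """Fraction of class names in a dataset that the heuristic flags."""
    if not classes:
        return 0.0
    return sum(looks_arbitrary(c) for c in classes) / len(classes)


# Class lists copied from the notebook outputs and bullets.
examples = {
    "paragraphs-co84b": ["-", "g", "g1", "g3", "h", "m", "n"],
    "gynecology-mri": ["6W", "7W", "EH"],
    "furniture-ngpea": ["Chair", "Sofa", "Table"],
}

for dataset, classes in examples.items():
    print(dataset, arbitrary_fraction(classes))
```

Both datasets listed above are fully flagged, while `furniture-ngpea`, where GLIP did well, is not flagged at all.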

### Where GLIP did something

Here we look at the datasets and class names for which GLIP's mAP was greater than `0.1` but less than `0.3`. These datasets have class names that are somewhat descriptive, but not descriptive enough.

In [37]:
filter_dataset2classes_by_datasets(df[(df["glip"] > 0.1) & (df["glip"] < 0.3)].index, dataset2classes)

{'farcry6-videogame': ['assassin',
  'atv',
  'car',
  'gun',
  'gun menu',
  'healthbar',
  'horse',
  'hud',
  'map',
  'person',
  'surroundings'],
 'csgo-videogame': ['CT', 'T'],
 'halo-infinite-angel-videogame': ['enemy',
  'enemy-head',
  'friendly',
  'friendly-head'],
 'bccd-ouzjz': ['Platelets', 'RBC', 'WBC'],
 'aquarium-qlnqy': ['fish',
  'jellyfish',
  'penguin',
  'puffin',
  'shark',
  'starfish',
  'stingray'],
 'excavators-czvg9': ['EXCAVATORS', 'dump truck', 'wheel loader'],
 'street-work': ['Cone',
  'Coverall',
  'Face_Shield',
  'Gloves',
  'Goggles',
  'Head',
  'Helmet',
  'Mask',
  'No glasses',
  'No gloves',
  'Person'],
 'construction-safety-gsnvb': ['helmet',
  'no-helmet',
  'no-vest',
  'person',
  'vest'],
 'washroom-rf1fa': ['bathtub',
  'c',
  'geyser',
  'mirror',
  'showerhead',
  'sink',
  'toilet',
  'towel',
  'washbasin',
  'wc'],
 'pills-sxdht': ['Cipro 500',
  'Ibuphil 600 mg',
  'Ibuphil Cold 400-60',
  'Xyzall 5mg',
  'blue',
  'pink',
  'red',


A few examples:
- [`aquarium-qlnqy`](https://universe.roboflow.com/roboflow-100/aquarium-qlnqy) has very descriptive names `'fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish', 'stingray'`
- [`cotton-20xz5`](https://universe.roboflow.com/roboflow-100/cotton-plant-disease) has very specific cotton types as class names: `'G-arboreum', 'G-barbadense', 'G-herbaceum', 'G-hirsitum'`. Here GLIP probably knows that they are cotton types, but it is not able to distinguish among them

### Where GLIP did well

Here we look at the datasets and class names for which GLIP's mAP was greater than `0.3`. These datasets have class names that are descriptive, and the model knows about them.

In [39]:
filter_dataset2classes_by_datasets(df[df["glip"] > 0.3].index, dataset2classes)

{'avatar-recognition-nuexe': ['Character'],
 'underwater-pipes-4ng4t': ['pipe'],
 'thermal-dogs-and-people-x6ejw': ['dog', 'person'],
 'smoke-uvylj': ['smoke'],
 'peanuts-sd4kf': ['with mold', 'without mold'],
 'furniture-ngpea': ['Chair', 'Sofa', 'Table'],
 'trail-camera': ['Deer', 'Hog']}

There is not much to add: all the above classes (e.g. `'pipe', 'dog', 'chair'`) are well-known English words, so the model was able to find the instances associated with them.
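
The three groups used in this notebook can also be expressed as a single function. The sketch below mirrors the strict inequalities from the filters above; the function name `glip_bucket` and the bucket labels are hypothetical. The thresholds leave gaps, so an mAP between `0.01` and `0.1`, or exactly `0.3`, falls into none of the three groups and is labelled `"unassigned"` here.

```python
from typing import Dict


def glip_bucket(glip_map: float) -> str:
    """Assign a GLIP mAP value to one of the groups used in this notebook."""
    if glip_map < 0.01:
        return "poor"
    if 0.1 < glip_map < 0.3:
        return "something"
    if glip_map > 0.3:
        return "good"
    # Not covered by any of the three filters (e.g. 0.01 <= mAP <= 0.1).
    return "unassigned"


# GLIP mAP values copied from the df.head() preview earlier in the notebook.
preview: Dict[str, float] = {
    "aerial-pool": 0.013,
    "secondary-chains": 0.0,
    "aerial-spheres": 0.0,
    "soccer-players-5fuqs": 0.065,
    "weed-crop-aerial": 0.027,
}

buckets = {name: glip_bucket(value) for name, value in preview.items()}
for name, bucket in buckets.items():
    print(f"{name}: {bucket}")
```

With the notebook's DataFrame, something like `df["glip"].map(glip_bucket).value_counts()` should give the number of datasets per group, including those that none of the three sections covers.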