# 🌟2 Class Filter🌟
Previously I trained `YOLOv5` on the `14` class data. Because it produces false positives (`FP`), we can tackle them with a simple `2 class filter`. Here I use the predictions of a 2 class model (`AUC`: `0.98`), generated with `EfficientNetB6`, to filter out the `FP` predictions.
This should increase the score, since `FP` would be reduced significantly.
* [14 class train](https://www.kaggle.com/awsaf49/vinbigdata-cxr-ad-yolov5-14-class-train)
* [14 class infer](https://www.kaggle.com/awsaf49/vinbigdata-cxr-ad-yolov5-14-class-infer)

# Version

* `v48`: yolov5x_fold4_finetune768
* `v47`: yolov5x_fold3_finetune768
* `v46`: yolov5x_fold2_finetune768
* `v45`: yolov5x_fold1_finetune768
* `v44`: yolov5x_fold0_finetune768
* `v43`: vfnet_r101_fold4_v3_epoch18
* `v42`: vfnet_r101_fold3_v1_epoch25
* `v41`: vfnet_r101_fold2_v4_epoch18
* `v40`: vfnet_r101_fold1_v4_epoch18
* `v39`: vfnet_r101_fold0_v3_epoch4
* `v38`: vfnet_r101_8020_v1_epoch18
* Threshold 0.07
* `v36`: cascade_rcnn_x101_fold0_v1_epoch16
* `v34`: vfnet_r101_fold1_v4_epoch18
* `v33`: vfnet_r101_fold2_v4_epoch18
* `v32`: vfnet_r101_fold4_v3_epoch18
* `v31`: vfnet_r101_fold0_v3_epoch4
* `v30`: vfnet_r101_fold3_v1_epoch25
* `v29`: vfnet_r101_8020_v1_epoch18
* `v28`: vfnet_r101_fold1_v1_epoch24 conf_0.0
* `v27`: vfnet_r101_fold0_v2_epoch4 conf_0.0
* `v26`: vfnet_r101_v2 conf_0.0
* `v25`: yolov5x_fold4_finetune_768_tta conf_0.01
* `v24`: yolov5x_fold3_finetune_768_tta conf_0.01
* `v23`: yolov5x_fold2_finetune_768_tta conf_0.01
* `v22`: yolov5x_fold1_finetune_768_tta conf_0.01
* `v21`: yolov5x_fold0_finetune_768_tta conf_0.01
* `v20`: yolov5x_v4.0_fold4_finetune_512_tta conf_0.1
* `v19`: yolov5x_v4.0_fold2_finetune_512_tta conf_0.01
* `v18`: yolov5x_v4.0_fold1_finetune_512_tta conf_0.01
* `v17`: yolov5x_v4.0_fold4_finetune_512_tta conf_0.01
* `v16`: yolov5x_v4.0_fold0_finetune_512_tta conf_0.01
* `v15`: ori yolov5x_fold2 conf_0.01
* `v14`: ori yolov5x_fold1 conf_0.01
* `v13`: ori yolov5x_fold4 conf_0.01

# Loading Package

In [1]:
import pandas as pd
import numpy as np
from glob import glob
import shutil

# Threshold For `2 Class Filter`
**NB**: The threshold was chosen arbitrarily.

In [2]:
thr = 0.07

# Loading csv

In [3]:
# yolov5x_fold4_finetune768
# pred_14cls = pd.read_csv('../input/fork2-vinbigdata-cxr-ad-yolov5-v4-0-infer/submission.csv')
pred_14cls = pd.read_csv('../input/vinbigdata-final-models-infer/yolov5x_fold4_finetune768_submission.csv')
pred_2cls = pd.read_csv('../input/vinbigdata-2class-prediction/2-cls test pred.csv')

In [4]:
pred_14cls.head()

Unnamed: 0,image_id,PredictionString
0,83caa8a85e03606cf57e49147d7ac569,11 0.013 1208 453 1474 534 13 0.018 513 1080 7...
1,7550347fa2bb96c2354a3716dfa3a69c,13 0.011 1956 1124 2084 1194 5 0.012 332 1898 ...
2,74b23792db329cff5843e36efb8aa65a,11 0.012 2314 2398 2420 2565 10 0.012 2314 239...
3,94568a546be103177cb582d3e91cd2d8,11 0.011 732 721 921 793 11 0.011 678 679 886 ...
4,6da36354fc904b63bc03eb3884e0c35c,11 0.01 1323 325 1484 399 13 0.011 1492 838 17...


In [5]:
pred_2cls.head()

Unnamed: 0,image_id,target
0,002a34c58c5b758217ed1f584ccbcfe9,0.013326
1,004f33259ee4aef671c2b95d54e4be68,0.037235
2,008bdde2af2462e86fd373a445d0f4cd,0.9397
3,009bc039326338823ca3aa84381f17f1,0.123799
4,00a2145de1886cb9eb88869c85d74080,0.654006


In [6]:
pred = pd.merge(pred_14cls, pred_2cls, on = 'image_id', how = 'left')
pred.head()

Unnamed: 0,image_id,PredictionString,target
0,83caa8a85e03606cf57e49147d7ac569,11 0.013 1208 453 1474 534 13 0.018 513 1080 7...,0.970583
1,7550347fa2bb96c2354a3716dfa3a69c,13 0.011 1956 1124 2084 1194 5 0.012 332 1898 ...,0.039873
2,74b23792db329cff5843e36efb8aa65a,11 0.012 2314 2398 2420 2565 10 0.012 2314 239...,0.01024
3,94568a546be103177cb582d3e91cd2d8,11 0.011 732 721 921 793 11 0.011 678 679 886 ...,0.065679
4,6da36354fc904b63bc03eb3884e0c35c,11 0.01 1323 325 1484 399 13 0.011 1492 838 17...,0.838772
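Because the merge uses `how = 'left'`, any `image_id` missing from the 2 class predictions ends up with `target = NaN`. A `NaN < thr` comparison is `False`, so such rows would pass through the filter untouched. The sketch below shows one way to check for that; the toy frames and the `_demo` names are made up for illustration and are not part of the notebook.

```python
import pandas as pd

# Toy stand-ins for the two prediction tables (ids and values are invented).
pred_14cls_demo = pd.DataFrame({
    "image_id": ["a", "b", "c"],
    "PredictionString": ["11 0.5 0 0 10 10", "13 0.2 5 5 20 20", "3 0.9 1 1 8 8"],
})
pred_2cls_demo = pd.DataFrame({"image_id": ["a", "b"], "target": [0.97, 0.03]})

merged_demo = pd.merge(pred_14cls_demo, pred_2cls_demo, on="image_id", how="left")

# Rows without a classifier score come back as NaN after a left merge.
n_missing = int(merged_demo["target"].isna().sum())
print(f"{n_missing} image(s) have no 2 class score")
```

On the real data the same check would be `pred['target'].isna().sum()`; a non-zero count means some images are never filtered.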


# Number of `No Finding` Before 2 Class Filter

In [7]:
pred['PredictionString'].value_counts().iloc[[0]]

14 1 0 0 1 1    11
Name: PredictionString, dtype: int64
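The cell above reads the count from the most frequent `PredictionString`, which works only if `No Finding` happens to be the top value. A minimal sketch of a more explicit count, assuming the `No Finding` string is exactly `14 1 0 0 1 1` (the `NO_FINDING` constant and the `demo` frame are illustrative names, not from the notebook):

```python
import pandas as pd

NO_FINDING = "14 1 0 0 1 1"  # class 14, confidence 1, one-pixel dummy box

# Small made-up frame standing in for `pred`.
demo = pd.DataFrame({
    "PredictionString": [NO_FINDING, "11 0.5 0 0 10 10", NO_FINDING],
})

# Count rows whose whole prediction is the No Finding string.
n_no_finding = int((demo["PredictionString"] == NO_FINDING).sum())
print(n_no_finding)
```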

# 2 Class Filter

In [8]:
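def filter_2cls(row, thr=thr):
    # Replace all detections with a single `No Finding` box when the
    # 2 class model scores the image below the threshold.
    if row['target'] < thr:
        row['PredictionString'] = '14 1 0 0 1 1'
    return row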

In [9]:
sub = pred.apply(filter_2cls, axis=1)
sub.head()

Unnamed: 0,image_id,PredictionString,target
0,83caa8a85e03606cf57e49147d7ac569,11 0.013 1208 453 1474 534 13 0.018 513 1080 7...,0.970583
1,7550347fa2bb96c2354a3716dfa3a69c,14 1 0 0 1 1,0.039873
2,74b23792db329cff5843e36efb8aa65a,14 1 0 0 1 1,0.01024
3,94568a546be103177cb582d3e91cd2d8,14 1 0 0 1 1,0.065679
4,6da36354fc904b63bc03eb3884e0c35c,11 0.01 1323 325 1484 399 13 0.011 1492 838 17...,0.838772
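The row-wise `apply` is easy to read but visits every row in Python. A boolean mask with `.loc` should give the same result in one step. This is only a sketch: `filter_2cls_vectorized` and the toy frame are hypothetical names and data, not part of the notebook.

```python
import pandas as pd

NO_FINDING = "14 1 0 0 1 1"

def filter_2cls_vectorized(df, thr=0.07):
    """Return a copy of df with low-score rows set to No Finding."""
    out = df.copy()
    # NaN scores compare as False, so rows without a score are left as-is.
    out.loc[out["target"] < thr, "PredictionString"] = NO_FINDING
    return out

# Made-up rows: one confident abnormal, one below threshold, one unscored.
demo = pd.DataFrame({
    "image_id": ["a", "b", "c"],
    "PredictionString": ["11 0.5 0 0 10 10", "13 0.2 5 5 20 20", "3 0.9 1 1 8 8"],
    "target": [0.970583, 0.039873, float("nan")],
})
filtered = filter_2cls_vectorized(demo, thr=0.07)
print(filtered["PredictionString"].tolist())
```

Working on a copy keeps the merged frame intact, so the before and after counts can still be compared.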


# Number of `No Finding` After 2 Class Filter

In [10]:
sub['PredictionString'].value_counts().iloc[[0]]

14 1 0 0 1 1    1863
Name: PredictionString, dtype: int64

As the outputs above show, applying the `2 class filter` increases the number of `No Finding` predictions significantly. **[11 -> 1863]**

In [11]:
sub[['image_id', 'PredictionString']].to_csv('yolov5x_fold4_finetune768_2cls_filter_0.07_submission.csv',index = False)

# Result
As we can see, applying the `2 class filter` improves the result significantly, from `0.154` to `0.201`. But bear in mind that choosing the `threshold` can be a bit tricky.
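One rough way to get a feel for the threshold is to count how many images each candidate value would relabel as `No Finding`. The sketch below uses only the five scores printed by `pred_2cls.head()`; the candidate thresholds are arbitrary picks for illustration, and this says nothing about which value scores best.

```python
import pandas as pd

# The five scores shown in pred_2cls.head(); on the real data this
# would be pred_2cls['target'].
scores = pd.Series([0.013326, 0.037235, 0.9397, 0.123799, 0.654006])

candidate_thrs = [0.01, 0.07, 0.2, 0.5]  # hypothetical values to compare

# Number of images that would be overwritten with No Finding per threshold.
relabeled = {t: int((scores < t).sum()) for t in candidate_thrs}
for t, n in relabeled.items():
    print(f"thr={t:.2f}: {n} of {len(scores)} images -> No Finding")
```

A sweep like this only shows how aggressive each threshold is. Picking the best one would still need labelled validation data or leaderboard feedback.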

# Please Upvote If You Have Found This Notebook Useful 😃