# Ensemble Results
This notebook ensembles the predictions of two models, assuming the prediction files for both models already exist. It also demonstrates how to pass the image dimensions as an argument to the ensemble function.

In [2]:
# ensemble_boxes library is required for ensembling the results of the two models
!pip install ensemble_boxes

Collecting ensemble_boxes
  Downloading ensemble_boxes-1.0.9-py3-none-any.whl (23 kB)
Installing collected packages: ensemble-boxes
Successfully installed ensemble-boxes-1.0.9


In [3]:
import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm
import sys

In [4]:
sys.path.append('../src')    # Add the source directory to sys.path so that local functions and modules can be imported.

In [5]:
from gdsc_util import PROJECT_DIR, load_sections_df
from merge_ensemble_results import generate_test_results

In [6]:
data_folder = str(PROJECT_DIR / 'data')

In [7]:
# Each text file stores the experiment name that identifies a model's output folder
with open(f'{PROJECT_DIR}/experiment_frcnn_5k_r101_epoch_24.txt', 'r') as f:
    experiment_name_frcnn = f.read().strip()    # strip() guards against a trailing newline in the file

with open(f'{PROJECT_DIR}/experiment_crcnn_5k_r101_epoch_24.txt', 'r') as f:
    experiment_name_crcnn = f.read().strip()

# Paths for model prediction files
frcnn_result_path = f'{data_folder}/{experiment_name_frcnn}/results_train_epoch_24.csv'
crcnn_result_path = f'{data_folder}/{experiment_name_crcnn}/results_train_epoch_24.csv'

In [8]:
# Load sections dataframe
train_path = f'{data_folder}/gdsc_train.csv'
sections_df = load_sections_df(train_path)
file_names = sections_df['file_name'].unique()
section_df_dims = None    # Default: no precomputed dimensions

Since the sections dataframe for gdsc_train already contains the dimensions of every image, we can reuse them instead of reading them from the image files again. The merging function accepts these dimensions as an optional parameter.

If the dimensions are not available, skip the next cell and pass None for the dimensions. The function will then load each image and determine its dimensions automatically.

In [9]:
# Get the dimensions of each file in the format {'filename': {'height': height_of_image, 'width': width_of_image}}
section_df_dims = (
    sections_df[["file_name", "height", "width"]]
    .drop_duplicates(subset=["file_name"])
    .set_index("file_name")
    .to_dict(orient="index")
)
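
To make the resulting structure concrete, here is a minimal sketch that applies the same chain of pandas calls to a toy dataframe. The file names and sizes are made up for illustration and do not come from gdsc_train.

```python
import pandas as pd

# Toy stand-in for sections_df: one row per section, so file names repeat.
# All values here are hypothetical.
toy_sections_df = pd.DataFrame(
    {
        "file_name": ["a.png", "a.png", "b.png"],
        "height": [100, 100, 200],
        "width": [150, 150, 300],
    }
)

toy_dims = (
    toy_sections_df[["file_name", "height", "width"]]
    .drop_duplicates(subset=["file_name"])   # keep one row per image
    .set_index("file_name")                  # file name becomes the dict key
    .to_dict(orient="index")                 # {index: {column: value}}
)

print(toy_dims)
```

Each image appears exactly once as a key, and its value is a small dictionary with the keys 'height' and 'width'.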

In [10]:
# Ensemble the results of the two models
ensemble_df = generate_test_results(frcnn_result_path, crcnn_result_path, file_names, section_df_dims)

Merging Results from the two models


100%|██████████| 994/994 [00:22<00:00, 43.40it/s]
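
One likely reason the image dimensions matter: the fusion functions in ensemble_boxes, such as weighted_boxes_fusion, expect box coordinates normalized to the [0, 1] range, so pixel coordinates have to be divided by the image width and height before fusion and scaled back afterwards. The sketch below illustrates only that conversion. It assumes boxes in absolute [x1, y1, x2, y2] pixel format, and the helpers normalize_box and denormalize_box are hypothetical; they are not taken from merge_ensemble_results, whose internals are not shown in this notebook.

```python
def normalize_box(box, width, height):
    """Scale an absolute [x1, y1, x2, y2] pixel box into the [0, 1] range."""
    x1, y1, x2, y2 = box
    return [x1 / width, y1 / height, x2 / width, y2 / height]


def denormalize_box(box, width, height):
    """Scale a normalized [x1, y1, x2, y2] box back to pixel coordinates."""
    x1, y1, x2, y2 = box
    return [x1 * width, y1 * height, x2 * width, y2 * height]


# Hypothetical dimensions entry in the same format as section_df_dims
example_dims = {"a.png": {"height": 100, "width": 200}}
size = example_dims["a.png"]

pixel_box = [20, 10, 120, 60]
normalized = normalize_box(pixel_box, size["width"], size["height"])
restored = denormalize_box(normalized, size["width"], size["height"])

print(normalized)
print(restored)
```

Looking up the size in a precomputed dictionary avoids opening every image file just to read its shape.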


In [11]:
# Save the ensembled predictions as a semicolon-separated CSV file
ensemble_df.to_csv(f'{data_folder}/frcnn_crcnn_ensemble.csv', sep=';')