# Binocular

We explore a different approach: training per patient, loading both the left and right fundus images and training an EfficientNet-B0 model on the pair.

We also treat ODIR as a multi-label problem instead of a multi-class one, because it is officially defined as multi-label (from https://odir2019.grand-challenge.org/dataset/):
> Note: one patient may contains one or multiple labels. 
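A minimal sketch of what multi-label means for the targets, assuming the patient-level columns `N, D, G, C, A, H, M, O` of `full_df.csv` are used as labels. In the `df.head()` output further down, the `labels`/`target` columns appear to describe a single image (the one in `filename`): patient 2 has both `D=1` and `O=1`, yet `labels` is `['D']`. The helper `patient_targets` is hypothetical, and dropping duplicate `ID`s is an assumption in case the CSV holds more than one row per patient.

```python
import numpy as np
import pandas as pd

LABEL_COLS = ["N", "D", "G", "C", "A", "H", "M", "O"]


def patient_targets(df: pd.DataFrame) -> tuple[pd.DataFrame, np.ndarray]:
    """Return one row per patient and an (n_patients, 8) multi-hot target matrix."""
    # Assumption: patient-level columns simply repeat if a patient has several rows
    patients = df.drop_duplicates(subset="ID").reset_index(drop=True)
    y = patients[LABEL_COLS].to_numpy(dtype=np.float32)
    return patients, y


# Patients 0 and 2 from df.head(); patient 2 is repeated to exercise de-duplication
demo = pd.DataFrame({
    "ID": [0, 2, 2],
    "N": [0, 0, 0], "D": [0, 1, 1], "G": [0, 0, 0], "C": [1, 0, 0],
    "A": [0, 0, 0], "H": [0, 0, 0], "M": [0, 0, 0], "O": [0, 1, 1],
})
patients, y = patient_targets(demo)
print(y)
# [[0. 0. 0. 1. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0. 0. 0. 1.]]
```

Each column then becomes an independent sigmoid output trained with something like `torch.nn.BCEWithLogitsLoss`, rather than a softmax over eight mutually exclusive classes.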

We also want to explore a binocular (siamese) approach that trains the model on the left and right fundus image pair. This approach was studied in "DMS-Net: Dual-Modal Multi-Scale Siamese Network for Binocular Fundus Image Classification" by Guohao Huo, Zibo Lin, Zitong Wang, Ruiting Dai and Hao Tang (https://arxiv.org/html/2504.18046v3), which reports that it works well for fundus disease classification.

There are three advantages to using images of both eyes instead of one:
- Symmetry: systemic diseases such as diabetes rarely affect only one eye. Signs found in both eyes are stronger evidence for a diagnosis than signs found in one.

- Comparison: each eye acts as a "control" for the other, so the model can spot a subtle change by noticing how much one eye differs from its partner.

- Noise reduction: just as two eyes give you depth perception, two images help the model ignore artifacts such as blur or dust on the lens that appear in only one image and might otherwise look like disease.

Install Dependencies

In [1]:
%%capture
!pip install -q kagglehub torch torchvision scikit-learn pandas opencv-python tqdm wandb

Import Python libraries

In [2]:

import os
import cv2
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torchvision import models, transforms
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, classification_report, multilabel_confusion_matrix, accuracy_score
import kagglehub
from tqdm import tqdm # tqdm for progress bars
import wandb


Download Dataset

In [3]:
# 1. Download Dataset (Official ODIR-5K)
path = kagglehub.dataset_download("andrewmvd/ocular-disease-recognition-odir5k")
print("Dataset path:", path)
IMG_DIR = os.path.join(path, "ODIR-5K/ODIR-5K/Training Images")
CSV_PATH = os.path.join(path, "full_df.csv")
IMG_SIZE = 512
df = pd.read_csv(CSV_PATH)

Dataset path: /home/ray/.cache/kagglehub/datasets/andrewmvd/ocular-disease-recognition-odir5k/versions/2


In [4]:
df.head()

| Unnamed: 0 | ID | Patient Age | Patient Sex | Left-Fundus | Right-Fundus | Left-Diagnostic Keywords | Right-Diagnostic Keywords | N | D | G | C | A | H | M | O | filepath | labels | target | filename |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 69 | Female | 0_left.jpg | 0_right.jpg | cataract | normal fundus | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ../input/ocular-disease-recognition-odir5k/ODI... | ['N'] | [1, 0, 0, 0, 0, 0, 0, 0] | 0_right.jpg |
| 1 | 1 | 57 | Male | 1_left.jpg | 1_right.jpg | normal fundus | normal fundus | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ../input/ocular-disease-recognition-odir5k/ODI... | ['N'] | [1, 0, 0, 0, 0, 0, 0, 0] | 1_right.jpg |
| 2 | 2 | 42 | Male | 2_left.jpg | 2_right.jpg | laser spot，moderate non proliferative retinopathy | moderate non proliferative retinopathy | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | ../input/ocular-disease-recognition-odir5k/ODI... | ['D'] | [0, 1, 0, 0, 0, 0, 0, 0] | 2_right.jpg |
| 3 | 4 | 53 | Male | 4_left.jpg | 4_right.jpg | macular epiretinal membrane | mild nonproliferative retinopathy | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | ../input/ocular-disease-recognition-odir5k/ODI... | ['D'] | [0, 1, 0, 0, 0, 0, 0, 0] | 4_right.jpg |
| 4 | 5 | 50 | Female | 5_left.jpg | 5_right.jpg | moderate non proliferative retinopathy | moderate non proliferative retinopathy | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ../input/ocular-disease-recognition-odir5k/ODI... | ['D'] | [0, 1, 0, 0, 0, 0, 0, 0] | 5_right.jpg |


## Ben Graham's Preprocessing

The functions below implement Ben Graham's preprocessing.
References:
- https://scholar.google.com/citations?view_op=view_citation&hl=en&user=jQkkhlkAAAAJ&citation_for_view=jQkkhlkAAAAJ:sNmaIFBj_lkC
- https://scholar.google.com/citations?user=jQkkhlkAAAAJ&hl=en
- https://arxiv.org/abs/2303.00915 (Usuyama variation)


The method is described in https://medium.com/@astronomer.abdurrehman/enhancing-image-quality-for-machine-learning-ben-grahams-preprocessing-e795ad982abe as follows:
<blockquote>

`cv2.GaussianBlur` takes an image and a kernel size; passing `(0, 0)` lets OpenCV choose the Gaussian kernel size automatically from `sigmaX`, which sets the strength of the blur. The goal of the Gaussian blur here is to reduce noise and smooth out fine details.

The `addWeighted` function blends two images using the given weights. Here the alpha value of 4 amplifies the original image, the beta value of -4 subtracts the blurred image from it, and the gamma value of 128 shifts the result to mid-grey so that the image does not become too dark after the subtraction.


</blockquote>
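Per pixel and channel, the `addWeighted` call computes `4·img − 4·blurred + 128`, saturated to [0, 255]: a pixel equal to its local (blurred) average maps to 128, brighter detail is pushed above 128 and darker detail below it. The NumPy sketch below reproduces that arithmetic on three hand-picked pixels. `ben_graham_pixel` and `auto_kernel_size` are hypothetical helpers, and the kernel-size rule for `ksize=(0, 0)` (`round(sigma·3·2 + 1) | 1` for 8-bit images) is based on our understanding of OpenCV's implementation, so treat it as an assumption.

```python
import numpy as np


def ben_graham_pixel(img: np.ndarray, blurred: np.ndarray,
                     alpha: float = 4.0, gamma: float = 128.0) -> np.ndarray:
    """Arithmetic of cv2.addWeighted(img, alpha, blurred, -alpha, gamma) on uint8 data."""
    out = alpha * img.astype(np.float64) - alpha * blurred.astype(np.float64) + gamma
    # Round and clip into the uint8 range, like OpenCV's saturating cast
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def auto_kernel_size(sigma: float, uint8: bool = True) -> int:
    """Kernel size assumed to be chosen by GaussianBlur when ksize=(0, 0)."""
    factor = 3 if uint8 else 4
    return int(round(sigma * factor * 2 + 1)) | 1  # "| 1" forces an odd size


img = np.array([100, 110, 60], dtype=np.uint8)       # original pixels
blurred = np.array([100, 100, 100], dtype=np.uint8)  # their local average
print(ben_graham_pixel(img, blurred))  # [128 168   0]
print(auto_kernel_size(10))            # 61
```

A pixel only 10 levels brighter than its surroundings ends up 40 levels above mid-grey, while a dark pixel 40 levels below saturates at 0. With `sigmaX=10` the blur spans a window tens of pixels wide, so it mostly captures slow lighting changes, and subtracting it removes uneven illumination while keeping vessels and lesions.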


In [5]:
import cv2
import numpy as np

def resize_odir_image(img, target_size=512):
    """
    Crop the retina to its bounding box, resize it while preserving the
    aspect ratio (letterbox), and pad it onto a black square canvas.
    Expects a 3-channel BGR image as returned by cv2.imread.
    """
    if img is None:
        return None

    # 1. Crop away the black dead space around the retina
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Pixels brighter than 10 are treated as part of the retina
    _, mask = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)
    coords = cv2.findNonZero(mask)
    if coords is not None:
        x, y, w, h = cv2.boundingRect(coords)
        img = img[y:y+h, x:x+w]

    # 2. Letterbox resize (preserve aspect ratio)
    h, w = img.shape[:2]
    scale = target_size / max(h, w)
    # round() instead of int() so float error cannot shrink the long side to target_size - 1
    new_w, new_h = max(1, round(w * scale)), max(1, round(h * scale))

    # INTER_AREA gives high-quality downsampling
    resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_AREA)

    # 3. Center the resized image on a black square canvas
    final_img = np.zeros((target_size, target_size, 3), dtype=np.uint8)
    offset_y = (target_size - new_h) // 2
    offset_x = (target_size - new_w) // 2
    final_img[offset_y:offset_y+new_h, offset_x:offset_x+new_w] = resized

    return final_img
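To make the letterbox arithmetic concrete, the hypothetical helper below computes only the geometry (no pixels) for an illustrative 1728 × 2592 (height × width) crop: the longer side is scaled to 512 and the shorter side is padded evenly with black. It uses `round()` so that floating-point error cannot truncate the longer side to 511.

```python
def letterbox_geometry(h: int, w: int, target_size: int = 512):
    """Return (new_h, new_w, offset_y, offset_x) for an aspect-preserving resize plus center pad."""
    scale = target_size / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    offset_y = (target_size - new_h) // 2  # black rows above the image
    offset_x = (target_size - new_w) // 2  # black columns left of the image
    return new_h, new_w, offset_y, offset_x


print(letterbox_geometry(1728, 2592))  # (341, 512, 85, 0)
print(letterbox_geometry(1000, 1000))  # (512, 512, 0, 0)
```

For the 1728 × 2592 case the image occupies rows 85 to 425, leaving 85 black rows above and 86 below; no fundus content is stretched or cut off.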

In [6]:
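def usuyama_prep(img):
    """Ben Graham enhancement: emphasizes vessels and normalizes lighting. Returns RGB."""
    # Crop/letterbox the BGR image first, so the BGR->gray step inside
    # resize_odir_image sees the channel order it expects
    img = resize_odir_image(img, target_size=IMG_SIZE)
    if img is None:
        return None
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Subtract a heavily blurred copy (local average colour) and re-center at 128
    blurred = cv2.GaussianBlur(img, (0, 0), 10)
    enhanced = cv2.addWeighted(img, 4, blurred, -4, 128)
    return enhanced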

In [7]:
def usuyama_green_prep(img):
    """Ben Graham enhancement on the green channel only, replicated to 3 channels."""
    # Crop/letterbox the BGR image first (resize_odir_image expects BGR)
    img = resize_odir_image(img, target_size=IMG_SIZE)
    if img is None:
        return None
    # Green is channel 1 in both BGR and RGB order; it usually shows vessels with the most contrast
    green_ch = img[:, :, 1]
    blurred = cv2.GaussianBlur(green_ch, (0, 0), 10)
    green_ben = cv2.addWeighted(green_ch, 4, blurred, -4, 128)

    # Replicate to 3 channels for the model input
    return cv2.merge([green_ben, green_ben, green_ben])

On-the-fly preprocessing slowed training down, because every image had to be preprocessed again each time it was loaded. We speed this up by preprocessing each image once, offline, and caching the results to disk.

In [8]:
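def run_offline_prep(df, raw_dir, img_prep_func, save_dir):
    """Preprocess every left/right fundus image once and cache it in save_dir."""
    print("🚀 Starting Offline Pre-processing (Ben Graham)...")
    os.makedirs(save_dir, exist_ok=True)
    # Each image file is processed once, even if it appears in several rows
    all_images = pd.concat([df['Left-Fundus'], df['Right-Fundus']]).unique()
    for img_name in tqdm(all_images):
        save_path = os.path.join(save_dir, img_name)
        load_path = os.path.join(raw_dir, img_name)
        if os.path.exists(save_path):
            print(f"✅ {save_path} already exists. Skipping.")
            continue
        if not os.path.exists(load_path):
            print(f"⚠️  Warning: {load_path} does not exist. Skipping.")
            continue
        img = cv2.imread(load_path)  # BGR, or None if the file cannot be decoded
        # Ben Graham logic
        enhanced = img_prep_func(img) if img is not None else None
        if enhanced is None:
            print(f"⚠️  Warning: could not process {load_path}. Skipping.")
            continue
        # The prep functions return RGB; OpenCV writes BGR
        cv2.imwrite(save_path, cv2.cvtColor(enhanced, cv2.COLOR_RGB2BGR))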

In [9]:
run_offline_prep(df, IMG_DIR, usuyama_prep, "tmp/usuyama_prep")
run_offline_prep(df, IMG_DIR, usuyama_green_prep, "tmp/usuyama_green_prep")

🚀 Starting Offline Pre-processing (Ben Graham)...


100%|██████████| 6716/6716 [04:16<00:00, 26.14it/s]


🚀 Starting Offline Pre-processing (Ben Graham)...


100%|██████████| 6716/6716 [04:03<00:00, 27.61it/s]
