# Nail Disease Classification using MedSigLIP

This notebook fine-tunes Google's MedSigLIP model for nail disease classification. I'm working on a mobile app project and needed a classifier that works well with medical images, so I chose MedSigLIP because it is trained specifically on medical data.

- **Dataset**: 7 categories of nail conditions (~6,650 images total)
- **Goal**: At least 90% accuracy on the test set
- **Strategy**: Fine-tune the last 8 transformer blocks with aggressive augmentation

---

## What I'm doing here

- Loading nail disease images directly from the Kaggle dataset
- Fine-tuning MedSigLIP (last 8 transformer blocks only)
- Using a learning-rate warm-up followed by cosine annealing
- Training for 10 epochs with gradient accumulation
- Tracking overfitting carefully with detailed metrics
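
To make the warm-up plus cosine annealing item concrete, here is a minimal, framework-free sketch of such a schedule. The function name `lr_at_step` and all the numbers (step counts, base learning rate) are placeholders I made up for illustration, not the values used later in this notebook. With gradient accumulation, a "step" here is meant as one optimizer update, not one micro-batch.

```python
import math

def lr_at_step(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warm-up to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Ramp up linearly; the last warm-up step reaches base_lr exactly
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warm-up phase completed, clamped to [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical settings, for illustration only
schedule = [
    lr_at_step(s, total_steps=100, warmup_steps=10, base_lr=1e-4)
    for s in range(100)
]
print(f"first: {schedule[0]:.2e}, peak: {max(schedule):.2e}, last: {schedule[-1]:.2e}")
```

The short warm-up keeps the first updates small while the newly unfrozen blocks adapt, and the cosine phase lowers the rate smoothly toward the end of training.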

---

## Dataset Structure

The dataset should be organized like this:
```
/kaggle/input/nail-disease-dataset-medsiglip/
├── train/ (~5,300 images)
│   ├── Acral_Lentiginous_Melanoma/
│   ├── blue_finger/
│   ├── clubbing/
│   ├── Healthy_Nail/
│   ├── Onychogryphosis/
│   ├── pitting/
│   └── psoriasis/
└── test/ (~1,350 images)
    └── (same structure)
```
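
A quick way to check that a split matches this layout is to count the images in each class folder. The sketch below is only an assumption about how one might do that: `count_images_per_class` is a hypothetical helper, and the list of image extensions is my guess rather than something taken from the dataset.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the dataset actually contains
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}

def count_images_per_class(split_dir):
    """Return {class_folder_name: image_count} for one split (train/ or test/)."""
    split_dir = Path(split_dir)
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts

# Example usage on Kaggle (path as configured in Step 4):
# for name, n in count_images_per_class("/kaggle/input/nail-disease-dataset-medsiglip/train").items():
#     print(f"{name}: {n}")
```

Large differences between class counts would be worth knowing about before training, since they affect how accuracy should be read.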

## Classes

1. **Acral Lentiginous Melanoma** - Dark pigmentation under nail
2. **Blue Finger** - Bluish discoloration
3. **Clubbing** - Rounded, bulging nails
4. **Healthy Nail** - Normal baseline
5. **Onychogryphosis** - Thick, curved nails
6. **Pitting** - Small dents in nail surface
7. **Psoriasis** - Nail changes from psoriasis
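
One thing to watch: the numbering above is just for reading, and it may not match the label indices used in training. If the images are loaded with a folder-based loader that sorts class folder names (torchvision's `ImageFolder` does this), the sort is case-sensitive, so the capitalized folders come first. Here is a small sketch of that mapping, assuming the folder names shown in the dataset structure; `display_name` is a hypothetical helper for turning folder names into readable labels.

```python
# Folder names as listed in the dataset structure
CLASS_FOLDERS = [
    "Acral_Lentiginous_Melanoma",
    "blue_finger",
    "clubbing",
    "Healthy_Nail",
    "Onychogryphosis",
    "pitting",
    "psoriasis",
]

# Case-sensitive sort: uppercase names sort before lowercase ones
class_names = sorted(CLASS_FOLDERS)
class_to_idx = {name: idx for idx, name in enumerate(class_names)}

def display_name(folder):
    """Turn a folder name like 'blue_finger' into 'Blue Finger'."""
    return folder.replace("_", " ").title()

for name, idx in class_to_idx.items():
    print(f"{idx}: {display_name(name)}")
```

Saving `class_names` alongside the model is a cheap way to keep predictions and labels aligned in the mobile app later.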


## Step 1: Login to Hugging Face

You'll need a Hugging Face token to download MedSigLIP.

1. Get your token here: https://huggingface.co/settings/tokens
2. Request model access: https://huggingface.co/google/medsiglip-448
3. Run the cell below and paste your token

In [None]:
from huggingface_hub import notebook_login

print("Logging into Hugging Face...")
print("Get your token from: https://huggingface.co/settings/tokens\n")

notebook_login()

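# notebook_login() only displays the widget and returns right away,
# so the login is not complete until the token is submitted there.
print("\nPaste your token into the widget above to finish logging in.")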
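
If the notebook runs without interaction (for example "Save & Run All"), the widget can't be used. A possible alternative, sketched below, reads the token from an environment variable and calls `huggingface_hub.login`. Storing the token in `HF_TOKEN` is my assumption about the setup, and `get_hf_token` / `login_if_token_available` are hypothetical helper names.

```python
import os

def get_hf_token(env=os.environ):
    """Return the token from HF_TOKEN, or None if it is missing or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None

def login_if_token_available(env=os.environ):
    """Log in non-interactively when a token is present; return True if attempted."""
    token = get_hf_token(env)
    if token is None:
        return False
    # Imported lazily so the helper above works even without the package
    from huggingface_hub import login
    login(token=token)
    return True

if not login_if_token_available():
    print("No HF_TOKEN found; use notebook_login() instead.")
```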

## Step 2: Install Required Libraries

In [None]:
# Installing everything we need
!pip install -q torch torchvision transformers datasets pillow scikit-learn matplotlib tqdm numpy pandas
!pip install -q open-clip-torch
!pip install -q onnx onnxruntime
!pip install -q huggingface_hub

print("All packages installed successfully!")
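
The success message above prints even if one of the `pip` commands failed, so it can help to confirm what actually got installed. This is a small sketch using the standard library's `importlib.metadata`; the `PACKAGES` list is just a subset I picked from the install commands, and `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Subset of the packages installed above
PACKAGES = [
    "torch",
    "torchvision",
    "transformers",
    "open-clip-torch",
    "onnx",
    "onnxruntime",
    "huggingface_hub",
]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Recording these versions in the notebook output also makes a later run easier to reproduce.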

## Step 3: Check GPU Availability

In [None]:
import torch
import sys

print(f"Python: {sys.version.split()[0]}")
print(f"PyTorch: {torch.__version__}")
print(f"\nGPU Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    print(f"CUDA: {torch.version.cuda}")
else:
    print("WARNING: No GPU found. Training will be slow.")
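
Later cells need to know where to put the model and batches. A common pattern, sketched here with a hypothetical `pick_device` helper, is to decide once and reuse the result, falling back to CPU when no GPU (or no PyTorch) is available.

```python
def pick_device():
    """Return 'cuda' when a GPU is usable, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch missing: nothing can run on a GPU anyway
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

DEVICE = pick_device()
print(f"Using device: {DEVICE}")
```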

## Step 4: Setup Dataset Paths

In [None]:
import os
from pathlib import Path

# Paths for Kaggle environment
KAGGLE_DATASET_PATH = '/kaggle/input/nail-disease-dataset-medsiglip'
OUTPUT_PATH = '/kaggle/working/output'

os.makedirs(OUTPUT_PATH, exist_ok=True)

print("Checking dataset location...")

# Verify dataset exists
if not os.path.exists(KAGGLE_DATASET_PATH):
    print(f"ERROR: Dataset not found at {KAGGLE_DATASET_PATH}")
    print("\nMake sure you've added the dataset to this notebook:")
    print("1. Go to notebook settings -> Add data")
    print("2. Search for 'nail-disease-dataset'")
    print("3. Add it and re-run this cell")
    raise FileNotFoundError(f"Dataset not found at {KAGGLE_DATASET_PATH}")

print(f"Dataset found: {KAGGLE_DATASET_PATH}")

# List what's in the input folder
print("\nAvailable inputs:")
for item in os.listdir('/kaggle/input'):
    print(f"  - {item}")

# Check dataset structure
print("\nDataset structure:")
for item in os.listdir(KAGGLE_DATASET_PATH):
    item_path = os.path.join(KAGGLE_DATASET_PATH, item)
    if os.path.isdir(item_path):
        subdirs = len([d for d in os.listdir(item_path) if os.path.isdir(os.path.join(item_path, d))])
        files = len([f for f in os.listdir(item_path) if os.path.isfile(os.path.join(item_path, f))])
        print(f"  {item}/ - {subdirs} folders, {files} files")

# Set train/test paths
TRAIN_DATA_PATH = os.path.join(KAGGLE_DATASET_PATH, 'train')
TEST_DATA_PATH = os.path.join(KAGGLE_DATASET_PATH, 'test')

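# Report exactly which split folder is missing instead of a generic error
missing = [p for p in (TRAIN_DATA_PATH, TEST_DATA_PATH) if not os.path.isdir(p)]
if missing:
    print(f"ERROR: Missing folders: {missing}")
    raise FileNotFoundError(f"Dataset structure incorrect, missing: {missing}")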

print("\nAll paths configured:")
print(f"  Train: {TRAIN_DATA_PATH}")
print(f"  Test: {TEST_DATA_PATH}")
print(f"  Output: {OUTPUT_PATH}")
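
Since `test/` is expected to mirror `train/`, it may be worth checking that both splits contain the same class folders before training. The sketch below is one way to do that; `class_folders` and `compare_splits` are hypothetical helpers, not part of any library.

```python
from pathlib import Path

def class_folders(split_dir):
    """Return the set of class folder names directly inside a split directory."""
    return {p.name for p in Path(split_dir).iterdir() if p.is_dir()}

def compare_splits(train_dir, test_dir):
    """Summarize which class folders are shared or unique to one split."""
    train_classes = class_folders(train_dir)
    test_classes = class_folders(test_dir)
    return {
        "only_in_train": sorted(train_classes - test_classes),
        "only_in_test": sorted(test_classes - train_classes),
        "shared": sorted(train_classes & test_classes),
    }

# Example usage with the paths configured above:
# report = compare_splits(TRAIN_DATA_PATH, TEST_DATA_PATH)
# print(report)
```

An empty `only_in_train` and `only_in_test` means every class can be both trained on and evaluated.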