# AdaFace Fine-tuning Guide

This guide walks through fine-tuning a pretrained AdaFace model: organizing your data, running the fine-tuning script, and building an inference pipeline.

## 1. Data Organization for Fine-tuning

For fine-tuning, organize your face images in a directory tree where each subfolder holds the images of one person (identity):



In [None]:
dataset_root/
├── person_1/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── person_2/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── ...



### Data Preparation Steps

1. **Collect aligned face images**: Ensure every face image is aligned and cropped to 112×112 pixels.
2. **Organize by identity**: Create one folder per person containing that person's face images.
3. **Validation data** (optional): Keep validation data in a separate folder, outside the training tree.
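
Before training, it can help to confirm that the folder layout matches the one-folder-per-identity convention described above. The following is a minimal sketch using only the standard library; `summarize_dataset`, `find_sparse_identities`, and the extension list are hypothetical helpers for illustration, not part of the AdaFace repo.

```python
from pathlib import Path

# Assumed set of image extensions; extend it if your dataset uses others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def summarize_dataset(root):
    """Return {identity_name: image_count} for each subfolder of `root`."""
    counts = {}
    for identity_dir in sorted(Path(root).iterdir()):
        if not identity_dir.is_dir():
            continue  # ignore stray files at the top level
        counts[identity_dir.name] = sum(
            1
            for f in identity_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


def find_sparse_identities(counts, min_images=2):
    """List identities that have fewer than `min_images` images."""
    return [name for name, n in counts.items() if n < min_images]
```

For example, `find_sparse_identities(summarize_dataset("my_face_dataset/imgs"))` lists identities that may have too few images to be useful; the cutoff of 2 is an arbitrary placeholder.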



In [None]:
# Example directory structure
my_face_dataset/
├── imgs/                  # Training set
│   ├── person_1/
│   ├── person_2/
│   └── ...
└── val_images/            # Optional validation images
    ├── agedb_30.bin       # Optional validation binaries
    ├── lfw.bin
    └── ...



## 2. Fine-tuning Script

Create a script `finetune.py` with the following content:



In [None]:
import os
import torch
import argparse
from datetime import datetime
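import sys
# Adds the parent of the current working directory to the import path so that
# `from main import main` below resolves; adjust this if main.py lives elsewhere.
sys.path.insert(0, os.path.dirname(os.getcwd()))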

def get_args():
    parser = argparse.ArgumentParser(description='AdaFace fine-tuning')
    
    # Data parameters
    parser.add_argument('--data_root', type=str, default='./data', help='Path to the data root directory')
    parser.add_argument('--train_data_path', type=str, default='my_face_dataset/imgs', help='Path to training data relative to data_root')
    parser.add_argument('--val_data_path', type=str, default='my_face_dataset', help='Path to validation data relative to data_root')
    
    # Model parameters
    parser.add_argument('--arch', type=str, default='ir_50', choices=['ir_18', 'ir_34', 'ir_50', 'ir_101', 'ir_200'])
    parser.add_argument('--pretrained_model', type=str, required=True, help='Path to the pretrained model')
    
    # Training parameters
    parser.add_argument('--epochs', type=int, default=10, help='Number of fine-tuning epochs')
    parser.add_argument('--batch_size', type=int, default=64)
    parser.add_argument('--output_dir', type=str, default='./finetuned_models')
    parser.add_argument('--lr', type=float, default=0.001, help='Lower learning rate for fine-tuning')
    parser.add_argument('--gpus', type=int, default=1, help='Number of GPUs to use')
    parser.add_argument('--use_16bit', action='store_true', help='Use 16-bit precision')
    
    # AdaFace specific parameters
    parser.add_argument('--head', type=str, default='adaface')
    parser.add_argument('--m', type=float, default=0.4, help='AdaFace margin parameter')
    parser.add_argument('--h', type=float, default=0.333, help='AdaFace h parameter')
    parser.add_argument('--s', type=float, default=64.0, help='AdaFace scale parameter')
    parser.add_argument('--t_alpha', type=float, default=0.01)
    
    # Augmentation
    parser.add_argument('--low_res_augmentation_prob', type=float, default=0.2)
    parser.add_argument('--crop_augmentation_prob', type=float, default=0.2)
    parser.add_argument('--photometric_augmentation_prob', type=float, default=0.2)
    
    # Other parameters
    parser.add_argument('--distributed_backend', type=str, default='dp')
    parser.add_argument('--num_workers', type=int, default=4)
    parser.add_argument('--seed', type=int, default=42)
    parser.add_argument('--save_all_models', action='store_true', help='Save all checkpoints')
    parser.add_argument('--momentum', type=float, default=0.9)
    parser.add_argument('--lr_milestones', type=str, default='5,8', help='Learning rate milestones')
    parser.add_argument('--lr_gamma', type=float, default=0.1, help='Learning rate decay rate')
    parser.add_argument('--accumulate_grad_batches', type=int, default=1)
    parser.add_argument('--fast_dev_run', action='store_true')
    parser.add_argument('--test_run', action='store_true')
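    # These are opt-in flags: with 'store_false' they would default to True,
    # and passing the flag would switch the feature off.
    parser.add_argument('--evaluate', action='store_true', help='Evaluate only (skip training)')
    parser.add_argument('--use_wandb', action='store_true')
    parser.add_argument('--use_mxrecord', action='store_true', help='Read training data from an MXNet record file instead of image folders')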
    
    return parser.parse_args()

if __name__ == '__main__':
    args = get_args()
    
    # Create a timestamped output directory
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    args.output_dir = os.path.join(args.output_dir, f"{args.arch}_{timestamp}")
    os.makedirs(args.output_dir, exist_ok=True)
    
    # Set pretrained model path for loading
    args.start_from_model_statedict = args.pretrained_model
    
    # Convert lr_milestones from string to list of ints
    args.lr_milestones = [int(x) for x in args.lr_milestones.split(',')]
    
    # Import main after argument parsing to avoid circular imports
    from main import main
    main(args)



## 3. Run Fine-tuning

To fine-tune a pretrained model:



In [None]:
python finetune.py \
    --data_root /path/to/your/data \
    --train_data_path my_face_dataset/imgs \
    --val_data_path my_face_dataset \
    --pretrained_model ./pretrained/adaface_ir50_ms1mv2.ckpt \
    --arch ir_50 \
    --batch_size 64 \
    --epochs 10 \
    --lr 0.001 \
    --gpus 1
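
Because `finetune.py` writes each run into a `<arch>_<timestamp>` subdirectory of `--output_dir`, the newest run can be located by sorting directory names. The sketch below assumes checkpoints end in `.ckpt` and sit somewhere under the run directory (the exact layout depends on `main.py`); `latest_run_dir` and `list_checkpoints` are hypothetical helper names.

```python
from pathlib import Path


def latest_run_dir(output_root, arch):
    """Return the newest `<arch>_<timestamp>` directory, or None if there is none.

    Timestamps in the form YYYYMMDD_HHMMSS sort chronologically as strings.
    """
    runs = sorted(
        (p for p in Path(output_root).glob(f"{arch}_*") if p.is_dir()),
        key=lambda p: p.name,
    )
    return runs[-1] if runs else None


def list_checkpoints(run_dir):
    """Return all .ckpt files under `run_dir`, searched recursively."""
    return sorted(Path(run_dir).rglob("*.ckpt"))
```

A typical call would be `list_checkpoints(latest_run_dir("./finetuned_models", "ir_50"))`, after checking that `latest_run_dir` did not return `None`.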



## 4. Inference Pipeline

Create a file `embedding_pipeline.py` with the following code to load your fine-tuned model and extract face embeddings:



In [None]:
import argparse

import matplotlib.pyplot as plt
import numpy as np
import torch

# Modules from the AdaFace repo (they must be importable, e.g. run from the repo root)
import net
from face_alignment import align

class AdaFaceInference:
    def __init__(self, model_path, architecture='ir_50', device='cuda:0'):
        """
        Initialize the AdaFace inference pipeline
        
        Args:
            model_path: Path to the fine-tuned model checkpoint
            architecture: Model architecture, one of ['ir_18', 'ir_34', 'ir_50', 'ir_101']
            device: Device to run inference on ('cuda:0', 'cpu')
        """
        self.device = torch.device(device)
        
        # Build model
        self.model = net.build_model(architecture)
        
        # Load state dict
        checkpoint = torch.load(model_path, map_location=self.device)
        if 'state_dict' in checkpoint:
            state_dict = checkpoint['state_dict']
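            # Keep only the backbone weights and strip their leading 'model.' prefix
            # (slicing, unlike str.replace, leaves later occurrences of 'model.' intact)
            prefix = 'model.'
            model_state_dict = {key[len(prefix):]: val for key, val in state_dict.items() if key.startswith(prefix)}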
            self.model.load_state_dict(model_state_dict)
        else:
            self.model.load_state_dict(checkpoint)
            
        self.model.to(self.device)
        self.model.eval()
        
    def preprocess_image(self, image_path):
        """
        Align face and preprocess the image
        
        Args:
            image_path: Path to the input image
            
        Returns:
            Tuple of (tensor of shape (1, 3, H, W) ready for the model, aligned RGB face image)
        """
        # Get aligned face (None means no face was detected)
        aligned_rgb_img = align.get_aligned_face(image_path)
        if aligned_rgb_img is None:
            raise ValueError(f"Could not detect face in image: {image_path}")

        # Convert RGB -> BGR, scale pixel values to [-1, 1], and reorder HWC -> CHW
        rgb_array = np.array(aligned_rgb_img)
        bgr_img = (rgb_array[:, :, ::-1] / 255. - 0.5) / 0.5
        bgr_tensor = torch.from_numpy(np.ascontiguousarray(bgr_img.transpose(2, 0, 1))).unsqueeze(0).float()

        return bgr_tensor, aligned_rgb_img
    
    def get_embedding(self, image_path, return_aligned_face=False):
        """
        Extract embedding from a face image
        
        Args:
            image_path: Path to the input image
            return_aligned_face: Whether to return the aligned face image
            
        Returns:
            Face embedding vector, and optionally the aligned face
        """
        # Preprocess image
        tensor, aligned_face = self.preprocess_image(image_path)
        
        # Get embedding
        with torch.no_grad():
            tensor = tensor.to(self.device)
            embedding = self.model(tensor)
            
            # If model returns tuple (embedding, norm)
            if isinstance(embedding, tuple):
                embedding = embedding[0]
                
            # Normalize embedding
            embedding = torch.nn.functional.normalize(embedding, p=2, dim=1)
            embedding = embedding.cpu().numpy().flatten()
        
        if return_aligned_face:
            return embedding, aligned_face
        return embedding
    
    def compare_faces(self, image_path1, image_path2):
        """
        Compare two faces and return similarity score
        
        Args:
            image_path1: Path to the first image
            image_path2: Path to the second image
            
        Returns:
            Cosine similarity score between the two face embeddings
        """
        emb1 = self.get_embedding(image_path1)
        emb2 = self.get_embedding(image_path2)
        
        similarity = np.dot(emb1, emb2)
        return similarity
        
    def visualize_comparison(self, image_path1, image_path2, title=None):
        """
        Visualize two face images and their similarity
        
        Args:
            image_path1: Path to the first image
            image_path2: Path to the second image
            title: Optional title for the plot
            
        Returns:
            Tuple of (Matplotlib figure, cosine similarity score)
        """
        emb1, aligned1 = self.get_embedding(image_path1, return_aligned_face=True)
        emb2, aligned2 = self.get_embedding(image_path2, return_aligned_face=True)
        
        similarity = np.dot(emb1, emb2)
        
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
        ax1.imshow(aligned1)
        ax1.set_title("Image 1")
        ax1.axis('off')
        
        ax2.imshow(aligned2)
        ax2.set_title("Image 2")
        ax2.axis('off')
        
        plt_title = f"Similarity: {similarity:.4f}"
        if title:
            plt_title = f"{title}\n{plt_title}"
        fig.suptitle(plt_title)
        
        return fig, similarity

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='AdaFace inference')
    parser.add_argument('--model_path', type=str, required=True, help='Path to fine-tuned model checkpoint')
    parser.add_argument('--arch', type=str, default='ir_50', choices=['ir_18', 'ir_34', 'ir_50', 'ir_101'])
    parser.add_argument('--image', type=str, required=True, help='Path to input image')
    parser.add_argument('--compare_with', type=str, help='Path to image to compare with (optional)')
    parser.add_argument('--device', type=str, default='cuda:0', help='Device to run inference on')
    args = parser.parse_args()
    
    # Initialize inference pipeline
    inference = AdaFaceInference(model_path=args.model_path, architecture=args.arch, device=args.device)
    
    # Get embedding
    if args.compare_with:
        # Compare two images
        fig, similarity = inference.visualize_comparison(args.image, args.compare_with)
        plt.show()
        print(f"Similarity score: {similarity:.4f}")
    else:
        # Get embedding for a single image
        embedding, aligned_face = inference.get_embedding(args.image, return_aligned_face=True)
        
        # Display the aligned face and embedding
        plt.imshow(aligned_face)
        plt.title("Aligned Face")
        plt.axis('off')
        plt.show()
        
        print("Face embedding (first 10 values):", embedding[:10])
        print("Embedding shape:", embedding.shape)
        print("Embedding L2 norm:", np.linalg.norm(embedding))



## 5. Using the Inference Pipeline



In [None]:
# Get embedding for a single image
python embedding_pipeline.py --model_path ./finetuned_models/ir_50_20250331_121530/epoch=9-val_acc=0.98.ckpt --image path/to/face_image.jpg

# Compare two face images
python embedding_pipeline.py --model_path ./finetuned_models/ir_50_20250331_121530/epoch=9-val_acc=0.98.ckpt --image path/to/face1.jpg --compare_with path/to/face2.jpg
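
The embeddings returned by `get_embedding` are L2-normalized, so a dot product between two of them is their cosine similarity. The same idea extends from pairwise comparison to matching one query against several known identities. The sketch below assumes embeddings are 1-D NumPy arrays; `build_gallery`, `identify`, and the default threshold of 0.3 are hypothetical, and the threshold in particular should be tuned on your own validation data.

```python
import numpy as np


def build_gallery(embeddings_by_name):
    """Stack {name: 1-D embedding} into (names, matrix) with unit-length rows."""
    names = list(embeddings_by_name)
    matrix = np.stack(
        [np.asarray(embeddings_by_name[n], dtype=np.float64) for n in names]
    )
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    return names, matrix


def identify(query, names, matrix, threshold=0.3):
    """Return (best_name or None, best_score) using cosine similarity.

    The name is None when the best score falls below `threshold`
    (a placeholder value, not a recommended setting).
    """
    query = np.asarray(query, dtype=np.float64)
    query = query / np.linalg.norm(query)
    scores = matrix @ query  # one cosine similarity per gallery row
    best = int(np.argmax(scores))
    score = float(scores[best])
    return (names[best] if score >= threshold else None), score
```

In practice the gallery would be filled with `inference.get_embedding(path)` for one or more reference images per person, and `identify` would be called with the embedding of the image to recognize.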



## Additional Tips for Fine-tuning

1. **Learning Rate**: Use a smaller learning rate than you would when training from scratch (0.001 or lower), so the pretrained weights are adjusted gradually.
2. **Epochs**: Fine-tuning typically needs fewer epochs (5-10).
3. **Data Augmentation**: Keep augmentation probabilities low during fine-tuning (the commands in this guide use 0.1-0.2).
4. **Batch Size**: Adjust to fit your GPU memory, and reduce it if you run out of memory.
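
To see how `--lr`, `--lr_milestones`, and `--lr_gamma` interact, the sketch below computes the learning rate per epoch under a step-decay schedule. It assumes the training code applies the milestones the way PyTorch's `MultiStepLR` does (multiply by gamma once each milestone epoch is reached); `lr_at_epoch` is a hypothetical helper, not part of the repo.

```python
from bisect import bisect_right


def lr_at_epoch(epoch, base_lr, milestones, gamma=0.1):
    """Learning rate for a 0-indexed epoch under step decay.

    The rate is multiplied by `gamma` once for every milestone <= epoch.
    """
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


if __name__ == "__main__":
    # Defaults from finetune.py: --lr 0.001 --lr_milestones 5,8 --lr_gamma 0.1
    for epoch in range(10):
        print(epoch, lr_at_epoch(epoch, 0.001, [5, 8], 0.1))
```

With the defaults, the rate stays at its base value for epochs 0-4, drops by a factor of 10 at epoch 5, and drops again at epoch 8, so milestones should be chosen below the total number of epochs.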

### Example Fine-tuning Command with a Lower Learning Rate



In [None]:
python finetune.py \
    --data_root /path/to/data \
    --train_data_path my_face_dataset/imgs \
    --val_data_path my_face_dataset \
    --pretrained_model ./pretrained/adaface_ir50_ms1mv2.ckpt \
    --arch ir_50 \
    --batch_size 64 \
    --lr 0.0005 \
    --lr_milestones 3,6,8 \
    --epochs 10 \
    --gpus 1 \
    --low_res_augmentation_prob 0.1 \
    --crop_augmentation_prob 0.1 \
    --photometric_augmentation_prob 0.1



Make sure all your face images are aligned and cropped to 112×112 pixels. If they are not, use the MTCNN alignment code included in the AdaFace repo to preprocess your dataset before fine-tuning.
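
One way to batch-preprocess a dataset is to run an alignment function over every image and mirror the identity folders into a new directory. The sketch below is a hedged outline: `align_dataset` is a hypothetical helper, and it assumes the alignment function takes an image path and returns either an object with a `.save(path)` method (such as a PIL image) or `None` when no face is found, as `align.get_aligned_face` is used in the inference pipeline above.

```python
from pathlib import Path

# Assumed set of image extensions; extend it if your dataset uses others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def align_dataset(src_root, dst_root, align_fn):
    """Align images in src_root/<identity>/ and save them to dst_root/<identity>/.

    `align_fn(path_str)` is assumed to return an object with `.save(path)`
    or None when no face is detected. Returns the source paths that were skipped.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    skipped = []
    for src in sorted(src_root.glob("*/*")):
        if not src.is_file() or src.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # ignore non-image files
        aligned = align_fn(str(src))
        if aligned is None:
            skipped.append(src)  # no face found; leave it out of the output
            continue
        dst = dst_root / src.parent.name / src.name
        dst.parent.mkdir(parents=True, exist_ok=True)
        aligned.save(str(dst))
    return skipped


# Possible usage from inside the AdaFace repo (paths are placeholders):
# from face_alignment import align
# skipped = align_dataset("raw_faces", "my_face_dataset/imgs", align.get_aligned_face)
```

Reviewing the returned `skipped` list is worthwhile, since images where detection failed are left out of the output tree.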