# Deepfake Detection Competition - Submission Notebook

This notebook implements the complete inference pipeline for the **딥페이크 범죄 대응을 위한 AI 탐지 모델 경진대회** (AI Detection Model Competition for Responding to Deepfake Crime).

## Competition Details
- **Platform**: AI Factory
- **Metric**: Macro F1-score
- **Input**: Mixed image/video files in `./data/`
- **Output**: `submission.csv` with columns `filename` and `label`
- **Labels**: 0 = Real, 1 = Fake
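
Macro F1 averages the per-class F1 scores with equal weight, so the rarer class counts as much as the more common one. The following is a minimal pure-Python sketch of the metric for the two labels used here; the helper names `f1_for_class` and `macro_f1` are illustrative and not part of the competition tooling. When both labels occur in the data, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention used here: a class with no true or predicted samples scores 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
score = macro_f1(y_true, y_pred)
print(f"Macro F1: {score:.4f}")
```

In this toy case the Real class scores 0.8 and the Fake class about 0.667, so a single misclassified sample moves the macro average noticeably.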

## Pipeline Overview
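1. Install dependencies
2. Import modules
3. Configure inference
4. Validate the checkpoint and data paths
5. Initialize the inference engine (loads the trained model checkpoint)
6. Run inference on the test data and write `submission.csv`
7. Validate the submission format
8. Submit to AI Factory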

## 1. Install Dependencies

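Install PyTorch 2.0.1+cu118 and the required packages for the CUDA 11.8 environment. (PyTorch publishes `cu118` wheels only from the 2.0 series onward; 1.13.1 is available for CUDA 11.6 and 11.7.)

In [None]:
%%bash
# Abort on the first failed command so the success message at the end
# is only printed when every install succeeded
set -e

# Install PyTorch with CUDA 11.8 support
pip install -q torch==2.0.1+cu118 torchvision==0.15.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118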

# Install core dependencies
pip install -q timm==0.9.2 opencv-python-headless==4.8.1.78 albumentations==1.3.1
pip install -q pandas==2.0.3 scikit-learn==1.3.0 pyyaml==6.0.1 tqdm==4.66.1

# Install face detection libraries
pip install -q facenet-pytorch==2.5.3 mediapipe==0.10.3

echo "Dependencies installed successfully!"
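
Because the versions are pinned, it can help to confirm what actually ended up installed before loading the checkpoint. A minimal sketch using the standard library's `importlib.metadata`; the `PINNED` mapping mirrors a subset of the pins above, and the helper name `check_versions` is illustrative.

```python
from importlib import metadata

# Subset of the pins from the install cell (distribution name -> version).
PINNED = {
    "timm": "0.9.2",
    "pandas": "2.0.3",
    "scikit-learn": "1.3.0",
    "tqdm": "4.66.1",
}


def check_versions(pinned):
    """Return {name: (expected, installed)} for every mismatch.

    `installed` is None when the distribution is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in check_versions(PINNED).items():
    print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package matches its pin.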

## 2. Import Modules

Add `src` to the Python path, then import the required modules.

In [None]:
import sys
from pathlib import Path

# Add src to path
src_path = Path("./src").resolve()
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

# Standard library imports
import time
import warnings
from typing import Dict, Any

# Third-party imports
import torch
import pandas as pd
import numpy as np
from tqdm.auto import tqdm

# Project imports
from inference import create_inference_engine

# Suppress warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Device: {torch.cuda.get_device_name(0)}")
    print(f"Device memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

## 3. Configuration

Set up inference configuration parameters.

In [None]:
# Inference configuration
CONFIG = {
    # Model checkpoint
    "checkpoint_path": "checkpoints/best.pth",
    
    # Data paths
    "data_dir": "./data",
    "output_path": "submission.csv",
    
    # Inference settings
    "device": "cuda" if torch.cuda.is_available() else "cpu",
    "use_fp16": True,  # Use mixed precision for faster inference
    "batch_size": 64,  # Batch size for image processing
    "video_frames": 16,  # Number of frames to extract per video
    
    # Face detection
    "face_detector": "mtcnn",  # Options: mtcnn, retinaface, mediapipe
    
    # Verbose output
    "verbose": True,
}

# Display configuration
print("=" * 80)
print("INFERENCE CONFIGURATION")
print("=" * 80)
for key, value in CONFIG.items():
    print(f"  {key}: {value}")
print("=" * 80)
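
A typo in the configuration tends to surface late, deep inside the engine. The sketch below checks a config dictionary up front. The rules it enforces (positive integers, a known detector name, a plain `cuda`/`cpu` device, no half precision on CPU) are assumptions inferred from the comments in the cell above, not constraints documented by `create_inference_engine`; the helper name `check_config` is illustrative.

```python
ALLOWED_DETECTORS = {"mtcnn", "retinaface", "mediapipe"}  # from the config comment


def check_config(config):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for key in ("batch_size", "video_frames"):
        value = config.get(key)
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"{key} must be a positive integer, got {value!r}")
    if config.get("face_detector") not in ALLOWED_DETECTORS:
        problems.append(f"face_detector must be one of {sorted(ALLOWED_DETECTORS)}")
    if config.get("device") not in {"cuda", "cpu"}:
        problems.append(f"unknown device {config.get('device')!r}")
    if config.get("use_fp16") and config.get("device") == "cpu":
        problems.append("use_fp16 is set but the device is cpu")
    return problems


example = {
    "device": "cpu",
    "use_fp16": True,
    "batch_size": 64,
    "video_frames": 16,
    "face_detector": "mtcnn",
}
print(check_config(example))
```

In the notebook this would be called as `check_config(CONFIG)` before the engine is created.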

## 4. Validate Paths

Check that required files and directories exist before proceeding.

In [None]:
# Validate checkpoint exists
checkpoint_path = Path(CONFIG["checkpoint_path"])
if not checkpoint_path.exists():
    raise FileNotFoundError(
        f"Model checkpoint not found: {checkpoint_path}\n"
        f"Please ensure the checkpoint file exists before running inference."
    )
print(f"✅ Checkpoint found: {checkpoint_path}")
print(f"   Size: {checkpoint_path.stat().st_size / 1024**2:.1f} MB")

# Validate data directory exists
data_dir = Path(CONFIG["data_dir"])
if not data_dir.exists():
    raise FileNotFoundError(
        f"Data directory not found: {data_dir}\n"
        f"Please ensure the test data is available in ./data/"
    )
print(f"✅ Data directory found: {data_dir}")

# Count files in data directory
image_extensions = {".jpg", ".jpeg", ".png"}
video_extensions = {".mp4", ".avi", ".mov"}

all_files = [f for f in data_dir.iterdir() if f.is_file()]  # Regular files only; skip subdirectories
num_images = sum(1 for f in all_files if f.suffix.lower() in image_extensions)
num_videos = sum(1 for f in all_files if f.suffix.lower() in video_extensions)

print(f"\nTest data statistics:")
print(f"  Total files: {len(all_files)}")
print(f"  Images: {num_images}")
print(f"  Videos: {num_videos}")

if len(all_files) == 0:
    raise ValueError("No test files found in data directory!")
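
The counts above ignore files whose extension is in neither set, so an unexpected format (say `.webp` or `.mkv`) would go unnoticed. A small sketch that tallies every extension in a directory; the helper name `extension_census` is illustrative, and the demo runs against a temporary directory rather than `./data`.

```python
import tempfile
from collections import Counter
from pathlib import Path

SUPPORTED = {".jpg", ".jpeg", ".png", ".mp4", ".avi", ".mov"}


def extension_census(directory):
    """Count regular files per lower-cased extension (directories are skipped)."""
    return Counter(p.suffix.lower() for p in Path(directory).iterdir() if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny fake data directory for the demonstration.
    for name in ("a.jpg", "b.JPG", "c.mp4", "d.webp"):
        (Path(tmp) / name).touch()
    (Path(tmp) / "nested").mkdir()

    census = extension_census(tmp)
    unsupported = {ext: n for ext, n in census.items() if ext not in SUPPORTED}

print(dict(census))
print("Unsupported:", unsupported)
```

Whether unsupported files should raise an error or merely be reported depends on how the engine treats them, which this notebook does not show.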

## 5. Initialize Inference Engine

Create the inference engine with the trained model and preprocessing pipeline.

In [None]:
print("\nInitializing inference engine...")
print("=" * 80)

try:
    # Create inference engine
    engine = create_inference_engine(
        checkpoint_path=CONFIG["checkpoint_path"],
        device=CONFIG["device"],
        use_fp16=CONFIG["use_fp16"],
        batch_size=CONFIG["batch_size"],
        video_frames=CONFIG["video_frames"],
        face_detector_name=CONFIG["face_detector"],
        verbose=CONFIG["verbose"],
    )
    
    print("\n✅ Inference engine initialized successfully!")
    print("=" * 80)
    
except Exception as e:
    print(f"\n❌ Failed to initialize inference engine: {e}")
    import traceback
    traceback.print_exc()
    raise

## 6. Run Inference

Process all test files and generate predictions.
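
How the engine turns a video into a single label is not shown in this notebook. One common approach, sketched here purely as an assumption, is to sample `video_frames` evenly spaced frames, score each one, and threshold the mean fake probability. The function names and the 0.5 threshold are illustrative and may differ from what `run_inference` does internally.

```python
def sample_frame_indices(total_frames, num_samples):
    """Pick up to `num_samples` evenly spaced frame indices in [0, total_frames)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if total_frames <= num_samples:
        return list(range(total_frames))
    # Take the centre of each of `num_samples` equal-width segments.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def aggregate_video_label(frame_probs, threshold=0.5):
    """Map per-frame fake probabilities to one label (0 = Real, 1 = Fake)."""
    if not frame_probs:
        return 0  # Assumed fallback when no frame could be scored.
    mean_prob = sum(frame_probs) / len(frame_probs)
    return int(mean_prob >= threshold)


indices = sample_frame_indices(total_frames=320, num_samples=16)
label = aggregate_video_label([0.9, 0.8, 0.4, 0.7])
print(indices)
print(label)
```

Averaging is only one choice; taking the maximum or a majority vote over frames would trade false negatives against false positives differently.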

In [None]:
print("\nRunning inference on test data...")
print("=" * 80)

# Start timer
start_time = time.time()

try:
    # Run inference
    results_df = engine.run_inference(
        data_dir=CONFIG["data_dir"],
        output_path=CONFIG["output_path"],
    )
    
    # Calculate elapsed time
    elapsed_time = time.time() - start_time
    
    # Display results
    print("\n" + "=" * 80)
    print("INFERENCE COMPLETED SUCCESSFULLY!")
    print("=" * 80)
    num_files = len(results_df)
    num_real = int((results_df["label"] == 0).sum())
    num_fake = int((results_df["label"] == 1).sum())
    denom = max(num_files, 1)  # Avoid ZeroDivisionError if no predictions were produced
    print(f"Total time: {elapsed_time:.2f} seconds ({elapsed_time/60:.2f} minutes)")
    print(f"Average time per file: {elapsed_time / denom:.3f} seconds")
    print("\nPrediction statistics:")
    print(f"  Total predictions: {num_files}")
    print(f"  Real (0): {num_real} ({num_real / denom * 100:.1f}%)")
    print(f"  Fake (1): {num_fake} ({num_fake / denom * 100:.1f}%)")
    print(f"\nOutput saved to: {CONFIG['output_path']}")
    print("=" * 80)
    
    # Display sample predictions
    print("\nSample predictions (first 10 rows):")
    print(results_df.head(10).to_string(index=False))
    
except Exception as e:
    print(f"\n❌ Inference failed with error: {e}")
    import traceback
    traceback.print_exc()
    raise

## 7. Validate Submission Format

Verify that submission.csv meets competition requirements.

In [None]:
def validate_submission(csv_path: str) -> bool:
    """Validate submission.csv format.
    
    Args:
        csv_path: Path to submission.csv
        
    Returns:
        True if valid, False otherwise
    """
    print("\nValidating submission format...")
    print("=" * 80)
    
    errors = []
    warnings = []
    
    try:
        # Load submission
        df = pd.read_csv(csv_path)
        
        # Check columns
        expected_columns = ["filename", "label"]
        if list(df.columns) != expected_columns:
            errors.append(
                f"Invalid columns: {df.columns.tolist()}. "
                f"Expected: {expected_columns}"
            )
        
        # Check for null values
        null_counts = df.isnull().sum()
        if null_counts.any():
            for col in null_counts[null_counts > 0].index:
                errors.append(f"Column '{col}' has {null_counts[col]} null values")
        
        # Check label values
        if "label" in df.columns:
            unique_labels = df["label"].unique()
            invalid_labels = [l for l in unique_labels if l not in [0, 1]]
            
            if invalid_labels:
                errors.append(
                    f"Invalid label values: {invalid_labels}. "
                    f"Labels must be 0 (Real) or 1 (Fake)"
                )
            
            # Check label data type
            if not pd.api.types.is_integer_dtype(df["label"]):
                warnings.append(
                    f"Label column has non-integer dtype: {df['label'].dtype}"
                )
        
        # Check filenames
        if "filename" in df.columns:
            # Check for missing extensions
            # Cast to str first so null or numeric filenames do not break the .str accessor
            no_extension = df[~df["filename"].astype(str).str.contains(".", regex=False)]
            if len(no_extension) > 0:
                errors.append(
                    f"{len(no_extension)} filenames missing extensions"
                )
            
            # Check for duplicates
            duplicates = df[df["filename"].duplicated()]
            if len(duplicates) > 0:
                errors.append(
                    f"{len(duplicates)} duplicate filenames found"
                )
        
        # Check number of rows
        if len(df) == 0:
            errors.append("Submission is empty (0 rows)")
        
        # Print validation results
        if errors:
            print("\n❌ VALIDATION ERRORS:")
            for i, error in enumerate(errors, 1):
                print(f"  {i}. {error}")
        
        if warnings:
            print("\n⚠️  WARNINGS:")
            for i, warning in enumerate(warnings, 1):
                print(f"  {i}. {warning}")
        
        if not errors and not warnings:
            print("\n✅ All validation checks passed!")
        
        print("\nSubmission summary:")
        print(f"  Total rows: {len(df)}")
        if "label" in df.columns:
            print(f"  Real (0): {sum(df['label'] == 0)}")
            print(f"  Fake (1): {sum(df['label'] == 1)}")
        print("=" * 80)
        
        return len(errors) == 0
        
    except Exception as e:
        print(f"\n❌ Validation error: {e}")
        import traceback
        traceback.print_exc()
        return False

# Run validation
is_valid = validate_submission(CONFIG["output_path"])

if not is_valid:
    raise ValueError(
        "Submission validation failed! Please fix the errors before submitting."
    )
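
The validator above checks the CSV in isolation; it does not confirm that every test file received a prediction. A minimal sketch of that cross-check, assuming the `filename` column holds bare file names (with extension) that match the entries in the data directory. The helper name `coverage_gaps` is illustrative, and the demo uses a temporary directory.

```python
import csv
import tempfile
from pathlib import Path


def coverage_gaps(csv_path, data_dir):
    """Return (missing, unexpected) filename sets for a submission file.

    missing:    files present in data_dir but absent from the CSV
    unexpected: CSV rows that do not correspond to a file in data_dir
    """
    with open(csv_path, newline="", encoding="utf-8") as fh:
        submitted = {row["filename"] for row in csv.DictReader(fh)}
    on_disk = {p.name for p in Path(data_dir).iterdir() if p.is_file()}
    return on_disk - submitted, submitted - on_disk


with tempfile.TemporaryDirectory() as tmp:
    # Fake data directory with two test files.
    data = Path(tmp) / "data"
    data.mkdir()
    for name in ("a.jpg", "b.mp4"):
        (data / name).touch()

    # Submission that skips b.mp4 and contains a stray c.png row.
    submission = Path(tmp) / "submission.csv"
    submission.write_text("filename,label\na.jpg,0\nc.png,1\n", encoding="utf-8")

    missing, unexpected = coverage_gaps(submission, data)

print("Missing:", missing)
print("Unexpected:", unexpected)
```

In the notebook this would be called as `coverage_gaps(CONFIG["output_path"], CONFIG["data_dir"])`; both sets should be empty before submitting.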

## 8. Submit to AI Factory

Submit the results to the competition platform for automated scoring.

**Note**: This cell assumes the `aifactory.score.submit()` function is available in the AI Factory environment. If the `aifactory` package cannot be imported (for example, when running locally), the cell prints a notice and skips the submission step.

In [None]:
# Submit to AI Factory platform
try:
    import aifactory.score as aif

    print("\nSubmitting to AI Factory...")
    print("=" * 80)

    # Submit using Competition Key for CUDA 11.8
    result = aif.submit(
        model_name="deepfake_detector_efficientnet_b4",
        key="560ffaf9-b456-444f-b4a5-6cadc116dd5e"
    )

    print("\n✅ Submission successful!")
    print(f"Result: {result}")
    print("=" * 80)

except ImportError:
    print("\n⚠️  Not running on AI Factory platform")
    print("   Skipping submission step")
    print("   To test locally, ensure submission.csv is valid")
    print("=" * 80)
except Exception as e:
    print(f"\n❌ Submission failed: {e}")
    import traceback
    traceback.print_exc()
    raise

## Summary

If every cell above ran without raising an error, the inference pipeline has completed the following steps:

1. ✅ Dependencies installed
2. ✅ Model checkpoint loaded
3. ✅ Inference engine initialized
4. ✅ Test data processed
5. ✅ `submission.csv` generated
6. ✅ Format validated
7. ✅ Submitted to AI Factory (if available)

The `submission.csv` file is then ready for evaluation.

---

**딥페이크 범죄 대응을 위한 AI 탐지 모델 경진대회**

For questions or issues, please contact the competition organizers.