# Minimax Decoder - Liquid LFM Benchmark

**Goal**: Benchmark Liquid AI LFM2-350M on TruthfulQA, with and without Minimax

**Run this in parallel with the SmolLM2 notebook on Kaggle to get results faster.**

**Requirements**:
- Google Colab GPU (T4)
- Google API key for Gemini (free tier)

## 1. Setup Environment

In [None]:
# Install dependencies
!pip install -q google-genai pydantic python-dotenv torch transformers accelerate groq huggingface-hub
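To confirm that the install cell worked, a sketch like the following reports the installed version of each package, or flags it as missing. It uses only the standard-library `importlib.metadata`; the package list simply mirrors the `pip install` line above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the pip install line in this notebook
PACKAGES = [
    "google-genai", "pydantic", "python-dotenv", "torch",
    "transformers", "accelerate", "groq", "huggingface-hub",
]

def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = version(pkg)
        except PackageNotFoundError:
            versions[pkg] = None
    return versions

for pkg, ver in installed_versions().items():
    print(f"{pkg:16} {ver or 'NOT INSTALLED'}")
```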

In [None]:
# Check GPU
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

In [None]:
# Set your API key
import os
os.environ["GOOGLE_API_KEY"] = "YOUR_GEMINI_API_KEY_HERE"  # <-- REPLACE THIS
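Because the cell above ships with a placeholder, it can help to fail fast before starting a long run. The check below is a hypothetical helper (not part of the repository) that flags a missing or unreplaced `GOOGLE_API_KEY`. It assumes, as the cell above suggests, that the Gemini-based runs (`-a gemini-flash`) read the key from that environment variable.

```python
import os

# Must match the template value in the cell above
PLACEHOLDER = "YOUR_GEMINI_API_KEY_HERE"

def gemini_key_is_set(env=None):
    """Return True if GOOGLE_API_KEY is present, non-empty, and not the template placeholder."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    return bool(key) and key != PLACEHOLDER

if gemini_key_is_set():
    print("GOOGLE_API_KEY looks set")
else:
    print("GOOGLE_API_KEY is missing or still the placeholder; replace it before the Minimax runs")
```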

## 2. Clone Repository

In [None]:
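# Clone the repo from /content, only if it is not already there, so re-running this cell
# does not create a nested minimax-decoder/minimax-decoder checkout
%cd /content
!test -d minimax-decoder || git clone https://github.com/jd-co/minimax-decoder.git
%cd minimax-decoder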

In [None]:
# List available models
!python benchmark.py --list-models

## 3. Quick Test (10 questions)

In [None]:
# Quick test with LFM2-350M
!python benchmark.py -g lfm2-350m-local -a gemini-flash --limit 10

## 4. Full Benchmark - LFM2-350M + Minimax

In [None]:
# Run full TruthfulQA (817 questions)
!python benchmark.py -g lfm2-350m-local -a gemini-flash --limit 817 --output results/lfm2_350m_minimax_full.json

## 5. Vanilla Baseline (LFM without Minimax)

In [None]:
# LFM2-350M vanilla (no verification)
!python benchmark.py -g lfm2-350m-local --vanilla-only --limit 817 --output results/lfm2_350m_vanilla_full.json

## 6. Optional: Larger LFM Model

In [None]:
# LFM2-1.2B vanilla (larger baseline)
!python benchmark.py -g lfm2-1.2b-local --vanilla-only --limit 817 --output results/lfm2_1.2b_vanilla_full.json
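As a back-of-the-envelope check that both LFM2 models fit on a T4, weight memory is roughly the parameter count times the bytes per parameter. The sketch below uses the nominal sizes from the model names (350M and 1.2B) and assumes fp16 or fp32 weights; how `benchmark.py` actually loads the models is not stated here, and the estimate ignores activations, the KV cache, and CUDA overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: nominal parameter counts; excludes activations, KV cache and CUDA overhead.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def weight_memory_gb(n_params, dtype="fp16"):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

for name, n_params in [("LFM2-350M", 350e6), ("LFM2-1.2B", 1.2e9)]:
    fp16 = weight_memory_gb(n_params)
    fp32 = weight_memory_gb(n_params, "fp32")
    print(f"{name}: ~{fp16:.1f} GB (fp16), ~{fp32:.1f} GB (fp32)")
```

Even in fp32, the 1.2B weights (about 4.8 GB) are well under the T4's 16 GB.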

## 7. View Results

In [None]:
import json

def load_results(path):
    with open(path) as f:
        return json.load(f)

def print_summary(name, data):
    summary = data.get("summary", {})
    print(f"\n=== {name} ===")
    print(f"Total: {summary.get('total_questions', 'N/A')}")
    print(f"Truthful: {summary.get('truthful_rate', 'N/A')}")
    print(f"Hallucination: {summary.get('hallucination_rate', 'N/A')}")
    print(f"Abstention: {summary.get('abstention_rate', 'N/A')}")

In [None]:
# Load and display LFM results
try:
    minimax = load_results("results/lfm2_350m_minimax_full.json")
    print_summary("LFM2-350M + Minimax", minimax)
except FileNotFoundError:
    print("Minimax results not found yet")

try:
    vanilla = load_results("results/lfm2_350m_vanilla_full.json")
    print_summary("LFM2-350M Vanilla", vanilla)
except FileNotFoundError:
    print("Vanilla results not found yet")

try:
    large = load_results("results/lfm2_1.2b_vanilla_full.json")
    print_summary("LFM2-1.2B Vanilla", large)
except FileNotFoundError:
    print("Large baseline not found yet")
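To see what Minimax changes for the 350M model, the two summaries can be compared metric by metric. `compare_summaries` is a hypothetical helper, not part of the repository: it assumes the `summary` block holds numeric `*_rate` values on the same scale in both files, and it skips any metric that is missing or non-numeric (such as the `'N/A'` fallback used above). It reads the two result files itself, so it does not depend on variables from the previous cell.

```python
import json
import os

RATE_KEYS = ("truthful_rate", "hallucination_rate", "abstention_rate")

def compare_summaries(baseline, candidate, keys=RATE_KEYS):
    """Return {metric: candidate - baseline} for metrics that are numeric in both summaries."""
    base = baseline.get("summary", {})
    cand = candidate.get("summary", {})
    deltas = {}
    for key in keys:
        b, c = base.get(key), cand.get(key)
        if isinstance(b, (int, float)) and isinstance(c, (int, float)):
            deltas[key] = c - b
    return deltas

VANILLA_PATH = "results/lfm2_350m_vanilla_full.json"
MINIMAX_PATH = "results/lfm2_350m_minimax_full.json"

if os.path.exists(VANILLA_PATH) and os.path.exists(MINIMAX_PATH):
    with open(VANILLA_PATH) as f:
        vanilla_run = json.load(f)
    with open(MINIMAX_PATH) as f:
        minimax_run = json.load(f)
    for metric, delta in compare_summaries(vanilla_run, minimax_run).items():
        print(f"{metric}: {delta:+.4f} (Minimax minus vanilla)")
else:
    print("Need both LFM2-350M result files to compare")
```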

## 8. Download Results

In [None]:
# Zip results for download
!zip -r lfm_benchmark_results.zip results/
print("Download lfm_benchmark_results.zip from the Files panel")

In [None]:
# Or trigger a browser download directly (google.colab is only available in Colab)
from google.colab import files
files.download('lfm_benchmark_results.zip')
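Outside Colab, neither the `zip` shell command nor `google.colab.files` is guaranteed to exist. Python's standard `shutil.make_archive` can build an equivalent archive, which you can then copy off the machine however that environment allows. `zip_results` is a hypothetical helper name. With `root_dir="."` and `base_dir="results"`, the archive entries keep the `results/` prefix, matching `zip -r ... results/`.

```python
import os
import shutil

def zip_results(results_dir="results", archive_base="lfm_benchmark_results"):
    """Zip results_dir into <archive_base>.zip and return the archive's path."""
    if not os.path.isdir(results_dir):
        raise FileNotFoundError(f"No results directory at {results_dir!r}")
    # root_dir="." plus base_dir=results_dir keeps the "results/" prefix inside the archive
    return shutil.make_archive(archive_base, "zip", root_dir=".", base_dir=results_dir)

if os.path.isdir("results"):
    print("Created", zip_results())
```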