# Model Inference Tutorial: Using Your Trained AI Model 🔮

Welcome to the final chapter of our tutorial series! You've successfully trained a powerful translation efficiency prediction model, and now it's time to **put it to work** making predictions on new mRNA sequences.

> 🎯 **Learning Objectives**: Master model inference techniques, understand result interpretation, and learn to apply trained models to real biological questions

---

## From Training to Prediction: The Complete Journey 🏁

You've completed an incredible AI journey! Let's review what you've accomplished:

```
1. 📊 Data Preparation    → ✅ Loaded and processed genomic datasets
2. 🤖 Model Initialization → ✅ Set up pre-trained foundation models  
3. 🏋️ Model Training      → ✅ Fine-tuned for translation efficiency
4. 🔮 Model Inference     → 🎯 Making predictions (We are here!)
```

**In this tutorial, you will learn:**
- 🔮 **Model Loading**: How to load your trained model
- 📊 **Batch Inference**: Processing multiple sequences efficiently
- 🧬 **Result Interpretation**: Understanding what predictions mean biologically
- 🎯 **Real-world Application**: Using your model for research questions

> 💡 **Why Inference Matters**: This is where your trained model becomes a powerful research tool that can analyze thousands of sequences and provide biological insights!

## Environment Setup and Model Loading 🛠️

Let's start by setting up our environment and loading the model we trained in the previous tutorial.

In [None]:
import torch
import warnings
from omnigenbench import (
    ModelHub,
    OmniTokenizer,
    OmniModelForSequenceClassification,
)

warnings.filterwarnings('ignore')
print("✅ Libraries imported successfully!")
print(f"🔥 PyTorch version: {torch.__version__}")
print(f"🎯 CUDA available: {torch.cuda.is_available()}")

### 📦 Loading Your Trained Model

We'll load the model that we fine-tuned in the previous tutorial. The cell below retrieves it with `ModelHub.load` under the identifier `yangheng/ogb_te_finetuned`, the same name used in the complete tutorial.

In [None]:
# Load the trained model - exactly as in complete tutorial
print("📦 Loading trained model...")
inference_model = ModelHub.load("yangheng/ogb_te_finetuned")
print("✅ Model loaded successfully!")
print("🎯 Ready for translation efficiency prediction!")

## Single Sequence Inference 🧬

Let's start by predicting translation efficiency for individual mRNA sequences. Running a few sequences one at a time is the simplest way to check that the model behaves as expected before scaling up.

In [None]:
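# Test sequences - exactly as in complete tutorial
# Note: the "Random sequence" entry is the motif AUGC repeated 32 times (128 nt),
# so it is a fixed repeat rather than a randomly sampled sequence.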
sample_sequences = {
    "Optimized sequence": "AAACCAACAAAATGCAGTAGAAGTACTCTCGAGCTATAGTCGCGACGTGCTGCCCCGCAGGAGTACAGTAGTAGTACAACGTAAGCGGGAGCAACAGACTCCCCCCCTGCAACCCACTGTGCCTGTGCCCTCGACGCGTCTCCGTCGCTTTGGCAAATGTCACGTACATATTACCGTCTCAGGCTCTCAGCCATGCTCCCTACCACCCCTGCAGCGAAGCAAAAGCCACGCACGCGGCGCCTGACATGTAACAGGACTAGACCATCTTGTTCATTTCCCGCACCCCCTCCTCTCCTCTTCCTCCATCTGCCTCTTTAAAACAGTAAAAATAACCGTGCATCCCCTGGGCAAAATCTCTCCCATACATACACTACAGCGGCGAACCTTTCCTTATTCTCGCAACGCCTCGGTAACGGGCAGCGCCTGCTCCGCGCCGCGGTTGCGAGTTCGGGAAGGCGGCCGGAGTCGCGGGGAGGAGAGGGAGGATTCGATCGGCCAGA",
    "Suboptimal sequence": "TGGAGATGGGCAGATGGCACACAAAACATGAATAGAAAACCCAAAAGGAAGGATGAAAAAAACACACACACACACACACACAAAACACAGAGAGAGAGAGAGAGAGAGAGCGAGAAAAGAAAAGAAAAAACCAATTCTTTTGGTCTCTTCCCTCTCCGTTTGTCGTGTCGAAGCCTTTGCCCCCACCACCTCCTCCTCTCCTCTCCCTTCCTCCCCTCCTCCCCATCTCGCTCTCCTCCCTCCTCTCTCCTCTCCTCGTCTCCTCTTCCTCTCCATTCCATTGGCCATTCCATTCCATTCCACCCCCCATGAAACCCCAAACCCTCGTCGGCCTCGCCGCGCTCGCGTAGCGCACCCGCCCTTCTCCTCTCGCCGGTGGTCCGCCGCCAGCCTCCCCCCACCCGATCCCGCCGCCCCCCCCGCCTTCACCCCGCCCACGCGGACGCATCCGATCCCGCCGCATCGCCGCGCGGGGGGGGGGGGGGGGGGGGGGGGGAGGGCACG",
    "Random sequence": "AUGC" * (128 // 4),
}

print("🧬 Testing model on sample sequences:")
print("=" * 50)
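
The first two sample sequences above are written in the DNA alphabet (they contain `T`), while the third uses the RNA alphabet (`U`). This tutorial does not state whether the tokenizer treats the two alphabets identically, so if you want a consistent input alphabet, a minimal sketch is shown below. The helper name `to_rna` is hypothetical and is not part of OmniGenBench.

```python
def to_rna(sequence: str) -> str:
    """Return the sequence in upper case with every T replaced by U."""
    return sequence.strip().upper().replace("T", "U")


# Hypothetical short inputs for illustration only
examples = ["atgcctga", "AUGCCUGA", " ATGGCT\n"]
normalized = [to_rna(seq) for seq in examples]
print(normalized)
```

In the notebook, the same helper could be applied as `{name: to_rna(seq) for name, seq in sample_sequences.items()}`. It is worth comparing predictions with and without the conversion before relying on either form.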

In [None]:
# Perform inference - exactly as in complete tutorial
with torch.no_grad():
    for seq_name, sequence in sample_sequences.items():
        outputs = inference_model.inference(sequence)
        print("✅ Prediction completed!")

        # Result interpretation: fall back to neutral defaults if a key is missing
        prediction = outputs.get('predictions', [0])[0]
        probability = outputs.get('probabilities', [0.5])[0]

        # Assumes `probability` is the model's probability for class 1 (High TE)
        te_class = "High TE" if prediction == 1 else "Low TE"
        confidence = probability if prediction == 1 else (1 - probability)

        print(f"📊 Analysis for {seq_name}:")
        print(f"  🎯 Prediction: {te_class}")
        print(f"  📈 Confidence: {confidence:.3f}")
        print(f"  📏 Sequence length: {len(sequence)} nucleotides")

        if confidence > 0.8:
            level = "🟢 High confidence"
        elif confidence > 0.6:
            level = "🟡 Moderate confidence"
        else:
            level = "🔴 Low confidence"
        print(f"  {level}\n")

## Understanding Prediction Results 📊

Let's dive deeper into what these predictions mean biologically and how to interpret them for research.

In [None]:
# Detailed analysis of prediction results
print("📈 Understanding Your Predictions")
print("=" * 40)

print("\n🎯 Sequence Features the Model May Have Learned:")
print("   ✅ Sequence patterns associated with high translation efficiency")
print("   ✅ Ribosome binding site characteristics")
print("   ✅ Codon usage optimization patterns")
print("   ✅ mRNA secondary structure effects")
print("   ✅ 5' UTR regulatory elements")

print("\n📊 Prediction Components:")
print("   🎯 Prediction: Binary classification (0=Low TE, 1=High TE)")
print("   📈 Confidence: Model's certainty about the prediction (0-1)")
print("   🔬 Biological Meaning: Likely translation efficiency in living cells")

print("\n⚠️ Confidence Levels:")
print("   🟢 High (>0.8): Model is highly certain; this is not a guarantee of biological accuracy")
print("   🟡 Moderate (0.6-0.8): Plausible prediction, consider experimental validation")
print("   🔴 Low (<0.6): Uncertain prediction, requires careful interpretation")
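
The confidence rule used in this tutorial can be written as two small functions. This is a sketch under one assumption that the tutorial does not confirm: that `probability` is the model's probability for class 1 (High TE). The function names are illustrative, and the thresholds are the 0.8 and 0.6 cut-offs listed above.

```python
def confidence_from_probability(prediction: int, probability: float) -> float:
    """Confidence in the predicted class, assuming `probability` is P(High TE)."""
    return probability if prediction == 1 else 1.0 - probability


def confidence_tier(confidence: float) -> str:
    """Map a confidence value in [0, 1] to the tutorial's three tiers."""
    if confidence > 0.8:
        return "high"
    if confidence > 0.6:
        return "moderate"
    return "low"


# Worked examples with made-up numbers (not model output)
for prediction, probability in [(1, 0.93), (0, 0.25), (1, 0.55)]:
    confidence = confidence_from_probability(prediction, probability)
    print(prediction, probability, round(confidence, 2), confidence_tier(confidence))
```

If the prediction is simply the more probable of two classes, the confidence computed this way never drops below 0.5, so in practice the "low" tier would cover only the range from 0.5 to 0.6.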

## Batch Inference for Research Applications 🔬

In real research, you often need to analyze many sequences at once. The cell below loops over a small set of sequences, runs the model on each one, and collects the predictions in a table.

In [None]:
# Example: Batch analysis for research
import pandas as pd

# Simulate a research dataset
research_sequences = [
    "AUGAAACCCGGGUUUAAACCCGGGUUUAAACCCGGG",
    "AUGGCUGCUACGCUACGCUACGCUACGCUACGCUA",
    "AUGCCCAAAGGGCCCAAAGGGCCCAAAGGGCCCAAA",
    "AUGUUUAAACCCUUUAAACCCUUUAAACCCUUUAAA",
    "AUGGGGUUUAAACCCCGGGUUUAAACCCCGGGUUUAA",
]

gene_ids = [f"Gene_{i+1:03d}" for i in range(len(research_sequences))]

print("🔬 Batch Analysis for Research Applications")
print(f"📊 Analyzing {len(research_sequences)} sequences...")

# Run inference on each sequence in turn and collect the results
results = []
with torch.no_grad():
    for gene_id, sequence in zip(gene_ids, research_sequences):
        outputs = inference_model.inference(sequence)
        
        prediction = outputs.get('predictions', [0])[0]
        probability = outputs.get('probabilities', [0.5])[0]
        
        te_class = "High TE" if prediction == 1 else "Low TE"
        confidence = probability if prediction == 1 else (1 - probability)
        
        results.append({
            'Gene_ID': gene_id,
            'Sequence': sequence,
            'Prediction': te_class,
            'Confidence': confidence,
            'Raw_Probability': probability
        })
        
        print(f"  ✅ {gene_id}: {te_class} (confidence: {confidence:.3f})")

# Create results DataFrame
results_df = pd.DataFrame(results)
print(f"\n📊 Batch analysis completed for {len(results)} sequences!")
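
The loop above calls the model once per sequence. For larger datasets you may want to group sequences into fixed-size chunks, for example to report progress or to hand each chunk to a batched predictor. Whether `inference_model.inference` accepts a list of sequences is not confirmed in this tutorial, so the sketch below takes the predictor as a parameter. The names `chunked`, `run_in_chunks`, `predict_one`, and `fake_predict` are hypothetical stand-ins.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def run_in_chunks(sequences, predict_one: Callable[[str], dict], size: int = 2):
    """Apply `predict_one` to every sequence, chunk by chunk, preserving order."""
    results = []
    for chunk_index, chunk in enumerate(chunked(sequences, size)):
        results.extend(predict_one(seq) for seq in chunk)
        print(f"chunk {chunk_index}: processed {len(chunk)} sequences")
    return results


def fake_predict(sequence: str) -> dict:
    """Stand-in for the real model call; returns the sequence length only."""
    return {"length": len(sequence)}


demo = ["AUGAAA", "AUGCCCGGG", "AUG", "AUGUUUAAACCC", "AUGG"]
demo_results = run_in_chunks(demo, fake_predict, size=2)
```

In the notebook, `predict_one` could be `inference_model.inference`, with the loop body wrapped in `torch.no_grad()` as in the cell above.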

In [None]:
# Display results summary
print("📈 Research Results Summary:")
print(results_df)

print("\n📊 Summary Statistics:")
high_te_count = (results_df['Prediction'] == 'High TE').sum()
low_te_count = (results_df['Prediction'] == 'Low TE').sum()
avg_confidence = results_df['Confidence'].mean()

print(f"   🎯 High TE predictions: {high_te_count}/{len(results_df)}")
print(f"   🎯 Low TE predictions: {low_te_count}/{len(results_df)}")
print(f"   📈 Average confidence: {avg_confidence:.3f}")
print("   🔬 Ready for downstream analysis!")
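
As one possible downstream step, the result records can be split into predictions the model is highly certain about and predictions to flag for experimental follow-up, using the 0.8 cut-off from the confidence guide. The sketch below works on plain dictionaries shaped like the entries of `results`. The records shown are made-up placeholders, not model output, and `split_by_confidence` is an illustrative name.

```python
from statistics import mean


def split_by_confidence(records, threshold=0.8):
    """Return (confident, needs_review) lists based on the 'Confidence' field."""
    confident = [r for r in records if r["Confidence"] > threshold]
    needs_review = [r for r in records if r["Confidence"] <= threshold]
    return confident, needs_review


# Placeholder records with the same keys as the tutorial's `results` list
example_records = [
    {"Gene_ID": "Gene_001", "Prediction": "High TE", "Confidence": 0.91},
    {"Gene_ID": "Gene_002", "Prediction": "Low TE", "Confidence": 0.64},
    {"Gene_ID": "Gene_003", "Prediction": "High TE", "Confidence": 0.58},
]

confident, needs_review = split_by_confidence(example_records)
print("Confident:", [r["Gene_ID"] for r in confident])
print("Flag for validation:", [r["Gene_ID"] for r in needs_review])
print("Mean confidence:", round(mean(r["Confidence"] for r in example_records), 3))
```

With the DataFrame from the notebook, the same split is a boolean filter such as `results_df[results_df['Confidence'] > 0.8]`.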

## Biological Interpretation and Research Applications 🧬

Your trained model can now help answer important biological questions. Let's explore how to apply it to real research scenarios.

In [None]:
# Research application examples
print("🔬 Research Applications for Your Trained Model")
print("=" * 50)

research_applications = {
    "🧬 Synthetic Biology": [
        "Design optimized gene circuits",
        "Predict expression levels for synthetic constructs",
        "Optimize codon usage for heterologous expression"
    ],
    "🌾 Plant Biology": [
        "Identify genes with high/low translation rates",
        "Analyze tissue-specific translation patterns", 
        "Study stress response at the translation level"
    ],
    "💊 Drug Discovery": [
        "Design therapeutic mRNAs with optimal translation",
        "Predict protein production levels",
        "Optimize mRNA vaccines and therapeutics"
    ],
    "📊 Comparative Genomics": [
        "Compare translation efficiency across species",
        "Identify evolutionary patterns in translation",
        "Study gene family expression differences"
    ]
}

for category, applications in research_applications.items():
    print(f"\n{category}:")
    for app in applications:
        print(f"   • {app}")

print("\n🎯 Your Model = Powerful Research Tool!")

## Best Practices for Model Usage 📋

To get the most reliable results from your model, follow these research best practices.

In [None]:
# Best practices guide
print("📋 Best Practices for Model Usage")
print("=" * 40)

best_practices = {
    "🎯 Input Requirements": [
        "Use RNA sequences (A, U, C, G) not DNA (A, T, C, G)",
        "Include 5' UTR and start of coding sequence when possible",
        "Sequences should be 50-500 nucleotides for best results",
        "Ensure sequences are biologically meaningful (proper start codons, etc.)"
    ],
    "📊 Result Interpretation": [
        "High confidence (>0.8): Model is highly certain; spot-check key results",
        "Moderate confidence (0.6-0.8): Plausible predictions, consider validation",
        "Low confidence (<0.6): Use with caution, may need experimental validation",
        "Always consider biological context and prior knowledge"
    ],
    "🔬 Research Applications": [
        "Use for comparative analysis across sequences",
        "Combine with other genomic features for comprehensive analysis",
        "Validate key predictions experimentally when possible",
        "Consider organism-specific effects (model trained on rice data)"
    ],
    "⚠️ Limitations to Consider": [
        "Model trained on rice data - may not generalize to all organisms",
        "Predictions are probabilistic - not absolute biological truth",
        "Cannot account for all cellular conditions and contexts",
        "Experimental validation recommended for critical applications"
    ]
}

for category, practices in best_practices.items():
    print(f"\n{category}:")
    for practice in practices:
        print(f"   • {practice}")

print("\n✅ Follow these guidelines for reliable research results!")
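
The input requirements listed above can be checked automatically before running inference. The sketch below is one way to do that. `check_sequence` is a hypothetical helper, and the 50 to 500 nucleotide range and the `AUG` start codon check are taken from the guidelines in this section rather than from any OmniGenBench API.

```python
def check_sequence(sequence: str, min_len: int = 50, max_len: int = 500) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    seq = sequence.strip().upper()
    invalid = sorted(set(seq) - set("AUCG"))
    if invalid:
        problems.append(f"non-RNA characters: {''.join(invalid)}")
    if not min_len <= len(seq) <= max_len:
        problems.append(f"length {len(seq)} outside {min_len}-{max_len}")
    if "AUG" not in seq:
        problems.append("no AUG start codon found")
    return problems


# Illustrative inputs only
good = "GGCACC" * 5 + "AUG" + "GCU" * 10      # 63 nt, RNA alphabet, contains AUG
dna = good.replace("U", "T")                  # same sequence written as DNA
print(check_sequence(good))
print(check_sequence(dna))
print(check_sequence("AUGC"))
```

Note that the short demonstration sequences in the batch section are under 50 nucleotides, so this check would flag them on length. They are useful for showing the workflow but fall outside the recommended input range.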

## 🎉 Tutorial Series Completion!

Congratulations! You have successfully completed the entire OmniGenBench tutorial series. You are now equipped with powerful skills in genomic AI!

### 🎓 **Complete Skill Set Mastered**

✅ **Data Preparation**: Loading, processing, and validating genomic datasets  
✅ **Model Initialization**: Setting up pre-trained genomic foundation models  
✅ **Model Training**: Fine-tuning models for specific biological tasks  
✅ **Model Inference**: Making predictions and interpreting results  
✅ **Research Applications**: Applying AI to real biological questions  
✅ **Best Practices**: Professional workflows for reliable results  

### 🚀 **Your AI Journey: From Beginner to Expert**

You started as a beginner and now possess:
- 🧬 **Deep understanding** of genomic foundation models
- 🔧 **Technical skills** to train and deploy AI models
- 📊 **Research capabilities** to analyze biological data at scale
- 🎯 **Professional workflows** for reproducible science

### 🔬 **Ready for Real Research**

Your trained translation efficiency model can now:
- 📈 **Classify translation efficiency** (high vs. low) for thousands of mRNA sequences
- 🧬 **Guide synthetic biology** designs and optimizations
- 🌾 **Analyze plant genomics** data for agricultural applications
- 💊 **Support biotechnology** and therapeutic development

### 🌟 **What's Next?**

With your new skills, you can:
1. **Apply to your research**: Use these techniques on your own biological data
2. **Explore other tasks**: Try different genomic prediction problems
3. **Customize models**: Adapt the framework for novel research questions
4. **Share knowledge**: Teach others and contribute to the community

> 🎊 **Achievement Unlocked**: Genomic AI Expert!

**Thank you for completing this journey with OmniGenBench!** 🧬✨

Your newfound expertise in genomic AI will empower you to make significant contributions to biological research and help unlock the secrets hidden in genomic sequences.