# **Project 2 - Methodology 1: Pareto Model Merging for Reasoning Optimization**

**Lead:** Nadya Devani (nad8884, naddevani@gmail.com)

**Research Objective:** Reduce reasoning/output tokens by 30% by merging a non-reasoning LLM with its reasoning LRM counterpart using Pareto merging, where the weight given to the LRM is directly proportional to the predicted difficulty score of the question.

**Target Performance:**

*   30% average output token reduction on 6 math datasets
*   <1% accuracy drop compared to reasoning LRMs
*   Better performance than fixed merging methods
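
The first two targets can be expressed as a simple pass/fail check. The sketch below is only an illustration: the function name `meets_targets` and its arguments are hypothetical, and it assumes the 1% accuracy budget is an absolute difference (percentage points) relative to the reasoning LRM.

```python
def meets_targets(lrm_tokens, merged_tokens, lrm_acc, merged_acc,
                  min_reduction=0.30, max_acc_drop=0.01):
    """Compare a merged model against the reasoning LRM baseline.

    Assumptions: token counts are averages over the same questions,
    accuracies are fractions in [0, 1], and the accuracy budget is an
    absolute difference rather than a relative one.
    Returns (token_reduction, accuracy_drop, ok).
    """
    if lrm_tokens <= 0:
        raise ValueError("lrm_tokens must be positive")
    reduction = 1.0 - merged_tokens / lrm_tokens  # fraction of tokens saved
    acc_drop = lrm_acc - merged_acc               # positive means the merged model is worse
    ok = reduction >= min_reduction and acc_drop < max_acc_drop
    return reduction, acc_drop, ok
```

For example, `meets_targets(1000, 650, 0.80, 0.795)` reports a 35% token reduction with a 0.5-point accuracy drop, which satisfies both thresholds.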

**NOTEBOOK STRUCTURE:**

*   Section 1: Environment Setup & Dependencies
*   Section 2: Literature Review & Model Architecture
*   Section 3: Question Difficulty Estimation
*   Section 4: Pareto Merging Implementation
*   Section 5: Training Pipeline
*   Section 6: Evaluation Framework
*   Section 7: Math Dataset Benchmarking
*   Section 8: Performance Analysis
*   Section 9: Package Integration
*   Section 10: Results & Documentation

## API Usage Section:
### Code Example:

*   Complete working example adapted for reasoning tasks
*   Clear parameter explanations (context, prompt, model, rate)
*   Security note about getting personal API keys

### Usage Tips:

*   Start with no compression (`rate: 0`) for baseline testing.
*   Use your own personal API key, for security.
*   Monitor usage on the dashboard to track experiments.
*   Compare results against the uncompressed baseline when evaluating the methodology.

### Generate API key
To generate an API key:
1. Log into the [dashboard](https://hallucinating-prompts.scaledown.ai/dashboard).
2. Switch to the API keys tab.
3. Generate an API key.
4. Track your usage over time from the dashboard.

In [None]:
import requests

url = "https://api.scaledown.xyz/compress/"

payload = {
    "context": "<context about messi>",           # background text for the prompt
    "prompt": "How many awards does messi have",  # question to answer
    "model": "gemini-2.5-flash",                  # model name passed to the API
    "scaledown": {
        "rate": 0,  # 0 = no compression (baseline)
    },
}
headers = {
    "x-api-key": "add your api key here",  # replace with your personal API key
    "Content-Type": "application/json",
}

# `json=` serializes the payload, so json.dumps is not needed
response = requests.post(url, headers=headers, json=payload)
print(response.text)
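
Since each user needs a personal API key, one option is to read it from an environment variable instead of pasting it into the notebook. This is a sketch under assumptions: the variable name `SCALEDOWN_API_KEY` and the helper `build_headers` are hypothetical and not part of the API.

```python
import os


def build_headers(env_var="SCALEDOWN_API_KEY"):
    """Build request headers from an API key stored in the environment.

    The environment variable name is an assumption made for this example.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your personal API key"
        )
    return {"x-api-key": api_key, "Content-Type": "application/json"}
```

The returned dictionary can be passed as `headers=` in the request above, which keeps the key out of the saved notebook.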

In [None]:
# Cell 1.2: Model Configuration & Loading

# TODO: Load and configure base models for merging
# - Load Qwen2.5-Math-7B as base non-reasoning model
# - Load DeepSeek-R1-Distill-Qwen-7B as reasoning model
# - Configure model parameters and tokenizers
# - Set up memory optimization for efficient processing

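One way to start on the model-loading TODO is to keep the checkpoint choices in a small config object and defer the heavy imports until loading time. This is a sketch, not the project's loader: `ModelSpec` and `load` are hypothetical names, the Hugging Face Hub IDs are assumed to correspond to the two models named above, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Where to find a checkpoint and how to load it (illustrative only)."""
    name: str
    hf_id: str               # assumed Hugging Face Hub repository ID
    dtype: str = "bfloat16"  # half precision to reduce memory use


BASE = ModelSpec("non-reasoning", "Qwen/Qwen2.5-Math-7B")
REASONING = ModelSpec("reasoning", "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")


def load(spec):
    """Load the tokenizer and model for a spec.

    torch and transformers are imported here so the config above can be
    inspected on machines without a GPU stack.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(spec.hf_id)
    model = AutoModelForCausalLM.from_pretrained(
        spec.hf_id,
        torch_dtype=getattr(torch, spec.dtype),
        device_map="auto",  # requires the `accelerate` package
    )
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load(BASE)` and `load(REASONING)` downloads both 7B checkpoints, so it is worth confirming available disk and GPU memory first.
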
# ===============================================================================
# SECTION 2: LITERATURE REVIEW & MODEL ARCHITECTURE
# Primary: Pouya Sadeghi | Supporting: Debarshi Das
# ===============================================================================

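As a possible starting point for the fixed-merging baselines in this section, the sketch below shows Task Arithmetic and a simplified TIES-style merge on flat lists of floats. It is a toy illustration in pure Python, not the reference implementation of either method; all function names are hypothetical, and a real implementation would operate on model state dicts.

```python
def task_vector(finetuned, base):
    """Difference between fine-tuned and base parameters."""
    return [f - b for f, b in zip(finetuned, base)]


def task_arithmetic(base, finetuned_models, scale=1.0):
    """Add the scaled sum of all task vectors to the base parameters."""
    merged = list(base)
    for model in finetuned_models:
        for i, delta in enumerate(task_vector(model, base)):
            merged[i] += scale * delta
    return merged


def trim(vec, keep_fraction):
    """Zero all but the largest-magnitude entries (ties at the cutoff are kept)."""
    k = max(1, round(len(vec) * keep_fraction))
    threshold = sorted((abs(v) for v in vec), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in vec]


def ties_merge(base, finetuned_models, keep_fraction=0.2, scale=1.0):
    """Simplified TIES-style merge: trim, elect a sign, average agreeing entries."""
    trimmed = [trim(task_vector(m, base), keep_fraction) for m in finetuned_models]
    merged = list(base)
    for i in range(len(base)):
        column = [tv[i] for tv in trimmed]
        total = sum(column)
        if total == 0.0:
            continue  # nothing survived trimming, or the signs cancel exactly
        sign = 1.0 if total > 0 else -1.0  # sign with the larger total magnitude
        agreeing = [v for v in column if v * sign > 0]
        merged[i] += scale * sum(agreeing) / len(agreeing)
    return merged
```

The difference shows up when task vectors disagree in sign: Task Arithmetic sums the conflicting updates, while the TIES-style merge keeps only the entries that agree with the elected sign.
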
In [None]:
# Cell 2.1: Literature Foundation Implementation
# TODO: Implement key findings from literature review
# - Task Arithmetic Merging implementation
# - TIES Merging with redundancy reduction
# - Adaptive Merging parameter optimization
# - Pareto Merging framework setup

# ===============================================================================
# SECTION 3: QUESTION DIFFICULTY ESTIMATION
# ===============================================================================

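Before a trained difficulty predictor exists, a rule-based placeholder can be used to exercise the rest of the pipeline. The sketch below is only that: the keyword list, the feature weights, and the function names are all invented for illustration, and the two-entry preference vector assumes one weight for the reasoning LRM and one for the non-reasoning LLM.

```python
import math
import re

# Hypothetical keyword list; a trained predictor would replace this heuristic.
HARD_KEYWORDS = ("prove", "integral", "modulo", "probability", "polynomial")


def difficulty_score(question):
    """Map a question to a score in (0, 1) from simple surface features."""
    n_words = len(question.split())
    n_numbers = len(re.findall(r"\d+(?:\.\d+)?", question))
    n_keywords = sum(kw in question.lower() for kw in HARD_KEYWORDS)
    # Arbitrary illustrative weights, squashed through a logistic function
    raw = 0.02 * n_words + 0.1 * n_numbers + 0.5 * n_keywords - 1.0
    return 1.0 / (1.0 + math.exp(-raw))


def preference_vector(score):
    """Return (weight on reasoning LRM, weight on non-reasoning LLM)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return (score, 1.0 - score)
```

Harder questions receive a larger first component, matching the objective that the LRM weight grows with predicted difficulty.
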
In [None]:
# Cell 3.1: Difficulty Scoring Implementation
# TODO: Implement question difficulty estimation model
# - Design difficulty scoring algorithm
# - Train difficulty predictor on math datasets
# - Create preference vector generation system
# - Validate difficulty predictions across datasets

# ===============================================================================
# SECTION 4: PARETO MERGING IMPLEMENTATION
# Primary: Nadya Devani, Purva Kandalgaonkar | Supporting: Suparnojit Sarkar
# ===============================================================================

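The idea of a preference-independent base plus a preference-dependent low-rank term can be illustrated on a single weight matrix. The sketch below assumes the form W(p) = W0 + A (sum_k p_k G_k) B^T, where W0 is the merged base weight, A and B are low-rank factors, and G_k is the core slice for objective k. This parameterization and every name in the code are assumptions made for illustration; the formulation in the Pareto merging literature may differ in detail.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    return [list(col) for col in zip(*x)]


def core_for_preference(cores, preference):
    """Contract the stack of r x r core slices with the preference vector."""
    r = len(cores[0])
    return [
        [sum(p * g[i][j] for p, g in zip(preference, cores)) for j in range(r)]
        for i in range(r)
    ]


def merged_weight(w0, a, cores, b, preference):
    """Return W0 + A @ G(p) @ B^T for one weight matrix.

    w0: m x n base weight, a: m x r, b: n x r, cores: one r x r slice per objective.
    """
    if len(cores) != len(preference):
        raise ValueError("need one core slice per preference entry")
    g = core_for_preference(cores, preference)
    delta = matmul(matmul(a, g), transpose(b))
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w0, delta)]
```

At inference time only the small core contraction depends on the preference vector, so the same W0, A, and B can serve every question.
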
In [None]:
# TODO: Implement Pareto merging with preference-dependent low rank tensors
# - Create preference-independent base model merger
# - Implement preference-dependent low rank tensor for attention heads
# - Build training algorithm for Pareto merging
# - Create inference pipeline with dynamic preference vectors

# ===============================================================================
# SECTION 5: TRAINING PIPELINE
# Primary: Purva Kandalgaonkar | Supporting: Nadya Devani
# ===============================================================================

In [None]:
# TODO: Implement end-to-end training pipeline
# - Set up training data preparation for math datasets
# - Implement training loop for preference tensor optimization
# - Create validation framework
# - Add logging and monitoring systems

# ===============================================================================
# SECTION 6: EVALUATION FRAMEWORK
# Primary: Purva Kandalgaonkar | Supporting: Abhishek Shriwas
# ===============================================================================

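The accuracy and token-counting metrics for this section could begin with something like the sketch below. It is a simplified illustration: the record keys (`output`, `reference`, `n_output_tokens`) and function names are hypothetical, and exact string matching on the extracted answer stands in for the expression normalization a real math grader would need.

```python
import re


def extract_final_answer(text):
    """Return the last \\boxed{...} content, else the last number, else None."""
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", text)
    if boxed:
        return boxed[-1].strip()
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None


def evaluate(records):
    """Compute accuracy and mean output length.

    records: iterable of dicts with keys 'output' (model text),
    'reference' (gold answer), and 'n_output_tokens' (int).
    """
    records = list(records)
    if not records:
        raise ValueError("no records to evaluate")
    correct = sum(
        extract_final_answer(r["output"]) == str(r["reference"]).strip()
        for r in records
    )
    total_tokens = sum(r["n_output_tokens"] for r in records)
    return {
        "accuracy": correct / len(records),
        "mean_output_tokens": total_tokens / len(records),
    }
```

Running `evaluate` once per dataset and per model gives the per-dataset numbers needed for the baseline comparison.
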
In [None]:
# TODO: Implement evaluation framework for math datasets
# - Set up GSM8K, MATH500, Minerva Math, Olympiad Bench, College Math, AIME24
# - Implement accuracy and token counting metrics
# - Create comparison framework with baseline methods
# - Add statistical significance testing

# ===============================================================================
# SECTION 7: MATH DATASET BENCHMARKING
# Primary: Purva Kandalgaonkar | Supporting: Abhishek Shriwas
# ===============================================================================

In [None]:
# TODO: Run comprehensive evaluation on all 6 math datasets
# - Execute baseline evaluations for comparison
# - Run Pareto merging methodology on all datasets
# - Generate performance comparisons and statistical analysis
# - Create visualizations for results

# ===============================================================================
# SECTION 8: PERFORMANCE ANALYSIS
# Primary: Abhishek Shriwas | Supporting: Samrudhi Bhoyar
# ===============================================================================

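For the statistical significance testing in this section, one option is a paired bootstrap over per-question correctness, since both models are evaluated on the same questions. The sketch below is one possible approach and not the project's chosen test; the function name and defaults are hypothetical.

```python
import random


def paired_bootstrap_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy(A) - accuracy(B).

    correct_a, correct_b: 0/1 outcomes for the same questions, in the same order.
    Returns (observed_difference, lower_bound, upper_bound).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty outcome lists of equal length")
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(correct_a)
    per_item = [a - b for a, b in zip(correct_a, correct_b)]
    observed = sum(per_item) / n
    # Resample questions with replacement, keeping each A/B pair together
    diffs = sorted(
        sum(per_item[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lower = diffs[int(alpha / 2 * n_resamples)]
    upper = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return observed, lower, upper
```

If the interval excludes zero, the accuracy difference is unlikely to be a resampling artifact; an interval that contains zero is consistent with the "<1% accuracy drop" target only if its lower bound also stays within that budget.
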
In [None]:
# TODO: Implement comprehensive performance analysis
# - Statistical significance testing
# - Error analysis and failure case studies
# - Performance breakdown by difficulty levels
# - Comparison with literature baselines

# ===============================================================================
# SECTION 9: PACKAGE INTEGRATION
# Primary: Samrudhi Bhoyar | Supporting: Shruti Shinde
# ===============================================================================

In [None]:
# TODO: Create production-ready Python package
# - Package all components into coherent library
# - Create user-friendly API interface
# - Add comprehensive documentation and examples
# - Implement testing suite and validation