## Find Best Qwen Variants

### Prerequisites to follow this notebook:

1. Install the remyxai-cli: `pip install git+https://github.com/remyxai/remyxai-cli.git`
2. Ensure the REMYXAI_API_KEY is set as an environment variable.
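
A quick check before running the cells below can catch a missing key early. This is a minimal sketch: `get_api_key` is a hypothetical helper, not part of `remyxai`, and it only verifies that the variable is present and non-empty.

```python
import os


def get_api_key(env=os.environ, name="REMYXAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for this notebook; it is not part of remyxai.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the notebook.")
    return key
```

Calling `get_api_key()` at the top of the notebook fails fast with a readable message instead of an authentication error later on.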

In [1]:
import logging
import traceback
from remyxai.client.myxboard import MyxBoard
from remyxai.client.remyx_client import RemyxAPI
from remyxai.api.evaluations import EvaluationTask, list_evaluations, download_evaluation

### Step 1: Select Variants of the Qwen Model Family for Evaluation

We will evaluate six Qwen variants, base and instruction-tuned checkpoints at three sizes, to determine which performs best on the prompts defined below.

In [2]:
models = [
    'Qwen/Qwen-7B',
    'Qwen/Qwen-7B-Instruct',
    'Qwen/Qwen2.5-0.5B',
    'Qwen/Qwen2.5-0.5B-Instruct',
    'Qwen/Qwen2-1.5B',
    'Qwen/Qwen2-1.5B-Instruct'
]

### Create a MyxBoard for Evaluation

In [3]:
myx_board_name = "qwen_family_comparison"
myx_board = MyxBoard(model_repo_ids=models, name=myx_board_name)
myx_board

<remyxai.client.myxboard.MyxBoard at 0x7fe3238edcf0>

### Initialize the RemyxAPI

In [4]:
remyx_api = RemyxAPI()
remyx_api

<remyxai.client.remyx_client.RemyxAPI at 0x7fe3238ede70>

### Define Evaluation Tasks

In [5]:
tasks = [EvaluationTask.MYXMATCH]

### Define Prompts for Evaluation

In [6]:
prompts = [
    "Summarize the findings of this research on quantum computing.",
    "Explain the side effects of Drug Y in patients with kidney disease.",
    "What are the economic implications of AI in manufacturing?",
    "Generate a step-by-step solution to this calculus problem: ∫x^2 dx."
]

### Run Evaluation

In [7]:
for prompt in prompts:
    remyx_api.evaluate(myx_board, tasks, prompt=prompt)

Starting evaluation for Qwen variants...
Starting evaluation for Qwen variants...
Starting evaluation for Qwen variants...


### Check Evaluation Status

In [8]:
evaluations = list_evaluations()
evaluations

[{'eval_type': 'myxmatch',
  'name': 'qwen_family_comparison',
  'status': 'FINISHED'}]
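
If the evaluation has not finished yet, the status check can be repeated until it does. The sketch below assumes `list_evaluations()` returns a list of dicts shaped like the output above; `wait_for_evaluation` is a hypothetical helper, and `FINISHED` is the only status value confirmed by this notebook.

```python
import time


def wait_for_evaluation(fetch, name, eval_type="myxmatch",
                        poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll fetch() until the named evaluation reports FINISHED.

    fetch is any zero-argument callable returning a list of dicts with
    'eval_type', 'name' and 'status' keys (assumed shape, see above).
    """
    waited = 0
    while True:
        for entry in fetch():
            if entry.get("name") == name and entry.get("eval_type") == eval_type:
                if entry.get("status") == "FINISHED":
                    return entry
        if waited >= timeout_seconds:
            raise TimeoutError(f"{name} did not finish within {timeout_seconds} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Possible usage in this notebook (requires the remyxai import from cell 1):
# wait_for_evaluation(list_evaluations, "qwen_family_comparison")
```

Passing the fetch function as an argument keeps the helper independent of the client library, so it can be exercised with a stub.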

### Download Results for Analysis

In [9]:
results = download_evaluation("myxmatch", "qwen_family_comparison")
results['message']

{'models': [{'model': 'Qwen2-1.5B-Instruct', 'rank': 1},
  {'model': 'Qwen-14B-Instruct', 'rank': 2},
  {'model': 'Qwen-14B', 'rank': 3},
  {'model': 'Qwen-7B-Instruct', 'rank': 4},
  {'model': 'Qwen-7B', 'rank': 5},
  {'model': 'Qwen2-1.5B', 'rank': 6}],
 'prompt': 'Summarize the findings of this research on quantum computing.'}
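
The ranking can be turned into an ordered list for easier reading. This sketch copies the `results['message']` output shown above into a literal so it runs on its own; `ranked_names` is a hypothetical helper name.

```python
# Copy of the results['message'] output shown above.
message = {
    "models": [
        {"model": "Qwen2-1.5B-Instruct", "rank": 1},
        {"model": "Qwen-14B-Instruct", "rank": 2},
        {"model": "Qwen-14B", "rank": 3},
        {"model": "Qwen-7B-Instruct", "rank": 4},
        {"model": "Qwen-7B", "rank": 5},
        {"model": "Qwen2-1.5B", "rank": 6},
    ],
    "prompt": "Summarize the findings of this research on quantum computing.",
}


def ranked_names(message):
    """Return model names ordered from best (rank 1) to worst."""
    ordered = sorted(message["models"], key=lambda entry: entry["rank"])
    return [entry["model"] for entry in ordered]


print(f"Prompt: {message['prompt']}")
for position, name in enumerate(ranked_names(message), start=1):
    print(f"{position}. {name}")
```

Sorting by the `rank` field rather than relying on list order keeps the output correct even if the API returns the entries in a different order.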

### Results Summary

- **Top Variant in Qwen Family**: `Qwen2-1.5B-Instruct` ranks first in the downloaded result, which covers the prompt "Summarize the findings of this research on quantum computing."
- **Other Strong Performers**: `Qwen-14B-Instruct` and `Qwen-14B` rank second and third.

Based on this evaluation, `Qwen2-1.5B-Instruct` is the Qwen variant to integrate into production for tasks similar to the evaluated prompt.
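
To compare variants across several prompts, one option is to average each model's rank over the per-prompt results. This is a sketch under an assumption: that one dict shaped like `results['message']` is available for each prompt. How to retrieve per-prompt results from the API is not shown in this notebook, and `mean_ranks` is a hypothetical helper.

```python
from collections import defaultdict


def mean_ranks(messages):
    """Average each model's rank over a list of per-prompt result dicts.

    Each dict is assumed to have a 'models' list of {'model', 'rank'} entries.
    Returns (model, mean_rank) pairs sorted from best to worst.
    """
    ranks = defaultdict(list)
    for message in messages:
        for entry in message["models"]:
            ranks[entry["model"]].append(entry["rank"])
    averaged = {model: sum(values) / len(values) for model, values in ranks.items()}
    return sorted(averaged.items(), key=lambda item: item[1])
```

Mean rank is only one possible aggregation; it treats every prompt as equally important, which may not match a given workload.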