## Comparing Domain-Specific Model Collections (e.g., BioGPT vs. Other Bio/Medical LLMs)

### Prerequisites to follow this notebook:

1. Install the remyxai-cli: `pip install git+https://github.com/remyxai/remyxai-cli.git`
2. Ensure the `REMYXAI_API_KEY` environment variable is set.
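
A quick way to confirm the second prerequisite before running any cells is to check the environment from Python. The helper below is a minimal sketch: `require_env` is a hypothetical name introduced here, not part of `remyxai`, and it only verifies that the variable is present and non-empty, not that the key is valid.

```python
import os


def require_env(name, env=None):
    """Return the value of environment variable `name`, or raise if it is unset or empty.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running this notebook.")
    return value
```

Calling `require_env("REMYXAI_API_KEY")` at the top of the notebook fails fast with a readable message instead of surfacing later as an authentication error.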

In [1]:
import logging
import traceback
from remyxai.client.myxboard import MyxBoard
from remyxai.client.remyx_client import RemyxAPI
from remyxai.api.evaluations import EvaluationTask, download_evaluation, list_evaluations

### Select Bio/Medical Specialized Models for Evaluation

We will compare domain-specific models such as BioGPT, Bio_ClinicalBERT, and SciBERT using MyxMatch. The full list of repository IDs is defined in the next cell.

In [2]:
models = [
    'microsoft/BioGPT-Large',
    'emilyalsentzer/Bio_ClinicalBERT',
    'microsoft/BioMedGPT-1.2B',
    'allenai/scibert_scivocab_uncased',
    'google/MedPaLM2'
]

### Create a MyxBoard for Comparison

In [3]:
myx_board_name = "bio_medical_model_comparison"
myx_board = MyxBoard(model_repo_ids=models, name=myx_board_name)
myx_board

<remyxai.client.myxboard.MyxBoard at 0x7fe3238edcf0>

### Initialize the RemyxAPI

In [4]:
remyx_api = RemyxAPI()
remyx_api

<remyxai.client.remyx_client.RemyxAPI at 0x7fe3238ede70>

### Define Evaluation Tasks

In [5]:
tasks = [EvaluationTask.MYXMATCH]

### Define Bio/Medical Prompts for Evaluation

In [6]:
prompts = [
    "What are the treatment guidelines for hypertension in pregnant women?",
    "Summarize the key findings of this research on cancer immunotherapy.",
    "Explain the side effects of Drug X in elderly patients.",
    "Identify the risk factors for Type 1 diabetes in children."
]

### Run Evaluation

In [7]:
for prompt in prompts:
    remyx_api.evaluate(myx_board, tasks, prompt=prompt)

Starting evaluation for domain-specific bio/medical models...
Starting evaluation for domain-specific bio/medical models...
Starting evaluation for domain-specific bio/medical models...
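
The loop above stops at the first prompt whose evaluation raises. If you would rather keep going and see which prompts failed, a small wrapper can log the error and continue. This is a sketch only: `evaluate_prompts` and the logger name are hypothetical, and the wrapper takes any callable so it makes no assumptions about how `remyx_api.evaluate` behaves on failure.

```python
import logging

logger = logging.getLogger("myxmatch_demo")  # hypothetical logger name


def evaluate_prompts(evaluate, prompts):
    """Call `evaluate(prompt)` for every prompt and return the prompts that raised."""
    failed = []
    for prompt in prompts:
        try:
            evaluate(prompt)
        except Exception:
            # logger.exception records the message together with the traceback.
            logger.exception("Evaluation failed for prompt: %s", prompt)
            failed.append(prompt)
    return failed
```

In this notebook it could be called as `evaluate_prompts(lambda p: remyx_api.evaluate(myx_board, tasks, prompt=p), prompts)`, and any prompts in the returned list can be retried.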


### Check Evaluation Status

In [8]:
evaluations = list_evaluations()
evaluations

[{'eval_type': 'myxmatch',
  'name': 'bio_medical_model_comparison',
  'status': 'FINISHED'}]
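
Evaluations may take a while, so a single status check may not show `FINISHED` yet. The sketch below polls until the named evaluation is done. It assumes only the entry shape shown in the output above (`eval_type`, `name`, `status`). `FINISHED` is the only status value that appears in this notebook, so every other value is treated as "not done yet"; `wait_for_evaluation` and its parameters are hypothetical names.

```python
import time


def wait_for_evaluation(fetch, name, eval_type="myxmatch",
                        poll_seconds=30.0, timeout_seconds=3600.0,
                        sleep=time.sleep):
    """Poll `fetch()` until the matching evaluation reports status 'FINISHED'.

    `fetch` is any zero-argument callable returning a list of dicts with
    'eval_type', 'name', and 'status' keys, such as `list_evaluations`.
    """
    waited = 0.0
    while True:
        for entry in fetch():
            if (entry.get("name") == name
                    and entry.get("eval_type") == eval_type
                    and entry.get("status") == "FINISHED"):
                return entry
        if waited >= timeout_seconds:
            raise TimeoutError(f"{name!r} did not finish within {timeout_seconds} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

Here the call would be `wait_for_evaluation(list_evaluations, "bio_medical_model_comparison")`. Passing the fetch function and `sleep` as arguments keeps the helper independent of the client and easy to test.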

### Download Results for Analysis

In [9]:
results = download_evaluation("myxmatch", "bio_medical_model_comparison")
results['message']

{'models': [{'model': 'BioGPT-Large', 'rank': 1},
  {'model': 'MedPaLM2', 'rank': 2},
  {'model': 'Bio_ClinicalBERT', 'rank': 3},
  {'model': 'BioMedGPT-1.2B', 'rank': 4},
  {'model': 'scibert_scivocab_uncased', 'rank': 5}],
 'prompt': 'What are the treatment guidelines for hypertension in pregnant women?'}
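
The result is a plain dictionary, so the ranking can be pulled out with ordinary Python. The sketch below assumes the shape shown above (a `models` list of `{'model', 'rank'}` entries plus a `prompt` string); `ranked_models` is a hypothetical helper, and the sample dictionary is copied from the output above.

```python
def ranked_models(message):
    """Return model names ordered best-first from a MyxMatch-style result dict."""
    entries = sorted(message["models"], key=lambda entry: entry["rank"])
    return [entry["model"] for entry in entries]


# Sample copied from the cell output above.
message = {
    "models": [
        {"model": "BioGPT-Large", "rank": 1},
        {"model": "MedPaLM2", "rank": 2},
        {"model": "Bio_ClinicalBERT", "rank": 3},
        {"model": "BioMedGPT-1.2B", "rank": 4},
        {"model": "scibert_scivocab_uncased", "rank": 5},
    ],
    "prompt": "What are the treatment guidelines for hypertension in pregnant women?",
}

print(message["prompt"])
for position, model in enumerate(ranked_models(message), start=1):
    print(f"{position}. {model}")
```

Sorting by `rank` rather than relying on list order keeps the helper correct even if the entries come back unordered.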

### Results Summary

- **Top-Ranked Model**: `BioGPT-Large` ranks first for the prompt shown above ("What are the treatment guidelines for hypertension in pregnant women?").
- **Runners-Up**: `MedPaLM2` and `Bio_ClinicalBERT` rank second and third for the same prompt.

Use the top-ranking models to build or improve bio/medical domain-specific applications.
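
The downloaded result covers a single prompt. To compare models across several prompts, one simple option is to average each model's rank over the per-prompt results. The sketch below assumes every result has the same shape as the dictionary shown earlier; `mean_ranks` is a hypothetical helper, and the model names and ranks in the sample are placeholders, not evaluation output.

```python
from collections import defaultdict


def mean_ranks(messages):
    """Average each model's rank across results; return (model, mean_rank) pairs, best first."""
    ranks = defaultdict(list)
    for message in messages:
        for entry in message["models"]:
            ranks[entry["model"]].append(entry["rank"])
    averages = {model: sum(values) / len(values) for model, values in ranks.items()}
    return sorted(averages.items(), key=lambda item: item[1])


# Placeholder data for illustration only (not real results).
sample = [
    {"prompt": "p1", "models": [{"model": "model-a", "rank": 1}, {"model": "model-b", "rank": 2}]},
    {"prompt": "p2", "models": [{"model": "model-a", "rank": 2}, {"model": "model-b", "rank": 1}]},
    {"prompt": "p3", "models": [{"model": "model-a", "rank": 1}, {"model": "model-b", "rank": 2}]},
]

for model, average in mean_ranks(sample):
    print(f"{model}: mean rank {average:.2f}")
```

Mean rank is only one aggregation choice. It treats every prompt equally and can hide a model that is excellent on some prompts and poor on others, so it is worth inspecting the per-prompt rankings as well.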