# ModelArena WalkThrough Section 3.2 - Match

In [1]:
import warnings

from model_arena import ModelArena

# Ignoring warnings is generally bad practice.
# However, bytedmysql emits warnings that it never resolves,
# so we suppress them here to keep the output readable.
warnings.filterwarnings("ignore")
ma = ModelArena()

## Dataset

We use a demo dataset to walk through the match process.

In [2]:
dataset = "demo"
ma.datasets.get(datasets=dataset)

Unnamed: 0,dataset_name,dataset_id,raw_dataset_id,tag,instruction,output
0,demo,a051970d3095432f967e68c3049313dd,a19b72fbaa5e4bc4a3405dfed904650d,nl2code,write a quick sort in python.,
1,demo,95d7220b5e814c2eadbabaab4decc4f7,2290052ff9ea4be8bb38095247713cf0,nl2code,write a bubble sort in c.,


## Generate Matches

Let's use the `generate_matches` function to see how two models are paired for comparison.

In [3]:
# we select a specific target model to construct matches
ma.generate_matches(dataset=dataset, model="gpt-3.5-turbo-1106", target_model="gpt-4-0613", shuffle=False)

You have directly call `generate_matches` to acquire a dataframe, this will build [('auditor', <class 'str'>), ('score_x', <class 'float'>), ('score_y', <class 'float'>)] columns in the dataframe. Please remeber to fill them, before call `add_matches`.


Unnamed: 0,dataset_name,dataset_id,tag,instruction,model_id_x,output_x,model_id_y,output_y,auditor,score_x,score_y
0,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,dd078c34445049879fbcb5ae72f1d9d5,Here's an implementation of quick sort in Pyth...,b62e8a8ce26e4b3cb9e208be609c1a5d,Here's an implementation of quick sort in Pyth...,,,
1,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,dd078c34445049879fbcb5ae72f1d9d5,Here is an implementation of bubble sort in C:...,b62e8a8ce26e4b3cb9e208be609c1a5d,Here's an implementation of bubble sort in C:\...,,,

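The `_x`/`_y` column pairs in the output above suggest that each match joins the outputs of two models on the same prompt. The following is a minimal pure-Python sketch of that idea, not ModelArena's actual implementation. The `outputs` records, the short IDs, and the `pair_outputs` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical stand-in for stored model outputs: (dataset_id, model_id, output).
outputs = [
    ("q1", "model_a", "quick sort by model_a"),
    ("q2", "model_a", "bubble sort by model_a"),
    ("q1", "model_b", "quick sort by model_b"),
    ("q2", "model_b", "bubble sort by model_b"),
]


def pair_outputs(outputs, model, target_model):
    """Join the outputs of two models on dataset_id, one match per shared prompt."""
    left = {d: o for d, m, o in outputs if m == model}
    right = {d: o for d, m, o in outputs if m == target_model}
    matches = []
    # Only prompts answered by both models can form a match.
    for dataset_id in sorted(left.keys() & right.keys()):
        matches.append({
            "dataset_id": dataset_id,
            "model_id_x": model,
            "output_x": left[dataset_id],
            "model_id_y": target_model,
            "output_y": right[dataset_id],
            # Left empty on purpose: an evaluator is expected to fill these later.
            "auditor": None,
            "score_x": None,
            "score_y": None,
        })
    return matches


matches = pair_outputs(outputs, "model_a", "model_b")
```

As in the walkthrough output, the `auditor`, `score_x`, and `score_y` fields start out empty and must be filled before the matches are stored.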

In [4]:
# we randomly choose some opponents
ma.generate_matches(dataset=dataset, model="gpt-3.5-turbo-1106", target_model="random", shuffle=False)

You have directly call `generate_matches` to acquire a dataframe, this will build [('auditor', <class 'str'>), ('score_x', <class 'float'>), ('score_y', <class 'float'>)] columns in the dataframe. Please remeber to fill them, before call `add_matches`.


Unnamed: 0,dataset_name,dataset_id,tag,instruction,model_id_x,output_x,model_id_y,output_y,auditor,score_x,score_y
0,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,dd078c34445049879fbcb5ae72f1d9d5,Here is an implementation of bubble sort in C:...,abc7bdeb04754641ae1fceaa892dfe9c,"Sure, here is a simple implementation of Bubbl...",,,
1,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,dd078c34445049879fbcb5ae72f1d9d5,Here's an implementation of quick sort in Pyth...,abc7bdeb04754641ae1fceaa892dfe9c,"Sure, here is a simple implementation of the Q...",,,


In [5]:
# we choose each model in the dataset to generate a series of matches
# and shuffle the order of candidates
ma.generate_matches(dataset=dataset, model="gpt-3.5-turbo-1106", target_model="all", shuffle=True)

You have directly call `generate_matches` to acquire a dataframe, this will build [('auditor', <class 'str'>), ('score_x', <class 'float'>), ('score_y', <class 'float'>)] columns in the dataframe. Please remeber to fill them, before call `add_matches`.


Unnamed: 0,dataset_name,dataset_id,tag,instruction,model_id_x,output_x,model_id_y,output_y,auditor,score_x,score_y
0,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,dd078c34445049879fbcb5ae72f1d9d5,Here's an implementation of quick sort in Pyth...,b62e8a8ce26e4b3cb9e208be609c1a5d,Here's an implementation of quick sort in Pyth...,,,
1,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,dd078c34445049879fbcb5ae72f1d9d5,Here's an implementation of quick sort in Pyth...,abc7bdeb04754641ae1fceaa892dfe9c,"Sure, here is a simple implementation of the Q...",,,
2,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,d50277e0668f4ce8b51933a4008264dc,"Sure, here is a basic implementation of Bubble...",dd078c34445049879fbcb5ae72f1d9d5,Here is an implementation of bubble sort in C:...,,,
3,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,b62e8a8ce26e4b3cb9e208be609c1a5d,Here's an implementation of bubble sort in C:\...,dd078c34445049879fbcb5ae72f1d9d5,Here is an implementation of bubble sort in C:...,,,
4,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,abc7bdeb04754641ae1fceaa892dfe9c,"Sure, here is a simple implementation of Bubbl...",dd078c34445049879fbcb5ae72f1d9d5,Here is an implementation of bubble sort in C:...,,,
5,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,dd078c34445049879fbcb5ae72f1d9d5,Here's an implementation of quick sort in Pyth...,d50277e0668f4ce8b51933a4008264dc,Here is a basic implementation of Quick Sort i...,,,

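With `shuffle=True`, the model under test appears sometimes on the `x` side and sometimes on the `y` side, and the rows come back in a different order. A plausible reason to do this is to reduce position bias in the judge. The sketch below shows one way such shuffling could work. It is an assumption about the behaviour, not the library's code, and `shuffle_matches` and the sample records are hypothetical.

```python
import random


def shuffle_matches(matches, seed=None):
    """Randomly swap the x/y sides of each match, then shuffle the row order."""
    rng = random.Random(seed)
    shuffled = []
    for match in matches:
        match = dict(match)  # copy so the caller's data is not modified
        if rng.random() < 0.5:
            # Swap model IDs and outputs together so each output stays with its model.
            match["model_id_x"], match["model_id_y"] = match["model_id_y"], match["model_id_x"]
            match["output_x"], match["output_y"] = match["output_y"], match["output_x"]
        shuffled.append(match)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical sample matches.
matches = [
    {"dataset_id": f"q{i}", "model_id_x": "model_a", "output_x": "answer from model_a",
     "model_id_y": "model_b", "output_y": "answer from model_b"}
    for i in range(4)
]
shuffled = shuffle_matches(matches, seed=0)
```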

## Evaluator

Once we have a series of matches, we can use an evaluator to judge which output in each match is better.

In [6]:
import os

os.environ["BYTED_GPT_TOKEN"] = ""

In [7]:
from model_arena.extensions import ChatGPTEvaluator

evaluator = ChatGPTEvaluator(model="gpt-4-0613")

In [8]:
evaluator.evaluate(ma.generate_matches(dataset=dataset, model="gpt-3.5-turbo-1106", target_model="all", shuffle=True))

You have directly call `generate_matches` to acquire a dataframe, this will build [('auditor', <class 'str'>), ('score_x', <class 'float'>), ('score_y', <class 'float'>)] columns in the dataframe. Please remeber to fill them, before call `add_matches`.


Unnamed: 0,dataset_name,dataset_id,tag,instruction,model_id_x,output_x,model_id_y,output_y,auditor,score_x,score_y
0,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,abc7bdeb04754641ae1fceaa892dfe9c,"Sure, here is a simple implementation of Bubbl...",dd078c34445049879fbcb5ae72f1d9d5,Here is an implementation of bubble sort in C:...,gpt-4-0613,4,3
1,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,b62e8a8ce26e4b3cb9e208be609c1a5d,Here's an implementation of bubble sort in C:\...,dd078c34445049879fbcb5ae72f1d9d5,Here is an implementation of bubble sort in C:...,gpt-4-0613,4,3
2,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,dd078c34445049879fbcb5ae72f1d9d5,Here's an implementation of quick sort in Pyth...,abc7bdeb04754641ae1fceaa892dfe9c,"Sure, here is a simple implementation of the Q...",gpt-4-0613,4,3
3,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,d50277e0668f4ce8b51933a4008264dc,"Sure, here is a basic implementation of Bubble...",dd078c34445049879fbcb5ae72f1d9d5,Here is an implementation of bubble sort in C:...,gpt-4-0613,4,3
4,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,dd078c34445049879fbcb5ae72f1d9d5,Here's an implementation of quick sort in Pyth...,b62e8a8ce26e4b3cb9e208be609c1a5d,Here's an implementation of quick sort in Pyth...,gpt-4-0613,4,3
5,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,dd078c34445049879fbcb5ae72f1d9d5,Here's an implementation of quick sort in Pyth...,d50277e0668f4ce8b51933a4008264dc,Here is a basic implementation of Quick Sort i...,gpt-4-0613,4,3

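Conceptually, an evaluator takes matches with empty `auditor`, `score_x`, and `score_y` fields and returns them filled in. The sketch below illustrates that contract in plain Python. It does not reproduce the `ChatGPTEvaluator` interface: the `evaluate` function, the `judge` callable, and the length-based placeholder rule are hypothetical, and a real judge would call a model instead.

```python
def length_judge(instruction, output_x, output_y):
    """Placeholder judge: the longer answer gets 4.0, the other 3.0 (x wins ties)."""
    if len(output_x) >= len(output_y):
        return 4.0, 3.0
    return 3.0, 4.0


def evaluate(matches, judge, auditor):
    """Return copies of the matches with auditor, score_x and score_y filled in."""
    evaluated = []
    for match in matches:
        match = dict(match)  # copy so the input matches stay unscored
        score_x, score_y = judge(match["instruction"], match["output_x"], match["output_y"])
        match["auditor"] = auditor
        match["score_x"] = score_x
        match["score_y"] = score_y
        evaluated.append(match)
    return evaluated


# Hypothetical unscored match.
pending = [{
    "instruction": "write a bubble sort in c.",
    "output_x": "short answer",
    "output_y": "a considerably longer answer",
    "auditor": None, "score_x": None, "score_y": None,
}]
scored = evaluate(pending, judge=length_judge, auditor="length-heuristic")
```

Passing the judge in as a callable keeps the bookkeeping (filling the three columns) separate from the scoring rule, so a different judge can be swapped in without touching the loop.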

We can also run all of the steps above with a single function call!

In [9]:
# set upload=False for a debug view of result
# set upload=True to directly upload the result
ma.match(dataset=dataset, model="gpt-3.5-turbo-1106", target_model="all", evaluator=evaluator, upload=False)

Unnamed: 0,dataset_name,dataset_id,tag,instruction,model_id_x,output_x,model_id_y,output_y,auditor,score_x,score_y
0,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,dd078c34445049879fbcb5ae72f1d9d5,Here's an implementation of quick sort in Pyth...,d50277e0668f4ce8b51933a4008264dc,Here is a basic implementation of Quick Sort i...,gpt-4-0613,3,4
1,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,d50277e0668f4ce8b51933a4008264dc,"Sure, here is a basic implementation of Bubble...",dd078c34445049879fbcb5ae72f1d9d5,Here is an implementation of bubble sort in C:...,gpt-4-0613,4,3
2,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,abc7bdeb04754641ae1fceaa892dfe9c,"Sure, here is a simple implementation of Bubbl...",dd078c34445049879fbcb5ae72f1d9d5,Here is an implementation of bubble sort in C:...,gpt-4-0613,4,3
3,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,dd078c34445049879fbcb5ae72f1d9d5,Here's an implementation of quick sort in Pyth...,abc7bdeb04754641ae1fceaa892dfe9c,"Sure, here is a simple implementation of the Q...",gpt-4-0613,4,3
4,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,b62e8a8ce26e4b3cb9e208be609c1a5d,Here's an implementation of bubble sort in C:\...,dd078c34445049879fbcb5ae72f1d9d5,Here is an implementation of bubble sort in C:...,gpt-4-0613,4,3
5,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,dd078c34445049879fbcb5ae72f1d9d5,Here's an implementation of quick sort in Pyth...,b62e8a8ce26e4b3cb9e208be609c1a5d,Here's an implementation of quick sort in Pyth...,gpt-4-0613,4,3


Remember: we can always use a *get* method to retrieve the full match history.

In [10]:
ma.get_matches(datasets=dataset, models="all")

Unnamed: 0,dataset_name,dataset_id,tag,instruction,information,model_name_x,model_name_y,auditor,score_x,score_y
0,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,"{""task"": ""write a bubble sort"", ""lang"": ""c""}",gpt-4-0613,gpt-3.5-turbo-1106,gpt-4-0613,4.0,3.0
1,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,"{""task"": ""write a quick sort"", ""lang"": ""python""}",gpt-3.5-turbo-1106,deepseek-coder-6.7b-instruct-awq,gpt-4-0613,3.0,4.0
2,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,"{""task"": ""write a bubble sort"", ""lang"": ""c""}",deepseek-coder-6.7b-instruct,gpt-3.5-turbo-1106,gpt-4-0613,4.0,3.0
3,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,"{""task"": ""write a quick sort"", ""lang"": ""python""}",gpt-3.5-turbo-1106,gpt-4-0613,gpt-4-0613,4.0,3.0
4,demo,95d7220b5e814c2eadbabaab4decc4f7,nl2code,write a bubble sort in c.,"{""task"": ""write a bubble sort"", ""lang"": ""c""}",deepseek-coder-6.7b-instruct-awq,gpt-3.5-turbo-1106,gpt-4-0613,4.0,3.0
5,demo,a051970d3095432f967e68c3049313dd,nl2code,write a quick sort in python.,"{""task"": ""write a quick sort"", ""lang"": ""python""}",gpt-3.5-turbo-1106,deepseek-coder-6.7b-instruct,gpt-4-0613,4.0,3.0

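Once the history is available, a natural next step is to summarize it per model. The sketch below computes a simple win rate from the six rows shown above, copied by hand into tuples of `(model_name_x, model_name_y, score_x, score_y)`. The `win_rates` helper is hypothetical and not part of ModelArena, and a tie is counted as a game with no winner.

```python
from collections import Counter

# Rows copied from the match history above.
history = [
    ("gpt-4-0613", "gpt-3.5-turbo-1106", 4.0, 3.0),
    ("gpt-3.5-turbo-1106", "deepseek-coder-6.7b-instruct-awq", 3.0, 4.0),
    ("deepseek-coder-6.7b-instruct", "gpt-3.5-turbo-1106", 4.0, 3.0),
    ("gpt-3.5-turbo-1106", "gpt-4-0613", 4.0, 3.0),
    ("deepseek-coder-6.7b-instruct-awq", "gpt-3.5-turbo-1106", 4.0, 3.0),
    ("gpt-3.5-turbo-1106", "deepseek-coder-6.7b-instruct", 4.0, 3.0),
]


def win_rates(history):
    """Return {model_name: wins / games} over all recorded matches."""
    wins, games = Counter(), Counter()
    for model_x, model_y, score_x, score_y in history:
        games[model_x] += 1
        games[model_y] += 1
        if score_x > score_y:
            wins[model_x] += 1
        elif score_y > score_x:
            wins[model_y] += 1
        # Equal scores count as a game played with no winner.
    return {model: wins[model] / games[model] for model in games}


rates = win_rates(history)
for model, rate in sorted(rates.items()):
    print(f"{model}: {rate:.2f}")
```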