# Send test prompts to multiple models for evaluation

### Scenario
You are choosing a large language model for your project.  You have narrowed down the list of candidates by reading about the available models.  Now you want to send test prompts to multiple models to evaluate them.

Notebook sections:
- [Step 1: Set up IBM watsonx.ai foundation model Python library prerequisites](#step1)
- [Step 2: Create a function for prompting a model](#step2)
- [Step 3: Create a simple function for prompting multiple models](#step3)
- [Step 4: Create a function for sending multiple prompts to multiple models](#step4)
- [Step 5: Create a function with model-specific prompt parameter overrides](#step5)
- [Step 6: Create a function with model-specific prompt text overrides](#step6)

By the end of this notebook, you'll have test results to help you decide which model to use for your project at this time:

<img src="https://raw.githubusercontent.com/spackows/watsonx.ai-samples/main/sample-02_prompt-multiple-models/images/sample-02_prompt-multiple-models.png" width="70%" title="Image of DataFrame" />

<a id="step1"></a>
## Step 1: Set up IBM watsonx.ai foundation model Python library prerequisites
Before you can prompt a foundation model in watsonx.ai, you must perform the following setup tasks:
- 1.1 Create an instance of the Watson Machine Learning service
- 1.2 Associate the Watson Machine Learning instance with the current project
- 1.3 Create an IBM Cloud API key
- 1.4 Create a credentials dictionary for Watson Machine Learning
- 1.5 Look up the current project ID

### 1.1 Create an instance of the Watson Machine Learning service
If you don't already have an instance of the IBM Watson Machine Learning service, you can create an instance of the service from the IBM Cloud catalog: <a href="https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/" target="_blank">Watson Machine Learning service</a>

### 1.2 Associate an instance of the Watson Machine Learning service with the current project
The _current project_ is the project in which you are running this notebook.

If an instance of Watson Machine Learning is not already associated with the current project, follow the instructions in this topic to do so: <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/assoc-services.html?context=wx&audience=wdp" target="_blank">Adding associated services to a project</a>

### 1.3 Create an IBM Cloud API key
Create an IBM Cloud API key by following these instructions: <a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui#create_user_key" target="_blank">Creating an IBM Cloud API key</a>

Then paste your new IBM Cloud API key in the code cell below.

In [1]:
g_cloud_apikey = ""

### 1.4 Create a credentials dictionary for Watson Machine Learning
See: [Authentication](https://ibm.github.io/watson-machine-learning-sdk/setup_cloud.html#authentication)

In [2]:
g_wml_credentials = { 
    "url"    : "https://us-south.ml.cloud.ibm.com", 
    "apikey" : g_cloud_apikey
}
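If you would rather not paste the API key into the notebook, one option is to read it from an environment variable. The sketch below assumes a variable named `IBM_CLOUD_APIKEY`, which is a hypothetical name chosen for this example, and a hypothetical helper `build_credentials`. Neither is part of the Watson Machine Learning library.

```python
import os

def build_credentials(env, url="https://us-south.ml.cloud.ibm.com"):
    """Return a credentials dictionary, reading the API key from `env`.

    IBM_CLOUD_APIKEY is a hypothetical variable name used only in this sketch.
    """
    apikey = env.get("IBM_CLOUD_APIKEY", "")
    if not apikey:
        raise ValueError("Set IBM_CLOUD_APIKEY before running the notebook")
    return {"url": url, "apikey": apikey}

# Stand-in environment for illustration; pass os.environ in real use
example_credentials = build_credentials({"IBM_CLOUD_APIKEY": "example-key"})
print(example_credentials["url"])
```

The resulting dictionary has the same two keys as `g_wml_credentials`.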

### 1.5 Look up the current project ID
The _current project_ is the project in which you are running this notebook.  You can get the ID of the current project programmatically by running the following cell.

In [3]:
import os

g_project_id = os.environ["PROJECT_ID"]

<a id="step2"></a>
## Step 2: Create a function for prompting a model

See: [Foundation models Python library](https://ibm.github.io/watson-machine-learning-sdk/foundation_models.html)

In [4]:
from ibm_watson_machine_learning.foundation_models import Model
import json

def generate( model_id, prompt_parameters, prompt_text, b_debug=False ):
    model = Model( model_id, g_wml_credentials, prompt_parameters, g_project_id )
    raw_response = model.generate( prompt_text )
    if b_debug:
        print( "prompt_text:\n'" + prompt_text + "'\n" )
        print( "raw_response:\n" + json.dumps( raw_response, indent=3 ) )
    if ( "results" in raw_response ) \
       and ( len( raw_response["results"] ) > 0 ) \
       and ( "generated_text" in raw_response["results"][0] ):
        return raw_response, raw_response["results"][0]["generated_text"]
    else:
        return raw_response, ""

In [None]:
raw_response, generated_output = generate( "google/flan-t5-xxl", {}, "I took my dog for ", b_debug=True )

print( "\ngenerated_output:\n" + generated_output )
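The response handling in `generate` can be tried without calling the service. The dictionaries below are hand-written stand-ins that contain only the fields the function checks (`results` and `generated_text`); a real response has more fields. The helper name `extract_generated_text` is made up for this sketch.

```python
def extract_generated_text(raw_response):
    """Mirror the checks in generate(): return "" when no text is present."""
    results = raw_response.get("results", [])
    if len(results) > 0 and "generated_text" in results[0]:
        return results[0]["generated_text"]
    return ""

# Hand-written stand-ins for a service response
mock_response = {"results": [{"generated_text": "a walk"}]}
empty_results = {"results": []}
no_results_key = {}

print(repr(extract_generated_text(mock_response)))
print(repr(extract_generated_text(empty_results)))
print(repr(extract_generated_text(no_results_key)))
```

Returning an empty string rather than raising keeps a multi-model test run going when one model returns nothing.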

<a id="step3"></a>
## Step 3: Create a simple function for prompting multiple models

### Use the Python library to list supported models

In [36]:
import re
import textwrap

import pandas as pd
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai.foundation_models import ModelInference

def listAllModelIDs( client ):
    model_enums = client.foundation_models.TextModels
    all_model_ids = [ e.value for e in model_enums ]
    return all_model_ids
    
def modelDetails( client, model_id ):
    model = ModelInference( model_id=model_id, project_id=g_project_id, api_client=client)
    model_details = model.get_details()
    return model_details
    
def isDeprecatedOrWithdrawn( model_details ):
    for lifecycle_event in model_details["lifecycle"]:
        if re.match( r"deprecated|withdrawn", lifecycle_event["id"] ):
            return True
    return False

def detailsSubset( model_details ):
    return { "Provider / Source" : model_details["provider"] + "\n\n" + model_details["source"],
             "Model ID" : model_details["model_id"],
             "Description" : "\n".join( textwrap.wrap( model_details["short_description"], 30 ) ),
             "Use cases" : "- " + "\n- ".join( sorted( model_details["task_ids"] ) ) }

def modelsDetails( client, model_ids ):
    all_model_details = []
    for model_id in model_ids:
        model_details = modelDetails( client, model_id )
        if isDeprecatedOrWithdrawn( model_details ):
            continue
        model_details_subset = detailsSubset( model_details )
        all_model_details.append( model_details_subset )
    return all_model_details

def modelDetailsDF():
    client = APIClient( g_wml_credentials )
    model_ids = listAllModelIDs( client )
    models_details = modelsDetails( client, model_ids )
    models_details_df = pd.DataFrame( models_details )
    return models_details_df

In [None]:
models_details_df = modelDetailsDF()

In [None]:
from tabulate import tabulate

print( tabulate( models_details_df, headers="keys", tablefmt="grid", showindex=False ) )
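The lifecycle check can be tried on hand-written data. The dictionaries below are made up for illustration (including the `available` event id) and only mimic the `lifecycle` list of events with an `id` field that `isDeprecatedOrWithdrawn` reads. Unlike the notebook function, this sketch also tolerates a missing `lifecycle` key.

```python
import re

def is_deprecated_or_withdrawn(model_details):
    """Return True if any lifecycle event id starts with 'deprecated' or 'withdrawn'."""
    for lifecycle_event in model_details.get("lifecycle", []):
        # re.match anchors at the start of the string
        if re.match(r"deprecated|withdrawn", lifecycle_event["id"]):
            return True
    return False

# Made-up details for two hypothetical models
active_model = {"lifecycle": [{"id": "available"}]}
retired_model = {"lifecycle": [{"id": "available"}, {"id": "deprecated"}]}

print(is_deprecated_or_withdrawn(active_model))
print(is_deprecated_or_withdrawn(retired_model))
```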

### [Optional] Filter models by use case

In [81]:
def listModelsByUseCase( models_details_df, use_cases ):
    if( len( use_cases ) < 1 ):
        return list( models_details_df["Model ID"] )
    regex_str = "|".join( [ uc.strip().lower() for uc in use_cases ] )
    models_subset_df = models_details_df[ models_details_df["Use cases"].str.contains( regex_str, regex= True, na=False) ]
    return list( models_subset_df["Model ID"] )

In [None]:
model_ids = listModelsByUseCase( models_details_df, [ "generation" ] )
model_ids
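To see how the use-case filter behaves, here is a small sketch on a hand-made DataFrame. The model IDs are invented for illustration. The requested use cases are lowercased before matching but the column text is not, so the match assumes the task IDs in the "Use cases" column are already lowercase.

```python
import pandas as pd

# Invented rows in the same shape as models_details_df
sample_df = pd.DataFrame({
    "Model ID": ["example/model-a", "example/model-b", "example/model-c"],
    "Use cases": [
        "- generation\n- summarization",
        "- classification",
        "- extraction\n- generation",
    ],
})

use_cases = ["Summarization ", "extraction"]
# Join the cleaned use cases into one alternation pattern
regex_str = "|".join(uc.strip().lower() for uc in use_cases)
mask = sample_df["Use cases"].str.contains(regex_str, regex=True, na=False)
matching_ids = list(sample_df.loc[mask, "Model ID"])
print(matching_ids)
```

A model is kept if it lists any one of the requested use cases, not all of them.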

### Prompt models

In [70]:
import re

def promptModels1( prompt_text, model_ids_arr, prompt_parameters={}, b_debug=False ):
    all_results_arr = []
    for model_id in model_ids_arr:
        raw_response, generated_output = generate( model_id, prompt_parameters, prompt_text, b_debug )
        generated_output = generated_output.strip()
        if( "system" in raw_response ):
            del( raw_response["system"] )
        all_results_arr.append( { "model_id"         : model_id, 
                                  "model_short_id"   : re.sub( r"^.*\/", "", model_id ),
                                  "prompt_text"      : prompt_text,
                                  "raw_response"     : raw_response, 
                                  "generated_output" : generated_output } )
    return all_results_arr

In [71]:
prompt_parameters = {
    "decoding_method" : "greedy",
    "min_new_tokens"  : 0,
    "max_new_tokens"  : 20
}

In [72]:
prompt_text = "I took my dog for a "

In [73]:
results_arr = promptModels1( prompt_text, model_ids, prompt_parameters )

In [None]:
print( json.dumps( results_arr, indent=3 ) )
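The `model_short_id` field is the model ID with the provider prefix removed. A minimal sketch of that substitution, using a hypothetical helper name `short_model_id`:

```python
import re

def short_model_id(model_id):
    """Strip everything up to and including the last slash."""
    # The greedy ".*" consumes through the final "/" in the ID
    return re.sub(r"^.*/", "", model_id)

for model_id in ["google/flan-t5-xxl", "ibm/granite-13b-instruct-v2", "no-provider-prefix"]:
    print(model_id, "->", short_model_id(model_id))
```

An ID that contains no slash is returned unchanged.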

### Display results in a DataFrame

In [65]:
import pandas as pd

def styleDataFrame( styler ):
    styler.set_properties( **{ "text-align" : "left", "vertical-align" : "top", "padding" : "10px 20px 10px 20px", "font-size" : "120%" } )
    styler.set_table_styles( [ dict( selector="th", props="text-align: center" ) ] )
    return styler

def styleModelIDCol( styler ):
    f_model_id = lambda v: "width: 190px; min-width: 190px; max-width: 190px;"
    styler.applymap( f_model_id, subset=[ "model_short_id" ] )
    return styler

def resultsDF1( results_arr ):
    df_org = pd.DataFrame( results_arr )
    result_df = df_org[ [ "model_short_id", "generated_output" ] ]
    result_df = result_df.sort_values( [ "model_short_id" ] ).reset_index( drop=True )
    result_df = result_df.replace( { "\\n" : "<br/>" }, regex=True )
    styler = result_df.style.pipe( styleDataFrame )
    styler = styler.pipe( styleModelIDCol )
    return styler

In [66]:
df = resultsDF1( results_arr )

print( "Prompt text:\n'" + prompt_text + "'\n" )
print( "Results:" )
df

Prompt text:
'I took my dog for a '

Results:


| | model_short_id | generated_output |
|---|---|---|
| 0 | flan-ul2 | walk |
| 1 | granite-13b-instruct-v2 | walkI took my dog for a walk. |
| 2 | granite-20b-code-instruct | ride on his new remote control car. On the first day, I took my dog 20 |
| 3 | granite-3-2-8b-instruct | 10-mile hike today. He's a 10-year-old German |
| 4 | granite-3-2b-instruct | ride in the car. I noticed that the car's interior was quite dusty. I decided to |
| 5 | granite-3-8b-instruct | 10-mile hike today. He's a 10-year-old German |
| 6 | granite-34b-code-instruct | ride on the Ferris wheel. On the last ride, when my dog was at the |
| 7 | granite-3b-code-instruct | ride on the way to work today. I was a little worried about him, but he was |
| 8 | granite-8b-code-instruct | ride on the roller coaster at the park. He was really scared and started to |
| 9 | llama-3-2-11b-vision-instruct | 3 mile walk today. The weather was perfect, sunny and cool. We walked through a beautiful park |


<a id="step4"></a>
## Step 4: Create a function for sending multiple prompts to multiple models

In [75]:
def promptModels2( prompts_arr, model_ids_arr, prompt_parameters={}, b_debug=False ):
    all_results_arr = []
    for i in range( len( prompts_arr ) ):
        prompt_text = prompts_arr[i]
        for model_id in model_ids_arr:
            raw_response, generated_output = generate( model_id, prompt_parameters, prompt_text, b_debug )
            generated_output = generated_output.strip()
            if( "system" in raw_response ):
                del( raw_response["system"] )
            all_results_arr.append( { "prompt_num"       : i,
                                      "model_id"         : model_id, 
                                      "prompt_text"      : prompt_text,
                                      "model_short_id"   : re.sub( r"^.*\/", "", model_id ),
                                      "raw_response"     : raw_response, 
                                      "generated_output" : generated_output } )
    return all_results_arr

In [76]:
prompts_arr = [
    "I took my dog for a ",
    "I took my cat for a "
]

In [83]:
model_ids = [
    "google/flan-t5-xxl",
    "ibm/granite-13b-instruct-v2",
    "meta-llama/llama-3-2-1b-instruct"
]

In [84]:
results_arr = promptModels2( prompts_arr, model_ids, prompt_parameters )
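Because every prompt is sent to every model, the number of calls is the number of prompts times the number of models. Before running a larger test, you can list the planned calls without contacting the service. This sketch uses `itertools.product`, which yields pairs in the same order as the nested loops in `promptModels2`; the variable name `planned_calls` is made up for the example.

```python
from itertools import product

prompts_arr = [
    "I took my dog for a ",
    "I took my cat for a ",
]
model_ids = [
    "google/flan-t5-xxl",
    "ibm/granite-13b-instruct-v2",
    "meta-llama/llama-3-2-1b-instruct",
]

# One (prompt_num, model_id) pair per call that would be made
planned_calls = [
    (prompt_num, model_id)
    for (prompt_num, _prompt_text), model_id in product(enumerate(prompts_arr), model_ids)
]

print(len(planned_calls), "calls planned")
for prompt_num, model_id in planned_calls:
    print(prompt_num, model_id)
```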

### Make a new display function that includes prompt text

In [85]:
g_prompt_col_width = "200px"

def stylePromptCol( styler ):
    f_prompt = lambda v: "width: "     + g_prompt_col_width + "; " + \
                         "min-width: " + g_prompt_col_width + "; " + \
                         "max-width: " + g_prompt_col_width + ";"
    styler.applymap( f_prompt, subset=[ "prompt_text" ] )
    return styler

def resultsDF2( results_arr ):
    df_org = pd.DataFrame( results_arr )
    result_df = df_org[ [ "prompt_num", "model_short_id", "prompt_text", "generated_output" ] ]
    result_df = result_df.sort_values( [ "prompt_num", "model_short_id" ] ).reset_index( drop=True )
    result_df = result_df.drop( "prompt_num", axis=1 )
    result_df = result_df.replace( { "\\n" : "<br/>" }, regex=True )
    styler = result_df.style.pipe( styleDataFrame )
    styler = styler.pipe( styleModelIDCol )
    styler = styler.pipe( stylePromptCol )
    return styler

In [86]:
resultsDF2( results_arr )

| | model_short_id | prompt_text | generated_output |
|---|---|---|---|
| 0 | flan-t5-xxl | I took my dog for a | walk. |
| 1 | granite-13b-instruct-v2 | I took my dog for a | walkI took my dog for a walk. |
| 2 | llama-3-2-1b-instruct | I took my dog for a | 30-minute walk around the block. I was looking forward to a relaxing afternoon, but as I was |
| 3 | flan-t5-xxl | I took my cat for a | walk. |
| 4 | granite-13b-instruct-v2 | I took my cat for a | cat scanThe cat scan was negative. |
| 5 | llama-3-2-1b-instruct | I took my cat for a | 30-minute walk around the block. I was excited to see how much she would enjoy the fresh air |


<a id="step5"></a>
## Step 5: Create a function with model-specific prompt parameter overrides

In [89]:
def promptModels3( prompts_arr, models_json, prompt_parameters={}, b_debug=False ):
    all_results_arr = []
    for i in range( len( prompts_arr ) ):
        prompt_text = prompts_arr[i]
        for model_id in models_json.keys():
            model = models_json[ model_id ]
            # Copy the shared parameters so that one model's overrides
            # do not leak into later models or into the caller's dictionary
            model_parameters = dict( prompt_parameters )
            if( "parameter_overrides" in model ):
                model_parameters.update( model["parameter_overrides"] )
            raw_response, generated_output = generate( model_id, model_parameters, prompt_text, b_debug )
            generated_output = generated_output.strip()
            if( "system" in raw_response ):
                del( raw_response["system"] )
            all_results_arr.append( { "prompt_num"       : i,
                                      "model_id"         : model_id, 
                                      "model_short_id"   : re.sub( r"^.*\/", "", model_id ),
                                      "prompt_text"      : prompt_text,
                                      "raw_response"     : raw_response, 
                                      "generated_output" : generated_output } )
    return all_results_arr

In [87]:
models_json = {
    "google/flan-t5-xxl" : {},
    "ibm/granite-13b-instruct-v2"      : { "parameter_overrides" : { "max_new_tokens" : 60 } },
    "meta-llama/llama-3-2-1b-instruct" : { "parameter_overrides" : { "max_new_tokens" : 80, "stop_sequences" : [ "\n" ] } }
}

In [90]:
results_arr = promptModels3( prompts_arr, models_json, prompt_parameters )

In [91]:
resultsDF2( results_arr )

| | model_short_id | prompt_text | generated_output |
|---|---|---|---|
| 0 | flan-t5-xxl | I took my dog for a | walk. |
| 1 | granite-13b-instruct-v2 | I took my dog for a | walkI took my dog for a walk. |
| 2 | llama-3-2-1b-instruct | I took my dog for a | 30-minute walk around the block. I was looking forward to a relaxing afternoon, but as I was walking, I noticed that my dog was acting strangely. He was panting heavily and seemed to be getting more and more agitated by the minute. |
| 3 | flan-t5-xxl | I took my cat for a | walk. |
| 4 | granite-13b-instruct-v2 | I took my cat for a | cat scanThe cat scan was negative. |
| 5 | llama-3-2-1b-instruct | I took my cat for a | 30-minute walk around the block. I was excited to see how much she would enjoy the fresh air and exercise. As we strolled, I noticed that she was sniffing around a few bushes and then suddenly darted off towards a nearby alleyway. |
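When overrides are written into a shared parameters dictionary, they persist for every model and prompt that follows in the run. One way to keep each model's overrides separate is to merge them into a fresh copy per call. The sketch below uses a hypothetical helper, `merged_parameters`, to show the idea with the parameter values from this step.

```python
def merged_parameters(base_parameters, model_config):
    """Return a new dict: base parameters plus this model's overrides."""
    merged = dict(base_parameters)  # shallow copy; base is left untouched
    merged.update(model_config.get("parameter_overrides", {}))
    return merged

base_parameters = {
    "decoding_method": "greedy",
    "min_new_tokens": 0,
    "max_new_tokens": 20,
}

models_json = {
    "google/flan-t5-xxl": {},
    "ibm/granite-13b-instruct-v2": {"parameter_overrides": {"max_new_tokens": 60}},
    "meta-llama/llama-3-2-1b-instruct": {
        "parameter_overrides": {"max_new_tokens": 80, "stop_sequences": ["\n"]}
    },
}

for model_id, model_config in models_json.items():
    print(model_id, merged_parameters(base_parameters, model_config))

# The shared dictionary still holds the original values
print(base_parameters)
```

With this approach, a model that has no overrides always receives the base parameters, regardless of which models were prompted before it.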


<a id="step6"></a>
## Step 6: Create a function with model-specific prompt text overrides

In [92]:
def promptModels4( prompts_arr, models_json, prompt_parameters={}, b_debug=False ):
    all_results_arr = []
    for i in range( len( prompts_arr ) ):
        prompt_text_org = prompts_arr[i]
        for model_id in models_json.keys():
            model = models_json[ model_id ]
            # Copy the shared parameters so that one model's overrides
            # do not leak into later models or into the caller's dictionary
            model_parameters = dict( prompt_parameters )
            if( "parameter_overrides" in model ):
                model_parameters.update( model["parameter_overrides"] )
            # Insert the prompt into the model's template, if it has one
            prompt_text = model["prompt_template"] % ( prompt_text_org ) if ( "prompt_template" in model ) else prompt_text_org
            raw_response, generated_output = generate( model_id, model_parameters, prompt_text, b_debug )
            generated_output = generated_output.strip()
            if( "system" in raw_response ):
                del( raw_response["system"] )
            all_results_arr.append( { "prompt_num"       : i,
                                      "model_id"         : model_id, 
                                      "model_short_id"   : re.sub( r"^.*\/", "", model_id ),
                                      "prompt_text"      : prompt_text,
                                      "raw_response"     : raw_response, 
                                      "generated_output" : generated_output } )
    return all_results_arr

In [107]:
flan_template = """I took my bird for a flight around the yard, it said *tweet*.
I took my horse for a ride along the trail, it said *snort*.
I took my fish for a swim in the lake, it said *bubbles*.
I took my mouse for a cycle around the block, it said *squeak*.
I took my cow for a drive to the mountains, it said *moo*.
I took my donkey for a trek up the hill, it said *heehaw*.
%s"""

granite_template = """<|start_of_role|>system<|end_of_role|>
Continue the given text.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>%s <|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"""

models_json = {
    "google/flan-t5-xxl" : { "prompt_template" : flan_template },
    "ibm/granite-13b-instruct-v2" : { "parameter_overrides" : { "max_new_tokens" : 60 }, "prompt_template" : granite_template },
    "meta-llama/llama-3-2-1b-instruct" : { "parameter_overrides" : { "max_new_tokens" : 80, "stop_sequences" : [ "\n" ] } }
}

In [108]:
results_arr = promptModels4( prompts_arr, models_json, prompt_parameters )

In [None]:
g_prompt_col_width = "440px"

resultsDF2( results_arr )
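The templates rely on Python's `%` string formatting: the `%s` placeholder is replaced by the original prompt text. Here is a minimal sketch with a hypothetical helper, `apply_template`, and two shortened example templates. Note that a literal percent sign inside a template must be written as `%%`, otherwise the formatting step raises an error.

```python
def apply_template(model_config, prompt_text):
    """Insert prompt_text into the model's template, or return it unchanged."""
    template = model_config.get("prompt_template")
    if template is None:
        return prompt_text
    # A one-element tuple makes the single substitution explicit
    return template % (prompt_text,)

few_shot_template = "I took my bird for a flight around the yard, it said *tweet*.\n%s"
percent_template = "Answer with 100%% confidence.\n%s"

print(apply_template({"prompt_template": few_shot_template}, "I took my dog for a "))
print(apply_template({"prompt_template": percent_template}, "I took my dog for a "))
print(apply_template({}, "I took my dog for a "))
```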

### Create a new display function to view generated output only

In [110]:
def styleOutputCol( styler, col_names_arr ):
    f_prompt = lambda v: "width: 350px; " + \
                         "min-width: 350px; " + \
                         "max-width: 350px;"
    styler.applymap( f_prompt, subset=col_names_arr )
    return styler


def resultsDF3( results_arr ):
    
    df_org = pd.DataFrame( results_arr )
    result_df = df_org[ [ "prompt_num", "model_short_id", "generated_output" ] ]
    result_df = result_df.sort_values( [ "prompt_num", "model_short_id" ] )
    result_df = result_df.reset_index( drop=True )
    result_df = result_df.replace( { "\\n" : "<br/>" }, regex=True )
    
    new_df = pd.DataFrame( columns=[ "model_short_id" ] )
    prompt_nums_arr = sorted( result_df[ "prompt_num" ].unique() )
    col_names_arr = []
    for prompt_num in prompt_nums_arr:
        col_name = "output " + str( prompt_num )
        col_names_arr.append( col_name)
        df_tmp = result_df[ result_df["prompt_num"] == prompt_num ]
        df_tmp = df_tmp.drop( "prompt_num", axis=1 )
        df_tmp = df_tmp.rename( columns={ "generated_output": col_name } )
        new_df = new_df.merge( df_tmp, how="right", on="model_short_id" )
    
    styler = new_df.style.pipe( styleDataFrame )
    styler = styler.pipe( styleModelIDCol )
    styler = styler.pipe( styleOutputCol, col_names_arr )
        
    return styler

In [111]:
resultsDF3( results_arr )

| | model_short_id | output 0 | output 1 |
|---|---|---|---|
| 0 | flan-t5-xxl | walk around the block, it said \*woof\*. | walk around the block, it said \*meow\*. |
| 1 | granite-13b-instruct-v2 | to the vet. | to the vet. |
| 2 | llama-3-2-1b-instruct | 30-minute walk around the block. I was looking forward to a relaxing afternoon, but as I was walking, I noticed that my dog was acting strangely. He was panting heavily and seemed to be getting more and more agitated by the minute. | 30-minute walk around the block. I was excited to see how much she would enjoy the fresh air and exercise. As we strolled, I noticed that she was sniffing around a few bushes and then suddenly darted off towards a nearby alleyway. |
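The one-column-per-prompt layout can also be produced with `DataFrame.pivot` instead of the repeated merge used in `resultsDF3`. The sketch below is an alternative, not the notebook's method, and it runs on invented rows with the same three fields; it assumes each model appears once per prompt, since `pivot` raises an error on duplicate pairs.

```python
import pandas as pd

# Invented results in the same shape as results_arr (other fields omitted)
rows = [
    {"prompt_num": 0, "model_short_id": "model-b", "generated_output": "dog walk"},
    {"prompt_num": 0, "model_short_id": "model-a", "generated_output": "dog ride"},
    {"prompt_num": 1, "model_short_id": "model-b", "generated_output": "cat walk"},
    {"prompt_num": 1, "model_short_id": "model-a", "generated_output": "cat nap"},
]

df = pd.DataFrame(rows)

# One row per model, one column per prompt number
wide_df = df.pivot(index="model_short_id", columns="prompt_num", values="generated_output")
wide_df.columns = ["output " + str(prompt_num) for prompt_num in wide_df.columns]
wide_df = wide_df.reset_index()

print(wide_df)
```

The resulting DataFrame can be passed through the same styling functions, because it has a `model_short_id` column and `output N` columns.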
