# Send test prompts to multiple models for evaluation

### Scenario
You are choosing a large language model for your project. You have narrowed down the list of candidate models by reading about the available models. Now, you want to send test prompts to multiple models for evaluation.

Notebook sections:
- [Step 1: Set up IBM watsonx.ai foundation model Python library prerequisites](#step1)
- [Step 2: Create a function for prompting a model](#step2)
- [Step 3: Create a simple function for prompting multiple models](#step3)
- [Step 4: Create a function for sending multiple prompts to multiple models](#step4)
- [Step 5: Create a function with model-specific prompt parameter overrides](#step5)
- [Step 6: Create a function with model-specific prompt text overrides](#step6)

By the end of this notebook, you'll have test results to help you decide which model to use for your project at this time:

<img src="https://raw.githubusercontent.com/spackows/watsonx.ai-samples/main/sample-02_prompt-multiple-models/images/sample-02_prompt-multiple-models.png" width="70%" title="Image of DataFrame" />

<a id="step1"></a>
## Step 1: Set up IBM watsonx.ai foundation model Python library prerequisites
Before you can prompt a foundation model in watsonx.ai, you must perform the following setup tasks:
- 1.1 Create an instance of the Watson Machine Learning service
- 1.2 Associate the Watson Machine Learning instance with the current project
- 1.3 Create an IBM Cloud API key
- 1.4 Create a credentials dictionary for Watson Machine Learning
- 1.5 Look up the current project ID

### 1.1 Create an instance of the Watson Machine Learning service
If you don't already have an instance of the IBM Watson Machine Learning service, you can create an instance of the service from the IBM Cloud catalog: <a href="https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/" target="_blank">Watson Machine Learning service</a>

### 1.2 Associate an instance of the Watson Machine Learning service with the current project
The _current project_ is the project in which you are running this notebook.

If an instance of Watson Machine Learning is not already associated with the current project, follow the instructions in this topic to do so: <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/assoc-services.html?context=wx&audience=wdp" target="_blank">Adding associated services to a project</a>

### 1.3 Create an IBM Cloud API key
Create an IBM Cloud API key by following these instructions: <a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui#create_user_key" target="_blank">Creating an IBM Cloud API key</a>

Then paste your new IBM Cloud API key in the code cell below.

In [None]:
g_cloud_apikey = ""
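
If you would rather not save the key in the notebook itself, one option is to read it from an environment variable. This is a minimal sketch, not part of the original sample: the helper `read_apikey` and the variable name `IBM_CLOUD_APIKEY` are hypothetical, so use whatever name your environment actually defines.

```python
import os

def read_apikey( env_var_name="IBM_CLOUD_APIKEY", default="" ):
    # Hypothetical helper: look up the key in the environment and
    # fall back to the default when the variable is not set
    return os.environ.get( env_var_name, default ).strip()

# An empty string means the variable was not set
apikey_from_env = read_apikey()
print( "API key found in environment: " + str( apikey_from_env != "" ) )
```

If the lookup returns an empty string, paste the key into the cell above instead.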

### 1.4 Create a credentials dictionary for Watson Machine Learning
See: [Authentication](https://ibm.github.io/watson-machine-learning-sdk/setup_cloud.html#authentication)

In [None]:
g_wml_credentials = { 
    "url"    : "https://us-south.ml.cloud.ibm.com", 
    "apikey" : g_cloud_apikey
}

### 1.5 Look up the current project ID
The _current project_ is the project in which you are running this notebook.  You can get the ID of the current project programmatically by running the following cell.

In [None]:
import os

g_project_id = os.environ["PROJECT_ID"]

<a id="step2"></a>
## Step 2: Create a function for prompting a model

See: [Foundation models Python library](https://ibm.github.io/watson-machine-learning-sdk/foundation_models.html)

In [None]:
from ibm_watson_machine_learning.foundation_models import Model
import json

def generate( model_id, prompt_parameters, prompt_text, b_debug=False ):
    model = Model( model_id, g_wml_credentials, prompt_parameters, g_project_id )
    raw_response = model.generate( prompt_text )
    if b_debug:
        print( "prompt_text:\n'" + prompt_text + "'\n" )
        print( "raw_response:\n" + json.dumps( raw_response, indent=3 ) )
    if ( "results" in raw_response ) \
       and ( len( raw_response["results"] ) > 0 ) \
       and ( "generated_text" in raw_response["results"][0] ):
        return raw_response, raw_response["results"][0]["generated_text"]
    else:
        return raw_response, ""

In [None]:
raw_response, generated_output = generate( "google/flan-t5-xxl", {}, "I took my dog for ", b_debug=True )

print( "\ngenerated_output:\n" + generated_output )
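
The extraction logic in `generate` can be tried without calling a model. The sketch below pulls that check into a hypothetical helper, `extract_generated_text`, and runs it on a hand-written dictionary. The sample dictionary only imitates the `results` / `generated_text` fields that `generate` looks for; real responses carry more fields.

```python
def extract_generated_text( raw_response ):
    # Return the first generated_text value, or "" when the
    # expected fields are missing (for example, an error response)
    results = raw_response.get( "results" ) or []
    if results and isinstance( results[0], dict ):
        return results[0].get( "generated_text", "" )
    return ""

# Illustrative response only; no model is called
sample_response = { "results" : [ { "generated_text" : "a walk in the park" } ] }

print( extract_generated_text( sample_response ) )
print( repr( extract_generated_text( { "errors" : [] } ) ) )
```

Returning an empty string instead of raising an error keeps a multi-model loop running when one model fails.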

<a id="step3"></a>
## Step 3: Create a simple function for prompting multiple models

In [None]:
import re

def promptModels1( prompt_text, model_ids_arr, prompt_parameters={}, b_debug=False ):
    all_results_arr = []
    for model_id in model_ids_arr:
        raw_response, generated_output = generate( model_id, prompt_parameters, prompt_text, b_debug )
        generated_output = generated_output.strip()
        if( "system" in raw_response ):
            del( raw_response["system"] )
        all_results_arr.append( { "model_id"         : model_id, 
                                  "model_short_id"   : re.sub( r"^.*\/", "", model_id ),
                                  "prompt_text"      : prompt_text,
                                  "raw_response"     : raw_response, 
                                  "generated_output" : generated_output } )
    return all_results_arr
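
The `model_short_id` value comes from a regular expression that removes everything up to and including the last `/` in the model ID. A small sketch of that step on its own, wrapped in a hypothetical helper named `short_model_id`:

```python
import re

def short_model_id( model_id ):
    # The greedy ".*" runs to the last "/", so only the final path segment remains
    return re.sub( r"^.*/", "", model_id )

print( short_model_id( "google/flan-t5-xxl" ) )
print( short_model_id( "flan-t5-xxl" ) )
```

An ID with no `/` is returned unchanged.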

In [None]:
prompt_text = "I took my dog for a "

In [None]:
prompt_parameters = {
    "decoding_method" : "greedy",
    "min_new_tokens"  : 0,
    "max_new_tokens"  : 20
}

### Use the Python library to list supported model IDs

In [None]:
from ibm_watson_machine_learning.foundation_models.utils.enums import ModelTypes

model_ids = [ e.value for e in ModelTypes ]
model_ids

In [None]:
results_arr = promptModels1( prompt_text, model_ids, prompt_parameters )

In [None]:
print( json.dumps( results_arr, indent=3 ) )

### Create a function for displaying results in a DataFrame

In [None]:
import pandas as pd

def styleDataFrame( styler ):
    styler.set_properties( **{ "text-align" : "left", "vertical-align" : "top", "padding" : "10px 20px 10px 20px", "font-size" : "120%" } )
    styler.set_table_styles( [ dict( selector="th", props="text-align: center" ) ] )
    return styler

def styleModelIDCol( styler ):
    f_model_id = lambda v: "width: 190px; min-width: 190px; max-width: 190px;"
    styler.applymap( f_model_id, subset=[ "model_short_id" ] )
    return styler

def resultsDF1( results_arr ):
    df_org = pd.DataFrame( results_arr )
    result_df = df_org[ [ "model_short_id", "generated_output" ] ]
    result_df = result_df.sort_values( [ "model_short_id" ] ).reset_index( drop=True )
    result_df = result_df.replace( { "\\n" : "<br/>" }, regex=True )
    styler = result_df.style.pipe( styleDataFrame )
    styler = styler.pipe( styleModelIDCol )
    return styler

In [None]:
df = resultsDF1( results_arr )

print( "Prompt text:\n'" + prompt_text + "'\n" )
print( "Results:" )
df
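
The data preparation inside `resultsDF1` can be checked without styling or a model call. The sketch below applies the same sort and newline replacement to made-up results. The names `sample_results_arr` and `sample_df`, and the sample text, are illustrative only.

```python
import pandas as pd

# Made-up results for illustration only; no model is called
sample_results_arr = [
    { "model_short_id" : "model-b", "generated_output" : "walk.\nIt was sunny." },
    { "model_short_id" : "model-a", "generated_output" : "long walk" }
]

sample_df = pd.DataFrame( sample_results_arr )
# Sort rows by model name, then renumber them from zero
sample_df = sample_df.sort_values( [ "model_short_id" ] ).reset_index( drop=True )
# Replace newline characters with HTML line breaks so multi-line output renders in the table
sample_df = sample_df.replace( { "\\n" : "<br/>" }, regex=True )

print( sample_df.to_string() )
```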

<a id="step4"></a>
## Step 4: Create a function for sending multiple prompts to multiple models

In [None]:
def promptModels2( prompts_arr, model_ids_arr, prompt_parameters={}, b_debug=False ):
    all_results_arr = []
    for i in range( len( prompts_arr ) ):
        prompt_text = prompts_arr[i]
        for model_id in model_ids_arr:
            raw_response, generated_output = generate( model_id, prompt_parameters, prompt_text, b_debug )
            generated_output = generated_output.strip()
            if( "system" in raw_response ):
                del( raw_response["system"] )
            all_results_arr.append( { "prompt_num"       : i,
                                      "model_id"         : model_id, 
                                      "prompt_text"      : prompt_text,
                                      "model_short_id"   : re.sub( r"^.*\/", "", model_id ),
                                      "raw_response"     : raw_response, 
                                      "generated_output" : generated_output } )
    return all_results_arr

In [None]:
prompts_arr = [
    "I took my dog for a ",
    "I took my cat for a "
]

In [None]:
model_ids = [
    "bigscience/mt0-xxl",
    "google/flan-t5-xxl",
    "eleutherai/gpt-neox-20b",
    "ibm/granite-13b-chat-v1",
    "meta-llama/llama-2-13b-chat"
]

In [None]:
results_arr = promptModels2( prompts_arr, model_ids, prompt_parameters )
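
Because every prompt is sent to every model, the number of calls grows quickly. Before spending tokens, it can help to dry-run the loop with a stand-in for `generate`. The sketch below is an assumption-laden variant, not the function above: `prompt_models_dry_run`, `fake_generate`, and the placeholder model IDs are all hypothetical.

```python
import re

def fake_generate( model_id, prompt_parameters, prompt_text, b_debug=False ):
    # Stand-in for generate(): echoes its inputs instead of calling a model
    text = "[" + model_id + "] " + prompt_text
    return { "results" : [ { "generated_text" : text } ] }, text

def prompt_models_dry_run( prompts_arr, model_ids_arr, generate_fn, prompt_parameters=None ):
    # Same nested loop as promptModels2, with the generate function passed in
    parameters = dict( prompt_parameters or {} )
    all_results_arr = []
    for i, prompt_text in enumerate( prompts_arr ):
        for model_id in model_ids_arr:
            raw_response, generated_output = generate_fn( model_id, parameters, prompt_text )
            all_results_arr.append( { "prompt_num"       : i,
                                      "model_id"         : model_id,
                                      "model_short_id"   : re.sub( r"^.*/", "", model_id ),
                                      "prompt_text"      : prompt_text,
                                      "raw_response"     : raw_response,
                                      "generated_output" : generated_output.strip() } )
    return all_results_arr

dry_run_results = prompt_models_dry_run( [ "I took my dog for a ", "I took my cat for a " ],
                                         [ "org-a/model-1", "org-b/model-2" ],
                                         fake_generate )
# Two prompts times two models gives four result rows
print( len( dry_run_results ) )
```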

### Make a new display function that includes prompt text

In [None]:
g_prompt_col_width = "200px"

def stylePromptCol( styler ):
    f_prompt = lambda v: "width: "     + g_prompt_col_width + "; " + \
                         "min-width: " + g_prompt_col_width + "; " + \
                         "max-width: " + g_prompt_col_width + ";"
    styler.applymap( f_prompt, subset=[ "prompt_text" ] )
    return styler

def resultsDF2( results_arr ):
    df_org = pd.DataFrame( results_arr )
    result_df = df_org[ [ "prompt_num", "model_short_id", "prompt_text", "generated_output" ] ]
    result_df = result_df.sort_values( [ "prompt_num", "model_short_id" ] ).reset_index( drop=True )
    result_df = result_df.drop( "prompt_num", axis=1 )
    result_df = result_df.replace( { "\\n" : "<br/>" }, regex=True )
    styler = result_df.style.pipe( styleDataFrame )
    styler = styler.pipe( styleModelIDCol )
    styler = styler.pipe( stylePromptCol )
    return styler

In [None]:
resultsDF2( results_arr )

<a id="step5"></a>
## Step 5: Create a function with model-specific prompt parameter overrides

In [None]:
def promptModels3( prompts_arr, models_json, prompt_parameters={}, b_debug=False ):
    all_results_arr = []
    for i in range( len( prompts_arr ) ):
        prompt_text = prompts_arr[i]
        for model_id in models_json.keys():
            model = models_json[ model_id ]
            # Apply overrides to a copy so they do not leak into later models
            # or change the caller's dictionary
            model_parameters = dict( prompt_parameters )
            if( "parameter_overrides" in model ):
                model_parameters.update( model["parameter_overrides"] )
            raw_response, generated_output = generate( model_id, model_parameters, prompt_text, b_debug )
            generated_output = generated_output.strip()
            if( "system" in raw_response ):
                del( raw_response["system"] )
            all_results_arr.append( { "prompt_num"       : i,
                                      "model_id"         : model_id, 
                                      "model_short_id"   : re.sub( r"^.*\/", "", model_id ),
                                      "prompt_text"      : prompt_text,
                                      "raw_response"     : raw_response, 
                                      "generated_output" : generated_output } )
    return all_results_arr

In [None]:
models_json = {
    "bigscience/mt0-xxl" : {},
    "google/flan-t5-xxl" : {},
    "eleutherai/gpt-neox-20b"     : { "parameter_overrides" : { "stop_sequences" : [ "\n" ] } },
    "ibm/granite-13b-chat-v1"     : { "parameter_overrides" : { "max_new_tokens" : 60 } },
    "meta-llama/llama-2-13b-chat" : { "parameter_overrides" : { "max_new_tokens" : 80, "stop_sequences" : [ "\n" ] } }
}
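
The effect of `parameter_overrides` can be previewed without calling a model. This is a minimal sketch, assuming a hypothetical helper `merged_parameters`, that applies one model's overrides to a copy of the shared parameters so the shared dictionary stays untouched:

```python
def merged_parameters( shared_parameters, model_config ):
    # Copy the shared parameters, then apply this model's overrides to the copy
    merged = dict( shared_parameters )
    merged.update( model_config.get( "parameter_overrides", {} ) )
    return merged

shared = { "decoding_method" : "greedy", "min_new_tokens" : 0, "max_new_tokens" : 20 }

sample_models = {
    "google/flan-t5-xxl"      : {},
    "ibm/granite-13b-chat-v1" : { "parameter_overrides" : { "max_new_tokens" : 60 } }
}

for sample_model_id, sample_config in sample_models.items():
    print( sample_model_id, merged_parameters( shared, sample_config ) )
```

Working on a copy matters here: an override meant for one model should not carry over to the next model in the loop.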

In [None]:
results_arr = promptModels3( prompts_arr, models_json, prompt_parameters )

In [None]:
resultsDF2( results_arr )

<a id="step6"></a>
## Step 6: Create a function with model-specific prompt text overrides

In [None]:
def promptModels4( prompts_arr, models_json, prompt_parameters={}, b_debug=False ):
    all_results_arr = []
    for i in range( len( prompts_arr ) ):
        prompt_text_org = prompts_arr[i]
        for model_id in models_json.keys():
            model = models_json[ model_id ]
            # Apply overrides to a copy so they do not leak into later models
            # or change the caller's dictionary
            model_parameters = dict( prompt_parameters )
            if( "parameter_overrides" in model ):
                model_parameters.update( model["parameter_overrides"] )
            # Wrap the prompt in the model's template, if one is defined
            prompt_text = model["prompt_template"] % ( prompt_text_org, ) if ( "prompt_template" in model ) else prompt_text_org
            raw_response, generated_output = generate( model_id, model_parameters, prompt_text, b_debug )
            generated_output = generated_output.strip()
            if( "system" in raw_response ):
                del( raw_response["system"] )
            all_results_arr.append( { "prompt_num"       : i,
                                      "model_id"         : model_id, 
                                      "model_short_id"   : re.sub( r"^.*\/", "", model_id ),
                                      "prompt_text"      : prompt_text,
                                      "raw_response"     : raw_response, 
                                      "generated_output" : generated_output } )
    return all_results_arr

In [None]:
gptneox_template = """I took my bird for a flight around the yard, it said *tweet*.
I took my horse for a ride along the trail, it said *snort*.
I took my fish for a swim in the lake, it said *bubbles*.
I took my mouse for a cycle around the block, it said *squeak*.
I took my cow for a drive to the mountains, it said *moo*.
I took my donkey for a trek up the hill, it said *heehaw*.
%s"""


granite_template = """As a helpful assistant, complete the sentence started by the human.

Human: %s
Assistant: 
"""


models_json = {
    "bigscience/mt0-xxl" : {},
    "google/flan-t5-xxl" : {},
    "eleutherai/gpt-neox-20b"     : { "parameter_overrides" : { "stop_sequences" : [ "\n" ] },
                                      "prompt_template"     : gptneox_template },
    "ibm/granite-13b-chat-v1"     : { "parameter_overrides" : { "max_new_tokens" : 60 },
                                      "prompt_template"     : granite_template },
    "meta-llama/llama-2-13b-chat" : { "parameter_overrides" : { "max_new_tokens" : 80, "stop_sequences" : [ "\n" ] } }
}

In [None]:
results_arr = promptModels4( prompts_arr, models_json, prompt_parameters )

In [None]:
g_prompt_col_width = "440px"

resultsDF2( results_arr )

### Create a new display function to view generated output only

In [None]:
def styleOutputCol( styler, col_names_arr ):
    f_prompt = lambda v: "width: 350px; " + \
                         "min-width: 350px; " + \
                         "max-width: 350px;"
    styler.applymap( f_prompt, subset=col_names_arr )
    return styler


def resultsDF3( results_arr ):
    
    df_org = pd.DataFrame( results_arr )
    result_df = df_org[ [ "prompt_num", "model_short_id", "generated_output" ] ]
    result_df = result_df.sort_values( [ "prompt_num", "model_short_id" ] )
    result_df = result_df.reset_index( drop=True )
    result_df = result_df.replace( { "\\n" : "<br/>" }, regex=True )
    
    new_df = pd.DataFrame( columns=[ "model_short_id" ] )
    prompt_nums_arr = sorted( result_df[ "prompt_num" ].unique() )
    col_names_arr = []
    for prompt_num in prompt_nums_arr:
        col_name = "output " + str( prompt_num )
        col_names_arr.append( col_name )
        df_tmp = result_df[ result_df["prompt_num"] == prompt_num ]
        df_tmp = df_tmp.drop( "prompt_num", axis=1 )
        df_tmp = df_tmp.rename( columns={ "generated_output": col_name } )
        new_df = new_df.merge( df_tmp, how="right", on="model_short_id" )
    
    styler = new_df.style.pipe( styleDataFrame )
    styler = styler.pipe( styleModelIDCol )
    styler = styler.pipe( styleOutputCol, col_names_arr )
        
    return styler

In [None]:
resultsDF3( results_arr )
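
The reshaping in `resultsDF3` turns one row per prompt and model into one row per model with one output column per prompt. As an alternative to the merge loop, the same shape can be sketched with `DataFrame.pivot`. This is not the approach used above, and the sample data and the names `sample_results` and `wide_df` are made up for illustration.

```python
import pandas as pd

# Made-up results for illustration only; no model is called
sample_results = [
    { "prompt_num" : 0, "model_short_id" : "model-b", "generated_output" : "walk" },
    { "prompt_num" : 0, "model_short_id" : "model-a", "generated_output" : "stroll" },
    { "prompt_num" : 1, "model_short_id" : "model-b", "generated_output" : "nap" },
    { "prompt_num" : 1, "model_short_id" : "model-a", "generated_output" : "ride" }
]

# One row per model, one column per prompt number
wide_df = pd.DataFrame( sample_results ).pivot( index="model_short_id",
                                                columns="prompt_num",
                                                values="generated_output" )
wide_df = wide_df.sort_index()
wide_df.columns = [ "output " + str( n ) for n in wide_df.columns ]
wide_df = wide_df.reset_index()

print( wide_df.to_string() )
```

`pivot` raises an error when the same model and prompt number appear more than once, so it only suits result lists with one row per pair.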