# How to perform inference using Generative AI Foundation Models deployed as prompt templates in IBM watsonx, the AI and data platform

#### Author: Shuvanker Ghosh, IBM Technology Expert Labs, sghosh@us.ibm.com

#### This notebook demonstrates how to automate inference against a Generative AI Foundation Model deployed as a prompt template in IBM watsonx, so that the model can be used from business applications and workflows

## Set up packages and the runtime

### Set up packages

In [1]:
# Install ibm-watson-machine-learning python package
!pip install -U ibm-watson-machine-learning

Collecting ibm-watson-machine-learning
  Downloading ibm_watson_machine_learning-1.0.339-py3-none-any.whl.metadata (8.6 kB)
Downloading ibm_watson_machine_learning-1.0.339-py3-none-any.whl (1.8 MB)
Installing collected packages: ibm-watson-machine-learning
  Attempting uninstall: ibm-watson-machine-learning
    Found existing installation: ibm-watson-machine-learning 1.0.338
    Uninstalling ibm-watson-machine-learning-1.0.338:
      Successfully uninstalled ibm-watson-machine-learning-1.0.338
Successfully installed ibm-watson-machine-learning-1.0.339


In [2]:
# Import necessary packages
import requests, json, logging
from ibm_watson_machine_learning import APIClient

In [3]:
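# Set up logging
# Note: DEBUG level logs full API responses, which can include storage credentials
# for the deployment space. Switch to logging.INFO outside of troubleshooting.
logging.basicConfig(format='%(asctime)s %(levelname)s:%(message)s', level=logging.DEBUG, datefmt='%I:%M:%S')
logger = logging.getLogger()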

### Getting the IBM Cloud API Key and Location of Watson Machine Learning in IBM Cloud

To authenticate with the Watson Machine Learning service on IBM Cloud, you need to provide the platform `apikey` and the instance `location`.

You can use the [IBM Cloud CLI](https://cloud.ibm.com/docs/cli/index.html) to retrieve the platform API key and the instance location.

Generate an API key as follows:
```
ibmcloud login
ibmcloud iam api-key-create API_KEY_NAME
```

From the output, copy the value of `api_key`.

Retrieve the location of your WML instance as follows:
```
ibmcloud login --apikey API_KEY -a https://cloud.ibm.com
ibmcloud resource service-instance WML_INSTANCE_NAME
```

From the output, copy the value of `location`.

**Tip**: You can generate your `Cloud API key` in the [**Users** section of the Cloud console](https://cloud.ibm.com/iam#/users). From that page, click your name, scroll down to the **API Keys** section, and click **Create an IBM Cloud API key**. Give your key a name and click **Create**, then copy the created key and paste it below. You can also get a service-specific URL from the [**Endpoint URLs** section of the Watson Machine Learning docs](https://cloud.ibm.com/apidocs/machine-learning). You can check your instance location in your <a href="https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/" target="_blank" rel="noopener no referrer">Watson Machine Learning (WML) Service</a> instance details.

You can also get a service-specific API key from the [**Service IDs** section of the Cloud Console](https://cloud.ibm.com/iam/serviceids). From that page, click **Create**, then copy the created key and paste it below.

**Action**: Enter your `api_key` and `location` in the following cell.


In [4]:
# Setup API KEY and runtime url
apikey = 'PASTE YOUR IBM CLOUD API KEY'
location = 'us-south'
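
Pasting the key directly into the cell makes it easy to commit the key by accident. As a minimal sketch, assuming the key has been exported as an environment variable, the two values could be read at runtime instead. The variable names `IBM_CLOUD_API_KEY` and `WML_LOCATION` and the helper `load_credentials` are hypothetical; they are not defined by IBM Cloud or the WML SDK.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ,
                     default_location: str = "us-south") -> Tuple[str, str]:
    """Read the API key and location from environment variables.

    IBM_CLOUD_API_KEY and WML_LOCATION are hypothetical names chosen for this sketch.
    """
    api_key = env.get("IBM_CLOUD_API_KEY", "").strip()
    if not api_key:
        raise ValueError("Set IBM_CLOUD_API_KEY before running the notebook")
    # Fall back to the default region when no location is configured
    location = env.get("WML_LOCATION", default_location).strip()
    return api_key, location
```

With the variable set, `apikey, location = load_credentials()` would replace the two assignments in the cell above.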

### Provide inference inputs

In [5]:
# Inputs for inference

space_name = 'Gen AI Model Deployments for Production'  # PASTE YOUR SPACE NAME
deployment_name = 'Car rental review satisfaction analysis production deployment'  # PASTE YOUR DEPLOYMENT NAME
positive_prompt_input = 'Agent was very friendly and was ready to work with us in upgrading the car rental'  # PASTE YOUR POSITIVE PROMPT INPUT
negative_prompt_input = 'Agent was very nasty and rude'  # PASTE YOUR NEGATIVE PROMPT INPUT

# Run the notebook once with the negative prompt and once with the positive prompt to test it
prompt_input = negative_prompt_input

### Connect to Watson Machine Learning

In [6]:
# Connect to watson machine learning service 
wml_credentials = {
    "apikey": apikey,
    "url": 'https://' + location + '.ml.cloud.ibm.com'
}
wml_client = APIClient(wml_credentials)

04:05:04 INFO:Client successfully initialized


## Helper method

In [7]:
# Helper method to find prompt template details based on the deployment name and space name
def find_prompt_template_details(space_name: str, deployment_name: str):
    """Return (space_id, deployment_id, inference_url, prompt_variable_names).

    Values that cannot be resolved are returned as None (an empty list for the variable names).
    """
    # Look up the deployment space by its display name
    spaces = wml_client.spaces.get_details()
    space_id = None
    for space in spaces['resources']:
        if space['entity']['name'] == space_name:
            logger.debug(f"Found space {space}")
            space_id = wml_client.spaces.get_uid(space)

    deployment_id = None
    prompt_template_asset_id = None
    inference_url = None
    # Initialize here so the return statement never references an unbound name
    prompt_variable_names = []
    if space_id is not None:
        # Look up the deployment by its display name within the space
        wml_client.set.default_space(space_id)
        deployments = wml_client.deployments.get_details()
        for deployment in deployments['resources']:
            if deployment['entity']['name'] == deployment_name:
                logger.debug(f"Found deployment {deployment}")
                deployment_id = wml_client.deployments.get_uid(deployment)
                prompt_template_asset_id = deployment['entity']['prompt_template']['id']
                inference_url = deployment['entity']['status']['inference'][0]['url']

    if prompt_template_asset_id is not None:
        # Read the variable names declared by the prompt template asset
        prompt_template_details = wml_client.data_assets.get_details(asset_uid=prompt_template_asset_id)
        prompt_variable_names = prompt_template_details['entity']['wx_prompt']['prompt_variables'].keys()

    return space_id, deployment_id, inference_url, prompt_variable_names
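
The helper matches spaces and deployments by display name, so a misspelled name yields `None` values and, if two resources share a name, the last match is used. A minimal sketch of a stricter lookup over a `resources` list is shown below. The function name `find_unique_by_name` is hypothetical, and the sample records (including the `metadata.id` values) are made up for illustration; only the `entity.name` field is taken from the responses in this notebook.

```python
from typing import Any, Dict, List


def find_unique_by_name(resources: List[Dict[str, Any]], name: str) -> Dict[str, Any]:
    """Return the single resource whose entity.name equals `name`, or raise LookupError."""
    matches = [r for r in resources if r.get("entity", {}).get("name") == name]
    if not matches:
        raise LookupError(f"No resource named {name!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} resources named {name!r}; use an ID instead")
    return matches[0]


# Made-up sample data shaped like a `resources` list
sample = [
    {"metadata": {"id": "space-1"}, "entity": {"name": "AI-Gov-Demo-Development"}},
    {"metadata": {"id": "space-2"}, "entity": {"name": "Gen AI Model Deployments for Production"}},
]
space = find_unique_by_name(sample, "Gen AI Model Deployments for Production")
print(space["metadata"]["id"])
```

Failing loudly on a missing or ambiguous name tends to be easier to diagnose in an automated workflow than a `None` that surfaces later in the inference call.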

## Find prompt template deployment details

### Find prompt template deployment details from deployment space

In [8]:
# Find prompt template details
space_id, deployment_id, inference_url, prompt_variable_names = find_prompt_template_details(space_name, deployment_name)
logger.info (f"Prompt template space id : {space_id} deployment id : {deployment_id}, inference_url id : {inference_url}, prompt_variable_names : {prompt_variable_names} ")

04:05:04 INFO:Successfully finished spaces for url: 'https://api.dataplatform.cloud.ibm.com/v2/spaces?version=2023-12-05&limit=200'
04:05:04 DEBUG:Response(GET https://api.dataplatform.cloud.ibm.com/v2/spaces?version=2023-12-05&limit=200): {
  "first": {
    "href": "https://api.dataplatform.cloud.ibm.com/v2/spaces?version=2023-12-05&limit=200"
  },
  "limit": 200,
  "resources": [{
    "entity": {
      "compute": [{
        "crn": "crn:v1:bluemix:public:pm-20:us-south:a/b11ef9462fd5cd198951947913b3ccff:e0ec79ab-c7e2-4a9a-8867-631a5acd6e4c::",
        "guid": "e0ec79ab-c7e2-4a9a-8867-631a5acd6e4c",
        "name": "Machine Learning-Synergy",
        "type": "machine_learning"
      }],
      "description": "Development deployment space",
      "name": "AI-Gov-Demo-Development",
      "scope": {
        "bss_account_id": "b11ef9462fd5cd198951947913b3ccff"
      },
      "stage": {
        "production": false
      },
      "status": {
        "state": "active"
      },
      "storage":

04:05:04 DEBUG:Found space {'entity': {'compute': [{'crn': 'crn:v1:bluemix:public:pm-20:us-south:a/3bde3b8a1d624d6fb4e5bd1283cbfd14:09445407-e2ec-431b-a2c7-974f0a040f4c::', 'guid': '09445407-e2ec-431b-a2c7-974f0a040f4c', 'name': 'Watson Machine Learning-tel-se-watsonx', 'type': 'machine_learning'}], 'description': '', 'name': 'Gen AI Model Deployments for Production', 'scope': {'bss_account_id': '3bde3b8a1d624d6fb4e5bd1283cbfd14'}, 'stage': {'name': 'Production', 'production': True}, 'status': {'state': 'active'}, 'storage': {'properties': {'bucket_name': 'a99653a0-4f41-4a17-b172-efea9cf7eaed', 'bucket_region': 'us-south', 'credentials': {'admin': {'access_key_id': 'c88b717eb619401bab43d3497ccf8fcc', 'api_key': 'YIofM0L3a84zHUQF0GaNdnzDj2lQvEEw42SOZoZo6unY', 'secret_access_key': 'c7984f63a18933a85091142049d6c014ad7ab4b45a153ec2', 'service_id': 'ServiceId-35a05dcb-dfff-4864-9b9d-bcacc24b35c0'}, 'editor': {'access_key_id': '5cabce9843c14ae79fc9f429a7ee59b4', 'api_key': 'uH56dG90XQBYz5B09

04:05:05 INFO:Prompt template space id : 566d4549-253a-489c-8173-d3a9c92bb027 deployment id : 22905dc9-3d72-427a-91d2-8e3ddbae8b3b, inference_url id : https://us-south.ml.cloud.ibm.com/ml/v1-beta/deployments/22905dc9-3d72-427a-91d2-8e3ddbae8b3b/generation/text, prompt_variable_names : dict_keys(['Review']) 


## Prepare inference payload

In [9]:
# Add prompt variables to inference payload
prompt_variables = {}
for variable_name in prompt_variable_names:
    prompt_variables[variable_name] = prompt_input

prompt_params = {}
prompt_params['prompt_variables'] = prompt_variables 
logger.info(f"Prompt params for inference {prompt_params}")

04:05:05 INFO:Prompt params for inference {'prompt_variables': {'Review': 'Agent was very nasty and rude'}}
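
The loop above assigns the same input to every prompt variable, which works here because the template declares a single variable, `Review`. For templates with several variables, a sketch along the following lines could map each variable to its own value and fail early when one is missing. The function name `build_prompt_params` is hypothetical.

```python
from typing import Dict, Iterable


def build_prompt_params(variable_names: Iterable[str],
                        values: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Build the `prompt_variables` payload, one value per declared variable."""
    names = list(variable_names)
    # Report every missing variable at once instead of failing at inference time
    missing = [n for n in names if n not in values]
    if missing:
        raise KeyError(f"Missing values for prompt variables: {missing}")
    return {"prompt_variables": {n: values[n] for n in names}}


params = build_prompt_params(["Review"], {"Review": "Agent was very nasty and rude"})
print(params)
```

For the single-variable template in this notebook, the result has the same shape as `prompt_params` above.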


## Call inference endpoint 


### Approach 1: Call inference endpoint using WML API client and prompt template deployment using Python SDK

In [10]:
# Call inference endpoint using WML Deployments 
generated_text_using_deployment = wml_client.deployments.generate_text(deployment_id, params=prompt_params)
logger.info(f"Generated text using deployment: {generated_text_using_deployment}")

04:05:05 INFO:Successfully finished getting deployments details for url: 'https://us-south.ml.cloud.ibm.com/ml/v4/deployments/22905dc9-3d72-427a-91d2-8e3ddbae8b3b?version=2023-12-05&space_id=566d4549-253a-489c-8173-d3a9c92bb027'
04:05:05 DEBUG:Response(GET https://us-south.ml.cloud.ibm.com/ml/v4/deployments/22905dc9-3d72-427a-91d2-8e3ddbae8b3b?version=2023-12-05&space_id=566d4549-253a-489c-8173-d3a9c92bb027): {
  "entity": {
    "base_model_id": "ibm/granite-13b-instruct-v2",
    "custom": {

    },
    "deployed_asset_type": "prompt_template",
    "name": "Car rental review satisfaction analysis production deployment",
    "online": {

    },
    "prompt_template": {
      "id": "ba4ce1ca-f7af-4544-8879-5c2a8e6a88e4"
    },
    "space_id": "566d4549-253a-489c-8173-d3a9c92bb027",
    "status": {
      "inference": [{
        "url": "https://us-south.ml.cloud.ibm.com/ml/v1-beta/deployments/22905dc9-3d72-427a-91d2-8e3ddbae8b3b/generation/text"
      }, {
        "sse": true,
        "url

### Approach 2: Call inference endpoint using Model Inference pointing to prompt template deployment using Python SDK

In [11]:
# Call inference endpoint using ModelInference
from ibm_watson_machine_learning.foundation_models import ModelInference
model_inference = ModelInference(
    deployment_id=deployment_id,
    credentials={
        'apikey': apikey,
        'url': 'https://' + location + '.ml.cloud.ibm.com'
    },
    space_id=space_id
    )
generated_text_using_model_inference = model_inference.generate_text(params = prompt_params)
logger.info(f"Generated text using model inference: {generated_text_using_model_inference}")

04:05:11 INFO:Client successfully initialized
04:05:11 INFO:Successfully finished getting deployments details for url: 'https://us-south.ml.cloud.ibm.com/ml/v4/deployments/22905dc9-3d72-427a-91d2-8e3ddbae8b3b?version=2023-12-05&space_id=566d4549-253a-489c-8173-d3a9c92bb027'
04:05:11 DEBUG:Response(GET https://us-south.ml.cloud.ibm.com/ml/v4/deployments/22905dc9-3d72-427a-91d2-8e3ddbae8b3b?version=2023-12-05&space_id=566d4549-253a-489c-8173-d3a9c92bb027): {
  "entity": {
    "base_model_id": "ibm/granite-13b-instruct-v2",
    "custom": {

    },
    "deployed_asset_type": "prompt_template",
    "name": "Car rental review satisfaction analysis production deployment",
    "online": {

    },
    "prompt_template": {
      "id": "ba4ce1ca-f7af-4544-8879-5c2a8e6a88e4"
    },
    "space_id": "566d4549-253a-489c-8173-d3a9c92bb027",
    "status": {
      "inference": [{
        "url": "https://us-south.ml.cloud.ibm.com/ml/v1-beta/deployments/22905dc9-3d72-427a-91d2-8e3ddbae8b3b/generation/text

### Approach 3: Call inference endpoint using REST API

In [12]:
# Call inference endpoint using REST API
# Get an IAM token from IBM Cloud
url = "https://iam.cloud.ibm.com/identity/token"

headers={
    "Content-Type"  : "application/x-www-form-urlencoded",
    "cache-control" : "no-cache"
}

auth_data = f"grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey={apikey}"

# Get access token
response = requests.post(url, data = auth_data, headers = headers, verify=True)

if response.status_code == 200:
    access_token = response.json()["access_token"]

    # Send payload to model inference
    request_headers = {"Content-Type": "application/json",
                       "Authorization": "Bearer " + access_token}
    request_params = {"version": "2021-05-01"}

    inference_data = {'parameters': prompt_params}
    logger.info(f"Inference payload : {inference_data}")

    response = requests.post(inference_url, data=json.dumps(inference_data), params=request_params, headers=request_headers)
    if response.status_code == 200:
        response_json = response.json()
        logger.debug(response_json)
        generated_text_using_rest_api = response_json['results'][0]['generated_text']
        logger.info(f"Generated text using REST API: {generated_text_using_rest_api}")
    else:
        # The inference request itself failed
        logger.error(f"Inference request failed: {response.status_code} {response.reason} {response.text}")
else:
    # The IAM token request failed
    logger.error(response.text)
    logger.error(response.status_code)
    logger.error(response.reason)



04:05:15 INFO:Inference payload : {'parameters': {'prompt_variables': {'Review': 'Agent was very nasty and rude'}}}
04:05:15 DEBUG:{'model_id': 'ibm/granite-13b-instruct-v2', 'created_at': '2023-12-17T04:05:15.662Z', 'results': [{'generated_text': 'no\n\n', 'generated_token_count': 3, 'input_token_count': 117, 'stop_reason': 'eos_token'}]}
04:05:15 INFO:Generated text using REST API: no
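
To call the deployment from an application rather than a notebook cell, the two REST calls can be wrapped in one function. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name `generate_text_rest`, the 30-second timeout, and the injected `post` argument are choices made for this example. The URLs, payload shape, and `version` value are the ones used in the cell above. Passing `post` as a parameter lets the function be exercised without network access.

```python
from typing import Any, Callable, Dict

IAM_URL = "https://iam.cloud.ibm.com/identity/token"


def generate_text_rest(apikey: str, inference_url: str, prompt_variables: Dict[str, str],
                       post: Callable[..., Any], timeout: float = 30.0) -> str:
    """Fetch an IAM token, call the deployment, and return the stripped generated text.

    `post` is expected to behave like `requests.post`.
    """
    token_response = post(
        IAM_URL,
        data={"grant_type": "urn:ibm:params:oauth:grant-type:apikey", "apikey": apikey},
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        timeout=timeout,
    )
    token_response.raise_for_status()  # stop here if authentication failed
    access_token = token_response.json()["access_token"]

    inference_response = post(
        inference_url,
        json={"parameters": {"prompt_variables": prompt_variables}},
        params={"version": "2021-05-01"},
        headers={"Authorization": "Bearer " + access_token},
        timeout=timeout,
    )
    inference_response.raise_for_status()  # stop here if inference failed
    # The model output may carry trailing newlines, as in 'no\n\n' above
    return inference_response.json()["results"][0]["generated_text"].strip()
```

In the notebook this could be called as `generate_text_rest(apikey, inference_url, prompt_variables, post=requests.post)`.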




### Authors

**Shuvanker Ghosh (sghosh@us.ibm.com)** is a member of the Worldwide Data and AI Solution Engineering, IBM Technology Expert Labs.