[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/meta-llama/llama-cookbook/blob/ibm-wxai/3p-integrations/ibm/Get%20Started%20with%20watsonx.ai%20%26%20Llama.ipynb)

# Basic Inference with Llama Models on watsonx.ai
## Introduction

Welcome to this getting started guide for using Llama models on IBM watsonx.ai! This notebook will walk you through the fundamentals of:

- Setting up your environment
- Making your first API calls to Llama models
- Understanding key parameters
- Building practical examples

By the end of this notebook, you'll be comfortable using Llama models for various text generation tasks on watsonx.ai.

## Prerequisites

- IBM Cloud account with watsonx.ai access
- Python 3.8 or higher
- Basic Python knowledge



## 1. Environment Setup
First, let's install the required packages and set up our environment.

In [1]:
# Install required packages
!pip install ibm-watsonx-ai
!pip install python-dotenv pandas

In [1]:
# Import necessary libraries
import os
from dotenv import load_dotenv
from ibm_watsonx_ai import APIClient, Credentials
from ibm_watsonx_ai.foundation_models import Model
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes
import pandas as pd
import json

## 2. Authentication and Configuration

### To set env values in local development

In [None]:
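# Create a .env template in your project directory, then fill in your
# credentials and uncomment each line.

env_content = """\
# IBM_CLOUD_API_KEY=""
# PROJECT_ID=""
# IBM_CLOUD_URL=""
"""

# Write the template only if no .env exists, so existing credentials are not overwritten
if not os.path.exists('.env'):
    with open('.env', 'w') as f:
        f.write(env_content)
    print(".env file created!")
else:
    print(".env already exists; leaving it unchanged.")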

In [None]:
# Load environment variables
load_dotenv()

# Set up credentials
credentials = Credentials(
    api_key=os.getenv("IBM_CLOUD_API_KEY"),
    url=os.getenv("IBM_CLOUD_URL", "https://us-south.ml.cloud.ibm.com")
)

# Set project ID; os.getenv returns None (it does not raise KeyError) when the variable is unset
project_id = os.getenv("PROJECT_ID") or input("Please enter your project_id: ")

print("Credentials configured successfully!")
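
Before creating the client, it can help to confirm that the expected variables were actually loaded. The sketch below uses a hypothetical helper, `missing_env_vars` (not part of the SDK), that reports which of the variable names used above are unset or empty. It accepts the environment as a plain mapping, so it can be tried without real credentials.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names taken from the .env template above
REQUIRED_VARS = ("IBM_CLOUD_API_KEY", "PROJECT_ID")


def missing_env_vars(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not (env.get(name) or "").strip()]


# Hypothetical environment used only for illustration
example_env = {"IBM_CLOUD_API_KEY": "dummy-key", "PROJECT_ID": ""}
print(missing_env_vars(REQUIRED_VARS, example_env))  # prints ['PROJECT_ID']
```

Calling `missing_env_vars(REQUIRED_VARS)` with no second argument checks the real process environment after `load_dotenv()` has run.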

### To set env values in Google Colab


In [None]:
# Set your env values in Colab Secrets:
# IBM_CLOUD_API_KEY=""
# PROJECT_ID=""
# IBM_CLOUD_URL=""

# # Import necessary libraries
# from google.colab import userdata

# # Retrieve secrets securely from Colab & set up credentials
# credentials = {
#     "apikey": userdata.get('IBM_CLOUD_API_KEY'),
#     "url": userdata.get('IBM_CLOUD_URL') or "https://us-south.ml.cloud.ibm.com"
# }

# project_id = userdata.get('PROJECT_ID')

# client = APIClient(credentials)

# # Set project ID
# if not project_id:
#     project_id = input("Please enter your project_id: ")

# client.set.default_project(project_id)

# print("Credentials configured successfully!")

### Create an instance of APIClient with authentication details

In [None]:
client = APIClient(credentials=credentials)

## 3. Foundation Models on watsonx.ai

### List available models
The available chat models are listed under `client.foundation_models.ChatModels`. For more information, refer to the [documentation](https://ibm.github.io/watsonx-ai-python-sdk/fm_models.html).


In [7]:
client.foundation_models.ChatModels.show()

{'GRANITE_20B_CODE_INSTRUCT': 'ibm/granite-20b-code-instruct', 'GRANITE_3_2_8B_INSTRUCT': 'ibm/granite-3-2-8b-instruct', 'GRANITE_3_2B_INSTRUCT': 'ibm/granite-3-2b-instruct', 'GRANITE_3_3_8B_INSTRUCT': 'ibm/granite-3-3-8b-instruct', 'GRANITE_3_8B_INSTRUCT': 'ibm/granite-3-8b-instruct', 'GRANITE_34B_CODE_INSTRUCT': 'ibm/granite-34b-code-instruct', 'GRANITE_GUARDIAN_3_2B': 'ibm/granite-guardian-3-2b', 'GRANITE_GUARDIAN_3_8B': 'ibm/granite-guardian-3-8b', 'GRANITE_VISION_3_2_2B': 'ibm/granite-vision-3-2-2b', 'LLAMA_3_2_11B_VISION_INSTRUCT': 'meta-llama/llama-3-2-11b-vision-instruct', 'LLAMA_3_2_1B_INSTRUCT': 'meta-llama/llama-3-2-1b-instruct', 'LLAMA_3_2_3B_INSTRUCT': 'meta-llama/llama-3-2-3b-instruct', 'LLAMA_3_2_90B_VISION_INSTRUCT': 'meta-llama/llama-3-2-90b-vision-instruct', 'LLAMA_3_3_70B_INSTRUCT': 'meta-llama/llama-3-3-70b-instruct', 'LLAMA_3_405B_INSTRUCT': 'meta-llama/llama-3-405b-instruct', 'LLAMA_4_MAVERICK_17B_128E_INSTRUCT_FP8': 'meta-llama/llama-4-maverick-17b-128e-instruct-
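
Since the catalogue mixes several model families, a small filter makes the Llama entries easier to scan. This is a minimal sketch, assuming the catalogue is available as a plain dict shaped like the output above; the `chat_models` dict here is a hand-copied subset of that output, and `llama_models` is a hypothetical helper.

```python
# Hand-copied subset of the catalogue shown above (not fetched from the API)
chat_models = {
    "GRANITE_3_3_8B_INSTRUCT": "ibm/granite-3-3-8b-instruct",
    "LLAMA_3_2_1B_INSTRUCT": "meta-llama/llama-3-2-1b-instruct",
    "LLAMA_3_3_70B_INSTRUCT": "meta-llama/llama-3-3-70b-instruct",
}


def llama_models(models):
    """Keep only entries whose model ID lives under the meta-llama/ namespace."""
    return {
        name: model_id
        for name, model_id in models.items()
        if model_id.startswith("meta-llama/")
    }


for name, model_id in sorted(llama_models(chat_models).items()):
    print(f"{name}: {model_id}")
```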

### Pick a model

In [8]:
model_id = 'meta-llama/llama-4-maverick-17b-128e-instruct-fp8'

## 4. Defining the model parameters
You might need to adjust model parameters for different models or tasks. The available parameters are described in the [documentation](https://ibm.github.io/watsonx-ai-python-sdk/fm_schema.html).

In [9]:
from ibm_watsonx_ai.foundation_models.schema import TextChatParameters

TextChatParameters.show()

+-------------------+----------------------------------------+------------------------------+
| PARAMETER         | TYPE                                   | EXAMPLE VALUE                |
| frequency_penalty | float, NoneType                        | 0.5                          |
+-------------------+----------------------------------------+------------------------------+
| logprobs          | bool, NoneType                         | True                         |
+-------------------+----------------------------------------+------------------------------+
| top_logprobs      | int, NoneType                          | 3                            |
+-------------------+----------------------------------------+------------------------------+
| presence_penalty  | float, NoneType                        | 0.3                          |
+-------------------+----------------------------------------+------------------------------+
| response_format   | dict, TextChatResponseFormat, NoneType

In [10]:
params = TextChatParameters(
    temperature=0.5,
    max_tokens=100
)
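
Here `max_tokens` caps the length of the generated reply, while `temperature` controls how random the sampling is. To build intuition for the latter, the sketch below shows the textbook formulation, in which scores are divided by the temperature before a softmax. The scores are made up, and how the service applies temperature internally is an assumption rather than something this notebook confirms.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate probability on the highest-scoring token, which makes output more deterministic; higher values flatten the distribution and give more varied text.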

## 5. Initialize the model
Initialize the `ModelInference` class with the parameters defined above.

In [None]:
from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(
    model_id=model_id,
    params=params,
    credentials=credentials,
    project_id=project_id)

### Model's details


In [14]:
model.get_details()

{'model_id': 'meta-llama/llama-4-maverick-17b-128e-instruct-fp8',
 'label': 'llama-4-maverick-17b-128e-instruct-fp8',
 'provider': 'Meta',
 'source': 'Hugging Face',
 'functions': [{'id': 'autoai_rag'},
  {'id': 'image_chat'},
  {'id': 'multilingual'},
  {'id': 'text_chat'},
  {'id': 'text_generation'}],
 'short_description': 'Llama 4 Maverick, a 17 billion active parameter model with 128 experts.',
 'long_description': 'The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.',
 'terms_url': 'https://github.com/meta-llama/llama-models/blob/main/models/llama4/LICENSE',
 'input_tier': 'class_9',
 'output_tier': 'class_16',
 'number_params': '400b',
 'min_shot_size': 1,
 'task_ids': ['question_answering',
  'summarization',
  'retrieval_augmented_generation',
  'classification',
  'generation',
  'code',
  'e

## 6. Your First Llama Model Chat

In [15]:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the last FIFA World Cup?"}
]

### Chat without Streaming

In [16]:
generated_response = model.chat(messages=messages)

# # Print full response
# print(generated_response)

# Print only content
print(generated_response["choices"][0]["message"]["content"])

The last FIFA World Cup was held in 2022 in Qatar. The winner of the tournament was Argentina, led by Lionel Messi. They defeated France 4-2 in a penalty shootout after the match ended 3-3 after extra time in the final on December 18, 2022.
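
To continue the conversation, the assistant's reply can be appended to the message list before the next user turn is added. The sketch below uses a hand-written `sample_response` with the same `choices[0].message.content` shape indexed above, so it runs without calling the API; `extract_content` and `add_assistant_turn` are hypothetical helpers, not SDK functions.

```python
def extract_content(response):
    """Return the first choice's text, or an empty string if it is missing."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    return choices[0].get("message", {}).get("content") or ""


def add_assistant_turn(messages, response):
    """Return a new message list with the assistant reply appended."""
    return messages + [{"role": "assistant", "content": extract_content(response)}]


# Hand-written stand-in for a chat response (illustrative data only)
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "Argentina won in 2022."}}]
}

history = [{"role": "user", "content": "Who won the last FIFA World Cup?"}]
history = add_assistant_turn(history, sample_response)
history.append({"role": "user", "content": "Who was the captain?"})

print([m["role"] for m in history])  # prints ['user', 'assistant', 'user']
```

Passing the extended `history` as `messages` in the next call gives the model the context it needs to resolve follow-up questions.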


### Chat with Streaming

In [17]:
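# chat_stream yields the reply incrementally instead of returning it all at once
stream = model.chat_stream(messages=messages)
for chunk in stream:
    # Some chunks may carry no choices, so guard before indexing
    if chunk["choices"]:
        print(chunk["choices"][0]["delta"].get("content", ""), end='', flush=True)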

The last FIFA World Cup was held in 2022 in Qatar. The winner of that tournament was Argentina, led by Lionel Messi. They defeated France 4-2 in a penalty shootout after the match ended 3-3 after extra time in the final on December 18, 2022.
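
When streaming, it is often useful to print pieces as they arrive and still keep the full text afterwards. The sketch below assumes streamed chunks shaped like `{"choices": [{"delta": {"content": ...}}]}`, which is an assumption about the streaming output; it is simulated here with a hypothetical `fake_stream` generator so the example runs offline.

```python
def fake_stream():
    """Stand-in generator that mimics streamed chunks (hypothetical data)."""
    for piece in ["The winner ", "was ", "Argentina."]:
        yield {"choices": [{"delta": {"content": piece}}]}
    yield {"choices": []}  # a chunk without choices, to exercise the guard


def collect_stream(chunks):
    """Print each piece as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        if not chunk.get("choices"):
            continue
        piece = chunk["choices"][0].get("delta", {}).get("content") or ""
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


full_text = collect_stream(fake_stream())
```

With a real model, the same `collect_stream` function could be given the iterator returned by the streaming call in place of `fake_stream()`.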

## 7. Examples

### Email Assistant

In [None]:
def email_assistant(context, tone="professional"):
    """Generate email responses based on context and tone"""

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"""
    Write a {tone} email response based on this context:
    Context: {context}
    Email Response:"""}
    ]

    params = TextChatParameters(
        temperature=0.5,
        max_tokens=250
    )

    model = ModelInference(
        model_id=model_id,
        params=params,
        credentials=credentials,
        project_id=project_id
    )

    response = model.chat(messages=messages)
    clean_response = response["choices"][0]["message"]["content"]

    return clean_response


In [19]:
# Example usage
context = "Declining a meeting invitation due to a scheduling conflict, but expressing interest in rescheduling"
email_response = email_assistant(context, tone="friendly")

print(email_response)

Here's a friendly email response:

Dear [Name],

Thank you so much for inviting me to meet on [Date and Time]. I appreciate you thinking of me and I'm glad we're in touch.

Unfortunately, I have a prior commitment at that time and won't be able to make it to our meeting as scheduled. I'd love to reschedule for another time that works better for you. Would you be available to meet at an alternative time? I'm flexible and can work around your schedule.

Let's touch base soon to find a new time that suits you. I'm looking forward to catching up with you then.

Best regards,
[Your Name]
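
The prompt in `email_assistant` is written as an indented triple-quoted string, so the model also receives the leading spaces and the initial newline. One way to keep the source readable while sending a clean prompt is to dedent a template once and fill it in afterwards. This is a sketch of that idea; `EMAIL_TEMPLATE` and `build_email_messages` are hypothetical names.

```python
import textwrap

# Dedent the template once; placeholders are filled in later with str.format,
# so multi-line context values cannot interfere with the dedent step.
EMAIL_TEMPLATE = textwrap.dedent(
    """\
    Write a {tone} email response based on this context:
    Context: {context}
    Email Response:"""
)


def build_email_messages(context, tone="professional"):
    """Build the chat message list for the email assistant prompt."""
    prompt = EMAIL_TEMPLATE.format(tone=tone, context=context)
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]


messages = build_email_messages("Declining a meeting invitation", tone="friendly")
print(messages[1]["content"])
```

Separating prompt construction from the API call also makes the prompt easy to inspect and test without credentials.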


### Code Documentation Generator

In [None]:
def generate_docstring(code):
    """Generate documentation for code snippets"""

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"""
    Generate a comprehensive docstring for this Python function:
    {code}

    Include:
    - Description
    - Parameters
    - Returns
    - Example usage

    Docstring:"""}
    ]

    params = TextChatParameters(
        temperature=0.5,
        max_tokens=1000
    )

    model = ModelInference(
        model_id=model_id,
        params=params,
        credentials=credentials,
        project_id=project_id
    )

    response = model.chat(messages=messages)
    clean_response = response["choices"][0]["message"]["content"]

    return clean_response


In [21]:
# Example code
sample_code = """
def calculate_discount(price, discount_percent, max_discount=None):
    discount_amount = price * (discount_percent / 100)
    if max_discount and discount_amount > max_discount:
        discount_amount = max_discount
    return price - discount_amount
"""

docstring = generate_docstring(sample_code)
print("Generated Documentation:")
print("-" * 50)
print(docstring)

Generated Documentation:
--------------------------------------------------
```python
def calculate_discount(price, discount_percent, max_discount=None):
    """
    Calculates the price after applying a discount.

    This function takes into account a percentage discount and an optional maximum discount amount.

    Parameters
    ----------
    price : float
        The original price of the item.
    discount_percent : float
        The percentage discount to be applied.
    max_discount : float, optional
        The maximum discount amount allowed (default is None).

    Returns
    -------
    float
        The price after applying the discount.

    Example
    -------
    >>> calculate_discount(100, 20)
    80.0
    >>> calculate_discount(100, 20, max_discount=15)
    85.0
    """
    discount_amount = price * (discount_percent / 100)
    if max_discount and discount_amount > max_discount:
        discount_amount = max_discount
    return price - discount_amount
```
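
As the output above shows, the model may wrap its answer in a Markdown code fence. If only the code is wanted, the fence can be removed before further use. The sketch below uses a hypothetical helper, `strip_code_fence`, and handles only the simple case of a single fence around the entire reply.

```python
import re

FENCE = "`" * 3  # three backticks, built programmatically to keep this block readable

# Matches a reply that consists of one fenced block, with an optional language tag
FENCE_RE = re.compile(
    r"^" + FENCE + r"[\w+-]*\n(.*?)\n" + FENCE + r"\s*$", re.DOTALL
)


def strip_code_fence(text):
    """Return the fenced body if the whole reply is one fenced block, else the text."""
    stripped = text.strip()
    match = FENCE_RE.match(stripped)
    return match.group(1) if match else stripped


# Illustrative reply, not real model output
reply = FENCE + "python\ndef f():\n    return 1\n" + FENCE + "\n"
print(strip_code_fence(reply))
```

Generated examples are worth checking as well: with the integer arguments shown in the docstring above, `calculate_discount(100, 20, max_discount=15)` returns the integer `85`, which Python displays as `85` rather than `85.0`.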


## Next Steps

Congratulations! You've learned the basics of using Llama models on watsonx.ai. The resources below are a good place to continue.

## Useful Resources

* [IBM watsonx.ai Documentation](https://www.ibm.com/docs/en/watsonx)
* [Llama Model Documentation](https://www.llama.com/docs/overview/)
* [Prompt Engineering Guide](https://www.promptingguide.ai/)