# Lab 05: First Steps with Azure OpenAI

**Course:** Generative AI for Banking Sector  
**Institution:** Banco Nacional de Costa Rica (BNCR)  
**Instructor:** Manuela Larrea  
**Duration:** 3 hours

---

## Learning Objectives

By the end of this lab, you will be able to:

1. Understand the Azure OpenAI Service architecture and authentication
2. Make your first API calls to GPT-3.5 and GPT-4 models
3. Explore different parameters (temperature, max_tokens, top_p)
4. Build a simple banking assistant chatbot
5. Handle errors and implement retry logic
6. Understand token usage and cost optimization

---

## Azure Infrastructure for This Lab

```
╔══════════════════════════════════════════════════════════════════════════╗
║                    LAB 05 - AZURE INFRASTRUCTURE                         ║
╚══════════════════════════════════════════════════════════════════════════╝

┌─────────────────────────────────────────────────────────────────────────┐
│                        YOU (Jupyter Notebook)                            │
└────────────────────────────────┬────────────────────────────────────────┘
                                 │
                                 │ HTTPS Request
                                 │ (API Key in Header)
                                 ▼
                    ┌────────────────────────────┐
                    │   Azure OpenAI Service     │
                    │                            │
                    │  ┌──────────────────────┐  │
                    │  │  GPT-3.5 Turbo       │  │
                    │  │  (60K TPM)           │  │
                    │  └──────────────────────┘  │
                    │                            │
                    │  ┌──────────────────────┐  │
                    │  │  GPT-4               │  │
                    │  │  (10K TPM)           │  │
                    │  └──────────────────────┘  │
                    └────────────────────────────┘

📊 Resources Used:
  • Azure OpenAI Service (East US 2)
  • GPT-3.5 Turbo deployment
  • GPT-4 deployment (optional)

💰 Estimated Cost: ~$0.50 per lab session
```
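To make the "HTTPS Request (API Key in Header)" arrow in the diagram concrete, here is a minimal sketch of the request the SDK sends on your behalf. It only builds the request and never sends it. The endpoint and key are placeholders, and `build_chat_request` is a hypothetical helper written for illustration, not part of any library.

```python
import json
import urllib.request


def build_chat_request(endpoint, deployment, api_key, messages,
                       api_version="2024-02-15-preview"):
    """Build (but do not send) a chat completions request for Azure OpenAI."""
    # The deployment name, not the model name, appears in the URL path.
    url = (
        f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
        f"/chat/completions?api-version={api_version}"
    )
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Key-based authentication travels in the `api-key` header.
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


req = build_chat_request(
    "https://example-resource.openai.azure.com",  # placeholder endpoint
    "gpt-35-turbo",
    "PLACEHOLDER_KEY",  # placeholder, never hard-code a real key
    [{"role": "user", "content": "Hello"}],
)
print(req.get_method(), req.full_url)
```

In the rest of the lab the `AzureOpenAI` client handles these details, so you will not build requests by hand.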

## Part 1: Environment Setup

First, let's install the required packages and set up our environment variables.

In [None]:
# Install required packages
!pip install openai python-dotenv -q

In [None]:
import os
from openai import AzureOpenAI
from dotenv import load_dotenv
import json

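# Load environment variables from a local .env file
load_dotenv()

# Verify that the endpoint is set before slicing it (os.getenv returns None if missing)
endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
if not endpoint:
    raise RuntimeError("AZURE_OPENAI_ENDPOINT is not set; check your .env file")
print("✓ Environment variables loaded")
print(f"✓ Azure OpenAI Endpoint: {endpoint[:30]}...")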
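A missing variable is the most common setup problem, so it can help to check for all of them at once. The following is a minimal sketch that assumes the two variable names this lab reads for the key and endpoint. `missing_env_vars` is a hypothetical helper, not part of `python-dotenv`.

```python
import os

# Variable names read elsewhere in this lab.
REQUIRED_VARS = ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set")
```

Passing a plain dictionary as `env` makes the check easy to try without touching your real environment.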

## Part 2: Initialize Azure OpenAI Client

The Azure OpenAI client requires three key pieces of information:
1. **API Key**: Your authentication credential
2. **Endpoint**: Your Azure OpenAI resource URL
3. **API Version**: The version of the API to use

In [None]:
# Initialize the Azure OpenAI client
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-02-15-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# Deployment names
GPT35_DEPLOYMENT = os.getenv("AZURE_OPENAI_DEPLOYMENT_GPT35", "gpt-35-turbo")
GPT4_DEPLOYMENT = os.getenv("AZURE_OPENAI_DEPLOYMENT_GPT4", "gpt-4")

print("✓ Azure OpenAI client initialized successfully")
print(f"✓ GPT-3.5 Deployment: {GPT35_DEPLOYMENT}")
print(f"✓ GPT-4 Deployment: {GPT4_DEPLOYMENT}")

## Part 3: Your First API Call

Let's make our first call to Azure OpenAI! We'll create a simple banking assistant.

In [None]:
# Simple chat completion
response = client.chat.completions.create(
    model=GPT35_DEPLOYMENT,
    messages=[
        {"role": "system", "content": "You are a professional banking assistant at BNCR."},
        {"role": "user", "content": "What is a savings account?"}
    ],
    temperature=0.7,
    max_tokens=150
)

print("Assistant response:")
print(response.choices[0].message.content)
print("\n" + "="*80)
print(f"Tokens used: {response.usage.total_tokens}")
print(f"Model: {response.model}")

## Part 4: Understanding Message Roles

Azure OpenAI uses three message roles:

1. **System**: Sets the behavior and context for the assistant
2. **User**: The user's input or question
3. **Assistant**: The model's previous responses (for conversation history)
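
The three roles can be combined into a single `messages` list. The sketch below shows one way to assemble it: the system message first, then earlier user/assistant turns in order, then the new user input. `build_messages` and the sample texts are hypothetical and only illustrate the structure.

```python
def build_messages(system_prompt, history, user_message):
    """Assemble a messages list: system first, prior turns next, new input last.

    `history` is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_message})
    return messages


example = build_messages(
    "You are a banking assistant.",
    [("What is a savings account?", "An account that earns interest on deposits.")],
    "Does it have fees?",
)
for message in example:
    print(f"{message['role']:>9}: {message['content']}")
```

Because the earlier assistant reply is included, the model can resolve "it" in the final question.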

Let's see how different system prompts affect the responses.

In [None]:
# Try different system prompts
system_prompts = [
    "You are a formal banking advisor.",
    "You are a friendly banking assistant who explains things simply.",
    "You are a technical banking expert who provides detailed explanations."
]

user_question = "What is compound interest?"

for i, system_prompt in enumerate(system_prompts, 1):
    print(f"\n{'='*80}")
    print(f"Test {i}: {system_prompt}")
    print(f"{'='*80}\n")
    
    response = client.chat.completions.create(
        model=GPT35_DEPLOYMENT,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_question}
        ],
        temperature=0.7,
        max_tokens=150
    )
    
    print(response.choices[0].message.content)

## Part 5: Exploring Temperature Parameter

**Temperature** controls the randomness of the model's output:
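- **0.0**: Near-deterministic; the model almost always picks the most likely token
- **0.7**: A balance of creativity and consistency (the value used throughout this lab; the API default is 1.0)
- **1.0+**: More creative and random (the API accepts values up to 2.0)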

For banking applications, we typically use lower temperatures (0.3-0.7) for consistency.
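
To build intuition for why lower temperatures give more consistent output, here is a minimal sketch of temperature scaling applied to a softmax over token scores. The logits are made-up numbers, and this illustrates the general idea rather than the service's exact sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat zero as greedy decoding: all probability on the highest score.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


hypothetical_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for temp in [0.0, 0.5, 1.0, 1.5]:
    probs = softmax_with_temperature(hypothetical_logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

As the temperature rises, probability mass shifts from the top token toward the alternatives, which is why repeated runs diverge more.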

In [None]:
# Try different temperatures
temperatures = [0.0, 0.5, 1.0, 1.5]
question = "Give me 3 tips for saving money."

for temp in temperatures:
    print(f"\n{'='*80}")
    print(f"Temperature: {temp}")
    print(f"{'='*80}\n")
    
    response = client.chat.completions.create(
        model=GPT35_DEPLOYMENT,
        messages=[
            {"role": "system", "content": "You are a financial advisor."},
            {"role": "user", "content": question}
        ],
        temperature=temp,
        max_tokens=200
    )
    
    print(response.choices[0].message.content)

## Part 6: Building a Conversational Banking Assistant

Let's build a simple chatbot that maintains conversation history.

In [None]:
class BankingAssistant:
    def __init__(self, client, deployment_name):
        self.client = client
        self.deployment_name = deployment_name
        self.conversation_history = [
            {
                "role": "system",
                "content": """You are a professional banking assistant at Banco Nacional de Costa Rica (BNCR).
                You help customers with:
                - Account information
                - Product recommendations
                - General banking questions
                - Transaction support
                
                Always be professional, friendly, and provide accurate information.
                If you don't know something, admit it and suggest contacting customer service."""
            }
        ]
    
    def chat(self, user_message):
        # Add user message to history
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })
        
        # Get response from Azure OpenAI
        response = self.client.chat.completions.create(
            model=self.deployment_name,
            messages=self.conversation_history,
            temperature=0.7,
            max_tokens=300
        )
        
        # Extract assistant's response
        assistant_message = response.choices[0].message.content
        
        # Add assistant's response to history
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })
        
        return assistant_message, response.usage
    
    def reset(self):
        # Keep only the system message
        self.conversation_history = [self.conversation_history[0]]
        print("Conversation history reset.")

# Initialize the assistant
assistant = BankingAssistant(client, GPT35_DEPLOYMENT)
print("✓ Banking Assistant initialized")

In [None]:
# Test a conversation with context
print("User: Hello! I want to open a savings account.\n")
response, usage = assistant.chat("Hello! I want to open a savings account.")
print(f"Assistant: {response}\n")
print(f"Tokens used: {usage.total_tokens}\n")
print("="*80)

print("\nUser: What documents do I need?\n")
response, usage = assistant.chat("What documents do I need?")
print(f"Assistant: {response}\n")
print(f"Tokens used: {usage.total_tokens}\n")
print("="*80)

print("\nUser: What's the minimum deposit?\n")
response, usage = assistant.chat("What's the minimum deposit?")
print(f"Assistant: {response}\n")
print(f"Tokens used: {usage.total_tokens}")
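
Because the full history is resent on every call, the token count grows with each turn. One common mitigation is to keep only the most recent turns. The sketch below assumes the history layout used by `BankingAssistant` (the system message first, followed by alternating user and assistant messages). `trim_history` is a hypothetical helper, not a method of the class above.

```python
def trim_history(history, max_turns):
    """Keep the system message plus the most recent `max_turns` user/assistant pairs."""
    system, rest = history[:1], history[1:]
    if max_turns <= 0:
        return list(system)
    # Each turn is two messages (user, then assistant).
    return system + rest[-2 * max_turns:]


sample_history = [
    {"role": "system", "content": "You are a banking assistant."},
    {"role": "user", "content": "Question 1"},
    {"role": "assistant", "content": "Answer 1"},
    {"role": "user", "content": "Question 2"},
    {"role": "assistant", "content": "Answer 2"},
]
trimmed = trim_history(sample_history, max_turns=1)
print([m["content"] for m in trimmed])
```

Trimming trades context for cost: the assistant forgets anything said in the dropped turns, so choose `max_turns` with the use case in mind.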

## Part 7: Error Handling and Retry Logic

Production applications need robust error handling. Let's implement retry logic for common errors.

In [None]:
import time
from openai import RateLimitError, APIError, APIConnectionError

def chat_with_retry(client, messages, deployment, max_retries=3):
    """
    Make a chat completion request, retrying with exponential backoff
    """
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=deployment,
                messages=messages,
                temperature=0.7,
                max_tokens=200
            )
            return response
        
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            wait_time = 2 ** attempt
            print(f"Rate limit exceeded. Retrying in {wait_time} seconds...")
            time.sleep(wait_time)
        
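        # APIError is the SDK's base error class (APIConnectionError is a subclass),
        # so this branch also retries failures that may not be transient.
        except (APIError, APIConnectionError) as e: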
            if attempt == max_retries - 1:
                raise
            wait_time = 2 ** attempt
            print(f"API error occurred. Retrying in {wait_time} seconds...")
            time.sleep(wait_time)
        
        except Exception as e:
            print(f"Unexpected error: {str(e)}")
            raise

# Test the retry function
messages = [
    {"role": "system", "content": "You are a banking assistant."},
    {"role": "user", "content": "What is a credit score?"}
]

try:
    response = chat_with_retry(client, messages, GPT35_DEPLOYMENT)
    print("Success!")
    print(response.choices[0].message.content)
except Exception as e:
    print(f"Failed after retries: {str(e)}")
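
When many clients hit a rate limit at the same moment, fixed waits of 1, 2, and 4 seconds can make them all retry together. A common refinement is to add random jitter to each wait. The sketch below computes such delays. `backoff_delays`, its `base` and `cap` defaults, and the "full jitter" strategy are assumptions chosen for illustration, not part of the lab's retry function.

```python
import random


def backoff_delays(max_retries, base=1.0, cap=30.0, rng=None):
    """Return one randomized wait (in seconds) per retry attempt.

    Each wait is drawn uniformly from 0 up to min(cap, base * 2**attempt).
    """
    if rng is None:
        rng = random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)
        delays.append(rng.uniform(0, ceiling))
    return delays


for attempt, delay in enumerate(backoff_delays(4)):
    print(f"Attempt {attempt + 1}: wait up to {min(30.0, 2 ** attempt):.0f}s, drew {delay:.2f}s")
```

To use it, replace the `2 ** attempt` wait with the corresponding value from this list. The `cap` keeps a long retry sequence from sleeping for minutes.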

## Part 8: Token Usage and Cost Optimization

Understanding token usage is crucial for cost optimization.

**Token Pricing (approximate):**
- GPT-3.5 Turbo: $0.0015 per 1K input tokens, $0.002 per 1K output tokens
- GPT-4: $0.03 per 1K input tokens, $0.06 per 1K output tokens

In [None]:
def calculate_cost(usage, model="gpt-35-turbo"):
    """
    Calculate the cost of an API call
    """
    if "gpt-4" in model.lower():
        input_cost = (usage.prompt_tokens / 1000) * 0.03
        output_cost = (usage.completion_tokens / 1000) * 0.06
    else:  # GPT-3.5
        input_cost = (usage.prompt_tokens / 1000) * 0.0015
        output_cost = (usage.completion_tokens / 1000) * 0.002
    
    total_cost = input_cost + output_cost
    
    return {
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": total_cost
    }

# Test the cost calculation
response = client.chat.completions.create(
    model=GPT35_DEPLOYMENT,
    messages=[
        {"role": "system", "content": "You are a banking assistant."},
        {"role": "user", "content": "Explain the difference between a checking and savings account."}
    ],
    temperature=0.7,
    max_tokens=300
)

cost_info = calculate_cost(response.usage, response.model)

print("Response:")
print(response.choices[0].message.content)
print("\n" + "="*80)
print("\nCost Analysis:")
print(f"Input tokens: {cost_info['input_tokens']}")
print(f"Output tokens: {cost_info['output_tokens']}")
print(f"Total tokens: {cost_info['total_tokens']}")
print(f"\nInput cost: ${cost_info['input_cost']:.6f}")
print(f"Output cost: ${cost_info['output_cost']:.6f}")
print(f"Total cost: ${cost_info['total_cost']:.6f}")
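
A single call is cheap, but a chatbot like the one in Part 6 resends the whole history every turn, so input tokens accumulate. The sketch below estimates the cumulative cost of a conversation using the approximate GPT-3.5 Turbo prices listed above. The token counts are made-up inputs, per-message formatting overhead is ignored, and `conversation_cost` is a hypothetical helper.

```python
def conversation_cost(system_tokens, user_tokens, reply_tokens, turns,
                      input_price=0.0015, output_price=0.002):
    """Estimate cumulative cost when the full history is resent every turn.

    Prices are USD per 1K tokens. Returns (input_tokens, output_tokens, cost).
    """
    total_input = 0
    total_output = 0
    for turn in range(1, turns + 1):
        # Prompt = system message + every user message so far + all earlier replies.
        total_input += system_tokens + turn * user_tokens + (turn - 1) * reply_tokens
        total_output += reply_tokens
    cost = total_input / 1000 * input_price + total_output / 1000 * output_price
    return total_input, total_output, cost


for turns in [1, 5, 10]:
    tokens_in, tokens_out, cost = conversation_cost(50, 20, 100, turns)
    print(f"{turns:>2} turns: {tokens_in} input, {tokens_out} output, ${cost:.6f}")
```

Output tokens grow linearly with the number of turns, while input tokens grow roughly quadratically, which is why long conversations benefit from trimming or summarizing history.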

## Part 9: Comparing GPT-3.5 vs GPT-4

Let's compare the responses and costs between GPT-3.5 and GPT-4.

In [None]:
# Complex banking question
complex_question = """A customer has $50,000 to invest. They want low risk but better returns 
than a savings account. They might need access to the money in 2 years. 
What would you recommend and why?"""

messages = [
    {"role": "system", "content": "You are an expert financial advisor at BNCR."},
    {"role": "user", "content": complex_question}
]

# Test with GPT-3.5
print("GPT-3.5 Turbo response:")
print("="*80)
response_35 = client.chat.completions.create(
    model=GPT35_DEPLOYMENT,
    messages=messages,
    temperature=0.7,
    max_tokens=400
)
print(response_35.choices[0].message.content)
cost_35 = calculate_cost(response_35.usage, "gpt-35-turbo")
print(f"\nCost: ${cost_35['total_cost']:.6f}")

print("\n" + "="*80 + "\n")

# Test with GPT-4 (if available)
try:
    print("GPT-4 response:")
    print("="*80)
    response_4 = client.chat.completions.create(
        model=GPT4_DEPLOYMENT,
        messages=messages,
        temperature=0.7,
        max_tokens=400
    )
    print(response_4.choices[0].message.content)
    cost_4 = calculate_cost(response_4.usage, "gpt-4")
    print(f"\nCost: ${cost_4['total_cost']:.6f}")
    
    print("\n" + "="*80)
    print(f"\nCost Comparison:")
    print(f"GPT-3.5: ${cost_35['total_cost']:.6f}")
    print(f"GPT-4: ${cost_4['total_cost']:.6f}")
    print(f"GPT-4 is {cost_4['total_cost']/cost_35['total_cost']:.1f}x more expensive")
except Exception as e:
    print(f"GPT-4 not available: {str(e)}")

## 🎯 Practical Exercise 1: Product Recommendation System

Create a banking product recommendation system that:
1. Asks the customer about their needs
2. Recommends appropriate products
3. Explains why each product is suitable

Use the banking products dataset provided.

In [None]:
import pandas as pd

# Load banking products
products_df = pd.read_csv("../../datasets/banking/banking_products.csv")

# Show available products
print("Available Banking Products:")
print(products_df[['product_name', 'product_type', 'interest_rate']].to_string(index=False))

# TODO: Create a system prompt that includes product information
# TODO: Build a recommendation function
# TODO: Test with different customer profiles

# Your code here:


## 🎯 Practical Exercise 2: Multi-language Support

Extend the banking assistant to support both Spanish and English.
The assistant should:
1. Detect the language of the user's message
2. Respond in the same language
3. Maintain conversation history in both languages

In [None]:
# TODO: Create a multilingual banking assistant
# TODO: Test with queries in Spanish and English
# TODO: Implement language detection

# Your code here:


## 🎯 Practical Exercise 3: Cost Optimization Challenge

You have a budget of $10 per day for API calls.

Tasks:
1. Calculate how many customer interactions you can handle
2. Implement a token counter that warns when approaching limits
3. Optimize the system prompt to reduce token usage
4. Compare costs between GPT-3.5 and GPT-4 for your use case

In [None]:
# TODO: Implement a daily budget tracker
# TODO: Create cost optimization strategies
# TODO: Compare model costs

# Your code here:


## Summary and Key Takeaways

In this lab, you learned:

1. **Azure OpenAI Basics**: How to initialize the client and make API calls
2. **Message Roles**: System, user, and assistant roles and their purposes
3. **Parameters**: Temperature, max_tokens, and their effects on output
4. **Conversation Management**: Building chatbots with conversation history
5. **Error Handling**: Implementing retry logic for production systems
6. **Cost Optimization**: Understanding token usage and calculating costs
7. **Model Comparison**: When to use GPT-3.5 vs GPT-4

### Best Practices for Banking Applications:

- Use **lower temperatures (0.3-0.7)** for consistent, reliable responses
- Implement **comprehensive error handling** with retry logic
- **Monitor token usage** to control costs
- Use **clear system prompts** that define the assistant's role and limitations
- **Never expose sensitive information** in prompts or logs
- Always **validate and sanitize** user inputs
- Implement **rate limiting** to prevent abuse
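
As one illustration of keeping sensitive information out of prompts and logs, the sketch below masks long digit runs before text is logged. The pattern (eight or more consecutive digits) is an assumption chosen for illustration. Real account and IBAN formats differ, so a production system would need patterns matched to the bank's actual identifiers. `mask_account_numbers` is a hypothetical helper.

```python
import re

# Assumption: any standalone run of 8+ digits is treated as an account-like number.
_LONG_DIGITS = re.compile(r"\b\d{8,}\b")


def mask_account_numbers(text, visible=4):
    """Replace all but the last `visible` digits of long digit runs with '*'."""
    def _mask(match):
        digits = match.group(0)
        return "*" * (len(digits) - visible) + digits[-visible:]
    return _LONG_DIGITS.sub(_mask, text)


print(mask_account_numbers("Transfer from account 100200300400 on day 15"))
```

Short numbers such as dates or amounts are left alone, and digit runs attached to letters are not matched by this pattern.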

### Next Steps:

In the next lab, we'll explore advanced prompt engineering techniques to improve response quality and consistency.

---

**Questions or Issues?**  
Contact: Manuela Larrea | manuela.larrea@idataglobal.com