# Notebook 3 — Token/Context Cost + CoT (Quality and Cost)


1) Token consumption of zero-, one-, and few-shot prompting
2) CoT (step-by-step): relationship between cost and output


## Setup

This notebook uses a GPT model deployed on Azure OpenAI.

In [None]:
# Requirements
%pip -q install -U langchain-core langchain-openai langchain-google-genai tiktoken python-dotenv matplotlib pandas==2.2.2 pydantic==2.12.3


In [None]:
import os, json, re
from typing import Dict, Any
from langchain_openai import AzureChatOpenAI

# --- Azure OpenAI Configuration ---
AZURE_ENDPOINT = "https://sd-rg.cognitiveservices.azure.com/"
AZURE_API_KEY = os.getenv("AZURE_OPENAI_API_KEY", "")  # read the key from the environment; never commit it
AZURE_DEPLOYMENT = "vodafone_rag_module"
API_VERSION = "2024-12-01-preview"

def get_llm():
    """
    Initializes the Azure OpenAI client using the provided configuration.

    Returns:
        tuple: (llm_instance, provider_name_string)

    Raises:
        RuntimeError: If API credentials are missing or connection fails.
    """
    # Check if essential credentials are present
    if AZURE_API_KEY and AZURE_ENDPOINT:
        try:
            llm = AzureChatOpenAI(
                azure_deployment=AZURE_DEPLOYMENT,
                api_version=API_VERSION,
                azure_endpoint=AZURE_ENDPOINT,
                api_key=AZURE_API_KEY,
                temperature=0.1,
                max_retries=2
            )
            return llm, f'Azure OpenAI ({AZURE_DEPLOYMENT})'
        except Exception as e:
            # Chain the original exception so the root cause stays visible
            raise RuntimeError(f"Azure connection error: {e}") from e

    raise RuntimeError('Azure API credentials are missing!')

# --- Initialization ---
try:
    llm, provider = get_llm()
    print('✅ LLM ready:', provider)

    # Uncomment the line below to test the connection immediately
    # print("Test Response:", llm.invoke("Hello, are you active?").content)

except Exception as e:
    # If initialization fails, set llm to None to prevent subsequent NameErrors
    llm = None
    print(f"❌ Error occurred: {e}")

def llm_text(prompt: str) -> str:
    """
    Sends a prompt to the LLM and retrieves the text response.

    Args:
        prompt (str): The input string to send to the model.

    Returns:
        str: The clean text response from the model.
    """
    if llm is None:
        return "Error: LLM is not initialized."

    resp = llm.invoke(prompt)
    # Safely retrieve content whether it's an object or string
    return getattr(resp, 'content', str(resp)).strip()

def strip_fences(s: str) -> str:
    """
    Removes Markdown code fences (e.g., ```json ... ```) from a string.

    Args:
        s (str): The input string containing code fences.

    Returns:
        str: Cleaned string without the fences.
    """
    s = s.strip()
    # Remove starting ```json or ``` (case insensitive)
    s = re.sub(r'^```(json)?\s*', '', s, flags=re.IGNORECASE)
    # Remove ending ```
    s = re.sub(r'\s*```$', '', s)
    return s.strip()

✅ LLM ready: Azure OpenAI (vodafone_rag_module)
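
The `strip_fences` helper is defined above but not exercised in this cell. As a minimal sketch of how it might be used, the snippet below cleans a fenced reply and parses it with `json.loads`. The `raw_reply` string is a hand-written stand-in for a model response (an assumption, not real output), and `parse_json_reply` is a hypothetical helper name. The fence marker is built as `FENCE` only to keep the example easy to embed; the regexes are otherwise equivalent to the ones above.

```python
import json
import re

FENCE = "`" * 3  # three backticks

def strip_fences(s: str) -> str:
    """Equivalent to the helper above: drop a leading and trailing Markdown fence."""
    s = s.strip()
    s = re.sub(r'^' + FENCE + r'(json)?\s*', '', s, flags=re.IGNORECASE)
    s = re.sub(r'\s*' + FENCE + r'$', '', s)
    return s.strip()

def parse_json_reply(reply: str) -> dict:
    """Hypothetical helper: strip fences, then parse the remainder as JSON."""
    return json.loads(strip_fences(reply))

# Hand-written stand-in for a model reply (assumption, not real output)
raw_reply = (
    FENCE + 'json\n'
    '{"category": "Delivery issue", "urgency": "high", "reason": "Package is late."}\n'
    + FENCE
)

parsed = parse_json_reply(raw_reply)
print(parsed["category"], "|", parsed["urgency"])
```

If the reply is not valid JSON after stripping, `json.loads` raises `json.JSONDecodeError`, which is a convenient place to trigger a retry.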


In [None]:
# Dataset for triage: Turkish customer emails with annotation notes
EMAILS = [
    {'id': 'E1', 'text': 'Kargom hâlâ gelmedi. 7 gündür bekliyorum. Acil çözüm istiyorum!', 'notes': 'Delay + high urgency'},
    {'id': 'E2', 'text': 'Ürün kırık geldi. Değişim yapabilir miyiz?', 'notes': 'Damaged product'},
    {'id': 'E3', 'text': 'İade sürecini nasıl başlatabilirim? Kutuyu attım ama ürün duruyor.', 'notes': 'Return + edge case (no box)'},
    {'id': 'E4', 'text': 'Kartımdan iki kez çekim yapılmış görünüyor. Lütfen hemen kontrol edin.', 'notes': 'Billing + high urgency'},
    {'id': 'E5', 'text': 'Ürününüzün kullanım kılavuzunu paylaşır mısınız?', 'notes': 'Information request (low)'},
]
len(EMAILS)

5

## 1) Token Counting

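For OpenAI models, `tiktoken` gives a more accurate count than a character-based heuristic. Other providers use different tokenizers, so the numbers are only approximate for them. The helper below:
- uses `tiktoken` when it is installed and recognizes the model name
- otherwise falls back to a rough estimate (1 token ≈ 4 characters)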

In [None]:
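import os

def count_tokens(text: str, model_hint: str = 'gpt-4o-mini') -> int:
    """Count tokens with tiktoken, falling back to a ~4 characters/token estimate."""
    try:
        import tiktoken  # imported lazily so the fallback also covers a missing package
        enc = tiktoken.encoding_for_model(model_hint)  # raises KeyError for unknown model names
        return len(enc.encode(text))
    except Exception:
        # Rough heuristic; other tokenizers and non-English text can deviate noticeably
        return max(1, len(text) // 4)

def estimate_cost(tokens_in: int, tokens_out: int, price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate the cost of one call from token counts and per-1K-token prices."""
    return (tokens_in/1000)*price_in_per_1k + (tokens_out/1000)*price_out_per_1k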

## 2) Prompt Token Size of the Three Strategies

The schema and examples are taken from Notebook 1.

In [None]:
import json

SCHEMA = (
    'Return ONLY valid JSON with exactly these keys:\n'
    '{\n'
    '  "category": "string",\n'
    '  "urgency": "low|medium|high",\n'
    '  "reason": "string (max 1 sentence)"\n'
    '}\n'
    'No extra text, no markdown, JSON only.'
)

ONE_EXAMPLE = {
    'email': 'My order arrived broken. I want a replacement as soon as possible.',
    'answer': {'category':'Damaged product','urgency':'medium','reason':'Customer reports a damaged item and requests a replacement.'}
}

FEW_EXAMPLES = [
    ('Where is my package? It was supposed to arrive 5 days ago.', {'category':'Delivery issue','urgency':'high','reason':'Customer reports a significantly delayed delivery.'}),
    ('How can I return the product? I changed my mind.', {'category':'Return request','urgency':'low','reason':'Customer asks for return instructions without a critical issue.'}),
    ('You charged me twice for the same order. Fix this immediately.', {'category':'Billing issue','urgency':'high','reason':'Customer reports a double charge and requests urgent resolution.'}),
]

def prompt_zero(email_text: str) -> str:
    return ('You are a customer support triage assistant.\n'
            'Classify the email into a category and urgency.\n\n' + SCHEMA + '\n\n' + 'Email:\n' + email_text)

def prompt_one(email_text: str) -> str:
    return ('You are a customer support triage assistant.\n'
            'Use the example to follow the same output format and labeling style.\n\n'
            + SCHEMA + '\n\n'
            + 'Example:\nEmail:\n' + ONE_EXAMPLE['email'] + '\n'
            + 'Answer:\n' + json.dumps(ONE_EXAMPLE['answer'], ensure_ascii=False) + '\n\n'
            + 'Now classify this email:\n\nEmail:\n' + email_text)

def prompt_few(email_text: str) -> str:
    ex_block = ''
    for mail, ans in FEW_EXAMPLES:
        ex_block += 'Email:\n' + mail + '\nAnswer:\n' + json.dumps(ans, ensure_ascii=False) + '\n\n'
    return ('You are a customer support triage assistant.\n'
            'Follow the same pattern as the examples.\n\n' + SCHEMA + '\n\n'
            + 'Examples:\n' + ex_block
            + 'Now classify this email:\n\nEmail:\n' + email_text)

### Token Calculation

In [None]:
import pandas as pd

sample = EMAILS[0]['text']
model_hint = os.getenv('OPENAI_MODEL', 'gpt-4o-mini')

p0, p1, p2 = prompt_zero(sample), prompt_one(sample), prompt_few(sample)
t0, t1, t2 = count_tokens(p0, model_hint), count_tokens(p1, model_hint), count_tokens(p2, model_hint)

pd.DataFrame([
    {'strategy':'zero-shot', 'prompt_tokens': t0},
    {'strategy':'one-shot',  'prompt_tokens': t1},
    {'strategy':'few-shot',  'prompt_tokens': t2},
]).sort_values('prompt_tokens')

| | strategy | prompt_tokens |
|---|---|---|
| 0 | zero-shot | 97 |
| 1 | one-shot | 155 |
| 2 | few-shot | 238 |
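
To put the differences in relative terms, the sketch below computes each strategy's overhead against the zero-shot baseline. It hard-codes the counts from the table above, which depend on the tokenizer and on the sample email, and `overhead_vs_baseline` is a hypothetical helper name.

```python
# Token counts copied from the table above (tokenizer- and email-dependent)
prompt_tokens = {'zero-shot': 97, 'one-shot': 155, 'few-shot': 238}

def overhead_vs_baseline(counts: dict, baseline: str = 'zero-shot') -> dict:
    """Return (extra_tokens, percent_increase) per strategy relative to the baseline."""
    base = counts[baseline]
    return {
        name: (tok - base, round(100 * (tok - base) / base, 1))
        for name, tok in counts.items()
    }

overhead = overhead_vs_baseline(prompt_tokens)
for name, (extra, pct) in overhead.items():
    print(f'{name:<10} +{extra:>3} tokens  (+{pct}%)')
```

With these counts, the one-shot prompt adds 58 tokens (about 60%) and the few-shot prompt adds 141 tokens (about 145%) to every call.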


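## 3) Cost Simulation

The prices used in the cell below are illustrative placeholders, so the resulting costs are in arbitrary units rather than real money. As a reference point, 1M input tokens cost approximately $0.15, which corresponds to $0.00015 per 1K tokens.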

In [None]:

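price_in_per_1k  = 10   # placeholder; a rate of $0.15 per 1M tokens would be 0.00015 here
price_out_per_1k = 20   # placeholder; a rate of $0.60 per 1M tokens would be 0.0006 here
avg_out_tokens   = 90   # assumed average response length in tokens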

rows = []
for name, tok in [('zero-shot', t0), ('one-shot', t1), ('few-shot', t2)]:
    rows.append({
        'strategy': name,
        'prompt_tokens_in': tok,
        'avg_tokens_out': avg_out_tokens,
        'est_cost_per_call': estimate_cost(tok, avg_out_tokens, price_in_per_1k, price_out_per_1k)
    })
pd.DataFrame(rows)

| | strategy | prompt_tokens_in | avg_tokens_out | est_cost_per_call |
|---|---|---|---|---|
| 0 | zero-shot | 97 | 90 | 2.77 |
| 1 | one-shot | 155 | 90 | 3.35 |
| 2 | few-shot | 238 | 90 | 4.18 |
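
The figures above are in arbitrary units because the prices are placeholders. As a rough sketch of how they would translate into money, the snippet below converts per-1M-token prices into per-1K prices and recomputes the cost. The rates of $0.15 (input) and $0.60 (output) per 1M tokens are the example values mentioned in this section, treated here as an assumption rather than a quoted price list. `per_1m_to_per_1k` and `calls_per_month` are hypothetical names.

```python
def estimate_cost(tokens_in, tokens_out, price_in_per_1k, price_out_per_1k):
    """Same formula as the notebook's estimate_cost."""
    return (tokens_in / 1000) * price_in_per_1k + (tokens_out / 1000) * price_out_per_1k

def per_1m_to_per_1k(price_per_1m: float) -> float:
    """Convert a per-1M-token price to a per-1K-token price."""
    return price_per_1m / 1000

# Assumed example rates in USD per 1M tokens (not an official price list)
price_in = per_1m_to_per_1k(0.15)
price_out = per_1m_to_per_1k(0.60)
calls_per_month = 1_000_000  # hypothetical traffic volume

monthly = {}
for name, tok in [('zero-shot', 97), ('one-shot', 155), ('few-shot', 238)]:
    per_call = estimate_cost(tok, 90, price_in, price_out)
    monthly[name] = per_call * calls_per_month
    print(f'{name:<10} ${per_call:.8f} per call  ~${monthly[name]:.2f} per month')
```

Under these assumptions the per-call difference is tiny, and it only becomes visible once it is multiplied by the call volume.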


## 4) Context Growth (as the number of examples increases)

How does the prompt token count grow as the number of few-shot examples increases?

In [None]:
def prompt_with_n_examples(email_text: str, n: int) -> str:
    exs = (FEW_EXAMPLES * ((n // len(FEW_EXAMPLES)) + 1))[:n]
    ex_block = ''
    for mail, ans in exs:
        ex_block += 'Email:\n' + mail + '\nAnswer:\n' + json.dumps(ans, ensure_ascii=False) + '\n\n'
    return ('You are a customer support triage assistant.\n'
            'Follow the same pattern as the examples.\n\n' + SCHEMA + '\n\n'
            + 'Examples:\n' + ex_block
            + 'Now classify this email:\n\nEmail:\n' + email_text)

for n in [0, 1, 3, 5, 10, 20, 40]:
    p = prompt_with_n_examples(sample, n)
    print(f'n={n:>2}  prompt_tokens≈{count_tokens(p, model_hint)}')

n= 0  prompt_tokens≈102
n= 1  prompt_tokens≈147
n= 3  prompt_tokens≈238
n= 5  prompt_tokens≈328
n=10  prompt_tokens≈555
n=20  prompt_tokens≈1008
n=40  prompt_tokens≈1915
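
The printed counts suggest roughly linear growth. As a small sketch, the code below computes the marginal tokens per added example from those numbers and estimates how many examples would fit in a given prompt budget. The counts are copied from the output above; `max_examples_for_budget` and the 2,000-token budget are illustrative assumptions.

```python
# (n_examples, prompt_tokens) pairs copied from the output above
measured = [(0, 102), (1, 147), (3, 238), (5, 328), (10, 555), (20, 1008), (40, 1915)]

# Marginal tokens per added example between consecutive measurements
slopes = [
    (t2 - t1) / (n2 - n1)
    for (n1, t1), (n2, t2) in zip(measured, measured[1:])
]
print('tokens per example:', [round(s, 2) for s in slopes])

# Fixed overhead at n=0 and average slope over the whole range
base_tokens = measured[0][1]
avg_per_example = (measured[-1][1] - base_tokens) / measured[-1][0]

def max_examples_for_budget(budget_tokens: int) -> int:
    """Hypothetical helper: linear estimate of how many examples fit in the budget."""
    return max(0, int((budget_tokens - base_tokens) // avg_per_example))

print('avg tokens per example:', round(avg_per_example, 2))
print('examples that fit in 2000 tokens:', max_examples_for_budget(2000))
```

The slope is nearly constant here because the same three examples are cycled; with examples of varying length the growth would still be additive but less regular.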


## 5) CoT (Chain-of-Thought): quality vs. cost

Here we compare two approaches:
- **Direct**: produce only the result
- **Step-by-step**: solve the problem step by step

> In production you may not want to show the full CoT to the user. Alternative: brief rationale + self-check.

In [None]:
MATH_TASK = ('A company sold 120 items on Monday, 95 on Tuesday, and 110 on Wednesday. '
             'If 15 items were returned, how many net items were sold?')

PROMPT_DIRECT = 'Answer with a single line: "Net sold = <number>". Question: ' + MATH_TASK
PROMPT_COT    = 'Solve step by step, then answer with a single line: "Net sold = <number>". Question: ' + MATH_TASK

direct = llm_text(PROMPT_DIRECT)
cot    = llm_text(PROMPT_COT)

print('DIRECT:\n', direct)
print('\nCOT:\n', cot)

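# CoT cost shows up mostly in the response, so count both prompt and output tokens
print('\nToken estimates (prompt / output):')
print('direct:', count_tokens(PROMPT_DIRECT, model_hint), '/', count_tokens(direct, model_hint))
print('cot   :', count_tokens(PROMPT_COT, model_hint), '/', count_tokens(cot, model_hint))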
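
Step-by-step prompting mostly adds output tokens, so prompt length alone understates its cost. The sketch below illustrates this with hand-written stand-in responses (assumptions, not real model output), the 4-characters-per-token fallback from section 1, and the placeholder prices from section 3. `approx_tokens`, `extract_answer`, and `call_cost` are hypothetical helper names; `extract_answer` pulls the final number out of a response so that correctness can be compared alongside cost.

```python
import re

MATH_TASK = ('A company sold 120 items on Monday, 95 on Tuesday, and 110 on Wednesday. '
             'If 15 items were returned, how many net items were sold?')
PROMPT_DIRECT = 'Answer with a single line: "Net sold = <number>". Question: ' + MATH_TASK
PROMPT_COT = 'Solve step by step, then answer with a single line: "Net sold = <number>". Question: ' + MATH_TASK

def approx_tokens(text: str) -> int:
    """Fallback heuristic from section 1: about 4 characters per token."""
    return max(1, len(text) // 4)

def extract_answer(response: str):
    """Return the integer after 'Net sold =' or None if it is missing."""
    m = re.search(r'Net sold\s*=\s*(-?\d+)', response)
    return int(m.group(1)) if m else None

# Hand-written stand-ins for model responses (assumptions, not real output)
direct_reply = 'Net sold = 310'
cot_reply = (
    'Step 1: Add the daily sales: 120 + 95 + 110 = 325.\n'
    'Step 2: Subtract the returns: 325 - 15 = 310.\n'
    'Net sold = 310'
)

PRICE_IN_PER_1K, PRICE_OUT_PER_1K = 10, 20  # placeholder prices from section 3

def call_cost(prompt: str, reply: str) -> float:
    """Cost of one call: input and output tokens priced separately."""
    return ((approx_tokens(prompt) / 1000) * PRICE_IN_PER_1K
            + (approx_tokens(reply) / 1000) * PRICE_OUT_PER_1K)

expected = 120 + 95 + 110 - 15  # ground truth for the task
results = {}
for name, prompt, reply in [('direct', PROMPT_DIRECT, direct_reply),
                            ('cot', PROMPT_COT, cot_reply)]:
    results[name] = {
        'out_tokens': approx_tokens(reply),
        'cost': round(call_cost(prompt, reply), 4),
        'correct': extract_answer(reply) == expected,
    }
    print(name, results[name])
```

On a task this easy both stand-ins reach the same answer, so the extra output tokens buy nothing; the trade-off only pays off when step-by-step reasoning actually changes the result.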

## 6) Alternative: brief rationale + self-check

Instead of CoT, a safer and shorter format for production:

In [None]:
PROMPT_BRIEF = (
    'Return:\n'
    '1) Answer: Net sold = <number>\n'
    '2) Rationale: (max 2 sentences)\n'
    '3) Self-check: (max 2 bullets)\n\n'
    'Question: ' + MATH_TASK
)

brief = llm_text(PROMPT_BRIEF)
print(brief)
print('\nPrompt token estimate:', count_tokens(PROMPT_BRIEF, model_hint))
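
Because the brief format is structured, a reply can be checked automatically before it is shown to a user. The sketch below is one possible validator, assuming the model follows the numbered layout requested in `PROMPT_BRIEF`. `check_brief_reply` is a hypothetical helper, the sentence splitting is deliberately naive, and `sample_reply` is a hand-written stand-in rather than real model output.

```python
import re

def check_brief_reply(reply: str, expected: int) -> dict:
    """Hypothetical validator for the Answer / Rationale / Self-check format."""
    answer = re.search(r'Answer:\s*Net sold\s*=\s*(-?\d+)', reply)
    # Rationale text runs until the line that starts section 3 (or the end)
    rationale = re.search(r'Rationale:\s*(.+?)(?=\n[ \t]*3\)|\Z)', reply, flags=re.DOTALL)
    # Self-check bullets: lines that start with -, * or a bullet character
    bullets = re.findall(r'^[ \t]*[-*•][ \t]+.+$', reply, flags=re.MULTILINE)

    sentences = []
    if rationale:
        # Naive split on sentence-ending punctuation followed by whitespace
        parts = re.split(r'(?<=[.!?])\s+', rationale.group(1).strip())
        sentences = [p for p in parts if p]

    return {
        'answer_ok': bool(answer) and int(answer.group(1)) == expected,
        'rationale_ok': 1 <= len(sentences) <= 2,
        'selfcheck_ok': 1 <= len(bullets) <= 2,
    }

# Hand-written stand-in for a model reply (assumption, not real output)
sample_reply = (
    '1) Answer: Net sold = 310\n'
    '2) Rationale: Total sales are 120 + 95 + 110 = 325. Subtracting 15 returns gives 310.\n'
    '3) Self-check:\n'
    '- 325 - 15 = 310\n'
    '- 310 + 15 = 325\n'
)

report = check_brief_reply(sample_reply, expected=120 + 95 + 110 - 15)
print(report)
```

A failed check can be used to trigger a retry or a fallback to the direct prompt.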

## 7) Exercise (3–5 min)

1) Rewrite the CoT prompt in Turkish and compare the token counts.
2) Note how the context swells when you go up to 10–20 examples.
3) Try the brief + self-check format on a different problem.