# Notebook 03: Zero-Shot, Few-Shot, CoT, ToT

**Objectives:**
- Understand zero-shot and few-shot prompting
- Apply Chain-of-Thought (CoT) reasoning
- Explore Tree-of-Thought (ToT) with multiple branches
- Route CoT and ToT prompts to reasoning models automatically
- Handle context overflow with the `overflow_summarize` prompt

In [2]:
import sys
sys.path.append('..')

from utils.prompts import render
from utils.llm_client import LLMClient
from utils.logging_utils import log_llm_call
from utils.router import pick_model, should_use_reasoning_model
from IPython.display import Markdown, display

## Part 1: Zero-Shot vs Few-Shot

Zero-shot prompting gives the model only an instruction, with no examples. Few-shot prompting adds labeled examples to the prompt to guide the model's behavior.
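To make the difference concrete, here is a minimal sketch that assembles both kinds of prompt from plain strings. `build_prompt` is a hypothetical helper written for illustration; it is not how the `render` templates in `utils.prompts` are implemented.

```python
from typing import Sequence, Tuple


def build_prompt(instruction: str, query: str,
                 examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble a zero-shot prompt (no examples) or a few-shot prompt."""
    parts = [instruction]
    # Each example is a (review, label) pair shown to the model before the query.
    for i, (review, label) in enumerate(examples, start=1):
        parts.append(f"Example {i}:\nReview: {review}\nSentiment: {label}")
    # The query ends with an open "Sentiment:" slot for the model to fill in.
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)


instruction = "Classify the sentiment as positive, negative, or neutral."
zero_shot = build_prompt(instruction, "This product is amazing!")
few_shot = build_prompt(
    instruction,
    "It seems nice but it's not for me",
    examples=[
        ("I'm really happy with the product! It's amazing!", "positive"),
        ("I'm really unhappy with the product! It's terrible!", "negative"),
    ],
)
print(zero_shot)
print("---")
print(few_shot)
```

The only difference between the two prompts is the block of worked examples placed before the query.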

In [3]:
# Zero-shot classification
prompt_text, spec = render(
    'zero_shot.v1',
    role='sentiment classifier',
    instruction='Classify the sentiment as positive, negative, or neutral',
    constraints='Return only the label',
    format='Single word'
)

model = pick_model('openai', 'general')
client = LLMClient('openai', model)

test_text = 'This product is amazing! Best purchase ever.'
messages = [{'role': 'user', 'content': f"{prompt_text}\n\nText: {test_text}"}]

response = client.chat(messages, temperature=0.0)
print('Zero-shot result:', response['text'])
log_llm_call('openai', model, 'zero_shot', response['latency_ms'], response['usage'])

Zero-shot result: positive


In [3]:
# Few-shot with comprehensive examples
examples = """
Example 1:
Review: I'm really happy with the product but it's bad!
Sentiment: negative
Explanation: User says product is bad but they are happy. When there's a conflict between 
user emotion and product quality, prioritize the product quality statement. 
Therefore, sentiment is negative.

Example 2:
Review: I'm really unhappy with the product but it's amazing!
Sentiment: positive
Explanation: User says product is amazing but they are unhappy. Prioritizing product 
quality over user emotion, the sentiment is positive.

Example 3:
Review: I'm really happy with the product! It's amazing!
Sentiment: positive
Explanation: Both user emotion and product quality are positive. Clear positive sentiment.

Example 4:
Review: I'm really unhappy with the product! It's terrible!
Sentiment: negative
Explanation: Both user emotion and product quality are negative. Clear negative sentiment.

Example 5:
Review: The product is okay, but it's not great.
Sentiment: neutral
Explanation: User expresses ambivalence. Product quality is neither strongly positive nor negative.

Example 6:
Review: I'm not sure about the product yet.
Sentiment: neutral
Explanation: User is uncertain and hasn't formed a clear opinion about product quality.
"""

# Use a more nuanced test case
test_text_nuanced = "It seems nice but it's not for me"

prompt_text, spec = render(
    'few_shot.v1',
    role='sentiment classifier',
    examples=examples,
    query=f'Review: {test_text_nuanced}',
    constraints='Follow the pattern in examples: provide sentiment and brief explanation',
    format='Sentiment: {{sentiment}}\n\nExplanation: {{explanation}}'
)

messages = [{'role': 'user', 'content': prompt_text}]
response = client.chat(messages, temperature=0.2)
print('Few-shot result (nuanced):')
print(response)
log_llm_call('openai', model, 'few_shot', response['latency_ms'], response['usage'])

Few-shot result (nuanced):
{'text': "Sentiment: negative\n\nExplanation: User says the product seems nice, which is positive, but then states it's not for them, which is negative. Prioritizing the product quality statement, the sentiment is negative.\n", 'usage': {'input_tokens_est': 313, 'context_tokens_est': 0, 'total_est': 316, 'prompt_tokens_actual': 351, 'completion_tokens_actual': 45, 'total_tokens_actual': 396}, 'latency_ms': 796, 'raw': GenerateContentResponse(
  automatic_function_calling_history=[],
  candidates=[
    Candidate(
      avg_logprobs=-0.20678130255805122,
      content=Content(
        parts=[
          Part(
            text="""Sentiment: negative

Explanation: User says the product seems nice, which is positive, but then states it's not for them, which is negative. Prioritizing the product quality statement, the sentiment is negative.
"""
          ),
        ],
        role='model'
      ),
      finish_reason=<FinishReason.STOP: 'STOP'>
    ),
  ],
  model_v
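The reply follows the `Sentiment:` / `Explanation:` layout requested in the prompt, so it can be split into fields instead of printing the whole response dict. The sketch below is one possible parser; `parse_sentiment` and `ALLOWED_LABELS` are hypothetical names and are not part of `utils`.

```python
import re
from typing import Optional, Tuple

ALLOWED_LABELS = {"positive", "negative", "neutral"}


def parse_sentiment(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (label, explanation) from 'Sentiment: ...' / 'Explanation: ...' text."""
    label_match = re.search(r"^Sentiment:\s*(\w+)", text,
                            flags=re.MULTILINE | re.IGNORECASE)
    expl_match = re.search(r"^Explanation:\s*(.+)", text,
                           flags=re.MULTILINE | re.DOTALL | re.IGNORECASE)
    label = label_match.group(1).lower() if label_match else None
    # Reject anything outside the three labels the prompt allows.
    if label not in ALLOWED_LABELS:
        label = None
    explanation = expl_match.group(1).strip() if expl_match else None
    return label, explanation


# Sample text shortened from the few-shot output above.
sample = (
    "Sentiment: negative\n\n"
    "Explanation: User says the product seems nice, which is positive, "
    "but then states it's not for them, which is negative.\n"
)
label, explanation = parse_sentiment(sample)
print(label)
print(explanation)
```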

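## Part 2: Chain-of-Thought (CoT)

Ask the model to reason step by step before it gives a final answer.

In [4]: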
# CoT auto-routes to reasoning model
reasoning_model = pick_model('google', 'cot')
print(f'Using reasoning model: {reasoning_model}')

client_reasoning = LLMClient('google', reasoning_model)

# Problem from live class: break time vs travel time confusion
problem = """A car travels 100 miles in 2 hours. 

Question 1: What is the average speed of the car?
Question 2: If the car stopped for 40 minutes during this 2-hour journey, 
what was the average speed during the actual driving time?

Important: The 2 hours already includes the 40-minute stop."""

# Additional guidance
instruction = """Solve the following problem step by step.
1. First identify whether the car travelled the entire time without stopping or not.
2. If car stopped for x minutes and overall travelled for y hours, the actual driving duration is y-x.
3. If stopping time x is mentioned, do not add it to the travel duration because it's already included.
   So actual travel time is y (total time) - x (stopping time).
4. If car travelled the entire time without stopping, then average speed is distance / y.
5. If car stopped for x minutes, then average speed during driving is distance / (y - x)."""

prompt_text, spec = render(
    'cot_reasoning.v1',
    role='math tutor',
    problem=problem
)

# Combine problem with instruction (as done in live class)
full_prompt = f"""text: {prompt_text}

instruction: {instruction}"""

# The key must be exactly 'role' (no leading space); otherwise the Google provider raises a KeyError
messages = [{'role': 'user', 'content': full_prompt}]
response = client_reasoning.chat(messages, temperature=spec.temperature, max_tokens=spec.max_tokens)

print('CoT Response (Travel Time with Break):')
print('=' * 80)
print(response)
print('=' * 80)
log_llm_call('google', reasoning_model, 'cot', response['latency_ms'], response['usage'])

Using reasoning model: gemini-3-pro-preview
CoT Response (Travel Time with Break):
{'text': '**Reasoning Steps:**\n\n1.  **Question 1:** Calculate the overall average speed using the total distance (100 miles) and the total time elapsed (2 hours). Formula: $\\text{Speed} = \\frac{\\text{Total Distance}}{\\text{Total Time}}$.\n2.  **Question 2:** Determine the actual driving time. Since the 2-hour total includes the stop, subtract the stopping time from the total time.\n    *   Total time = 120 minutes (2 hours).\n    *   Stopping time = 40 minutes.\n    *   Actual driving time = $120 - 40 = 80$ minutes.\n3.  Convert the actual driving time into hours to calculate miles per hour (mph).\n    *   $80 \\text{ minutes} = \\frac{80}{60} \\text{ hours} = \\frac{4}{3} \\text{ hours}$ (approx 1.33 hours).\n4.  Calculate the average speed during driving time using $\\text{Speed} = \\frac{\\text{Total Distance}}{\\text{Actual Driving Time}}$.\n\n**Answer:**\n\n**Question 1:** 50 mph\n**Question 2

In [5]:
# CoT auto-routes to reasoning model
reasoning_model = pick_model('google', 'cot')
print(f'Using reasoning model: {reasoning_model}')

client_reasoning = LLMClient('google', reasoning_model)

# Problem from live class: break time vs travel time confusion
problem = """A car travels 100 miles in 2 hours. 

Question 1: What is the average speed of the car?
Question 2: If the car stopped for 40 minutes during this 2-hour journey, 
what was the average speed during the actual driving time?

Important: The 2 hours already includes the 40-minute stop."""

# Additional guidance (from live class approach)
instruction = """Solve the following problem step by step.
1. First identify whether the car travelled the entire time without stopping or not.
2. If car stopped for x minutes and overall travelled for y hours, the actual driving duration is y-x.
3. If stopping time x is mentioned, do not add it to the travel duration because it's already included.
   So actual travel time is y (total time) - x (stopping time).
4. If car travelled the entire time without stopping, then average speed is distance / y.
5. If car stopped for x minutes, then average speed during driving is distance / (y - x)."""

prompt_text, spec = render(
    'cot_reasoning.v1',
    role='math tutor',
    problem=problem
)

# Combine problem with instruction (as done in live class)
full_prompt = f"""text: {prompt_text}

instruction: {instruction}"""

messages = [{'role': 'user', 'content': full_prompt}]
response = client_reasoning.chat(messages, temperature=spec.temperature, max_tokens=spec.max_tokens)

print('CoT Response (Travel Time with Break):')
print('=' * 80)
print(response)
print('=' * 80)
log_llm_call('google', reasoning_model, 'cot', response['latency_ms'], response['usage'])

Using reasoning model: gemini-3-pro-preview
CoT Response (Travel Time with Break):
{'text': '**Reasoning Steps:**\n\n1.  **Analyze Question 1 (Overall Average Speed):**\n    *   Identify total distance ($d$) and total time ($y$).\n    *   $d = 100$ miles, $y = 2$ hours.\n    *   Apply formula: $\\text{Average Speed} = \\text{Total Distance} / \\text{Total Time}$.\n\n2.  **Analyze Question 2 (Actual Driving Speed):**\n    *   Identify the stopping time ($x$) included in the total time. $x = 40$ minutes.\n    *   Convert stopping time to hours: $40 \\text{ minutes} = \\frac{40}{60} \\text{ hours} = \\frac{2}{3} \\text{ hours}$.\n    *   Calculate actual driving time ($t_{driving}$) by subtracting stopping time from total time ($y - x$): $2 \\text{ hours} - \\frac{2}{3} \\text{ hours} = \\frac{4}{3} \\text{ hours}$ (or 1 hour 20 minutes).\n    *   Apply formula: $\\text{Driving Speed} = \\text{Total Distance} / \\text{Actual Driving Time}$.\n    *   Calculation: $100 / (\\frac{4}{3})$.\n\
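Because the printed responses are cut off, it helps to check the arithmetic independently. The sketch below recomputes both answers with exact fractions; `average_speed_mph` is a hypothetical helper, not part of the notebook's `utils` package.

```python
from fractions import Fraction


def average_speed_mph(distance_miles, total_minutes, stop_minutes=0):
    """Average speed over the time actually spent moving, in miles per hour."""
    driving_minutes = total_minutes - stop_minutes
    if driving_minutes <= 0:
        raise ValueError("stop time must be shorter than total time")
    # Convert minutes to hours exactly, avoiding floating-point rounding.
    driving_hours = Fraction(driving_minutes, 60)
    return Fraction(distance_miles) / driving_hours


overall = average_speed_mph(100, 120)        # whole 2-hour journey
driving = average_speed_mph(100, 120, 40)    # excludes the 40-minute stop
print(overall, driving)  # prints: 50 75
```

Question 1 is 100 miles / 2 hours = 50 mph. Question 2 is 100 miles / (4/3 hours) = 75 mph, since the 40-minute stop is subtracted from the 2 hours rather than added to them.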

## Part 3: Tree-of-Thought (ToT)

Explore multiple solution paths, then select the best.
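The jug puzzle used in this part can also be solved by a classical search, which gives a reference answer to compare the model's branches against. The sketch below uses breadth-first search over jug states; it illustrates exploring several paths and keeping the shortest one, and it is not the ToT prompting technique itself. `solve_jugs` is a hypothetical helper.

```python
from collections import deque


def solve_jugs(cap_a=3, cap_b=5, target=4):
    """Breadth-first search over (a, b) fill levels; returns the shortest path of states."""
    start = (0, 0)
    parents = {start: None}
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if a == target or b == target:
            # Walk parent links back to the start to recover the path.
            path, state = [], (a, b)
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        pour_ab = min(a, cap_b - b)   # amount that fits when pouring a into b
        pour_ba = min(b, cap_a - a)   # amount that fits when pouring b into a
        for nxt in (
            (cap_a, b), (a, cap_b),            # fill either jug
            (0, b), (a, 0),                    # empty either jug
            (a - pour_ab, b + pour_ab),        # pour a -> b
            (a + pour_ba, b - pour_ba),        # pour b -> a
        ):
            if nxt not in parents:
                parents[nxt] = (a, b)
                queue.append(nxt)
    return None


path = solve_jugs()
print(path)
# [(0, 0), (0, 5), (3, 2), (0, 2), (2, 0), (2, 5), (3, 4)]
```

Each tuple is (3-gallon level, 5-gallon level). The shortest solution takes six moves: fill the 5, pour into the 3, empty the 3, pour the remaining 2 into the 3, fill the 5 again, and top up the 3, which leaves 4 gallons in the 5-gallon jug.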

In [6]:
problem = """You have a 3-gallon jug and a 5-gallon jug.
How can you measure exactly 4 gallons?"""

prompt_text, spec = render(
    'tot_reasoning.v1',
    role='puzzle solver',
    problem=problem,
    branches='3'
)

messages = [{'role': 'user', 'content': prompt_text}]
response = client_reasoning.chat(messages, temperature=spec.temperature, max_tokens=spec.max_tokens)

print('ToT Response (Multiple Solution Paths):')
print('=' * 80)
print(response)
print('=' * 80)
log_llm_call('google', reasoning_model, 'tot', response['latency_ms'], response['usage'])

ToT Response (Multiple Solution Paths):
{'text': None, 'usage': {'input_tokens_est': 76, 'context_tokens_est': 0, 'total_est': 79, 'prompt_tokens_actual': 76, 'completion_tokens_actual': 0, 'total_tokens_actual': 76}, 'latency_ms': 43435, 'raw': GenerateContentResponse(
  automatic_function_calling_history=[],
  candidates=[
    Candidate(
      content=Content(),
      finish_reason=<FinishReason.MAX_TOKENS: 'MAX_TOKENS'>,
      index=0
    ),
  ],
  model_version='gemini-3-pro-preview',
  response_id='sfdHafbmO8mQmtkP24-AGQ',
  sdk_http_response=HttpResponse(
    headers=<dict len=11>
  ),
  usage_metadata=GenerateContentResponseUsageMetadata(
    prompt_token_count=76,
    prompt_tokens_details=[
      ModalityTokenCount(
        modality=<MediaModality.TEXT: 'TEXT'>,
        token_count=76
      ),
    ],
    thoughts_token_count=4093,
    total_token_count=4169
  )
), 'meta': {'retry_count': 0, 'backoff_ms_total': 0, 'overflow_handled': False}}
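Note that this run returned `'text': None`. The finish reason is `MAX_TOKENS`, with 4,093 thinking tokens and 0 completion tokens, which suggests the token budget was used up by reasoning before any visible answer was produced. A small guard can surface this instead of printing an empty result. The sketch below is a hypothetical helper that takes the relevant values as plain arguments; it makes no assumptions about the `LLMClient` API.

```python
def diagnose_response(text, finish_reason, thoughts_tokens=0, completion_tokens=0):
    """Return a short diagnosis string for an LLM response, or None if it looks usable."""
    if text:
        return None
    if finish_reason == "MAX_TOKENS" and thoughts_tokens > 0 and completion_tokens == 0:
        return ("empty text: the token budget was exhausted during reasoning "
                f"({thoughts_tokens} thinking tokens); retry with a larger max_tokens")
    return f"empty text with finish_reason={finish_reason}"


# Values copied from the ToT output above.
message = diagnose_response(text=None, finish_reason="MAX_TOKENS",
                            thoughts_tokens=4093, completion_tokens=0)
print(message)
```

A likely remedy is to rerun the cell with a larger `max_tokens` than the template's default, so the model has room left for the answer after reasoning.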


## Part 4: Context Overflow Demo

When the context exceeds the prompt limit, use the `overflow_summarize` prompt. The demo below also sets `hard_prompt_cap` on the client to force truncation.

In [7]:
# Simulate large context
large_context = 'Product details: ' + ' '.join(['Feature description.'] * 500)

prompt_text, spec = render(
    'overflow_summarize.v1',
    context=large_context,
    max_tokens_context='200',
    task='List the top 3 features',
    format='Bullet list'
)

# Use hard_prompt_cap to trigger truncation
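# Pick a Google model here: `model` above was selected for the 'openai' provider
client_capped = LLMClient('google', pick_model('google', 'general'), hard_prompt_cap=300)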
messages = [{'role': 'user', 'content': prompt_text}]

response = client_capped.chat(messages, temperature=0.2)
print('Overflow handled:', response['meta']['overflow_handled'])
print('Response:', response['text'][:200], '...')

Overflow handled: True
Response: The provided context consists of a product description with a long list of feature descriptions. Due to the truncation, the specific features, names, and numbers are unavailable. The summary is theref ...
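The response shows the cost of hard truncation: once the context is clipped, the details the task needs may be gone. The sketch below illustrates the general idea of a prompt cap with a rough character-based token estimate. The 4-characters-per-token ratio and both function names are assumptions for illustration; this is not how `LLMClient` implements `hard_prompt_cap`.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate; ~4 characters per token is an assumed heuristic."""
    return (len(text) + chars_per_token - 1) // chars_per_token


def truncate_to_cap(text: str, cap_tokens: int, chars_per_token: int = 4):
    """Return (possibly truncated text, overflow_handled flag)."""
    if estimate_tokens(text, chars_per_token) <= cap_tokens:
        return text, False
    # Keep only the leading characters that fit under the cap.
    return text[: cap_tokens * chars_per_token], True


large_context = "Product details: " + " ".join(["Feature description."] * 500)
clipped, overflow_handled = truncate_to_cap(large_context, cap_tokens=300)
print(estimate_tokens(large_context), estimate_tokens(clipped), overflow_handled)
```

Keeping only the head of the text is the simplest policy. Summarizing the context in chunks before the final call is an alternative that preserves more information at the cost of extra requests.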


## Key Takeaways

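1. **Zero-shot**: Fast and sufficient for simple tasks such as clear-cut sentiment classification
2. **Few-shot**: Examples steer the model toward the rules they demonstrate (here, prioritizing product quality over user emotion)
3. **CoT/ToT**: Routed to reasoning models in this notebook; token cost is higher (the ToT run used 4,093 thinking tokens and hit `MAX_TOKENS` before producing any text), so leave headroom in `max_tokens`
4. **Automatic routing**: `pick_model()` selects reasoning models for CoT/ToT
5. **Overflow handling**: Use summarization or truncation when context is too large, keeping in mind that truncation can drop the details the task needs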
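As a rough illustration of takeaway 4, a task-based router can be as simple as a lookup table. The sketch below is hypothetical and is not the implementation in `utils.router`: `route_model`, `needs_reasoning_model`, and the table are invented names, and the bracketed model names are placeholders. Only `gemini-3-pro-preview` appears in this notebook's output.

```python
REASONING_TASKS = {"cot", "tot"}

# Hypothetical routing table; bracketed entries are placeholders, not real model names.
MODEL_TABLE = {
    ("google", "reasoning"): "gemini-3-pro-preview",
    ("google", "general"): "<google-general-model>",
    ("openai", "reasoning"): "<openai-reasoning-model>",
    ("openai", "general"): "<openai-general-model>",
}


def needs_reasoning_model(task: str) -> bool:
    """CoT and ToT tasks go to the reasoning tier; everything else is general."""
    return task.lower() in REASONING_TASKS


def route_model(provider: str, task: str) -> str:
    tier = "reasoning" if needs_reasoning_model(task) else "general"
    try:
        return MODEL_TABLE[(provider, tier)]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None


print(route_model("google", "cot"))
print(route_model("openai", "general"))
```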

**Next:** `04_structured_outputs_json_schema.ipynb`