# Lab 1: Chain-of-Thought Implementation

**Week 3 - Advanced Prompting & OpenAI API**

**Provided by:** ADC ENGINEERING & CONSULTING LTD

## Objectives

In this lab, you will:
- Implement zero-shot chain-of-thought prompting
- Compare CoT vs non-CoT performance
- Build a self-consistency system
- Create a prompt chain for complex tasks
- Measure and analyze reasoning quality

## Prerequisites

- OpenAI API key available as `OPENAI_API_KEY` (for example in a `.env` file, which the setup cell loads)
- Understanding of prompt engineering basics
- Python 3.9+

## Setup

In [None]:
# Install required packages
!pip install openai python-dotenv tiktoken --quiet

In [None]:
import os
from openai import OpenAI
from dotenv import load_dotenv
from collections import Counter
import re
import time

# Load environment variables
load_dotenv()

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

print("✓ Setup complete")

## Part 1: Zero-Shot Chain-of-Thought

Let's implement and test zero-shot CoT prompting.

In [None]:
def generate_response(prompt, model="gpt-4", temperature=0.3):
    """Generate a response from the OpenAI API."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature
    )
    return response.choices[0].message.content

def zero_shot_cot(problem, model="gpt-4"):
    """Apply zero-shot chain-of-thought."""
    prompt = f"{problem}\n\nLet's think step by step."
    return generate_response(prompt, model=model)
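
`zero_shot_cot` returns the whole reasoning text, so the answer still has to be dug out of free-form prose. The zero-shot CoT paper (Kojima et al., 2022, "Large Language Models are Zero-Shot Reasoners") handles this with a second call that feeds the reasoning back followed by an answer trigger. A minimal sketch, assuming any `generate(prompt) -> str` callable such as `generate_response`; the function name and the default trigger wording are illustrative choices, not part of the lab code:

```python
from typing import Callable


def two_stage_zero_shot_cot(
    problem: str,
    generate: Callable[[str], str],
    answer_trigger: str = "Therefore, the answer (arabic numerals) is",
) -> dict:
    """Zero-shot CoT with a separate answer-extraction call.

    Stage 1 elicits step-by-step reasoning; stage 2 appends that reasoning
    plus an answer trigger so the model replies with just the answer.
    """
    # Stage 1: reasoning extraction
    reasoning_prompt = f"Q: {problem.strip()}\nA: Let's think step by step."
    reasoning = generate(reasoning_prompt)

    # Stage 2: answer extraction, conditioned on the stage-1 reasoning
    answer_prompt = f"{reasoning_prompt} {reasoning.strip()}\n{answer_trigger}"
    answer = generate(answer_prompt)

    return {"reasoning": reasoning, "answer": answer.strip()}
```

Usage: `two_stage_zero_shot_cot(problem, generate_response)["answer"]`. This doubles the API calls per problem in exchange for an answer that is usually much easier to parse.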

### Exercise 1.1: Test Zero-Shot CoT on Math Problems

In [None]:
# Test problem
problem = """
A bakery sells cupcakes for $3 each and cookies for $2 each.
Sarah buys 4 cupcakes and has $15 total to spend.
How many cookies can she buy with the remaining money?
"""

# Without CoT
print("=== Without Chain-of-Thought ===")
result_no_cot = generate_response(problem)
print(result_no_cot)
print()

# With CoT
print("=== With Chain-of-Thought ===")
result_cot = zero_shot_cot(problem)
print(result_cot)
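
To judge the two outputs you need the ground truth, and the arithmetic is short enough to check directly: 4 cupcakes cost $12, leaving $3, which buys one $2 cookie. The sketch below computes it; `final_number` is a hypothetical, deliberately rough helper that only looks at the last number in a response:

```python
import re

CUPCAKE_PRICE, COOKIE_PRICE = 3, 2
BUDGET, CUPCAKES = 15, 4

remaining = BUDGET - CUPCAKES * CUPCAKE_PRICE  # 15 - 12 = 3 dollars left
expected_cookies = remaining // COOKIE_PRICE   # whole cookies only: 3 // 2 = 1


def final_number(text: str):
    """Return the last number mentioned in a response, or None (rough heuristic)."""
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None


print(expected_cookies)  # 1
```

For example, compare `final_number(result_no_cot)` and `final_number(result_cot)` against `str(expected_cookies)`.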

### Exercise 1.2: Logic Puzzle with CoT

In [None]:
logic_problem = """
Three switches outside a room control three light bulbs inside.
You can flip the switches as many times as you want, but you can only
enter the room once. How can you determine which switch controls which bulb?
"""

solution = zero_shot_cot(logic_problem)
print(solution)
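
The usual solution adds heat as a second signal: turn switch 1 on for several minutes, turn it off, turn switch 2 on, then enter. The lit bulb belongs to switch 2, the warm dark bulb to switch 1, the cold dark bulb to switch 3. The sketch below models the puzzle (assuming a bulb left on for a few minutes is still warm when you walk in) and checks why heat is required: on/off alone gives only two outcomes for three switches, so no flip plan can work without it. The function names are illustrative:

```python
from itertools import product


def observations(plan, use_heat):
    """plan[switch] = (was_on_earlier, is_on_now); return what you see per switch."""
    seen = {}
    for switch, (was_on_earlier, is_on_now) in plan.items():
        if is_on_now:
            seen[switch] = "on"
        elif use_heat and was_on_earlier:
            seen[switch] = "off, warm"  # bulb was on long enough to heat up
        else:
            seen[switch] = "off, cold"
    return seen


def identifies_all(plan, use_heat):
    """True if each switch leaves its bulb in a different observable state."""
    return len(set(observations(plan, use_heat).values())) == len(plan)


# Classic plan: switch 1 on for ~10 minutes then off, switch 2 on just before
# entering, switch 3 never touched.
classic = {1: (True, False), 2: (False, True), 3: (False, False)}
print(identifies_all(classic, use_heat=True))   # True
print(identifies_all(classic, use_heat=False))  # False

# Without heat NO plan works: on/off gives only 2 outcomes for 3 switches.
all_plans = [dict(zip([1, 2, 3], states))
             for states in product(product([False, True], repeat=2), repeat=3)]
print(any(identifies_all(p, use_heat=False) for p in all_plans))  # False
```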

### Exercise 1.3: Code Debugging with CoT

In [None]:
code_problem = """
This function should check if a string is a palindrome, but it's not working:

def is_palindrome(s):
    return s == s.reverse()

What's wrong and how can we fix it?
"""

debugging_result = zero_shot_cot(code_problem)
print(debugging_result)
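
For reference when reading the model's answer: Python `str` has no `reverse()` method (that is a `list` method, which reverses in place and returns `None`), so the original function raises `AttributeError`. Slicing with `[::-1]` returns a reversed copy. The normalized variant is an optional extension, assuming you also want to ignore case, spaces and punctuation:

```python
def is_palindrome(s: str) -> bool:
    """Exact check: compare the string with a reversed copy."""
    return s == s[::-1]


def is_palindrome_normalized(s: str) -> bool:
    """Looser check that ignores case and non-alphanumeric characters."""
    cleaned = "".join(ch.lower() for ch in s if ch.isalnum())
    return cleaned == cleaned[::-1]


print(is_palindrome("racecar"))                                    # True
print(is_palindrome("Racecar"))                                    # False
print(is_palindrome_normalized("A man, a plan, a canal: Panama"))  # True
```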

## Part 2: Self-Consistency

Implement self-consistency: sample several reasoning paths at a higher temperature, then take a majority vote over their final answers.

In [None]:
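def extract_final_answer(text):
    """Extract the final answer from reasoning text (heuristic).

    Takes the LAST explicit cue, preferring "answer" over
    "therefore/thus/so", and falls back to the last non-empty line.
    If the chosen text contains numbers, only the last number is kept,
    so "$14." and "14 dollars" count as the same answer when voting.
    """
    # \b word boundaries stop "so" from matching inside "also" or "reasoning"
    cue_patterns = [
        r'\b(?:final answer|answer)\b\s*(?:is|:)?\s*(.+)',
        r'\b(?:therefore|thus|so)\b[,:]?\s*(.+)',
    ]
    candidate = None
    for pattern in cue_patterns:
        matches = re.findall(pattern, text, re.IGNORECASE)
        if matches:
            candidate = matches[-1]  # the conclusion usually comes last
            break
    
    if candidate is None:
        # Fallback: last non-empty line
        lines = [l.strip() for l in text.strip().split('\n') if l.strip()]
        candidate = lines[-1] if lines else text.strip()
    
    # Normalize numeric answers to the last number in the candidate
    numbers = re.findall(r'-?\d+(?:\.\d+)?', candidate.replace(',', ''))
    return numbers[-1] if numbers else candidate.strip()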

def self_consistency_cot(problem, num_samples=5, model="gpt-4"):
    """Generate multiple reasoning paths and select most consistent answer."""
    prompt = f"{problem}\n\nLet's think step by step."
    
    responses = []
    answers = []
    
    print(f"Generating {num_samples} reasoning paths...\n")
    
    for i in range(num_samples):
        response = generate_response(prompt, model=model, temperature=0.7)
        responses.append(response)
        
        answer = extract_final_answer(response)
        answers.append(answer)
        
        print(f"Path {i+1} answer: {answer}")
    
    # Find most common answer
    answer_counts = Counter(answers)
    most_common_answer, count = answer_counts.most_common(1)[0]
    
    return {
        "final_answer": most_common_answer,
        "confidence": count / num_samples,
        "all_responses": responses,
        "answer_distribution": dict(answer_counts)
    }
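
`self_consistency_cot` sends one request per sample. The Chat Completions endpoint also accepts an `n` parameter that returns several choices from a single request, which saves round trips. A sketch, assuming the same `client` object; the helper name is hypothetical, and support for `n` greater than 1 may vary by model, so check before relying on it:

```python
def sample_reasoning_paths(client, prompt, n=5, model="gpt-4", temperature=0.7):
    """Request n sampled completions in one call via the `n` parameter."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # > 0 so the sampled paths actually differ
        n=n,
    )
    # One choice per sampled reasoning path
    return [choice.message.content for choice in response.choices]
```

Usage: `paths = sample_reasoning_paths(client, f"{problem}\n\nLet's think step by step.")`, then vote with `Counter(extract_final_answer(p) for p in paths)` exactly as `self_consistency_cot` does.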

### Exercise 2.1: Test Self-Consistency

In [None]:
# A tricky problem where reasoning might vary
tricky_problem = """
A farmer has 15 cows. All but 8 die. How many cows does the farmer have left?
"""

result = self_consistency_cot(tricky_problem, num_samples=5)

print("\n" + "="*50)
print(f"Final Answer: {result['final_answer']}")
print(f"Confidence: {result['confidence']:.0%}")
print(f"Answer Distribution: {result['answer_distribution']}")
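
With only a handful of samples the vote can tie (for example a 2-2-1 split over 5 paths), and `Counter.most_common(1)` then silently returns whichever tied answer was seen first. A sketch that reports ties explicitly, so you can draw more samples instead of trusting an arbitrary winner; `resolve_vote` is a hypothetical helper:

```python
from collections import Counter


def resolve_vote(answers):
    """Return (winner or None if tied, agreement ratio, tied leading answers)."""
    counts = Counter(answers)
    top = max(counts.values())
    leaders = [a for a, c in counts.items() if c == top]
    winner = leaders[0] if len(leaders) == 1 else None
    return winner, top / len(answers), leaders


print(resolve_vote(["8", "8", "8", "7", "7"]))   # ('8', 0.6, ['8'])
print(resolve_vote(["8", "8", "7", "7", "15"]))  # (None, 0.4, ['8', '7'])
```

If the winner is `None`, one option is to rerun `self_consistency_cot` with a larger `num_samples`.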

### Exercise 2.2: Compare Accuracy

Compare single-shot vs self-consistency on multiple problems.

In [None]:
# Test problems with known answers
test_problems = [
    {
        "problem": "If you have 3 apples and get 4 more, then give away 2, how many do you have?",
        "correct_answer": "5"
    },
    {
        "problem": "A book costs $12. If you have $50 and buy 3 books, how much money is left?",
        "correct_answer": "14"
    },
    {
        "problem": "If there are 24 hours in a day, how many hours are in 3.5 days?",
        "correct_answer": "84"
    }
]

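print("Comparing approaches...\n")

single_correct = 0
sc_correct = 0

for i, test in enumerate(test_problems, 1):
    print(f"\nProblem {i}: {test['problem']}")
    print(f"Correct answer: {test['correct_answer']}")
    
    # Single shot: one CoT reasoning path
    single_result = zero_shot_cot(test['problem'])
    single_answer = extract_final_answer(single_result)
    # Exact string match: relies on the extractor returning a bare number
    single_correct += single_answer == test['correct_answer']
    print(f"Single-shot answer: {single_answer}")
    
    # Self-consistency: majority vote over 3 sampled paths
    sc_result = self_consistency_cot(test['problem'], num_samples=3)
    sc_correct += sc_result['final_answer'] == test['correct_answer']
    print(f"Self-consistency answer: {sc_result['final_answer']} (confidence: {sc_result['confidence']:.0%})")
    
    print("-" * 50)

print(f"\nSingle-shot accuracy: {single_correct}/{len(test_problems)}")
print(f"Self-consistency accuracy: {sc_correct}/{len(test_problems)}")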
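
Why can voting help? A back-of-envelope model, assuming each sampled path is independently correct with probability `p` and, pessimistically, that all wrong paths agree on the same wrong answer: the vote is right when more than half of the `k` paths are right. Real paths are correlated and wrong answers usually scatter, so treat this as an illustration rather than a prediction:

```python
from math import comb


def majority_correct_prob(p: float, k: int) -> float:
    """P(more than k/2 of k independent paths are correct), each correct w.p. p."""
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j)
               for j in range(k // 2 + 1, k + 1))


for k in (1, 3, 5, 9):
    print(k, round(majority_correct_prob(0.7, k), 3))
```

Two things fall out of this model: with `p = 0.7`, going from 1 to 5 samples raises the chance of a correct vote from 0.70 to about 0.84, while with `p = 0.4` voting makes things worse (about 0.32 for 5 samples). The cost side grows linearly: `k` samples means roughly `k` times the completion tokens.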

## Part 3: Prompt Chaining

Break a complex task into a sequence of smaller prompts, where each step's output becomes the next step's input.

In [None]:
class PromptChain:
    """Build and execute chains of prompts."""
    
    def __init__(self, model="gpt-4", temperature=0.3):
        self.model = model
        self.temperature = temperature
        self.steps = []
    
    def add_step(self, name, prompt_template, extract_fn=None):
        """Add a step to the chain."""
        self.steps.append({
            "name": name,
            "template": prompt_template,
            "extract_fn": extract_fn or (lambda x: x)
        })
        return self
    
    def execute(self, initial_input):
        """Execute the prompt chain."""
        current_input = initial_input
        results = []
        
        for step in self.steps:
            print(f"\nExecuting: {step['name']}")
            print("-" * 50)
            
            # Format prompt
            if isinstance(current_input, dict):
                prompt = step["template"].format(**current_input)
            else:
                prompt = step["template"].format(input=current_input)
            
            # Generate response
            output = generate_response(prompt, model=self.model, temperature=self.temperature)
            
            # Extract relevant data
            extracted = step["extract_fn"](output)
            
            results.append({
                "step_name": step["name"],
                "prompt": prompt,
                "output": output,
                "extracted": extracted
            })
            
            print(f"Output:\n{output}\n")
            
            current_input = extracted
        
        return {
            "final_output": current_input,
            "steps": results
        }
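
`execute` hands each step's extracted value to the next template: a string fills `{input}`, while a dict is unpacked into named fields. One way to use the dict branch is an `extract_fn` that parses JSON, assuming you ask the model to reply in JSON. A sketch; `extract_json` is a hypothetical helper, not part of the lab code:

```python
import json
import re


def extract_json(output: str) -> dict:
    """Parse a JSON object from model output, tolerating a Markdown code fence."""
    match = re.search(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", output, re.DOTALL)
    text = match.group(1) if match else output
    # Fall back to the outermost {...} span if there is surrounding prose
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])


template = "Category: {category}\nPriority: {priority}\nDraft a short reply."
fields = extract_json('Sure! {"category": "BILLING", "priority": "P2"} Anything else?')
print(template.format(**fields))
```

Usage: `chain.add_step("Extract", prompt_asking_for_json, extract_fn=extract_json)`, followed by a step whose template uses `{category}` and `{priority}`. Because templates go through `str.format`, any literal braces in a template (for example a JSON example inside the prompt) must be doubled as `{{` and `}}`.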

### Exercise 3.1: Customer Feedback Analysis Chain

In [None]:
# Build a chain for analyzing customer feedback
feedback_chain = PromptChain()

# Step 1: Extract key information
feedback_chain.add_step(
    "Extract Information",
    """
Extract the following from this customer feedback:
- Main issue or topic
- Sentiment (positive/negative/neutral)
- Urgency level (high/medium/low)
- Key details

Feedback: {input}

Provide structured output.
    """
)

# Step 2: Categorize and prioritize
feedback_chain.add_step(
    "Categorize",
    """
Based on this analysis:

{input}

Categorize into: TECHNICAL, BILLING, PRODUCT, SERVICE, OTHER
Assign priority: P1 (critical), P2 (high), P3 (medium), P4 (low)
Suggest department: Support, Sales, Engineering, Finance
    """
)

# Step 3: Generate action items
feedback_chain.add_step(
    "Action Items",
    """
Based on this categorization:

{input}

Generate 3 specific, actionable next steps for the team.
Include estimated timeframe for each action.
    """
)

# Test the chain
customer_feedback = """
I've been trying to access my account for 3 days now. The password reset
email never arrives, and when I try to contact support, I just get a
generic automated response. This is extremely frustrating as I need to
download my invoice for accounting purposes. I'm considering switching
to a competitor if this isn't resolved by end of week.
"""

result = feedback_chain.execute(customer_feedback)
print("\n" + "="*50)
print("FINAL OUTPUT:")
print("="*50)
print(result["final_output"])
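
Free-text steps can drift from the allowed labels (for example "Technical Issue" instead of `TECHNICAL`). A validating `extract_fn` for the "Categorize" step makes the chain fail loudly instead of passing a malformed label downstream. A sketch, assuming the reply contains lines like `Category: TECHNICAL` and `Priority: P1` (you may want to ask for that format explicitly in the prompt); `validate_categorization` is hypothetical:

```python
import re

CATEGORIES = {"TECHNICAL", "BILLING", "PRODUCT", "SERVICE", "OTHER"}
PRIORITIES = {"P1", "P2", "P3", "P4"}


def validate_categorization(output: str) -> str:
    """extract_fn for the Categorize step: raise on unknown labels.

    Markdown bold around the label or value (e.g. '**Category:** TECHNICAL')
    is tolerated.
    """
    cat = re.search(r"Category:\**\s*\**\s*([A-Za-z]+)", output, re.IGNORECASE)
    pri = re.search(r"Priority:\**\s*\**\s*(P\d)", output, re.IGNORECASE)
    if not cat or cat.group(1).upper() not in CATEGORIES:
        raise ValueError("Categorize step returned no valid category")
    if not pri or pri.group(1).upper() not in PRIORITIES:
        raise ValueError("Categorize step returned no valid priority")
    # Pass the full text on so the next step keeps the model's reasoning
    return output
```

Usage: pass `extract_fn=validate_categorization` as the third argument to the "Categorize" `add_step` call.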

### Exercise 3.2: Research Paper Analysis Chain

In [None]:
# Build chain for research paper analysis
paper_chain = PromptChain(temperature=0.4)

paper_chain.add_step(
    "Extract Core Elements",
    """
Extract from this abstract:
1. Research question/hypothesis
2. Methodology
3. Key findings (3-5 points)
4. Stated limitations

Abstract: {input}
    """
).add_step(
    "Analyze Significance",
    """
Based on this paper summary:

{input}

Analyze:
- Scientific significance (how does it advance the field?)
- Practical applications (real-world use cases)
- Limitations and concerns
    """
).add_step(
    "Executive Summary",
    """
Create a 150-word executive summary for business leaders based on:

{input}

Focus on practical implications and business value. Use clear, non-technical language.
    """
)

# Test with a sample abstract
abstract = """
This study examines the effectiveness of chain-of-thought prompting in
improving large language model performance on complex reasoning tasks.
Using a dataset of 1,000 multi-step problems across mathematics, logic,
and common sense reasoning, we demonstrate that explicit step-by-step
reasoning improves accuracy by 23% compared to direct prompting. We
introduce self-consistency decoding, which samples multiple reasoning
paths and selects the most consistent answer, further improving
accuracy by 12%. However, these approaches increase computational
costs by 3-5x due to longer prompts and multiple samples.
"""

result = paper_chain.execute(abstract)
print("\n" + "="*50)
print("EXECUTIVE SUMMARY:")
print("="*50)
print(result["final_output"])
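
The last step asks for a 150-word summary, and length targets are easy for a model to miss. A quick check before using the output; the ±20% tolerance is an arbitrary assumption and `word_budget_report` is a hypothetical helper:

```python
def word_budget_report(text: str, target: int = 150, tolerance: float = 0.2):
    """Return (word count, whether it is within target ± tolerance)."""
    count = len(text.split())  # whitespace-separated words
    return count, abs(count - target) <= target * tolerance


print(word_budget_report("word " * 150))  # (150, True)
print(word_budget_report("word " * 100))  # (100, False)
```

Usage: `word_budget_report(result["final_output"])`; if it fails, one option is to re-run the last step with the measured count fed back into the prompt.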

## Part 4: Challenge Exercises

Apply what you've learned to solve complex problems.

### Challenge 1: Build a Reasoning Evaluator

Create a system that evaluates the quality of reasoning steps.

In [None]:
def evaluate_reasoning(problem, reasoning):
    """
    Evaluate the quality of reasoning for a given problem.
    
    TODO: Implement this function to:
    1. Check if each step follows logically from the previous
    2. Verify calculations and facts
    3. Identify any logical gaps or errors
    4. Provide a quality score (1-10)
    """
    # Your implementation here
    pass

# Test your evaluator
test_problem = "If 5 workers can build a wall in 10 days, how long would it take 10 workers?"
test_reasoning = zero_shot_cot(test_problem)

# evaluation = evaluate_reasoning(test_problem, test_reasoning)
# print(evaluation)
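
For reference, under the usual assumption that work scales linearly with the number of workers, the wall takes 5 × 10 = 50 worker-days, so 10 workers need 5 days. Part of step 2 (verify calculations) can be done without a model call: scan the reasoning for simple `a op b = c` statements and recompute them. A hypothetical starter helper, assuming the model writes binary operations with digits and `=` on one line; chained expressions or numbers spelled out in words are skipped:

```python
import operator
import re

# Binary operators a model is likely to write in step-by-step arithmetic
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "×": operator.mul,
       "x": operator.mul, "/": operator.truediv, "÷": operator.truediv}

# 'a op b = c', allowing an optional $ before each number
EQUATION = re.compile(
    r"\$?(\d+(?:\.\d+)?)\s*([-+*×x/÷])\s*\$?(\d+(?:\.\d+)?)\s*=\s*\$?(\d+(?:\.\d+)?)"
)


def check_arithmetic(reasoning: str, tol: float = 1e-9):
    """Recompute each simple 'a op b = c'; return (text, claimed, actual, ok) tuples."""
    results = []
    for a, op, b, claimed in EQUATION.findall(reasoning):
        if op in "/÷" and float(b) == 0:
            continue  # skip division by zero instead of crashing
        actual = OPS[op](float(a), float(b))
        results.append((f"{a} {op} {b} = {claimed}", float(claimed), actual,
                        abs(actual - float(claimed)) <= tol))
    return results
```

Usage: `check_arithmetic(test_reasoning)`; any tuple whose last element is `False` points at a step your evaluator should penalize.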

### Challenge 2: Adaptive Reasoning Strategy

Build a system that automatically chooses the best reasoning approach based on the problem type.

In [None]:
def adaptive_reasoning(problem):
    """
    Automatically select and apply the best reasoning strategy.
    
    TODO: Implement to:
    1. Analyze the problem type (math, logic, planning, etc.)
    2. Choose appropriate technique (CoT, self-consistency, chain, etc.)
    3. Apply the chosen technique
    4. Return result with explanation of why that approach was chosen
    """
    # Your implementation here
    pass

# Test cases
test_cases = [
    "What is 25% of 80?",  # Simple math
    "Design a system to reduce office energy consumption.",  # Complex planning
    "If all A are B, and all B are C, are all A also C?",  # Logic
]

# for problem in test_cases:
#     result = adaptive_reasoning(problem)
#     print(f"\nProblem: {problem}")
#     print(f"Strategy: {result['strategy']}")
#     print(f"Answer: {result['answer']}")
#     print("-" * 50)
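
A simple baseline for steps 1 and 2, before reaching for an LLM classifier, is to route on surface keywords. The problem types, keywords and strategy mapping below are hypothetical choices for illustration; a model-based classifier (one extra `generate_response` call) would likely generalize better:

```python
import re

# Hypothetical mapping from problem type to a technique from this lab
STRATEGY_BY_TYPE = {
    "planning": "prompt_chain",    # open-ended, multi-part tasks
    "logic": "zero_shot_cot",      # short deductive chains
    "math": "self_consistency",    # numeric answers are easy to vote on
    "general": "zero_shot_cot",
}


def classify_problem(problem: str) -> str:
    """Keyword heuristic; checks run in order and the first match wins."""
    text = problem.lower()
    if re.match(r"\s*(design|plan|create|build|propose)\b", text):
        return "planning"
    if re.search(r"\b(all|some|no)\b .* \bare\b", text) or "implies" in text:
        return "logic"
    if re.search(r"\d", text):
        return "math"
    return "general"


def choose_strategy(problem: str) -> dict:
    problem_type = classify_problem(problem)
    return {"type": problem_type, "strategy": STRATEGY_BY_TYPE[problem_type]}
```

Usage: `[choose_strategy(p) for p in test_cases]`. Keyword routing misfires easily (a math question containing "no" would be sent to the logic branch), which is a good argument for the LLM-based version in your final implementation.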

### Challenge 3: Multi-Step Problem Solver

Create a sophisticated system that combines multiple techniques.

In [None]:
class AdvancedProblemSolver:
    """
    Advanced problem solver combining multiple techniques.
    
    TODO: Implement to:
    1. Break complex problems into sub-problems
    2. Apply appropriate reasoning technique to each
    3. Combine results
    4. Verify final answer
    5. Provide confidence score
    """
    
    def __init__(self):
        pass
    
    def solve(self, problem):
        # Your implementation here
        pass

# Test with a complex problem
complex_problem = """
A company needs to optimize its supply chain. They have 3 warehouses,
10 retail locations, and transportation costs vary by distance and volume.
Warehouse 1 has 1000 units, Warehouse 2 has 1500 units, Warehouse 3 has 2000 units.
Each retail location needs 400 units.
What's the optimal distribution strategy to minimize costs while meeting demand?
"""

# solver = AdvancedProblemSolver()
# result = solver.solve(complex_problem)
# print(result)
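
A solver's verification step can start with feasibility. Using only the numbers in the problem, total supply is 1000 + 1500 + 2000 = 4500 units and total demand is 10 × 400 = 4000 units, so demand can be met with 500 units to spare. The problem gives no actual per-route costs, so a truly optimal plan cannot be computed from this text alone; a good solver should say so or ask for the cost matrix. A small sketch of the check, with a hypothetical helper name:

```python
def supply_demand_balance(supplies: dict, demands: dict) -> dict:
    """Compare total supply with total demand (a necessary feasibility check)."""
    total_supply = sum(supplies.values())
    total_demand = sum(demands.values())
    return {
        "total_supply": total_supply,
        "total_demand": total_demand,
        "feasible": total_supply >= total_demand,
        "surplus": total_supply - total_demand,
    }


warehouses = {"W1": 1000, "W2": 1500, "W3": 2000}
stores = {f"R{i}": 400 for i in range(1, 11)}  # 10 retail locations
print(supply_demand_balance(warehouses, stores))
# {'total_supply': 4500, 'total_demand': 4000, 'feasible': True, 'surplus': 500}
```

Given a cost matrix, this becomes the classic transportation problem, which a linear-programming solver such as `scipy.optimize.linprog` can solve exactly; that is a case where delegating to a tool beats asking the model to reason out an optimum.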

## Summary & Key Takeaways

In this lab, you've learned:

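1. **Zero-Shot CoT**: Appending "Let's think step by step" often improves accuracy on multi-step problems
2. **Self-Consistency**: Sampling several reasoning paths and taking a majority vote tends to be more accurate than a single path
3. **Prompt Chaining**: Splitting a complex task into focused steps can improve results and makes each step easier to inspect and debug
4. **Trade-offs**: These techniques cost more; self-consistency with k samples needs roughly k times the completion tokens, and every chain step is a separate API call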

### Best Practices

- Start with simple techniques and add complexity only if needed
- Always verify reasoning steps for important tasks
- Monitor token usage and costs
- Test multiple approaches to find what works best
- Document successful prompt patterns for reuse

### Next Steps

- Complete the challenge exercises
- Experiment with different problem types
- Build your own reasoning toolkit
- Move on to Lab 2: OpenAI API Deep Dive