## Setting Up Chain-of-Thought (CoT) Prompting for Three LLMs (GPT-4o, Claude Opus 4, DeepSeek R1)

*Packages*

In [27]:
import os
from typing import List, Dict, Tuple
import pandas as pd
from IPython.display import display
import openai
from openai import OpenAI
import anthropic

*API call functions for OpenAI GPT, Anthropic Claude, and DeepSeek*

In [41]:


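def deepseek_chat(messages, model="deepseek-chat") -> str:
    """Send `messages` to DeepSeek through its OpenAI-compatible endpoint and return the reply text."""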
    client = OpenAI(
        api_key=os.getenv("DEEPSEEK_API_KEY"),
        base_url="https://api.deepseek.com"
    )

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=False
    )

    return response.choices[0].message.content

def _chat(model: str, messages: List[Dict], max_tokens: int = 1024) -> str:
    """Route a chat request to OpenAI, Anthropic, or DeepSeek based on the model-name prefix."""
    if model.startswith("gpt"):
        try:
            client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                max_tokens=max_tokens
            )
            return response.choices[0].message.content
        except AttributeError:
            openai.api_key = os.getenv("OPENAI_API_KEY")
            response = openai.ChatCompletion.create(
                model=model,
                messages=messages,
                max_tokens=max_tokens
            )
            return response["choices"][0]["message"]["content"]

    elif model.startswith("claude"):
        client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
        # The Messages API accepts alternating user/assistant turns directly, so the
        # history is passed as-is rather than flattened into a single prompt string.
        # This assumes `messages` contains only "user" and "assistant" roles.
        response = client.messages.create(
            model=model,
            system="You are a helpful assistant skilled in causal reasoning and CLDs.",
            max_tokens=max_tokens,
            messages=messages
        )
        return response.content[0].text

    elif model.startswith("deepseek"):
        return deepseek_chat(messages, model=model)
    
    else:
        raise ValueError(f"Unknown model: {model}")
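
The prefix-based routing in `_chat` can be checked without spending any API calls. The following is a minimal sketch, assuming a hypothetical helper `provider_for` (not part of this notebook) that mirrors the same `startswith` checks:

```python
def provider_for(model: str) -> str:
    """Hypothetical helper: mirrors the model-name prefix checks used in _chat."""
    prefixes = {"gpt": "openai", "claude": "anthropic", "deepseek": "deepseek"}
    for prefix, provider in prefixes.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"Unknown model: {model}")


# Dry run over the model names used later in this notebook.
for name in ["gpt-4o", "claude-opus-4-20250514", "deepseek-chat", "deepseek-reasoner"]:
    print(name, "->", provider_for(name))
```

One consequence of routing on prefixes is that any model whose name does not start with `gpt`, `claude`, or `deepseek` (for example an OpenAI model named `o3`) raises `ValueError` and would need its own branch.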


*Function that chains the prompt sequence, carrying each reply forward as context, to implement chain-of-thought prompting*

In [37]:
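def run_llm_chain(model: str, prompt_sequence: List[str]) -> List[Dict]:
    """Send each prompt in order, keeping earlier turns as context, and return the per-stage history."""
    messages = []  # full conversation so far, resent to the model on every turn
    history = []   # one record per stage: stage number, prompt, and reply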

    for idx, prompt in enumerate(prompt_sequence, start=1):
        messages.append({"role": "user", "content": prompt})
        reply = _chat(model, messages)
        messages.append({"role": "assistant", "content": reply})
        history.append({
            "stage": idx,
            "prompt": prompt,
            "reply": reply
        })

    return history
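
To see how the conversation grows from turn to turn, the same loop can be run against a stub instead of a real API. This is only a sketch: `run_chain_with` and `fake_chat` are hypothetical names introduced here, and the stub simply records how many messages it receives on each call.

```python
from typing import Callable, Dict, List


def run_chain_with(chat_fn: Callable[[List[Dict]], str],
                   prompt_sequence: List[str]) -> List[Dict]:
    """Same loop as run_llm_chain, but with the chat call injected (hypothetical variant)."""
    messages, history = [], []
    for idx, prompt in enumerate(prompt_sequence, start=1):
        messages.append({"role": "user", "content": prompt})
        reply = chat_fn(messages)
        messages.append({"role": "assistant", "content": reply})
        history.append({"stage": idx, "prompt": prompt, "reply": reply})
    return history


seen_lengths = []  # number of messages the stub receives on each call


def fake_chat(messages: List[Dict]) -> str:
    """Stub standing in for _chat: no network access, just bookkeeping."""
    seen_lengths.append(len(messages))
    return f"(stub reply, saw {len(messages)} messages)"


demo_history = run_chain_with(fake_chat, ["first prompt", "second prompt", "third prompt"])
print(seen_lengths)  # [1, 3, 5]
```

Each call receives every earlier prompt and reply plus the new prompt, so the input grows by two messages per stage. With long replies, later stages therefore cost more input tokens than earlier ones.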


*Function to run all (or selected) models*

In [38]:
def run_all_models(models: List[str], prompt_sequence: List[str]) -> Dict[str, List[Dict]]:
    all_results = {}
    for model in models:
        print(f"🚀 Running chain for {model}...")
        try:
            result = run_llm_chain(model, prompt_sequence)
            all_results[model] = result
        except Exception as e:
            print(f"❌ Error for {model}: {e}")
    return all_results
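
Because `run_all_models` catches exceptions and only prints them, a model that fails is simply missing from the returned dictionary. A possible variant, sketched here with hypothetical names (`run_models_with`, `fake_chain`) and a stub in place of real API calls, returns the errors alongside the results so that failures can be inspected afterwards:

```python
from typing import Callable, Dict, List, Tuple


def run_models_with(run_chain: Callable[[str, List[str]], List[Dict]],
                    models: List[str],
                    prompt_sequence: List[str]) -> Tuple[Dict, Dict]:
    """Hypothetical variant of run_all_models that also collects error messages."""
    results, errors = {}, {}
    for model in models:
        try:
            results[model] = run_chain(model, prompt_sequence)
        except Exception as exc:
            errors[model] = repr(exc)  # keep the failure instead of only printing it
    return results, errors


def fake_chain(model: str, prompts: List[str]) -> List[Dict]:
    """Stub standing in for run_llm_chain; fails for one model name on purpose."""
    if model == "broken-model":
        raise RuntimeError("simulated API failure")
    return [{"stage": i, "prompt": p, "reply": "stub"} for i, p in enumerate(prompts, start=1)]


demo_results, demo_errors = run_models_with(fake_chain, ["gpt-4o", "broken-model"], ["p1", "p2"])
print(sorted(demo_results))  # ['gpt-4o']
print(demo_errors)
```

In the notebook itself, `run_llm_chain` would be passed as `run_chain`.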


### (I) Testing the functions with a two-prompt toy example

In [50]:
# toy sequence of 2 prompts
PROMPT_SEQUENCE = [
    "What is your understanding of the issue with causal inference in traditional statistics?",
    "Can you explain with example how to represent causal inference in directed acyclic graph?"
]   

In [29]:
models_to_test = ["gpt-4o"]
results = run_all_models(models_to_test, PROMPT_SEQUENCE)


🚀 Running chain for gpt-4o...


In [30]:
print(results)

{'gpt-4o': [{'stage': 1, 'prompt': 'What is your understanding of the issue with causal inference in traditional statistics?', 'reply': 'Causal inference is a major challenge in traditional statistics due to its focus on correlation rather than causation. In traditional statistical analysis, we primarily deal with associations or correlations between variables, which do not necessarily imply that one variable causes the change in another. Several key issues contribute to the difficulty of causal inference:\n\n1. **Confounding Variables**: These are variables that influence both the independent and dependent variables, creating a spurious association. If not controlled for, confounders can lead to incorrect conclusions regarding causal relationships.\n\n2. **Reverse Causation**: This occurs when the direction of cause and effect is opposite to what is assumed. Traditional statistical methods often have difficulty in distinguishing whether X causes Y or Y causes X.\n\n3. **Simpson’s Para

In [31]:
models_to_test = ["claude-opus-4-20250514"]
results = run_all_models(models_to_test, PROMPT_SEQUENCE)

🚀 Running chain for claude-opus-4-20250514...


In [32]:
results

{'claude-opus-4-20250514': [{'stage': 1,
   'prompt': 'What is your understanding of the issue with causal inference in traditional statistics?',
   'reply': 'Traditional statistics faces several fundamental challenges with causal inference:\n\n## The Core Problem: Correlation ≠ Causation\n\nTraditional statistical methods excel at identifying associations and correlations but cannot, by themselves, determine whether these relationships are causal. A significant correlation between two variables could arise from:\n- A causing B\n- B causing A  \n- A third variable C causing both A and B (confounding)\n- Pure coincidence\n\n## Key Limitations\n\n**1. Observational Data Ambiguity**\nMost real-world data is observational rather than experimental. Without random assignment, we can\'t isolate causal effects from confounding factors. Traditional regression models can control for observed confounders, but unobserved confounders remain problematic.\n\n**2. Lack of Causal Framework**\nTradition

In [43]:
models_to_test = ["deepseek-chat"] # DeepSeek's general chat model (V3 series at the time of writing), not the R1 reasoning model
results = run_all_models(models_to_test, PROMPT_SEQUENCE)

🚀 Running chain for deepseek-chat...


In [44]:
results

{'deepseek-chat': [{'stage': 1,
   'prompt': 'What is your understanding of the issue with causal inference in traditional statistics?',
   'reply': 'Causal inference is a fundamental challenge in traditional statistics because statistical methods alone cannot establish causation without strong assumptions or experimental control. Here’s a breakdown of the key issues:\n\n### 1. **Correlation ≠ Causation**\n   - Traditional statistics often focuses on identifying associations (e.g., regression, correlation), but association does not imply causation. Confounding variables may explain observed relationships.\n   - Example: Ice cream sales and drowning incidents are correlated (both rise in summer), but neither causes the other.\n\n### 2. **Lack of Counterfactuals**\n   - Causal questions require comparing what happened to what *would have happened* under alternative conditions (counterfactuals). Traditional observational data lacks this because we cannot observe the same unit under both t

### (II) Running the 8 prompts of the 4-stage process for constructing causal loop diagrams (CLDs)

In [45]:
from prompts.system_messages import PROMPT_SEQUENCE

*Note: "deepseek-reasoner" selects DeepSeek R1 (the reasoning model). Expect an exceptionally long run time (over 10 minutes).*

In [None]:
models_to_test = ["gpt-4o", "claude-opus-4-20250514", "deepseek-reasoner"]
results = run_all_models(models_to_test, PROMPT_SEQUENCE)

🚀 Running chain for gpt-4o...
🚀 Running chain for claude-opus-4-20250514...
🚀 Running chain for deepseek-reasoner...
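
Given the run-time warning for the reasoning model, it may help to measure how long each chain actually takes. A minimal sketch, assuming a hypothetical wrapper `timed_run` and a stub chain that sleeps briefly in place of real API calls:

```python
import time
from typing import Callable, Dict, List, Tuple


def timed_run(run_chain: Callable[[str, List[str]], List[Dict]],
              model: str,
              prompt_sequence: List[str]) -> Tuple[List[Dict], float]:
    """Hypothetical wrapper: run one chain and return (history, elapsed seconds)."""
    start = time.perf_counter()
    history = run_chain(model, prompt_sequence)
    elapsed = time.perf_counter() - start
    return history, elapsed


def slow_fake_chain(model: str, prompts: List[str]) -> List[Dict]:
    """Stub standing in for run_llm_chain; sleeps to simulate latency."""
    time.sleep(0.05)
    return [{"stage": i, "prompt": p, "reply": "stub"} for i, p in enumerate(prompts, start=1)]


timed_history, elapsed = timed_run(slow_fake_chain, "deepseek-reasoner", ["p1", "p2"])
print(f"deepseek-reasoner: {len(timed_history)} stages in {elapsed:.2f}s")
```

In the notebook, `run_llm_chain` would replace the stub, and the elapsed time could be stored next to each model's history.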


*Export results as individual JSON files*

In [51]:
import os
import json

# Ensure the subfolder exists
os.makedirs("output", exist_ok=True)

# Export each model's history into the output folder
for model_name, history in results.items():
    with open(f"output/{model_name}_results.json", "w", encoding="utf-8") as f:
        json.dump(history, f, ensure_ascii=False, indent=2)
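
The model names used in this notebook are safe to use in file names, but an identifier containing `/` or `:` would break the output path. The sketch below assumes a hypothetical helper `safe_filename` and made-up sample data. It writes to a temporary directory rather than `output/`, then reads the file back to confirm that the export round-trips, including non-ASCII characters:

```python
import json
import re
import tempfile
from pathlib import Path


def safe_filename(model_name: str) -> str:
    """Hypothetical helper: replace characters outside [A-Za-z0-9._-] with underscores."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", model_name)


# Made-up sample in the same shape as `results`; the model name is illustrative only.
sample = {
    "vendor/model:v1": [
        {"stage": 1, "prompt": "p1", "reply": "Correlation ≠ causation"}
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    for model_name, history in sample.items():
        path = Path(tmp) / f"{safe_filename(model_name)}_results.json"
        path.write_text(json.dumps(history, ensure_ascii=False, indent=2), encoding="utf-8")
        loaded = json.loads(path.read_text(encoding="utf-8"))
        print(path.name, loaded == history)
```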


*Export results as individual CSV files*

In [52]:
import pandas as pd

# Export each model's results as a CSV file into output/
for model_name, history in results.items():
    df = pd.DataFrame(history)
    df.to_csv(f"output/{model_name}_results.csv", index=False, encoding="utf-8")
