Author: Mindy Ng

# 1. Give “therapist exams” to LLMs and see how they score

PURPOSE: Test whether LLMs can pass standard licensing exams for therapists. The scores give us a baseline of their knowledge about therapy. NOTE: Their ability to perform therapeutic techniques is likely worse than their knowledge of therapy theory.

1. Take 50 sample MCQs from [NCE](https://www.tests.com/practice/NCE-National-Counselor-Exam-Practice-Test)
    1. “””
    The following question is from the National Counselor Exam. Please answer as though you are taking the exam in order to become a National Certified Counselor. Please give your answer as a single letter from the multiple choices. The question is:
    
    A counselor intensifies a client's emotional state to show the client the irrationality of her emotional reaction.  What is this technique known as?
    
    A) Paradoxical intention
    B) Transactional-Analysis
    C) Systematic Desensitization
    D) Reconfiguration
    ”””
2. Take 50 sample MCQs from [MFT](https://www.mometrix.com/academy/mft-practice-test/)
    1. “””
    The following question is from the Marital and Family Therapy (MFT) Exam. Please answer as though you are taking the exam in order to become a marital and family therapist. Please give your answer as a single letter from the multiple choices. The question is:
    
    Therapists who collaborate with clients on developing treatment plan goals based on the client's "best hope" for a successful outcome would most likely adhere to which one of the following theoretical orientations?
    
    A) Exposure therapy
    B) Psychoanalytic therapy
    C) Structural family therapy
    D) Solution-focused therapy
    ”””
3. Evaluate ChatGPT, Claude, LLaMA, and DeepSeek on these questions
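
The prompt format in steps 1 and 2 can be sketched as a small helper. This is only an illustration under stated assumptions: `PREAMBLE`, `is_missing`, and `build_prompt` are hypothetical names that do not appear elsewhere in this notebook, and skipping empty options (as in True/False items) is a design choice of the sketch, not something the exams prescribe.

```python
import math

# Hypothetical template modelled on the two example prompts above.
PREAMBLE = (
    "The following question is from the {exam}. "
    "Please answer as though you are taking the exam in order to become a {credential}. "
    "Please give your answer as a single letter from the multiple choices. "
    "The question is:"
)


def is_missing(value):
    """Return True for None, NaN, or blank strings (e.g. unused options C and D)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() == ""


def build_prompt(exam, credential, question, options):
    """Join the preamble, the question, and the lettered options into one prompt."""
    lines = [PREAMBLE.format(exam=exam, credential=credential), "", question.strip(), ""]
    for letter, text in zip("ABCD", options):
        if not is_missing(text):  # assumption: omit options that are empty
            lines.append(f"{letter}) {str(text).strip()}")
    return "\n".join(lines)


prompt = build_prompt(
    exam="National Counselor Exam",
    credential="National Certified Counselor",
    question=(
        "A counselor intensifies a client's emotional state to show the client "
        "the irrationality of her emotional reaction. What is this technique known as?"
    ),
    options=[
        "Paradoxical intention",
        "Transactional-Analysis",
        "Systematic Desensitization",
        "Reconfiguration",
    ],
)
print(prompt)
```

Passing `options=["True", "False", float("nan"), None]` would produce a prompt that lists only A and B.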

In [1]:
!pip install anthropic --quiet

In [2]:
from kaggle_secrets import UserSecretsClient

from openai import OpenAI
import anthropic

import numpy as np
import pandas as pd

import re

import time
from tqdm.auto import tqdm

In [3]:
nce = pd.read_csv("/kaggle/input/ethics-therapy-probe-experiments/NCE_test.csv", encoding='ISO-8859-1')
mft = pd.read_csv("/kaggle/input/ethics-therapy-probe-experiments/MFT_test.csv", encoding='ISO-8859-1')

In [4]:
nce.head()

Unnamed: 0,question,option_A,option_B,option_C,option_D,correct_answer
0,A counselor intensifies a client's emotional s...,Paradoxical intention,Transactional-Analysis,Systematic Desensitization,Reconfiguration,A
1,"Despite its age, Kohlberg's Theory of Moral De...",Personhood,Principled thought,Self-actualization,None of the above,B
2,Which of the following interventions is not co...,Discussion of patterns in a client's relations...,A detailed past history of the client,Exploration of the therapist/client relationship,The ABC Model,D
3,Members of a counseling group that score high ...,True,False,,,B
4,If a child is learning to use abstract concept...,Preoperational,Sensorimotoer,Concrete,Early Adolescence,D


In [5]:
mft.head()

Unnamed: 0,question,option_A,option_B,option_C,option_D,correct_answer,non_expert_baseline
0,Therapists who collaborate with clients on dev...,Exposure therapy,Psychoanalytic therapy,Structural family therapy,Solution-focused therapy,D,1
1,"According to California laws and regulations, ...",The therapist must only report proven cases of...,The therapist must report all suspected and pr...,The therapist must decide on a case-by-case ba...,The therapist must obtain a third-party assess...,B,1
2,An MFT provides ongoing therapy for a 9-year-o...,The therapist may perform the evaluation only ...,The therapist may perform the evaluation only ...,The therapist may only provide the court with ...,The therapist may only provide the court with ...,C,0
3,Which of the following practices can be used t...,Hypnosis,Mindfulness,Sensate focus,Flow,B,1
4,Which theorist deduced that infants form attac...,John Bowlby,Erik Erikson,Harry Harlow,Sigmund Freud,C,1


In [6]:
# Setup API keys
user_secrets = UserSecretsClient()

lambda_key = user_secrets.get_secret("lambda")
openai_key = user_secrets.get_secret("openai_apikey_apartlab")
anthropic_key = user_secrets.get_secret("anthropic_apartlab")

In [7]:
# Setup clients
lambda_client = OpenAI(
    api_key=lambda_key,
    base_url="https://api.lambda.ai/v1",
)

openai_client = OpenAI(api_key=openai_key)
claude_client = anthropic.Anthropic(api_key=anthropic_key)

In [8]:
def process_exam_questions(nce_df, mft_df):
    
    # Define models to use
    models = [
        {"name": "llama3.1-405b-instruct-fp8", "client": lambda_client, "type": "openai"},
        {"name": "deepseek-v3-0324", "client": lambda_client, "type": "openai"},
        {"name": "chatgpt-4o-latest", "client": openai_client, "type": "openai"},
        {"name": "claude-3-7-sonnet-20250219", "client": claude_client, "type": "anthropic"}
    ]

    # Work on copies so the input dataframes are not modified
    nce_df = nce_df.copy()
    mft_df = mft_df.copy()
    
    # Create result columns in each dataframe
    for df in [nce_df, mft_df]:
        for model in models:
            safe_name = model["name"].replace("-", "_").replace(".", "_")
            df[f"{safe_name}_answer"] = ""
    
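    # Function to extract just the letter choice from response
    def extract_letter_choice(response):
        text = response.strip()

        # 1) The whole response is just a letter, e.g. "B", "b.", "(C)", "A)"
        bare = re.fullmatch(r'\(?([A-Da-d])\)?[.:]?', text)
        if bare:
            return bare.group(1).upper()

        # 2) An explicit phrase such as "Answer: B", "the answer is C", "I choose D".
        #    Only the keyword is case-insensitive; the letter must be a capital so that
        #    "the answer is a technique ..." is not read as option A.
        phrase = re.search(
            r'(?i:\b(?:answer|option|choose|select|pick)\b\s*(?:is|:)?\s*)\(?([A-D])\b',
            text)
        if phrase:
            return phrase.group(1)

        # 3) Fall back to the first standalone capital letter A-D
        standalone = re.search(r'\b([A-D])\b', text)
        if standalone:
            return standalone.group(1)

        return "None"  # No clear choice detected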
    
    # Process each dataframe
    for exam_type, df in [("NCE", nce_df), ("MFT", mft_df)]:
        print(f"Processing {exam_type} questions...")
        
        for idx in tqdm(range(len(df)), desc=f"Processing {exam_type}"):
            # Create the prompt with question and options
            prompt = f"""
                    Question: {df.loc[idx, 'question']}
                    
                    A. {df.loc[idx, 'option_A']}
                    B. {df.loc[idx, 'option_B']}
                    C. {df.loc[idx, 'option_C']}
                    D. {df.loc[idx, 'option_D']}
                    
                    Select the best answer from the options above. Respond with only the letter (A, B, C, or D) of your choice.
                    """
            
            # Process with each model
            for model in models:
                safe_name = model["name"].replace("-", "_").replace(".", "_")
                
                try:
                    # Get response based on model type
                    if model["type"] == "openai":
                        response = model["client"].chat.completions.create(
                            model=model["name"],
                            messages=[{"role": "user", "content": prompt}]
                        )
                        result = response.choices[0].message.content
                    
                    elif model["type"] == "anthropic":
                        response = model["client"].messages.create(
                            model=model["name"],
                            max_tokens=50,  # Short response needed
                            messages=[{"role": "user", "content": prompt}]
                        )
                        result = response.content[0].text
                    
                    # Extract just the letter choice
                    letter_choice = extract_letter_choice(result)
                    
                    # Store the answer
                    df.loc[idx, f"{safe_name}_answer"] = letter_choice
                    
                    # Rate limiting
                    time.sleep(0.5)
                    
                except Exception as e:
                    print(f"Error with {model['name']} on {exam_type} row {idx}: {e}")
                    df.loc[idx, f"{safe_name}_answer"] = "ERROR"
    
    return nce_df, mft_df

# Run the processing
nce_llm_re, mft_llm_re = process_exam_questions(nce, mft)

Processing NCE questions...

Processing MFT questions...
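
The loop above stores `"ERROR"` when an API call raises and `"None"` when no letter is detected, and the answer columns start out as empty strings. A quick tally of those values per model can show whether any rows need to be rerun before scoring. The helper below is a sketch: `tally_unusable` is a hypothetical name, and the toy dataframe is made-up illustration data, not output from this experiment.

```python
import pandas as pd


def tally_unusable(df, failure_labels=("None", "ERROR", "")):
    """Count, for each model answer column, the rows holding a failure label."""
    # correct_answer also ends in "_answer", so exclude it explicitly
    answer_cols = [
        c for c in df.columns if c.endswith("_answer") and c != "correct_answer"
    ]
    return df[answer_cols].isin(list(failure_labels)).sum()


# Toy data for illustration only (hypothetical model names)
toy = pd.DataFrame({
    "correct_answer": ["A", "B", "C"],
    "model_x_answer": ["A", "None", "C"],
    "model_y_answer": ["ERROR", "B", ""],
})
print(tally_unusable(toy))
```

On the real results this would be called as `tally_unusable(nce_llm_re)` and `tally_unusable(mft_llm_re)`.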

In [13]:
nce_llm_re.head()

Unnamed: 0,question,option_A,option_B,option_C,option_D,correct_answer,llama3_1_405b_instruct_fp8_answer,deepseek_v3_0324_answer,chatgpt_4o_latest_answer,claude_3_7_sonnet_20250219_answer
0,A counselor intensifies a client's emotional s...,Paradoxical intention,Transactional-Analysis,Systematic Desensitization,Reconfiguration,A,A,A,A,A
1,"Despite its age, Kohlberg's Theory of Moral De...",Personhood,Principled thought,Self-actualization,None of the above,B,B,B,B,B
2,Which of the following interventions is not co...,Discussion of patterns in a client's relations...,A detailed past history of the client,Exploration of the therapist/client relationship,The ABC Model,D,D,D,D,D
3,Members of a counseling group that score high ...,True,False,,,B,B,B,B,B
4,If a child is learning to use abstract concept...,Preoperational,Sensorimotoer,Concrete,Early Adolescence,D,C,C,D,D


In [14]:
mft_llm_re.head()

Unnamed: 0,question,option_A,option_B,option_C,option_D,correct_answer,non_expert_baseline,llama3_1_405b_instruct_fp8_answer,deepseek_v3_0324_answer,chatgpt_4o_latest_answer,claude_3_7_sonnet_20250219_answer
0,Therapists who collaborate with clients on dev...,Exposure therapy,Psychoanalytic therapy,Structural family therapy,Solution-focused therapy,D,1,D,D,D,D
1,"According to California laws and regulations, ...",The therapist must only report proven cases of...,The therapist must report all suspected and pr...,The therapist must decide on a case-by-case ba...,The therapist must obtain a third-party assess...,B,1,B,B,B,B
2,An MFT provides ongoing therapy for a 9-year-o...,The therapist may perform the evaluation only ...,The therapist may perform the evaluation only ...,The therapist may only provide the court with ...,The therapist may only provide the court with ...,C,0,A,B,C,C
3,Which of the following practices can be used t...,Hypnosis,Mindfulness,Sensate focus,Flow,B,1,B,B,B,B
4,Which theorist deduced that infants form attac...,John Bowlby,Erik Erikson,Harry Harlow,Sigmund Freud,C,1,C,C,C,C


In [15]:
# index=False avoids writing the row index as an extra unnamed column
nce_llm_re.to_csv('experiment_1_results_nce.csv', index=False)

In [16]:
mft_llm_re.to_csv('experiment_1_results_mft.csv', index=False)
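
To turn the stored answers into scores, each model column can be compared with `correct_answer`. The sketch below assumes the column naming used above (`<model>_answer`); `score_models` is a hypothetical helper, and the demo dataframe contains only the five NCE preview rows printed by `nce_llm_re.head()`, so its numbers are not the 50-question exam scores.

```python
import pandas as pd


def score_models(df, key_col="correct_answer"):
    """Return, per model, the fraction of rows whose answer matches the key."""
    answer_cols = [c for c in df.columns if c.endswith("_answer") and c != key_col]
    key = df[key_col].astype(str).str.strip().str.upper()
    scores = {}
    for col in answer_cols:
        answers = df[col].astype(str).str.strip().str.upper()
        model_name = col[: -len("_answer")]
        # "NONE" and "ERROR" never equal a key letter, so they count as wrong
        scores[model_name] = (answers == key).mean()
    return pd.Series(scores, name="accuracy").sort_values(ascending=False)


# The five NCE preview rows from nce_llm_re.head() above
preview = pd.DataFrame({
    "correct_answer": ["A", "B", "D", "B", "D"],
    "llama3_1_405b_instruct_fp8_answer": ["A", "B", "D", "B", "C"],
    "deepseek_v3_0324_answer": ["A", "B", "D", "B", "C"],
    "chatgpt_4o_latest_answer": ["A", "B", "D", "B", "D"],
    "claude_3_7_sonnet_20250219_answer": ["A", "B", "D", "B", "D"],
})
print(score_models(preview))
```

Calling `score_models(nce_llm_re)` and `score_models(mft_llm_re)` would give per-model accuracy on the full question sets.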