# Cognitive testing & LLM biases
This notebook provides example code for using [EDSL](https://docs.expectedparrot.com) to investigate biases of large language models. 

[EDSL is an open-source library](https://github.com/expectedparrot/edsl) for simulating surveys, experiments and other research with AI agents and large language models. 
Before running the code below, please ensure that you have [installed the EDSL library](https://docs.expectedparrot.com/en/latest/installation.html) and either [activated remote inference](https://docs.expectedparrot.com/en/latest/remote_inference.html) from your [Coop account](https://docs.expectedparrot.com/en/latest/coop.html) or [stored API keys](https://docs.expectedparrot.com/en/latest/api_keys.html) for the language models that you want to use with EDSL. Please also see our [documentation page](https://docs.expectedparrot.com/) for tips and tutorials on getting started using EDSL.

## Selecting language models
A list of currently available models can be viewed [here](https://www.expectedparrot.com/getting-started/coop-pricing).

To see a list of service providers:

In [1]:
from edsl import Model

Model.services()

service
anthropic
azure
bedrock
deep_infra
deepseek
google
groq
mistral
ollama
open_router


To inspect the default model:

In [2]:
Model()

key,value
model,gpt-4o
parameters:temperature,0.5
parameters:max_tokens,1000
parameters:top_p,1
parameters:frequency_penalty,0
parameters:presence_penalty,0
parameters:logprobs,False
parameters:top_logprobs,3
inference_service,openai


Here we select several models to compare their responses for the survey that we create in the steps below:

In [3]:
from edsl import ModelList

models = ModelList(
    Model(m) for m in ["gemini-2.5-flash", "gpt-4o", "claude-sonnet-4-20250514"]
)

Each name must match a model that one of the services currently offers. Otherwise `Model` raises a `ValueError` that lists the available models and services (for example, `claude-3-5-sonnet-20240620` is not found in any service).
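
A name that no service offers stops the cell with a `ValueError`, so it can help to check the requested names up front. The following is a minimal sketch using only the standard library, not part of EDSL: `check_model_names` is a hypothetical helper, and `available_snapshot` is a hand-typed list of a few model names that stands in for whatever list of available models you retrieve.

```python
from difflib import get_close_matches


def check_model_names(requested, available):
    """Return (found, suggestions) for the requested model names.

    found: requested names that appear in `available`, in request order.
    suggestions: for each missing name, up to three similar available names.
    """
    available_set = set(available)
    found = [name for name in requested if name in available_set]
    suggestions = {
        name: get_close_matches(name, available, n=3, cutoff=0.6)
        for name in requested
        if name not in available_set
    }
    return found, suggestions


# Hand-typed snapshot of a few model names (an assumption for illustration).
available_snapshot = [
    "gemini-2.5-flash",
    "gpt-4o",
    "claude-3-7-sonnet-20250219",
    "claude-sonnet-4-20250514",
]
requested = ["gemini-2.5-flash", "gpt-4o", "claude-3-5-sonnet-20240620"]

found, suggestions = check_model_names(requested, available_snapshot)
print("available:", found)
print("not found, similar names:", suggestions)
```

The suggestions are based on string similarity only, so they are a starting point for choosing a replacement rather than a guarantee that the suggested model behaves comparably.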

## Generating content
EDSL comes with a variety of standard survey question types, such as multiple choice, free text, etc. These can be selected based on the desired format of the response. See details about all types [here](https://docs.expectedparrot.com/en/latest/questions.html#question-type-classes). We can use `QuestionFreeText` to prompt the models to generate some content for our experiment:

In [None]:
from edsl import QuestionFreeText

q = QuestionFreeText(
    question_name = "poem",
    question_text = "Please draft a short poem about any topic. Return only the poem."
)

We generate a response to the question by adding the models to use with the `by` method and then calling the `run` method. This generates a `Results` object with a `Result` for each response to the question:

In [None]:
results = q.by(models).run()

To see a list of all components of results:

In [None]:
results.columns 

We can inspect components of the results individually:

In [None]:
results.select("model", "poem")

## Conducting a review
Next we create a question that asks a model to evaluate a response, which we pass in as an input to the new question:

In [None]:
from edsl import QuestionLinearScale

q_score = QuestionLinearScale(
    question_name = "score",
    question_text = "Please give the following poem a score. No easy grading! Poem: {{ scenario.poem }}",
    question_options = [0, 1, 2, 3, 4, 5],
    option_labels = {0: "Very poor", 5: "Excellent"},
)

## Parameterizing questions
We use `Scenario` objects to add each response to the new question. EDSL comes with many methods for creating scenarios from different data sources (PDFs, CSVs, docs, images, lists, etc.), as well as from `Results` objects:

In [None]:
scenarios = (
    results.to_scenario_list()
    .select("model", "poem")
    .rename({"model": "drafting_model"}) # renaming the 'model' field to distinguish the evaluating model 
)
scenarios
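
To make the parameterization concrete, the sketch below mimics how a `{{ scenario.poem }}` placeholder is filled in from each scenario. EDSL performs this substitution itself. The `render` function here is a hypothetical stand-in based on a regular expression, and the scenario values are placeholders rather than real model output.

```python
import re

# Matches placeholders of the form {{ scenario.<key> }}.
PLACEHOLDER = re.compile(r"\{\{\s*scenario\.(\w+)\s*\}\}")


def render(template, scenario):
    """Replace each {{ scenario.<key> }} placeholder with the scenario's value."""
    return PLACEHOLDER.sub(lambda m: str(scenario[m.group(1)]), template)


template = "Please give the following poem a score. Poem: {{ scenario.poem }}"

# Placeholder scenarios; in the notebook these come from the drafting step.
scenarios_demo = [
    {"drafting_model": "model-a", "poem": "<poem drafted by model-a>"},
    {"drafting_model": "model-b", "poem": "<poem drafted by model-b>"},
]

prompts = [render(template, s) for s in scenarios_demo]
for prompt in prompts:
    print(prompt)
```

Only the fields referenced in the template reach the prompt. Because `drafting_model` is not referenced, the evaluating model is not told which model wrote the poem.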

Finally, we conduct the evaluation by having each model score each poem that was generated (without information about whether the model itself was the source):

In [None]:
results = q_score.by(scenarios).by(models).run()
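
As a quick sanity check on the size of this run, the sketch below enumerates the drafting/evaluating pairs. It assumes that each model returned exactly one poem and that every scenario is combined with every model. The model names are placeholders.

```python
from itertools import product

# Placeholder names; substitute the models selected earlier.
drafting_models = ["model-a", "model-b", "model-c"]
evaluating_models = ["model-a", "model-b", "model-c"]

# Every poem is scored by every model.
pairs = list(product(drafting_models, evaluating_models))

# Pairs in which a model scores its own poem.
self_reviews = [(d, e) for d, e in pairs if d == e]

print(len(pairs), "scores in total,", len(self_reviews), "of them self-reviews")
```

With three models this gives nine scores, three of which are a model scoring its own poem.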

In [None]:
results.columns

In [None]:
results.sort_by("drafting_model", "model").select("drafting_model", "model", "poem", "score", "score_comment")
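
One simple way to read this table for bias is to compare the score each model gives its own poem with the scores it gives to poems from other models. The sketch below is an assumption about how such a comparison might look, not a method built into EDSL. It expects the results to have been exported as plain dictionaries with the keys `drafting_model`, `model` and `score`, and the scores shown are made-up placeholders, not output from any model.

```python
from statistics import mean


def self_preference(rows):
    """For each evaluating model, return mean(own scores) - mean(others' scores)."""
    gaps = {}
    for evaluator in sorted({r["model"] for r in rows}):
        own = [
            r["score"] for r in rows
            if r["model"] == evaluator and r["drafting_model"] == evaluator
        ]
        others = [
            r["score"] for r in rows
            if r["model"] == evaluator and r["drafting_model"] != evaluator
        ]
        # Skip evaluators that lack either kind of score.
        if own and others:
            gaps[evaluator] = mean(own) - mean(others)
    return gaps


# Made-up placeholder scores for illustration only.
rows = [
    {"drafting_model": "model-a", "model": "model-a", "score": 4},
    {"drafting_model": "model-b", "model": "model-a", "score": 3},
    {"drafting_model": "model-a", "model": "model-b", "score": 3},
    {"drafting_model": "model-b", "model": "model-b", "score": 3},
]

gaps = self_preference(rows)
print(gaps)
```

A positive gap means the evaluator scored its own poem higher than the others. With a single poem per model, a gap may reflect poem quality rather than bias, so repeated runs would be needed before drawing conclusions.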

## Posting to the Coop
The [Coop](https://www.expectedparrot.com/content/explore) is a platform for creating, storing and sharing LLM-based research.
It is fully integrated with EDSL and accessible from your workspace or Coop account page.
Learn more about [creating an account](https://www.expectedparrot.com/login) and [using the Coop](https://docs.expectedparrot.com/en/latest/coop.html).

Here we post this notebook:

In [None]:
from edsl import Notebook

nb = Notebook(path = "explore_llm_biases.ipynb")

refresh = False  # True pushes the notebook as a new object; False patches the existing one
if refresh:
    nb.push(
        description = "Example code for comparing model responses and biases", 
        alias = "explore-llm-biases-notebook",
        visibility = "public"
    )
else:
    nb.patch("https://www.expectedparrot.com/content/RobinHorton/explore-llm-biases-notebook", value = nb)