
orange0629/llm-personas

You don’t need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments

Bangzhao Shu∗, Lechen Zhang∗, Minje Choi, Lavinia Dunagan, Dallas Card, David Jurgens

Paper Link

https://arxiv.org/abs/2311.09718

Abstract

The versatility of Large Language Models (LLMs) on natural language understanding tasks has made them popular for research in the social sciences. In particular, to properly understand the properties and innate personas of LLMs, researchers have performed studies that involve using prompts in the form of questions that ask LLMs for their opinions on particular topics. In this study, we take a cautionary step back and examine whether the current format of prompting enables LLMs to provide responses in a consistent and robust manner. We first construct a dataset that contains 693 questions encompassing 39 different instruments of persona measurement on 115 persona axes. Additionally, we design a set of prompts containing minor variations and examine LLMs' capabilities to generate accurate answers, as well as variations designed to test their consistency under simple perturbations such as switching the option order. Our experiments on 15 different open-source LLMs reveal that even simple perturbations are sufficient to significantly degrade a model's question-answering ability, and that most LLMs have low negation consistency. Our results suggest that the currently widespread practice of prompting is insufficient to accurately capture model perceptions, and we discuss potential alternatives to mitigate these issues.
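
The abstract mentions testing consistency under simple perturbations such as switching the option order. As a rough illustration only (the paper's code and data are not yet released in this repository), the sketch below shows one way such option-order variants of a single questionnaire item could be generated; the question text, options, and function name are hypothetical and not taken from the paper.

```python
# Hypothetical sketch, not the authors' implementation: generate every
# option-order variant of one multiple-choice persona-survey item.
import itertools

def option_order_variants(question, options):
    """Return one prompt per permutation of the answer options."""
    prompts = []
    for perm in itertools.permutations(options):
        labeled = "\n".join(
            f"({chr(ord('A') + i)}) {opt}" for i, opt in enumerate(perm)
        )
        prompts.append(f"{question}\n{labeled}\nAnswer with a single letter.")
    return prompts

# Example item (illustrative, not from the paper's dataset).
for prompt in option_order_variants(
    "I see myself as someone who is talkative.",
    ["Agree", "Neutral", "Disagree"],
):
    print(prompt, end="\n\n")
```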

Code and Data

Code and data will be released in this repository soon.
