generalroboticslab/IdeationEval

Scientific judgment drifts over time in AI ideation

Lingyu Zhang, Mitchell Wang, Boyuan Chen

Duke University, General Robotics Lab

Overview

LLM-driven scientific idea generators assume that scientists' evaluations form a fixed gold standard. We test this assumption in a two-wave study with 7,182 ratings from 57 active researchers. We find that evaluations of the same idea drifted over time, but the relative importance of scientific value dimensions such as originality and feasibility stayed consistent. Tuning to a fixed human snapshot produced improvements that were transient rather than persistent.

Setup

conda create -n ideaeval python==3.12.2
conda activate ideaeval
pip install -r requirements.txt

A Semantic Scholar API key and an OpenAI API key are required. Configure them as environment variables:

export SS_API_KEY="xxxxxxxxx"
export OPENAI_API_KEY="sk-XXXXXXX"
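
As a quick sanity check before running the study scripts, a minimal sketch like the following could confirm that both keys are visible to Python. The helper name `load_api_keys` is hypothetical and not part of this repository; only the variable names `SS_API_KEY` and `OPENAI_API_KEY` come from the instructions above.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_KEYS = ("SS_API_KEY", "OPENAI_API_KEY")


def load_api_keys(env=None):
    """Return the required API keys from `env` (defaults to os.environ).

    Hypothetical helper for illustration; not part of this repository.
    Raises RuntimeError listing any variable that is unset or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

Calling `load_api_keys()` with no arguments reads from the current shell environment, so it fails early with a readable message if either `export` line was skipped.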

Study Instructions

1) Building A Dataset

2) Generating Ideas

3) Preparing Surveys

4) Aligning LLM Evaluator

5) Analysis

Citation

@article{zhang2025ideationeval,
  title={Scientific judgment drifts over time in AI ideation},
  author={Zhang, Lingyu and Wang, Mitchell and Chen, Boyuan},
  journal={arXiv preprint arXiv:2511.04964},
  year={2025}
}

Acknowledgement

This work is supported by the DARPA FoundSci program under award HR00112490372.
