Scientific judgment drifts over time in AI ideation

Lingyu Zhang, Mitchell Wang, Boyuan Chen

Overview

LLM-driven scientific idea generators assume that scientists' evaluations form a fixed gold standard. We test this assumption in a two-wave study with 7,182 ratings from 57 active researchers. We find that evaluations of the same idea drifted over time, but the relative importance of scientific value dimensions such as originality and feasibility stayed consistent. Tuning to a fixed human snapshot produced improvements that were transient rather than persistent.

Setup

conda create -n ideaeval python==3.12.2
conda activate ideaeval
pip install -r requirements.txt

A Semantic Scholar API key and an OpenAI API key are required. Configure them as environment variables:

export SS_API_KEY="xxxxxxxxx"
export OPENAI_API_KEY="sk-XXXXXXX"
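Before running any of the study steps, it can help to confirm that both variables are visible to Python. The following is a minimal standalone sketch, not part of the repository: the helper name `missing_keys` and the `REQUIRED_KEYS` tuple are hypothetical, and only the variable names `SS_API_KEY` and `OPENAI_API_KEY` come from the instructions above. How the repository itself reads these keys is not shown here.

```python
import os

# Variable names taken from the setup instructions above.
REQUIRED_KEYS = ("SS_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

Calling `missing_keys()` in the activated `ideaeval` environment returns an empty list when both keys are exported, and otherwise lists the names that still need to be set.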

Study Instructions

1) Building A Dataset

Building A Dataset

2) Generating Ideas

Generating Ideas

3) Preparing Surveys

Preparing Surveys

4) Aligning LLM Evaluator

Aligning LLM Evaluator

5) Analysis

Analysis

Citation

@article{zhang2025ideationeval,
  title={Scientific judgment drifts over time in AI ideation},
  author={Zhang, Lingyu and Wang, Mitchell and Chen, Boyuan},
  journal={arXiv preprint arXiv:2511.04964},
  year={2025}
}

Acknowledgement

This work is supported by the DARPA FoundSci program under award HR00112490372.
