Lingyu Zhang, Mitchell Wang, Boyuan Chen
Duke University, General Robotics Lab
LLM-driven scientific idea generators assume that scientists’ evaluations form a fixed gold standard. We test this assumption in a two-wave study with 7,182 ratings from 57 active researchers. We find that evaluations of the same idea drifted over time, but the relative importance of scientific value dimensions such as originality and feasibility stayed consistent. Tuning to a fixed human snapshot produced improvements that were transient rather than persistent.
conda create -n ideaeval python==3.12.2
conda activate ideaeval
pip install -r requirements.txt

A Semantic Scholar API key and an OpenAI API key are required. Configure them as environment variables:
export SS_API_KEY="xxxxxxxxx"
export OPENAI_API_KEY="sk-XXXXXXX"

@article{zhang2025ideationeval,
title={Scientific judgment drifts over time in AI ideation},
author={Zhang, Lingyu and Wang, Mitchell and Chen, Boyuan},
journal={arXiv preprint arXiv:2511.04964},
year={2025}
}
This work is supported by the DARPA FoundSci program under award HR00112490372.
