FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning

ByteDance Seed • Columbia Business School

🔔 Introduction

Realistic decision-making tasks require three core skills: finding the right signals, checking and reconciling sources, and turning them into grounded judgments under time pressure. We provide a foundational, end-to-end evaluation infrastructure—an open finance benchmark with tasks for time-sensitive fetching, historical lookup, and multi-source investigation—that measures these skills directly.

⚙️ Installation

To install the required packages:

# we prefer to run the code in a conda environment
git clone git@github.com:randomtutu/FinSearchComp.git
cd FinSearchComp
conda create -n finsearchcomp python=3.10
conda activate finsearchcomp
pip install -r finsearchcomp/requirements.txt

🚀 Quick Start

To get started, follow the steps below.

Dataset note: The complete release lives at data/finsearchcomp_data.json. Because some ground-truth steps rely on public AkShare APIs that only cover part of the data, we also provide data/finsearchcomp_akshare_version.json. The quick-start commands default to this AkShare-compatible split.
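Before running a model, it can help to look at what a data file contains. The snippet below is a minimal sketch that makes no assumption about the dataset schema: it only reports the top-level JSON type, the item count, and, if the file is a list of objects, the field names of the first entry. The helper name `summarize_json_file` is hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def summarize_json_file(path):
    """Return a small summary of a JSON file without assuming its schema."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    summary = {
        "top_level_type": type(data).__name__,
        "num_items": None,
        "first_item_keys": None,
    }
    # Lists and dicts both have a meaningful length.
    if isinstance(data, (list, dict)):
        summary["num_items"] = len(data)
    # If the file is a list of objects, report the field names of the first one.
    if isinstance(data, list) and data and isinstance(data[0], dict):
        summary["first_item_keys"] = sorted(data[0].keys())
    return summary


if __name__ == "__main__":
    # Path taken from the dataset note; assumes the repository root as cwd.
    path = Path("data/finsearchcomp_akshare_version.json")
    if path.exists():
        print(summarize_json_file(path))
    else:
        print(f"{path} not found; run this from the repository root")
```

Running it from the repository root prints the summary for the AkShare-compatible split; point `path` at data/finsearchcomp_data.json to inspect the complete release instead.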

1️⃣ Add your API keys (e.g., Gemini) to finsearchcomp/config/config.yaml.

2️⃣ Process a specific data file:

python finsearchcomp/chat/chat.py \
  --model_name gemini-2.5-flash \
  --input_file ../data/finsearchcomp_akshare_version.json \
  --output_path result/chat-result/chat.json \
  --limit 1

Setting --limit 0 processes every question in the data file.

3️⃣ Conduct evaluation:

python finsearchcomp/eval/eval.py \
  --model_name gemini-2.5-flash \
  --input finsearchcomp/result/chat-result/chat.json \
  --output finsearchcomp/result/eval-result/eval.json
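Steps 2 and 3 can also be chained from one script. The following is a minimal sketch, not part of the repository: the helper names (`build_chat_cmd`, `build_eval_cmd`, `run_pipeline`) are hypothetical, and it uses only the flags and paths shown in the commands above, copied as-is. It assumes it is run from the repository root with the conda environment active.

```python
import shlex
import subprocess
import sys


def build_chat_cmd(model_name, input_file, output_path, limit=1):
    """Build the argument list for the chat step (flags as shown in step 2)."""
    return [
        sys.executable, "finsearchcomp/chat/chat.py",
        "--model_name", model_name,
        "--input_file", input_file,
        "--output_path", output_path,
        "--limit", str(limit),
    ]


def build_eval_cmd(model_name, input_path, output_path):
    """Build the argument list for the evaluation step (flags as shown in step 3)."""
    return [
        sys.executable, "finsearchcomp/eval/eval.py",
        "--model_name", model_name,
        "--input", input_path,
        "--output", output_path,
    ]


def run_pipeline(commands):
    """Run each command in order; stop at the first non-zero exit code."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    model = "gemini-2.5-flash"
    commands = [
        build_chat_cmd(model, "../data/finsearchcomp_akshare_version.json",
                       "result/chat-result/chat.json", limit=1),
        build_eval_cmd(model, "finsearchcomp/result/chat-result/chat.json",
                       "finsearchcomp/result/eval-result/eval.json"),
    ]
    # Print the commands for review; call run_pipeline(commands) to execute them.
    for cmd in commands:
        print(shlex.join(cmd))
```

The script only prints the two commands by default, so nothing is sent to a model API until `run_pipeline(commands)` is called explicitly.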

🛠️ Project Structure

eval

eval.py: Main evaluation script, calculates metrics.

chat

chat.py: Runs a model over the FinSearchComp dataset (all questions, or a subset set with --limit).

data

*.json: JSON files with FinSearchComp data.

config

config.yaml: API keys and model settings.
config_wrapper.py: Helper for loading configurations.

result

chat-result/: Dialogue outputs (e.g., demo.json).
eval-result/: Evaluation results.

models

deepseek.py: Implementation of DeepSeek models.
openai_api.py: Implementation of OpenAI models.
gemini.py: Implementation of Gemini models.

logger

config.py: Logger configuration.

📄 Citation

@misc{hu2025finsearchcomprealisticexpertlevelevaluation,
      title={FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning}, 
      author={Liang Hu and Jianpeng Jiao and Jiashuo Liu and Yanle Ren and Zhoufutu Wen and Kaiyuan Zhang and Xuanliang Zhang and Xiang Gao and Tianci He and Fei Hu and Yali Liao and Zaiyuan Wang and Chenghao Yang and Qianyu Yang and Mingren Yin and Zhiyuan Zeng and Ge Zhang and Xinyi Zhang and Xiying Zhao and Zhenwei Zhu and Hongseok Namkoong and Wenhao Huang and Yuwen Tang},
      year={2025},
      eprint={2509.13160},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.13160}, 
}
