FullStack-Bench

Overview

This repository contains FullStack-Bench, the full-stack web-development evaluation benchmark described in the paper "FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation". It also contains the code used to evaluate the baseline methods.

Dataset

Dataset Name	Hugging Face Link
FullStack-Bench	🤗 luzimu/FullStack-Bench
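
The benchmark data is hosted on the Hugging Face Hub. Below is a minimal sketch for fetching it. It assumes that the `huggingface_hub` package is installed (it may not be listed in requirements.txt) and that `luzimu/FullStack-Bench` is a dataset repository. The helper names and the target directory are hypothetical.

```python
"""Sketch: fetch FullStack-Bench from the Hugging Face Hub (hypothetical helper)."""
import sys

# Repository ID taken from the Dataset table.
REPO_ID = "luzimu/FullStack-Bench"


def dataset_page_url(repo_id: str = REPO_ID) -> str:
    # Hugging Face serves dataset repositories under /datasets/<owner>/<name>.
    return f"https://huggingface.co/datasets/{repo_id}"


def download_dataset(local_dir: str, repo_id: str = REPO_ID) -> str:
    # Imported lazily so the URL helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # repo_type="dataset" is required because the default repo type is "model".
    return snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=local_dir)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # e.g. python fetch_dataset.py data/fullstack-bench  (hypothetical paths)
        print(download_dataset(sys.argv[1]))
    else:
        print(dataset_page_url())
```

`snapshot_download` returns the local path of the downloaded files. Downloading the whole repository snapshot avoids assumptions about how the data files are organized.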

Installation

Run the following commands:

# clone the repository, create a conda environment, and install Python dependencies
git clone https://github.com/mnluzimu/FullStack-Bench.git
cd FullStack-Bench
conda create -p env/fullstack-bench python=3.10 -y
conda activate env/fullstack-bench
pip install -r requirements.txt
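
After the environment is active, you can confirm that every package listed in requirements.txt was installed. The helper below is a hypothetical sketch and is not part of the repository. It handles plain `name`, `name==1.0`, and `name[extra]>=1.0` lines, skips pip options such as `-r` and `-e`, and ignores version pins.

```python
"""Sketch: list requirements.txt entries missing from the active environment."""
import os
import re
from importlib.metadata import PackageNotFoundError, version


def parse_requirement_names(lines):
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop inline comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        # The distribution name ends at the first space, extra, specifier, marker, or URL.
        names.append(re.split(r"[\s\[<>=!~;@]", line, maxsplit=1)[0])
    return names


def find_missing(names):
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


def missing_requirements(path="requirements.txt"):
    with open(path, encoding="utf-8") as f:
        return find_missing(parse_requirement_names(f))


if __name__ == "__main__":
    if os.path.exists("requirements.txt"):
        print("missing:", missing_requirements() or "none")
```

Run it from the repository root inside the activated environment. An empty result means that every listed distribution is present.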

Quick Start

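To run the frontend, backend, and database tests of FullStack-Bench, use the command for the method being evaluated. $WORKING_DIR_ROOT is the input directory (passed as --in_dir to the Python scripts), and $LOG_DIR_ROOT is the log directory (passed as --log_dir):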

# FullStack-Dev:
bash src/eval_fullstack-dev/ui_eval_with_answer.sh $WORKING_DIR_ROOT $LOG_DIR_ROOT

# Baselines
# WebGen-Agent:
bash src/eval_fullstack-dev/ui_eval_with_answer.sh $WORKING_DIR_ROOT $LOG_DIR_ROOT
# OpenHands:
bash src/eval_fullstack-dev/ui_eval_with_answer.sh $WORKING_DIR_ROOT $LOG_DIR_ROOT
# Qwen-Code:
python src/eval_qwen-code/ui_eval_with_answer.py --in_dir $WORKING_DIR_ROOT --log_dir $LOG_DIR_ROOT
# TDDev:
python src/eval_tddev/ui_eval_with_answer.py --in_dir $WORKING_DIR_ROOT --log_dir $LOG_DIR_ROOT
# Bolt.diy:
python src/eval_bolt_diy/ui_eval_with_answer.py --in_dir $WORKING_DIR_ROOT --log_dir $LOG_DIR_ROOT
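
When you evaluate several methods, it can help to script the commands above. The following Python wrapper is hypothetical and is not part of the repository. It maps each method name to the command listed in Quick Start. The names `build_command` and `run_eval` are illustrative.

```python
"""Sketch: build and run the Quick Start evaluation command for one method."""
import subprocess

# Shell-script methods; Quick Start lists the same script for all three.
BASH_METHODS = {
    "FullStack-Dev": "src/eval_fullstack-dev/ui_eval_with_answer.sh",
    "WebGen-Agent": "src/eval_fullstack-dev/ui_eval_with_answer.sh",
    "OpenHands": "src/eval_fullstack-dev/ui_eval_with_answer.sh",
}
# Python-script methods.
PYTHON_METHODS = {
    "Qwen-Code": "src/eval_qwen-code/ui_eval_with_answer.py",
    "TDDev": "src/eval_tddev/ui_eval_with_answer.py",
    "Bolt.diy": "src/eval_bolt_diy/ui_eval_with_answer.py",
}


def build_command(method, working_dir_root, log_dir_root):
    if method in BASH_METHODS:
        # The shell scripts take both roots as positional arguments.
        return ["bash", BASH_METHODS[method], working_dir_root, log_dir_root]
    if method in PYTHON_METHODS:
        # The Python scripts take them as --in_dir / --log_dir flags.
        return ["python", PYTHON_METHODS[method],
                "--in_dir", working_dir_root, "--log_dir", log_dir_root]
    raise ValueError(f"unknown method: {method!r}")


def run_eval(method, working_dir_root, log_dir_root):
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(build_command(method, working_dir_root, log_dir_root), check=True)
```

For example, `run_eval("Qwen-Code", os.environ["WORKING_DIR_ROOT"], os.environ["LOG_DIR_ROOT"])` runs the Qwen-Code command from the repository root. The command is passed as a list, so no shell quoting is involved.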

Appearance Evaluation

To evaluate the visual appearance of the generated websites, run:

bash src/grade_appearance/eval_appearance_parallel.sh $LOG_DIR_ROOT
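
The appearance grader can also be called from Python with an early check on its input. This hypothetical sketch makes two assumptions that are not stated above: the grader should be pointed at an existing log directory, and that directory can be read from the LOG_DIR_ROOT environment variable.

```python
"""Sketch: run the appearance grader after checking its log directory exists."""
import os
import subprocess

APPEARANCE_SCRIPT = "src/grade_appearance/eval_appearance_parallel.sh"


def appearance_command(log_dir_root):
    if not os.path.isdir(log_dir_root):
        # Assumption: grading reads an existing log directory, so fail early if it is missing.
        raise FileNotFoundError(f"log directory not found: {log_dir_root}")
    return ["bash", APPEARANCE_SCRIPT, log_dir_root]


def grade_appearance(log_dir_root=None):
    # Fall back to $LOG_DIR_ROOT; raises KeyError if neither is provided.
    log_dir_root = log_dir_root or os.environ["LOG_DIR_ROOT"]
    subprocess.run(appearance_command(log_dir_root), check=True)
```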

Experimental Results

Experimental results of FullStack-Dev on FullStack-Bench compared to popular baseline methods are shown below:

Cite

If you find our project helpful, please cite:

@misc{lu2026fullstackagentenhancingagenticfullstack,
      title={FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation}, 
      author={Zimu Lu and Houxing Ren and Yunqiao Yang and Ke Wang and Zhuofan Zong and Mingjie Zhan and Hongsheng Li},
      year={2026},
      eprint={2602.03798},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2602.03798}, 
}
