ESSAM is a novel zeroth-order fine-tuning method for improving the mathematical reasoning ability of large language models. It combines Evolution Strategies (ES) with Sharpness-Aware Minimization (SAM).
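To give a feel for how ES and SAM fit together, here is a minimal conceptual sketch. This is not the repository's implementation: the toy loss, hyperparameters (`sigma`, `pop`, `lr`, `rho`), and function names are all illustrative assumptions. It estimates gradients with antithetic ES sampling (zeroth-order, forward passes only), takes a SAM-style ascent step to a nearby "sharp" point, and then descends using the gradient estimated there:

```python
# Hedged sketch, not the authors' code: antithetic-sampling ES gradient
# estimates combined with a SAM-style worst-case perturbation, demonstrated
# on a toy quadratic loss standing in for the LLM fine-tuning objective.
import numpy as np

rng = np.random.default_rng(0)

def loss(theta):
    # Toy objective (assumption); in ESSAM this would be a forward pass of the LLM.
    return float(np.sum((theta - 1.0) ** 2))

def es_gradient(theta, sigma=0.1, pop=32):
    # Antithetic ES estimator: average eps * (L(theta + sigma*eps) - L(theta - sigma*eps))
    # over the population, scaled by 1 / (2 * sigma). Uses only loss evaluations.
    grad = np.zeros_like(theta)
    for _ in range(pop):
        eps = rng.standard_normal(theta.shape)
        grad += eps * (loss(theta + sigma * eps) - loss(theta - sigma * eps))
    return grad / (2.0 * sigma * pop)

def essam_step(theta, lr=0.05, rho=0.05):
    # SAM-style update done entirely with zeroth-order estimates:
    # 1) estimate the gradient at theta,
    # 2) ascend by rho along its direction to a nearby worst-case point,
    # 3) re-estimate the gradient there and descend from the original theta.
    g = es_gradient(theta)
    theta_adv = theta + rho * g / (np.linalg.norm(g) + 1e-12)
    g_sam = es_gradient(theta_adv)
    return theta - lr * g_sam

theta = rng.standard_normal(8)
for _ in range(200):
    theta = essam_step(theta)
# After training, theta should sit near the flat minimum at 1.0.
```

The ES estimator never touches backpropagation, which is what makes the approach memory-efficient for LLM fine-tuning; the SAM step biases it toward flat minima.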
## Installation

Using conda:

```bash
conda create -n essam_env python=3.10
conda activate essam_env
```

Or using venv:

```bash
python -m venv essam_env
source essam_env/bin/activate
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

## Usage

Run ESSAM on the GSM8K dataset:

```bash
bash essam_run.sh
```

Alternatively, try the accelerated variant ESSAM-F, which achieves approximately a 2× speedup while maintaining competitive performance:

```bash
bash essam-fen_run.sh
```

## Citation

If you find this work helpful in your research, please cite:
```bibtex
@misc{sun2026essamnovelcompetitiveevolution,
      title={ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning},
      author={Zhishen Sun and Sizhe Dang and Guang Dai and Haishan Ye},
      year={2026},
      eprint={2602.01003},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.01003},
}
```