GHOSTS: Mathematical Capabilities of ChatGPT [Project Website]

A Natural-Language Dataset and a New Benchmark for Advanced Mathematics

Abstract

We investigate the mathematical capabilities of two iterations of ChatGPT (released on 9 January 2023 and 30 January 2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology. In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics used to benchmark language models either cover only elementary mathematics or are very small. We address this by publicly releasing two new datasets: GHOSTS and miniGHOSTS. These are the first natural-language datasets curated by working researchers in mathematics that (1) aim to cover graduate-level mathematics, (2) provide a holistic overview of the mathematical capabilities of language models, and (3) distinguish multiple dimensions of mathematical reasoning. These datasets also test whether ChatGPT and GPT-4 can be helpful assistants to professional mathematicians by emulating use cases that arise in the daily professional activities of mathematicians. We benchmark the models on a range of fine-grained performance metrics. For advanced mathematics, this is the most detailed evaluation effort to date. We find that ChatGPT is most useful as a mathematical assistant for querying facts, acting as a mathematical search engine and knowledge base interface. GPT-4 can additionally be used for undergraduate-level mathematics but fails on problems of graduate-level difficulty. Contrary to many positive media reports about the exam-solving abilities of GPT-4 and ChatGPT (a potential case of selection bias), their overall mathematical performance is well below the level of a graduate student. Hence, if your goal is to use ChatGPT to pass a graduate-level math exam, you would be better off copying from your average peer!

Licence

Please refer to the paper for detailed information about the licence. TL;DR: everything we created ourselves is shared under CC BY-NC 4.0; for prompts taken from copyrighted books, the respective licences apply.

Citation

If you use our dataset, please cite our paper:

@article{frieder2023mathematical,
  title={Mathematical Capabilities of {ChatGPT}},
  author={Frieder, Simon and Pinchetti, Luca and Griffiths, Ryan-Rhys and Salvatori, Tommaso and Lukasiewicz, Thomas and Petersen, Philipp Christian and Chevalier, Alexis and Berner, Julius},
  journal={arXiv preprint arXiv:2301.13867},
  year={2023}
}

Folders and files

dataset_30jan
dataset_9jan
miniGHOSTS_gpt4
README.md