[EMNLP 2024] MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems

Dataset Description

MMCode is a multimodal code generation dataset designed to evaluate the problem-solving skills of code language models in visually rich contexts. It contains 3,548 questions paired with 6,622 images, collected from real-world programming challenges on 10 code competition websites, with Python solutions and tests provided. The dataset is distinguished by its heavy demands on reasoning, the tight interweaving of textual and visual content, and the presence of questions that contain multiple images.

For a more detailed introduction to the data, please see the 🤗 Hugging Face dataset page.

Getting Started

Setup

Before you begin, make sure the following environment variables are set:

OPENAI_API_KEY: Your OpenAI API key.
GOOGLE_API_KEY: Your Google API key.
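A minimal way to set these for the current POSIX shell session is sketched below. The key values are placeholders, and the loop at the end is an optional sanity check we suggest; it is not part of the repository's scripts.

```shell
# Placeholders: replace with your own keys.
export OPENAI_API_KEY="your-openai-api-key"
export GOOGLE_API_KEY="your-google-api-key"

# Optional check: report any variable that is not set in the environment.
for var in OPENAI_API_KEY GOOGLE_API_KEY; do
  printenv "$var" > /dev/null || echo "Missing environment variable: $var" >&2
done
```

To keep the keys across sessions, the same `export` lines can go in your shell profile (for example `~/.bashrc`).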

Inference

An example of generating solutions with GPT-4V:

python generate.py \
    --model gpt4v \
    --problems_root <path_to_the_test_set> \
    --save_path "results/gpt4v-mmcode_test.jsonl"
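Because the output is a JSONL file (one JSON object per line), a quick check before evaluation might look like the sketch below. The record fields are determined by generate.py and are not documented here, so the snippet only counts lines and pretty-prints the first record.

```shell
# Path written by the generation command above.
GEN_FILE="results/gpt4v-mmcode_test.jsonl"

if [ -f "$GEN_FILE" ]; then
  # JSONL stores one record per line, so the line count approximates the record count.
  wc -l < "$GEN_FILE"
  # Pretty-print the first record to inspect the fields generate.py wrote.
  head -n 1 "$GEN_FILE" | python -m json.tool
else
  echo "No generation file found at $GEN_FILE" >&2
fi
```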

Evaluation

To evaluate the results generated by GPT-4V, run:

python eval.py \
    --problems_root <path_to_the_test_set> \
    --generation_file "results/gpt4v-mmcode_test.jsonl"
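To score several generation files in one go, a simple loop over the two flags shown above could be used. This is a sketch, assuming your outputs are saved under results/ as *.jsonl files; PROBLEMS_ROOT is a placeholder for the path to your local test set.

```shell
# Placeholder: point this at your local copy of the MMCode test set.
PROBLEMS_ROOT="path/to/mmcode_test"

# Evaluate every generation file under results/ (assumes *.jsonl outputs).
for gen_file in results/*.jsonl; do
  # If nothing matches, the shell leaves the pattern unexpanded; skip it.
  [ -e "$gen_file" ] || continue
  echo "Evaluating $gen_file"
  python eval.py \
    --problems_root "$PROBLEMS_ROOT" \
    --generation_file "$gen_file"
done
```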

Citation

Please consider citing our paper if you find our work useful:

@misc{li2024mmcode,
      title={MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems}, 
      author={Kaixin Li and Yuchen Tian and Qisheng Hu and Ziyang Luo and Jing Ma},
      year={2024},
      eprint={2404.09486},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
