Happylkx/MMCode

MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems

Dataset Description

MMCode is a multi-modal code generation dataset designed to evaluate the problem-solving skills of code language models in visually rich contexts. It contains 3,548 questions paired with 6,622 images, derived from real-world programming challenges on 10 code competition websites, with Python solutions and tests provided. The dataset is characterized by a high demand for reasoning ability, tightly interwoven textual and visual content, and questions that contain multiple images.

For a more detailed introduction to the data, please see the 🤗 Huggingface Dataset.

Getting Started

Set Up

Before you begin, ensure your environment variables are set:

  • OPENAI_API_KEY: Your OpenAI API key.
  • GOOGLE_API_KEY: Your Google API key.
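
A quick pre-flight check can catch a missing key before a long generation run starts. The sketch below is not part of the repository; the helper name `missing_env_vars` is hypothetical, and it assumes both variables listed above should be present (whether every model needs both is not stated here).

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("OPENAI_API_KEY", "GOOGLE_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment.

    `environ` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```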

Inference

For example, to generate solutions with GPT-4V, run:

python generate.py \
    --model gpt4v \
    --problems_root <path_to_the_test_set> \
    --save_path "results/gpt4v-mmcode_test.jsonl"
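
Before evaluating, it can help to confirm that the output file was written and parses cleanly. The following is a minimal sketch, not part of the repository: it assumes only that the `.jsonl` file holds one JSON value per line, and makes no assumption about the field names inside each record. The helper names `parse_jsonl_lines` and `load_jsonl` are hypothetical.

```python
import json
from pathlib import Path


def parse_jsonl_lines(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def load_jsonl(path):
    """Read a JSON Lines file from disk and return its records as a list."""
    with Path(path).open(encoding="utf-8") as f:
        return parse_jsonl_lines(f)
```

Calling `load_jsonl("results/gpt4v-mmcode_test.jsonl")` and checking `len(...)` of the result gives the number of saved generations; a `json.JSONDecodeError` points to a truncated or corrupted line.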

Evaluation

To evaluate the results generated by GPT-4V, run:

python eval.py \
    --problems_root <path_to_the_test_set> \
    --generation_file "results/gpt4v-mmcode_test.jsonl"
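
The two steps can also be chained from Python so that evaluation runs only if generation succeeds. This is a sketch under stated assumptions: it uses only the scripts and flags shown in the commands above, the helper names `build_commands` and `run_pipeline` are hypothetical, and the test-set path is left for the caller to supply.

```python
import subprocess


def build_commands(problems_root, save_path, model="gpt4v", python="python"):
    """Return the argv lists for the generation and evaluation steps."""
    generate = [
        python, "generate.py",
        "--model", model,
        "--problems_root", problems_root,
        "--save_path", save_path,
    ]
    evaluate = [
        python, "eval.py",
        "--problems_root", problems_root,
        "--generation_file", save_path,
    ]
    return generate, evaluate


def run_pipeline(problems_root, save_path, model="gpt4v"):
    """Run generation, then evaluation; check=True stops at the first failure."""
    for argv in build_commands(problems_root, save_path, model):
        subprocess.run(argv, check=True)
```

Passing the same `save_path` to both steps keeps the evaluated file in sync with the one just generated.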

Citation

Please consider citing if you find our work useful:

@misc{li2024mmcode,
      title={MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems}, 
      author={Kaixin Li and Yuchen Tian and Qisheng Hu and Ziyang Luo and Jing Ma},
      year={2024},
      eprint={2404.09486},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
