
Smart Coding benchmark suite: built on KubeEdge-Ianvs #98

Open
YangBrooksHan opened this issue May 9, 2024 · 1 comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@YangBrooksHan

What would you like to be added/modified:

  1. Build a collaborative code intelligent agent alignment dataset for LLMs:
    • The dataset should include behavioral trajectories, feedback, and iterative processes of software engineers during development, as well as relevant code versions and annotation information.
    • The dataset should cover code scenarios of different programming languages, business domains, and complexities.
    • The dataset should comply with privacy protection and intellectual property requirements, providing good accessibility and usability.
  2. Design a code intelligent agent collaborative evaluation benchmark for LLMs:
    • The evaluation benchmark should include common tasks of code intelligent agents such as code generation, recommendation, and analysis.
    • Evaluation metrics should cover multiple dimensions including functionality, reliability, interpretability, etc., matching the feedback and requirements of software engineers.
    • The evaluation benchmark should assess the performance of LLMs in collaborative code intelligent agent tasks and provide a basis for further algorithm optimization.
  3. Integrate the dataset and evaluation benchmark into the KubeEdge-Ianvs framework:
    • Incorporate the dataset and evaluation benchmark as part of the Ianvs framework, providing good scalability and integrability.
    • Ensure that the dataset and evaluation benchmark can efficiently run on edge devices within the Ianvs framework and seamlessly collaborate with other functional modules of Ianvs.
    • Release an upgraded version of the Ianvs framework and promote it to developers and researchers in the fields of edge computing and AI.
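As one concrete illustration of the "functionality" dimension named in item 2, execution-based code benchmarks (e.g. HumanEval-style suites) often report pass@k, the probability that at least one of k sampled generations passes a task's tests. This is a sketch of the standard unbiased estimator, not something prescribed by this issue; the function name and example numbers are illustrative.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generated solutions of which c
    are functionally correct, passes the task's tests.

    Computed as 1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must contain at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations per task, 3 pass the tests, budget k = 1.
print(round(pass_at_k(10, 3, 1), 3))  # 0.3
```

A benchmark built on Ianvs could average this estimate over all tasks in the suite, alongside the reliability and interpretability metrics mentioned above.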

By implementing this project, we aim to provide crucial datasets and evaluation benchmarks for the further development of LLMs in the field of code intelligent agents, to promote efficient collaboration between LLMs and software engineers in edge computing environments, and to drive innovation in and application of edge intelligence technology.

Why is this needed:

Large Language Models (LLMs) have demonstrated powerful capabilities in tasks such as code generation, automatic programming, and code analysis. However, these models are typically trained on generic code data and often fail to fully leverage the collaboration and feedback of software engineers in real-world scenarios. To build a more intelligent and efficient code ecosystem, a collaborative code dataset and evaluation benchmark are needed to enable tight collaboration between LLMs and software engineers.

This project aims to build a collaborative code intelligent agent alignment dataset and evaluation benchmark for LLMs based on the open-source edge computing framework KubeEdge-Ianvs. The dataset will include behavioral trajectories, feedback, and iterative processes of software engineers during development, as well as the relevant code versions and annotation information. From this data, we will design evaluation metrics and benchmarks that measure the performance of LLMs in tasks such as code generation, recommendation, and analysis, fostering collaboration between LLMs and software engineers.
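To make the trajectory dataset described above more tangible, here is a rough sketch of what one record (an engineer's iterative session with feedback, code versions, and annotations) might look like. This is our own hypothetical schema for illustration only; none of these field names come from Ianvs or this issue.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrajectoryStep:
    """One step of an engineer's development session (hypothetical schema)."""
    action: str       # e.g. "edit", "run_tests", "review_comment"
    code_diff: str    # code-version change applied at this step
    feedback: str     # engineer/CI feedback produced by the step

@dataclass
class AlignmentRecord:
    """One dataset record: an iterative, annotated development trajectory."""
    task_id: str
    language: str     # programming language of the code scenario
    domain: str       # business domain, e.g. "edge", "web"
    complexity: str   # e.g. "easy" / "medium" / "hard"
    steps: List[TrajectoryStep] = field(default_factory=list)
    final_code: str = ""  # code version accepted at the end

# Example record with a single step:
rec = AlignmentRecord(task_id="demo-001", language="python",
                      domain="edge", complexity="easy")
rec.steps.append(TrajectoryStep(action="edit",
                                code_diff="+ print('hi')",
                                feedback="tests pass"))
print(len(rec.steps))  # 1
```

Covering multiple languages, domains, and complexities (as the issue requires) then amounts to sampling records across those three fields; privacy and IP constraints would apply to the `code_diff` and `feedback` contents.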

Recommended Skills:
Proficiency in large language model fine-tuning
Python programming skills
Preferably a background in software engineering (familiarity with formal verification is a plus)

Useful links:
https://www.swebench.com/

https://fine-grained-hallucination.github.io/

https://cloud.189.cn/t/36JV7fvyIv2q (access code: evr9)

@MooreZheng (Collaborator)

If anyone has questions regarding this issue, please feel free to leave a message here. We would also appreciate it if new members could introduce themselves to the community.
