
Smart Coding benchmark suite: built on KubeEdge-Ianvs #98

Open
YangBrooksHan opened this issue May 9, 2024 · 1 comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@YangBrooksHan

What would you like to be added/modified:

  1. Build a collaborative code intelligent agent alignment dataset for LLMs:
    • The dataset should include behavioral trajectories, feedback, and iterative processes of software engineers during development, as well as relevant code versions and annotation information.
    • The dataset should cover code scenarios of different programming languages, business domains, and complexities.
    • The dataset should comply with privacy protection and intellectual property requirements, providing good accessibility and usability.
  2. Design a code intelligent agent collaborative evaluation benchmark for LLMs:
    • The evaluation benchmark should include common tasks of code intelligent agents such as code generation, recommendation, and analysis.
    • Evaluation metrics should cover multiple dimensions including functionality, reliability, interpretability, etc., matching the feedback and requirements of software engineers.
    • The evaluation benchmark should assess the performance of LLMs in collaborative code intelligent agent tasks and provide a basis for further algorithm optimization.
  3. Integrate the dataset and evaluation benchmark into the KubeEdge-Ianvs framework:
    • Incorporate the dataset and evaluation benchmark as part of the Ianvs framework, providing good scalability and integrability.
    • Ensure that the dataset and evaluation benchmark can efficiently run on edge devices within the Ianvs framework and seamlessly collaborate with other functional modules of Ianvs.
    • Release an upgraded version of the Ianvs framework and promote it to developers and researchers in the fields of edge computing and AI.
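As one concrete illustration of the "functionality" dimension named in item 2, execution-based code benchmarks (e.g. HumanEval-style suites) often report pass@k, the probability that at least one of k sampled generations passes a task's tests. This is a sketch of the standard unbiased estimator, not something prescribed by this issue; the function name and example numbers are illustrative.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generated solutions of which c
    are functionally correct, passes the task's tests.

    Computed as 1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must contain at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations per task, 3 pass the tests, budget k = 1.
print(round(pass_at_k(10, 3, 1), 3))  # 0.3
```

A benchmark built on Ianvs could average this estimate over all tasks in the suite, alongside the reliability and interpretability metrics mentioned above.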

By implementing this project, we aim to provide crucial datasets and evaluation benchmarks for the further development of LLMs in the field of code intelligent agents, to promote efficient collaboration between LLMs and software engineers in edge computing environments, and to drive innovation in and application of edge intelligence technology.

Why is this needed:

Large Language Models (LLMs) have demonstrated powerful capabilities in tasks such as code generation, automatic programming, and code analysis. However, these models are typically trained on generic code data and often fail to fully leverage the collaboration and feedback of software engineers in real-world scenarios. To build a more intelligent and efficient code ecosystem, a collaborative code dataset and evaluation benchmark are needed to enable tight collaboration between LLMs and software engineers.

This project aims to build a collaborative code intelligent agent alignment dataset and evaluation benchmark for LLMs based on the open-source edge computing framework KubeEdge-Ianvs. The dataset will include behavioral trajectories, feedback, and iterative processes of software engineers during development, as well as the relevant code versions and annotation information. From this data, we will design evaluation metrics and benchmarks that measure the performance of LLMs in tasks such as code generation, recommendation, and analysis, fostering collaboration between LLMs and software engineers.
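To make the trajectory dataset described above more tangible, here is a rough sketch of what one record (an engineer's iterative session with feedback, code versions, and annotations) might look like. This is our own hypothetical schema for illustration only; none of these field names come from Ianvs or this issue.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrajectoryStep:
    """One step of an engineer's development session (hypothetical schema)."""
    action: str       # e.g. "edit", "run_tests", "review_comment"
    code_diff: str    # code-version change applied at this step
    feedback: str     # engineer/CI feedback produced by the step

@dataclass
class AlignmentRecord:
    """One dataset record: an iterative, annotated development trajectory."""
    task_id: str
    language: str     # programming language of the code scenario
    domain: str       # business domain, e.g. "edge", "web"
    complexity: str   # e.g. "easy" / "medium" / "hard"
    steps: List[TrajectoryStep] = field(default_factory=list)
    final_code: str = ""  # code version accepted at the end

# Example record with a single step:
rec = AlignmentRecord(task_id="demo-001", language="python",
                      domain="edge", complexity="easy")
rec.steps.append(TrajectoryStep(action="edit",
                                code_diff="+ print('hi')",
                                feedback="tests pass"))
print(len(rec.steps))  # 1
```

Covering multiple languages, domains, and complexities (as the issue requires) then amounts to sampling records across those three fields; privacy and IP constraints would apply to the `code_diff` and `feedback` contents.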

Recommended Skills:
Proficiency in large language model fine-tuning
Python programming skills
Preferably a background in software engineering (familiarity with formal verification is a plus)

Useful links:
https://www.swebench.com/

https://fine-grained-hallucination.github.io/

https://cloud.189.cn/t/36JV7fvyIv2q (access code: evr9)

@MooreZheng (Collaborator)

If anyone has questions regarding this issue, please feel free to leave a message here. We would also appreciate it if new members could introduce themselves to the community.
