CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization

CodeV is a series of open-source, instruction-tuned Large Language Models (LLMs) designed for generating high-quality Verilog code, a task on which existing models struggle. (This repo is under development.)

Training and Fine-tuning

For the training environment setup and run instructions, refer to the Magicoder project.

Test

To test how well existing models generate Verilog, install the VerilogEval and RTLLM environments.

Quick Start

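from transformers import pipeline
import torch

# Replace with your Verilog design question.
prompt = "FILL IN THE QUESTION"

# "CODEV" is a placeholder: use a model ID from "Models and Datasets" or a local checkpoint path.
generator = pipeline(
    task="text-generation",
    model="CODEV",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# do_sample=False selects greedy decoding; temperature=0.0 is not a valid sampling temperature.
result = generator(prompt, max_length=2048, num_return_sequences=1, do_sample=False)
response = result[0]["generated_text"]
print("Response:", response)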
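
By default the text-generation pipeline returns the prompt followed by the completion in `generated_text`, so the response usually needs trimming before it can be handed to a simulator or benchmark. The helper below is a minimal sketch of one way to do that; `extract_verilog_module` and its regular expression are illustrative assumptions, not part of CodeV, and they only handle the first `module ... endmodule` pair.

```python
import re
from typing import Optional

# Hypothetical pattern: "module <name>" followed by "(", "#(" or ";",
# then everything up to the first "endmodule".
_MODULE_RE = re.compile(
    r"\bmodule\s+[A-Za-z_]\w*\s*(?:#\s*)?[(;].*?\bendmodule\b",
    re.DOTALL,
)


def extract_verilog_module(generated_text: str, prompt: str = "") -> Optional[str]:
    """Return the first Verilog module found in the text, or None."""
    # Drop the echoed prompt when the generated text starts with it.
    if prompt and generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    match = _MODULE_RE.search(generated_text)
    return match.group(0) if match else None
```

With the Quick Start variables, this would be called as `extract_verilog_module(response, prompt)`. Requiring a module name followed by a port list or semicolon keeps the word "module" in surrounding prose from being mistaken for code.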

Models and Datasets

Size	Base Model	CodeV
6.7B	deepseek-ai/deepseek-coder-6.7b-base	yang-z/CodeV-DS-6.7B
7B	codellama/CodeLlama-7b-Python-hf	yang-z/CodeV-CL-7B
7B	Qwen/CodeQwen1.5-7B-Chat	yang-z/CodeV-QW-7B
7B	Qwen/Qwen2.5-Coder-7B	yang-z/CodeV-QC-7B
6.7B	deepseek-ai/deepseek-coder-6.7b-base	yang-z/CodeV-All-DSC
7B	codellama/CodeLlama-7b-Python-hf	yang-z/CodeV-All-CL
7B	Qwen/CodeQwen1.5-7B-Chat	yang-z/CodeV-All-CQ
7B	Qwen/Qwen2.5-Coder-7B	yang-z/CodeV-All-QC

Dataset: yang-z/CodeV-All
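
When scripting evaluations across several checkpoints, it can help to keep the table above as a lookup. The sketch below copies the model IDs from that table; the dictionary layout, the "CodeV" / "CodeV-All" grouping (inferred from the model names), and the helper name `codev_checkpoint` are illustrative assumptions rather than anything the repository provides.

```python
# Model IDs copied from the table; the two series keys are an assumed grouping.
CODEV_MODELS = {
    "deepseek-ai/deepseek-coder-6.7b-base": {
        "CodeV": "yang-z/CodeV-DS-6.7B",
        "CodeV-All": "yang-z/CodeV-All-DSC",
    },
    "codellama/CodeLlama-7b-Python-hf": {
        "CodeV": "yang-z/CodeV-CL-7B",
        "CodeV-All": "yang-z/CodeV-All-CL",
    },
    "Qwen/CodeQwen1.5-7B-Chat": {
        "CodeV": "yang-z/CodeV-QW-7B",
        "CodeV-All": "yang-z/CodeV-All-CQ",
    },
    "Qwen/Qwen2.5-Coder-7B": {
        "CodeV": "yang-z/CodeV-QC-7B",
        "CodeV-All": "yang-z/CodeV-All-QC",
    },
}


def codev_checkpoint(base_model: str, series: str = "CodeV") -> str:
    """Return the CodeV model ID fine-tuned from the given base model."""
    try:
        return CODEV_MODELS[base_model][series]
    except KeyError:
        raise ValueError(f"no {series} checkpoint listed for {base_model}") from None
```

The returned ID can be passed as the `model` argument in the Quick Start snippet in place of the `"CODEV"` placeholder.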

💻 LLM-generated Verilog code

We have collected existing LLMs for Verilog code generation and report their performance on VerilogEval and RTLLM in the Chip Design LLM Zoo.

Paper

arXiv: https://arxiv.org/abs/2407.10424

Please cite the paper if you use the models from CodeV.

@misc{yang-z,
      title={CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization}, 
      author={Yang Zhao and Di Huang and Chongxiao Li and Pengwei Jin and Ziyuan Nan and Tianyun Ma and Lei Qi and Yansong Pan and Zhenxing Zhang and Rui Zhang and Xishan Zhang and Zidong Du and Qi Guo and Xing Hu and Yunji Chen},
      year={2024},
      eprint={2407.10424},
      archivePrefix={arXiv},
      primaryClass={cs.PL},
      url={https://arxiv.org/abs/2407.10424}, 
}

Acknowledgements

Magicoder: Training code, original datasets, and data decontamination
DeepSeek-Coder: Base model for CodeV-DeepSeek
CodeLlama: Base model for CodeV-CodeLlama
CodeQwen: Base model for CodeV-CodeQwen
