Original repository: iamfaith/TransformCode
The model used is the clone detection model they open-sourced at the link below; it is used to identify similar code snippets.
method name prediction with CodeBERT
The model file is codebert_predictor.py
Weights can be downloaded from here (GitHub LFS space is limited to 1 GB, so we use a netdisk): link: https://pan.baidu.com/s/1IMaBapXZ6_tXSdxYMbQIdg?pwd=csci extraction code: csci
Environment setup

```shell
pip install -r requirements.txt
```

Then edit the paths in code_embedder_full.py. The codebert_clone_model.bin referenced there must be downloaded from the netdisk link above, or from https://drive.google.com/drive/folders/1KhRi9evmwf-GvydsobV73f5uW3xAi89z?usp=drive_link
```python
class CodeEmbedder:
    def __init__(
        self,
        tokenizer_path: str = "/yourpath/TransformCode/custom_tokenizer/WordPiece_tokenizer.json",
        weight_path: str = "/yourpath/TransformCode/weight/codebert_clone_model.bin",
    ):
        ...
```
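The embedder turns code snippets into vectors that can be compared for clone detection. The usual comparison is cosine similarity between embedding vectors; the sketch below shows that computation with dummy stdlib-only vectors standing in for real CodeBERT output (this is an illustration, not the repo's actual API):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors:
    # dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Dummy 4-dimensional "embeddings" in place of real model output.
emb_query = [1.0, 0.0, 1.0, 0.0]
emb_candidate = [1.0, 0.0, 1.0, 0.0]
print(cosine_similarity(emb_query, emb_candidate))  # identical vectors -> 1.0
```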
Then run code_resource_pool.py directly to get reference optimization snippets:
```python
if __name__ == "__main__":
    # Initialize the resource pool
    pool = CodeResourcePool(json_path="/yourpath/TransformCode/data/api.json")
    test_code = "user_input = input()"
    # Fetch the top-3 most similar code snippets
    top3_codes = pool.get_top3_similar_codes(test_code)
    print("\n3 optimization snippets, for reference only:")
    for i, code in enumerate(top3_codes, 1):
        before = code["before"]
        after = code["after"]
        print(f"{i}. {before} -> {after}")
```
llm_test.py runs inference directly:
```python
# Initialize the resource pool
pool = CodeResourcePool(json_path="/yourpath/TransformCode/data/api.json")
# Example code to optimize
code_to_optimize = "xxx"
# Run the optimization pipeline
optimized_result = code_optimization_pipeline(code_to_optimize, pool)
if optimized_result:
    print(optimized_result)
```
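The pipeline body is not shown here; in outline, a retrieval-augmented setup like this folds the top-3 reference before/after pairs into the prompt sent to the LLM. A minimal sketch of that prompt assembly (build_prompt is a hypothetical helper, not the repo's actual function):

```python
def build_prompt(code: str, references: list) -> str:
    # Assemble an optimization prompt from the input code and the
    # retrieved {"before": ..., "after": ...} reference pairs.
    lines = ["Optimize the following code:", code, "", "Reference transformations:"]
    for i, ref in enumerate(references, 1):
        lines.append(f"{i}. {ref['before']} -> {ref['after']}")
    return "\n".join(lines)

refs = [{"before": "x = eval(s)", "after": "x = ast.literal_eval(s)"}]
print(build_prompt("user_input = input()", refs))
```

The resulting string would then be passed to whatever LLM client llm_test.py uses.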