Merged
477 changes: 477 additions & 0 deletions batch_test_prompts.py

534 changes: 407 additions & 127 deletions codedog/utils/code_evaluator.py

93 changes: 93 additions & 0 deletions custom_system_prompt.txt
@@ -0,0 +1,93 @@
# EXPERT CODE REVIEWER ROLE

You are a world-class code reviewer with expertise in multiple programming languages and frameworks. Your task is to provide detailed, actionable feedback on code changes to help developers improve their code quality and productivity.

## EVALUATION CRITERIA

Evaluate the code on these dimensions (1-10 scale):

1. **Readability** (1-10): How easy is the code to read and understand?
- Variable/function naming clarity
- Code organization
- Consistent formatting
- Appropriate comments

2. **Efficiency** (1-10): How efficiently does the code perform its tasks?
- Algorithm complexity
- Resource usage
- Performance considerations
- Potential bottlenecks

3. **Security** (1-10): How well does the code handle security concerns?
- Input validation
- Authentication/authorization
- Data protection
- Vulnerability prevention

4. **Structure** (1-10): How well is the code structured?
- Modularity
- Separation of concerns
- Design patterns
- SOLID principles

5. **Error Handling** (1-10): How robust is the error handling?
- Exception management
- Edge cases
- Graceful failure
- Informative error messages

6. **Documentation** (1-10): How well is the code documented?
- Comments quality
- Docstrings
- API documentation
- Usage examples

7. **Code Style** (1-10): How well does the code adhere to style conventions?
- Language-specific conventions
- Project style consistency
- Modern language features
- Best practices

## CODE CHANGE ANALYSIS

When analyzing code changes (especially diffs):

### Effective Changes (Count toward working hours)
- Logic modifications
- Functionality additions/removals
- Algorithm changes
- Bug fixes
- API changes
- Data structure modifications
- Performance optimizations
- Security fixes
- Error handling improvements

### Non-Effective Changes (Minimal impact on working hours)
- Whitespace adjustments
- Indentation fixes
- Comment additions without code changes
- Import reordering
- Variable/function renaming without behavior changes
- Code reformatting
- String quote style changes
- Adding/removing trailing commas
- Style changes to match linter rules

## WORKING HOURS ESTIMATION

Provide a realistic estimate of how many hours an experienced programmer (5-10+ years) would need to implement these changes:

- For purely non-effective changes: 0.1-0.5 hours depending on size
- For effective changes, consider:
* Complexity (simple, moderate, complex)
* Domain knowledge required
* Testing requirements
* Integration complexity

Include time for:
- Understanding existing code
- Designing the solution
- Implementation
- Testing and debugging
- Documentation and review
202 changes: 202 additions & 0 deletions prompt_testing_README.md
@@ -0,0 +1,202 @@
# CodeDog Prompt Testing Tools

This directory contains two tools for testing code review prompts:

1. `test_prompt.py` - a prompt testing tool for a single file or diff
2. `batch_test_prompts.py` - a tool that fetches diffs from GitLab in batches and tests prompts against them

## Environment setup

Make sure you have installed the required dependencies:

```bash
pip install python-gitlab python-dotenv langchain-openai
```

and create a `.env` file containing the required environment variables:

```
# OpenAI API configuration
OPENAI_API_KEY=your_openai_api_key

# GitLab configuration
GITLAB_URL=https://gitlab.com # or your GitLab instance URL
GITLAB_TOKEN=your_gitlab_token
```
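As a minimal sketch of how both tools can pick up this configuration (the variable names match the `.env` above; the fallback behavior is an assumption for illustration, not the tools' actual code):

```python
import os

# python-dotenv is optional here: fall back to the plain process
# environment when the package is not installed.
try:
    from dotenv import load_dotenv
    load_dotenv()  # reads .env from the current working directory
except ImportError:
    pass

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
GITLAB_URL = os.getenv("GITLAB_URL", "https://gitlab.com")  # default instance
GITLAB_TOKEN = os.getenv("GITLAB_TOKEN")
```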

## Single-file testing tool (test_prompt.py)

This tool lets you test code review prompts against a single file or diff.

### Basic usage

1. **Evaluate a file**:
```bash
python test_prompt.py --file example.py
```

2. **Evaluate a diff file**:
```bash
python test_prompt.py --diff example.diff
```

3. **Use a specific model**:
```bash
python test_prompt.py --diff example.diff --model gpt-4
```

4. **Use a custom system prompt**:
```bash
python test_prompt.py --diff example.diff --system-prompt custom_system_prompt.txt
```

5. **Output in Markdown format**:
```bash
python test_prompt.py --diff example.diff --format markdown
```

6. **Save the output to a file**:
```bash
python test_prompt.py --diff example.diff --output results.json
```

### Command-line options

```
usage: test_prompt.py [-h] (--file FILE | --diff DIFF) [--model MODEL]
[--system-prompt SYSTEM_PROMPT] [--output OUTPUT]
[--format {json,markdown}]

Test code review prompts

options:
-h, --help show this help message and exit
--file FILE Path to the file to evaluate
--diff DIFF Path to the diff file to evaluate
--model MODEL Model to use for evaluation (default: gpt-3.5-turbo)
--system-prompt SYSTEM_PROMPT
Path to a file containing a custom system prompt
--output OUTPUT Path to save the output (default: stdout)
--format {json,markdown}
Output format (default: json)
```
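The option table above maps onto a standard `argparse` setup. The sketch below reproduces the flags and defaults shown in the usage text; it is an approximation for illustration, not the actual `test_prompt.py` source:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Test code review prompts")
    # --file and --diff are mutually exclusive, and one of them is required.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file", help="Path to the file to evaluate")
    source.add_argument("--diff", help="Path to the diff file to evaluate")
    parser.add_argument("--model", default="gpt-3.5-turbo",
                        help="Model to use for evaluation")
    parser.add_argument("--system-prompt",
                        help="Path to a file containing a custom system prompt")
    parser.add_argument("--output",
                        help="Path to save the output (default: stdout)")
    parser.add_argument("--format", choices=["json", "markdown"],
                        default="json", help="Output format")
    return parser

args = build_parser().parse_args(["--diff", "example.diff", "--format", "markdown"])
```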

## Batch testing tool (batch_test_prompts.py)

This tool lets you fetch multiple diffs from GitLab and batch-test code review prompts against them.

### Basic usage

1. **Fetch MRs from GitLab and test**:
```bash
python batch_test_prompts.py --project your_group/your_project
```

2. **Specify file types**:
```bash
python batch_test_prompts.py --project your_group/your_project --include .py,.js --exclude .md,.txt
```

3. **Test multiple models**:
```bash
python batch_test_prompts.py --project your_group/your_project --models gpt-3.5-turbo,gpt-4
```

4. **Test multiple system prompts**:
```bash
python batch_test_prompts.py --project your_group/your_project --system-prompts prompt1.txt,prompt2.txt
```

5. **Customize the output directory and format**:
```bash
python batch_test_prompts.py --project your_group/your_project --output-dir my_tests --format markdown
```

### Command-line options

```
usage: batch_test_prompts.py [-h] --project PROJECT [--mr-count MR_COUNT]
[--max-files MAX_FILES] [--include INCLUDE]
[--exclude EXCLUDE]
[--state {merged,opened,closed}] [--models MODELS]
[--system-prompts SYSTEM_PROMPTS]
[--output-dir OUTPUT_DIR]
[--format {json,markdown}]

Batch test code review prompts on GitLab MRs

options:
-h, --help show this help message and exit
--project PROJECT GitLab project ID or path
--mr-count MR_COUNT Number of MRs to fetch (default: 5)
--max-files MAX_FILES
Maximum files per MR (default: 3)
--include INCLUDE Included file extensions, comma separated, e.g. .py,.js
--exclude EXCLUDE Excluded file extensions, comma separated, e.g. .md,.txt
--state {merged,opened,closed}
MR state to fetch (default: merged)
--models MODELS Models to test, comma separated (default: gpt-3.5-turbo)
--system-prompts SYSTEM_PROMPTS
Paths to system prompt files, comma separated
--output-dir OUTPUT_DIR
Output directory (default: prompt_tests)
--format {json,markdown}
Output format (default: json)
```
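A rough sketch of the fetch-and-filter step: `keep_file` is a hypothetical helper illustrating the `--include`/`--exclude` semantics, and the GitLab calls (`projects.get`, `mergerequests.list`, `mr.changes()`) follow the documented `python-gitlab` API. This is an illustration, not the tool's actual implementation:

```python
import os

def keep_file(path, include=None, exclude=None):
    """Apply the --include/--exclude extension filters to a changed file path."""
    ext = os.path.splitext(path)[1]
    if include and ext not in include:
        return False
    if exclude and ext in exclude:
        return False
    return True

def fetch_diffs(project_path, mr_count=5, state="merged"):
    """Fetch diffs from the latest MRs via python-gitlab (requires network)."""
    import gitlab  # imported here so keep_file stays testable offline
    gl = gitlab.Gitlab(os.getenv("GITLAB_URL", "https://gitlab.com"),
                       private_token=os.getenv("GITLAB_TOKEN"))
    project = gl.projects.get(project_path)
    diffs = []
    for mr in project.mergerequests.list(state=state, per_page=mr_count):
        for change in mr.changes()["changes"]:
            if keep_file(change["new_path"], include={".py", ".js"}):
                diffs.append((mr.iid, change["new_path"], change["diff"]))
    return diffs
```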

## Output

### Single-file testing

The single-file testing tool outputs a JSON or Markdown file containing the code review results.

Example JSON output:
```json
{
"readability": 8,
"efficiency": 7,
"security": 6,
"structure": 7,
"error_handling": 5,
"documentation": 9,
"code_style": 8,
"overall_score": 7.1,
"effective_code_lines": 15,
"non_effective_code_lines": 5,
"estimated_hours": 1.5,
  "comments": "Detailed analysis..."
}
```
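The `overall_score` above is consistent with an unweighted mean of the seven dimension scores; how the evaluator actually weights them is not specified here, so treat this as an assumption:

```python
scores = {
    "readability": 8, "efficiency": 7, "security": 6, "structure": 7,
    "error_handling": 5, "documentation": 9, "code_style": 8,
}

# Unweighted mean of all dimensions, rounded to one decimal place.
overall = round(sum(scores.values()) / len(scores), 1)
print(overall)  # → 7.1
```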

### Batch testing

The batch testing tool outputs a directory structure containing:

1. `diffs/` - the diff files fetched from GitLab
2. `results/` - the test results
   - one subdirectory per model
   - each subdirectory contains the evaluation result for each diff file
   - `summary.json` - a summary of all tests for that model
3. `comparison_report.md` - a comparison report, generated when multiple models or prompts were tested
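The layout described above can be sketched with `pathlib`; the paths come from the structure listed here, while the file contents are placeholders:

```python
from pathlib import Path
import json
import tempfile

# Illustrative only: recreate the output layout under a temporary directory.
root = Path(tempfile.mkdtemp()) / "prompt_tests"
(root / "diffs").mkdir(parents=True)
for model in ["gpt-3.5-turbo", "gpt-4"]:
    model_dir = root / "results" / model
    model_dir.mkdir(parents=True)
    (model_dir / "summary.json").write_text(json.dumps({"model": model}))
(root / "comparison_report.md").write_text("# Model comparison\n")
```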

## Custom system prompts

You can create custom system prompt files to test how different prompts perform. A system prompt file is a plain text file containing the system prompt you want to use.

Example: `custom_system_prompt.txt`

## Prompt optimization tips

1. **Define the role and goal**: clearly define the code reviewer's role and the goals of the review.

2. **Detailed evaluation dimensions**: provide detailed criteria for each evaluation dimension.

3. **Distinguish effective from non-effective changes**: make explicit which changes count as effective and which do not.

4. **Working hours estimation guidelines**: provide detailed guidelines for estimating working hours.

5. **Structured output format**: define the output format explicitly to ensure consistency.

6. **Language-specific considerations**: provide specific considerations for different programming languages.

With these tools you can quickly test and refine code review prompts and find the one that best fits your needs.