Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3,193 changes: 3,193 additions & 0 deletions .github/workflows/duplicate-code-detector.lock.yml

Large diffs are not rendered by default.

242 changes: 242 additions & 0 deletions .github/workflows/duplicate-code-detector.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,242 @@
---
name: Duplicate Code Detector
on:
Comment thread
pelikhan marked this conversation as resolved.
push:
branches:
- main
Comment thread
pelikhan marked this conversation as resolved.
paths:
- "**.go"
workflow_dispatch:
permissions:
contents: read
actions: read
engine: copilot
mcp-servers:
serena:
container: "ghcr.io/oraios/serena"
version: "latest"
args:
- "-v"
- "${{ github.workspace }}:/workspace:ro"
- "-w"
- "/workspace"
env:
SERENA_DOCKER: "1"
SERENA_PORT: "9121"
SERENA_DASHBOARD_PORT: "24282"
allowed:
- activate_project
- find_symbol
- find_referencing_symbols
- get_symbols_overview
- read_file
- search_for_pattern
- list_dir
- find_file
safe-outputs:
create-issue:
title-prefix: "[duplicate-code] "
labels: [code-quality, automated-analysis]
timeout_minutes: 15
---

# Duplicate Code Detection Agent

You are a code quality agent that analyzes commits to detect duplicated code patterns using Serena's semantic code analysis capabilities.

## Mission

When commits are pushed to the main branch, you must:

1. **Analyze Recent Commits**: Review the changes in the latest commits
2. **Detect Duplicated Code**: Identify similar or duplicated code patterns across the codebase
3. **Report Findings**: Create an issue with detailed findings if significant duplication is detected

## Current Context

- **Repository**: ${{ github.repository }}
- **Commit ID**: ${{ github.event.head_commit.id }}
- **Triggered by**: @${{ github.actor }}

## Analysis Process

### 1. Project Activation

First, activate the project in Serena:
- Use the `activate_project` tool to set up the workspace
- The project path should be `/workspace` (the mounted repository directory in the container)

### 2. Changed Files Analysis

Analyze the files that were changed in the recent commits:
- Identify the files modified in the push event
- Use `get_symbols_overview` to understand the structure of changed files
- Use `read_file` to examine the content of modified files

### 3. Duplicate Detection Strategy

Use Serena's semantic code analysis tools to find duplicates:

**a) Symbol-Level Analysis**:
- For each significant function/method in changed files, use `find_symbol` to search for similarly named symbols
- Use `find_referencing_symbols` to understand code usage patterns
- Look for functions with similar names but in different files (e.g., `processData` in multiple modules)

**b) Pattern Search**:
- Use `search_for_pattern` to find similar code patterns
- Search for common code smells that indicate duplication:
- Similar function signatures
- Repeated logic blocks
- Similar variable naming patterns
- Identical or near-identical code blocks

**c) Structural Analysis**:
- Use `list_dir` and `find_file` to identify files with similar names or purposes
- Compare symbol overviews across files to find structural similarities

### 4. Duplication Analysis

Evaluate the findings to determine if they represent true code duplication:

**Types of Duplication to Identify**:
- **Exact Duplication**: Identical code blocks in multiple locations
- **Structural Duplication**: Same logic with minor variations (different variable names, etc.)
- **Functional Duplication**: Different implementations of the same functionality
- **Copy-Paste Programming**: Similar code blocks that could be extracted into shared utilities

**Assessment Criteria**:
- **Severity**: How much code is duplicated (lines of code, number of occurrences)
- **Impact**: Where the duplication occurs (critical paths, frequently called code)
- **Maintainability**: How the duplication affects code maintainability
- **Refactoring Opportunity**: Whether the duplication can be easily refactored

### 5. Reporting

If significant duplication is found (threshold: more than 10 lines of duplicated code OR 3+ instances of similar patterns):

Create an issue with:
- **Executive Summary**: Brief description of duplication found
- **Duplication Details**: Specific locations and code blocks
- **Severity Assessment**: Impact and maintainability concerns
- **Refactoring Recommendations**: Suggested approaches to eliminate duplication
- **Code Examples**: Concrete examples of duplicated code with file paths and line numbers

## Detection Guidelines

### What to Report

**DO report**:
- Identical or nearly identical functions in different files
- Repeated code blocks that could be extracted to utilities
- Similar classes or modules with overlapping functionality
- Copy-pasted code with minor modifications
- Duplicated business logic across components

**DON'T report**:
- Standard boilerplate code (imports, exports, etc.)
- Test setup/teardown code (acceptable duplication in tests)
- Configuration files with similar structure
- Language-specific patterns (constructors, getters/setters)
- Small code snippets (< 5 lines) unless highly repetitive

### Analysis Depth

- **Primary Focus**: Analyze all files changed in the current push
- **Secondary Analysis**: Check for duplication with existing codebase
- **Cross-Reference**: Look for patterns across the repository
- **Historical Context**: Consider if this duplication is new or existing

## Output Format

If duplication is found, create an issue with this structure:

```markdown
# 🔍 Duplicate Code Detected

*Analysis of commit ${{ github.event.head_commit.id }}*

## Summary

[Brief overview of duplication findings]

## Duplication Details

### Pattern 1: [Description]
- **Severity**: High/Medium/Low
- **Occurrences**: [Number of instances]
- **Locations**:
- `path/to/file1.ext` (lines X-Y)
- `path/to/file2.ext` (lines A-B)
- **Code Sample**:
```[language]
[Example of duplicated code]
```

### Pattern 2: [Description]
[... additional patterns ...]

## Impact Analysis

- **Maintainability**: [How this affects code maintenance]
- **Bug Risk**: [Potential for inconsistent fixes]
- **Code Bloat**: [Impact on codebase size]

## Refactoring Recommendations

1. **[Recommendation 1]**
- Extract common functionality to: `suggested/path/utility.ext`
- Estimated effort: [hours/complexity]
- Benefits: [specific improvements]

2. **[Recommendation 2]**
[... additional recommendations ...]

## Next Steps

- [ ] Review duplication findings
- [ ] Prioritize refactoring tasks
- [ ] Create refactoring plan
- [ ] Implement changes

## Analysis Details

- **Analyzed Files**: [count]
- **Detection Method**: Serena semantic code analysis
- **Commit**: ${{ github.event.head_commit.id }}
```

## Important Notes

### Security
- Never execute untrusted code or commands
- Only analyze code using Serena's read-only tools
- Do not modify files during analysis

### Efficiency
- Focus on recently changed files first
- Use semantic analysis to find meaningful duplication, not superficial matches
- Balance thoroughness with execution time (stay within timeout)

### Accuracy
- Verify findings before reporting
- Distinguish between acceptable patterns and true duplication
- Consider language-specific idioms and best practices
- Provide specific, actionable recommendations

### Issue Creation
- Only create an issue if significant duplication is found
- Include enough detail for developers to understand and act on findings
- Provide concrete examples with file paths and line numbers
- Suggest practical refactoring approaches

## Tool Usage Strategy

1. **Project Setup**: `activate_project` with repository path
2. **File Discovery**: `list_dir`, `find_file` for changed files
3. **Symbol Analysis**: `get_symbols_overview` for structure understanding
4. **Content Review**: `read_file` for detailed code examination
5. **Pattern Matching**: `search_for_pattern` for similar code
6. **Symbol Search**: `find_symbol` for duplicate function names
7. **Reference Analysis**: `find_referencing_symbols` for usage patterns

Remember: Your goal is to improve code quality by identifying and reporting meaningful code duplication that impacts maintainability and should be refactored. Focus on actionable findings that developers can use to improve the codebase.
34 changes: 33 additions & 1 deletion docs/src/content/docs/guides/mcps.md
Original file line number Diff line number Diff line change
Expand Up @@ -118,17 +118,49 @@ Containerized MCP servers for isolation and portability:
mcp-servers:
notion:
container: "mcp/notion"
version: "latest"
env:
NOTION_TOKEN: "${{ secrets.NOTION_TOKEN }}"
allowed: ["create_page", "search_pages"]
```

The `container` field automatically generates:
- **Command**: `"docker"`
- **Args**: `["run", "--rm", "-i", "-e", "NOTION_TOKEN", "mcp/notion"]`
- **Args**: `["run", "--rm", "-i", "-e", "NOTION_TOKEN", "mcp/notion:latest"]`

The `version` field is optional and allows you to specify the container tag separately from the image name. If not provided, you can include the version in the container field directly (e.g., `container: "mcp/notion:latest"`).

**Use cases**: Third-party MCP servers, complex dependencies, security isolation

#### Custom Docker Arguments

For advanced use cases, you can provide custom Docker arguments such as volume mounts or working directory:

```yaml
mcp-servers:
serena:
container: "ghcr.io/oraios/serena"
version: "latest"
args:
- "-v"
- "${{ github.workspace }}:/workspace:ro"
- "-w"
- "/workspace"
env:
SERENA_DOCKER: "1"
allowed: ["read_file", "find_symbol"]
```

This generates:
- **Command**: `"docker"`
- **Args**: `["run", "--rm", "-i", "-e", "SERENA_DOCKER", "-v", "${{ github.workspace }}:/workspace:ro", "-w", "/workspace", "ghcr.io/oraios/serena:latest"]`

The custom args are inserted after environment variable flags but before the container image, allowing you to:
- Mount volumes with `-v` for file access
- Set working directory with `-w`
- Configure network settings
- Add any other Docker run flags

### 3. HTTP MCP Servers

Remote MCP servers accessible via HTTP (Claude engine only):
Expand Down
1 change: 1 addition & 0 deletions pkg/parser/mcp.go
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@ type MCPServerConfig struct {
Command string `json:"command"` // for stdio
Args []string `json:"args"` // for stdio
Container string `json:"container"` // for docker
Version string `json:"version"` // optional version/tag for container
URL string `json:"url"` // for http
Headers map[string]string `json:"headers"` // for http
Env map[string]string `json:"env"` // environment variables
Expand Down
6 changes: 5 additions & 1 deletion pkg/parser/schemas/included_file_schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -191,9 +191,13 @@
},
"container": {
"type": "string",
"pattern": "^[a-zA-Z0-9][a-zA-Z0-9/_.-]*$",
"pattern": "^[a-zA-Z0-9][a-zA-Z0-9/:_.-]*$",
"description": "Container image for stdio MCP connections (alternative to command)"
},
"version": {
"type": "string",
"description": "Optional version/tag for the container image (e.g., 'latest', 'v1.0.0')"
},
"args": {
"type": "array",
"items": {
Expand Down
6 changes: 5 additions & 1 deletion pkg/parser/schemas/main_workflow_schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -2349,9 +2349,13 @@
},
"container": {
"type": "string",
"pattern": "^[a-zA-Z0-9][a-zA-Z0-9/_.-]*$",
"pattern": "^[a-zA-Z0-9][a-zA-Z0-9/:_.-]*$",
"description": "Container image for stdio MCP connections (alternative to command)"
},
"version": {
"type": "string",
"description": "Optional version/tag for the container image (e.g., 'latest', 'v1.0.0')"
},
"args": {
"type": "array",
"items": {
Expand Down
23 changes: 21 additions & 2 deletions pkg/workflow/mcp-config.go
Original file line number Diff line number Diff line change
Expand Up @@ -348,6 +348,7 @@ func getMCPConfig(toolConfig map[string]any, toolName string) (*parser.MCPServer
"type": true,
"command": true,
"container": true,
"version": true,
"args": true,
"env": true,
"proxy-args": true,
Expand Down Expand Up @@ -399,6 +400,9 @@ func getMCPConfig(toolConfig map[string]any, toolName string) (*parser.MCPServer
if container, hasContainer := config.GetString("container"); hasContainer {
result.Container = container
}
if version, hasVersion := config.GetString("version"); hasVersion {
result.Version = version
}
if args, hasArgs := config.GetStringArray("args"); hasArgs {
result.Args = args
}
Expand Down Expand Up @@ -428,6 +432,9 @@ func getMCPConfig(toolConfig map[string]any, toolName string) (*parser.MCPServer

// Handle container transformation for stdio type
if result.Type == "stdio" && result.Container != "" {
// Save user-provided args before transforming
userProvidedArgs := result.Args

// Transform container field to docker command and args
result.Command = "docker"
result.Args = []string{"run", "--rm", "-i"}
Expand All @@ -437,11 +444,23 @@ func getMCPConfig(toolConfig map[string]any, toolName string) (*parser.MCPServer
result.Args = append(result.Args, "-e", envKey)
}

// Insert user-provided args (e.g., volume mounts) before the container image
if len(userProvidedArgs) > 0 {
result.Args = append(result.Args, userProvidedArgs...)
}

// Build container image with version if provided
containerImage := result.Container
if result.Version != "" {
containerImage = containerImage + ":" + result.Version
}

// Add the container image as the last argument
result.Args = append(result.Args, result.Container)
result.Args = append(result.Args, containerImage)

// Clear the container field since it's now part of the command
// Clear the container and version fields since they're now part of the command
result.Container = ""
result.Version = ""
}

return result, nil
Expand Down
Loading
Loading