Merged
25 commits
a713ef4
chore(deps): update project deps
orenlab Feb 2, 2026
b2bc5ac
fix(core): Fixed the alignment and layout of report elements
orenlab Feb 2, 2026
2313138
feat(core): Bump version to 1.2.1
orenlab Feb 2, 2026
afad36a
fix(audit): Security & Robustness fixes
orenlab Feb 2, 2026
e8f2915
fix(audit): Security & Robustness fixes
orenlab Feb 2, 2026
55d6728
chore(tests): Add/expand unit + integration tests
orenlab Feb 2, 2026
d7c4b61
feat: modernize UI
orenlab Feb 3, 2026
c589d5e
feat: modernize UI
orenlab Feb 3, 2026
795cfc6
chore(tests): Add/expand unit + integration tests
orenlab Feb 3, 2026
a35f983
fix(linters): Correcting errors found by the ruff and mypy linters
orenlab Feb 3, 2026
a9e6368
feat(ci): added GitHub Actions for linting and tests run
orenlab Feb 3, 2026
c1df58e
fix(ci): stabilize CFG tests across Python versions (normalize ast.dump)
orenlab Feb 3, 2026
41023b7
chore(docs): update README.md and CHANGELOG.md
orenlab Feb 3, 2026
a6b76ac
chore(baseline): added a baseline for the codeclone package (we have …
orenlab Feb 3, 2026
5fee04a
chore(baseline): added a baseline for the codeclone package (we have …
orenlab Feb 3, 2026
e71e688
chore(baseline): added a baseline for the codeclone package (we have …
orenlab Feb 3, 2026
5dbb637
chore(baseline): added a baseline for the codeclone package (we have …
orenlab Feb 3, 2026
8a993e2
chore(baseline): added a baseline for the codeclone package (we have …
orenlab Feb 3, 2026
907c23e
fix(baseline): I have enhanced the normalization of AST‑dump by remov…
orenlab Feb 3, 2026
eb8a023
fix(baseline): I have enhanced the normalization of AST‑dump by remov…
orenlab Feb 3, 2026
91f94ec
fix(baseline): run baseline check only on Python 3.13 to avoid cross-…
orenlab Feb 3, 2026
54dc41a
fix(baseline): due to the specifics of the AST implementation in Pyth…
orenlab Feb 3, 2026
2b51daa
fix(baseline): due to the specifics of the AST implementation in Pyth…
orenlab Feb 3, 2026
19bf4f2
fix(baseline): due to the specifics of the AST implementation in Pyth…
orenlab Feb 3, 2026
d8139ad
chore(docs): update docs
orenlab Feb 3, 2026
48 changes: 44 additions & 4 deletions .github/actions/codeclone/action.yml
@@ -10,6 +10,16 @@ branding:
   color: blue

 inputs:
+  python-version:
+    description: "Python version to use"
+    required: false
+    default: "3.13"
+
+  package-version:
+    description: "CodeClone version from PyPI (empty = latest)"
+    required: false
+    default: ""
+
   path:
     description: "Path to the project root"
     required: false
@@ -20,20 +30,50 @@ inputs:
     required: false
     default: "true"

+  no-progress:
+    description: "Disable progress output"
+    required: false
+    default: "true"
+
+  require-baseline:
+    description: "Fail if codeclone.baseline.json is missing"
+    required: false
+    default: "true"
+
 runs:
   using: composite
   steps:
+    - name: Set up Python
+      uses: actions/setup-python@v5
+      with:
+        python-version: ${{ inputs.python-version }}
+        cache: pip
+
     - name: Install CodeClone
       shell: bash
       run: |
         python -m pip install --upgrade pip
-        pip install codeclone
+        if [ -n "${{ inputs.package-version }}" ]; then
+          pip install "codeclone==${{ inputs.package-version }}"
+        else
+          pip install codeclone
+        fi

+    - name: Verify baseline exists
+      if: ${{ inputs.require-baseline == 'true' }}
+      shell: bash
+      run: |
+        test -f "${{ inputs.path }}/codeclone.baseline.json"
+
     - name: Run CodeClone
       shell: bash
       run: |
+        extra=""
+        if [ "${{ inputs.no-progress }}" = "true" ]; then
+          extra="--no-progress"
+        fi
         if [ "${{ inputs.fail-on-new }}" = "true" ]; then
-          codeclone "${{ inputs.path }}" --fail-on-new
+          codeclone "${{ inputs.path }}" --fail-on-new $extra
         else
-          codeclone "${{ inputs.path }}"
-        fi
+          codeclone "${{ inputs.path }}" $extra
+        fi
74 changes: 74 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,74 @@
+name: tests
+
+on:
+  push:
+    branches: [ "**" ]
+  pull_request:
+
+permissions:
+  contents: read
+
+concurrency:
+  group: tests-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: [ "3.10", "3.11", "3.12", "3.13", "3.14" ]
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6.0.2
+
+      - name: Set up Python
+        uses: actions/setup-python@v6.2.0
+        with:
+          python-version: ${{ matrix.python-version }}
+          allow-prereleases: true
+
+      - name: Set up uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+
+      - name: Install dependencies
+        run: uv sync --all-extras --dev
+
+      - name: Run tests
+        run: uv run pytest --cov=codeclone --cov-report=term-missing --cov-fail-under=98
+
+      - name: Verify baseline exists
+        if: ${{ matrix.python-version == '3.13' }}
+        run: test -f codeclone.baseline.json
+
+      - name: Check for new clones vs baseline
+        if: ${{ matrix.python-version == '3.13' }}
+        run: uv run codeclone . --fail-on-new --no-progress
+
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6.0.2
+
+      - name: Set up Python
+        uses: actions/setup-python@v6.2.0
+        with:
+          python-version: "3.13"
+
+      - name: Set up uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+
+      - name: Install dependencies
+        run: uv sync --all-extras --dev
+
+      - name: Ruff
+        run: uv run ruff check .
+
+      - name: Mypy
+        run: uv run mypy .
191 changes: 153 additions & 38 deletions CHANGELOG.md
@@ -1,71 +1,186 @@
# Changelog

## [1.2.1] - 2026-02-02

### Overview

This release focuses on security hardening, robustness, and long-term maintainability.
No breaking API changes were introduced.

The goal of this release is to provide users with a safe, deterministic, and CI-friendly
tool suitable for security-sensitive and large-scale environments.

### Security & Robustness

- **Path Traversal Protection**
Implemented strict path validation to prevent scanning outside the project root or
accessing sensitive system directories, including macOS `/private` paths.

- **Cache Integrity Protection**
Added HMAC-SHA256 signing for cache files to prevent cache poisoning and detect tampering.

- **Parser Safety Limits**
Introduced AST parsing time limits to mitigate risks from pathological or adversarial inputs.

- **Resource Exhaustion Protection**
Enforced a maximum file size limit (10MB) and a maximum file count per scan to prevent
excessive memory or CPU usage.

- **Structured Error Handling**
Introduced a dedicated exception hierarchy (`ParseError`, `CacheError`, etc.) and replaced
broad exception handling with graceful, user-friendly failure reporting.
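
To illustrate the "Cache Integrity Protection" item above, HMAC-SHA256 signing of a JSON cache payload could look roughly like the sketch below. This is not taken from the CodeClone source: the function names `sign_cache` and `verify_cache`, the envelope layout (`sig` and `body` keys), and the way the key is supplied are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import hmac
import json


def sign_cache(payload: dict, key: bytes) -> str:
    """Serialize payload and wrap it with an HMAC-SHA256 signature (hypothetical layout)."""
    # Canonical serialization so the same data always produces the same bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    sig = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({"sig": sig, "body": body})


def verify_cache(raw: str, key: bytes) -> dict | None:
    """Return the payload if the signature matches, otherwise None."""
    try:
        envelope = json.loads(raw)
        body = envelope["body"]
        sig = envelope["sig"]
    except (ValueError, KeyError, TypeError):
        return None  # malformed cache file
    if not isinstance(body, str) or not isinstance(sig, str):
        return None
    expected = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes are used so non-ASCII input cannot raise.
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return None  # tampered or signed with a different key
    return json.loads(body)
```

A caller would treat a `None` result as a cache miss and rebuild the cache, which matches the "warn instead of silently ignoring" behavior described later in this changelog entry.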

### Performance Improvements

- **Optimized AST Normalization**
Replaced expensive `deepcopy` operations with in-place AST normalization, significantly
reducing CPU and memory overhead.

- **Improved Memory Efficiency**
Added an LRU cache for file reading and optimized string concatenation during fingerprint
generation.

- **HTML Report Memory Bounds**
HTML reports now read only the required line ranges instead of entire files, reducing peak
memory usage on large codebases.
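
The two memory-related items above (an LRU cache for file reading, and reading only the required line ranges) can be combined in a small sketch. The helper name `read_line_range`, its signature, and the cache size are hypothetical and chosen only for illustration; the actual CodeClone code may be organized quite differently.

```python
from __future__ import annotations

from functools import lru_cache
from itertools import islice


@lru_cache(maxsize=128)  # cache size is an arbitrary choice for this sketch
def read_line_range(path: str, start: int, end: int) -> tuple[str, ...]:
    """Return lines start..end (1-based, inclusive) of a text file.

    The file is streamed line by line, so only the requested lines are
    kept in memory, not the whole file.
    """
    if start < 1 or end < start:
        return ()
    with open(path, encoding="utf-8", errors="replace") as fh:
        # islice skips the first start-1 lines and stops after line `end`.
        return tuple(line.rstrip("\n") for line in islice(fh, start - 1, end))
```

Returning a tuple keeps the cached value immutable, so repeated lookups for the same snippet cannot be corrupted by a caller that modifies the result.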

### Architecture & Maintainability

- **Strict Type Safety**
Migrated all optional typing to Python 3.10+ `| None` syntax and achieved 100% `mypy` strict
compliance.

- **Modular CFG Design**
Split CFG data structures and builder logic into separate modules (`cfg_model.py` and
`cfg.py`) for improved clarity and extensibility.

- **Template Extraction**
Extracted HTML templates into a dedicated `templates.py` module.

- Added a `py.typed` marker for downstream type checkers.
- Added `__slots__` to performance-critical classes to reduce per-object memory overhead.

### CLI & User Experience

- Added a sequential execution fallback when process pools are unavailable (for example, in
restricted or sandboxed environments).
- Emit clear, user-visible warnings when cache validation fails instead of silently ignoring
corrupted state.
- Hardened HTML report template to safely embed JavaScript template literals and aligned it
with linting requirements.

### Testing & Quality

- Expanded unit and integration test coverage across the CLI, CFG construction, cache
handling, scanner, and HTML reporting paths.
- Added security regression tests for dot-dot traversal and symlinked sensitive directories.
- Tightened cache mismatch assertions to verify full state reset.
- Achieved and enforced 98%+ line coverage, with coverage configuration added to
`pyproject.toml`.
- Added GitHub Actions workflow with Python 3.10–3.14 test matrix, including `ruff` and
`mypy` checks.
- CI baseline enforcement now runs on a single pinned Python version to avoid AST dump
differences across interpreter versions.
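
The dot-dot and symlink regression tests mentioned above exercise a containment check along the following lines. This is a minimal sketch under assumptions: `is_within_root` is a hypothetical helper, not a CodeClone function, and the real validation also covers sensitive system directories, which this example does not attempt.

```python
from __future__ import annotations

from pathlib import Path


def is_within_root(candidate: str | Path, root: str | Path) -> bool:
    """Return True if candidate, resolved against root, stays inside root."""
    root_resolved = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks, so a link that
    # points outside the project is judged by where it actually leads.
    target = (root_resolved / candidate).resolve()
    return target == root_resolved or root_resolved in target.parents
```

Resolving both sides before comparing matters on macOS, where temporary and system paths are often symlinks into `/private`.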

### Python Version Consistency for Baseline Checks

Due to inherent differences in Python’s AST between interpreter versions, baseline
generation and verification must be performed using the same Python version.

The baseline file now stores the Python version (`major.minor`) used during generation.
When running with `--fail-on-new`, codeclone verifies that the current interpreter version
matches the baseline and exits with code 2 if they differ.

This design ensures deterministic and reproducible clone detection results while preserving
support for Python 3.10–3.14 across the test matrix.
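
The version check described above might be sketched as follows. The exit code 2 and the `major.minor` format come from the changelog text; the field name `python_version`, the function names, and the handling of a baseline that lacks the field are assumptions.

```python
from __future__ import annotations

import sys

EXIT_VERSION_MISMATCH = 2  # exit code stated in the changelog


def current_python_tag(version_info=sys.version_info) -> str:
    """Format an interpreter version as 'major.minor'."""
    return f"{version_info[0]}.{version_info[1]}"


def baseline_exit_code(baseline: dict, running: str | None = None) -> int:
    """Return 0 when the baseline's Python version matches, 2 otherwise."""
    running = running or current_python_tag()
    # "python_version" is a hypothetical key; a baseline without it is
    # treated as a mismatch in this sketch.
    recorded = baseline.get("python_version")
    return 0 if recorded == running else EXIT_VERSION_MISMATCH
```

In CI this is why the workflow runs the `--fail-on-new` step only on the 3.13 matrix entry: the committed baseline would fail the check on every other interpreter.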

### Fixed

- **CFG Exception Handling**
Fixed incorrect control-flow linking for `try`/`except` blocks.

- **Pattern Matching Support**
Added missing structural handling for `match`/`case` statements in the CFG.

- **Block Detection Scaling**
Made `MIN_LINE_DISTANCE` dynamic based on block size to improve clone detection accuracy
across differently sized functions.

---

## [1.2.0] - 2026-02-02

### BREAKING CHANGES

- - **CLI Arguments**: Renamed output flags for brevity and consistency:
+ - **CLI Arguments**
+   Renamed output flags for brevity and consistency:
- `--json-out` → `--json`
- `--text-out` → `--text`
- `--html-out` → `--html`
- `--cache` → `--cache-dir`
- - **Baseline Behavior**:
-   - The default baseline file location has changed from `~/.config/codeclone/baseline.json` to
-     `./codeclone.baseline.json`. This encourages committing the baseline file to the repository, simplifying CI/CD
-     integration.
-   - The CLI now warns if a baseline file is expected but missing (unless `--update-baseline` is used).

+ - **Baseline Behavior**
+   - The default baseline file location changed from
+     `~/.config/codeclone/baseline.json` to `./codeclone.baseline.json`.
+   - The CLI now warns if a baseline file is expected but missing (unless
+     `--update-baseline` is used).

### Added

- - **Detection Engine**:
-   - **Deep CFG Analysis**: Added support for constructing control flow graphs for `try`/`except`/`finally`, `with`/
-     `async with`, and `match`/`case` (Python 3.10+) statements. The tool now analyzes the internal structure of these
-     blocks instead of treating them as opaque statements.
-   - **Normalization**: Implemented normalization for Augmented Assignments. Code using `x += 1` is now detected as a
-     clone of `x = x + 1`.
- - **Rich Output**: Integrated `rich` library for professional CLI output, including:
-   - Color-coded status messages (Success/Warning/Error).
-   - Progress bars and spinners for long-running tasks.
+ - **Detection Engine**
+   - Deep CFG analysis for `try`/`except`/`finally`, `with`/`async with`, and
+     `match`/`case` (Python 3.10+) statements.
+   - Normalization for augmented assignments (`x += 1` vs `x = x + 1`).

+ - **Rich Output**
+   - Color-coded status messages.
+   - Progress indicators for long-running tasks.
    - Formatted summary tables.
- - **CI/CD Improvements**: Clearer separation of arguments in `--help` output (Target, Tuning, Baseline, Reporting).

+ - **CI/CD Improvements**
+   - Clearer argument grouping in `--help` output.
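
The augmented-assignment normalization listed under "Detection Engine" can be illustrated with a small `ast.NodeTransformer`. This is a sketch of the general technique, not CodeClone's normalizer: the class name `AugAssignNormalizer` and the helper `normalized_dump` are made up for the example. Note that `x += 1` and `x = x + 1` are not strictly equivalent at runtime for mutable objects; the rewrite is only meant to make the two forms compare equal for clone detection.

```python
import ast
import copy


class AugAssignNormalizer(ast.NodeTransformer):
    """Rewrite `target op= value` as `target = target op value`."""

    def visit_AugAssign(self, node: ast.AugAssign) -> ast.AST:
        self.generic_visit(node)
        # The target appears twice after the rewrite: once as the assignment
        # target (Store context) and once as the left operand (Load context).
        load_target = copy.deepcopy(node.target)
        load_target.ctx = ast.Load()
        new_node = ast.Assign(
            targets=[node.target],
            value=ast.BinOp(left=load_target, op=node.op, right=node.value),
        )
        return ast.copy_location(new_node, node)


def normalized_dump(source: str) -> str:
    """Parse source, normalize augmented assignments, and dump the AST."""
    tree = AugAssignNormalizer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.dump(tree)
```

With this in place, `normalized_dump("x += 1")` and `normalized_dump("x = x + 1")` produce the same string, so a fingerprint computed from the dump treats the two snippets as clones.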

### Improved

- - **Baseline**: Enhanced `Baseline` class with safer JSON loading (error handling for corrupted files), better typing (
-   using `set` instead of `Set`), and cleaner API for creating instances (`from_groups` accepts path).
- - **Cache**: Refactored `Cache` to handle corrupted cache files gracefully by starting fresh instead of crashing.
-   Updated typing to modern standards.
- - **Normalization**: Added `copy.deepcopy` to AST normalization to prevent side effects on the original AST nodes during
-   fingerprinting. This ensures the AST remains intact for any subsequent operations.
- - **Typing**: General typing improvements across `report.py` and other modules to align with Python 3.10+ practices.
+ - **Baseline**
+   - Safer JSON loading.
+   - Improved typing and cleaner construction API.

+ - **Cache**
+   - Graceful recovery from corrupted cache files.
+   - Updated typing to modern Python standards.

+ - **Typing**
+   - General typing improvements across reporting and normalization modules.

---

- ## [1.1.0] 2026-01-19
+ ## [1.1.0] - 2026-01-19

### Added

- - Control Flow Graph (CFG v1) for structural clone detection
- - Deterministic CFG-based function fingerprints
- - Interactive HTML report with syntax highlighting
- - Dark/light theme toggle in HTML report
- - Block-level clone visualization
+ - Control Flow Graph (CFG v1) for structural clone detection.
+ - Deterministic CFG-based function fingerprints.
+ - Interactive HTML report with syntax highlighting.
+ - Block-level clone visualization.

### Changed

- - Function clone detection now based on CFG instead of pure AST
- - Improved robustness against refactoring and control-flow changes
+ - Function clone detection now based on CFG instead of pure AST.
+ - Improved robustness against refactoring and control-flow changes.

### Documentation

- - Added `docs/cfg.md` with CFG semantics and limitations
- - Added `docs/architecture.md` describing system design
+ - Added `docs/cfg.md` with CFG semantics and limitations.
+ - Added `docs/architecture.md` describing system design.

---

- ## [1.0.0] 2026-01-17
+ ## [1.0.0] - 2026-01-17

### Initial release

- - AST-based function clone detection
- - Block-level clone detection (Type-3-lite)
- - Baseline workflow for CI
- - JSON and text reports
+ - AST-based function clone detection.
+ - Block-level clone detection (Type-3-lite).
+ - Baseline workflow for CI.
+ - JSON and text reports.