feat: Honor .gitignore and .cursorignore during AI file sampling by pl018 · Pull Request #10 · pl018/project-manager-cli

pl018 · 2025-12-18T10:53:32Z

Add proper gitignore support to prevent template/scaffolding files (like BMAD method files) from being included in AI analysis. This fixes incorrect project categorization caused by including files that should be ignored.

Changes:

- Add pathspec dependency for gitignore pattern matching
- Implement _load_ignore_patterns() method to parse .gitignore and .cursorignore
- Filter all sampled files against ignore patterns before AI analysis
- Fix missing imports (glob, re) in project_service.py

Resolves issue where projects were incorrectly tagged based on template files that should have been ignored.

Summary by CodeRabbit

New Features
- File sampling now respects .gitignore and .cursorignore patterns, automatically filtering out ignored files during project analysis and README discovery.



coderabbitai · 2025-12-18T10:53:41Z

Walkthrough

Added pathspec dependency to support ignore pattern loading. Implemented _load_ignore_patterns method in ProjectContext to read .gitignore and .cursorignore files. Integrated ignore checks into file sampling to filter out ignored files during directory traversal.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Dependency Addition: `pyproject.toml` | Added runtime dependency `pathspec>=0.11.0` to project dependencies |
| Ignore Pattern Support: `src/project_manager_cli/services/project_service.py` | Added `_load_ignore_patterns` method to read .gitignore and .cursorignore files and build pathspec; integrated ignore filtering into file sampling logic to skip ignored files during traversal; added imports (glob, re, List, pathspec) |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

- Verify pathspec is correctly built from both .gitignore and .cursorignore files
- Check defensive handling in path resolution and exception handling
- Confirm ignore filtering is applied consistently across all file sampling paths

Poem

🐰 With pathspec in paw, this rabbit did weave,
Ignore patterns fine—files to retrieve!
.gitignore and .cursor align in the fold,
Filtering treasures, both new and bold! 📋✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding support for .gitignore and .cursorignore patterns during file sampling. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

src/project_manager_cli/services/project_service.py (2)
42-71: Implementation looks correct.

The method properly loads patterns from both ignore files and constructs a PathSpec using the correct 'gitwildmatch' pattern style for gitignore syntax. The defensive exception handling ensures the tool continues to function even if ignore files are unreadable.

Note: Static analysis flags broad exception catching (lines 56, 67), but this is acceptable here for robustness since we want to gracefully degrade if ignore files can't be read.
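
The loading step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name `read_ignore_lines` and the module-level logger are hypothetical, and it catches narrower exception types than the PR does, which is one way to address the BLE001 findings while keeping the graceful fallback.

```python
import logging
from pathlib import Path
from typing import List

logger = logging.getLogger(__name__)

# File names come from the PR description; everything else here is illustrative.
IGNORE_FILES = (".gitignore", ".cursorignore")


def read_ignore_lines(base_dir: Path) -> List[str]:
    """Collect raw pattern lines from every ignore file found in base_dir."""
    lines: List[str] = []
    for name in IGNORE_FILES:
        ignore_file = base_dir / name
        if not ignore_file.is_file():
            continue  # a missing ignore file is not an error
        try:
            lines.extend(ignore_file.read_text(encoding="utf-8").splitlines())
        except (OSError, UnicodeDecodeError) as exc:
            # Degrade gracefully: warn and carry on without this file.
            logger.warning("Could not read %s: %s", ignore_file, exc)
    return lines
```

In the PR, lines collected this way are handed to `pathspec.PathSpec.from_lines('gitwildmatch', ...)` to build the matcher.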

119-125: Consider adding logging to the exception handler.

The ignore pattern check works correctly, but the broad exception handler silently skips any path resolution issues. While relative_to should normally succeed during os.walk, adding a debug log would help diagnose unexpected failures.
🔎 Suggested improvement:
                     # Check if file is ignored by .gitignore or .cursorignore
                     try:
                         relative_path = str(p.relative_to(base_dir))
                         if ignore_spec.match_file(relative_path):
                             continue
-                    except Exception:
+                    except Exception as e:
+                        self.logger.debug(f"Could not check ignore status for {p}: {e}")
                         continue

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79aaf02 and 2396f7a.

📒 Files selected for processing (2)

pyproject.toml (1 hunks)
src/project_manager_cli/services/project_service.py (5 hunks)

🧰 Additional context used

🪛 Ruff (0.14.8)

src/project_manager_cli/services/project_service.py

- 56-56: Do not catch blind exception: Exception (BLE001)
- 57-57: Use explicit conversion flag; replace with conversion flag (RUF010)
- 67-67: Do not catch blind exception: Exception (BLE001)
- 68-68: Use explicit conversion flag; replace with conversion flag (RUF010)
- 124-125: try-except-continue detected, consider logging the exception (S112)
- 124-124: Do not catch blind exception: Exception (BLE001)

🔇 Additional comments (4)

src/project_manager_cli/services/project_service.py (3)

3-11: LGTM! Missing imports fixed.

The addition of glob and re imports fixes previously missing imports (used at lines 228, 232-233), and the new List and pathspec imports support the ignore pattern functionality.

81-82: LGTM!

Loading ignore patterns once at the beginning of file sampling is the correct approach for efficiency and consistency.

97-100: LGTM!

The README filtering correctly respects ignore patterns. Since the path is constructed from base_dir / name, the relative_to call will always succeed, so no additional exception handling is needed here.
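
A rough sketch of the README check described here, under stated assumptions: `first_unignored_readme` and its parameters are hypothetical names, and `is_ignored` stands in for the `ignore_spec.match_file` call used in the PR.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def first_unignored_readme(
    base_dir: Path,
    names: Iterable[str],
    is_ignored: Callable[[str], bool],
) -> Optional[Path]:
    """Return the first existing README candidate that is not ignored."""
    for name in names:
        candidate = base_dir / name
        if not candidate.is_file():
            continue
        # candidate was built from base_dir, so relative_to cannot raise here.
        relative = candidate.relative_to(base_dir).as_posix()
        if is_ignored(relative):
            continue
        return candidate
    return None
```

Passing the matcher in as a plain callable keeps the function easy to exercise without real ignore files.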

pyproject.toml (1)

26-26: No security concerns with pathspec version 0.11.0.

No direct vulnerabilities have been found for pathspec in Snyk's vulnerability database, and the latest available version is 0.12.1. The >=0.11.0 constraint is appropriate and allows upgrades to newer secure releases.

claude · 2025-12-18T10:56:29Z

Code Review: PR #10 - Honor .gitignore and .cursorignore during AI file sampling

Overview

Excellent PR! This addresses a real problem where template/scaffolding files were polluting AI analysis and causing incorrect project categorization. The implementation is solid and follows best practices.

✅ Strengths

- Proper Library Choice: Using pathspec library with gitwildmatch pattern matching is the correct approach - it properly handles gitignore semantics (negation patterns, directory matching, etc.)
- Defensive Error Handling: Good use of try-except blocks when reading ignore files (lines 52-57, 62-67) with appropriate warning logs rather than failing hard
- Comprehensive Coverage: Supports both .gitignore and .cursorignore files, accumulating patterns from both sources
- README Protection: The ignore patterns are correctly applied to priority README files (lines 97-100), not just the file walk
- Missing Import Fixes: Correctly added missing glob, re, and List imports that were referenced but not imported
- Type Hints: Good addition of List type hint for the patterns variable

🔍 Code Quality Observations

Architecture & Design

- The _load_ignore_patterns() method is well-encapsulated and reusable
- The ignore checking is applied consistently in both README discovery and file walk sections
- Pattern matching happens early in the file processing pipeline (before size checks), which is efficient

Error Handling

Lines 124-125: The broad except Exception around the relative-path computation could be more specific. As written, it silently swallows any failure, including unexpected ones. Consider:

except (ValueError, OSError) as e:
    self.logger.debug(f"Could not compute relative path for {p}: {e}")
    continue
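
As a self-contained sketch of the narrower handler (the helper name `relative_posix` and the module-level logger are assumptions, not code from the PR): `Path.relative_to` is a purely lexical operation that raises `ValueError` when the path is not under the base, so that is the one exception this version catches.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def relative_posix(path: Path, base_dir: Path) -> Optional[str]:
    """Return path relative to base_dir with forward slashes, or None."""
    try:
        return path.relative_to(base_dir).as_posix()
    except ValueError as exc:
        # relative_to raises ValueError when path is not under base_dir.
        logger.debug("Could not compute relative path for %s: %s", path, exc)
        return None
```

A caller in the file walk would skip the file when the helper returns `None`, which keeps the existing behavior while leaving a trace in the debug log.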

Performance Considerations

✅ Good: Patterns are loaded once at the start of get_file_samples()
✅ Good: PathSpec matching is cheap per file (patterns are compiled to regular expressions once, then each path is checked against them)
⚠️ Minor: Each file match requires computing relative_to(base_dir). This is probably fine, but for very large repos, you might cache the base_dir string length and use string slicing instead

🐛 Potential Issues

1. Empty Pattern Handling

When both .gitignore and .cursorignore are missing or empty, pathspec.PathSpec.from_lines('gitwildmatch', []) is called. This should work fine (matches nothing), but it's worth verifying this edge case.

Suggestion: Add a debug log when no patterns are found:

if not patterns:
    self.logger.debug("No ignore patterns found in .gitignore or .cursorignore")

2. Pattern Filtering

The code doesn't filter out comments (#) or empty lines from the pattern files. While pathspec might handle this gracefully, it's worth confirming. Gitignore files often have comments and blank lines.

Suggestion: Filter patterns when loading:

patterns.extend(
    line for line in f.read().splitlines() 
    if line.strip() and not line.strip().startswith('#')
)

3. Cross-Platform Path Handling

On Windows, pathspec expects forward slashes, but Path.relative_to() might return backslashes. This could cause pattern mismatches.

Verification needed: Test on Windows to ensure patterns match correctly.

Potential fix:

relative_path = str(p.relative_to(base_dir)).replace('\\', '/')
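
An alternative to the string replace is `as_posix()`, which pathlib provides for exactly this conversion. The snippet below uses `PureWindowsPath` so the Windows behavior can be observed on any platform; the paths are made up for illustration.

```python
from pathlib import PureWindowsPath

# Hypothetical project layout, expressed as Windows paths.
win_base = PureWindowsPath(r"C:\work\project")
win_file = PureWindowsPath(r"C:\work\project\src\main.py")

relative = win_file.relative_to(win_base)
print(str(relative))        # backslash-separated on Windows flavors
print(relative.as_posix())  # always forward slashes
```

Using `as_posix()` gives the matcher forward-slash input regardless of whether it normalizes separators itself.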

🔒 Security Considerations

✅ Safe: Using utf-8 encoding with proper error handling when reading ignore files
✅ Safe: No injection risks - pathspec library safely handles arbitrary patterns
✅ Safe: File operations are contained to the project directory

🧪 Test Coverage

Current state: As noted in CLAUDE.md, this project has no formal test suite.

Recommended test cases (when test infrastructure is added):

- Test with missing .gitignore/.cursorignore files
- Test with empty ignore files
- Test with comment lines and blank lines in ignore files
- Test pattern matching against various gitignore patterns:
  - Simple patterns: *.log
  - Directory patterns: node_modules/
  - Negation patterns: !important.log
  - Wildcard patterns: **/build/**
- Test README filtering works when README is ignored
- Test cross-platform path handling (Windows vs Unix)
- Test with projects containing .cursorignore only, .gitignore only, or both

📝 Documentation

Suggestions:

Add docstring example to _load_ignore_patterns():

"""Load patterns from .gitignore and .cursorignore files.

Returns a PathSpec object that can be used to check if files should be ignored.

Example:
    >>> ignore_spec = self._load_ignore_patterns(Path('/project'))
    >>> ignore_spec.match_file('node_modules/package.json')  # True
    >>> ignore_spec.match_file('src/main.py')  # False
"""

Consider updating CLAUDE.md's "AI Tagging Service" section to mention gitignore support:
- Respects .gitignore and .cursorignore patterns to exclude template/scaffolding files

🎯 Recommendations

Before merging:

⚠️ IMPORTANT: Test on Windows to verify path separator handling
Add pattern filtering for comments and blank lines
Add debug log for empty pattern case
Make the broad except Exception more specific (line 124)

Future improvements:

Add test suite for this functionality (aligns with Phase 6 in CLAUDE.md)
Consider caching PathSpec objects if the same project is analyzed multiple times
Add metrics logging: "Filtered X files based on ignore patterns"
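
The metrics-logging idea could look something like the sketch below. It is only an illustration: `filter_ignored` is a hypothetical helper, and `is_ignored` stands in for whatever matcher is in use (in this PR, `ignore_spec.match_file`).

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


def filter_ignored(
    relative_paths: Iterable[str],
    is_ignored: Callable[[str], bool],
) -> List[str]:
    """Drop ignored paths and log how many were filtered out."""
    kept: List[str] = []
    skipped = 0
    for rel in relative_paths:
        if is_ignored(rel):
            skipped += 1
        else:
            kept.append(rel)
    logger.info("Filtered %d files based on ignore patterns", skipped)
    return kept
```

Counting in the same pass as the filtering avoids a second traversal of the file list.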

🎖️ Final Assessment

Quality Score: 8.5/10

Recommendation: ✅ Approve with minor suggestions

This PR successfully solves the stated problem and demonstrates good engineering practices. The implementation is clean, well-structured, and follows the project's conventions. The main concerns are Windows path handling (needs verification) and pattern parsing robustness (comments/blanks). These are minor issues that can be addressed in follow-up commits if needed.

The addition of proper gitignore support is a significant quality improvement that will make AI tagging much more accurate and reliable.

Great work! 🚀


pl018 merged commit d075afd into master Dec 18, 2025
3 checks passed

pl018 deleted the claude/respect-ignore-files-qOz6W branch December 18, 2025 11:01
