feat: Honor .gitignore and .cursorignore during AI file sampling#10

Merged
pl018 merged 1 commit into master from claude/respect-ignore-files-qOz6W on Dec 18, 2025

@pl018 pl018 commented Dec 18, 2025

Add proper gitignore support to prevent template/scaffolding files (like BMAD method files) from being included in AI analysis. This fixes incorrect project categorization caused by including files that should be ignored.

Changes:

  • Add pathspec dependency for gitignore pattern matching
  • Implement _load_ignore_patterns() method to parse .gitignore and .cursorignore
  • Filter all sampled files against ignore patterns before AI analysis
  • Fix missing imports (glob, re) in project_service.py

Resolves issue where projects were incorrectly tagged based on template files that should have been ignored.
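
As a rough illustration of the loading step described above, here is a minimal sketch that gathers raw pattern lines from both ignore files. It is not the PR's actual `_load_ignore_patterns()` code: the function name `read_ignore_lines`, the `IGNORE_FILES` constant, and the module-level `logger` are hypothetical, and only the standard library is used.

```python
import logging
from pathlib import Path
from typing import List

logger = logging.getLogger(__name__)

# Hypothetical constant: the two ignore files named in this PR.
IGNORE_FILES = (".gitignore", ".cursorignore")


def read_ignore_lines(base_dir: Path) -> List[str]:
    """Collect raw pattern lines from whichever ignore files exist in base_dir."""
    lines: List[str] = []
    for name in IGNORE_FILES:
        ignore_file = base_dir / name
        if not ignore_file.is_file():
            continue  # a missing ignore file is not an error
        try:
            lines.extend(ignore_file.read_text(encoding="utf-8").splitlines())
        except (OSError, UnicodeDecodeError) as exc:
            # Degrade gracefully: warn and keep going without this file's patterns.
            logger.warning("Could not read %s: %s", ignore_file, exc)
    return lines
```

The returned list would then be handed to a matcher; the review below quotes `pathspec.PathSpec.from_lines('gitwildmatch', ...)` as the call used for that.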

Summary by CodeRabbit

  • New Features
    • File sampling now respects .gitignore and .cursorignore patterns, automatically filtering out ignored files during project analysis and README discovery.

coderabbitai Bot commented Dec 18, 2025

Walkthrough

Added pathspec dependency to support ignore pattern loading. Implemented _load_ignore_patterns method in ProjectContext to read .gitignore and .cursorignore files. Integrated ignore checks into file sampling to filter out ignored files during directory traversal.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Dependency Addition: pyproject.toml | Added runtime dependency pathspec>=0.11.0 to project dependencies |
| Ignore Pattern Support: src/project_manager_cli/services/project_service.py | Added _load_ignore_patterns method to read .gitignore and .cursorignore files and build pathspec; integrated ignore filtering into file sampling logic to skip ignored files during traversal; added imports (glob, re, List, pathspec) |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Verify pathspec is correctly built from both .gitignore and .cursorignore files
  • Check defensive handling in path resolution and exception handling
  • Confirm ignore filtering is applied consistently across all file sampling paths

Poem

🐰 With pathspec in paw, this rabbit did weave,
Ignore patterns fine—files to retrieve!
.gitignore and .cursor align in the fold,
Filtering treasures, both new and bold! 📋✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding support for .gitignore and .cursorignore patterns during file sampling. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/project_manager_cli/services/project_service.py (2)

42-71: Implementation looks correct.

The method properly loads patterns from both ignore files and constructs a PathSpec using the correct 'gitwildmatch' pattern style for gitignore syntax. The defensive exception handling ensures the tool continues to function even if ignore files are unreadable.

Note: Static analysis flags broad exception catching (lines 56, 67), but this is acceptable here for robustness since we want to gracefully degrade if ignore files can't be read.


119-125: Consider adding logging to the exception handler.

The ignore pattern check works correctly, but the broad exception handler silently skips any path resolution issues. While relative_to should normally succeed during os.walk, adding a debug log would help diagnose unexpected failures.

🔎 Suggested improvement:
                     # Check if file is ignored by .gitignore or .cursorignore
                     try:
                         relative_path = str(p.relative_to(base_dir))
                         if ignore_spec.match_file(relative_path):
                             continue
-                    except Exception:
+                    except Exception as e:
+                        self.logger.debug(f"Could not check ignore status for {p}: {e}")
                         continue
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79aaf02 and 2396f7a.

📒 Files selected for processing (2)
  • pyproject.toml (1 hunks)
  • src/project_manager_cli/services/project_service.py (5 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
src/project_manager_cli/services/project_service.py

56-56: Do not catch blind exception: Exception

(BLE001)


57-57: Use explicit conversion flag

Replace with conversion flag

(RUF010)


67-67: Do not catch blind exception: Exception

(BLE001)


68-68: Use explicit conversion flag

Replace with conversion flag

(RUF010)


124-125: try-except-continue detected, consider logging the exception

(S112)


124-124: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (4)
src/project_manager_cli/services/project_service.py (3)

3-11: LGTM! Missing imports fixed.

The addition of glob and re imports fixes previously missing imports (used at lines 228, 232-233), and the new List and pathspec imports support the ignore pattern functionality.


81-82: LGTM!

Loading ignore patterns once at the beginning of file sampling is the correct approach for efficiency and consistency.


97-100: LGTM!

The README filtering correctly respects ignore patterns. Since the path is constructed from base_dir / name, the relative_to call will always succeed, so no additional exception handling is needed here.

pyproject.toml (1)

26-26: No security concerns with pathspec version 0.11.0.

No direct vulnerabilities have been found for pathspec in Snyk's vulnerability database, and the latest available version is 0.12.1. The >=0.11.0 constraint is appropriate and allows upgrades to newer secure releases.

claude Bot commented Dec 18, 2025

Code Review: PR #10 - Honor .gitignore and .cursorignore during AI file sampling

Overview

Excellent PR! This addresses a real problem where template/scaffolding files were polluting AI analysis and causing incorrect project categorization. The implementation is solid and follows best practices.


✅ Strengths

  1. Proper Library Choice: Using pathspec library with gitwildmatch pattern matching is the correct approach - it properly handles gitignore semantics (negation patterns, directory matching, etc.)

  2. Defensive Error Handling: Good use of try-except blocks when reading ignore files (lines 52-57, 62-67) with appropriate warning logs rather than failing hard

  3. Comprehensive Coverage: Supports both .gitignore and .cursorignore files, accumulating patterns from both sources

  4. README Protection: The ignore patterns are correctly applied to priority README files (lines 97-100), not just the file walk

  5. Missing Import Fixes: Correctly added missing glob, re, and List imports that were referenced but not imported

  6. Type Hints: Good addition of List type hint for the patterns variable


🔍 Code Quality Observations

Architecture & Design

  • The _load_ignore_patterns() method is well-encapsulated and reusable
  • The ignore checking is applied consistently in both README discovery and file walk sections
  • Pattern matching happens early in the file processing pipeline (before size checks), which is efficient

Error Handling

Lines 124-125: The broad except Exception around the relative path computation could be more specific. As written, it can silently swallow unrelated failures such as permission errors or path resolution problems. Consider:

except (ValueError, OSError) as e:
    self.logger.debug(f"Could not compute relative path for {p}: {e}")
    continue
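
To make the narrower handler concrete, here is a small self-contained sketch. The helper name `relative_or_none` and the module-level `logger` are hypothetical stand-ins for the inline code and `self.logger` in the PR. `Path.relative_to` raises `ValueError` when the path is not under the base directory; `OSError` is included only to mirror the suggestion above.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def relative_or_none(p: Path, base_dir: Path) -> Optional[str]:
    """Return p relative to base_dir as a string, or None if that is not possible."""
    try:
        return str(p.relative_to(base_dir))
    except (ValueError, OSError) as exc:
        # Log at debug level so unexpected paths can be diagnosed later.
        logger.debug("Could not compute relative path for %s: %s", p, exc)
        return None
```

A caller in the file walk would skip the file when the helper returns None, which keeps the original "continue on failure" behavior while leaving a trace in the debug log.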

Performance Considerations

  • Good: Patterns are loaded once at the start of get_file_samples()
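  • Good: PathSpec matching is cheap per file: each pattern is compiled to a regular expression once, and a match tests the path against those compiled patterns, so the cost grows roughly with the number of patterns rather than being O(1)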
  • ⚠️ Minor: Each file match requires computing relative_to(base_dir). This is probably fine, but for very large repos, you might cache the base_dir string length and use string slicing instead
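
The string-slicing idea in the last bullet could look roughly like the sketch below. The name `make_relativizer` is hypothetical, and the sketch assumes every path passed in is absolute and was produced by walking the same absolute base directory; whether it is measurably faster than `relative_to` would need profiling.

```python
import os
from typing import Callable, Optional


def make_relativizer(base_dir: str) -> Callable[[str], Optional[str]]:
    """Build a function that strips the base_dir prefix from absolute paths."""
    # Joining with "" guarantees exactly one trailing separator on the prefix.
    prefix = os.path.join(os.path.abspath(base_dir), "")
    prefix_len = len(prefix)

    def relativize(path: str) -> Optional[str]:
        if not path.startswith(prefix):
            return None  # not under base_dir, so the caller should skip it
        # Use forward slashes so gitignore-style patterns can match.
        return path[prefix_len:].replace(os.sep, "/")

    return relativize
```

The trailing separator in the prefix matters: without it, a sibling directory such as `proj2` would wrongly be treated as living under `proj`.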

🐛 Potential Issues

1. Empty Pattern Handling

When both .gitignore and .cursorignore are missing or empty, pathspec.PathSpec.from_lines('gitwildmatch', []) is called. This should work fine (matches nothing), but it's worth verifying this edge case.

Suggestion: Add a debug log when no patterns are found:

if not patterns:
    self.logger.debug("No ignore patterns found in .gitignore or .cursorignore")

2. Pattern Filtering

The code doesn't filter out comments (#) or empty lines from the pattern files. While pathspec might handle this gracefully, it's worth confirming. Gitignore files often have comments and blank lines.

Suggestion: Filter patterns when loading:

patterns.extend(
    line for line in f.read().splitlines() 
    if line.strip() and not line.strip().startswith('#')
)
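
A standalone version of that filter, as a minimal sketch (the function name `clean_pattern_lines` is hypothetical). Note that gitignore syntax itself treats blank lines and lines starting with `#` as non-patterns, so a gitignore-compatible matcher may already skip them; in that case the filter would be redundant rather than harmful.

```python
from typing import Iterable, List


def clean_pattern_lines(lines: Iterable[str]) -> List[str]:
    """Drop blank lines and comment lines, keeping everything else unchanged."""
    return [
        line
        for line in lines
        # A pattern with an escaped hash ("\#name") does not start with "#",
        # so it is kept as a real pattern.
        if line.strip() and not line.strip().startswith("#")
    ]
```

For example, `clean_pattern_lines(["# logs", "", "*.log"])` keeps only `"*.log"`.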

3. Cross-Platform Path Handling

On Windows, pathspec expects forward slashes, but Path.relative_to() might return backslashes. This could cause pattern mismatches.

Verification needed: Test on Windows to ensure patterns match correctly.

Potential fix:

relative_path = str(p.relative_to(base_dir)).replace('\\', '/')
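
The separator difference can be shown without a Windows machine by using the pure path classes from `pathlib`. This is an illustrative sketch with made-up paths; `as_posix()` is a standard-library alternative to the manual `replace` shown above.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same project layout expressed as a Windows path and as a POSIX path.
win_rel = PureWindowsPath(r"C:\proj\src\app\main.py").relative_to(PureWindowsPath(r"C:\proj"))
posix_rel = PurePosixPath("/proj/src/app/main.py").relative_to(PurePosixPath("/proj"))

windows_str = str(win_rel)        # backslash-separated on Windows-flavoured paths
normalized = win_rel.as_posix()   # forward slashes, suitable for gitignore-style patterns
```

Here `windows_str` uses backslashes while `normalized` and `str(posix_rel)` both read `src/app/main.py`, which is the form a directory pattern such as `src/app/` is written against.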

🔒 Security Considerations

  • Safe: Using utf-8 encoding with proper error handling when reading ignore files
  • Safe: No injection risks - pathspec library safely handles arbitrary patterns
  • Safe: File operations are contained to the project directory

🧪 Test Coverage

Current state: As noted in CLAUDE.md, this project has no formal test suite.

Recommended test cases (when test infrastructure is added):

  1. Test with missing .gitignore/.cursorignore files
  2. Test with empty ignore files
  3. Test with comment lines and blank lines in ignore files
  4. Test pattern matching against various gitignore patterns:
    • Simple patterns: *.log
    • Directory patterns: node_modules/
    • Negation patterns: !important.log
    • Wildcard patterns: **/build/**
  5. Test README filtering works when README is ignored
  6. Test cross-platform path handling (Windows vs Unix)
  7. Test with projects containing .cursorignore only, .gitignore only, or both

📝 Documentation

Suggestions:

  1. Add docstring example to _load_ignore_patterns():

    """Load patterns from .gitignore and .cursorignore files.
    
    Returns a PathSpec object that can be used to check if files should be ignored.
    
    Example:
        >>> ignore_spec = self._load_ignore_patterns(Path('/project'))
        >>> ignore_spec.match_file('node_modules/package.json')  # True
        >>> ignore_spec.match_file('src/main.py')  # False
    """
  2. Consider updating CLAUDE.md's "AI Tagging Service" section to mention gitignore support:

    • Respects .gitignore and .cursorignore patterns to exclude template/scaffolding files

🎯 Recommendations

Before merging:

  1. ⚠️ IMPORTANT: Test on Windows to verify path separator handling
  2. Add pattern filtering for comments and blank lines
  3. Add debug log for empty pattern case
  4. Make the broad except Exception more specific (line 124)

Future improvements:

  1. Add test suite for this functionality (aligns with Phase 6 in CLAUDE.md)
  2. Consider caching PathSpec objects if the same project is analyzed multiple times
  3. Add metrics logging: "Filtered X files based on ignore patterns"
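
The metrics-logging idea in item 3 might be sketched as follows. The helper `filter_ignored` and its `is_ignored` callable are hypothetical; in the PR the predicate would presumably be the `match_file` method of the loaded spec.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


def filter_ignored(paths: Iterable[str], is_ignored: Callable[[str], bool]) -> List[str]:
    """Return the paths that are not ignored and log how many were dropped."""
    kept: List[str] = []
    skipped = 0
    for path in paths:
        if is_ignored(path):
            skipped += 1
        else:
            kept.append(path)
    logger.info("Filtered %d files based on ignore patterns", skipped)
    return kept
```

Passing the predicate in as a callable keeps the counting logic independent of the matching library, which also makes it easy to unit test with a simple lambda.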

🎖️ Final Assessment

Quality Score: 8.5/10

Recommendation: ✅ Approve with minor suggestions

This PR successfully solves the stated problem and demonstrates good engineering practices. The implementation is clean, well-structured, and follows the project's conventions. The main concerns are Windows path handling (needs verification) and pattern parsing robustness (comments/blanks). These are minor issues that can be addressed in follow-up commits if needed.

The addition of proper gitignore support is a significant quality improvement that will make AI tagging much more accurate and reliable.

Great work! 🚀

@pl018 pl018 merged commit d075afd into master Dec 18, 2025
3 checks passed
@pl018 pl018 deleted the claude/respect-ignore-files-qOz6W branch December 18, 2025 11:01