Skip to content

Add --recursive option for explicit directory traversal#4

Merged
kompre merged 1 commit into
mainfrom
claude/issue-3-20251020-0603
Oct 20, 2025
Merged

Add --recursive option for explicit directory traversal#4
kompre merged 1 commit into
mainfrom
claude/issue-3-20251020-0603

Conversation

@kompre
Copy link
Copy Markdown
Owner

@kompre kompre commented Oct 20, 2025

Implements the --recursive option to enable consistent directory traversal across different shells.

Fixes #3

Changes

  • Add -r/--recursive flag to CLI options
  • Implement collect_files_from_directory() helper function
  • Update file collection logic to handle directories with recursive flag

This ensures the tool behaves consistently regardless of the shell being used (bash vs PowerShell).


Generated with Claude Code

- Add -r/--recursive flag to CLI options
- Implement collect_files_from_directory() helper function
- Update file collection logic to handle directories with recursive flag
- Ensures consistent behavior across shells (bash vs pwsh)

Fixes #3

Co-authored-by: kompre <kompre@users.noreply.github.com>
@claude
Copy link
Copy Markdown

claude Bot commented Oct 20, 2025

Code Review - PR #4: Add --recursive option for explicit directory traversal

Summary

This PR successfully addresses issue #3 by implementing a --recursive flag to ensure consistent directory traversal behavior across different shells. The implementation is clean and functional.


Code Quality & Best Practices ✓

Strengths:

  • Well-documented function with clear docstring (collect_files_from_directory)
  • Clean separation of concerns with dedicated helper function
  • Consistent code style matching the existing codebase
  • Proper integration with existing CLI option pattern

Suggestions:

  1. Consider using pathlib.Path: The codebase uses os.path throughout, but for modern Python (3.6+), pathlib.Path offers cleaner path manipulation:
    from pathlib import Path
    
    def collect_files_from_directory(directory, extension, recursive=False):
        directory = Path(directory)
        if recursive:
            return [str(p) for p in directory.rglob(f"*{extension}")]
        return [str(p) for p in directory.glob(f"*{extension}") if p.is_file()]
    This would make the code more concise and Pythonic (though not blocking for this PR).

Potential Bugs & Issues

Critical Issue Found:

  • Line 212 comment is misleading: The comment says # else: path might be from shell glob expansion that doesn't exist anymore, skip it
    • This silently skips invalid paths without warning the user
    • If a user provides a path that doesn't exist or isn't a file/directory with the right extension, they get no feedback
    • Recommendation: Add an else clause to warn about skipped paths:
    else:
        # Path doesn't exist or doesn't match criteria
        if not os.path.exists(path):
            click.echo(f"Warning: Path does not exist: {path}", err=True)
        elif path.endswith(input_extension):
            click.echo(f"Warning: Path exists but is not a file or directory: {path}", err=True)

Minor Issues:

  1. os.listdir() can raise exceptions (line 56): If the directory is inaccessible or permissions are denied, os.listdir() will raise an exception. Consider wrapping in try-except:

    try:
        for item in os.listdir(directory):
            # ... existing code
    except PermissionError:
        click.echo(f"Warning: Permission denied accessing directory: {directory}", err=True)
  2. Symlink handling: Neither os.walk() nor os.listdir() explicitly handle symlinks. Consider whether symlinks should be followed or whether followlinks=False should be set in os.walk().


Performance Considerations

Overall: Good performance characteristics

  • os.walk() is efficient for recursive traversal
  • Files are collected before parallel processing begins (correct approach)
  • No obvious performance bottlenecks

Potential optimization:

  • For very large directories, you might consider yielding files incrementally instead of building the entire list, but this would require refactoring the conversion workflow

Security Concerns

Low Risk - No major security issues

Minor considerations:

  1. Path traversal: Since os.path.join() is used throughout, malicious paths with .. could potentially escape intended directories. However, this is limited risk since:

    • Input paths come from command-line arguments (user controls their own environment)
    • Click's Path(exists=True) validation provides some protection
  2. Symlink attacks: If recursive follows symlinks into sensitive areas, this could be a concern. Consider documenting the behavior or making it configurable.


Test Coverage ⚠️

Missing Tests for New Functionality:

The PR adds significant new functionality but does not include tests for the --recursive flag. The existing test file tests/test_quarto_batch_convert.py has comprehensive tests for other features, but none for:

  1. Testing directory input without -r (should only process files in top-level)
  2. Testing directory input with -r (should process files in subdirectories)
  3. Testing nested directory structures
  4. Testing mixed file/directory inputs with -r

Recommended test cases to add:

def test_directory_without_recursive(setup_teardown_test_env):
    """Test that directories are processed non-recursively by default."""
    runner = CliRunner()
    test_dir = setup_teardown_test_env
    notebooks_dir = os.path.join(test_dir, "notebooks")
    
    result = runner.invoke(convert_files, [notebooks_dir])
    
    assert result.exit_code == 0
    # Should only process files in notebooks/, not subdirectories

def test_directory_with_recursive(setup_teardown_test_env):
    """Test that --recursive flag processes subdirectories."""
    runner = CliRunner()
    test_dir = setup_teardown_test_env
    
    # Create nested structure
    nested_dir = os.path.join(test_dir, "notebooks", "subfolder")
    os.makedirs(nested_dir, exist_ok=True)
    shutil.copy("tests/assets/__test.ipynb", 
                os.path.join(nested_dir, "nested.ipynb"))
    
    result = runner.invoke(convert_files, [test_dir, "-r"])
    
    assert result.exit_code == 0
    assert "nested.ipynb" in result.output

Documentation

Missing:

  • README or documentation should be updated to mention the new -r/--recursive flag
  • The docstring example in line 166-173 doesn't show the recursive flag usage

Suggested addition to docstring:

    - qbc -r ./notebooks    # Process all .ipynb files recursively

Summary & Recommendation

Overall Assessment: APPROVE with minor suggestions

This PR successfully solves the stated problem and the code is well-written. However, I recommend addressing before merging:

Must fix:

  • Add test coverage for the recursive functionality

Should fix:

  • Add warning messages for skipped paths (line 212)
  • Add error handling for permission issues in collect_files_from_directory

Nice to have:

  • Update documentation/examples
  • Consider the pathlib refactor for cleaner code

Great work on solving the shell inconsistency issue! The implementation is solid and the new function is well-structured.


Review generated with Claude Code

@kompre kompre merged commit 933100a into main Oct 20, 2025
1 check passed
@kompre kompre deleted the claude/issue-3-20251020-0603 branch October 20, 2025 06:15
@kompre
Copy link
Copy Markdown
Owner Author

kompre commented Oct 20, 2025

@claude create new issues based on the code review.

@claude
Copy link
Copy Markdown

claude Bot commented Oct 20, 2025

Claude finished @kompre's task —— View job


Unable to Create Issues Directly

I've analyzed the code review and identified the following issues that should be created, but I don't have permission to create GitHub issues using the gh CLI tool.

Recommended Issues to Create:

Issue 1: Add warning messages for skipped/invalid paths

Priority: Medium
Labels: enhancement

Description:
Currently, when a path provided to the CLI doesn't exist or doesn't match the expected criteria, it's silently skipped without any user feedback (see line 212 in quarto_batch_convert.py).

Problem:
The comment on line 212 states:

# else: path might be from shell glob expansion that doesn't exist anymore, skip it

This silently skips invalid paths without warning the user. If a user provides a path that doesn't exist or isn't a file/directory with the right extension, they get no feedback.

Proposed Solution:

else:
    # Path doesn't exist or doesn't match criteria
    if not os.path.exists(path):
        click.echo(f"Warning: Path does not exist: {path}", err=True)
    elif path.endswith(input_extension):
        click.echo(f"Warning: Path exists but is not a file or directory: {path}", err=True)

Issue 2: Add error handling for directory access in collect_files_from_directory

Priority: Medium
Labels: bug, enhancement

Description:
The collect_files_from_directory() function (line 38-60) doesn't handle permission errors when accessing directories.

Problem:
If the directory is inaccessible or permissions are denied, os.listdir() (line 56) will raise an exception that crashes the program.

Proposed Solution:

else:
    try:
        for item in os.listdir(directory):
            item_path = os.path.join(directory, item)
            if os.path.isfile(item_path) and item.endswith(extension):
                files.append(item_path)
    except PermissionError:
        click.echo(f"Warning: Permission denied accessing directory: {directory}", err=True)

Issue 3: Add test coverage for --recursive flag

Priority: High
Labels: testing

Description:
PR #4 added the --recursive flag but did not include tests for this new functionality.

Missing Test Cases:

  1. Testing directory input without -r (should only process files in top-level)
  2. Testing directory input with -r (should process files in subdirectories)
  3. Testing nested directory structures
  4. Testing mixed file/directory inputs with -r

Recommended Test Example:

def test_directory_with_recursive(setup_teardown_test_env):
    """Test that --recursive flag processes subdirectories."""
    runner = CliRunner()
    test_dir = setup_teardown_test_env
    
    # Create nested structure
    nested_dir = os.path.join(test_dir, "notebooks", "subfolder")
    os.makedirs(nested_dir, exist_ok=True)
    shutil.copy("tests/assets/__test.ipynb", 
                os.path.join(nested_dir, "nested.ipynb"))
    
    result = runner.invoke(convert_files, [test_dir, "-r"])
    
    assert result.exit_code == 0
    assert "nested.ipynb" in result.output

Issue 4: Update documentation for --recursive flag

Priority: Medium
Labels: documentation

Description:
The --recursive flag was added in PR #4 but the documentation and examples weren't updated.

Missing Updates:

  1. README should mention the new -r/--recursive flag
  2. The docstring example (lines 168-173) doesn't show recursive flag usage

Suggested Addition to Docstring:

    - qbc -r ./notebooks    # Process all .ipynb files recursively

Issue 5: Consider refactoring to use pathlib for cleaner path handling

Priority: Low
Labels: enhancement, refactoring

Description:
The codebase currently uses os.path throughout. For modern Python (3.6+), pathlib.Path offers cleaner and more Pythonic path manipulation.

Potential Benefits:

  • More concise code
  • Better cross-platform compatibility
  • More intuitive API

Example Refactor for collect_files_from_directory:

from pathlib import Path

def collect_files_from_directory(directory, extension, recursive=False):
    directory = Path(directory)
    if recursive:
        return [str(p) for p in directory.rglob(f"*{extension}")]
    return [str(p) for p in directory.glob(f"*{extension}") if p.is_file()]

Note: This is a nice-to-have enhancement, not blocking.


Issue 6: Document or configure symlink handling behavior

Priority: Low
Labels: documentation, enhancement

Description:
Neither os.walk() nor os.listdir() explicitly handle symlinks in the current implementation.

Questions to Address:

  • Should symlinks be followed during recursive traversal?
  • Should there be a flag to control this behavior?
  • What are the security implications?

Suggestion:
Document the current behavior and consider whether followlinks=False should be explicitly set in os.walk() for security.


Summary

I've identified 6 issues from the code review:

  • High Priority: Issue 3 (test coverage)
  • Medium Priority: Issues 1, 2, 4 (warnings, error handling, documentation)
  • Low Priority: Issues 5, 6 (refactoring suggestions)

You'll need to create these issues manually or grant the necessary permissions for me to create them using the gh CLI.


Job Run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add --recursive option

1 participant