diff --git a/README.md b/README.md
index b4921a9..ee5c46d 100644
--- a/README.md
+++ b/README.md
@@ -11,6 +11,7 @@ corpus.
database in Atlas.
- `dodec`, or the Database of Devoured Example Code: a query tool that lets us find code examples and related
metadata in the database for reporting or to perform manual updates.
+- `audit-cli`: A Go CLI project to help us audit docs content from files on the local filesystem.
- `examples-copier`: a Go app that runs as a GitHub App and copies files from the
source code repo (generated code examples) to multiple target repos and branches.
- `github-check-releases`: a Node.js script that gets the latest release versions
diff --git a/audit-cli/.gitignore b/audit-cli/.gitignore
new file mode 100644
index 0000000..bf1138a
--- /dev/null
+++ b/audit-cli/.gitignore
@@ -0,0 +1 @@
+audit-cli
diff --git a/audit-cli/README.md b/audit-cli/README.md
new file mode 100644
index 0000000..8b2cb50
--- /dev/null
+++ b/audit-cli/README.md
@@ -0,0 +1,1146 @@
+# audit-cli
+
+A Go CLI tool for extracting and analyzing code examples from MongoDB documentation written in reStructuredText (RST).
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Installation](#installation)
+- [Usage](#usage)
+ - [Extract Commands](#extract-commands)
+ - [Search Commands](#search-commands)
+ - [Analyze Commands](#analyze-commands)
+ - [Compare Commands](#compare-commands)
+- [Development](#development)
+ - [Project Structure](#project-structure)
+ - [Adding New Commands](#adding-new-commands)
+ - [Testing](#testing)
+ - [Code Patterns](#code-patterns)
+- [Supported RST Directives](#supported-rst-directives)
+
+## Overview
+
+This CLI tool helps maintain code quality across MongoDB's documentation by:
+
+1. **Extracting code examples** from RST files into individual, testable files
+2. **Searching extracted code** for specific patterns or substrings
+3. **Analyzing include relationships** to understand file dependencies
+4. **Comparing file contents** across documentation versions to identify differences
+5. **Following include directives** to process entire documentation trees
+6. **Handling MongoDB-specific conventions** like steps files, extracts, and template variables
+
+## Installation
+
+### Build from Source
+
+```bash
+cd audit-cli
+go build
+```
+
+This creates an `audit-cli` executable in the current directory.
+
+### Run Without Building
+
+```bash
+cd audit-cli
+go run main.go [command] [flags]
+```
+
+## Usage
+
+The CLI is organized into parent commands with subcommands:
+
+```
+audit-cli
+├── extract # Extract content from RST files
+│ └── code-examples
+├── search # Search through extracted content or source files
+│ └── find-string
+├── analyze # Analyze RST file structures
+│ └── includes
+└── compare # Compare files across versions
+ └── file-contents
+```
+
+### Extract Commands
+
+#### `extract code-examples`
+
+Extract code examples from reStructuredText files into individual files. For details about what code example directives
+are supported and how, refer to the [Supported rST Directives - Code Example Extraction](#code-example-extraction)
+section below.
+
+**Use Cases:**
+
+This command helps writers:
+- Examine all the code examples that make up a specific page or section
+- Split out code examples into individual files for migration to test infrastructure
+- Report on the number of code examples by language
+- Report on the number of code examples by directive type
+- Use additional commands, such as search, to find strings within specific code examples
+
+**Basic Usage:**
+
+```bash
+# Extract from a single file
+./audit-cli extract code-examples path/to/file.rst -o ./output
+
+# Extract from a directory (non-recursive)
+./audit-cli extract code-examples path/to/docs -o ./output
+
+# Extract recursively from all subdirectories
+./audit-cli extract code-examples path/to/docs -o ./output -r
+
+# Follow include directives
+./audit-cli extract code-examples path/to/file.rst -o ./output -f
+
+# Combine recursive scanning and include following
+./audit-cli extract code-examples path/to/docs -o ./output -r -f
+
+# Dry run (show what would be extracted without writing files)
+./audit-cli extract code-examples path/to/file.rst -o ./output --dry-run
+
+# Verbose output
+./audit-cli extract code-examples path/to/file.rst -o ./output -v
+```
+
+**Flags:**
+
+- `-o, --output
` - Output directory for extracted files (default: `./output`)
+- `-r, --recursive` - Recursively scan directories for RST files. If you do not provide this flag, the tool will only
+ extract code examples from the top-level RST file. If you do provide this flag, the tool will recursively scan all
+ subdirectories for RST files and extract code examples from all files.
+- `-f, --follow-includes` - Follow `.. include::` directives in RST files. If you do not provide this flag, the tool
+ will only extract code examples from the top-level RST file. If you do provide this flag, the tool will follow any
+ `.. include::` directives in the RST file and extract code examples from all included files. When combined with `-r`,
+ the tool will recursively scan all subdirectories for RST files and follow `.. include::` directives in all files. If
+ an include filepath is *outside* the input directory, the `-r` flag would not parse it, but the `-f` flag would
+ follow the include directive and parse the included file. This effectively lets you parse all the files that make up
+ a single page, if you start from the page's root `.txt` file.
+- `--dry-run` - Show what would be extracted without writing files
+- `-v, --verbose` - Show detailed processing information
+
+**Output Format:**
+
+Extracted files are named: `{source-base}.{directive-type}.{index}.{ext}`
+
+Examples:
+- `my-doc.code-block.1.js` - First code-block from my-doc.rst
+- `my-doc.literalinclude.2.py` - Second literalinclude from my-doc.rst
+- `my-doc.io-code-block.1.input.js` - Input from first io-code-block
+- `my-doc.io-code-block.1.output.json` - Output from first io-code-block
+
+**Report:**
+
+After extraction, the code extraction report shows:
+- Number of files traversed
+- Number of output files written
+- Code examples by language
+- Code examples by directive type
+
+### Search Commands
+
+#### `search find-string`
+
+Search through files for a specific substring. Can search through extracted code example files or RST source files.
+
+**Default Behavior:**
+- **Case-insensitive** search (matches "curl", "CURL", "Curl", etc.)
+- **Exact word matching** (excludes partial matches like "curl" in "libcurl")
+
+Use `--case-sensitive` to make the search case-sensitive, or `--partial-match` to allow matching the substring as part
+of larger words.
+
+**Use Cases:**
+
+This command helps writers:
+- Find specific strings across documentation files or pages
+ - Search for product names, command names, API methods, or other strings that may need to be updated
+- Understand the number of references and impact of changes across documentation files or pages
+- Identify files that need to be updated when a string needs to be changed
+- Scope work related to specific changes
+
+**Basic Usage:**
+
+```bash
+# Search in a single file (case-insensitive, exact word match)
+./audit-cli search find-string path/to/file.js "curl"
+
+# Search in a directory (non-recursive)
+./audit-cli search find-string path/to/output "substring"
+
+# Search recursively
+./audit-cli search find-string path/to/output "substring" -r
+
+# Search an RST file and all files it includes
+./audit-cli search find-string path/to/source.rst "substring" -f
+
+# Search a directory recursively and follow includes in RST files
+./audit-cli search find-string path/to/source "substring" -r -f
+
+# Verbose output (show file paths and language breakdown)
+./audit-cli search find-string path/to/output "substring" -r -v
+
+# Case-sensitive search (only matches exact case)
+./audit-cli search find-string path/to/output "CURL" --case-sensitive
+
+# Partial match (includes "curl" in "libcurl")
+./audit-cli search find-string path/to/output "curl" --partial-match
+
+# Combine flags for case-sensitive partial matching
+./audit-cli search find-string path/to/output "curl" --case-sensitive --partial-match
+```
+
+**Flags:**
+
+- `-r, --recursive` - Recursively scan directories for RST files. If you do not provide this flag, the tool will only
+ search within the top-level RST file or directory. If you do provide this flag, the tool will recursively scan all
+ subdirectories for RST files and search across all files.
+- `-f, --follow-includes` - Follow `.. include::` directives in RST files. If you do not provide this flag, the tool
+ will search only the top-level RST file or directory. If you do provide this flag, the tool will follow any
+ `.. include::` directives in any RST file in the input path and search across all included files. When
+ combined with `-r`, the tool will recursively scan all subdirectories for RST files and follow `.. include::` directives
+ in all files. If an include filepath is *outside* the input directory, the `-r` flag would not parse it, but the `-f`
+ flag would follow the include directive and search the included file. This effectively lets you parse all the files
+ that make up a single page, if you start from the page's root `.txt` file.
+- `-v, --verbose` - Show file paths and language breakdown
+- `--case-sensitive` - Make search case-sensitive (default: case-insensitive)
+- `--partial-match` - Allow partial matches within words (default: exact word matching)
+
+**Report:**
+
+The search report shows:
+- Number of files scanned
+- Number of files containing the substring (each file counted once)
+
+With `-v` flag, also shows:
+- List of file paths where substring appears
+- Count broken down by language (file extension)
+
+### Analyze Commands
+
+#### `analyze includes`
+
+Analyze `include` directive relationships in RST files to understand file dependencies.
+
+**Use Cases:**
+
+This command helps writers:
+- Understand the impact of changes to widely-included files
+- Identify circular include dependencies (files included multiple times)
+- Document file relationships for maintenance
+- Plan refactoring of complex include structures
+
+**Basic Usage:**
+
+```bash
+# Analyze a single file (shows summary)
+./audit-cli analyze includes path/to/file.rst
+
+# Show hierarchical tree structure
+./audit-cli analyze includes path/to/file.rst --tree
+
+# Show flat list of all included files
+./audit-cli analyze includes path/to/file.rst --list
+
+# Show both tree and list
+./audit-cli analyze includes path/to/file.rst --tree --list
+
+# Verbose output (show processing details)
+./audit-cli analyze includes path/to/file.rst --tree -v
+```
+
+**Flags:**
+
+- `--tree` - Display results as a hierarchical tree structure
+- `--list` - Display results as a flat list of all files
+- `-v, --verbose` - Show detailed processing information
+
+**Output Formats:**
+
+**Summary** (default - no flags):
+- Root file path
+- Total number of files
+- Maximum depth of include nesting
+- Hints to use --tree or --list for more details
+
+**Tree** (--tree flag):
+- Hierarchical tree structure showing include relationships
+- Uses box-drawing characters for visual clarity
+- Shows which files include which other files
+
+**List** (--list flag):
+- Flat numbered list of all files
+- Files listed in depth-first traversal order
+- Shows absolute paths to all files
+
+**Note on File Counting:**
+
+The total file count represents **unique files** discovered through include directives. If a file is included multiple
+times (e.g., file A includes file C, and file B also includes file C), the file is counted only once in the total.
+However, the tree view will show it in all locations where it appears, with subsequent occurrences marked as circular
+includes in verbose mode.
+
+### Compare Commands
+
+#### `compare file-contents`
+
+Compare file contents to identify differences between files. Supports two modes:
+1. **Direct comparison** - Compare two specific files
+2. **Version comparison** - Compare the same file across multiple documentation versions
+
+**Use Cases:**
+
+This command helps writers:
+- Identify content drift across documentation versions
+- Verify that updates have been applied consistently
+- Scope maintenance work when updating shared content
+- Understand how files have diverged over time
+
+**Basic Usage:**
+
+```bash
+# Direct comparison of two files
+./audit-cli compare file-contents file1.rst file2.rst
+
+# Compare with diff output
+./audit-cli compare file-contents file1.rst file2.rst --show-diff
+
+# Version comparison across MongoDB documentation versions
+./audit-cli compare file-contents \
+ /path/to/manual/manual/source/includes/example.rst \
+ --product-dir /path/to/manual \
+ --versions manual,upcoming,v8.0,v7.0
+
+# Show which files differ
+./audit-cli compare file-contents \
+ /path/to/manual/manual/source/includes/example.rst \
+ --product-dir /path/to/manual \
+ --versions manual,upcoming,v8.0,v7.0 \
+ --show-paths
+
+# Show detailed diffs
+./audit-cli compare file-contents \
+ /path/to/manual/manual/source/includes/example.rst \
+ --product-dir /path/to/manual \
+ --versions manual,upcoming,v8.0,v7.0 \
+ --show-diff
+
+# Verbose output (show processing details)
+./audit-cli compare file-contents file1.rst file2.rst -v
+```
+
+**Flags:**
+
+- `-p, --product-dir ` - Product directory path (required for version comparison)
+- `-V, --versions ` - Comma-separated list of versions (e.g., `manual,upcoming,v8.0`)
+- `--show-paths` - Display file paths grouped by status (matching, differing, not found)
+- `-d, --show-diff` - Display unified diff output (implies `--show-paths`)
+- `-v, --verbose` - Show detailed processing information
+
+**Comparison Modes:**
+
+**1. Direct Comparison (Two Files)**
+
+Provide two file paths as arguments:
+
+```bash
+./audit-cli compare file-contents path/to/file1.rst path/to/file2.rst
+```
+
+This mode:
+- Compares exactly two files
+- Reports whether they are identical or different
+- Can show unified diff with `--show-diff`
+
+**2. Version Comparison (Product Directory)**
+
+Provide one file path plus `--product-dir` and `--versions`:
+
+```bash
+./audit-cli compare file-contents \
+ /path/to/manual/manual/source/includes/example.rst \
+ --product-dir /path/to/manual \
+ --versions manual,upcoming,v8.0
+```
+
+This mode:
+- Extracts the relative path from the reference file
+- Resolves the same relative path in each version directory
+- Compares all versions against the reference file
+- Reports matching, differing, and missing files
+
+**Version Directory Structure:**
+
+The tool expects MongoDB documentation to be organized as:
+```
+product-dir/
+├── manual/
+│ └── source/
+│ └── includes/
+│ └── example.rst
+├── upcoming/
+│ └── source/
+│ └── includes/
+│ └── example.rst
+└── v8.0/
+ └── source/
+ └── includes/
+ └── example.rst
+```
+
+**Output Formats:**
+
+**Summary** (default - no flags):
+- Total number of versions compared
+- Count of matching, differing, and missing files
+- Hints to use `--show-paths` or `--show-diff` for more details
+
+**With --show-paths:**
+- Summary (as above)
+- List of files that match (with ✓)
+- List of files that differ (with ✗)
+- List of files not found (with -)
+
+**With --show-diff:**
+- Summary and paths (as above)
+- Unified diff output for each differing file
+- Shows added lines (prefixed with +)
+- Shows removed lines (prefixed with -)
+- Shows context lines around changes
+
+**Examples:**
+
+```bash
+# Check if a file is consistent across all versions
+./audit-cli compare file-contents \
+ ~/workspace/docs-mongodb-internal/content/manual/manual/source/includes/fact-atlas-search.rst \
+ --product-dir ~/workspace/docs-mongodb-internal/content/manual \
+ --versions manual,upcoming,v8.0,v7.0,v6.0
+
+# Find differences and see what changed
+./audit-cli compare file-contents \
+ ~/workspace/docs-mongodb-internal/content/manual/manual/source/includes/fact-atlas-search.rst \
+ --product-dir ~/workspace/docs-mongodb-internal/content/manual \
+ --versions manual,upcoming,v8.0,v7.0,v6.0 \
+ --show-diff
+
+# Compare two specific versions of a file
+./audit-cli compare file-contents \
+ ~/workspace/docs-mongodb-internal/content/manual/manual/source/includes/example.rst \
+ ~/workspace/docs-mongodb-internal/content/manual/v8.0/source/includes/example.rst \
+ --show-diff
+```
+
+**Exit Codes:**
+
+- `0` - Success (files compared successfully, regardless of whether they match)
+- `1` - Error (invalid arguments, file not found, read error, etc.)
+
+**Note on Missing Files:**
+
+Files that don't exist in certain versions are reported separately and do not cause errors. This is expected behavior
+since features may be added or removed across versions.
+
+## Development
+
+### Project Structure
+
+```
+audit-cli/
+├── main.go # CLI entry point
+├── commands/ # Command implementations
+│ ├── extract/ # Extract parent command
+│ │ ├── extract.go # Parent command definition
+│ │ └── code-examples/ # Code examples subcommand
+│ │ ├── code_examples.go # Command logic
+│ │ ├── code_examples_test.go # Tests
+│ │ ├── parser.go # RST directive parsing
+│ │ ├── writer.go # File writing logic
+│ │ ├── report.go # Report generation
+│ │ ├── types.go # Type definitions
+│ │ └── language.go # Language normalization
+│ ├── search/ # Search parent command
+│ │ ├── search.go # Parent command definition
+│ │ └── find-string/ # Find string subcommand
+│ │ ├── find_string.go # Command logic
+│ │ ├── types.go # Type definitions
+│ │ └── report.go # Report generation
+│ ├── analyze/ # Analyze parent command
+│ │ ├── analyze.go # Parent command definition
+│ │ └── includes/ # Includes analysis subcommand
+│ │ ├── includes.go # Command logic
+│ │ ├── analyzer.go # Include tree building
+│ │ ├── output.go # Output formatting
+│ │ └── types.go # Type definitions
+│ └── compare/ # Compare parent command
+│ ├── compare.go # Parent command definition
+│ └── file-contents/ # File contents comparison subcommand
+│ ├── file_contents.go # Command logic
+│ ├── file_contents_test.go # Tests
+│ ├── comparer.go # Comparison logic
+│ ├── differ.go # Diff generation
+│ ├── output.go # Output formatting
+│ ├── types.go # Type definitions
+│ └── version_resolver.go # Version path resolution
+├── internal/ # Internal packages
+│ └── rst/ # RST parsing utilities
+│ ├── parser.go # Generic parsing with includes
+│ ├── include_resolver.go # Include directive resolution
+│ ├── directive_parser.go # Directive parsing
+│ └── file_utils.go # File utilities
+└── testdata/ # Test fixtures
+ ├── input-files/ # Test RST files
+ │ └── source/ # Source directory (required)
+ │ ├── *.rst # Test files
+ │ ├── includes/ # Included RST files
+ │ └── code-examples/ # Code files for literalinclude
+ ├── expected-output/ # Expected extraction results
+ └── compare/ # Compare command test data
+ ├── product/ # Version structure tests
+ │ ├── manual/ # Manual version
+ │ ├── upcoming/ # Upcoming version
+ │ └── v8.0/ # v8.0 version
+ └── *.txt # Direct comparison tests
+```
+
+### Adding New Commands
+
+#### 1. Adding a New Subcommand to an Existing Parent
+
+Example: Adding `extract tables` subcommand
+
+1. **Create the subcommand directory:**
+ ```bash
+ mkdir -p commands/extract/tables
+ ```
+
+2. **Create the command file** (`commands/extract/tables/tables.go`):
+ ```go
+ package tables
+
+ import (
+ "github.com/spf13/cobra"
+ )
+
+ func NewTablesCommand() *cobra.Command {
+ cmd := &cobra.Command{
+ Use: "tables [filepath]",
+ Short: "Extract tables from RST files",
+ Args: cobra.ExactArgs(1),
+ RunE: func(cmd *cobra.Command, args []string) error {
+ // Implementation here
+ return nil
+ },
+ }
+
+ // Add flags
+ cmd.Flags().StringP("output", "o", "./output", "Output directory")
+
+ return cmd
+ }
+ ```
+
+3. **Register the subcommand** in `commands/extract/extract.go`:
+ ```go
+ import (
+ "github.com/mongodb/code-example-tooling/audit-cli/commands/extract/tables"
+ )
+
+ func NewExtractCommand() *cobra.Command {
+ cmd := &cobra.Command{...}
+
+ cmd.AddCommand(codeexamples.NewCodeExamplesCommand())
+ cmd.AddCommand(tables.NewTablesCommand()) // Add this line
+
+ return cmd
+ }
+ ```
+
+#### 2. Adding a New Parent Command
+
+Example: Adding `analyze` parent command
+
+1. **Create the parent directory:**
+ ```bash
+ mkdir -p commands/analyze
+ ```
+
+2. **Create the parent command** (`commands/analyze/analyze.go`):
+ ```go
+ package analyze
+
+ import (
+ "github.com/spf13/cobra"
+ )
+
+ func NewAnalyzeCommand() *cobra.Command {
+ cmd := &cobra.Command{
+ Use: "analyze",
+ Short: "Analyze extracted content",
+ }
+
+ // Add subcommands here
+
+ return cmd
+ }
+ ```
+
+3. **Register in main.go:**
+ ```go
+ import (
+ "github.com/mongodb/code-example-tooling/audit-cli/commands/analyze"
+ )
+
+ func main() {
+ rootCmd.AddCommand(extract.NewExtractCommand())
+ rootCmd.AddCommand(search.NewSearchCommand())
+ rootCmd.AddCommand(analyze.NewAnalyzeCommand()) // Add this line
+ }
+ ```
+
+### Testing
+
+#### Running Tests
+
+```bash
+# Run all tests
+cd audit-cli
+go test ./...
+
+# Run tests for a specific package
+go test ./commands/extract/code-examples -v
+
+# Run a specific test
+go test ./commands/extract/code-examples -run TestRecursiveDirectoryScanning -v
+
+# Run tests with coverage
+go test ./... -cover
+```
+
+#### Test Structure
+
+Tests use a table-driven approach with test fixtures in the `testdata/` directory:
+
+- **Input files**: `testdata/input-files/source/` - RST files and referenced code
+- **Expected output**: `testdata/expected-output/` - Expected extracted files
+- **Test pattern**: Compare actual extraction output against expected files
+
+**Note**: The `testdata` directory name is special in Go - it's automatically ignored during builds, which is important
+since it contains non-Go files (`.cpp`, `.rst`, etc.).
+
+#### Adding New Tests
+
+1. **Create test input files** in `testdata/input-files/source/`:
+ ```bash
+ # Create a new test RST file
+ cat > testdata/input-files/source/my-test.rst << 'EOF'
+ .. code-block:: javascript
+
+ console.log("Hello, World!");
+ EOF
+ ```
+
+2. **Generate expected output**:
+ ```bash
+ ./audit-cli extract code-examples testdata/input-files/source/my-test.rst \
+ -o testdata/expected-output
+ ```
+
+3. **Verify the output** is correct before committing
+
+4. **Add test case** in the appropriate `*_test.go` file:
+ ```go
+ func TestMyNewFeature(t *testing.T) {
+ testDataDir := filepath.Join("..", "..", "..", "testdata")
+ inputFile := filepath.Join(testDataDir, "input-files", "source", "my-test.rst")
+ expectedDir := filepath.Join(testDataDir, "expected-output")
+
+ tempDir, err := os.MkdirTemp("", "test-*")
+ if err != nil {
+ t.Fatalf("Failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(tempDir)
+
+ report, err := RunExtract(inputFile, tempDir, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunExtract failed: %v", err)
+ }
+
+ // Add assertions here
+ }
+ ```
+
+#### Test Conventions
+
+- **Relative paths**: Tests use `filepath.Join("..", "..", "..", "testdata")` to reference test data (three levels up
+ from `commands/extract/code-examples/`)
+- **Temporary directories**: Use `os.MkdirTemp()` for test output, clean up with `defer os.RemoveAll()`
+- **Exact content matching**: Tests compare byte-for-byte content
+- **No trailing newlines**: Expected output files should not have trailing blank lines
+
+#### Updating Expected Output
+
+If you've changed the parsing logic and need to regenerate expected output:
+
+```bash
+cd audit-cli
+
+# Update all expected outputs
+./audit-cli extract code-examples testdata/input-files/source/literalinclude-test.rst \
+ -o testdata/expected-output
+
+./audit-cli extract code-examples testdata/input-files/source/code-block-test.rst \
+ -o testdata/expected-output
+
+./audit-cli extract code-examples testdata/input-files/source/nested-code-block-test.rst \
+ -o testdata/expected-output
+
+./audit-cli extract code-examples testdata/input-files/source/io-code-block-test.rst \
+ -o testdata/expected-output
+
+./audit-cli extract code-examples testdata/input-files/source/include-test.rst \
+ -o testdata/expected-output -f
+```
+
+**Important**: Always verify the new output is correct before committing!
+
+### Code Patterns
+
+#### 1. Command Structure Pattern
+
+All commands follow this pattern:
+
+```go
+package mycommand
+
+import "github.com/spf13/cobra"
+
+func NewMyCommand() *cobra.Command {
+ var flagVar string
+
+ cmd := &cobra.Command{
+ Use: "my-command [args]",
+ Short: "Brief description",
+ Long: "Detailed description",
+ Args: cobra.ExactArgs(1), // Or MinimumNArgs, etc.
+ RunE: func(cmd *cobra.Command, args []string) error {
+ // Get flag values
+ flagValue, _ := cmd.Flags().GetString("flag-name")
+
+ // Call the main logic function
+ return RunMyCommand(args[0], flagValue)
+ },
+ }
+
+ // Define flags
+ cmd.Flags().StringVarP(&flagVar, "flag-name", "f", "default", "Description")
+
+ return cmd
+}
+
+// Separate logic function for testability
+func RunMyCommand(arg string, flagValue string) error {
+ // Implementation here
+ return nil
+}
+```
+
+**Why this pattern?**
+- Separates command definition from logic
+- Makes logic testable without Cobra
+- Consistent across all commands
+
+#### 2. Error Handling Pattern
+
+Use descriptive error wrapping:
+
+```go
+import "fmt"
+
+// Wrap errors with context
+file, err := os.Open(filePath)
+if err != nil {
+ return fmt.Errorf("failed to open file %s: %w", filePath, err)
+}
+
+// Check for specific conditions
+if !fileInfo.IsDir() {
+ return fmt.Errorf("path %s is not a directory", path)
+}
+```
+
+#### 3. File Processing Pattern
+
+Use the scanner pattern for line-by-line processing:
+
+```go
+import (
+ "bufio"
+ "os"
+)
+
+func processFile(filePath string) error {
+ file, err := os.Open(filePath)
+ if err != nil {
+ return fmt.Errorf("failed to open file: %w", err)
+ }
+ defer file.Close()
+
+ scanner := bufio.NewScanner(file)
+ lineNum := 0
+
+ for scanner.Scan() {
+ lineNum++
+ line := scanner.Text()
+
+ // Process line
+ }
+
+ if err := scanner.Err(); err != nil {
+ return fmt.Errorf("error reading file: %w", err)
+ }
+
+ return nil
+}
+```
+
+#### 4. Directory Traversal Pattern
+
+Use `filepath.Walk` for recursive traversal:
+
+```go
+import (
+ "os"
+ "path/filepath"
+)
+
+func traverseDirectory(rootPath string, recursive bool) ([]string, error) {
+ var files []string
+
+ err := filepath.Walk(rootPath, func(path string, info os.FileInfo, err error) error {
+ if err != nil {
+ return err
+ }
+
+ // Skip subdirectories if not recursive
+ if !recursive && info.IsDir() && path != rootPath {
+ return filepath.SkipDir
+ }
+
+ // Collect files
+ if !info.IsDir() {
+ files = append(files, path)
+ }
+
+ return nil
+ })
+
+ return files, err
+}
+```
+
+#### 5. Testing Pattern
+
+Use table-driven tests where appropriate:
+
+```go
+func TestLanguageNormalization(t *testing.T) {
+ tests := []struct {
+ name string
+ input string
+ expected string
+ }{
+ {"TypeScript", "ts", "typescript"},
+ {"C++", "c++", "cpp"},
+ {"Golang", "golang", "go"},
+ }
+
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ result := NormalizeLanguage(tt.input)
+ if result != tt.expected {
+ t.Errorf("NormalizeLanguage(%q) = %q, want %q",
+ tt.input, result, tt.expected)
+ }
+ })
+ }
+}
+```
+
+#### 6. Verbose Output Pattern
+
+Use a consistent pattern for verbose logging:
+
+```go
+func processWithVerbose(filePath string, verbose bool) error {
+ if verbose {
+ fmt.Printf("Processing: %s\n", filePath)
+ }
+
+ // Do work
+
+ if verbose {
+ fmt.Printf("Completed: %s\n", filePath)
+ }
+
+ return nil
+}
+```
+
+## Supported RST Directives
+
+### Code Example Extraction
+
+The tool extracts code examples from the following reStructuredText directives:
+
+#### 1. `literalinclude`
+
+Extracts code from external files with support for partial extraction and dedenting.
+
+**Syntax:**
+```rst
+.. literalinclude:: /path/to/file.py
+ :language: python
+ :start-after: start-tag
+ :end-before: end-tag
+ :dedent:
+```
+
+**Supported Options:**
+- `:language:` - Specifies the programming language (normalized: `ts` → `typescript`, `c++` → `cpp`, `golang` → `go`)
+- `:start-after:` - Extract content after this tag (skips the entire line containing the tag)
+- `:end-before:` - Extract content before this tag (cuts before the entire line containing the tag)
+- `:dedent:` - Remove common leading whitespace from the extracted content
+
+**Example:**
+
+Given `code-examples/example.py`:
+```python
+def main():
+ # start-example
+ result = calculate(42)
+ print(result)
+ # end-example
+```
+
+And RST:
+```rst
+.. literalinclude:: /code-examples/example.py
+ :language: python
+ :start-after: start-example
+ :end-before: end-example
+ :dedent:
+```
+
+Extracts:
+```python
+result = calculate(42)
+print(result)
+```
+
+#### 2. `code-block`
+
+Inline code blocks with automatic dedenting based on the first line's indentation.
+
+**Syntax:**
+```rst
+.. code-block:: javascript
+ :copyable: false
+ :emphasize-lines: 2,3
+
+ const greeting = "Hello, World!";
+ console.log(greeting);
+```
+
+**Supported Options:**
+- Language argument - `.. code-block:: javascript` (optional, defaults to `txt`)
+- `:language:` - Alternative way to specify language
+- `:copyable:` - Parsed but not used for extraction
+- `:emphasize-lines:` - Parsed but not used for extraction
+
+**Automatic Dedenting:**
+
+The content is automatically dedented based on the indentation of the first content line. For example:
+
+```rst
+.. note::
+
+ .. code-block:: python
+
+ def hello():
+ print("Hello")
+```
+
+The code has 6 spaces of indentation (3 from `note`, 3 from `code-block`). The tool automatically removes these 6 spaces,
+resulting in:
+
+```python
+def hello():
+ print("Hello")
+```
+
+#### 3. `io-code-block`
+
+Input/output code blocks for interactive examples with nested sub-directives.
+
+**Syntax:**
+```rst
+.. io-code-block::
+ :copyable: true
+
+ .. input::
+ :language: javascript
+
+ db.restaurants.aggregate([
+ { $match: { category: "cafe" } }
+ ])
+
+ .. output::
+ :language: json
+
+ [
+ { _id: 1, category: 'café', status: 'Open' }
+ ]
+```
+
+**Supported Options:**
+- `:copyable:` - Parsed but not used for extraction
+- Nested `.. input::` sub-directive (required)
+ - Can have filepath argument: `.. input:: /path/to/file.js`
+ - Or inline content with `:language:` option
+- Nested `.. output::` sub-directive (optional)
+ - Can have filepath argument: `.. output:: /path/to/output.txt`
+ - Or inline content with `:language:` option
+
+**File-based Content:**
+```rst
+.. io-code-block::
+
+ .. input:: /code-examples/query.js
+ :language: javascript
+
+ .. output:: /code-examples/result.json
+ :language: json
+```
+
+**Output Files:**
+
+Generates two files:
+- `{source}.io-code-block.{index}.input.{ext}` - The input code
+- `{source}.io-code-block.{index}.output.{ext}` - The output (if present)
+
+Example: `my-doc.io-code-block.1.input.js` and `my-doc.io-code-block.1.output.json`
+
+### Include handling
+
+#### 4. `include`
+
+Follows include directives to process entire documentation trees (when `-f` flag is used).
+
+**Syntax:**
+```rst
+.. include:: /includes/intro.rst
+```
+
+**Special MongoDB Conventions:**
+
+The tool handles several MongoDB-specific include patterns:
+
+##### Steps Files
+Converts directory-based paths to filename-based paths:
+- Input: `/includes/steps/run-mongodb-on-linux.rst`
+- Resolves to: `/includes/steps-run-mongodb-on-linux.yaml`
+
+##### Extracts and Release Files
+Resolves ref-based includes by searching YAML files:
+- Input: `/includes/extracts/install-mongodb.rst`
+- Searches: `/includes/extracts-*.yaml` for `ref: install-mongodb`
+- Resolves to: The YAML file containing that ref
+
+##### Template Variables
+Resolves template variables from YAML replacement sections:
+```yaml
+replacement:
+ release_specification_default: "/includes/release/install-windows-default.rst"
+```
+- Input: `{{release_specification_default}}`
+- Resolves to: `/includes/release/install-windows-default.rst`
+
+**Source Directory Resolution:**
+
+The tool walks up the directory tree to find a directory named "source" or containing a "source" subdirectory. This is
+used as the base for resolving relative include paths.
+
+## Internal Packages
+
+### `internal/rst`
+
+Provides reusable utilities for parsing and processing RST files:
+
+- **Include resolution** - Handles all include directive patterns
+- **Directory traversal** - Recursive file scanning
+- **Directive parsing** - Extracts structured data from RST directives
+- **Template variable resolution** - Resolves YAML-based template variables
+- **Source directory detection** - Finds the documentation root
+
+See the code in `internal/rst/` for implementation details.
+
+## Language Normalization
+
+The tool normalizes language identifiers to standard file extensions:
+
+| Input | Normalized | Extension |
+|-------|-----------|-----------|
+| `bash` | `bash` | `.sh` |
+| `c` | `c` | `.c` |
+| `c++` | `cpp` | `.cpp` |
+| `c#` | `csharp` | `.cs` |
+| `console` | `console` | `.sh` |
+| `cpp` | `cpp` | `.cpp` |
+| `cs` | `csharp` | `.cs` |
+| `csharp` | `csharp` | `.cs` |
+| `go` | `go` | `.go` |
+| `golang` | `go` | `.go` |
+| `java` | `java` | `.java` |
+| `javascript` | `javascript` | `.js` |
+| `js` | `javascript` | `.js` |
+| `kotlin` | `kotlin` | `.kt` |
+| `kt` | `kotlin` | `.kt` |
+| `php` | `php` | `.php` |
+| `powershell` | `powershell` | `.ps1` |
+| `ps1` | `powershell` | `.ps1` |
+| `ps5` | `ps5` | `.ps1` |
+| `py` | `python` | `.py` |
+| `python` | `python` | `.py` |
+| `rb` | `ruby` | `.rb` |
+| `rs` | `rust` | `.rs` |
+| `ruby` | `ruby` | `.rb` |
+| `rust` | `rust` | `.rs` |
+| `scala` | `scala` | `.scala` |
+| `sh` | `shell` | `.sh` |
+| `shell` | `shell` | `.sh` |
+| `swift` | `swift` | `.swift` |
+| `text` | `text` | `.txt` |
+| `ts` | `typescript` | `.ts` |
+| `txt` | `text` | `.txt` |
+| `typescript` | `typescript` | `.ts` |
+| (empty string) | `undefined` | `.txt` |
+| `none` | `undefined` | `.txt` |
+| (unknown) | (unchanged) | `.txt` |
+
+**Notes:**
+- Language identifiers are case-insensitive
+- Unknown languages are returned unchanged by `NormalizeLanguage()` but map to `.txt` extension
+- The normalization handles common aliases (e.g., `ts` → `typescript`, `golang` → `go`, `c++` → `cpp`)
+
+## Contributing
+
+When contributing to this project:
+
+1. **Follow the established patterns** - Use the command structure, error handling, and testing patterns described above
+2. **Write tests** - All new functionality should have corresponding tests
+3. **Update documentation** - Keep this README up to date with new features
+4. **Run tests before committing** - Ensure `go test ./...` passes
+5. **Use meaningful commit messages** - Describe what changed and why
diff --git a/audit-cli/commands/analyze/analyze.go b/audit-cli/commands/analyze/analyze.go
new file mode 100644
index 0000000..dd4f9a8
--- /dev/null
+++ b/audit-cli/commands/analyze/analyze.go
@@ -0,0 +1,34 @@
+// Package analyze provides the parent command for analyzing RST file structures.
+//
+// This package serves as the parent command for various analysis operations.
+// Currently supports:
+// - includes: Analyze include directive relationships in RST files
+//
+// Future subcommands could include analyzing cross-references, broken links, or content metrics.
+package analyze
+
+import (
+ "github.com/mongodb/code-example-tooling/audit-cli/commands/analyze/includes"
+ "github.com/spf13/cobra"
+)
+
+// NewAnalyzeCommand creates the analyze parent command.
+//
+// This command serves as a parent for various analysis operations on RST files.
+// It doesn't perform any operations itself but provides a namespace for subcommands.
+func NewAnalyzeCommand() *cobra.Command {
+ cmd := &cobra.Command{
+ Use: "analyze",
+ Short: "Analyze reStructuredText file structures",
+ Long: `Analyze various aspects of reStructuredText files and their relationships.
+
+Currently supports analyzing include directive relationships to understand file dependencies.
+Future subcommands may support analyzing cross-references, broken links, or content metrics.`,
+ }
+
+ // Add subcommands
+ cmd.AddCommand(includes.NewIncludesCommand())
+
+ return cmd
+}
+
diff --git a/audit-cli/commands/analyze/includes/analyzer.go b/audit-cli/commands/analyze/includes/analyzer.go
new file mode 100644
index 0000000..52c1ff1
--- /dev/null
+++ b/audit-cli/commands/analyze/includes/analyzer.go
@@ -0,0 +1,169 @@
+package includes
+
+import (
+ "fmt"
+ "os"
+ "path/filepath"
+
+ "github.com/mongodb/code-example-tooling/audit-cli/internal/rst"
+)
+
+// AnalyzeIncludes analyzes a file and builds a tree of include relationships.
+//
+// This function recursively follows include directives and builds both a tree structure
+// and a flat list of all files discovered. It tracks the maximum depth of nesting.
+//
+// Parameters:
+// - filePath: Path to the RST file to analyze
+// - verbose: If true, print detailed processing information
+//
+// Returns:
+// - *IncludeAnalysis: Analysis results including tree and file list
+// - error: Any error encountered during analysis
+func AnalyzeIncludes(filePath string, verbose bool) (*IncludeAnalysis, error) {
+ absPath, err := filepath.Abs(filePath)
+ if err != nil {
+ return nil, fmt.Errorf("failed to get absolute path: %w", err)
+ }
+
+ // Verify the file exists
+ if _, err := os.Stat(absPath); err != nil {
+ return nil, fmt.Errorf("file not found: %s", absPath)
+ }
+
+ if verbose {
+ fmt.Printf("Analyzing includes for: %s\n\n", absPath)
+ }
+
+ // Build the tree structure
+ visited := make(map[string]bool)
+ tree, err := buildIncludeTree(absPath, visited, verbose, 0)
+ if err != nil {
+ return nil, err
+ }
+
+ // Collect all unique files from the visited map
+ // The visited map contains all unique files that were processed
+ allFiles := make([]string, 0, len(visited))
+ for file := range visited {
+ allFiles = append(allFiles, file)
+ }
+
+ // Calculate max depth
+ maxDepth := calculateMaxDepth(tree, 0)
+
+ analysis := &IncludeAnalysis{
+ RootFile: absPath,
+ Tree: tree,
+ AllFiles: allFiles,
+ TotalFiles: len(allFiles),
+ MaxDepth: maxDepth,
+ }
+
+ return analysis, nil
+}
+
+// buildIncludeTree recursively builds a tree of include relationships.
+//
+// This function creates an IncludeNode for the given file and recursively
+// processes all files it includes, preventing circular includes.
+//
+// Parameters:
+// - filePath: Path to the file to process
+// - visited: Map tracking already-processed files (prevents circular includes)
+// - verbose: If true, print detailed processing information
+// - depth: Current depth in the tree (for verbose output)
+//
+// Returns:
+// - *IncludeNode: Tree node representing this file and its includes
+// - error: Any error encountered during processing
+func buildIncludeTree(filePath string, visited map[string]bool, verbose bool, depth int) (*IncludeNode, error) {
+ absPath, err := filepath.Abs(filePath)
+ if err != nil {
+ return nil, err
+ }
+
+ // Create the node for this file
+ node := &IncludeNode{
+ FilePath: absPath,
+ Children: []*IncludeNode{},
+ }
+
+ // Check if we've already visited this file (circular include)
+ if visited[absPath] {
+ if verbose {
+ indent := getIndent(depth)
+ fmt.Printf("%s⚠ Circular include detected: %s\n", indent, filepath.Base(absPath))
+ }
+ return node, nil
+ }
+ visited[absPath] = true
+
+ // Find include directives in this file
+ includeFiles, err := rst.FindIncludeDirectives(absPath)
+ if err != nil {
+ // Not a fatal error - file might not have includes
+ return node, nil
+ }
+
+ if verbose && len(includeFiles) > 0 {
+ indent := getIndent(depth)
+ fmt.Printf("%s📄 %s (%d includes)\n", indent, filepath.Base(absPath), len(includeFiles))
+ }
+
+ // Recursively process each included file
+ for _, includeFile := range includeFiles {
+ childNode, err := buildIncludeTree(includeFile, visited, verbose, depth+1)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "Warning: failed to process include %s: %v\n", includeFile, err)
+ continue
+ }
+ node.Children = append(node.Children, childNode)
+ }
+
+ return node, nil
+}
+
+// calculateMaxDepth calculates the maximum depth of the include tree.
+//
+// This function recursively traverses the tree to find the deepest nesting level.
+//
+// Parameters:
+// - node: Current node being processed
+// - currentDepth: Depth of the current node
+//
+// Returns:
+// - int: Maximum depth found in the tree
+func calculateMaxDepth(node *IncludeNode, currentDepth int) int {
+ if node == nil || len(node.Children) == 0 {
+ return currentDepth
+ }
+
+ maxChildDepth := currentDepth
+ for _, child := range node.Children {
+ childDepth := calculateMaxDepth(child, currentDepth+1)
+ if childDepth > maxChildDepth {
+ maxChildDepth = childDepth
+ }
+ }
+
+ return maxChildDepth
+}
+
+// getIndent returns an indentation string for the given depth level.
+//
+// This is used for verbose output to show the tree structure.
+//
+// Parameters:
+// - depth: Nesting depth level
+//
+// Returns:
+// - string: Indentation string (2 spaces per level)
+func getIndent(depth int) string {
+ indent := ""
+ for i := 0; i < depth; i++ {
+ indent += " "
+ }
+ return indent
+}
+
diff --git a/audit-cli/commands/analyze/includes/includes.go b/audit-cli/commands/analyze/includes/includes.go
new file mode 100644
index 0000000..bb0d5f6
--- /dev/null
+++ b/audit-cli/commands/analyze/includes/includes.go
@@ -0,0 +1,100 @@
+// Package includes provides functionality for analyzing include directive relationships.
+//
+// This package implements the "analyze includes" subcommand, which analyzes RST files
+// to understand their include directive relationships. It can display results as:
+// - A hierarchical tree structure showing include relationships
+// - A flat list of all files referenced through includes
+//
+// This helps writers understand the impact of changes to files that are widely included.
+package includes
+
+import (
+ "fmt"
+
+ "github.com/spf13/cobra"
+)
+
+// NewIncludesCommand creates the includes subcommand.
+//
+// This command analyzes include directive relationships in RST files.
+// Supports flags for different output formats (tree or list).
+//
+// Flags:
+// - --tree: Display results as a hierarchical tree structure
+// - --list: Display results as a flat list of all files
+// - -v, --verbose: Show detailed processing information
+func NewIncludesCommand() *cobra.Command {
+ var (
+ showTree bool
+ showList bool
+ verbose bool
+ )
+
+ cmd := &cobra.Command{
+ Use: "includes [filepath]",
+ Short: "Analyze include directive relationships in RST files",
+ Long: `Analyze include directive relationships to understand file dependencies.
+
+This command recursively follows .. include:: directives and shows all files
+that are referenced. This helps writers understand the impact of changes to
+files that are widely included across the documentation.
+
+Output formats:
+ --tree: Show hierarchical tree structure of includes
+ --list: Show flat list of all included files
+
+If neither flag is specified, shows a summary with basic statistics.`,
+ Args: cobra.ExactArgs(1),
+ RunE: func(cmd *cobra.Command, args []string) error {
+ filePath := args[0]
+ return runAnalyze(filePath, showTree, showList, verbose)
+ },
+ }
+
+ cmd.Flags().BoolVar(&showTree, "tree", false, "Display results as a hierarchical tree structure")
+ cmd.Flags().BoolVar(&showList, "list", false, "Display results as a flat list of all files")
+ cmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "Show detailed processing information")
+
+ return cmd
+}
+
+// runAnalyze executes the include analysis operation.
+//
+// This function analyzes the file's include relationships and displays
+// the results according to the specified flags.
+//
+// Parameters:
+// - filePath: Path to the RST file to analyze
+// - showTree: If true, display tree structure
+// - showList: If true, display flat list
+// - verbose: If true, show detailed processing information
+//
+// Returns:
+// - error: Any error encountered during analysis
+func runAnalyze(filePath string, showTree bool, showList bool, verbose bool) error {
+ // Perform the analysis
+ analysis, err := AnalyzeIncludes(filePath, verbose)
+ if err != nil {
+ return fmt.Errorf("failed to analyze includes: %w", err)
+ }
+
+ // Display results based on flags
+ if showTree && showList {
+ // Both flags specified - show both outputs
+ PrintTree(analysis)
+ fmt.Println()
+ PrintList(analysis)
+ } else if showTree {
+ // Only tree
+ PrintTree(analysis)
+ } else if showList {
+ // Only list
+ PrintList(analysis)
+ } else {
+ // Neither flag - show summary
+ PrintSummary(analysis)
+ }
+
+ return nil
+}
+
diff --git a/audit-cli/commands/analyze/includes/output.go b/audit-cli/commands/analyze/includes/output.go
new file mode 100644
index 0000000..bd33aa7
--- /dev/null
+++ b/audit-cli/commands/analyze/includes/output.go
@@ -0,0 +1,116 @@
+package includes
+
+import (
+ "fmt"
+ "path/filepath"
+)
+
+// PrintTree prints the include tree structure.
+//
+// This function displays the hierarchical relationship of includes using
+// tree-style formatting with box-drawing characters.
+//
+// Parameters:
+// - analysis: The analysis results containing the tree structure
+func PrintTree(analysis *IncludeAnalysis) {
+ fmt.Println("============================================================")
+ fmt.Println("INCLUDE TREE")
+ fmt.Println("============================================================")
+ fmt.Printf("Root File: %s\n", analysis.RootFile)
+ fmt.Printf("Total Files: %d\n", analysis.TotalFiles)
+ fmt.Printf("Max Depth: %d\n", analysis.MaxDepth)
+ fmt.Println("============================================================")
+ fmt.Println()
+
+ if analysis.Tree != nil {
+ printTreeNode(analysis.Tree, "", true, true)
+ }
+
+ fmt.Println()
+}
+
+// printTreeNode recursively prints a tree node with proper formatting.
+//
+// This function uses box-drawing characters to create a visual tree structure.
+//
+// Parameters:
+// - node: The node to print
+// - prefix: Prefix string for indentation
+// - isLast: Whether this is the last child of its parent
+// - isRoot: Whether this is the root node
+func printTreeNode(node *IncludeNode, prefix string, isLast bool, isRoot bool) {
+ if node == nil {
+ return
+ }
+
+ // Print the current node
+ if isRoot {
+ fmt.Printf("%s\n", filepath.Base(node.FilePath))
+ } else {
+ connector := "├── "
+ if isLast {
+ connector = "└── "
+ }
+ fmt.Printf("%s%s%s\n", prefix, connector, filepath.Base(node.FilePath))
+ }
+
+ // Print children
+ childPrefix := prefix
+ if !isRoot {
+ if isLast {
+ childPrefix += " "
+ } else {
+ childPrefix += "│ "
+ }
+ }
+
+ for i, child := range node.Children {
+ isLastChild := i == len(node.Children)-1
+ printTreeNode(child, childPrefix, isLastChild, false)
+ }
+}
+
+// PrintList prints a flat list of all included files.
+//
+// This function displays all files discovered through include directives
+// in the order they were discovered (depth-first traversal).
+//
+// Parameters:
+// - analysis: The analysis results containing the file list
+func PrintList(analysis *IncludeAnalysis) {
+ fmt.Println("============================================================")
+ fmt.Println("INCLUDE FILE LIST")
+ fmt.Println("============================================================")
+ fmt.Printf("Root File: %s\n", analysis.RootFile)
+ fmt.Printf("Total Files: %d\n", analysis.TotalFiles)
+ fmt.Println("============================================================")
+ fmt.Println()
+
+ for i, file := range analysis.AllFiles {
+ fmt.Printf("%3d. %s\n", i+1, file)
+ }
+
+ fmt.Println()
+}
+
+// PrintSummary prints a brief summary of the analysis.
+//
+// This function is used when neither --tree nor --list is specified,
+// providing basic statistics about the include structure.
+//
+// Parameters:
+// - analysis: The analysis results
+func PrintSummary(analysis *IncludeAnalysis) {
+ fmt.Println("============================================================")
+ fmt.Println("INCLUDE ANALYSIS SUMMARY")
+ fmt.Println("============================================================")
+ fmt.Printf("Root File: %s\n", analysis.RootFile)
+ fmt.Printf("Total Files: %d\n", analysis.TotalFiles)
+ fmt.Printf("Max Depth: %d\n", analysis.MaxDepth)
+ fmt.Println("============================================================")
+ fmt.Println()
+ fmt.Println("Use --tree to see the hierarchical structure")
+ fmt.Println("Use --list to see a flat list of all files")
+ fmt.Println()
+}
+
diff --git a/audit-cli/commands/analyze/includes/types.go b/audit-cli/commands/analyze/includes/types.go
new file mode 100644
index 0000000..5f7bcc9
--- /dev/null
+++ b/audit-cli/commands/analyze/includes/types.go
@@ -0,0 +1,23 @@
+package includes
+
+// IncludeNode represents a file and its included files in a tree structure.
+//
+// This type is used to build a hierarchical representation of include relationships,
+// where each node represents a file and its children are the files it includes.
+type IncludeNode struct {
+ FilePath string // Absolute path to the file
+ Children []*IncludeNode // Files included by this file
+}
+
+// IncludeAnalysis contains the results of analyzing include directives.
+//
+// This type holds both the tree structure and the flat list of all files
+// discovered through include directives.
+type IncludeAnalysis struct {
+ RootFile string // The original file that was analyzed
+ Tree *IncludeNode // Tree structure of include relationships
+ AllFiles []string // Flat list of all files (in order discovered)
+ TotalFiles int // Total number of unique files
+ MaxDepth int // Maximum depth of include nesting
+}
+
diff --git a/audit-cli/commands/compare/compare.go b/audit-cli/commands/compare/compare.go
new file mode 100644
index 0000000..e353ce9
--- /dev/null
+++ b/audit-cli/commands/compare/compare.go
@@ -0,0 +1,35 @@
+// Package compare provides the parent command for comparing files across versions.
+//
+// This package serves as the parent command for various comparison operations.
+// Currently supports:
+// - file-contents: Compare file contents across different versions
+//
+// Future subcommands could include comparing metadata, structure, or other aspects.
+package compare
+
+import (
+ "github.com/mongodb/code-example-tooling/audit-cli/commands/compare/file-contents"
+ "github.com/spf13/cobra"
+)
+
+// NewCompareCommand creates the compare parent command.
+//
+// This command serves as a parent for various comparison operations on documentation files.
+// It doesn't perform any operations itself but provides a namespace for subcommands.
+func NewCompareCommand() *cobra.Command {
+ cmd := &cobra.Command{
+ Use: "compare",
+ Short: "Compare files across different versions",
+ Long: `Compare files across different versions of MongoDB documentation.
+
+Currently supports comparing file contents to identify differences between
+the same file across multiple documentation versions. This helps writers
+understand how content has diverged across versions and identify maintenance work.`,
+ }
+
+ // Add subcommands
+ cmd.AddCommand(file_contents.NewFileContentsCommand())
+
+ return cmd
+}
+
diff --git a/audit-cli/commands/compare/file-contents/comparer.go b/audit-cli/commands/compare/file-contents/comparer.go
new file mode 100644
index 0000000..deabaa0
--- /dev/null
+++ b/audit-cli/commands/compare/file-contents/comparer.go
@@ -0,0 +1,217 @@
+package file_contents
+
+import (
+ "fmt"
+ "os"
+ "path/filepath"
+)
+
+// CompareFiles performs a direct comparison between two files.
+//
+// This function compares two files directly without version resolution.
+//
+// Parameters:
+// - file1Path: Path to the first file
+// - file2Path: Path to the second file
+// - generateDiff: If true, generate unified diff for differences
+// - verbose: If true, show detailed processing information
+//
+// Returns:
+// - *ComparisonResult: The comparison result
+// - error: Any error encountered during comparison
+func CompareFiles(file1Path, file2Path string, generateDiff bool, verbose bool) (*ComparisonResult, error) {
+ if verbose {
+ fmt.Printf("Comparing files:\n")
+ fmt.Printf(" File 1: %s\n", file1Path)
+ fmt.Printf(" File 2: %s\n", file2Path)
+ }
+
+ // Read the reference file
+ content1, err := os.ReadFile(file1Path)
+ if err != nil {
+ return nil, fmt.Errorf("failed to read file %s: %w", file1Path, err)
+ }
+
+ // Read the comparison file
+ content2, err := os.ReadFile(file2Path)
+ if err != nil {
+ return nil, fmt.Errorf("failed to read file %s: %w", file2Path, err)
+ }
+
+ // Compare contents
+ result := &ComparisonResult{
+ ReferenceFile: file1Path,
+ TotalFiles: 1,
+ }
+
+ comparison := FileComparison{
+ Version: filepath.Base(filepath.Dir(file2Path)),
+ FilePath: file2Path,
+ }
+
+ if AreFilesIdentical(string(content1), string(content2)) {
+ comparison.Status = FileMatches
+ result.MatchingFiles = 1
+ } else {
+ comparison.Status = FileDiffers
+ result.DifferingFiles = 1
+
+ if generateDiff {
+ diff, err := GenerateDiff(file1Path, string(content1), file2Path, string(content2))
+ if err != nil {
+ return nil, fmt.Errorf("failed to generate diff: %w", err)
+ }
+ comparison.Diff = diff
+ }
+ }
+
+ result.Comparisons = []FileComparison{comparison}
+
+ return result, nil
+}
+
+// CompareVersions performs a version-based comparison.
+//
+// This function compares a reference file against the same file across
+// multiple versions of the documentation.
+//
+// Parameters:
+// - referenceFile: Path to the reference file
+// - productDir: Path to the product directory
+// - versions: List of version identifiers to compare
+// - generateDiff: If true, generate unified diff for differences
+// - verbose: If true, show detailed processing information
+//
+// Returns:
+// - *ComparisonResult: The comparison result
+// - error: Any error encountered during comparison
+func CompareVersions(referenceFile, productDir string, versions []string, generateDiff bool, verbose bool) (*ComparisonResult, error) {
+ if verbose {
+ fmt.Printf("Comparing file across %d versions...\n", len(versions))
+ fmt.Printf(" Reference file: %s\n", referenceFile)
+ fmt.Printf(" Product directory: %s\n", productDir)
+ fmt.Printf(" Versions: %v\n", versions)
+ }
+
+ // Extract the reference version from the path
+ referenceVersion, err := ExtractVersionFromPath(referenceFile, productDir)
+ if err != nil {
+ return nil, fmt.Errorf("failed to extract version from reference file: %w", err)
+ }
+
+ if verbose {
+ fmt.Printf(" Reference version: %s\n", referenceVersion)
+ }
+
+ // Read the reference file
+ referenceContent, err := os.ReadFile(referenceFile)
+ if err != nil {
+ return nil, fmt.Errorf("failed to read reference file %s: %w", referenceFile, err)
+ }
+
+ // Resolve version paths
+ versionPaths, err := ResolveVersionPaths(referenceFile, productDir, versions)
+ if err != nil {
+ return nil, fmt.Errorf("failed to resolve version paths: %w", err)
+ }
+
+ // Initialize result
+ result := &ComparisonResult{
+ ReferenceFile: referenceFile,
+ ReferenceVersion: referenceVersion,
+ TotalFiles: len(versionPaths),
+ }
+
+ // Compare each version
+ for _, vp := range versionPaths {
+ if verbose {
+ fmt.Printf(" Checking %s: %s\n", vp.Version, vp.FilePath)
+ }
+
+ comparison := compareFile(referenceFile, string(referenceContent), vp, generateDiff, verbose)
+ result.Comparisons = append(result.Comparisons, comparison)
+
+ // Update counters
+ switch comparison.Status {
+ case FileMatches:
+ result.MatchingFiles++
+ case FileDiffers:
+ result.DifferingFiles++
+ case FileNotFound:
+ result.NotFoundFiles++
+ case FileError:
+ result.ErrorFiles++
+ }
+ }
+
+ return result, nil
+}
+
+// compareFile compares a single version file against the reference content.
+//
+// This is an internal helper function used by CompareVersions.
+//
+// Parameters:
+// - referencePath: Path to the reference file (for diff labels)
+// - referenceContent: Content of the reference file
+// - versionPath: The version path to compare
+// - generateDiff: If true, generate unified diff for differences
+// - verbose: If true, show detailed processing information
+//
+// Returns:
+// - FileComparison: The comparison result for this file
+func compareFile(referencePath, referenceContent string, versionPath VersionPath, generateDiff bool, verbose bool) FileComparison {
+ comparison := FileComparison{
+ Version: versionPath.Version,
+ FilePath: versionPath.FilePath,
+ }
+
+ // Check if file exists
+ if _, err := os.Stat(versionPath.FilePath); os.IsNotExist(err) {
+ comparison.Status = FileNotFound
+ if verbose {
+ fmt.Printf(" → File not found\n")
+ }
+ return comparison
+ }
+
+ // Read the file
+ content, err := os.ReadFile(versionPath.FilePath)
+ if err != nil {
+ comparison.Status = FileError
+ comparison.Error = fmt.Errorf("failed to read file: %w", err)
+ if verbose {
+ fmt.Printf(" → Error reading file: %v\n", err)
+ }
+ return comparison
+ }
+
+ // Compare contents
+ if AreFilesIdentical(referenceContent, string(content)) {
+ comparison.Status = FileMatches
+ if verbose {
+ fmt.Printf(" → Matches\n")
+ }
+ } else {
+ comparison.Status = FileDiffers
+ if verbose {
+ fmt.Printf(" → Differs\n")
+ }
+
+ if generateDiff {
+ diff, err := GenerateDiff(referencePath, referenceContent, versionPath.FilePath, string(content))
+ if err != nil {
+ comparison.Status = FileError
+ comparison.Error = fmt.Errorf("failed to generate diff: %w", err)
+ if verbose {
+ fmt.Printf(" → Error generating diff: %v\n", err)
+ }
+ } else {
+ comparison.Diff = diff
+ }
+ }
+ }
+
+ return comparison
+}
+
diff --git a/audit-cli/commands/compare/file-contents/differ.go b/audit-cli/commands/compare/file-contents/differ.go
new file mode 100644
index 0000000..7e11e70
--- /dev/null
+++ b/audit-cli/commands/compare/file-contents/differ.go
@@ -0,0 +1,81 @@
+package file_contents
+
+import (
+ "github.com/aymanbagabas/go-udiff"
+)
+
+// GenerateDiff generates a unified diff between two file contents.
+//
+// This function uses the Myers diff algorithm to compute the differences
+// between two strings and formats the output as a unified diff.
+//
+// Parameters:
+// - fromName: Name/label for the "from" file (e.g., "manual/source/file.rst")
+// - fromContent: Content of the "from" file
+// - toName: Name/label for the "to" file (e.g., "v8.0/source/file.rst")
+// - toContent: Content of the "to" file
+//
+// Returns:
+// - string: The unified diff output, or empty string if files are identical
+// - error: Any error encountered during diff generation
+func GenerateDiff(fromName, fromContent, toName, toContent string) (string, error) {
+ // If contents are identical, return empty string
+ if fromContent == toContent {
+ return "", nil
+ }
+
+ // Generate unified diff using go-udiff
+ // This uses the default number of context lines (3)
+ diff := udiff.Unified(fromName, toName, fromContent, toContent)
+
+ return diff, nil
+}
+
+// GenerateDiffWithContext generates a unified diff with custom context lines.
+//
+// This function is similar to GenerateDiff but allows specifying the number
+// of context lines to include around changes.
+//
+// Parameters:
+// - fromName: Name/label for the "from" file
+// - fromContent: Content of the "from" file
+// - toName: Name/label for the "to" file
+// - toContent: Content of the "to" file
+// - contextLines: Number of context lines to show around changes (typically 3)
+//
+// Returns:
+// - string: The unified diff output, or empty string if files are identical
+// - error: Any error encountered during diff generation
+func GenerateDiffWithContext(fromName, fromContent, toName, toContent string, contextLines int) (string, error) {
+ // If contents are identical, return empty string
+ if fromContent == toContent {
+ return "", nil
+ }
+
+ // Compute edits
+ edits := udiff.Strings(fromContent, toContent)
+
+ // Generate unified diff with custom context lines
+ // ToUnified returns a string directly
+ diff, err := udiff.ToUnified(fromName, toName, fromContent, edits, contextLines)
+ if err != nil {
+ return "", err
+ }
+
+ return diff, nil
+}
+
+// AreFilesIdentical checks if two file contents are identical.
+//
+// This is a simple byte-by-byte comparison.
+//
+// Parameters:
+// - content1: First file content
+// - content2: Second file content
+//
+// Returns:
+// - bool: true if contents are identical, false otherwise
+func AreFilesIdentical(content1, content2 string) bool {
+ return content1 == content2
+}
+
diff --git a/audit-cli/commands/compare/file-contents/file_contents.go b/audit-cli/commands/compare/file-contents/file_contents.go
new file mode 100644
index 0000000..32a17bf
--- /dev/null
+++ b/audit-cli/commands/compare/file-contents/file_contents.go
@@ -0,0 +1,197 @@
+// Package file_contents provides functionality for comparing file contents across versions.
+//
+// This package implements the "compare file-contents" subcommand, which compares
+// file contents either directly between two files or across multiple versions of
+// MongoDB documentation.
+//
+// The command supports two modes:
+// 1. Direct comparison: Compare two specific files
+// 2. Version comparison: Compare the same file across multiple versions
+//
+// Output can be progressively detailed:
+// - Default: Summary of differences
+// - --show-paths: Include file paths
+// - --show-diff: Include unified diffs
+package file_contents
+
+import (
+ "fmt"
+ "strings"
+
+ "github.com/spf13/cobra"
+)
+
+// NewFileContentsCommand creates the file-contents subcommand.
+//
+// This command compares file contents either directly between two files
+// or across multiple versions of documentation.
+//
+// Usage modes:
+// 1. Direct comparison:
+// compare file-contents file1.rst file2.rst
+//
+// 2. Version comparison:
+// compare file-contents file.rst --product-dir /path/to/product --versions v1,v2,v3
+//
+// Flags:
+// - -p, --product-dir: Product directory path (required for version comparison)
+// - -V, --versions: Comma-separated list of versions (required for version comparison)
+// - --show-paths: Display file paths of files that differ
+// - -d, --show-diff: Display unified diff output
+// - -v, --verbose: Show detailed processing information
+func NewFileContentsCommand() *cobra.Command {
+ var (
+ productDir string
+ versions string
+ showPaths bool
+ showDiff bool
+ verbose bool
+ )
+
+ cmd := &cobra.Command{
+ Use: "file-contents [file1] [file2]",
+ Short: "Compare file contents across versions or between two files",
+ Long: `Compare file contents to identify differences.
+
+This command supports two modes:
+
+1. Direct comparison (two file arguments):
+ Compare two specific files directly.
+ Example: compare file-contents file1.rst file2.rst
+
+2. Version comparison (one file argument + flags):
+ Compare the same file across multiple documentation versions.
+ Example: compare file-contents /path/to/manual/manual/source/file.rst \
+ --product-dir /path/to/manual \
+ --versions manual,upcoming,v8.1,v8.0
+
+The command provides progressive output detail:
+ - Default: Summary of differences
+ - --show-paths: Include file paths grouped by status
+ - --show-diff: Include unified diffs (implies --show-paths)
+
+Files that don't exist in certain versions are reported separately and
+do not cause errors.`,
+ Args: cobra.RangeArgs(1, 2),
+ RunE: func(cmd *cobra.Command, args []string) error {
+ return runCompare(args, productDir, versions, showPaths, showDiff, verbose)
+ },
+ }
+
+ cmd.Flags().StringVarP(&productDir, "product-dir", "p", "", "Product directory path (e.g., /path/to/manual)")
+ cmd.Flags().StringVarP(&versions, "versions", "V", "", "Comma-separated list of versions (e.g., manual,upcoming,v8.1)")
+ cmd.Flags().BoolVar(&showPaths, "show-paths", false, "Display file paths of files that differ")
+ cmd.Flags().BoolVarP(&showDiff, "show-diff", "d", false, "Display unified diff output")
+ cmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "Show detailed processing information")
+
+ return cmd
+}
+
+// runCompare executes the comparison operation.
+//
+// This function validates arguments and delegates to the appropriate
+// comparison function based on the mode (direct or version comparison).
+//
+// Parameters:
+// - args: Command line arguments (1 or 2 file paths)
+// - productDir: Product directory path (for version comparison)
+// - versions: Comma-separated version list (for version comparison)
+// - showPaths: If true, show file paths
+// - showDiff: If true, show diffs
+// - verbose: If true, show detailed processing information
+//
+// Returns:
+// - error: Any error encountered during comparison
+func runCompare(args []string, productDir, versions string, showPaths, showDiff, verbose bool) error {
+ // Validate arguments based on mode
+ if len(args) == 2 {
+ // Direct comparison mode
+ if productDir != "" || versions != "" {
+ return fmt.Errorf("--product-dir and --versions cannot be used with two file arguments")
+ }
+ return runDirectComparison(args[0], args[1], showPaths, showDiff, verbose)
+ } else if len(args) == 1 {
+ // Version comparison mode
+ if productDir == "" {
+ return fmt.Errorf("--product-dir is required when comparing versions (use -p or --product-dir)")
+ }
+ if versions == "" {
+ return fmt.Errorf("--versions is required when comparing versions (use -V or --versions)")
+ }
+ return runVersionComparison(args[0], productDir, versions, showPaths, showDiff, verbose)
+ }
+
+ return fmt.Errorf("expected 1 or 2 file arguments")
+}
+
+// runDirectComparison performs a direct comparison between two files.
+//
+// Parameters:
+// - file1: Path to the first file
+// - file2: Path to the second file
+// - showPaths: If true, show file paths
+// - showDiff: If true, show diffs
+// - verbose: If true, show detailed processing information
+//
+// Returns:
+// - error: Any error encountered during comparison
+func runDirectComparison(file1, file2 string, showPaths, showDiff, verbose bool) error {
+ result, err := CompareFiles(file1, file2, showDiff, verbose)
+ if err != nil {
+ return fmt.Errorf("comparison failed: %w", err)
+ }
+
+ PrintComparisonResult(result, showPaths, showDiff)
+ return nil
+}
+
+// runVersionComparison performs a version-based comparison.
+//
+// Parameters:
+// - referenceFile: Path to the reference file
+// - productDir: Product directory path
+// - versionsStr: Comma-separated version list
+// - showPaths: If true, show file paths
+// - showDiff: If true, show diffs
+// - verbose: If true, show detailed processing information
+//
+// Returns:
+// - error: Any error encountered during comparison
+func runVersionComparison(referenceFile, productDir, versionsStr string, showPaths, showDiff, verbose bool) error {
+ // Parse versions
+ versionList := parseVersions(versionsStr)
+ if len(versionList) == 0 {
+ return fmt.Errorf("no versions specified")
+ }
+
+ result, err := CompareVersions(referenceFile, productDir, versionList, showDiff, verbose)
+ if err != nil {
+ return fmt.Errorf("comparison failed: %w", err)
+ }
+
+ PrintComparisonResult(result, showPaths, showDiff)
+ return nil
+}
+
+// parseVersions parses a comma-separated version string into a slice.
+//
+// This function splits the version string by commas and trims whitespace
+// from each version identifier.
+//
+// Parameters:
+// - versionsStr: Comma-separated version string (e.g., "manual, upcoming, v8.1")
+//
+// Returns:
+// - []string: List of version identifiers
+func parseVersions(versionsStr string) []string {
+ parts := strings.Split(versionsStr, ",")
+ var versions []string
+ for _, part := range parts {
+ trimmed := strings.TrimSpace(part)
+ if trimmed != "" {
+ versions = append(versions, trimmed)
+ }
+ }
+ return versions
+}
+
diff --git a/audit-cli/commands/compare/file-contents/file_contents_test.go b/audit-cli/commands/compare/file-contents/file_contents_test.go
new file mode 100644
index 0000000..2f9cb4a
--- /dev/null
+++ b/audit-cli/commands/compare/file-contents/file_contents_test.go
@@ -0,0 +1,535 @@
+package file_contents
+
+import (
+ "strings"
+ "testing"
+)
+
+// TestCompareFiles tests direct file comparison
+func TestCompareFiles(t *testing.T) {
+ testDataDir := "../../../testdata/compare"
+
+ tests := []struct {
+ name string
+ file1 string
+ file2 string
+ generateDiff bool
+ expectError bool
+ expectDiff bool
+ expectMatching bool
+ }{
+ {
+ name: "different files without diff",
+ file1: testDataDir + "/file1.txt",
+ file2: testDataDir + "/file2.txt",
+ generateDiff: false,
+ expectError: false,
+ expectDiff: true,
+ expectMatching: false,
+ },
+ {
+ name: "different files with diff",
+ file1: testDataDir + "/file1.txt",
+ file2: testDataDir + "/file2.txt",
+ generateDiff: true,
+ expectError: false,
+ expectDiff: true,
+ expectMatching: false,
+ },
+ {
+ name: "identical files",
+ file1: testDataDir + "/identical1.txt",
+ file2: testDataDir + "/identical2.txt",
+ generateDiff: false,
+ expectError: false,
+ expectDiff: false,
+ expectMatching: true,
+ },
+ {
+ name: "nonexistent file",
+ file1: testDataDir + "/file1.txt",
+ file2: testDataDir + "/nonexistent.txt",
+ expectError: true,
+ },
+ }
+
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ result, err := CompareFiles(tt.file1, tt.file2, tt.generateDiff, false)
+
+ if tt.expectError {
+ if err == nil {
+ t.Errorf("expected error but got none")
+ }
+ return
+ }
+
+ if err != nil {
+ t.Fatalf("unexpected error: %v", err)
+ }
+
+ if result == nil {
+ t.Fatal("expected result but got nil")
+ }
+
+ if tt.expectMatching && result.MatchingFiles != 1 {
+ t.Errorf("expected 1 matching file, got %d", result.MatchingFiles)
+ }
+
+ if tt.expectDiff && result.DifferingFiles != 1 {
+ t.Errorf("expected 1 differing file, got %d", result.DifferingFiles)
+ }
+
+ if tt.generateDiff && tt.expectDiff {
+ if len(result.Comparisons) == 0 {
+ t.Fatal("expected comparisons but got none")
+ }
+ if result.Comparisons[0].Diff == "" {
+ t.Error("expected diff output but got empty string")
+ }
+ }
+ })
+ }
+}
+
+// TestCompareVersions tests version-based comparison
+func TestCompareVersions(t *testing.T) {
+ testDataDir := "../../../testdata/compare"
+
+ tests := []struct {
+ name string
+ referenceFile string
+ productDir string
+ versions []string
+ generateDiff bool
+ expectError bool
+ expectMatching int
+ expectDiffering int
+ expectNotFound int
+ }{
+ {
+ name: "compare across three versions",
+ referenceFile: testDataDir + "/product/manual/source/includes/example.rst",
+ productDir: testDataDir + "/product",
+ versions: []string{"manual", "upcoming", "v8.0"},
+ generateDiff: false,
+ expectError: false,
+ expectMatching: 1, // manual matches itself
+ expectDiffering: 2, // upcoming and v8.0 differ
+ expectNotFound: 0,
+ },
+ {
+ name: "compare with diff generation",
+ referenceFile: testDataDir + "/product/manual/source/includes/example.rst",
+ productDir: testDataDir + "/product",
+ versions: []string{"manual", "upcoming"},
+ generateDiff: true,
+ expectError: false,
+ expectMatching: 1,
+ expectDiffering: 1,
+ expectNotFound: 0,
+ },
+ {
+ name: "file not found in some versions",
+ referenceFile: testDataDir + "/product/manual/source/includes/new-feature.rst",
+ productDir: testDataDir + "/product",
+ versions: []string{"manual", "upcoming", "v8.0"},
+ generateDiff: false,
+ expectError: false,
+ expectMatching: 2, // manual and upcoming match
+ expectDiffering: 0,
+ expectNotFound: 1, // v8.0 doesn't have this file
+ },
+ }
+
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ result, err := CompareVersions(tt.referenceFile, tt.productDir, tt.versions, tt.generateDiff, false)
+
+ if tt.expectError {
+ if err == nil {
+ t.Errorf("expected error but got none")
+ }
+ return
+ }
+
+ if err != nil {
+ t.Fatalf("unexpected error: %v", err)
+ }
+
+ if result == nil {
+ t.Fatal("expected result but got nil")
+ }
+
+ if result.MatchingFiles != tt.expectMatching {
+ t.Errorf("expected %d matching files, got %d", tt.expectMatching, result.MatchingFiles)
+ }
+
+ if result.DifferingFiles != tt.expectDiffering {
+ t.Errorf("expected %d differing files, got %d", tt.expectDiffering, result.DifferingFiles)
+ }
+
+ if result.NotFoundFiles != tt.expectNotFound {
+ t.Errorf("expected %d not found files, got %d", tt.expectNotFound, result.NotFoundFiles)
+ }
+
+ if result.TotalFiles != len(tt.versions) {
+ t.Errorf("expected %d total files, got %d", len(tt.versions), result.TotalFiles)
+ }
+
+ // Verify diff generation if requested
+ if tt.generateDiff && tt.expectDiffering > 0 {
+ foundDiff := false
+ for _, comp := range result.Comparisons {
+ if comp.Status == FileDiffers && comp.Diff != "" {
+ foundDiff = true
+ break
+ }
+ }
+ if !foundDiff {
+ t.Error("expected diff output but none found")
+ }
+ }
+ })
+ }
+}
+
+// TestResolveVersionPaths tests version path resolution
+func TestResolveVersionPaths(t *testing.T) {
+ testDataDir := "../../../testdata/compare"
+
+ tests := []struct {
+ name string
+ referenceFile string
+ productDir string
+ versions []string
+ expectError bool
+ expectedPaths map[string]string // version -> expected path suffix
+ }{
+ {
+ name: "resolve paths for multiple versions",
+ referenceFile: testDataDir + "/product/manual/source/includes/example.rst",
+ productDir: testDataDir + "/product",
+ versions: []string{"manual", "upcoming", "v8.0"},
+ expectError: false,
+ expectedPaths: map[string]string{
+ "manual": "manual/source/includes/example.rst",
+ "upcoming": "upcoming/source/includes/example.rst",
+ "v8.0": "v8.0/source/includes/example.rst",
+ },
+ },
+ {
+ name: "file not under product dir",
+ referenceFile: "/some/other/path/file.rst",
+ productDir: testDataDir + "/product",
+ versions: []string{"manual"},
+ expectError: true,
+ },
+ }
+
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ paths, err := ResolveVersionPaths(tt.referenceFile, tt.productDir, tt.versions)
+
+ if tt.expectError {
+ if err == nil {
+ t.Errorf("expected error but got none")
+ }
+ return
+ }
+
+ if err != nil {
+ t.Fatalf("unexpected error: %v", err)
+ }
+
+ if len(paths) != len(tt.versions) {
+ t.Fatalf("expected %d paths, got %d", len(tt.versions), len(paths))
+ }
+
+ for _, vp := range paths {
+ expectedSuffix, ok := tt.expectedPaths[vp.Version]
+ if !ok {
+ t.Errorf("unexpected version: %s", vp.Version)
+ continue
+ }
+
+ if !strings.HasSuffix(vp.FilePath, expectedSuffix) {
+ t.Errorf("expected path to end with %s, got %s", expectedSuffix, vp.FilePath)
+ }
+ }
+ })
+ }
+}
+
+// TestExtractVersionFromPath tests version extraction from file paths
+func TestExtractVersionFromPath(t *testing.T) {
+ testDataDir := "../../../testdata/compare"
+
+ tests := []struct {
+ name string
+ filePath string
+ productDir string
+ expectedVersion string
+ expectError bool
+ }{
+ {
+ name: "extract manual version",
+ filePath: testDataDir + "/product/manual/source/includes/example.rst",
+ productDir: testDataDir + "/product",
+ expectedVersion: "manual",
+ expectError: false,
+ },
+ {
+ name: "extract v8.0 version",
+ filePath: testDataDir + "/product/v8.0/source/includes/example.rst",
+ productDir: testDataDir + "/product",
+ expectedVersion: "v8.0",
+ expectError: false,
+ },
+ {
+ name: "file not under product dir",
+ filePath: "/some/other/path/file.rst",
+ productDir: testDataDir + "/product",
+ expectError: true,
+ },
+ }
+
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ version, err := ExtractVersionFromPath(tt.filePath, tt.productDir)
+
+ if tt.expectError {
+ if err == nil {
+ t.Errorf("expected error but got none")
+ }
+ return
+ }
+
+ if err != nil {
+ t.Fatalf("unexpected error: %v", err)
+ }
+
+ if version != tt.expectedVersion {
+ t.Errorf("expected version %s, got %s", tt.expectedVersion, version)
+ }
+ })
+ }
+}
+
+// TestGenerateDiff tests unified diff generation
+func TestGenerateDiff(t *testing.T) {
+ tests := []struct {
+ name string
+ fromName string
+ fromContent string
+ toName string
+ toContent string
+ expectEmpty bool
+ }{
+ {
+ name: "identical content",
+ fromName: "file1.txt",
+ fromContent: "Line 1\nLine 2\n",
+ toName: "file2.txt",
+ toContent: "Line 1\nLine 2\n",
+ expectEmpty: true,
+ },
+ {
+ name: "different content",
+ fromName: "file1.txt",
+ fromContent: "Line 1\nLine 2\n",
+ toName: "file2.txt",
+ toContent: "Line 1\nLine 2 modified\n",
+ expectEmpty: false,
+ },
+ {
+ name: "added lines",
+ fromName: "file1.txt",
+ fromContent: "Line 1\n",
+ toName: "file2.txt",
+ toContent: "Line 1\nLine 2\n",
+ expectEmpty: false,
+ },
+ {
+ name: "removed lines",
+ fromName: "file1.txt",
+ fromContent: "Line 1\nLine 2\n",
+ toName: "file2.txt",
+ toContent: "Line 1\n",
+ expectEmpty: false,
+ },
+ }
+
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ diff, err := GenerateDiff(tt.fromName, tt.fromContent, tt.toName, tt.toContent)
+ if err != nil {
+ t.Fatalf("unexpected error: %v", err)
+ }
+
+ if tt.expectEmpty {
+ if diff != "" {
+ t.Errorf("expected empty diff but got: %s", diff)
+ }
+ } else {
+ if diff == "" {
+ t.Error("expected non-empty diff but got empty string")
+ }
+ // Verify it's a unified diff format
+ if !strings.Contains(diff, "---") || !strings.Contains(diff, "+++") {
+ t.Errorf("expected unified diff format but got: %s", diff)
+ }
+ }
+ })
+ }
+}
+
+// TestAreFilesIdentical tests file identity checking
+func TestAreFilesIdentical(t *testing.T) {
+ tests := []struct {
+ name string
+ content1 string
+ content2 string
+ identical bool
+ }{
+ {
+ name: "identical content",
+ content1: "Hello, world!\n",
+ content2: "Hello, world!\n",
+ identical: true,
+ },
+ {
+ name: "different content",
+ content1: "Hello, world!\n",
+ content2: "Hello, Go!\n",
+ identical: false,
+ },
+ {
+ name: "empty strings",
+ content1: "",
+ content2: "",
+ identical: true,
+ },
+ {
+ name: "whitespace difference",
+ content1: "Hello\n",
+ content2: "Hello \n",
+ identical: false,
+ },
+ }
+
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ result := AreFilesIdentical(tt.content1, tt.content2)
+ if result != tt.identical {
+ t.Errorf("expected %v but got %v", tt.identical, result)
+ }
+ })
+ }
+}
+
+// TestComparisonResultMethods tests ComparisonResult helper methods
+func TestComparisonResultMethods(t *testing.T) {
+ t.Run("HasDifferences", func(t *testing.T) {
+ result := &ComparisonResult{
+ DifferingFiles: 1,
+ }
+ if !result.HasDifferences() {
+ t.Error("expected HasDifferences to return true")
+ }
+
+ result.DifferingFiles = 0
+ if result.HasDifferences() {
+ t.Error("expected HasDifferences to return false")
+ }
+ })
+
+ t.Run("AllMatch", func(t *testing.T) {
+ result := &ComparisonResult{
+ MatchingFiles: 3,
+ DifferingFiles: 0,
+ ErrorFiles: 0,
+ }
+ if !result.AllMatch() {
+ t.Error("expected AllMatch to return true")
+ }
+
+ result.DifferingFiles = 1
+ if result.AllMatch() {
+ t.Error("expected AllMatch to return false when files differ")
+ }
+
+ result.DifferingFiles = 0
+ result.ErrorFiles = 1
+ if result.AllMatch() {
+ t.Error("expected AllMatch to return false when errors exist")
+ }
+
+ result.ErrorFiles = 0
+ result.MatchingFiles = 0
+ if result.AllMatch() {
+ t.Error("expected AllMatch to return false when no matching files")
+ }
+ })
+}
+
+// TestParseVersions tests version string parsing
+func TestParseVersions(t *testing.T) {
+ tests := []struct {
+ name string
+ versionsStr string
+ expectedCount int
+ expectedVersion []string
+ }{
+ {
+ name: "single version",
+ versionsStr: "manual",
+ expectedCount: 1,
+ expectedVersion: []string{"manual"},
+ },
+ {
+ name: "multiple versions",
+ versionsStr: "manual,upcoming,v8.0",
+ expectedCount: 3,
+ expectedVersion: []string{"manual", "upcoming", "v8.0"},
+ },
+ {
+ name: "versions with spaces",
+ versionsStr: "manual, upcoming, v8.0",
+ expectedCount: 3,
+ expectedVersion: []string{"manual", "upcoming", "v8.0"},
+ },
+ {
+ name: "empty string",
+ versionsStr: "",
+ expectedCount: 0,
+ expectedVersion: []string{},
+ },
+ {
+ name: "trailing comma",
+ versionsStr: "manual,upcoming,",
+ expectedCount: 2,
+ expectedVersion: []string{"manual", "upcoming"},
+ },
+ }
+
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ versions := parseVersions(tt.versionsStr)
+
+ if len(versions) != tt.expectedCount {
+ t.Errorf("expected %d versions, got %d", tt.expectedCount, len(versions))
+ }
+
+ for i, expected := range tt.expectedVersion {
+ if i >= len(versions) {
+ t.Errorf("missing expected version: %s", expected)
+ continue
+ }
+ if versions[i] != expected {
+ t.Errorf("expected version %s at index %d, got %s", expected, i, versions[i])
+ }
+ }
+ })
+ }
+}
diff --git a/audit-cli/commands/compare/file-contents/output.go b/audit-cli/commands/compare/file-contents/output.go
new file mode 100644
index 0000000..d9db245
--- /dev/null
+++ b/audit-cli/commands/compare/file-contents/output.go
@@ -0,0 +1,197 @@
+package file_contents
+
+import (
+ "fmt"
+ "strings"
+)
+
+// PrintComparisonResult prints the comparison result with progressive detail levels.
+//
+// The output format depends on the flags:
+// - Default: Summary only
+// - showPaths: Summary + file paths
+// - showDiff: Summary + paths + diffs
+//
+// Parameters:
+// - result: The comparison result to print
+// - showPaths: If true, show file paths
+// - showDiff: If true, show diffs (implies showPaths)
+func PrintComparisonResult(result *ComparisonResult, showPaths bool, showDiff bool) {
+ // If showDiff is true, we also need to show paths
+ if showDiff {
+ showPaths = true
+ }
+
+ // Print summary
+ printSummary(result)
+
+ // Print paths if requested
+ if showPaths {
+ fmt.Println()
+ printPaths(result)
+ }
+
+ // Print diffs if requested
+ if showDiff {
+ fmt.Println()
+ printDiffs(result)
+ }
+}
+
+// printSummary prints a summary of the comparison results.
+func printSummary(result *ComparisonResult) {
+ if result.ReferenceVersion != "" {
+ // Version comparison mode
+ fmt.Printf("Comparing file across %d versions...\n", result.TotalFiles)
+ } else {
+ // Direct comparison mode
+ fmt.Println("Comparing files...")
+ }
+
+ if result.AllMatch() {
+ // All files match
+ fmt.Printf("✓ All versions match (%d/%d files identical)\n", result.MatchingFiles, result.TotalFiles)
+ } else if result.HasDifferences() {
+ // Some files differ
+ fmt.Printf("⚠ Differences found: %d of %d versions differ", result.DifferingFiles, result.TotalFiles)
+ if result.ReferenceVersion != "" {
+ fmt.Printf(" from %s\n", result.ReferenceVersion)
+ } else {
+ fmt.Println()
+ }
+
+ // Show breakdown
+ if result.MatchingFiles > 0 {
+ fmt.Printf(" - %d version(s) match\n", result.MatchingFiles)
+ }
+ if result.DifferingFiles > 0 {
+ fmt.Printf(" - %d version(s) differ\n", result.DifferingFiles)
+ }
+ if result.NotFoundFiles > 0 {
+ fmt.Printf(" - %d version(s) not found (file does not exist)\n", result.NotFoundFiles)
+ }
+ if result.ErrorFiles > 0 {
+ fmt.Printf(" - %d version(s) had errors\n", result.ErrorFiles)
+ }
+
+ // Show hints
+ fmt.Println()
+ fmt.Println("Use --show-paths to see which files differ")
+ fmt.Println("Use --show-diff to see the differences")
+ } else if result.NotFoundFiles > 0 || result.ErrorFiles > 0 {
+ // No differences, but some files not found or had errors
+ fmt.Printf("✓ No differences found among existing files\n")
+ if result.NotFoundFiles > 0 {
+ fmt.Printf(" - %d version(s) not found (file does not exist)\n", result.NotFoundFiles)
+ }
+ if result.ErrorFiles > 0 {
+ fmt.Printf(" - %d version(s) had errors\n", result.ErrorFiles)
+ }
+ }
+}
+
+// printPaths prints the file paths grouped by status.
+func printPaths(result *ComparisonResult) {
+ // Group comparisons by status
+ var matching, differing, notFound, errors []FileComparison
+ for _, comp := range result.Comparisons {
+ switch comp.Status {
+ case FileMatches:
+ matching = append(matching, comp)
+ case FileDiffers:
+ differing = append(differing, comp)
+ case FileNotFound:
+ notFound = append(notFound, comp)
+ case FileError:
+ errors = append(errors, comp)
+ }
+ }
+
+ // Print matching files
+ if len(matching) > 0 {
+ fmt.Println("Files that match:")
+ for _, comp := range matching {
+ if comp.Version == result.ReferenceVersion {
+ fmt.Printf(" ✓ %s (reference)\n", comp.FilePath)
+ } else {
+ fmt.Printf(" ✓ %s\n", comp.FilePath)
+ }
+ }
+ }
+
+ // Print differing files
+ if len(differing) > 0 {
+ if len(matching) > 0 {
+ fmt.Println()
+ }
+ fmt.Println("Files that differ:")
+ for _, comp := range differing {
+ fmt.Printf(" ✗ %s\n", comp.FilePath)
+ }
+ }
+
+ // Print not found files
+ if len(notFound) > 0 {
+ if len(matching) > 0 || len(differing) > 0 {
+ fmt.Println()
+ }
+ fmt.Println("Files not found:")
+ for _, comp := range notFound {
+ fmt.Printf(" - %s\n", comp.FilePath)
+ }
+ }
+
+ // Print error files
+ if len(errors) > 0 {
+ if len(matching) > 0 || len(differing) > 0 || len(notFound) > 0 {
+ fmt.Println()
+ }
+ fmt.Println("Files with errors:")
+ for _, comp := range errors {
+ fmt.Printf(" ⚠ %s: %v\n", comp.FilePath, comp.Error)
+ }
+ }
+}
+
+// printDiffs prints the unified diffs for files that differ.
+func printDiffs(result *ComparisonResult) {
+ // Find files with diffs
+ var diffsToShow []FileComparison
+ for _, comp := range result.Comparisons {
+ if comp.Status == FileDiffers && comp.Diff != "" {
+ diffsToShow = append(diffsToShow, comp)
+ }
+ }
+
+ if len(diffsToShow) == 0 {
+ return
+ }
+
+ fmt.Println("Diffs:")
+ fmt.Println(strings.Repeat("=", 80))
+
+ for i, comp := range diffsToShow {
+ if i > 0 {
+ fmt.Println()
+ }
+
+ // Print header
+ if result.ReferenceVersion != "" {
+ fmt.Printf("Diff: %s vs %s\n", result.ReferenceVersion, comp.Version)
+ } else {
+ fmt.Printf("Diff: %s\n", comp.Version)
+ }
+ fmt.Println(strings.Repeat("-", 80))
+
+ // Print the diff
+ fmt.Print(comp.Diff)
+
+ // Ensure there's a newline at the end
+ if !strings.HasSuffix(comp.Diff, "\n") {
+ fmt.Println()
+ }
+ }
+
+ fmt.Println(strings.Repeat("=", 80))
+}
+
diff --git a/audit-cli/commands/compare/file-contents/types.go b/audit-cli/commands/compare/file-contents/types.go
new file mode 100644
index 0000000..85c95d6
--- /dev/null
+++ b/audit-cli/commands/compare/file-contents/types.go
@@ -0,0 +1,77 @@
+// Package file_contents provides functionality for comparing file contents across versions.
+package file_contents
+
+// FileStatus represents the status of a file in a comparison.
+type FileStatus int
+
+const (
+ // FileMatches indicates the file content matches the reference file
+ FileMatches FileStatus = iota
+ // FileDiffers indicates the file content differs from the reference file
+ FileDiffers
+ // FileNotFound indicates the file does not exist at the expected path
+ FileNotFound
+ // FileError indicates an error occurred while reading the file
+ FileError
+)
+
+// String returns a string representation of the FileStatus.
+func (s FileStatus) String() string {
+ switch s {
+ case FileMatches:
+ return "matches"
+ case FileDiffers:
+ return "differs"
+ case FileNotFound:
+ return "not found"
+ case FileError:
+ return "error"
+ default:
+ return "unknown"
+ }
+}
+
+// FileComparison represents the comparison result for a single file.
+type FileComparison struct {
+ // Version is the version identifier (e.g., "v8.0", "upcoming")
+ Version string
+ // FilePath is the absolute path to the file
+ FilePath string
+ // Status is the comparison status
+ Status FileStatus
+ // Error is any error encountered (only set if Status == FileError)
+ Error error
+ // Diff is the unified diff output (only set if Status == FileDiffers and diff was requested)
+ Diff string
+}
+
+// ComparisonResult represents the overall comparison result.
+type ComparisonResult struct {
+ // ReferenceFile is the path to the reference file being compared against
+ ReferenceFile string
+ // ReferenceVersion is the version of the reference file (empty for direct comparison)
+ ReferenceVersion string
+ // Comparisons is the list of file comparisons
+ Comparisons []FileComparison
+ // TotalFiles is the total number of files compared
+ TotalFiles int
+ // MatchingFiles is the number of files that match
+ MatchingFiles int
+ // DifferingFiles is the number of files that differ
+ DifferingFiles int
+ // NotFoundFiles is the number of files not found
+ NotFoundFiles int
+ // ErrorFiles is the number of files with errors
+ ErrorFiles int
+}
+
+// HasDifferences returns true if any files differ from the reference.
+func (r *ComparisonResult) HasDifferences() bool {
+ return r.DifferingFiles > 0
+}
+
+// AllMatch returns true if all files match the reference (excluding not found files).
+func (r *ComparisonResult) AllMatch() bool {
+ return r.DifferingFiles == 0 && r.ErrorFiles == 0 && r.MatchingFiles > 0
+}
+
diff --git a/audit-cli/commands/compare/file-contents/version_resolver.go b/audit-cli/commands/compare/file-contents/version_resolver.go
new file mode 100644
index 0000000..4f42d55
--- /dev/null
+++ b/audit-cli/commands/compare/file-contents/version_resolver.go
@@ -0,0 +1,160 @@
+package file_contents
+
+import (
+ "fmt"
+ "path/filepath"
+ "strings"
+)
+
+// VersionPath represents a resolved file path for a specific version.
+type VersionPath struct {
+ Version string
+ FilePath string
+}
+
+// ResolveVersionPaths resolves file paths for all specified versions.
+//
+// Given a reference file path and a list of versions, this function constructs
+// the corresponding file paths for each version by replacing the version segment
+// in the path.
+//
+// Example:
+// Input: /path/to/manual/manual/source/includes/file.rst
+// Versions: [manual, upcoming, v8.1, v8.0]
+// Output:
+// - manual: /path/to/manual/manual/source/includes/file.rst
+// - upcoming: /path/to/manual/upcoming/source/includes/file.rst
+// - v8.1: /path/to/manual/v8.1/source/includes/file.rst
+// - v8.0: /path/to/manual/v8.0/source/includes/file.rst
+//
+// Parameters:
+// - referenceFile: The absolute path to the reference file
+// - productDir: The absolute path to the product directory (e.g., /path/to/manual)
+// - versions: List of version identifiers
+//
+// Returns:
+// - []VersionPath: List of resolved version paths
+// - error: Any error encountered during resolution
+func ResolveVersionPaths(referenceFile string, productDir string, versions []string) ([]VersionPath, error) {
+ // Clean the paths
+ referenceFile = filepath.Clean(referenceFile)
+ productDir = filepath.Clean(productDir)
+
+ // Ensure productDir ends with a separator for proper prefix matching
+ if !strings.HasSuffix(productDir, string(filepath.Separator)) {
+ productDir += string(filepath.Separator)
+ }
+
+ // Check if referenceFile is under productDir
+ if !strings.HasPrefix(referenceFile, productDir) {
+ return nil, fmt.Errorf("reference file %s is not under product directory %s", referenceFile, productDir)
+ }
+
+ // Extract the relative path from productDir
+ relativePath := strings.TrimPrefix(referenceFile, productDir)
+
+ // Find the version segment and the path after it
+ // Expected format: {version}/source/{rest-of-path}
+ parts := strings.Split(relativePath, string(filepath.Separator))
+ if len(parts) < 2 {
+ return nil, fmt.Errorf("invalid file path structure: expected {version}/source/... format, got %s", relativePath)
+ }
+
+ // Find the "source" directory
+ sourceIndex := -1
+ for i, part := range parts {
+ if part == "source" {
+ sourceIndex = i
+ break
+ }
+ }
+
+ if sourceIndex == -1 {
+ return nil, fmt.Errorf("could not find 'source' directory in path: %s", relativePath)
+ }
+
+ if sourceIndex == 0 {
+ return nil, fmt.Errorf("invalid path structure: 'source' cannot be the first segment in %s", relativePath)
+ }
+
+ // The version is the segment before "source"
+ // Everything from "source" onwards is the path we want to preserve
+ pathFromSource := strings.Join(parts[sourceIndex:], string(filepath.Separator))
+
+ // Build version paths
+ var versionPaths []VersionPath
+ for _, version := range versions {
+ versionPath := filepath.Join(productDir, version, pathFromSource)
+ versionPaths = append(versionPaths, VersionPath{
+ Version: version,
+ FilePath: versionPath,
+ })
+ }
+
+ return versionPaths, nil
+}
+
+// ExtractVersionFromPath extracts the version identifier from a file path.
+//
+// Given a file path under a product directory, this function extracts the
+// version segment (the directory name before "source").
+//
+// Example:
+// Input: /path/to/manual/v8.0/source/includes/file.rst
+// Product Dir: /path/to/manual
+// Output: v8.0
+//
+// Parameters:
+// - filePath: The absolute path to the file
+// - productDir: The absolute path to the product directory
+//
+// Returns:
+// - string: The version identifier
+// - error: Any error encountered during extraction
+func ExtractVersionFromPath(filePath string, productDir string) (string, error) {
+ // Clean the paths
+ filePath = filepath.Clean(filePath)
+ productDir = filepath.Clean(productDir)
+
+ // Ensure productDir ends with a separator for proper prefix matching
+ if !strings.HasSuffix(productDir, string(filepath.Separator)) {
+ productDir += string(filepath.Separator)
+ }
+
+ // Check if filePath is under productDir
+ if !strings.HasPrefix(filePath, productDir) {
+ return "", fmt.Errorf("file path %s is not under product directory %s", filePath, productDir)
+ }
+
+ // Extract the relative path from productDir
+ relativePath := strings.TrimPrefix(filePath, productDir)
+
+ // Split into parts
+ parts := strings.Split(relativePath, string(filepath.Separator))
+ if len(parts) < 2 {
+ return "", fmt.Errorf("invalid file path structure: expected {version}/source/... format, got %s", relativePath)
+ }
+
+ // Find the "source" directory
+ sourceIndex := -1
+ for i, part := range parts {
+ if part == "source" {
+ sourceIndex = i
+ break
+ }
+ }
+
+ if sourceIndex == -1 {
+ return "", fmt.Errorf("could not find 'source' directory in path: %s", relativePath)
+ }
+
+ if sourceIndex == 0 {
+ return "", fmt.Errorf("invalid path structure: 'source' cannot be the first segment in %s", relativePath)
+ }
+
+ // The version is the segment before "source"
+ version := parts[sourceIndex-1]
+
+ return version, nil
+}
+
diff --git a/audit-cli/commands/extract/code-examples/code_examples.go b/audit-cli/commands/extract/code-examples/code_examples.go
new file mode 100644
index 0000000..475806f
--- /dev/null
+++ b/audit-cli/commands/extract/code-examples/code_examples.go
@@ -0,0 +1,181 @@
+// Package code_examples provides functionality for extracting code examples from RST files.
+//
+// This package implements the "extract code-examples" subcommand, which parses
+// reStructuredText files and extracts code examples from various directives:
+// - literalinclude: External file references with optional partial extraction
+// - code-block: Inline code blocks with automatic dedenting
+// - io-code-block: Input/output examples with nested directives
+//
+// The extracted code examples are written to individual files with standardized naming:
+// {source-base}.{directive-type}.{index}.{ext}
+//
+// Supports recursive directory scanning and following include directives to process
+// entire documentation trees.
+package code_examples
+
+import (
+ "fmt"
+ "os"
+
+ "github.com/spf13/cobra"
+)
+
+// NewCodeExamplesCommand creates the code-examples subcommand.
+//
+// This command extracts code examples from RST files and writes them to individual
+// files in the output directory. Supports various flags for controlling behavior:
+// - -r, --recursive: Recursively scan directories for RST files
+// - -f, --follow-includes: Follow .. include:: directives
+// - -o, --output: Output directory for extracted files
+// - --dry-run: Show what would be extracted without writing files
+// - -v, --verbose: Show detailed processing information
+func NewCodeExamplesCommand() *cobra.Command {
+ var (
+ recursive bool
+ followIncludes bool
+ outputDir string
+ dryRun bool
+ verbose bool
+ )
+
+ cmd := &cobra.Command{
+ Use: "code-examples [filepath]",
+ Short: "Extract code examples from reStructuredText files",
+ Long: `Extract code examples from reStructuredText directives (code-block, literalinclude, io-code-block)
+and output them as individual files.`,
+ Args: cobra.ExactArgs(1),
+ RunE: func(cmd *cobra.Command, args []string) error {
+ filePath := args[0]
+ return runExtract(filePath, recursive, followIncludes, outputDir, dryRun, verbose)
+ },
+ }
+
+ cmd.Flags().BoolVarP(&recursive, "recursive", "r", false, "Recursively scan directories for files to process")
+ cmd.Flags().BoolVarP(&followIncludes, "follow-includes", "f", false, "Follow .. include:: directives in RST files")
+ cmd.Flags().StringVarP(&outputDir, "output", "o", "./output", "Output directory for code example files")
+ cmd.Flags().BoolVar(&dryRun, "dry-run", false, "Show what would be outputted without writing files")
+ cmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "Provide additional information during execution")
+
+ return cmd
+}
+
+// RunExtract executes the extraction operation and returns the report.
+//
+// This function is exported for use in tests. It extracts code examples from the
+// specified file or directory and writes them to the output directory.
+//
+// Parameters:
+// - filePath: Path to RST file or directory to process
+// - outputDir: Directory where extracted files will be written
+// - recursive: If true, recursively scan directories for RST files
+// - followIncludes: If true, follow .. include:: directives
+// - dryRun: If true, show what would be extracted without writing files
+// - verbose: If true, show detailed processing information
+//
+// Returns:
+// - *Report: Statistics about the extraction operation
+// - error: Any error encountered during extraction
+func RunExtract(filePath string, outputDir string, recursive bool, followIncludes bool, dryRun bool, verbose bool) (*Report, error) {
+ report, err := runExtractInternal(filePath, recursive, followIncludes, outputDir, dryRun, verbose)
+ return report, err
+}
+
+// runExtract executes the extraction operation (internal wrapper for CLI).
+//
+// This is a thin wrapper around runExtractInternal that discards the report
+// and only returns errors, suitable for use in the CLI command handler.
+func runExtract(filePath string, recursive bool, followIncludes bool, outputDir string, dryRun bool, verbose bool) error {
+ _, err := runExtractInternal(filePath, recursive, followIncludes, outputDir, dryRun, verbose)
+ return err
+}
+
+// runExtractInternal executes the extraction operation
+func runExtractInternal(filePath string, recursive bool, followIncludes bool, outputDir string, dryRun bool, verbose bool) (*Report, error) {
+ fileInfo, err := os.Stat(filePath)
+ if err != nil {
+ return nil, fmt.Errorf("failed to access path %s: %w", filePath, err)
+ }
+
+ report := NewReport()
+
+ var filesToProcess []string
+
+ if fileInfo.IsDir() {
+ if verbose {
+ fmt.Printf("Scanning directory: %s (recursive: %v)\n", filePath, recursive)
+ }
+ filesToProcess, err = TraverseDirectory(filePath, recursive)
+ if err != nil {
+ return nil, fmt.Errorf("failed to traverse directory: %w", err)
+ }
+ } else {
+ filesToProcess = []string{filePath}
+ }
+
+ var filteredFiles []string
+ for _, file := range filesToProcess {
+ if ShouldProcessFile(file) {
+ filteredFiles = append(filteredFiles, file)
+ }
+ }
+ filesToProcess = filteredFiles
+
+ if verbose {
+ fmt.Printf("Found %d files to process\n", len(filesToProcess))
+ }
+
+ if !dryRun {
+ if err := EnsureOutputDirectory(outputDir); err != nil {
+ return nil, fmt.Errorf("failed to create output directory: %w", err)
+ }
+ }
+
+ // Track visited files to prevent circular includes
+ visited := make(map[string]bool)
+
+ for _, file := range filesToProcess {
+ if verbose {
+ fmt.Printf("Processing: %s\n", file)
+ }
+
+ // Use ParseFileWithIncludes to follow include directives when followIncludes flag is set
+ examples, processedFiles, err := ParseFileWithIncludes(file, followIncludes, visited, verbose)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "Warning: failed to parse %s: %v\n", file, err)
+ continue
+ }
+
+ // Add all processed files (including includes) to the report
+ for _, processedFile := range processedFiles {
+ report.AddTraversedFile(processedFile)
+ }
+
+ for _, example := range examples {
+ outputPath, err := WriteCodeExample(example, outputDir, dryRun)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "Warning: failed to write code example: %v\n", err)
+ continue
+ }
+
+ if verbose {
+ if dryRun {
+ fmt.Printf(" [DRY RUN] Would write: %s\n", outputPath)
+ } else {
+ fmt.Printf(" Wrote: %s\n", outputPath)
+ }
+ }
+
+ report.AddCodeExample(example, outputPath)
+ if !dryRun {
+ report.OutputFilesWritten++
+ }
+ }
+ }
+
+ if dryRun {
+ fmt.Println("\n[DRY RUN MODE - No files were written]")
+ }
+ PrintReport(report, verbose)
+
+ return report, nil
+}
diff --git a/audit-cli/commands/extract/code-examples/code_examples_test.go b/audit-cli/commands/extract/code-examples/code_examples_test.go
new file mode 100644
index 0000000..f1432de
--- /dev/null
+++ b/audit-cli/commands/extract/code-examples/code_examples_test.go
@@ -0,0 +1,598 @@
+package code_examples
+
+import (
+ "os"
+ "path/filepath"
+ "testing"
+)
+
+// TestLiteralIncludeDirective tests the parsing and extraction of literalinclude directives
+func TestLiteralIncludeDirective(t *testing.T) {
+ // Setup paths
+ testDataDir := filepath.Join("..", "..", "..", "testdata")
+ inputFile := filepath.Join(testDataDir, "input-files", "source", "literalinclude-test.rst")
+ expectedOutputDir := filepath.Join(testDataDir, "expected-output")
+
+ // Create temporary output directory
+ tempDir, err := os.MkdirTemp("", "audit-test-*")
+ if err != nil {
+ t.Fatalf("Failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(tempDir)
+
+ // Run the extract command
+ report, err := RunExtract(inputFile, tempDir, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunExtract failed: %v", err)
+ }
+
+ // Verify the report
+ if report.FilesTraversed != 1 {
+ t.Errorf("Expected 1 file traversed, got %d", report.FilesTraversed)
+ }
+
+ if report.OutputFilesWritten != 7 {
+ t.Errorf("Expected 7 output files, got %d", report.OutputFilesWritten)
+ }
+
+ // Expected output files
+ expectedFiles := []string{
+ "literalinclude-test.literalinclude.1.py",
+ "literalinclude-test.literalinclude.2.go",
+ "literalinclude-test.literalinclude.3.js",
+ "literalinclude-test.literalinclude.4.php",
+ "literalinclude-test.literalinclude.5.rb",
+ "literalinclude-test.literalinclude.6.ts",
+ "literalinclude-test.literalinclude.7.cpp",
+ }
+
+ // Compare each output file with expected
+ for _, filename := range expectedFiles {
+ actualPath := filepath.Join(tempDir, filename)
+ expectedPath := filepath.Join(expectedOutputDir, filename)
+
+ // Read actual output
+ actualContent, err := os.ReadFile(actualPath)
+ if err != nil {
+ t.Errorf("Failed to read actual output file %s: %v", filename, err)
+ continue
+ }
+
+ // Read expected output
+ expectedContent, err := os.ReadFile(expectedPath)
+ if err != nil {
+ t.Errorf("Failed to read expected output file %s: %v", filename, err)
+ continue
+ }
+
+ // Compare content
+ if string(actualContent) != string(expectedContent) {
+ t.Errorf("Content mismatch for %s\nExpected:\n%s\n\nActual:\n%s",
+ filename, string(expectedContent), string(actualContent))
+ }
+ }
+
+ // Verify language counts
+ expectedLanguages := map[string]int{
+ "python": 1,
+ "go": 1,
+ "javascript": 1,
+ "php": 1,
+ "ruby": 1,
+ "typescript": 1,
+ "cpp": 1,
+ }
+
+ for lang, expectedCount := range expectedLanguages {
+ if actualCount := report.LanguageCounts[lang]; actualCount != expectedCount {
+ t.Errorf("Expected %d %s examples, got %d", expectedCount, lang, actualCount)
+ }
+ }
+
+ // Verify directive counts
+ if count := report.DirectiveCounts[LiteralInclude]; count != 7 {
+ t.Errorf("Expected 7 literalinclude directives, got %d", count)
+ }
+}
+
+// TestIncludeDirectiveFollowing tests that include directives are followed correctly
+func TestIncludeDirectiveFollowing(t *testing.T) {
+ // Setup paths
+ testDataDir := filepath.Join("..", "..", "..", "testdata")
+ inputFile := filepath.Join(testDataDir, "input-files", "source", "include-test.rst")
+ expectedOutputDir := filepath.Join(testDataDir, "expected-output")
+
+ // Create temporary output directory
+ tempDir, err := os.MkdirTemp("", "audit-test-*")
+ if err != nil {
+ t.Fatalf("Failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(tempDir)
+
+ // Run the extract command with include following enabled
+ report, err := RunExtract(inputFile, tempDir, false, true, false, false)
+ if err != nil {
+ t.Fatalf("RunExtract failed: %v", err)
+ }
+
+ // Verify that multiple files were traversed (main file + includes)
+ if report.FilesTraversed < 2 {
+ t.Errorf("Expected at least 2 files traversed (with includes), got %d", report.FilesTraversed)
+ }
+
+ // Verify output file was created
+ if report.OutputFilesWritten != 1 {
+ t.Errorf("Expected 1 output file, got %d", report.OutputFilesWritten)
+ }
+
+ // Compare output with expected
+ // The literalinclude is in examples.rst (included file), so output is named after that
+ actualPath := filepath.Join(tempDir, "examples.literalinclude.1.go")
+ expectedPath := filepath.Join(expectedOutputDir, "examples.literalinclude.1.go")
+
+ actualContent, err := os.ReadFile(actualPath)
+ if err != nil {
+ t.Fatalf("Failed to read actual output: %v", err)
+ }
+
+ expectedContent, err := os.ReadFile(expectedPath)
+ if err != nil {
+ t.Fatalf("Failed to read expected output: %v", err)
+ }
+
+ if string(actualContent) != string(expectedContent) {
+ t.Errorf("Content mismatch\nExpected:\n%s\n\nActual:\n%s",
+ string(expectedContent), string(actualContent))
+ }
+
+ // Verify the language was normalized (golang -> go)
+ if count := report.LanguageCounts["go"]; count != 1 {
+ t.Errorf("Expected 1 go example (normalized from golang), got %d", count)
+ }
+}
+
+// TestEmptyFile tests handling of files with no directives
+func TestCodeBlockDirective(t *testing.T) {
+ // Setup paths
+ testDataDir := filepath.Join("..", "..", "..", "testdata")
+ inputFile := filepath.Join(testDataDir, "input-files", "source", "code-block-test.rst")
+ expectedOutputDir := filepath.Join(testDataDir, "expected-output")
+
+ // Create temp directory for output
+ tempDir, err := os.MkdirTemp("", "audit-test-code-block-*")
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer os.RemoveAll(tempDir)
+
+ // Run extract on code-block test file
+ report, err := RunExtract(inputFile, tempDir, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunExtract failed: %v", err)
+ }
+
+ // Verify report
+ if report.FilesTraversed != 1 {
+ t.Errorf("Expected 1 file traversed, got %d", report.FilesTraversed)
+ }
+
+ if report.OutputFilesWritten != 7 {
+ t.Errorf("Expected 7 output files, got %d", report.OutputFilesWritten)
+ }
+
+ // Expected output files
+ expectedFiles := []string{
+ "code-block-test.code-block.1.js", // JavaScript with language
+ "code-block-test.code-block.2.py", // Python with options
+ "code-block-test.code-block.3.js", // JSON array example
+ "code-block-test.code-block.4.txt", // No language (undefined)
+ "code-block-test.code-block.5.sh", // Shell script
+ "code-block-test.code-block.6.ts", // TypeScript normalization
+ "code-block-test.code-block.7.cpp", // C++ normalization
+ }
+
+ // Compare each output file with expected
+ for _, filename := range expectedFiles {
+ actualPath := filepath.Join(tempDir, filename)
+ expectedPath := filepath.Join(expectedOutputDir, filename)
+
+ actualContent, err := os.ReadFile(actualPath)
+ if err != nil {
+ t.Errorf("Failed to read actual file %s: %v", filename, err)
+ continue
+ }
+
+ expectedContent, err := os.ReadFile(expectedPath)
+ if err != nil {
+ t.Errorf("Failed to read expected file %s: %v", filename, err)
+ continue
+ }
+
+ if string(actualContent) != string(expectedContent) {
+ t.Errorf("Content mismatch for %s\nExpected:\n%s\n\nActual:\n%s",
+ filename, string(expectedContent), string(actualContent))
+ }
+ }
+}
+
+func TestNestedCodeBlockDirective(t *testing.T) {
+ // Setup paths
+ testDataDir := filepath.Join("..", "..", "..", "testdata")
+ inputFile := filepath.Join(testDataDir, "input-files", "source", "nested-code-block-test.rst")
+ expectedOutputDir := filepath.Join(testDataDir, "expected-output")
+
+ // Create temp directory for output
+ tempDir, err := os.MkdirTemp("", "audit-test-nested-code-block-*")
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer os.RemoveAll(tempDir)
+
+ // Run extract on nested code-block test file
+ report, err := RunExtract(inputFile, tempDir, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunExtract failed: %v", err)
+ }
+
+ // Verify we found 11 code blocks
+ if report.OutputFilesWritten != 11 {
+ t.Errorf("Expected 11 output files, got %d", report.OutputFilesWritten)
+ }
+
+ // Verify all are code-block directives
+ if report.DirectiveCounts[CodeBlock] != 11 {
+ t.Errorf("Expected 11 code-block directives, got %d", report.DirectiveCounts[CodeBlock])
+ }
+
+ // Expected files and their languages
+ expectedFiles := map[string]string{
+ "nested-code-block-test.code-block.1.js": "javascript",
+ "nested-code-block-test.code-block.2.js": "javascript",
+ "nested-code-block-test.code-block.3.js": "javascript",
+ "nested-code-block-test.code-block.4.py": "python",
+ "nested-code-block-test.code-block.5.go": "go",
+ "nested-code-block-test.code-block.6.ts": "typescript",
+ "nested-code-block-test.code-block.7.ts": "typescript",
+ "nested-code-block-test.code-block.8.sh": "shell",
+ "nested-code-block-test.code-block.9.rb": "ruby",
+ "nested-code-block-test.code-block.10.rb": "ruby",
+ "nested-code-block-test.code-block.11.txt": "undefined",
+ }
+
+ // Verify each expected file exists and matches
+ for filename := range expectedFiles {
+ actualPath := filepath.Join(tempDir, filename)
+ expectedPath := filepath.Join(expectedOutputDir, filename)
+
+ // Check file exists
+ if _, err := os.Stat(actualPath); os.IsNotExist(err) {
+ t.Errorf("Expected output file not created: %s", filename)
+ continue
+ }
+
+ // Compare content
+ actualContent, err := os.ReadFile(actualPath)
+ if err != nil {
+ t.Errorf("Failed to read actual file %s: %v", filename, err)
+ continue
+ }
+
+ expectedContent, err := os.ReadFile(expectedPath)
+ if err != nil {
+ t.Errorf("Failed to read expected file %s: %v", filename, err)
+ continue
+ }
+
+ if string(actualContent) != string(expectedContent) {
+ t.Errorf("Content mismatch for %s\nExpected:\n%s\n\nActual:\n%s",
+ filename, string(expectedContent), string(actualContent))
+ }
+ }
+}
+
+func TestIoCodeBlockDirective(t *testing.T) {
+ // Setup paths
+ testDataDir := filepath.Join("..", "..", "..", "testdata")
+ inputFile := filepath.Join(testDataDir, "input-files", "source", "io-code-block-test.rst")
+ expectedOutputDir := filepath.Join(testDataDir, "expected-output")
+
+ // Create temp directory for output
+ tempDir, err := os.MkdirTemp("", "audit-test-io-code-block-*")
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer os.RemoveAll(tempDir)
+
+ // Run extract on io-code-block test file
+ report, err := RunExtract(inputFile, tempDir, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunExtract failed: %v", err)
+ }
+
+ // Verify we found 11 code examples (7 directives, but Test 2 fails, Test 7 has no output)
+ // Test 1: input + output = 2
+ // Test 2: fails (file not found) = 0
+ // Test 3: input + output = 2
+ // Test 4: input + output = 2
+ // Test 5: input + output = 2
+ // Test 6: input + output = 2
+ // Test 7: input only = 1
+ // Total: 11
+ if report.OutputFilesWritten != 11 {
+ t.Errorf("Expected 11 output files, got %d", report.OutputFilesWritten)
+ }
+
+ // Verify all are io-code-block directives
+ if report.DirectiveCounts[IoCodeBlock] != 11 {
+ t.Errorf("Expected 11 io-code-block examples, got %d", report.DirectiveCounts[IoCodeBlock])
+ }
+
+ // Expected files
+ expectedFiles := []string{
+ // Test 1: Inline input/output (JavaScript)
+ "io-code-block-test.io-code-block.1.input.js",
+ "io-code-block-test.io-code-block.1.output.js",
+ // Test 2: File-based (skipped - files don't exist)
+ // Test 3: Python inline
+ "io-code-block-test.io-code-block.3.input.py",
+ "io-code-block-test.io-code-block.3.output.py",
+ // Test 4: Shell command
+ "io-code-block-test.io-code-block.4.input.sh",
+ "io-code-block-test.io-code-block.4.output.txt",
+ // Test 5: TypeScript
+ "io-code-block-test.io-code-block.5.input.ts",
+ "io-code-block-test.io-code-block.5.output.txt",
+ // Test 6: Nested in procedure
+ "io-code-block-test.io-code-block.6.input.js",
+ "io-code-block-test.io-code-block.6.output.js",
+ // Test 7: Input only (Go)
+ "io-code-block-test.io-code-block.7.input.go",
+ }
+
+ // Verify each expected file exists and matches
+ for _, filename := range expectedFiles {
+ actualPath := filepath.Join(tempDir, filename)
+ expectedPath := filepath.Join(expectedOutputDir, filename)
+
+ // Check file exists
+ if _, err := os.Stat(actualPath); os.IsNotExist(err) {
+ t.Errorf("Expected output file not created: %s", filename)
+ continue
+ }
+
+ // Compare content
+ actualContent, err := os.ReadFile(actualPath)
+ if err != nil {
+ t.Errorf("Failed to read actual file %s: %v", filename, err)
+ continue
+ }
+
+ expectedContent, err := os.ReadFile(expectedPath)
+ if err != nil {
+ t.Errorf("Failed to read expected file %s: %v", filename, err)
+ continue
+ }
+
+ if string(actualContent) != string(expectedContent) {
+ t.Errorf("Content mismatch for %s\nExpected:\n%s\n\nActual:\n%s",
+ filename, string(expectedContent), string(actualContent))
+ }
+ }
+}
+
+func TestEmptyFile(t *testing.T) {
+ // Create a temporary file with no directives
+ tempDir, err := os.MkdirTemp("", "audit-test-*")
+ if err != nil {
+ t.Fatalf("Failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(tempDir)
+
+ // Create a source directory structure
+ sourceDir := filepath.Join(tempDir, "source")
+ if err := os.MkdirAll(sourceDir, 0755); err != nil {
+ t.Fatalf("Failed to create source directory: %v", err)
+ }
+
+ emptyFile := filepath.Join(sourceDir, "empty.rst")
+ if err := os.WriteFile(emptyFile, []byte("Empty File\n==========\n\nNo directives here."), 0644); err != nil {
+ t.Fatalf("Failed to create empty file: %v", err)
+ }
+
+ outputDir := filepath.Join(tempDir, "output")
+ if err := os.MkdirAll(outputDir, 0755); err != nil {
+ t.Fatalf("Failed to create output directory: %v", err)
+ }
+
+ // Run the extract command
+ report, err := RunExtract(emptyFile, outputDir, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunExtract failed: %v", err)
+ }
+
+ // Verify no output files were created
+ if report.OutputFilesWritten != 0 {
+ t.Errorf("Expected 0 output files for empty file, got %d", report.OutputFilesWritten)
+ }
+
+ // Verify the file was still traversed
+ if report.FilesTraversed != 1 {
+ t.Errorf("Expected 1 file traversed, got %d", report.FilesTraversed)
+ }
+}
+
+// TestRecursiveDirectoryScanning tests that -r flag scans all files in subdirectories
+func TestRecursiveDirectoryScanning(t *testing.T) {
+ // Setup paths
+ testDataDir := filepath.Join("..", "..", "..", "testdata")
+ inputDir := filepath.Join(testDataDir, "input-files", "source")
+
+ // Create temporary output directory
+ tempDir, err := os.MkdirTemp("", "audit-test-recursive-*")
+ if err != nil {
+ t.Fatalf("Failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(tempDir)
+
+ // Run the extract command with recursive=true, followIncludes=false
+ report, err := RunExtract(inputDir, tempDir, true, false, false, false)
+ if err != nil {
+ t.Fatalf("RunExtract failed: %v", err)
+ }
+
+ // Verify that multiple files were traversed
+ // Should find all .rst files in source/ and source/includes/
+ // Expected: code-block-test.rst, include-test.rst, io-code-block-test.rst,
+ // literalinclude-test.rst, nested-code-block-test.rst,
+ // includes/examples.rst, includes/intro.rst
+ expectedMinFiles := 7
+ if report.FilesTraversed < expectedMinFiles {
+ t.Errorf("Expected at least %d files traversed with recursive scan, got %d",
+ expectedMinFiles, report.FilesTraversed)
+ }
+
+ // Verify that code examples were extracted from multiple files
+ // Without following includes, include-test.rst should have 0 examples
+ // but all other files should have examples
+ if report.OutputFilesWritten < 30 {
+ t.Errorf("Expected at least 30 output files from recursive scan, got %d",
+ report.OutputFilesWritten)
+ }
+
+ // Verify we have examples from different directive types
+ if report.DirectiveCounts[CodeBlock] == 0 {
+ t.Error("Expected code-block directives to be found")
+ }
+ if report.DirectiveCounts[LiteralInclude] == 0 {
+ t.Error("Expected literalinclude directives to be found")
+ }
+ if report.DirectiveCounts[IoCodeBlock] == 0 {
+ t.Error("Expected io-code-block directives to be found")
+ }
+}
+
+// TestFollowIncludesWithoutRecursive tests that -f flag follows includes in a single file
+func TestFollowIncludesWithoutRecursive(t *testing.T) {
+ // Setup paths
+ testDataDir := filepath.Join("..", "..", "..", "testdata")
+ inputFile := filepath.Join(testDataDir, "input-files", "source", "include-test.rst")
+
+ // Create temporary output directory
+ tempDir, err := os.MkdirTemp("", "audit-test-follow-*")
+ if err != nil {
+ t.Fatalf("Failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(tempDir)
+
+ // Run the extract command with recursive=false, followIncludes=true
+ report, err := RunExtract(inputFile, tempDir, false, true, false, false)
+ if err != nil {
+ t.Fatalf("RunExtract failed: %v", err)
+ }
+
+ // Verify that multiple files were traversed (main file + includes)
+ // include-test.rst includes intro.rst and examples.rst
+ expectedFiles := 3
+ if report.FilesTraversed != expectedFiles {
+ t.Errorf("Expected %d files traversed (main + 2 includes), got %d",
+ expectedFiles, report.FilesTraversed)
+ }
+
+ // Verify that the code example from the included file was extracted
+ // examples.rst has 1 literalinclude directive
+ if report.OutputFilesWritten != 1 {
+ t.Errorf("Expected 1 output file from included files, got %d",
+ report.OutputFilesWritten)
+ }
+
+ // Verify the directive type
+ if report.DirectiveCounts[LiteralInclude] != 1 {
+ t.Errorf("Expected 1 literalinclude directive, got %d",
+ report.DirectiveCounts[LiteralInclude])
+ }
+}
+
+// TestRecursiveWithFollowIncludes tests that -r and -f together work correctly
+func TestRecursiveWithFollowIncludes(t *testing.T) {
+ // Setup paths
+ testDataDir := filepath.Join("..", "..", "..", "testdata")
+ inputDir := filepath.Join(testDataDir, "input-files", "source")
+
+ // Create temporary output directory
+ tempDir, err := os.MkdirTemp("", "audit-test-both-*")
+ if err != nil {
+ t.Fatalf("Failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(tempDir)
+
+ // Run the extract command with recursive=true, followIncludes=true
+ report, err := RunExtract(inputDir, tempDir, true, true, false, false)
+ if err != nil {
+ t.Fatalf("RunExtract failed: %v", err)
+ }
+
+ // Verify that multiple files were traversed
+ // Should find all .rst files in source/ and source/includes/
+ expectedMinFiles := 7
+ if report.FilesTraversed < expectedMinFiles {
+ t.Errorf("Expected at least %d files traversed, got %d",
+ expectedMinFiles, report.FilesTraversed)
+ }
+
+ // Verify that code examples were extracted
+ // This should be the same as recursive-only since the include files
+ // are already found by recursive directory scanning
+ if report.OutputFilesWritten < 30 {
+ t.Errorf("Expected at least 30 output files, got %d",
+ report.OutputFilesWritten)
+ }
+
+ // Verify we have examples from all directive types
+ if report.DirectiveCounts[CodeBlock] == 0 {
+ t.Error("Expected code-block directives to be found")
+ }
+ if report.DirectiveCounts[LiteralInclude] == 0 {
+ t.Error("Expected literalinclude directives to be found")
+ }
+ if report.DirectiveCounts[IoCodeBlock] == 0 {
+ t.Error("Expected io-code-block directives to be found")
+ }
+}
+
+// TestNoFlagsOnDirectory tests that without -r flag, directory is not scanned
+func TestNoFlagsOnDirectory(t *testing.T) {
+ // Setup paths
+ testDataDir := filepath.Join("..", "..", "..", "testdata")
+ inputDir := filepath.Join(testDataDir, "input-files", "source")
+
+ // Create temporary output directory
+ tempDir, err := os.MkdirTemp("", "audit-test-noflags-*")
+ if err != nil {
+ t.Fatalf("Failed to create temp directory: %v", err)
+ }
+ defer os.RemoveAll(tempDir)
+
+ // Run the extract command with recursive=false, followIncludes=false on a directory
+ report, err := RunExtract(inputDir, tempDir, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunExtract failed: %v", err)
+ }
+
+ // Without recursive flag, should only process files in the top-level directory
+ // Should NOT include files in includes/ subdirectory
+ // Expected: code-block-test.rst, duplicate-include-test.rst, include-test.rst,
+ // io-code-block-test.rst, literalinclude-test.rst, nested-code-block-test.rst,
+ // nested-include-test.rst (7 files)
+ expectedFiles := 7
+ if report.FilesTraversed != expectedFiles {
+ t.Errorf("Expected %d files traversed (top-level only), got %d",
+ expectedFiles, report.FilesTraversed)
+ }
+
+ // Without followIncludes, include-test.rst should have 0 examples
+ // So we should have examples from the other 4 files
+ if report.OutputFilesWritten < 30 {
+ t.Errorf("Expected at least 30 output files, got %d",
+ report.OutputFilesWritten)
+ }
+}
diff --git a/audit-cli/commands/extract/code-examples/language.go b/audit-cli/commands/extract/code-examples/language.go
new file mode 100644
index 0000000..2a5df93
--- /dev/null
+++ b/audit-cli/commands/extract/code-examples/language.go
@@ -0,0 +1,177 @@
+package code_examples
+
+import "strings"
+
+// Language constants define canonical language names used throughout the tool.
+// These are used for normalization and file extension mapping.
+const (
+ Bash = "bash"
+ C = "c"
+ CPP = "cpp"
+ CSharp = "csharp"
+ Console = "console"
+ Go = "go"
+ Java = "java"
+ JavaScript = "javascript"
+ Kotlin = "kotlin"
+ PHP = "php"
+ PowerShell = "powershell"
+ PS5 = "ps5"
+ Python = "python"
+ Ruby = "ruby"
+ Rust = "rust"
+ Scala = "scala"
+ Shell = "shell"
+ Swift = "swift"
+ Text = "text"
+ TypeScript = "typescript"
+ Undefined = "undefined"
+)
+
+// File extension constants define the file extensions for each language.
+// Used when generating output filenames for extracted code examples.
+const (
+ BashExtension = ".sh"
+ CExtension = ".c"
+ CPPExtension = ".cpp"
+ CSharpExtension = ".cs"
+ ConsoleExtension = ".sh"
+ GoExtension = ".go"
+ JavaExtension = ".java"
+ JavaScriptExtension = ".js"
+ KotlinExtension = ".kt"
+ PHPExtension = ".php"
+ PowerShellExtension = ".ps1"
+ PS5Extension = ".ps1"
+ PythonExtension = ".py"
+ RubyExtension = ".rb"
+ RustExtension = ".rs"
+ ScalaExtension = ".scala"
+ ShellExtension = ".sh"
+ SwiftExtension = ".swift"
+ TextExtension = ".txt"
+ TypeScriptExtension = ".ts"
+ UndefinedExtension = ".txt"
+)
+
+// GetFileExtensionFromLanguage returns the appropriate file extension for a given language.
+//
+// This function maps language identifiers to their corresponding file extensions.
+// Handles various language name variants (e.g., "ts" -> ".ts", "c++" -> ".cpp", "golang" -> ".go").
+// Returns ".txt" for unknown or undefined languages.
+//
+// Parameters:
+// - language: The language identifier (case-insensitive)
+//
+// Returns:
+// - string: The file extension including the leading dot (e.g., ".js", ".py")
+func GetFileExtensionFromLanguage(language string) string {
+ lang := strings.ToLower(strings.TrimSpace(language))
+
+ langExtensionMap := map[string]string{
+ Bash: BashExtension,
+ C: CExtension,
+ CPP: CPPExtension,
+ CSharp: CSharpExtension,
+ Console: ConsoleExtension,
+ Go: GoExtension,
+ Java: JavaExtension,
+ JavaScript: JavaScriptExtension,
+ Kotlin: KotlinExtension,
+ PHP: PHPExtension,
+ PowerShell: PowerShellExtension,
+ PS5: PS5Extension,
+ Python: PythonExtension,
+ Ruby: RubyExtension,
+ Rust: RustExtension,
+ Scala: ScalaExtension,
+ Shell: ShellExtension,
+ Swift: SwiftExtension,
+ Text: TextExtension,
+ TypeScript: TypeScriptExtension,
+ Undefined: UndefinedExtension,
+ "c++": CPPExtension,
+ "c#": CSharpExtension,
+ "cs": CSharpExtension,
+ "golang": GoExtension,
+ "js": JavaScriptExtension,
+ "kt": KotlinExtension,
+ "py": PythonExtension,
+ "rb": RubyExtension,
+ "rs": RustExtension,
+ "sh": ShellExtension,
+ "ts": TypeScriptExtension,
+ "txt": TextExtension,
+ "ps1": PowerShellExtension,
+ "": UndefinedExtension,
+ "none": UndefinedExtension,
+ }
+
+ if extension, exists := langExtensionMap[lang]; exists {
+ return extension
+ }
+
+ return UndefinedExtension
+}
+
+// NormalizeLanguage normalizes a language string to a canonical form.
+//
+// This function converts various language name variants to their canonical forms:
+// - "ts" -> "typescript"
+// - "c++" -> "cpp"
+// - "golang" -> "go"
+// - "js" -> "javascript"
+// - etc.
+//
+// Parameters:
+// - language: The language identifier (case-insensitive)
+//
+// Returns:
+// - string: The normalized language name, or the original string if no normalization is defined
+func NormalizeLanguage(language string) string {
+ lang := strings.ToLower(strings.TrimSpace(language))
+
+ normalizeMap := map[string]string{
+ Bash: Bash,
+ C: C,
+ CPP: CPP,
+ CSharp: CSharp,
+ Console: Console,
+ Go: Go,
+ Java: Java,
+ JavaScript: JavaScript,
+ Kotlin: Kotlin,
+ PHP: PHP,
+ PowerShell: PowerShell,
+ PS5: PS5,
+ Python: Python,
+ Ruby: Ruby,
+ Rust: Rust,
+ Scala: Scala,
+ Shell: Shell,
+ Swift: Swift,
+ Text: Text,
+ TypeScript: TypeScript,
+ "c++": CPP,
+ "c#": CSharp,
+ "cs": CSharp,
+ "golang": Go,
+ "js": JavaScript,
+ "kt": Kotlin,
+ "py": Python,
+ "rb": Ruby,
+ "rs": Rust,
+ "sh": Shell,
+ "ts": TypeScript,
+ "txt": Text,
+ "ps1": PowerShell,
+ "": Undefined,
+ "none": Undefined,
+ }
+
+ if normalized, exists := normalizeMap[lang]; exists {
+ return normalized
+ }
+
+ return lang
+}
diff --git a/audit-cli/commands/extract/code-examples/parser.go b/audit-cli/commands/extract/code-examples/parser.go
new file mode 100644
index 0000000..0681428
--- /dev/null
+++ b/audit-cli/commands/extract/code-examples/parser.go
@@ -0,0 +1,253 @@
+package code_examples
+
+import (
+ "fmt"
+ "os"
+
+ "github.com/mongodb/code-example-tooling/audit-cli/internal/rst"
+)
+
+// ParseFile parses a file and extracts code examples from reStructuredText directives.
+//
+// This function parses all supported RST directives (literalinclude, code-block, io-code-block)
+// and converts them into CodeExample structs ready for writing to files.
+//
+// Parameters:
+// - filePath: Path to the RST file to parse
+//
+// Returns:
+// - []CodeExample: Slice of extracted code examples
+// - error: Any error encountered during parsing
+func ParseFile(filePath string) ([]CodeExample, error) {
+ // Parse all directives from the file
+ directives, err := rst.ParseDirectives(filePath)
+ if err != nil {
+ return nil, err
+ }
+
+ var examples []CodeExample
+ directiveCounts := make(map[rst.DirectiveType]int)
+
+ for _, directive := range directives {
+ // Track directive index for this type
+ directiveCounts[directive.Type]++
+ index := directiveCounts[directive.Type]
+
+ switch directive.Type {
+ case rst.LiteralInclude:
+ example, err := parseLiteralInclude(filePath, directive, index)
+ if err != nil {
+ // Log warning but continue processing
+ fmt.Fprintf(os.Stderr, "Warning: failed to parse literalinclude at line %d in %s: %v\n",
+ directive.LineNum, filePath, err)
+ continue
+ }
+ examples = append(examples, example)
+
+ case rst.CodeBlock:
+ example, err := parseCodeBlock(filePath, directive, index)
+ if err != nil {
+ // Log warning but continue processing
+ fmt.Fprintf(os.Stderr, "Warning: failed to parse code-block at line %d in %s: %v\n",
+ directive.LineNum, filePath, err)
+ continue
+ }
+ examples = append(examples, example)
+
+ case rst.IoCodeBlock:
+ examples = append(examples, parseIoCodeBlock(filePath, directive, index)...)
+ continue
+ }
+ }
+
+ return examples, nil
+}
+
+// parseLiteralInclude parses a literalinclude directive and extracts the code content
+func parseLiteralInclude(sourceFile string, directive rst.Directive, index int) (CodeExample, error) {
+ // Extract the content from the referenced file
+ content, err := rst.ExtractLiteralIncludeContent(sourceFile, directive)
+ if err != nil {
+ return CodeExample{}, err
+ }
+
+ // Get the language from the :language: option
+ language := directive.Options["language"]
+ if language == "" {
+ language = Undefined
+ }
+
+ // Normalize the language
+ language = NormalizeLanguage(language)
+
+ return CodeExample{
+ SourceFile: sourceFile,
+ DirectiveName: DirectiveType(directive.Type),
+ Language: language,
+ Content: content,
+ Index: index,
+ }, nil
+}
+
+// parseCodeBlock parses a code-block directive and extracts the inline code content.
+//
+// The content is already dedented by the directive parser based on the first line's indentation.
+// Language can be specified either as an argument (.. code-block:: javascript) or as an option (:language: javascript).
+func parseCodeBlock(sourceFile string, directive rst.Directive, index int) (CodeExample, error) {
+ // The content is already parsed and dedented by the directive parser
+ content := directive.Content
+ if content == "" {
+ return CodeExample{}, fmt.Errorf("code-block has no content")
+ }
+
+ // Get the language from the directive argument (e.g., .. code-block:: javascript)
+ // or from the :language: option
+ language := directive.Argument
+ if language == "" {
+ language = directive.Options["language"]
+ }
+ if language == "" {
+ language = Undefined
+ }
+
+ // Normalize the language
+ language = NormalizeLanguage(language)
+
+ return CodeExample{
+ SourceFile: sourceFile,
+ DirectiveName: DirectiveType(directive.Type),
+ Language: language,
+ Content: content,
+ Index: index,
+ }, nil
+}
+
+// ParseFileWithIncludes parses a file and recursively follows include directives.
+//
+// This function wraps the internal RST package's ParseFileWithIncludes to extract
+// code examples from the main file and all included files.
+//
+// Parameters:
+// - filePath: Path to the RST file to parse
+// - followIncludes: If true, recursively follow .. include:: directives
+// - visited: Map tracking already-processed files to prevent circular includes
+// - verbose: If true, print detailed processing information
+//
+// Returns:
+// - []CodeExample: All code examples from the file and its includes
+// - []string: List of all processed file paths
+// - error: Any error encountered during parsing
+func ParseFileWithIncludes(filePath string, followIncludes bool, visited map[string]bool, verbose bool) ([]CodeExample, []string, error) {
+ var examples []CodeExample
+
+ // Define the parse function that will be called for each file
+ parseFunc := func(path string) error {
+ fileExamples, err := ParseFile(path)
+ if err != nil {
+ return err
+ }
+ examples = append(examples, fileExamples...)
+ return nil
+ }
+
+ // Use the internal RST package to handle include following
+ processedFiles, err := rst.ParseFileWithIncludes(filePath, followIncludes, visited, verbose, parseFunc)
+ if err != nil {
+ return nil, processedFiles, err
+ }
+
+ return examples, processedFiles, nil
+}
+
+// TraverseDirectory recursively traverses a directory and returns all file paths.
+//
+// This is a wrapper around the internal RST package's TraverseDirectory function.
+//
+// Parameters:
+// - rootPath: Root directory to traverse
+// - recursive: If true, recursively scan subdirectories
+//
+// Returns:
+// - []string: List of all file paths found
+// - error: Any error encountered during traversal
+func TraverseDirectory(rootPath string, recursive bool) ([]string, error) {
+ return rst.TraverseDirectory(rootPath, recursive)
+}
+
+// ShouldProcessFile determines if a file should be processed based on its extension.
+//
+// This is a wrapper around the internal RST package's ShouldProcessFile function.
+// Returns true for files with .rst, .txt, or .md extensions.
+func ShouldProcessFile(filePath string) bool {
+ return rst.ShouldProcessFile(filePath)
+}
+
+// parseIoCodeBlock parses an io-code-block directive and extracts input/output code examples
+// Returns a slice of CodeExample (one for input, one for output if present)
+func parseIoCodeBlock(sourceFile string, directive rst.Directive, index int) []CodeExample {
+ var examples []CodeExample
+
+ // Process input directive
+ if directive.InputDirective != nil {
+ inputExample, err := parseSubDirective(sourceFile, directive.InputDirective, "input", index)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "Warning: failed to parse input directive at line %d in %s: %v\n",
+ directive.LineNum, sourceFile, err)
+ } else {
+ examples = append(examples, inputExample)
+ }
+ }
+
+ // Process output directive
+ if directive.OutputDirective != nil {
+ outputExample, err := parseSubDirective(sourceFile, directive.OutputDirective, "output", index)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "Warning: failed to parse output directive at line %d in %s: %v\n",
+ directive.LineNum, sourceFile, err)
+ } else {
+ examples = append(examples, outputExample)
+ }
+ }
+
+ return examples
+}
+
+// parseSubDirective parses an input or output sub-directive within an io-code-block
+func parseSubDirective(sourceFile string, subDir *rst.SubDirective, dirType string, index int) (CodeExample, error) {
+ var content string
+ var err error
+
+ // If there's a filepath argument, read from the file
+ if subDir.Argument != "" {
+ content, err = rst.ExtractLiteralIncludeContent(sourceFile, rst.Directive{
+ Argument: subDir.Argument,
+ Options: subDir.Options,
+ })
+ if err != nil {
+ return CodeExample{}, fmt.Errorf("failed to read file %s: %w", subDir.Argument, err)
+ }
+ } else {
+ // Use inline content
+ content = subDir.Content
+ if content == "" {
+ return CodeExample{}, fmt.Errorf("%s directive has no content or filepath", dirType)
+ }
+ }
+
+ // Get language from options
+ language := subDir.Options["language"]
+ if language == "" {
+ language = Undefined
+ }
+
+ language = NormalizeLanguage(language)
+
+ return CodeExample{
+ SourceFile: sourceFile,
+ DirectiveName: DirectiveType(rst.IoCodeBlock),
+ Language: language,
+ Content: content,
+ Index: index,
+ SubType: dirType, // "input" or "output"
+ }, nil
+}
diff --git a/audit-cli/commands/extract/code-examples/report.go b/audit-cli/commands/extract/code-examples/report.go
new file mode 100644
index 0000000..3a6728e
--- /dev/null
+++ b/audit-cli/commands/extract/code-examples/report.go
@@ -0,0 +1,114 @@
+package code_examples
+
+import (
+ "fmt"
+ "sort"
+ "strings"
+)
+
+// PrintReport prints the extraction report to stdout.
+//
+// Displays statistics about the extraction operation including:
+// - Number of files traversed
+// - Number of output files written
+// - Code examples by language (summary or detailed based on verbose flag)
+// - Code examples by directive type
+// - Per-source-file statistics (if verbose is true)
+//
+// Parameters:
+// - report: The report to print
+// - verbose: If true, show detailed breakdown including file paths and per-source stats
+func PrintReport(report *Report, verbose bool) {
+ fmt.Println("\n" + strings.Repeat("=", 60))
+ fmt.Println("CODE EXTRACTION REPORT")
+ fmt.Println(strings.Repeat("=", 60))
+
+ fmt.Printf("\nFiles Traversed: %d\n", report.FilesTraversed)
+ if verbose && len(report.TraversedFilepaths) > 0 {
+ fmt.Println("\nTraversed Filepaths:")
+ for _, path := range report.TraversedFilepaths {
+ fmt.Printf(" - %s\n", path)
+ }
+ }
+
+ fmt.Printf("\nOutput Files Written: %d\n", report.OutputFilesWritten)
+
+ if len(report.LanguageCounts) > 0 {
+ fmt.Println("\nCode Examples by Language:")
+
+ languages := make([]string, 0, len(report.LanguageCounts))
+ for lang := range report.LanguageCounts {
+ languages = append(languages, lang)
+ }
+ sort.Strings(languages)
+
+ if verbose {
+ for _, lang := range languages {
+ count := report.LanguageCounts[lang]
+ fmt.Printf(" %-15s: %d\n", lang, count)
+ }
+ } else {
+ total := 0
+ for _, count := range report.LanguageCounts {
+ total += count
+ }
+ fmt.Printf(" Total: %d (use --verbose for breakdown)\n", total)
+ }
+ }
+
+ if len(report.DirectiveCounts) > 0 {
+ fmt.Println("\nCode Examples by Directive Type:")
+
+ directives := []DirectiveType{CodeBlock, LiteralInclude, IoCodeBlock}
+ for _, directive := range directives {
+ if count, exists := report.DirectiveCounts[directive]; exists {
+ fmt.Printf(" %-20s: %d\n", directive, count)
+ }
+ }
+ }
+
+ if verbose && len(report.SourcePathStats) > 0 {
+ fmt.Println("\nStatistics by Source File:")
+
+ sourcePaths := make([]string, 0, len(report.SourcePathStats))
+ for path := range report.SourcePathStats {
+ sourcePaths = append(sourcePaths, path)
+ }
+ sort.Strings(sourcePaths)
+
+ for _, sourcePath := range sourcePaths {
+ stats := report.SourcePathStats[sourcePath]
+ fmt.Printf("\n %s:\n", sourcePath)
+
+ if len(stats.DirectiveCounts) > 0 {
+ fmt.Println(" Directives:")
+ directives := []DirectiveType{CodeBlock, LiteralInclude, IoCodeBlock}
+ for _, directive := range directives {
+ if count, exists := stats.DirectiveCounts[directive]; exists {
+ fmt.Printf(" %-20s: %d\n", directive, count)
+ }
+ }
+ }
+
+ if len(stats.LanguageCounts) > 0 {
+ fmt.Println(" Languages:")
+ languages := make([]string, 0, len(stats.LanguageCounts))
+ for lang := range stats.LanguageCounts {
+ languages = append(languages, lang)
+ }
+ sort.Strings(languages)
+
+ for _, lang := range languages {
+ count := stats.LanguageCounts[lang]
+ fmt.Printf(" %-15s: %d\n", lang, count)
+ }
+ }
+
+ if len(stats.OutputFiles) > 0 {
+ fmt.Printf(" Output Files: %d\n", len(stats.OutputFiles))
+ }
+ }
+ }
+
+ fmt.Println("\n" + strings.Repeat("=", 60))
+}
diff --git a/audit-cli/commands/extract/code-examples/types.go b/audit-cli/commands/extract/code-examples/types.go
new file mode 100644
index 0000000..a0cda16
--- /dev/null
+++ b/audit-cli/commands/extract/code-examples/types.go
@@ -0,0 +1,94 @@
+package code_examples
+
+// DirectiveType represents the type of reStructuredText directive.
+type DirectiveType string
+
+const (
+ // CodeBlock represents inline code blocks (.. code-block::)
+ CodeBlock DirectiveType = "code-block"
+ // LiteralInclude represents external file references (.. literalinclude::)
+ LiteralInclude DirectiveType = "literalinclude"
+ // IoCodeBlock represents input/output examples (.. io-code-block::)
+ IoCodeBlock DirectiveType = "io-code-block"
+)
+
+// CodeExample represents a single code example extracted from an RST file.
+//
+// Each code example corresponds to one directive occurrence in the source file
+// and will be written to a separate output file.
+type CodeExample struct {
+ SourceFile string // Path to the source RST file
+ DirectiveName DirectiveType // Type of directive (code-block, literalinclude, io-code-block)
+ Language string // Programming language (normalized)
+ Content string // The actual code content
+ Index int // The occurrence index of this directive in the source file (1-based)
+ SubType string // For io-code-block: "input" or "output"
+}
+
+// Report contains statistics about the extraction operation.
+//
+// Tracks overall statistics as well as per-source-file statistics for detailed reporting.
+type Report struct {
+ FilesTraversed int // Total number of RST files processed
+ TraversedFilepaths []string // List of all processed file paths
+ OutputFilesWritten int // Total number of code example files written
+ LanguageCounts map[string]int // Count of examples by language
+ DirectiveCounts map[DirectiveType]int // Count of examples by directive type
+ SourcePathStats map[string]*SourceStats // Per-file statistics
+}
+
+// SourceStats contains statistics for a single source file.
+//
+// Used for verbose reporting to show detailed breakdown per source file.
+type SourceStats struct {
+ DirectiveCounts map[DirectiveType]int // Count of directives by type in this file
+ LanguageCounts map[string]int // Count of examples by language in this file
+ OutputFiles []string // List of output files generated from this source
+}
+
+// NewReport creates a new initialized Report with empty maps and slices.
+func NewReport() *Report {
+ return &Report{
+ TraversedFilepaths: make([]string, 0),
+ LanguageCounts: make(map[string]int),
+ DirectiveCounts: make(map[DirectiveType]int),
+ SourcePathStats: make(map[string]*SourceStats),
+ }
+}
+
+// NewSourceStats creates a new initialized SourceStats with empty maps and slices.
+func NewSourceStats() *SourceStats {
+ return &SourceStats{
+ DirectiveCounts: make(map[DirectiveType]int),
+ LanguageCounts: make(map[string]int),
+ OutputFiles: make([]string, 0),
+ }
+}
+
+// AddCodeExample updates the report with a new code example.
+//
+// This method updates both global statistics and per-source-file statistics.
+// It should be called once for each code example that is successfully extracted.
+func (r *Report) AddCodeExample(example CodeExample, outputPath string) {
+ // Update global counts
+ r.LanguageCounts[example.Language]++
+ r.DirectiveCounts[example.DirectiveName]++
+
+ // Update source-specific stats
+ if _, exists := r.SourcePathStats[example.SourceFile]; !exists {
+ r.SourcePathStats[example.SourceFile] = NewSourceStats()
+ }
+ stats := r.SourcePathStats[example.SourceFile]
+ stats.DirectiveCounts[example.DirectiveName]++
+ stats.LanguageCounts[example.Language]++
+ stats.OutputFiles = append(stats.OutputFiles, outputPath)
+}
+
+// AddTraversedFile adds a file to the list of traversed files.
+//
+// This method should be called once for each RST file that is processed,
+// including files discovered through include directives.
+func (r *Report) AddTraversedFile(filepath string) {
+ r.FilesTraversed++
+ r.TraversedFilepaths = append(r.TraversedFilepaths, filepath)
+}
diff --git a/audit-cli/commands/extract/code-examples/writer.go b/audit-cli/commands/extract/code-examples/writer.go
new file mode 100644
index 0000000..c8a6670
--- /dev/null
+++ b/audit-cli/commands/extract/code-examples/writer.go
@@ -0,0 +1,97 @@
+package code_examples
+
+import (
+ "fmt"
+ "os"
+ "path/filepath"
+ "strings"
+)
+
+// WriteCodeExample writes a code example to a file in the output directory.
+//
+// Generates a standardized filename and writes the code content to that file.
+// If dryRun is true, returns the filename without actually writing the file.
+//
+// Parameters:
+// - example: The code example to write
+// - outputDir: Directory where the file should be written
+// - dryRun: If true, skip writing and only return the filename
+//
+// Returns:
+// - string: The full path to the output file
+// - error: Any error encountered during writing
+func WriteCodeExample(example CodeExample, outputDir string, dryRun bool) (string, error) {
+ filename := GenerateOutputFilename(example)
+ outputPath := filepath.Join(outputDir, filename)
+
+ if dryRun {
+ return outputPath, nil
+ }
+
+ if err := os.MkdirAll(outputDir, 0755); err != nil {
+ return "", fmt.Errorf("failed to create output directory: %w", err)
+ }
+
+ if err := os.WriteFile(outputPath, []byte(example.Content), 0644); err != nil {
+ return "", fmt.Errorf("failed to write file %s: %w", outputPath, err)
+ }
+
+ return outputPath, nil
+}
+
+// GenerateOutputFilename generates a standardized filename for a code example.
+//
+// The filename format is: {source-base}.{directive-type}.{index}.{ext}
+// For io-code-block directives: {source-base}.{directive-type}.{index}.{subtype}.{ext}
+//
+// Examples:
+// - my-doc.code-block.1.js
+// - my-doc.literalinclude.2.py
+// - my-doc.io-code-block.1.input.js
+// - my-doc.io-code-block.1.output.json
+//
+// Parameters:
+// - example: The code example to generate a filename for
+//
+// Returns:
+// - string: The generated filename (without directory path)
+func GenerateOutputFilename(example CodeExample) string {
+ sourceBase := filepath.Base(example.SourceFile)
+ sourceBase = strings.TrimSuffix(sourceBase, filepath.Ext(sourceBase))
+
+ extension := GetFileExtensionFromLanguage(example.Language)
+
+ // For io-code-block, include the subtype (input/output) in the filename
+ if example.DirectiveName == IoCodeBlock && example.SubType != "" {
+ filename := fmt.Sprintf("%s.%s.%d.%s%s",
+ sourceBase,
+ example.DirectiveName,
+ example.Index,
+ example.SubType,
+ extension,
+ )
+ return filename
+ }
+
+ filename := fmt.Sprintf("%s.%s.%d%s",
+ sourceBase,
+ example.DirectiveName,
+ example.Index,
+ extension,
+ )
+
+ return filename
+}
+
+// EnsureOutputDirectory ensures the output directory exists.
+//
+// Creates the directory and any necessary parent directories with permissions 0755.
+//
+// Parameters:
+// - outputDir: Path to the directory to create
+//
+// Returns:
+// - error: Any error encountered during directory creation
+func EnsureOutputDirectory(outputDir string) error {
+ return os.MkdirAll(outputDir, 0755)
+}
diff --git a/audit-cli/commands/extract/extract.go b/audit-cli/commands/extract/extract.go
new file mode 100644
index 0000000..7f1926b
--- /dev/null
+++ b/audit-cli/commands/extract/extract.go
@@ -0,0 +1,34 @@
+// Package extract provides the parent command for extracting content from RST files.
+//
+// This package serves as the parent command for various extraction operations.
+// Currently supports:
+// - code-examples: Extract code examples from RST directives
+//
+// Future subcommands could include extracting tables, images, or other structured content.
+package extract
+
+import (
+ "github.com/mongodb/code-example-tooling/audit-cli/commands/extract/code-examples"
+ "github.com/spf13/cobra"
+)
+
+// NewExtractCommand creates the extract parent command.
+//
+// This command serves as a parent for various extraction operations on RST files.
+// It doesn't perform any operations itself but provides a namespace for subcommands.
+func NewExtractCommand() *cobra.Command {
+ cmd := &cobra.Command{
+ Use: "extract",
+ Short: "Extract content from reStructuredText files",
+ Long: `Extract various types of content from reStructuredText files.
+
+Currently supports extracting code examples from directives like literalinclude,
+code-block, and io-code-block. Future subcommands may support extracting other
+types of structured content such as tables, images, or metadata.`,
+ }
+
+ // Add subcommands
+ cmd.AddCommand(code_examples.NewCodeExamplesCommand())
+
+ return cmd
+}
diff --git a/audit-cli/commands/search/find-string/find_string.go b/audit-cli/commands/search/find-string/find_string.go
new file mode 100644
index 0000000..be90ec5
--- /dev/null
+++ b/audit-cli/commands/search/find-string/find_string.go
@@ -0,0 +1,316 @@
+// Package find_string provides functionality for searching code example files for substrings.
+//
+// This package implements the "search find-string" subcommand, which searches through
+// extracted code example files to find occurrences of a specific substring.
+//
+// By default, the search is case-insensitive and matches exact words only (not partial matches
+// within larger words). These behaviors can be changed with the --case-sensitive and
+// --partial-match flags. Each file is counted only once, even if the substring appears
+// multiple times in the same file.
+//
+// Supports:
+// - Recursive directory scanning
+// - Following include directives in RST files
+// - Verbose output with file paths and language breakdown
+// - Language detection based on file extension
+// - Case-insensitive search (default) or case-sensitive search (--case-sensitive flag)
+// - Exact word matching (default) or partial matching (--partial-match flag)
+package find_string
+
+import (
+ "fmt"
+ "os"
+ "path/filepath"
+ "strings"
+
+ "github.com/mongodb/code-example-tooling/audit-cli/internal/rst"
+ "github.com/spf13/cobra"
+)
+
+// NewFindStringCommand creates the find-string subcommand.
+//
+// This command searches through extracted code example files for a specific substring.
+// Supports flags for recursive search, following includes, and verbose output.
+//
+// Flags:
+// - -r, --recursive: Recursively search all files in subdirectories
+// - -f, --follow-includes: Follow .. include:: directives in RST files
+// - -v, --verbose: Show file paths and language breakdown
+// - --case-sensitive: Make search case-sensitive (default: case-insensitive)
+// - --partial-match: Allow partial matches within words (default: exact word matching)
+func NewFindStringCommand() *cobra.Command {
+ var (
+ recursive bool
+ followIncludes bool
+ verbose bool
+ caseSensitive bool
+ partialMatch bool
+ )
+
+ cmd := &cobra.Command{
+ Use: "find-string [filepath] [substring]",
+ Short: "Search for a substring in extracted code example files",
+ Long: `Search through extracted code example files to find occurrences of a specific substring.
+Reports the number of code examples containing the substring.
+
+By default, the search is case-insensitive and matches exact words only. Use --case-sensitive
+to make the search case-sensitive, or --partial-match to allow matching the substring as part
+of larger words (e.g., "curl" matching "libcurl").`,
+ Args: cobra.ExactArgs(2),
+ RunE: func(cmd *cobra.Command, args []string) error {
+ filePath := args[0]
+ substring := args[1]
+ return runSearch(filePath, substring, recursive, followIncludes, verbose, caseSensitive, partialMatch)
+ },
+ }
+
+ cmd.Flags().BoolVarP(&recursive, "recursive", "r", false, "Recursively search all files in subdirectories")
+ cmd.Flags().BoolVarP(&followIncludes, "follow-includes", "f", false, "Follow .. include:: directives in RST files")
+ cmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "Provide additional information during execution")
+ cmd.Flags().BoolVar(&caseSensitive, "case-sensitive", false, "Make search case-sensitive (default: case-insensitive)")
+ cmd.Flags().BoolVar(&partialMatch, "partial-match", false, "Allow partial matches within words (default: exact word matching)")
+
+ return cmd
+}
+
+// RunSearch executes the search operation and returns the report.
+//
+// This function is exported for use in tests. It searches for the substring in the
+// specified file or directory and returns statistics about the search.
+//
+// Parameters:
+// - filePath: Path to file or directory to search
+// - substring: The substring to search for
+// - recursive: If true, recursively search subdirectories
+// - followIncludes: If true, follow .. include:: directives
+// - verbose: If true, show detailed information during search
+// - caseSensitive: If true, search is case-sensitive; if false, case-insensitive
+// - partialMatch: If true, allow partial matches within words; if false, match exact words only
+//
+// Returns:
+// - *SearchReport: Statistics about the search operation
+// - error: Any error encountered during search
+func RunSearch(filePath string, substring string, recursive bool, followIncludes bool, verbose bool, caseSensitive bool, partialMatch bool) (*SearchReport, error) {
+ return runSearchInternal(filePath, substring, recursive, followIncludes, verbose, caseSensitive, partialMatch)
+}
+
+// runSearch executes the search operation (internal wrapper for CLI).
+//
+// This is a thin wrapper around runSearchInternal that discards the report
+// and only returns errors, suitable for use in the CLI command handler.
+func runSearch(filePath string, substring string, recursive bool, followIncludes bool, verbose bool, caseSensitive bool, partialMatch bool) error {
+ _, err := runSearchInternal(filePath, substring, recursive, followIncludes, verbose, caseSensitive, partialMatch)
+ return err
+}
+
+// runSearchInternal contains the core logic for the search-code-examples command
+func runSearchInternal(filePath string, substring string, recursive bool, followIncludes bool, verbose bool, caseSensitive bool, partialMatch bool) (*SearchReport, error) {
+ fileInfo, err := os.Stat(filePath)
+ if err != nil {
+ return nil, fmt.Errorf("failed to access path %s: %w", filePath, err)
+ }
+
+ report := NewSearchReport()
+
+ var filesToSearch []string
+
+ if fileInfo.IsDir() {
+ if verbose {
+ fmt.Printf("Scanning directory: %s (recursive: %v)\n", filePath, recursive)
+ }
+ filesToSearch, err = collectFiles(filePath, recursive)
+ if err != nil {
+ return nil, fmt.Errorf("failed to traverse directory: %w", err)
+ }
+ } else {
+ filesToSearch = []string{filePath}
+ }
+
+ if verbose {
+ fmt.Printf("Found %d files to search\n", len(filesToSearch))
+ fmt.Printf("Searching for substring: %q\n", substring)
+ fmt.Printf("Case sensitive: %v\n", caseSensitive)
+ fmt.Printf("Partial match: %v\n", partialMatch)
+ fmt.Printf("Follow includes: %v\n\n", followIncludes)
+ }
+
+ // Track visited files to prevent circular includes
+ visited := make(map[string]bool)
+
+ for _, file := range filesToSearch {
+ if verbose {
+ fmt.Printf("Searching: %s\n", file)
+ }
+
+ // If followIncludes is enabled, collect all files including those referenced by includes
+ var filesToSearchWithIncludes []string
+ if followIncludes {
+ // Use ParseFileWithIncludes to get all files (main + includes)
+ processedFiles, err := collectFilesWithIncludes(file, visited, verbose)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "Warning: failed to follow includes for %s: %v\n", file, err)
+ filesToSearchWithIncludes = []string{file}
+ } else {
+ filesToSearchWithIncludes = processedFiles
+ }
+ } else {
+ filesToSearchWithIncludes = []string{file}
+ }
+
+ // Search all collected files
+ for _, fileToSearch := range filesToSearchWithIncludes {
+ result, err := searchFile(fileToSearch, substring, caseSensitive, partialMatch)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "Warning: failed to search %s: %v\n", fileToSearch, err)
+ continue
+ }
+
+ report.AddResult(result)
+
+ if verbose && result.Contains {
+ fmt.Printf(" ✓ Found substring in %s\n", fileToSearch)
+ }
+ }
+ }
+
+ PrintReport(report, verbose)
+
+ return report, nil
+}
+
+// collectFiles gathers all files to search
+func collectFiles(dirPath string, recursive bool) ([]string, error) {
+ var files []string
+
+ if recursive {
+ err := filepath.Walk(dirPath, func(path string, info os.FileInfo, err error) error {
+ if err != nil {
+ return err
+ }
+ if !info.IsDir() {
+ files = append(files, path)
+ }
+ return nil
+ })
+ if err != nil {
+ return nil, err
+ }
+ } else {
+ entries, err := os.ReadDir(dirPath)
+ if err != nil {
+ return nil, err
+ }
+ for _, entry := range entries {
+ if !entry.IsDir() {
+ files = append(files, filepath.Join(dirPath, entry.Name()))
+ }
+ }
+ }
+
+ return files, nil
+}
+
+// collectFilesWithIncludes collects a file and all files it includes via .. include:: directives
+func collectFilesWithIncludes(filePath string, visited map[string]bool, verbose bool) ([]string, error) {
+ // Use the RST package's ParseFileWithIncludes to get all files
+ // We pass a no-op parseFunc since we just want the list of files
+ processedFiles, err := rst.ParseFileWithIncludes(
+ filePath,
+ true, // followIncludes = true
+ visited,
+ verbose,
+ nil, // no-op parseFunc
+ )
+ if err != nil {
+ return nil, err
+ }
+
+ return processedFiles, nil
+}
+
+// searchFile searches a single file for the substring
+func searchFile(filePath string, substring string, caseSensitive bool, partialMatch bool) (SearchResult, error) {
+ result := SearchResult{
+ FilePath: filePath,
+ Language: extractLanguageFromFilename(filePath),
+ Contains: false,
+ }
+
+ content, err := os.ReadFile(filePath)
+ if err != nil {
+ return result, err
+ }
+
+ contentStr := string(content)
+ searchStr := substring
+
+ // Handle case sensitivity
+ if !caseSensitive {
+ contentStr = strings.ToLower(contentStr)
+ searchStr = strings.ToLower(searchStr)
+ }
+
+ // Check if substring exists in content
+ if !strings.Contains(contentStr, searchStr) {
+ return result, nil
+ }
+
+ // If partial match is allowed, we're done
+ if partialMatch {
+ result.Contains = true
+ return result, nil
+ }
+
+ // For exact word matching, check if the match is a whole word
+ result.Contains = isExactWordMatch(contentStr, searchStr)
+
+ return result, nil
+}
+
+// isExactWordMatch checks if the substring appears as a complete word in the content.
+// A word boundary is defined as the start/end of the string or a non-alphanumeric character.
+func isExactWordMatch(content string, substring string) bool {
+ // Find all occurrences of the substring
+ index := 0
+ for {
+ pos := strings.Index(content[index:], substring)
+ if pos == -1 {
+ break
+ }
+
+ actualPos := index + pos
+
+ // Check if this is a whole word match
+ // Check character before (if not at start)
+ beforeOK := actualPos == 0 || !isWordChar(rune(content[actualPos-1]))
+
+ // Check character after (if not at end)
+ afterPos := actualPos + len(substring)
+ afterOK := afterPos >= len(content) || !isWordChar(rune(content[afterPos]))
+
+ if beforeOK && afterOK {
+ return true
+ }
+
+ // Move to next potential match
+ index = actualPos + 1
+ }
+
+ return false
+}
+
+// isWordChar returns true if the character is alphanumeric or underscore.
+// These characters are considered part of a word.
+func isWordChar(c rune) bool {
+ return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_'
+}
+
+// extractLanguageFromFilename extracts the language from the file extension
+func extractLanguageFromFilename(filePath string) string {
+ ext := filepath.Ext(filePath)
+ if ext == "" {
+ return "unknown"
+ }
+ // Remove the leading dot
+ return strings.TrimPrefix(ext, ".")
+}
diff --git a/audit-cli/commands/search/find-string/find_string_test.go b/audit-cli/commands/search/find-string/find_string_test.go
new file mode 100644
index 0000000..2bd6f7e
--- /dev/null
+++ b/audit-cli/commands/search/find-string/find_string_test.go
@@ -0,0 +1,250 @@
+package find_string
+
+import (
+ "path/filepath"
+ "testing"
+)
+
+// TestDefaultBehaviorCaseInsensitive tests that search is case-insensitive by default
+func TestDefaultBehaviorCaseInsensitive(t *testing.T) {
+ testDataDir := filepath.Join("..", "..", "..", "testdata", "search-test-files")
+ mixedCaseFile := filepath.Join(testDataDir, "mixed-case.txt")
+
+ // Search for lowercase "curl" with default settings (case-insensitive)
+ report, err := RunSearch(mixedCaseFile, "curl", false, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+
+ // Should match because it's case-insensitive
+ if report.FilesContaining != 1 {
+ t.Errorf("Expected 1 file containing 'curl' (case-insensitive), got %d", report.FilesContaining)
+ }
+}
+
+// TestCaseSensitiveFlag tests that --case-sensitive flag works correctly
+func TestCaseSensitiveFlag(t *testing.T) {
+ testDataDir := filepath.Join("..", "..", "..", "testdata", "search-test-files")
+ mixedCaseFile := filepath.Join(testDataDir, "mixed-case.txt")
+
+ // Search for uppercase "CURL" with case-sensitive flag
+ report, err := RunSearch(mixedCaseFile, "CURL", false, false, false, true, false)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+
+ // Should match only the uppercase version
+ if report.FilesContaining != 1 {
+ t.Errorf("Expected 1 file containing 'CURL' (case-sensitive), got %d", report.FilesContaining)
+ }
+
+ // Search for lowercase "curl" with case-sensitive flag
+ report2, err := RunSearch(mixedCaseFile, "curl", false, false, false, true, false)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+
+ // Should match only the lowercase version
+ if report2.FilesContaining != 1 {
+ t.Errorf("Expected 1 file containing 'curl' (case-sensitive), got %d", report2.FilesContaining)
+ }
+}
+
+// TestDefaultBehaviorExactWordMatch tests that exact word matching is the default
+func TestDefaultBehaviorExactWordMatch(t *testing.T) {
+ testDataDir := filepath.Join("..", "..", "..", "testdata", "search-test-files")
+
+ // Search for "curl" in a file that only has "curl" as a standalone word
+ curlFile := filepath.Join(testDataDir, "curl-examples.txt")
+ report1, err := RunSearch(curlFile, "curl", false, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+ if report1.FilesContaining != 1 {
+ t.Errorf("Expected 1 file containing 'curl' as exact word, got %d", report1.FilesContaining)
+ }
+
+ // Search for "curl" in a file that only has "libcurl" (should NOT match with exact word matching)
+ libcurlFile := filepath.Join(testDataDir, "libcurl-examples.txt")
+ report2, err := RunSearch(libcurlFile, "curl", false, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+ if report2.FilesContaining != 0 {
+ t.Errorf("Expected 0 files containing 'curl' as exact word in libcurl file, got %d", report2.FilesContaining)
+ }
+}
+
+// TestPartialMatchFlag tests that --partial-match flag allows substring matching
+func TestPartialMatchFlag(t *testing.T) {
+ testDataDir := filepath.Join("..", "..", "..", "testdata", "search-test-files")
+ libcurlFile := filepath.Join(testDataDir, "libcurl-examples.txt")
+
+ // Search for "curl" with partial match enabled (should match "libcurl")
+ report, err := RunSearch(libcurlFile, "curl", false, false, false, false, true)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+
+ if report.FilesContaining != 1 {
+ t.Errorf("Expected 1 file containing 'curl' with partial match, got %d", report.FilesContaining)
+ }
+}
+
+// TestWordBoundaries tests various word boundary scenarios
+func TestWordBoundaries(t *testing.T) {
+ testDataDir := filepath.Join("..", "..", "..", "testdata", "search-test-files")
+ boundariesFile := filepath.Join(testDataDir, "word-boundaries.txt")
+
+ // Test exact word match (should match "curl" but not "libcurl", "curlopt", etc.)
+ report, err := RunSearch(boundariesFile, "curl", false, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+
+ // The file contains "curl" as a standalone word, so should match
+ if report.FilesContaining != 1 {
+ t.Errorf("Expected 1 file containing 'curl' as exact word, got %d", report.FilesContaining)
+ }
+
+ // Test partial match (should match all occurrences)
+ report2, err := RunSearch(boundariesFile, "curl", false, false, false, false, true)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+
+ // Should match because partial matching is enabled
+ if report2.FilesContaining != 1 {
+ t.Errorf("Expected 1 file containing 'curl' with partial match, got %d", report2.FilesContaining)
+ }
+}
+
+// TestDirectorySearch tests searching across multiple files in a directory
+func TestDirectorySearch(t *testing.T) {
+ testDataDir := filepath.Join("..", "..", "..", "testdata", "search-test-files")
+
+ // Search for "curl" in the directory (exact word match, case-insensitive)
+ report, err := RunSearch(testDataDir, "curl", false, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+
+ // Should find "curl" in:
+ // - curl-examples.txt (has "curl" as standalone word)
+ // - mixed-case.txt (has "curl", "CURL", "Curl" - case insensitive)
+ // - word-boundaries.txt (has "curl" as standalone word)
+ // - python-code.py (has "curl" as standalone word)
+ // Should NOT find in:
+ // - libcurl-examples.txt (only has "libcurl", not standalone "curl")
+ // - no-match.txt (doesn't contain "curl" at all)
+ expectedMatches := 4
+ if report.FilesContaining != expectedMatches {
+ t.Errorf("Expected %d files containing 'curl', got %d", expectedMatches, report.FilesContaining)
+ }
+
+ // Verify total files scanned
+ if report.FilesScanned != 6 {
+ t.Errorf("Expected 6 files scanned, got %d", report.FilesScanned)
+ }
+}
+
+// TestDirectorySearchWithPartialMatch tests directory search with partial matching
+func TestDirectorySearchWithPartialMatch(t *testing.T) {
+ testDataDir := filepath.Join("..", "..", "..", "testdata", "search-test-files")
+
+ // Search for "curl" with partial match enabled
+ report, err := RunSearch(testDataDir, "curl", false, false, false, false, true)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+
+ // Should find "curl" in all files except no-match.txt:
+ // - curl-examples.txt
+ // - libcurl-examples.txt (now matches because of partial match)
+ // - mixed-case.txt
+ // - word-boundaries.txt
+ // - python-code.py
+ expectedMatches := 5
+ if report.FilesContaining != expectedMatches {
+ t.Errorf("Expected %d files containing 'curl' with partial match, got %d", expectedMatches, report.FilesContaining)
+ }
+}
+
+// TestCombinedFlags tests using both case-sensitive and partial-match flags together
+func TestCombinedFlags(t *testing.T) {
+ testDataDir := filepath.Join("..", "..", "..", "testdata", "search-test-files")
+ mixedCaseFile := filepath.Join(testDataDir, "mixed-case.txt")
+
+ // Search for lowercase "curl" with both case-sensitive and partial match
+ report, err := RunSearch(mixedCaseFile, "curl", false, false, false, true, true)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+
+ // Should match only lowercase "curl"
+ if report.FilesContaining != 1 {
+ t.Errorf("Expected 1 file containing 'curl' (case-sensitive + partial), got %d", report.FilesContaining)
+ }
+
+ // Search for uppercase "CURL" with both flags
+ report2, err := RunSearch(mixedCaseFile, "CURL", false, false, false, true, true)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+
+ // Should match only uppercase "CURL"
+ if report2.FilesContaining != 1 {
+ t.Errorf("Expected 1 file containing 'CURL' (case-sensitive + partial), got %d", report2.FilesContaining)
+ }
+}
+
+// TestLanguageDetection tests that language is correctly detected from file extensions
+func TestLanguageDetection(t *testing.T) {
+ testDataDir := filepath.Join("..", "..", "..", "testdata", "search-test-files")
+
+ // Search in directory and check language counts
+ report, err := RunSearch(testDataDir, "curl", false, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+
+ // Should have detected .txt and .py files
+ if _, hasTxt := report.LanguageCounts["txt"]; !hasTxt {
+ t.Error("Expected to find 'txt' in language counts")
+ }
+
+ if _, hasPy := report.LanguageCounts["py"]; !hasPy {
+ t.Error("Expected to find 'py' in language counts")
+ }
+
+ // Check that txt count is correct (3 txt files should match)
+ if report.LanguageCounts["txt"] != 3 {
+ t.Errorf("Expected 3 txt files, got %d", report.LanguageCounts["txt"])
+ }
+
+ // Check that py count is correct (1 py file should match)
+ if report.LanguageCounts["py"] != 1 {
+ t.Errorf("Expected 1 py file, got %d", report.LanguageCounts["py"])
+ }
+}
+
+// TestNoMatches tests searching for a string that doesn't exist
+func TestNoMatches(t *testing.T) {
+ testDataDir := filepath.Join("..", "..", "..", "testdata", "search-test-files")
+ noMatchFile := filepath.Join(testDataDir, "no-match.txt")
+
+ // Search for "curl" in a file that doesn't contain it
+ report, err := RunSearch(noMatchFile, "curl", false, false, false, false, false)
+ if err != nil {
+ t.Fatalf("RunSearch failed: %v", err)
+ }
+
+ if report.FilesContaining != 0 {
+ t.Errorf("Expected 0 files containing 'curl', got %d", report.FilesContaining)
+ }
+
+ if report.FilesScanned != 1 {
+ t.Errorf("Expected 1 file scanned, got %d", report.FilesScanned)
+ }
+}
+
diff --git a/audit-cli/commands/search/find-string/report.go b/audit-cli/commands/search/find-string/report.go
new file mode 100644
index 0000000..4a4a70c
--- /dev/null
+++ b/audit-cli/commands/search/find-string/report.go
@@ -0,0 +1,51 @@
+package find_string
+
+import (
+ "fmt"
+ "sort"
+ "strings"
+)
+
+// PrintReport prints the search report to stdout.
+//
+// Displays statistics about the search operation including:
+// - Number of files scanned
+// - Number of files containing the substring
+// - Files containing substring by language (if verbose is true)
+// - List of file paths containing the substring (if verbose is true)
+//
+// Parameters:
+// - report: The report to print
+// - verbose: If true, show detailed breakdown including file paths and language counts
+func PrintReport(report *SearchReport, verbose bool) {
+ fmt.Println("\n" + strings.Repeat("=", 60))
+ fmt.Println("SEARCH REPORT")
+ fmt.Println(strings.Repeat("=", 60))
+
+ fmt.Printf("\nFiles Scanned: %d\n", report.FilesScanned)
+ fmt.Printf("Files Containing Substring: %d\n", report.FilesContaining)
+
+ if verbose && len(report.LanguageCounts) > 0 {
+ fmt.Println("\nFiles Containing Substring by Language:")
+
+ languages := make([]string, 0, len(report.LanguageCounts))
+ for lang := range report.LanguageCounts {
+ languages = append(languages, lang)
+ }
+ sort.Strings(languages)
+
+ for _, lang := range languages {
+ count := report.LanguageCounts[lang]
+ fmt.Printf(" %-15s: %d\n", lang, count)
+ }
+ }
+
+ if verbose && len(report.FilesWithSubstring) > 0 {
+ fmt.Println("\nFiles Containing Substring:")
+ for _, path := range report.FilesWithSubstring {
+ fmt.Printf(" - %s\n", path)
+ }
+ }
+
+ fmt.Println(strings.Repeat("=", 60))
+}
diff --git a/audit-cli/commands/search/find-string/types.go b/audit-cli/commands/search/find-string/types.go
new file mode 100644
index 0000000..dde85e8
--- /dev/null
+++ b/audit-cli/commands/search/find-string/types.go
@@ -0,0 +1,45 @@
+package find_string
+
+// SearchResult contains the results of searching a single file.
+//
+// Used internally during the search operation to track results for each file.
+type SearchResult struct {
+ FilePath string // Path to the file that was searched
+ Language string // Programming language (detected from file extension)
+ Contains bool // Whether the file contains the substring
+}
+
+// SearchReport contains statistics about the search operation.
+//
+// Tracks overall statistics for reporting to the user.
+type SearchReport struct {
+ FilesScanned int // Total number of files scanned
+ FilesContaining int // Number of files containing the substring
+ LanguageCounts map[string]int // Count of files containing substring by language
+ FilesWithSubstring []string // List of file paths containing the substring
+}
+
+// NewSearchReport creates a new initialized SearchReport with empty maps and slices.
+func NewSearchReport() *SearchReport {
+ return &SearchReport{
+ LanguageCounts: make(map[string]int),
+ FilesWithSubstring: make([]string, 0),
+ }
+}
+
+// AddResult updates the report with a search result.
+//
+// This method should be called once for each file that is searched.
+// It updates the statistics based on whether the file contains the substring.
+func (r *SearchReport) AddResult(result SearchResult) {
+ r.FilesScanned++
+
+ if result.Contains {
+ r.FilesContaining++
+ r.FilesWithSubstring = append(r.FilesWithSubstring, result.FilePath)
+
+ if result.Language != "" {
+ r.LanguageCounts[result.Language]++
+ }
+ }
+}
diff --git a/audit-cli/commands/search/search.go b/audit-cli/commands/search/search.go
new file mode 100644
index 0000000..ed9bdee
--- /dev/null
+++ b/audit-cli/commands/search/search.go
@@ -0,0 +1,33 @@
+// Package search provides the parent command for searching through extracted content.
+//
+// This package serves as the parent command for various search operations.
+// Currently supports:
+// - find-string: Search for substrings in extracted code example files
+//
+// Future subcommands could include pattern matching, regex search, or semantic search.
+package search
+
+import (
+ "github.com/mongodb/code-example-tooling/audit-cli/commands/search/find-string"
+ "github.com/spf13/cobra"
+)
+
+// NewSearchCommand creates the search parent command.
+//
+// This command serves as a parent for various search operations on extracted content.
+// It doesn't perform any operations itself but provides a namespace for subcommands.
+func NewSearchCommand() *cobra.Command {
+ cmd := &cobra.Command{
+ Use: "search",
+ Short: "Search through extracted content",
+ Long: `Search through extracted content such as code examples.
+
+Currently supports searching for substrings in extracted code example files.
+Future subcommands may support pattern matching, regex search, or semantic search.`,
+ }
+
+ // Add subcommands
+ cmd.AddCommand(find_string.NewFindStringCommand())
+
+ return cmd
+}
diff --git a/audit-cli/go.mod b/audit-cli/go.mod
new file mode 100644
index 0000000..788992d
--- /dev/null
+++ b/audit-cli/go.mod
@@ -0,0 +1,13 @@
+module github.com/mongodb/code-example-tooling/audit-cli
+
+go 1.24
+
+require (
+ github.com/aymanbagabas/go-udiff v0.3.1
+ github.com/spf13/cobra v1.10.1
+)
+
+require (
+ github.com/inconshreveable/mousetrap v1.1.0 // indirect
+ github.com/spf13/pflag v1.0.10 // indirect
+)
diff --git a/audit-cli/go.sum b/audit-cli/go.sum
new file mode 100644
index 0000000..ce2736b
--- /dev/null
+++ b/audit-cli/go.sum
@@ -0,0 +1,13 @@
+github.com/aymanbagabas/go-udiff v0.3.1 h1:LV+qyBQ2pqe0u42ZsUEtPiCaUoqgA9gYRDs3vj1nolY=
+github.com/aymanbagabas/go-udiff v0.3.1/go.mod h1:G0fsKmG+P6ylD0r6N/KgQD/nWzgfnl8ZBcNLgcbrw8E=
+github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g=
+github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
+github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
+github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
+github.com/spf13/cobra v1.10.1 h1:lJeBwCfmrnXthfAupyUTzJ/J4Nc1RsHC/mSRU2dll/s=
+github.com/spf13/cobra v1.10.1/go.mod h1:7SmJGaTHFVBY0jW4NXGluQoLvhqFQM+6XSKD+P4XaB0=
+github.com/spf13/pflag v1.0.9/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
+github.com/spf13/pflag v1.0.10 h1:4EBh2KAYBwaONj6b2Ye1GiHfwjqyROoF4RwYO+vPwFk=
+github.com/spf13/pflag v1.0.10/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
+gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
+gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
diff --git a/audit-cli/internal/rst/directive_parser.go b/audit-cli/internal/rst/directive_parser.go
new file mode 100644
index 0000000..6539c2c
--- /dev/null
+++ b/audit-cli/internal/rst/directive_parser.go
@@ -0,0 +1,490 @@
+// Package rst provides utilities for parsing reStructuredText (RST) files.
+//
+// This package contains the core RST parsing logic used by the extract commands.
+// It handles:
+// - Parsing RST directives (literalinclude, code-block, io-code-block)
+// - Following include directives recursively
+// - Resolving include paths with MongoDB-specific conventions
+// - Traversing directories for RST files
+//
+// The package is designed to be reusable across different extraction operations.
+package rst
+
+import (
+ "bufio"
+ "fmt"
+ "os"
+ "regexp"
+ "strings"
+)
+
+// DirectiveType represents the type of reStructuredText directive.
+type DirectiveType string
+
+const (
+ // CodeBlock represents inline code blocks (.. code-block::)
+ CodeBlock DirectiveType = "code-block"
+ // LiteralInclude represents external file references (.. literalinclude::)
+ LiteralInclude DirectiveType = "literalinclude"
+ // IoCodeBlock represents input/output examples (.. io-code-block::)
+ IoCodeBlock DirectiveType = "io-code-block"
+)
+
+// Directive represents a parsed reStructuredText directive.
+//
+// Contains all information needed to extract content from the directive,
+// including the directive type, arguments, options, and content.
+type Directive struct {
+ Type DirectiveType // Type of directive (code-block, literalinclude, io-code-block)
+ Argument string // Main argument (e.g., language for code-block, filepath for literalinclude)
+ Options map[string]string // Directive options (e.g., :language:, :start-after:, etc.)
+ Content string // Content of the directive (for code-block and inline io-code-block)
+ LineNum int // Line number where directive starts (1-based)
+
+ // For io-code-block directives
+ InputDirective *SubDirective // The .. input:: nested directive
+ OutputDirective *SubDirective // The .. output:: nested directive
+}
+
+// SubDirective represents a nested directive within io-code-block.
+//
+// Can contain either a filepath argument (for external file reference)
+// or inline content (for embedded code).
+type SubDirective struct {
+ Argument string // Filepath argument (if provided)
+ Options map[string]string // Directive options (e.g., :language:)
+ Content string // Inline content (if no filepath)
+}
+
+// Regular expressions for directive parsing
+var (
+ // Matches: .. literalinclude:: /path/to/file.php
+ literalIncludeRegex = regexp.MustCompile(`^\.\.\s+literalinclude::\s+(.+)$`)
+
+ // Matches: .. code-block:: python (language is optional)
+ codeBlockRegex = regexp.MustCompile(`^\.\.\s+code-block::\s*(.*)$`)
+
+ // Matches: .. io-code-block::
+ ioCodeBlockRegex = regexp.MustCompile(`^\.\.\s+io-code-block::\s*$`)
+
+ // Matches: .. input:: /path/to/file.cs (filepath is optional)
+ inputDirectiveRegex = regexp.MustCompile(`^\.\.\s+input::\s*(.*)$`)
+
+ // Matches: .. output:: /path/to/file.txt (filepath is optional)
+ outputDirectiveRegex = regexp.MustCompile(`^\.\.\s+output::\s*(.*)$`)
+
+ // Matches directive options like: :language: python
+ optionRegex = regexp.MustCompile(`^\s+:([^:]+):\s*(.*)$`)
+)
+
+// ParseDirectives parses all directives from an RST file.
+//
+// This function scans the file line-by-line and extracts all supported directives
+// (literalinclude, code-block, io-code-block). For each directive, it parses:
+// - The directive type and argument
+// - All directive options (e.g., :language:, :start-after:)
+// - The directive content (for code-block and io-code-block)
+// - Nested directives (for io-code-block)
+//
+// Parameters:
+// - filePath: Path to the RST file to parse
+//
+// Returns:
+// - []Directive: Slice of all parsed directives in order of appearance
+// - error: Any error encountered during parsing
+func ParseDirectives(filePath string) ([]Directive, error) {
+ file, err := os.Open(filePath)
+ if err != nil {
+ return nil, err
+ }
+ defer file.Close()
+
+ var directives []Directive
+ scanner := bufio.NewScanner(file)
+ lineNum := 0
+
+ for scanner.Scan() {
+ lineNum++
+ line := scanner.Text()
+ trimmedLine := strings.TrimSpace(line)
+
+ // Check for literalinclude directive
+ if matches := literalIncludeRegex.FindStringSubmatch(trimmedLine); len(matches) > 1 {
+ directive := Directive{
+ Type: LiteralInclude,
+ Argument: strings.TrimSpace(matches[1]),
+ Options: make(map[string]string),
+ LineNum: lineNum,
+ }
+
+ // Parse options on following lines
+ parseDirectiveOptions(scanner, &directive, &lineNum)
+ directives = append(directives, directive)
+ continue
+ }
+
+ // Check for code-block directive
+ if matches := codeBlockRegex.FindStringSubmatch(trimmedLine); len(matches) > 1 {
+ directive := Directive{
+ Type: CodeBlock,
+ Argument: strings.TrimSpace(matches[1]),
+ Options: make(map[string]string),
+ LineNum: lineNum,
+ }
+
+ // Parse options and content on following lines
+ firstContentLine := parseDirectiveOptions(scanner, &directive, &lineNum)
+ parseDirectiveContent(scanner, &directive, &lineNum, firstContentLine)
+ directives = append(directives, directive)
+ continue
+ }
+
+ // Check for io-code-block directive
+ if ioCodeBlockRegex.MatchString(trimmedLine) {
+ directive := Directive{
+ Type: IoCodeBlock,
+ Options: make(map[string]string),
+ LineNum: lineNum,
+ }
+
+ // Parse io-code-block with its nested input/output directives
+ parseIoCodeBlock(scanner, &directive, &lineNum)
+ directives = append(directives, directive)
+ continue
+ }
+ }
+
+ if err := scanner.Err(); err != nil {
+ return nil, err
+ }
+
+ return directives, nil
+}
+
+// parseDirectiveOptions parses the options following a directive
+// Returns the first content line if encountered, or empty string if not
+func parseDirectiveOptions(scanner *bufio.Scanner, directive *Directive, lineNum *int) string {
+ for scanner.Scan() {
+ *lineNum++
+ line := scanner.Text()
+
+ // Check if this is an option line
+ if matches := optionRegex.FindStringSubmatch(line); len(matches) > 1 {
+ optionName := strings.TrimSpace(matches[1])
+ optionValue := strings.TrimSpace(matches[2])
+ directive.Options[optionName] = optionValue
+ continue
+ }
+
+ // If we hit a blank line or non-indented line, we're done with options
+ trimmedLine := strings.TrimSpace(line)
+ if trimmedLine == "" {
+ continue // Skip blank lines between options and content
+ }
+
+ // If the line is not indented and not an option, we're done
+ if len(line) > 0 && line[0] != ' ' && line[0] != '\t' {
+ // Non-indented line means end of directive
+ return ""
+ }
+
+ // If we have indented content (not an option), this is the start of content
+ if len(line) > 0 && (line[0] == ' ' || line[0] == '\t') && !optionRegex.MatchString(line) {
+ return line
+ }
+ }
+ return ""
+}
+
+// parseDirectiveContent parses the content block of a directive (for code-block, io-code-block)
+// firstContentLine is the first line of content (if already consumed by parseDirectiveOptions)
+func parseDirectiveContent(scanner *bufio.Scanner, directive *Directive, lineNum *int, firstContentLine string) {
+ var contentLines []string
+ var baseIndent int = -1
+
+ // Process the first content line if provided
+ if firstContentLine != "" {
+ // Calculate indentation
+ indent := len(firstContentLine) - len(strings.TrimLeft(firstContentLine, " \t"))
+ baseIndent = indent
+
+ // Add the first line, removing the base indentation
+ contentLines = append(contentLines, firstContentLine[baseIndent:])
+ }
+
+ for scanner.Scan() {
+ *lineNum++
+ line := scanner.Text()
+
+ // Empty lines are part of the content
+ if strings.TrimSpace(line) == "" {
+ contentLines = append(contentLines, "")
+ continue
+ }
+
+ // Calculate indentation
+ indent := len(line) - len(strings.TrimLeft(line, " \t"))
+
+ // If this is the first content line, establish the base indentation
+ if baseIndent == -1 {
+ baseIndent = indent
+ }
+
+ // If the line is less indented than the base, we're done with content
+ if indent < baseIndent {
+ break
+ }
+
+ // Add the line to content, removing the base indentation
+ if indent >= baseIndent {
+ contentLines = append(contentLines, line[baseIndent:])
+ }
+ }
+
+ directive.Content = strings.TrimSpace(strings.Join(contentLines, "\n"))
+}
+
+// ExtractLiteralIncludeContent extracts content from a literalinclude directive
+// Handles start-after and end-before options
+func ExtractLiteralIncludeContent(currentFilePath string, directive Directive) (string, error) {
+ if directive.Type != LiteralInclude {
+ return "", fmt.Errorf("directive is not a literalinclude")
+ }
+
+ // Resolve the file path
+ resolvedPath, err := ResolveIncludePath(currentFilePath, directive.Argument)
+ if err != nil {
+ return "", fmt.Errorf("failed to resolve literalinclude path %s: %w", directive.Argument, err)
+ }
+
+ // Read the file content
+ content, err := os.ReadFile(resolvedPath)
+ if err != nil {
+ return "", fmt.Errorf("failed to read literalinclude file %s: %w", resolvedPath, err)
+ }
+
+ contentStr := string(content)
+
+ // Handle start-after option
+ if startAfter, hasStartAfter := directive.Options["start-after"]; hasStartAfter {
+ startIdx := strings.Index(contentStr, startAfter)
+ if startIdx == -1 {
+ return "", fmt.Errorf("start-after tag '%s' not found in %s", startAfter, resolvedPath)
+ }
+ // Find the end of the line containing the start-after tag
+ lineEnd := strings.Index(contentStr[startIdx:], "\n")
+ if lineEnd == -1 {
+ // Tag is on the last line, take everything after it
+ contentStr = ""
+ } else {
+ // Skip past the newline to start at the next line
+ contentStr = contentStr[startIdx+lineEnd+1:]
+ }
+ }
+
+ // Handle end-before option
+ if endBefore, hasEndBefore := directive.Options["end-before"]; hasEndBefore {
+ endIdx := strings.Index(contentStr, endBefore)
+ if endIdx == -1 {
+ return "", fmt.Errorf("end-before tag '%s' not found in %s", endBefore, resolvedPath)
+ }
+ // Find the start of the line containing the end-before tag
+ lineStart := strings.LastIndex(contentStr[:endIdx], "\n")
+ if lineStart == -1 {
+ lineStart = 0
+ } else {
+ lineStart++ // Move past the newline
+ }
+ // Cut before the line containing the tag, but keep the newline before it
+ if lineStart > 0 {
+ contentStr = contentStr[:lineStart-1]
+ } else {
+ contentStr = ""
+ }
+ }
+
+ // Handle dedent option
+ if _, hasDedent := directive.Options["dedent"]; hasDedent {
+ contentStr = dedentContent(contentStr)
+ }
+
+ return strings.TrimSpace(contentStr), nil
+}
+
+// dedentContent removes common leading whitespace from all lines
+func dedentContent(content string) string {
+ lines := strings.Split(content, "\n")
+ if len(lines) == 0 {
+ return content
+ }
+
+ // Find the minimum indentation (ignoring empty lines)
+ minIndent := -1
+ for _, line := range lines {
+ if strings.TrimSpace(line) == "" {
+ continue
+ }
+ indent := len(line) - len(strings.TrimLeft(line, " \t"))
+ if minIndent == -1 || indent < minIndent {
+ minIndent = indent
+ }
+ }
+
+ if minIndent <= 0 {
+ return content
+ }
+
+ // Remove the common indentation from all lines
+ var dedentedLines []string
+ for _, line := range lines {
+ if strings.TrimSpace(line) == "" {
+ dedentedLines = append(dedentedLines, "")
+ } else if len(line) >= minIndent {
+ dedentedLines = append(dedentedLines, line[minIndent:])
+ } else {
+ dedentedLines = append(dedentedLines, line)
+ }
+ }
+
+ return strings.Join(dedentedLines, "\n")
+}
+
+// parseIoCodeBlock parses an io-code-block directive with its nested input/output directives
+func parseIoCodeBlock(scanner *bufio.Scanner, directive *Directive, lineNum *int) {
+ // First, parse any options for the io-code-block itself
+ // This might return the first input/output directive line
+ firstLine := parseDirectiveOptions(scanner, directive, lineNum)
+
+ // Now parse the nested input and output directives
+ var pendingLine string = firstLine
+ for {
+ var line string
+ var trimmedLine string
+
+ // Use pending line if we have one, otherwise scan for next line
+ if pendingLine != "" {
+ line = pendingLine
+ trimmedLine = strings.TrimSpace(line)
+ pendingLine = ""
+ } else {
+ if !scanner.Scan() {
+ break
+ }
+ *lineNum++
+ line = scanner.Text()
+ trimmedLine = strings.TrimSpace(line)
+ }
+
+ // Stop if we hit a blank line followed by dedent to base level
+ if trimmedLine == "" {
+ // Peek ahead to see if next line is dedented
+ if !scanner.Scan() {
+ break
+ }
+ *lineNum++
+ nextLine := scanner.Text()
+ if len(nextLine) > 0 && nextLine[0] != ' ' && nextLine[0] != '\t' {
+ // We've reached the end of the io-code-block
+ break
+ }
+ // Not dedented, continue parsing
+ line = nextLine
+ trimmedLine = strings.TrimSpace(line)
+ }
+
+ // Check for input directive
+ if matches := inputDirectiveRegex.FindStringSubmatch(trimmedLine); len(matches) > 0 {
+ subDir := &SubDirective{
+ Argument: strings.TrimSpace(matches[1]),
+ Options: make(map[string]string),
+ }
+ pendingLine = parseSubDirective(scanner, subDir, lineNum)
+ directive.InputDirective = subDir
+ continue
+ }
+
+ // Check for output directive
+ if matches := outputDirectiveRegex.FindStringSubmatch(trimmedLine); len(matches) > 0 {
+ subDir := &SubDirective{
+ Argument: strings.TrimSpace(matches[1]),
+ Options: make(map[string]string),
+ }
+ pendingLine = parseSubDirective(scanner, subDir, lineNum)
+ directive.OutputDirective = subDir
+ continue
+ }
+
+ // If we get here, the line is neither input nor output directive
+ // This means we've reached the end of the io-code-block
+ break
+ }
+}
+
+// parseSubDirective parses a nested directive (input or output) within io-code-block
+// Returns the last line read (which might be the start of the next directive)
+func parseSubDirective(scanner *bufio.Scanner, subDir *SubDirective, lineNum *int) string {
+ var contentLines []string
+ var baseIndent int = -1
+ var lastLine string
+
+ // Parse options and content
+ for scanner.Scan() {
+ *lineNum++
+ line := scanner.Text()
+ lastLine = line
+ trimmedLine := strings.TrimSpace(line)
+
+ // Empty line - might be part of content or end of directive
+ if trimmedLine == "" {
+ if len(contentLines) > 0 {
+ contentLines = append(contentLines, "")
+ }
+ continue
+ }
+
+ // Check if this is an option line
+ if matches := optionRegex.FindStringSubmatch(line); len(matches) > 2 {
+ subDir.Options[matches[1]] = strings.TrimSpace(matches[2])
+ continue
+ }
+
+ // Check if this is the start of another directive (input/output)
+ if inputDirectiveRegex.MatchString(trimmedLine) || outputDirectiveRegex.MatchString(trimmedLine) {
+ // Return this line so the caller can process it
+ break
+ }
+
+ // Check if line is indented (content)
+ if len(line) > 0 && (line[0] == ' ' || line[0] == '\t') {
+ indent := len(line) - len(strings.TrimLeft(line, " \t"))
+
+ // Set base indent from first content line
+ if baseIndent == -1 {
+ baseIndent = indent
+ }
+
+ // If we've dedented back to or past the base level, we're done
+ if len(contentLines) > 0 && indent < baseIndent {
+ break
+ }
+
+ // Add content line (remove base indentation)
+ if baseIndent >= 0 && len(line) >= baseIndent {
+ contentLines = append(contentLines, line[baseIndent:])
+ } else {
+ contentLines = append(contentLines, strings.TrimLeft(line, " \t"))
+ }
+ } else {
+ // Non-indented, non-empty line means we're done with this directive
+ break
+ }
+ }
+
+ // Set the content
+ if len(contentLines) > 0 {
+ subDir.Content = strings.TrimSpace(strings.Join(contentLines, "\n"))
+ }
+
+ return lastLine
+}
+
diff --git a/audit-cli/internal/rst/file_utils.go b/audit-cli/internal/rst/file_utils.go
new file mode 100644
index 0000000..f867f7b
--- /dev/null
+++ b/audit-cli/internal/rst/file_utils.go
@@ -0,0 +1,72 @@
+package rst
+
+import (
+ "os"
+ "path/filepath"
+ "strings"
+)
+
+// TraverseDirectory traverses a directory and returns all file paths.
+//
+// If recursive is true, walks the entire directory tree. If false, only
+// returns files in the immediate directory (no subdirectories).
+//
+// Parameters:
+// - rootPath: Root directory to traverse
+// - recursive: If true, recursively scan subdirectories
+//
+// Returns:
+// - []string: List of all file paths found
+// - error: Any error encountered during traversal
+func TraverseDirectory(rootPath string, recursive bool) ([]string, error) {
+ var files []string
+
+ if recursive {
+ err := filepath.Walk(rootPath, func(path string, info os.FileInfo, err error) error {
+ if err != nil {
+ return err
+ }
+ if !info.IsDir() {
+ files = append(files, path)
+ }
+ return nil
+ })
+ if err != nil {
+ return nil, err
+ }
+ } else {
+ entries, err := os.ReadDir(rootPath)
+ if err != nil {
+ return nil, err
+ }
+ for _, entry := range entries {
+ if !entry.IsDir() {
+ files = append(files, filepath.Join(rootPath, entry.Name()))
+ }
+ }
+ }
+
+ return files, nil
+}
+
+// ShouldProcessFile determines if a file should be processed based on its extension.
+//
+// Returns true for files with .rst, .txt, or .md extensions (case-insensitive).
+// This is used to filter files during directory traversal.
+//
+// Parameters:
+// - filePath: Path to the file to check
+//
+// Returns:
+// - bool: True if the file should be processed, false otherwise
+func ShouldProcessFile(filePath string) bool {
+ ext := strings.ToLower(filepath.Ext(filePath))
+ validExtensions := []string{".rst", ".txt", ".md"}
+ for _, validExt := range validExtensions {
+ if ext == validExt {
+ return true
+ }
+ }
+ return false
+}
+
diff --git a/audit-cli/internal/rst/include_resolver.go b/audit-cli/internal/rst/include_resolver.go
new file mode 100644
index 0000000..af437ec
--- /dev/null
+++ b/audit-cli/internal/rst/include_resolver.go
@@ -0,0 +1,360 @@
+package rst
+
+import (
+ "bufio"
+ "fmt"
+ "os"
+ "path/filepath"
+ "regexp"
+ "strings"
+)
+
+// IncludeDirectiveRegex matches .. include:: directives in RST files.
+var IncludeDirectiveRegex = regexp.MustCompile(`^\.\.\s+include::\s+(.+)$`)
+
+// FindIncludeDirectives finds all include directives in a file and resolves their paths.
+//
+// This function scans the file for .. include:: directives and resolves each path
+// using MongoDB-specific conventions (steps files, extracts, template variables, etc.).
+//
+// Parameters:
+// - filePath: Path to the RST file to scan
+//
+// Returns:
+// - []string: List of resolved absolute paths to included files
+// - error: Any error encountered during scanning
+func FindIncludeDirectives(filePath string) ([]string, error) {
+ file, err := os.Open(filePath)
+ if err != nil {
+ return nil, err
+ }
+ defer file.Close()
+
+ var includePaths []string
+ scanner := bufio.NewScanner(file)
+
+ for scanner.Scan() {
+ line := strings.TrimSpace(scanner.Text())
+
+ // Check if this line is an include directive
+ matches := IncludeDirectiveRegex.FindStringSubmatch(line)
+ if len(matches) > 1 {
+ includePath := strings.TrimSpace(matches[1])
+
+ // Resolve the include path relative to the source directory
+ resolvedPath, err := ResolveIncludePath(filePath, includePath)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "Warning: failed to resolve include path %s: %v\n", includePath, err)
+ continue
+ }
+
+ includePaths = append(includePaths, resolvedPath)
+ }
+ }
+
+ if err := scanner.Err(); err != nil {
+ return nil, err
+ }
+
+ return includePaths, nil
+}
+
+// ResolveIncludePath resolves an include path relative to the source directory
+// Handles multiple special cases:
+// - Template variables ({{var_name}})
+// - Steps files (/includes/steps/name.rst -> /includes/steps-name.yaml)
+// - Extracts files (ref-based YAML content blocks)
+// - Release files (ref-based YAML content blocks)
+// - Files without extensions (auto-append .rst)
+func ResolveIncludePath(currentFilePath, includePath string) (string, error) {
+ // Handle template variables by looking up replacements in the current file
+ if strings.HasPrefix(includePath, "{{") && strings.HasSuffix(includePath, "}}") {
+ // Extract the variable name
+ varName := strings.TrimSuffix(strings.TrimPrefix(includePath, "{{"), "}}")
+ varName = strings.TrimSpace(varName)
+
+ // Try to resolve the variable from the current file's replacement section
+ resolvedPath, err := ResolveTemplateVariable(currentFilePath, varName)
+ if err != nil {
+ return "", fmt.Errorf("failed to resolve template variable %s: %w", includePath, err)
+ }
+
+ // Now resolve the replacement path as a normal include
+ includePath = resolvedPath
+ }
+
+ // Find the source directory by walking up from the current file
+ sourceDir, err := FindSourceDirectory(currentFilePath)
+ if err != nil {
+ return "", err
+ }
+
+ // Clean the include path (remove leading slash if present)
+ cleanIncludePath := strings.TrimPrefix(includePath, "/")
+
+ // Special handling for steps/ includes
+ // Convert /includes/steps/filename.rst to /includes/steps-filename.yaml
+ if strings.Contains(cleanIncludePath, "steps/") {
+ fullPath, err := resolveSpecialIncludePath(sourceDir, cleanIncludePath, "steps")
+ if err == nil {
+ return fullPath, nil
+ }
+ // If steps resolution fails, continue with normal resolution
+ }
+
+ // Special handling for extracts/ includes
+ // These reference content blocks in YAML files by ref ID
+ // Convert /includes/extracts/ref-name.rst to the YAML file containing that ref
+ if strings.Contains(cleanIncludePath, "extracts/") {
+ fullPath, err := resolveRefBasedIncludePath(sourceDir, cleanIncludePath, "extracts")
+ if err == nil {
+ return fullPath, nil
+ }
+ // If extracts resolution fails, continue with normal resolution
+ }
+
+ // Special handling for release/ includes
+ // These also reference content blocks in YAML files by ref ID
+ if strings.Contains(cleanIncludePath, "release/") {
+ fullPath, err := resolveRefBasedIncludePath(sourceDir, cleanIncludePath, "release")
+ if err == nil {
+ return fullPath, nil
+ }
+ // If release resolution fails, continue with normal resolution
+ }
+
+ // Construct the full path
+ fullPath := filepath.Join(sourceDir, cleanIncludePath)
+
+ // If the file exists as-is, return it
+ if _, err := os.Stat(fullPath); err == nil {
+ return fullPath, nil
+ }
+
+ // If the path doesn't have an extension, try adding .rst
+ if filepath.Ext(cleanIncludePath) == "" {
+ fullPathWithRst := fullPath + ".rst"
+ if _, err := os.Stat(fullPathWithRst); err == nil {
+ return fullPathWithRst, nil
+ }
+ }
+
+ return "", fmt.Errorf("include file not found: %s", fullPath)
+}
+
+// resolveSpecialIncludePath handles special include paths (steps/)
+// Converts: /includes/steps/run-mongodb-on-a-linux-distribution-systemd.rst
+// To: /includes/steps-run-mongodb-on-a-linux-distribution-systemd.yaml
+func resolveSpecialIncludePath(sourceDir, includePath, dirType string) (string, error) {
+ // Find the "dirType/" part in the path (e.g., "steps/")
+ searchPattern := dirType + "/"
+ dirIndex := strings.Index(includePath, searchPattern)
+ if dirIndex == -1 {
+ return "", fmt.Errorf("no %s/ found in path", dirType)
+ }
+
+ // Split the path at "dirType/"
+ beforeDir := includePath[:dirIndex]
+ afterDir := includePath[dirIndex+len(searchPattern):]
+
+ // Remove the file extension from afterDir
+ afterDir = strings.TrimSuffix(afterDir, filepath.Ext(afterDir))
+
+ // Construct the new path: before + "dirType-" + after + ".yaml"
+ newPath := beforeDir + dirType + "-" + afterDir + ".yaml"
+
+ // Construct the full path
+ fullPath := filepath.Join(sourceDir, newPath)
+
+ // Verify the file exists
+ if _, err := os.Stat(fullPath); err != nil {
+ return "", fmt.Errorf("%s file not found: %s", dirType, fullPath)
+ }
+
+ return fullPath, nil
+}
+
+// resolveRefBasedIncludePath handles ref-based include paths (extracts/, release/)
+// These reference content blocks in YAML files by ref ID
+// Example: /includes/extracts/install-mongodb-community-manually-redhat.rst
+// References a ref in a YAML file like /includes/extracts-install-mongodb-manually.yaml
+// Example: /includes/release/pin-repo-to-version-yum.rst
+// References a ref in a YAML file like /includes/release-pinning.yaml
+func resolveRefBasedIncludePath(sourceDir, includePath, dirType string) (string, error) {
+ // Extract the ref name from the path
+ // /includes/dirType/ref-name.rst -> ref-name
+ searchPattern := dirType + "/"
+ dirIndex := strings.Index(includePath, searchPattern)
+ if dirIndex == -1 {
+ return "", fmt.Errorf("no %s/ found in path", dirType)
+ }
+
+ refName := includePath[dirIndex+len(searchPattern):]
+ refName = strings.TrimSuffix(refName, filepath.Ext(refName))
+
+ // Get the directory part before "dirType/"
+ beforeDir := includePath[:dirIndex]
+ searchDir := filepath.Join(sourceDir, beforeDir)
+
+ // Find all dirType-*.yaml files in the includes directory
+ pattern := filepath.Join(searchDir, dirType+"-*.yaml")
+ matches, err := filepath.Glob(pattern)
+ if err != nil {
+ return "", fmt.Errorf("failed to search for %s files: %w", dirType, err)
+ }
+
+ // Search each YAML file for the ref
+ for _, yamlFile := range matches {
+ hasRef, err := YAMLFileContainsRef(yamlFile, refName)
+ if err != nil {
+ continue // Skip files we can't read
+ }
+ if hasRef {
+ return yamlFile, nil
+ }
+ }
+
+ return "", fmt.Errorf("no %s file found containing ref: %s", dirType, refName)
+}
+
+// YAMLFileContainsRef checks if a YAML file contains a specific ref.
+//
+// This function scans a YAML file for a line matching "ref: ".
+// Used to find the correct YAML file for ref-based includes (extracts, release).
+//
+// Parameters:
+// - filePath: Path to the YAML file to check
+// - refName: The ref name to search for
+//
+// Returns:
+// - bool: True if the file contains the ref, false otherwise
+// - error: Any error encountered during scanning
+func YAMLFileContainsRef(filePath, refName string) (bool, error) {
+ file, err := os.Open(filePath)
+ if err != nil {
+ return false, err
+ }
+ defer file.Close()
+
+ scanner := bufio.NewScanner(file)
+ searchPattern := "ref: " + refName
+
+ for scanner.Scan() {
+ line := strings.TrimSpace(scanner.Text())
+ if line == searchPattern {
+ return true, nil
+ }
+ }
+
+ return false, scanner.Err()
+}
+
+// ResolveTemplateVariable resolves a template variable from a YAML file's replacement section.
+//
+// MongoDB documentation uses template variables in include directives like:
+// .. include:: {{release_specification_default}}
+//
+// These are resolved by looking up the variable in the YAML file's replacement section:
+// replacement:
+// release_specification_default: "/includes/release/install-windows-default.rst"
+//
+// Parameters:
+// - yamlFilePath: Path to the YAML file containing the replacement section
+// - varName: The variable name to resolve (without {{ }})
+//
+// Returns:
+// - string: The resolved path from the replacement section
+// - error: Any error encountered during resolution
+func ResolveTemplateVariable(yamlFilePath, varName string) (string, error) {
+ file, err := os.Open(yamlFilePath)
+ if err != nil {
+ return "", err
+ }
+ defer file.Close()
+
+ scanner := bufio.NewScanner(file)
+ inReplacementSection := false
+ searchPattern := varName + ":"
+
+ for scanner.Scan() {
+ line := scanner.Text()
+ trimmedLine := strings.TrimSpace(line)
+
+ // Check if we're entering the replacement section
+ if trimmedLine == "replacement:" {
+ inReplacementSection = true
+ continue
+ }
+
+ // If we're in the replacement section
+ if inReplacementSection {
+ // Check if we've left the replacement section (new top-level key or document separator)
+ if len(line) > 0 && line[0] != ' ' && line[0] != '\t' {
+ // We've left the replacement section
+ break
+ }
+ if trimmedLine == "..." || trimmedLine == "---" {
+ // Document separator - we've left the replacement section
+ break
+ }
+
+ // Look for our variable
+ if strings.HasPrefix(trimmedLine, searchPattern) {
+ // Extract the value (everything after "varName: ")
+ value := strings.TrimPrefix(trimmedLine, searchPattern)
+ value = strings.TrimSpace(value)
+ // Remove quotes if present
+ value = strings.Trim(value, "\"'")
+ return value, nil
+ }
+ }
+ }
+
+ if err := scanner.Err(); err != nil {
+ return "", err
+ }
+
+ return "", fmt.Errorf("template variable %s not found in replacement section of %s", varName, yamlFilePath)
+}
+
+// FindSourceDirectory walks up the directory tree to find the "source" directory.
+//
+// MongoDB documentation is typically organized with a "source" directory at the root.
+// This function walks up from the current file to find that directory, which is used
+// as the base for resolving include paths.
+//
+// Parameters:
+// - filePath: Path to a file within the documentation tree
+//
+// Returns:
+// - string: Absolute path to the source directory
+// - error: Error if source directory cannot be found
+func FindSourceDirectory(filePath string) (string, error) {
+ // Get the directory containing the file
+ dir := filepath.Dir(filePath)
+
+ // Walk up the directory tree
+ for {
+ // Check if the current directory is named "source"
+ if filepath.Base(dir) == "source" {
+ return dir, nil
+ }
+
+ // Check if there's a "source" subdirectory
+ sourceSubdir := filepath.Join(dir, "source")
+ if info, err := os.Stat(sourceSubdir); err == nil && info.IsDir() {
+ return sourceSubdir, nil
+ }
+
+ // Move up one directory
+ parent := filepath.Dir(dir)
+
+ // If we've reached the root, stop
+ if parent == dir {
+ return "", fmt.Errorf("could not find source directory for %s", filePath)
+ }
+
+ dir = parent
+ }
+}
+
diff --git a/audit-cli/internal/rst/parser.go b/audit-cli/internal/rst/parser.go
new file mode 100644
index 0000000..a7a7bb0
--- /dev/null
+++ b/audit-cli/internal/rst/parser.go
@@ -0,0 +1,91 @@
+package rst
+
+import (
+ "fmt"
+ "os"
+ "path/filepath"
+)
+
+// ParseFileWithIncludes parses a file and recursively follows include directives.
+//
+// This function provides a generic mechanism for processing RST files and their includes.
+// It handles:
+// - Tracking visited files to prevent circular includes
+// - Calling a custom parse function for each file
+// - Recursively following .. include:: directives
+// - Resolving include paths with MongoDB-specific conventions
+//
+// The parseFunc is called for each file to extract content (e.g., code examples).
+// It should return an error if parsing fails.
+//
+// Parameters:
+// - filePath: Path to the RST file to parse
+// - followIncludes: If true, recursively follow .. include:: directives
+// - visited: Map tracking already-processed files (prevents circular includes)
+// - verbose: If true, print detailed processing information
+// - parseFunc: Function to call for each file to extract content
+//
+// Returns:
+// - []string: List of all processed file paths (absolute paths)
+// - error: Any error encountered during parsing
+func ParseFileWithIncludes(
+ filePath string,
+ followIncludes bool,
+ visited map[string]bool,
+ verbose bool,
+ parseFunc func(string) error,
+) ([]string, error) {
+ // Prevent infinite loops from circular includes
+ absPath, err := filepath.Abs(filePath)
+ if err != nil {
+ return nil, err
+ }
+
+ if visited[absPath] {
+ return nil, nil // Already processed this file
+ }
+ visited[absPath] = true
+
+ var processedFiles []string
+ processedFiles = append(processedFiles, absPath)
+
+ // Parse the current file using the provided parse function
+ if parseFunc != nil {
+ if err := parseFunc(filePath); err != nil {
+ return processedFiles, err
+ }
+ }
+
+ // If not following includes, return just this file
+ if !followIncludes {
+ return processedFiles, nil
+ }
+
+ // Find and process include directives
+ includeFiles, err := FindIncludeDirectives(filePath)
+ if err != nil {
+ return processedFiles, nil // Continue even if we can't find includes
+ }
+
+ if verbose && len(includeFiles) > 0 {
+ fmt.Printf(" Found %d include(s) in %s\n", len(includeFiles), filepath.Base(filePath))
+ }
+
+ // Recursively parse included files
+ for _, includeFile := range includeFiles {
+ if verbose {
+ fmt.Printf(" Following include: %s\n", includeFile)
+ }
+
+ includedFiles, err := ParseFileWithIncludes(includeFile, followIncludes, visited, verbose, parseFunc)
+ if err != nil {
+ // Log warning but continue processing other files
+ fmt.Fprintf(os.Stderr, "Warning: failed to parse included file %s: %v\n", includeFile, err)
+ continue
+ }
+ processedFiles = append(processedFiles, includedFiles...)
+ }
+
+ return processedFiles, nil
+}
+
diff --git a/audit-cli/main.go b/audit-cli/main.go
new file mode 100644
index 0000000..a6ce75d
--- /dev/null
+++ b/audit-cli/main.go
@@ -0,0 +1,46 @@
+// Package main provides the entry point for the audit-cli tool.
+//
+// audit-cli is a command-line tool for extracting and analyzing code examples
+// from MongoDB documentation written in reStructuredText (RST).
+//
+// The CLI is organized into parent commands with subcommands:
+// - extract: Extract content from RST files
+// - code-examples: Extract code examples from RST directives
+// - search: Search through extracted content
+// - find-string: Search for substrings in extracted files
+// - analyze: Analyze RST file structures
+// - includes: Analyze include directive relationships
+// - compare: Compare files across different versions
+// - file-contents: Compare file contents across versions
+package main
+
+import (
+ "github.com/mongodb/code-example-tooling/audit-cli/commands/analyze"
+ "github.com/mongodb/code-example-tooling/audit-cli/commands/compare"
+ "github.com/mongodb/code-example-tooling/audit-cli/commands/extract"
+ "github.com/mongodb/code-example-tooling/audit-cli/commands/search"
+ "github.com/spf13/cobra"
+)
+
+func main() {
+ var rootCmd = &cobra.Command{
+ Use: "audit-cli",
+ Short: "A CLI tool for extracting and analyzing code examples from MongoDB documentation",
+ Long: `audit-cli extracts code examples from reStructuredText files and provides
+tools for searching and analyzing the extracted content.
+
+Supports extraction from literalinclude, code-block, and io-code-block directives,
+with special handling for MongoDB documentation conventions.`,
+ }
+
+ // Add parent commands
+ rootCmd.AddCommand(extract.NewExtractCommand())
+ rootCmd.AddCommand(search.NewSearchCommand())
+ rootCmd.AddCommand(analyze.NewAnalyzeCommand())
+ rootCmd.AddCommand(compare.NewCompareCommand())
+
+ err := rootCmd.Execute()
+ if err != nil {
+ return
+ }
+}
diff --git a/audit-cli/testdata/compare/file1.txt b/audit-cli/testdata/compare/file1.txt
new file mode 100644
index 0000000..c4d290d
--- /dev/null
+++ b/audit-cli/testdata/compare/file1.txt
@@ -0,0 +1,4 @@
+Line 1
+Line 2
+Line 3
+
diff --git a/audit-cli/testdata/compare/file2.txt b/audit-cli/testdata/compare/file2.txt
new file mode 100644
index 0000000..90f6207
--- /dev/null
+++ b/audit-cli/testdata/compare/file2.txt
@@ -0,0 +1,5 @@
+Line 1
+Line 2 modified
+Line 3
+Line 4
+
diff --git a/audit-cli/testdata/compare/identical1.txt b/audit-cli/testdata/compare/identical1.txt
new file mode 100644
index 0000000..2e25982
--- /dev/null
+++ b/audit-cli/testdata/compare/identical1.txt
@@ -0,0 +1,4 @@
+Identical content
+Line 2
+Line 3
+
diff --git a/audit-cli/testdata/compare/identical2.txt b/audit-cli/testdata/compare/identical2.txt
new file mode 100644
index 0000000..2e25982
--- /dev/null
+++ b/audit-cli/testdata/compare/identical2.txt
@@ -0,0 +1,4 @@
+Identical content
+Line 2
+Line 3
+
diff --git a/audit-cli/testdata/compare/product/manual/source/includes/example.rst b/audit-cli/testdata/compare/product/manual/source/includes/example.rst
new file mode 100644
index 0000000..313e078
--- /dev/null
+++ b/audit-cli/testdata/compare/product/manual/source/includes/example.rst
@@ -0,0 +1,40 @@
+.. _example-reference:
+
+=================
+Example Document
+=================
+
+This is an example RST file for testing the compare command.
+
+Introduction
+------------
+
+MongoDB is a document database designed for ease of application development and scaling.
+
+Features
+--------
+
+- Document-oriented storage
+- Full index support
+- Replication and high availability
+- Auto-sharding
+- Rich queries
+- Fast in-place updates
+- Professional support by MongoDB
+
+Code Example
+------------
+
+.. code-block:: javascript
+
+ db.collection.insertOne({
+ name: "John Doe",
+ age: 30,
+ status: "active"
+ })
+
+Conclusion
+----------
+
+This concludes the example document.
+
diff --git a/audit-cli/testdata/compare/product/manual/source/includes/new-feature.rst b/audit-cli/testdata/compare/product/manual/source/includes/new-feature.rst
new file mode 100644
index 0000000..00d5e49
--- /dev/null
+++ b/audit-cli/testdata/compare/product/manual/source/includes/new-feature.rst
@@ -0,0 +1,13 @@
+.. _new-feature:
+
+===========
+New Feature
+===========
+
+This feature only exists in manual and upcoming versions.
+
+Description
+-----------
+
+This is a new feature that was added in recent versions.
+
diff --git a/audit-cli/testdata/compare/product/upcoming/source/includes/example.rst b/audit-cli/testdata/compare/product/upcoming/source/includes/example.rst
new file mode 100644
index 0000000..de6b7cb
--- /dev/null
+++ b/audit-cli/testdata/compare/product/upcoming/source/includes/example.rst
@@ -0,0 +1,42 @@
+.. _example-reference:
+
+=================
+Example Document
+=================
+
+This is an example RST file for testing the compare command.
+
+Introduction
+------------
+
+MongoDB is a document database designed for ease of application development and scaling.
+
+Features
+--------
+
+- Document-oriented storage
+- Full index support
+- Replication and high availability
+- Auto-sharding
+- Rich queries
+- Fast in-place updates
+- Professional support by MongoDB
+- New feature in upcoming version
+
+Code Example
+------------
+
+.. code-block:: javascript
+
+ db.collection.insertOne({
+ name: "John Doe",
+ age: 30,
+ status: "active",
+ version: "upcoming"
+ })
+
+Conclusion
+----------
+
+This concludes the example document with updates for the upcoming version.
+
diff --git a/audit-cli/testdata/compare/product/upcoming/source/includes/new-feature.rst b/audit-cli/testdata/compare/product/upcoming/source/includes/new-feature.rst
new file mode 100644
index 0000000..00d5e49
--- /dev/null
+++ b/audit-cli/testdata/compare/product/upcoming/source/includes/new-feature.rst
@@ -0,0 +1,13 @@
+.. _new-feature:
+
+===========
+New Feature
+===========
+
+This feature only exists in manual and upcoming versions.
+
+Description
+-----------
+
+This is a new feature that was added in recent versions.
+
diff --git a/audit-cli/testdata/compare/product/v8.0/source/includes/example.rst b/audit-cli/testdata/compare/product/v8.0/source/includes/example.rst
new file mode 100644
index 0000000..ce8892d
--- /dev/null
+++ b/audit-cli/testdata/compare/product/v8.0/source/includes/example.rst
@@ -0,0 +1,38 @@
+.. _example-reference:
+
+=================
+Example Document
+=================
+
+This is an example RST file for testing the compare command.
+
+Introduction
+------------
+
+MongoDB is a document database designed for ease of application development and scaling.
+
+Features
+--------
+
+- Document-oriented storage
+- Full index support
+- Replication and high availability
+- Auto-sharding
+- Rich queries
+- Fast in-place updates
+
+Code Example
+------------
+
+.. code-block:: javascript
+
+ db.collection.insertOne({
+ name: "John Doe",
+ age: 30
+ })
+
+Conclusion
+----------
+
+This concludes the example document.
+
diff --git a/audit-cli/testdata/expected-output/code-block-test.code-block.1.js b/audit-cli/testdata/expected-output/code-block-test.code-block.1.js
new file mode 100644
index 0000000..cfdd3ad
--- /dev/null
+++ b/audit-cli/testdata/expected-output/code-block-test.code-block.1.js
@@ -0,0 +1,2 @@
+const greeting = "Hello, World!";
+console.log(greeting);
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/code-block-test.code-block.2.py b/audit-cli/testdata/expected-output/code-block-test.code-block.2.py
new file mode 100644
index 0000000..e2cd691
--- /dev/null
+++ b/audit-cli/testdata/expected-output/code-block-test.code-block.2.py
@@ -0,0 +1,3 @@
+def calculate_sum(a, b):
+ result = a + b
+ return result
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/code-block-test.code-block.3.js b/audit-cli/testdata/expected-output/code-block-test.code-block.3.js
new file mode 100644
index 0000000..d77da9c
--- /dev/null
+++ b/audit-cli/testdata/expected-output/code-block-test.code-block.3.js
@@ -0,0 +1,41 @@
+[
+ {
+ _id: ObjectId("620ad555394d47411658b5ef"),
+ time: ISODate("2021-03-08T09:00:00.000Z"),
+ price: 500,
+ linearFillPrice: 500,
+ locfPrice: 500
+ },
+ {
+ _id: ObjectId("620ad555394d47411658b5f0"),
+ time: ISODate("2021-03-08T10:00:00.000Z"),
+ linearFillPrice: 507.5,
+ locfPrice: 500
+ },
+ {
+ _id: ObjectId("620ad555394d47411658b5f1"),
+ time: ISODate("2021-03-08T11:00:00.000Z"),
+ price: 515,
+ linearFillPrice: 515,
+ locfPrice: 515
+ },
+ {
+ _id: ObjectId("620ad555394d47411658b5f2"),
+ time: ISODate("2021-03-08T12:00:00.000Z"),
+ linearFillPrice: 505,
+ locfPrice: 515
+ },
+ {
+ _id: ObjectId("620ad555394d47411658b5f3"),
+ time: ISODate("2021-03-08T13:00:00.000Z"),
+ linearFillPrice: 495,
+ locfPrice: 515
+ },
+ {
+ _id: ObjectId("620ad555394d47411658b5f4"),
+ time: ISODate("2021-03-08T14:00:00.000Z"),
+ price: 485,
+ linearFillPrice: 485,
+ locfPrice: 485
+ }
+]
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/code-block-test.code-block.4.txt b/audit-cli/testdata/expected-output/code-block-test.code-block.4.txt
new file mode 100644
index 0000000..27d8210
--- /dev/null
+++ b/audit-cli/testdata/expected-output/code-block-test.code-block.4.txt
@@ -0,0 +1,2 @@
+This is a code block with no language specified.
+It should still be extracted.
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/code-block-test.code-block.5.sh b/audit-cli/testdata/expected-output/code-block-test.code-block.5.sh
new file mode 100644
index 0000000..0cb7dcb
--- /dev/null
+++ b/audit-cli/testdata/expected-output/code-block-test.code-block.5.sh
@@ -0,0 +1,3 @@
+#!/bin/bash
+echo "Hello from shell"
+exit 0
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/code-block-test.code-block.6.ts b/audit-cli/testdata/expected-output/code-block-test.code-block.6.ts
new file mode 100644
index 0000000..34e7b57
--- /dev/null
+++ b/audit-cli/testdata/expected-output/code-block-test.code-block.6.ts
@@ -0,0 +1,4 @@
+interface User {
+ name: string;
+ age: number;
+}
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/code-block-test.code-block.7.cpp b/audit-cli/testdata/expected-output/code-block-test.code-block.7.cpp
new file mode 100644
index 0000000..422591f
--- /dev/null
+++ b/audit-cli/testdata/expected-output/code-block-test.code-block.7.cpp
@@ -0,0 +1,6 @@
+#include
+
+int main() {
+ std::cout << "Hello" << std::endl;
+ return 0;
+}
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/examples.literalinclude.1.go b/audit-cli/testdata/expected-output/examples.literalinclude.1.go
new file mode 100644
index 0000000..bc4d6fa
--- /dev/null
+++ b/audit-cli/testdata/expected-output/examples.literalinclude.1.go
@@ -0,0 +1,7 @@
+package main
+
+import "fmt"
+
+func main() {
+ fmt.Println("Hello from Go!")
+}
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.1.input.js b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.1.input.js
new file mode 100644
index 0000000..e980d86
--- /dev/null
+++ b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.1.input.js
@@ -0,0 +1 @@
+db.restaurants.aggregate( [ { $match: { category: "cafe" } } ] )
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.1.output.js b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.1.output.js
new file mode 100644
index 0000000..6449d1b
--- /dev/null
+++ b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.1.output.js
@@ -0,0 +1,5 @@
+[
+ { _id: 1, category: 'café', status: 'Open' },
+ { _id: 2, category: 'cafe', status: 'open' },
+ { _id: 3, category: 'cafE', status: 'open' }
+]
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.3.input.py b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.3.input.py
new file mode 100644
index 0000000..a2967ac
--- /dev/null
+++ b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.3.input.py
@@ -0,0 +1,6 @@
+from pymongo import MongoClient
+client = MongoClient('mongodb://localhost:27017')
+db = client.test_database
+collection = db.test_collection
+result = collection.insert_one({'name': 'Alice', 'age': 30})
+print(result.inserted_id)
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.3.output.py b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.3.output.py
new file mode 100644
index 0000000..2b106e5
--- /dev/null
+++ b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.3.output.py
@@ -0,0 +1 @@
+ObjectId('507f1f77bcf86cd799439011')
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.4.input.sh b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.4.input.sh
new file mode 100644
index 0000000..f815eb7
--- /dev/null
+++ b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.4.input.sh
@@ -0,0 +1 @@
+mongosh --eval "db.users.find({age: {$gt: 25}})"
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.4.output.txt b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.4.output.txt
new file mode 100644
index 0000000..12943f7
--- /dev/null
+++ b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.4.output.txt
@@ -0,0 +1,4 @@
+[
+ { "_id": 1, "name": "Alice", "age": 30 },
+ { "_id": 2, "name": "Bob", "age": 35 }
+]
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.5.input.ts b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.5.input.ts
new file mode 100644
index 0000000..d5aaa42
--- /dev/null
+++ b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.5.input.ts
@@ -0,0 +1,7 @@
+import { MongoClient } from 'mongodb';
+
+const client = new MongoClient('mongodb://localhost:27017');
+await client.connect();
+const db = client.db('mydb');
+const result = await db.collection('users').findOne({ name: 'Alice' });
+console.log(result);
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.5.output.txt b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.5.output.txt
new file mode 100644
index 0000000..4f0d8a4
--- /dev/null
+++ b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.5.output.txt
@@ -0,0 +1 @@
+{ "_id": 1, "name": "Alice", "age": 30, "email": "alice@example.com" }
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.6.input.js b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.6.input.js
new file mode 100644
index 0000000..ea52620
--- /dev/null
+++ b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.6.input.js
@@ -0,0 +1 @@
+db.inventory.find({ status: "A" })
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.6.output.js b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.6.output.js
new file mode 100644
index 0000000..fc4e206
--- /dev/null
+++ b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.6.output.js
@@ -0,0 +1,4 @@
+[
+ { _id: 1, item: "journal", status: "A" },
+ { _id: 2, item: "notebook", status: "A" }
+]
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.7.input.go b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.7.input.go
new file mode 100644
index 0000000..b2ff291
--- /dev/null
+++ b/audit-cli/testdata/expected-output/io-code-block-test.io-code-block.7.input.go
@@ -0,0 +1,15 @@
+package main
+
+import (
+ "context"
+ "go.mongodb.org/mongo-driver/mongo"
+ "go.mongodb.org/mongo-driver/mongo/options"
+)
+
+func main() {
+ client, err := mongo.Connect(context.TODO(), options.Client().ApplyURI("mongodb://localhost:27017"))
+ if err != nil {
+ panic(err)
+ }
+ defer client.Disconnect(context.TODO())
+}
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.1.py b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.1.py
new file mode 100644
index 0000000..ac2eb81
--- /dev/null
+++ b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.1.py
@@ -0,0 +1,4 @@
+def hello_world():
+ """Print hello world message."""
+ print("Hello, World!")
+ return True
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.2.go b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.2.go
new file mode 100644
index 0000000..bc4d6fa
--- /dev/null
+++ b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.2.go
@@ -0,0 +1,7 @@
+package main
+
+import "fmt"
+
+func main() {
+ fmt.Println("Hello from Go!")
+}
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.3.js b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.3.js
new file mode 100644
index 0000000..7c75f16
--- /dev/null
+++ b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.3.js
@@ -0,0 +1,5 @@
+function greet(name) {
+ return `Hello, ${name}!`;
+}
+
+console.log(greet("World"));
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.4.php b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.4.php
new file mode 100644
index 0000000..5e921e5
--- /dev/null
+++ b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.4.php
@@ -0,0 +1,6 @@
+ 'localhost',
+ 'port' => 27017
+];
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.5.rb b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.5.rb
new file mode 100644
index 0000000..6201a21
--- /dev/null
+++ b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.5.rb
@@ -0,0 +1,10 @@
+# Ruby example
+class Greeter
+ def initialize(name)
+ @name = name
+ end
+
+ def greet
+ puts "Hello, #{@name}!"
+ end
+end
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.6.ts b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.6.ts
new file mode 100644
index 0000000..721cc1e
--- /dev/null
+++ b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.6.ts
@@ -0,0 +1,9 @@
+// TypeScript example
+interface User {
+ name: string;
+ age: number;
+}
+
+function greetUser(user: User): string {
+ return `Hello, ${user.name}!`;
+}
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.7.cpp b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.7.cpp
new file mode 100644
index 0000000..28276a3
--- /dev/null
+++ b/audit-cli/testdata/expected-output/literalinclude-test.literalinclude.7.cpp
@@ -0,0 +1,8 @@
+#include
+#include
+
+int main() {
+ std::string message = "Hello from C++!";
+ std::cout << message << std::endl;
+ return 0;
+}
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/nested-code-block-test.code-block.1.js b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.1.js
new file mode 100644
index 0000000..05e0fa8
--- /dev/null
+++ b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.1.js
@@ -0,0 +1,3 @@
+const { MongoClient } = require('mongodb');
+const client = new MongoClient('mongodb://localhost:27017');
+await client.connect();
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/nested-code-block-test.code-block.10.rb b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.10.rb
new file mode 100644
index 0000000..8aeee6e
--- /dev/null
+++ b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.10.rb
@@ -0,0 +1,2 @@
+require 'mongo'
+client = Mongo::Client.new(['localhost:27017'], database: 'mydb')
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/nested-code-block-test.code-block.11.txt b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.11.txt
new file mode 100644
index 0000000..509a91b
--- /dev/null
+++ b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.11.txt
@@ -0,0 +1,9 @@
+{
+ "database": {
+ "host": "localhost",
+ "port": 27017
+ },
+ "logging": {
+ "level": "info"
+ }
+}
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/nested-code-block-test.code-block.2.js b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.2.js
new file mode 100644
index 0000000..54fd99e
--- /dev/null
+++ b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.2.js
@@ -0,0 +1,4 @@
+const db = client.db('myDatabase');
+const collection = db.collection('myCollection');
+const result = await collection.insertOne({ name: 'Alice', age: 30 });
+console.log('Inserted document:', result.insertedId);
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/nested-code-block-test.code-block.3.js b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.3.js
new file mode 100644
index 0000000..a2b3626
--- /dev/null
+++ b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.3.js
@@ -0,0 +1,2 @@
+const doc = await collection.findOne({ name: 'Alice' });
+console.log('Found document:', doc);
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/nested-code-block-test.code-block.4.py b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.4.py
new file mode 100644
index 0000000..0bfccdb
--- /dev/null
+++ b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.4.py
@@ -0,0 +1,5 @@
+client = MongoClient('mongodb://localhost:27017')
+session = client.start_session()
+with session.start_transaction():
+ collection.insert_one({'x': 1}, session=session)
+ collection.update_one({'x': 1}, {'$set': {'y': 2}}, session=session)
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/nested-code-block-test.code-block.5.go b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.5.go
new file mode 100644
index 0000000..8776f29
--- /dev/null
+++ b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.5.go
@@ -0,0 +1,9 @@
+func validateInput(input string) error {
+ if len(input) == 0 {
+ return errors.New("input cannot be empty")
+ }
+ if len(input) > 100 {
+ return errors.New("input too long")
+ }
+ return nil
+}
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/nested-code-block-test.code-block.6.ts b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.6.ts
new file mode 100644
index 0000000..2867d5e
--- /dev/null
+++ b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.6.ts
@@ -0,0 +1,9 @@
+interface Config {
+ host: string;
+ port: number;
+}
+
+const config: Config = {
+ host: 'localhost',
+ port: 27017
+};
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/nested-code-block-test.code-block.7.ts b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.7.ts
new file mode 100644
index 0000000..2bcabc6
--- /dev/null
+++ b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.7.ts
@@ -0,0 +1,5 @@
+import { MongoClient } from 'mongodb';
+
+const client = new MongoClient(`mongodb://${config.host}:${config.port}`);
+await client.connect();
+console.log('Connected successfully');
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/nested-code-block-test.code-block.8.sh b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.8.sh
new file mode 100644
index 0000000..4a7f92e
--- /dev/null
+++ b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.8.sh
@@ -0,0 +1,3 @@
+# This is insecure!
+chmod 777 /var/lib/mongodb
+chown nobody:nobody /var/lib/mongodb
\ No newline at end of file
diff --git a/audit-cli/testdata/expected-output/nested-code-block-test.code-block.9.rb b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.9.rb
new file mode 100644
index 0000000..71132c3
--- /dev/null
+++ b/audit-cli/testdata/expected-output/nested-code-block-test.code-block.9.rb
@@ -0,0 +1,2 @@
+require 'mongo'
+client = Mongo::Client.new('mongodb://localhost:27017/mydb')
\ No newline at end of file
diff --git a/audit-cli/testdata/input-files/source/code-block-test.rst b/audit-cli/testdata/input-files/source/code-block-test.rst
new file mode 100644
index 0000000..d655fed
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/code-block-test.rst
@@ -0,0 +1,112 @@
+Code Block Test
+===============
+
+This file tests various code-block directive scenarios.
+
+JavaScript with Language
+------------------------
+
+.. code-block:: javascript
+
+ const greeting = "Hello, World!";
+ console.log(greeting);
+
+Python with Options
+-------------------
+
+.. code-block:: python
+ :copyable: false
+ :emphasize-lines: 2,3
+
+ def calculate_sum(a, b):
+ result = a + b
+ return result
+
+JSON Array Example
+------------------
+
+.. code-block:: javascript
+ :copyable: false
+ :emphasize-lines: 12,13,25,26,31,32
+
+ [
+ {
+ _id: ObjectId("620ad555394d47411658b5ef"),
+ time: ISODate("2021-03-08T09:00:00.000Z"),
+ price: 500,
+ linearFillPrice: 500,
+ locfPrice: 500
+ },
+ {
+ _id: ObjectId("620ad555394d47411658b5f0"),
+ time: ISODate("2021-03-08T10:00:00.000Z"),
+ linearFillPrice: 507.5,
+ locfPrice: 500
+ },
+ {
+ _id: ObjectId("620ad555394d47411658b5f1"),
+ time: ISODate("2021-03-08T11:00:00.000Z"),
+ price: 515,
+ linearFillPrice: 515,
+ locfPrice: 515
+ },
+ {
+ _id: ObjectId("620ad555394d47411658b5f2"),
+ time: ISODate("2021-03-08T12:00:00.000Z"),
+ linearFillPrice: 505,
+ locfPrice: 515
+ },
+ {
+ _id: ObjectId("620ad555394d47411658b5f3"),
+ time: ISODate("2021-03-08T13:00:00.000Z"),
+ linearFillPrice: 495,
+ locfPrice: 515
+ },
+ {
+ _id: ObjectId("620ad555394d47411658b5f4"),
+ time: ISODate("2021-03-08T14:00:00.000Z"),
+ price: 485,
+ linearFillPrice: 485,
+ locfPrice: 485
+ }
+ ]
+
+Code Block with No Language
+----------------------------
+
+.. code-block::
+
+ This is a code block with no language specified.
+ It should still be extracted.
+
+Shell Script
+------------
+
+.. code-block:: sh
+
+ #!/bin/bash
+ echo "Hello from shell"
+ exit 0
+
+TypeScript Normalization
+------------------------
+
+.. code-block:: ts
+
+ interface User {
+ name: string;
+ age: number;
+ }
+
+C++ Normalization
+-----------------
+
+.. code-block:: c++
+
+ #include
+
+ int main() {
+ std::cout << "Hello" << std::endl;
+ return 0;
+ }
+
diff --git a/audit-cli/testdata/input-files/source/code-examples/example.cpp b/audit-cli/testdata/input-files/source/code-examples/example.cpp
new file mode 100644
index 0000000..15f17b0
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/code-examples/example.cpp
@@ -0,0 +1,9 @@
+#include
+#include
+
+int main() {
+ std::string message = "Hello from C++!";
+ std::cout << message << std::endl;
+ return 0;
+}
+
diff --git a/audit-cli/testdata/input-files/source/code-examples/example.go b/audit-cli/testdata/input-files/source/code-examples/example.go
new file mode 100644
index 0000000..6c4129b
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/code-examples/example.go
@@ -0,0 +1,8 @@
+package main
+
+import "fmt"
+
+func main() {
+ fmt.Println("Hello from Go!")
+}
+
diff --git a/audit-cli/testdata/input-files/source/code-examples/example.js b/audit-cli/testdata/input-files/source/code-examples/example.js
new file mode 100644
index 0000000..eb4f156
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/code-examples/example.js
@@ -0,0 +1,10 @@
+// JavaScript example
+console.log("Before function");
+
+// start-greet
+function greet(name) {
+ return `Hello, ${name}!`;
+}
+
+console.log(greet("World"));
+
diff --git a/audit-cli/testdata/input-files/source/code-examples/example.php b/audit-cli/testdata/input-files/source/code-examples/example.php
new file mode 100644
index 0000000..1763a58
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/code-examples/example.php
@@ -0,0 +1,12 @@
+ 'localhost',
+ 'port' => 27017
+];
+// end-init
+
+function connect($config) {
+ return new MongoDB\Client("mongodb://{$config['host']}:{$config['port']}");
+}
+
diff --git a/audit-cli/testdata/input-files/source/code-examples/example.py b/audit-cli/testdata/input-files/source/code-examples/example.py
new file mode 100644
index 0000000..f932099
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/code-examples/example.py
@@ -0,0 +1,16 @@
+# Python example file
+import sys
+
+# start-hello
+ def hello_world():
+ """Print hello world message."""
+ print("Hello, World!")
+ return True
+# end-hello
+
+def main():
+ hello_world()
+
+if __name__ == "__main__":
+ main()
+
diff --git a/audit-cli/testdata/input-files/source/code-examples/example.rb b/audit-cli/testdata/input-files/source/code-examples/example.rb
new file mode 100644
index 0000000..8374655
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/code-examples/example.rb
@@ -0,0 +1,11 @@
+ # Ruby example
+ class Greeter
+ def initialize(name)
+ @name = name
+ end
+
+ def greet
+ puts "Hello, #{@name}!"
+ end
+ end
+
diff --git a/audit-cli/testdata/input-files/source/code-examples/example.ts b/audit-cli/testdata/input-files/source/code-examples/example.ts
new file mode 100644
index 0000000..047b1ee
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/code-examples/example.ts
@@ -0,0 +1,10 @@
+// TypeScript example
+interface User {
+ name: string;
+ age: number;
+}
+
+function greetUser(user: User): string {
+ return `Hello, ${user.name}!`;
+}
+
diff --git a/audit-cli/testdata/input-files/source/duplicate-include-test.rst b/audit-cli/testdata/input-files/source/duplicate-include-test.rst
new file mode 100644
index 0000000..5dd89a0
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/duplicate-include-test.rst
@@ -0,0 +1,19 @@
+Duplicate Include Test
+======================
+
+This file includes the same file twice to test deduplication.
+
+.. include:: /includes/intro.rst
+
+Middle Content
+--------------
+
+Some content in the middle.
+
+.. include:: /includes/intro.rst
+
+End Content
+-----------
+
+More content at the end.
+
diff --git a/audit-cli/testdata/input-files/source/include-test.rst b/audit-cli/testdata/input-files/source/include-test.rst
new file mode 100644
index 0000000..1d352cf
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/include-test.rst
@@ -0,0 +1,14 @@
+Include Directive Test
+======================
+
+This file tests include directive following.
+
+.. include:: /includes/intro.rst
+
+Main Content
+------------
+
+Some main content here.
+
+.. include:: /includes/examples.rst
+
diff --git a/audit-cli/testdata/input-files/source/includes/examples.rst b/audit-cli/testdata/input-files/source/includes/examples.rst
new file mode 100644
index 0000000..96b554e
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/includes/examples.rst
@@ -0,0 +1,8 @@
+Examples
+--------
+
+Here's a simple example:
+
+.. literalinclude:: /code-examples/example.go
+ :language: golang
+
diff --git a/audit-cli/testdata/input-files/source/includes/intro.rst b/audit-cli/testdata/input-files/source/includes/intro.rst
new file mode 100644
index 0000000..4bed946
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/includes/intro.rst
@@ -0,0 +1,5 @@
+Introduction
+------------
+
+This is an included introduction section.
+
diff --git a/audit-cli/testdata/input-files/source/includes/nested-include.rst b/audit-cli/testdata/input-files/source/includes/nested-include.rst
new file mode 100644
index 0000000..b461662
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/includes/nested-include.rst
@@ -0,0 +1,7 @@
+Nested Include Example
+======================
+
+This file includes another file.
+
+.. include:: /includes/intro.rst
+
diff --git a/audit-cli/testdata/input-files/source/io-code-block-test.rst b/audit-cli/testdata/input-files/source/io-code-block-test.rst
new file mode 100644
index 0000000..52402f7
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/io-code-block-test.rst
@@ -0,0 +1,146 @@
+==========================
+IO Code Block Test
+==========================
+
+This file tests io-code-block directives with input and output sub-directives.
+
+Test 1: Inline Input and Output
+=================================
+
+.. io-code-block::
+ :copyable: true
+
+ .. input::
+ :language: javascript
+
+ db.restaurants.aggregate( [ { $match: { category: "cafe" } } ] )
+
+ .. output::
+ :language: javascript
+
+ [
+ { _id: 1, category: 'café', status: 'Open' },
+ { _id: 2, category: 'cafe', status: 'open' },
+ { _id: 3, category: 'cafE', status: 'open' }
+ ]
+
+Test 2: File-based Input and Output
+=====================================
+
+.. io-code-block::
+
+ .. input:: /code-examples/example.js
+ :language: javascript
+
+ .. output:: /code-examples/example-output.txt
+ :language: text
+
+Test 3: Python Example with Inline Code
+=========================================
+
+.. io-code-block::
+
+ .. input::
+ :language: python
+
+ from pymongo import MongoClient
+ client = MongoClient('mongodb://localhost:27017')
+ db = client.test_database
+ collection = db.test_collection
+ result = collection.insert_one({'name': 'Alice', 'age': 30})
+ print(result.inserted_id)
+
+ .. output::
+ :language: python
+
+ ObjectId('507f1f77bcf86cd799439011')
+
+Test 4: Shell Command Example
+===============================
+
+.. io-code-block::
+ :copyable: true
+
+ .. input::
+ :language: sh
+
+ mongosh --eval "db.users.find({age: {$gt: 25}})"
+
+ .. output::
+ :language: json
+
+ [
+ { "_id": 1, "name": "Alice", "age": 30 },
+ { "_id": 2, "name": "Bob", "age": 35 }
+ ]
+
+Test 5: TypeScript Example
+============================
+
+.. io-code-block::
+
+ .. input::
+ :language: ts
+
+ import { MongoClient } from 'mongodb';
+
+ const client = new MongoClient('mongodb://localhost:27017');
+ await client.connect();
+ const db = client.db('mydb');
+ const result = await db.collection('users').findOne({ name: 'Alice' });
+ console.log(result);
+
+ .. output::
+ :language: json
+
+ { "_id": 1, "name": "Alice", "age": 30, "email": "alice@example.com" }
+
+Test 6: Nested Inside Procedure Step
+======================================
+
+.. procedure::
+
+ .. step:: Query the database
+
+ Run the following query:
+
+ .. io-code-block::
+ :copyable: true
+
+ .. input::
+ :language: javascript
+
+ db.inventory.find({ status: "A" })
+
+ .. output::
+ :language: javascript
+
+ [
+ { _id: 1, item: "journal", status: "A" },
+ { _id: 2, item: "notebook", status: "A" }
+ ]
+
+Test 7: Input Only (No Output)
+================================
+
+.. io-code-block::
+
+ .. input::
+ :language: go
+
+ package main
+
+ import (
+ "context"
+ "go.mongodb.org/mongo-driver/mongo"
+ "go.mongodb.org/mongo-driver/mongo/options"
+ )
+
+ func main() {
+ client, err := mongo.Connect(context.TODO(), options.Client().ApplyURI("mongodb://localhost:27017"))
+ if err != nil {
+ panic(err)
+ }
+ defer client.Disconnect(context.TODO())
+ }
+
diff --git a/audit-cli/testdata/input-files/source/literalinclude-test.rst b/audit-cli/testdata/input-files/source/literalinclude-test.rst
new file mode 100644
index 0000000..d6c4e44
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/literalinclude-test.rst
@@ -0,0 +1,53 @@
+Literalinclude Test
+===================
+
+This file tests various literalinclude directive scenarios.
+
+Python with start-after and end-before
+---------------------------------------
+
+.. literalinclude:: /code-examples/example.py
+ :language: python
+ :start-after: start-hello
+ :end-before: end-hello
+ :dedent:
+
+Go full file
+------------
+
+.. literalinclude:: /code-examples/example.go
+ :language: go
+
+JavaScript with start-after only
+---------------------------------
+
+.. literalinclude:: /code-examples/example.js
+ :language: javascript
+ :start-after: start-greet
+
+PHP with end-before only
+-------------------------
+
+.. literalinclude:: /code-examples/example.php
+ :language: php
+ :end-before: end-init
+
+Ruby with dedent
+----------------
+
+.. literalinclude:: /code-examples/example.rb
+ :language: ruby
+ :dedent:
+
+TypeScript language normalization
+----------------------------------
+
+.. literalinclude:: /code-examples/example.ts
+ :language: ts
+
+C++ language normalization
+---------------------------
+
+.. literalinclude:: /code-examples/example.cpp
+ :language: c++
+
diff --git a/audit-cli/testdata/input-files/source/nested-code-block-test.rst b/audit-cli/testdata/input-files/source/nested-code-block-test.rst
new file mode 100644
index 0000000..32f8ba2
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/nested-code-block-test.rst
@@ -0,0 +1,167 @@
+==========================
+Nested Code Block Test
+==========================
+
+This file tests code-block directives that are nested inside other directives.
+
+Test 1: Code Block Inside Procedure Step
+==========================================
+
+.. procedure::
+ :style: normal
+
+ .. step:: Create a database connection
+
+ First, establish a connection to the database:
+
+ .. code-block:: javascript
+ :copyable: true
+
+ const { MongoClient } = require('mongodb');
+ const client = new MongoClient('mongodb://localhost:27017');
+ await client.connect();
+
+ .. step:: Insert a document
+
+ Next, insert a document into the collection:
+
+ .. code-block:: javascript
+ :copyable: true
+
+ const db = client.db('myDatabase');
+ const collection = db.collection('myCollection');
+ const result = await collection.insertOne({ name: 'Alice', age: 30 });
+ console.log('Inserted document:', result.insertedId);
+
+ .. step:: Query the document
+
+ Finally, query the document you just inserted:
+
+ .. code-block:: javascript
+
+ const doc = await collection.findOne({ name: 'Alice' });
+ console.log('Found document:', doc);
+
+Test 2: Code Block Inside Note Directive
+==========================================
+
+.. note::
+
+ When using transactions, you must use a session:
+
+ .. code-block:: python
+ :emphasize-lines: 2,3
+
+ client = MongoClient('mongodb://localhost:27017')
+ session = client.start_session()
+ with session.start_transaction():
+ collection.insert_one({'x': 1}, session=session)
+ collection.update_one({'x': 1}, {'$set': {'y': 2}}, session=session)
+
+Test 3: Code Block Inside Important Directive
+===============================================
+
+.. important::
+
+ Always validate user input before processing:
+
+ .. code-block:: go
+
+ func validateInput(input string) error {
+ if len(input) == 0 {
+ return errors.New("input cannot be empty")
+ }
+ if len(input) > 100 {
+ return errors.New("input too long")
+ }
+ return nil
+ }
+
+Test 4: Deeply Nested Code Block
+==================================
+
+.. container:: example
+
+ .. admonition:: Example: Multi-step Process
+
+ This example shows a multi-step process:
+
+ .. procedure::
+
+ .. step:: Initialize the system
+
+ .. code-block:: typescript
+
+ interface Config {
+ host: string;
+ port: number;
+ }
+
+ const config: Config = {
+ host: 'localhost',
+ port: 27017
+ };
+
+ .. step:: Connect to the database
+
+ .. code-block:: typescript
+
+ import { MongoClient } from 'mongodb';
+
+ const client = new MongoClient(`mongodb://${config.host}:${config.port}`);
+ await client.connect();
+ console.log('Connected successfully');
+
+Test 5: Code Block Inside Warning
+===================================
+
+.. warning::
+
+ Do not use this pattern in production:
+
+ .. code-block:: sh
+
+ # This is insecure!
+ chmod 777 /var/lib/mongodb
+ chown nobody:nobody /var/lib/mongodb
+
+Test 6: Multiple Code Blocks in Same Parent
+=============================================
+
+.. tip::
+
+ You can use either syntax for connecting:
+
+ **Option 1: Connection String**
+
+ .. code-block:: ruby
+
+ require 'mongo'
+ client = Mongo::Client.new('mongodb://localhost:27017/mydb')
+
+ **Option 2: Hash Options**
+
+ .. code-block:: ruby
+
+ require 'mongo'
+ client = Mongo::Client.new(['localhost:27017'], database: 'mydb')
+
+Test 7: Code Block with No Language Inside Directive
+======================================================
+
+.. note::
+
+ Here's a sample configuration file:
+
+ .. code-block::
+
+ {
+ "database": {
+ "host": "localhost",
+ "port": 27017
+ },
+ "logging": {
+ "level": "info"
+ }
+ }
+
diff --git a/audit-cli/testdata/input-files/source/nested-include-test.rst b/audit-cli/testdata/input-files/source/nested-include-test.rst
new file mode 100644
index 0000000..4d8c475
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/nested-include-test.rst
@@ -0,0 +1,14 @@
+Nested Include Test
+===================
+
+This file tests nested include directives.
+
+.. include:: /includes/nested-include.rst
+
+Main Content
+------------
+
+Some main content here.
+
+.. include:: /includes/examples.rst
+
diff --git a/audit-cli/testdata/search-test-files/curl-examples.txt b/audit-cli/testdata/search-test-files/curl-examples.txt
new file mode 100644
index 0000000..53e8960
--- /dev/null
+++ b/audit-cli/testdata/search-test-files/curl-examples.txt
@@ -0,0 +1,4 @@
+This file contains curl command examples.
+Use curl to make HTTP requests.
+The curl tool is very useful.
+
diff --git a/audit-cli/testdata/search-test-files/libcurl-examples.txt b/audit-cli/testdata/search-test-files/libcurl-examples.txt
new file mode 100644
index 0000000..cc5a822
--- /dev/null
+++ b/audit-cli/testdata/search-test-files/libcurl-examples.txt
@@ -0,0 +1,4 @@
+This file uses libcurl library.
+The libcurl API is powerful.
+You can use libcurl in C programs.
+
diff --git a/audit-cli/testdata/search-test-files/mixed-case.txt b/audit-cli/testdata/search-test-files/mixed-case.txt
new file mode 100644
index 0000000..a41a8e2
--- /dev/null
+++ b/audit-cli/testdata/search-test-files/mixed-case.txt
@@ -0,0 +1,4 @@
+This file has CURL in uppercase.
+Also has Curl in mixed case.
+And curl in lowercase.
+
diff --git a/audit-cli/testdata/search-test-files/no-match.txt b/audit-cli/testdata/search-test-files/no-match.txt
new file mode 100644
index 0000000..3c642e7
--- /dev/null
+++ b/audit-cli/testdata/search-test-files/no-match.txt
@@ -0,0 +1,3 @@
+This file does not contain the search term.
+It has other content but not what we're looking for.
+
diff --git a/audit-cli/testdata/search-test-files/python-code.py b/audit-cli/testdata/search-test-files/python-code.py
new file mode 100644
index 0000000..50c3648
--- /dev/null
+++ b/audit-cli/testdata/search-test-files/python-code.py
@@ -0,0 +1,8 @@
+import requests
+
+# Use curl or requests library
+def fetch_data():
+ # curl alternative in Python
+ response = requests.get('https://api.example.com')
+ return response.json()
+
diff --git a/audit-cli/testdata/search-test-files/word-boundaries.txt b/audit-cli/testdata/search-test-files/word-boundaries.txt
new file mode 100644
index 0000000..3dc9ad5
--- /dev/null
+++ b/audit-cli/testdata/search-test-files/word-boundaries.txt
@@ -0,0 +1,8 @@
+Testing word boundaries:
+curl is a tool
+libcurl is a library
+curlopt is an option
+_curl_ with underscores
+curl-config is a script
+precurl and postcurl
+