Magellan

A deterministic codebase mapping tool. Watches source files, extracts AST-level facts, and builds a searchable graph database of symbols and references.

What Magellan Does

  • Watches directories for file changes (Create/Modify/Delete)
  • Extracts AST-level facts: functions, classes, methods, enums, modules
  • Tracks symbol references: function calls and type references (7 languages)
  • Builds call graphs: caller → callee relationships across indexed files (7 languages)
  • Persists everything to a sqlitegraph database
  • Handles errors gracefully - keeps running even when files are unreadable
  • Shuts down cleanly on SIGINT/SIGTERM

What Magellan Does NOT Do

  • No semantic analysis or type checking
  • No LSP server or language features
  • No async runtimes or background thread pools
  • No config files
  • No web APIs or network services
  • No automatic database cleanup

Installation

cargo install magellan

Or build from source:

git clone https://github.com/oldnordic/magellan
cd magellan
cargo build --release

# Binary will be at target/release/magellan

Requirements

  • Rust 1.70+
  • Linux/macOS (signal handling uses Unix signals)
  • SQLite 3 (via sqlitegraph dependency)

Features

  • Help: Use --help or -h with any command to see usage information
  • Native-v2 Backend: Build with --features native-v2 for improved performance

Quick Start

# Start watching a project with initial scan
magellan watch --root /path/to/project --db ~/.cache/magellan/project.db --scan-initial

# Check status
magellan status --db /path/to/magellan.db

# List all indexed files
magellan files --db /path/to/magellan.db

# Query symbols in a file (or run --explain for selector help)
magellan query --db /path/to/magellan.db --file /path/to/file.rs
# Print the selector cheat sheet
magellan query --db /path/to/magellan.db --explain
# Show the byte/line span for a specific symbol
magellan query --db /path/to/magellan.db --file src/lib.rs --symbol main --show-extent

# Find a symbol by name (v1.5: use --symbol-id for precise lookup)
magellan find --db /path/to/magellan.db --name main
magellan find --db /path/to/magellan.db --symbol-id <SYMBOL_ID>
# Show all candidates for an ambiguous name
magellan find --db /path/to/magellan.db --ambiguous main
# List all symbols that match a glob pattern
magellan find --db /path/to/magellan.db --list-glob "handler_*"

# Show call references
magellan refs --db /path/to/magellan.db --name main --path /path/to/file.rs --direction out

# Query by labels
magellan label --db /path/to/magellan.db --list
magellan label --db /path/to/magellan.db --label rust --label fn
magellan label --db /path/to/magellan.db --label struct --show-code

# Get code chunks without re-reading files
magellan get --db /path/to/magellan.db --file /path/to/file.rs --symbol main
magellan get-file --db /path/to/magellan.db --file /path/to/file.rs

# List ambiguous symbols (v1.5)
magellan collisions --db /path/to/magellan.db

# Export to various formats (v1.5: jsonl, csv, scip, dot)
magellan export --db /path/to/magellan.db > codegraph.json
magellan export --db /path/to/magellan.db --format jsonl > codegraph.jsonl
magellan export --db /path/to/magellan.db --format scip --output codegraph.scip
magellan export --db /path/to/magellan.db --format dot | dot -Tpng -o graph.png

# Migrate database to latest schema (v1.5)
magellan migrate --db /path/to/magellan.db

Commands

watch

magellan watch --root <DIR> --db <FILE> [--debounce-ms <N>] [--scan-initial]

Watch a directory for source file changes and index them into the database.

Argument            Description
--root <DIR>        Directory to watch recursively (required)
--db <FILE>         Path to sqlitegraph database (required)
--debounce-ms <N>   Debounce delay in milliseconds (default: 500)
--scan-initial      Scan directory for source files on startup
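
The debounce delay can be pictured as coalescing bursts of file events into batches. The sketch below is only an illustration under assumptions: the `Event` type, the `debounce` function, and the "close a batch after a quiet gap" rule are hypothetical and are not taken from Magellan's watcher code. Events are assumed to arrive sorted by time.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Hypothetical event record: a changed path and its arrival time in milliseconds.
struct Event {
    at_ms: u64,
    path: PathBuf,
}

/// Group time-ordered events into batches. A batch is closed once at least
/// `debounce_ms` passes between two consecutive events. Paths inside a batch
/// are deduplicated and kept in sorted order.
fn debounce(events: &[Event], debounce_ms: u64) -> Vec<BTreeSet<PathBuf>> {
    let mut batches = Vec::new();
    let mut current: BTreeSet<PathBuf> = BTreeSet::new();
    let mut last_at: Option<u64> = None;

    for ev in events {
        if let Some(prev) = last_at {
            // Quiet gap reached: flush what has been collected so far.
            if ev.at_ms.saturating_sub(prev) >= debounce_ms && !current.is_empty() {
                batches.push(std::mem::take(&mut current));
            }
        }
        current.insert(ev.path.clone());
        last_at = Some(ev.at_ms);
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}

fn main() {
    let events = vec![
        Event { at_ms: 0, path: PathBuf::from("src/main.rs") },
        Event { at_ms: 120, path: PathBuf::from("src/main.rs") },
        Event { at_ms: 300, path: PathBuf::from("src/lib.rs") },
        Event { at_ms: 1200, path: PathBuf::from("src/lib.rs") },
    ];
    for (i, batch) in debounce(&events, 500).iter().enumerate() {
        println!("batch {}: {:?}", i, batch);
    }
}
```

With a 500 ms delay, the first three events fall into one batch and the late edit to src/lib.rs forms a second one, so each file is re-indexed once per burst rather than once per event.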

status

magellan status --db <FILE>

Show database statistics.

$ magellan status --db ./magellan.db
files: 30
symbols: 349
references: 262

files

magellan files --db <FILE>

List all indexed files.

$ magellan files --db ./magellan.db
30 indexed files:
  /path/to/src/main.rs
  /path/to/src/lib.rs
  ...

query

magellan query --db <FILE> --file <PATH> [--kind <KIND>] [--symbol <NAME>] [--show-extent]
magellan query --db <FILE> --explain

List symbols in a file, optionally filtered by kind or symbol name. --symbol <NAME> narrows output to a specific identifier, and --show-extent prints byte/line spans plus node IDs when used with --symbol. --explain prints a selector cheat sheet covering available filters and their syntax. Each result line shows both the human-friendly kind and a normalized tag in square brackets (e.g., [fn], [struct]) so automation can ingest the output deterministically.

Argument          Description
--db <FILE>       Path to database (required)
--file <PATH>     File path to query (required)
--kind <KIND>     Filter by symbol kind (optional)
--symbol <NAME>   Limit results to a specific symbol (optional)
--show-extent     Print byte + line ranges for the selected symbol (requires --symbol)
--explain         Show selector documentation instead of querying

Valid kinds: Function, Method, Class, Interface, Enum, Module, Union, Namespace, TypeAlias

$ magellan query --db ./magellan.db --file src/main.rs --kind Function
/path/to/src/main.rs:
  Line   13: Function     print_usage
  Line   64: Function     parse_args

find

magellan find --db <FILE> --name <NAME> [--path <PATH>] [--symbol-id <ID>] [--ambiguous <NAME>] [--first]
magellan find --db <FILE> --list-glob "<PATTERN>"

Find a symbol by name or preview all symbols that match a glob expression. Glob listings include node IDs for deterministic scripting (e.g., feeding results to refactoring tooling).

Argument                Description
--db <FILE>             Path to database (required)
--name <NAME>           Symbol name to find
--symbol-id <ID>        Stable SymbolId for precise lookup (v1.5)
--ambiguous <NAME>      Show all candidates for an ambiguous name (v1.5)
--path <PATH>           Limit search to specific file (optional)
--list-glob <PATTERN>   List all symbol names that match the glob (mutually exclusive with --name)
--first                 Use first match when ambiguous (deprecated; use --symbol-id)

$ magellan find --db ./magellan.db --name main
Found "main":
  File:     /path/to/src/main.rs
  Kind:     Function
  Location: Line 229, Column 0

$ magellan find --db ./magellan.db --ambiguous main
Ambiguous name "main" has 3 candidates:
  [1] a1b2c3d4e5f67890123456789012ab - src/bin/main.rs::Function main
  [2] b2c3d4e5f678901234567890123cd - src/lib.rs::Function main
  [3] c3d4e5f6789012345678901234de - tests/integration_test.rs::Function main

refs

magellan refs --db <FILE> --name <NAME> --path <PATH> [--direction <in|out>]

Show incoming or outgoing calls for a symbol. Incoming calls include callers from other indexed files when the target symbol name is unique in the database.

Argument               Description
--db <FILE>            Path to database (required)
--name <NAME>          Symbol name (required)
--path <PATH>          File path containing the symbol (required)
--direction <in|out>   Call direction: in (incoming calls) or out (outgoing calls)

$ magellan refs --db ./magellan.db --name parse_args --path src/main.rs --direction in
Calls TO "parse_args":
  From: main (Function) at /path/to/src/main.rs:237

verify

magellan verify --root <DIR> --db <FILE>

Compare database state vs filesystem and report differences.

Exit codes: 0 = up to date, 1 = issues found

export

magellan export --db <FILE> [--format json|jsonl|csv|scip|dot] [--output <PATH>] [--minify] [--no-symbols] [--no-references] [--no-calls] [--include-collisions]

Export all graph data to various formats.

Argument               Description
--db <FILE>            Path to database (required)
--format <FORMAT>      Export format: json (default), jsonl, csv, scip, dot
--output <PATH>        Write to file instead of stdout
--minify               Use compact JSON (no pretty-printing)
--no-symbols           Exclude symbols from export
--no-references        Exclude references from export
--no-calls             Exclude calls from export
--include-collisions   Include collision groups (JSON only)

Export Versions:

Version   Changes
2.0.0     Added symbol_id, canonical_fqn, display_fqn fields to SymbolExport

Format-Specific Version Encoding:

  • JSON: Top-level version field
  • JSONL: First line is {"type":"Version","version":"2.0.0"}
  • CSV: Header comment # Magellan Export Version: 2.0.0
  • SCIP: Metadata includes version information
  • DOT: No version field (graphviz format)
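
A consumer can check the export version before parsing the rest of the file. The following is a minimal sketch for the two line-oriented formats, using only the encodings listed above. The function name `export_version` is hypothetical, and the JSONL branch is a naive string scan that assumes the exact compact layout shown (no whitespace around the colon); a real consumer would more likely use a JSON parser.

```rust
/// Extract the export version from the first line of a CSV or JSONL export.
/// Returns None when the line matches neither documented encoding.
fn export_version(first_line: &str) -> Option<String> {
    let line = first_line.trim();

    // CSV: header comment of the form "# Magellan Export Version: 2.0.0"
    if let Some(v) = line.strip_prefix("# Magellan Export Version:") {
        return Some(v.trim().to_string());
    }

    // JSONL: {"type":"Version","version":"2.0.0"}
    // Naive scan, not a JSON parser (assumption: compact key/value layout).
    if line.contains("\"type\":\"Version\"") {
        let key = "\"version\":\"";
        let start = line.find(key)? + key.len();
        let end = line[start..].find('"')? + start;
        return Some(line[start..end].to_string());
    }
    None
}

fn main() {
    let csv = "# Magellan Export Version: 2.0.0";
    let jsonl = r#"{"type":"Version","version":"2.0.0"}"#;
    println!("{:?}", export_version(csv));
    println!("{:?}", export_version(jsonl));
}
```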

Examples:

# JSON export (default)
magellan export --db ./magellan.db > codegraph.json

# JSON Lines (one JSON object per line)
magellan export --db ./magellan.db --format jsonl > codegraph.jsonl

# CSV export
magellan export --db ./magellan.db --format csv > codegraph.csv

# SCIP export (binary, requires --output)
magellan export --db ./magellan.db --format scip --output codegraph.scip

# DOT graph format (pipe to graphviz)
magellan export --db ./magellan.db --format dot | dot -Tpng -o graph.png

# Include collision information in JSON
magellan export --db ./magellan.db --include-collisions > codegraph.json

collisions

magellan collisions --db <FILE> [--field <FIELD>] [--limit <N>]

List ambiguous symbols that share the same FQN or display FQN (v1.5).

Argument          Description
--db <FILE>       Path to database (required)
--field <FIELD>   Field to check: fqn, display_fqn, canonical_fqn (default: display_fqn)
--limit <N>       Maximum groups to show (default: 50)

$ magellan collisions --db ./magellan.db
Collisions by display_fqn:

main (3)
  [1] a1b2c3d4e5f67890123456789012ab src/bin/main.rs
       my_crate::src/bin/main.rs::Function main
  [2] b2c3d4e5f678901234567890123cd src/lib.rs
       my_crate::src/lib.rs::Function main
  [3] c3d4e5f6789012345678901234de tests/integration_test.rs
       my_crate::tests/integration_test.rs::Function main
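
Conceptually, a collision group is a set of symbols that share one value of the chosen field. The sketch below illustrates that grouping for display_fqn in memory. The `Symbol` struct and `collisions` function are hypothetical simplifications and do not reflect how Magellan queries its database.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified symbol record.
struct Symbol {
    display_fqn: String,
    file: String,
}

/// Group symbols by display FQN and keep only groups with more than one member.
/// BTreeMap keeps the groups in a deterministic (sorted) order.
fn collisions(symbols: &[Symbol]) -> BTreeMap<&str, Vec<&str>> {
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for s in symbols {
        groups
            .entry(s.display_fqn.as_str())
            .or_default()
            .push(s.file.as_str());
    }
    groups.retain(|_, files| files.len() > 1);
    groups
}

fn main() {
    let symbols = vec![
        Symbol { display_fqn: "main".into(), file: "src/bin/main.rs".into() },
        Symbol { display_fqn: "main".into(), file: "src/lib.rs".into() },
        Symbol { display_fqn: "main".into(), file: "tests/integration_test.rs".into() },
        Symbol { display_fqn: "parse_args".into(), file: "src/main.rs".into() },
    ];
    for (fqn, files) in collisions(&symbols) {
        println!("{} ({})", fqn, files.len());
        for f in files {
            println!("  {}", f);
        }
    }
}
```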

migrate

magellan migrate --db <FILE> [--dry-run] [--no-backup]

Upgrade a Magellan database to the current schema version (v1.5).

Argument      Description
--db <FILE>   Path to database (required)
--dry-run     Check version without migrating
--no-backup   Skip backup creation (not recommended)

Migration Behavior:

  • Creates timestamped backup before migration (<db>.v<timestamp>.bak)
  • Uses SQLite transaction for atomicity (rollback on error)
  • Shows old version and new version before running
  • No-op if database already at current version

Schema Version 4 (v1.5 BLAKE3 SymbolId):

Version 4 introduces BLAKE3-based SymbolId and canonical_fqn/display_fqn fields:

  • New symbols get 32-character BLAKE3 hash IDs (128 bits)
  • Existing symbols have symbol_id: null in exports
  • To get BLAKE3 IDs for all symbols, re-index after migration
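
The size figures above are consistent if the ID is hex-encoded: 32 characters at 4 bits each give 128 bits. Scripts that consume exports may want to tell a populated SymbolId from a missing one. The helper below is a hypothetical sketch that assumes hexadecimal encoding, which the README implies but does not state outright.

```rust
/// True if `id` looks like a v1.5 SymbolId: exactly 32 hexadecimal characters
/// (32 chars x 4 bits = 128 bits). Hex encoding is an assumption here.
fn looks_like_symbol_id(id: &str) -> bool {
    id.len() == 32 && id.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Symbols indexed before the migration export `symbol_id: null`,
/// modeled here as `None`; those need a re-index to receive an ID.
fn needs_reindex(symbol_id: Option<&str>) -> bool {
    match symbol_id {
        Some(id) => !looks_like_symbol_id(id),
        None => true,
    }
}

fn main() {
    println!("{}", looks_like_symbol_id("0123456789abcdef0123456789abcdef"));
    println!("{}", needs_reindex(None));
}
```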

Security

Database File Placement

Magellan's database (--db <FILE>) stores all indexed code information.

Recommended: Place .db files outside watched directories.

Placing the database inside a watched directory can cause:

  • The watcher to process the database as if it were a source file
  • Export operations to include binary database content
  • Circular file system events

Examples:

# Recommended: database outside watched directory
magellan watch --root /path/to/project --db ~/.cache/magellan/project.db --scan-initial

# Discouraged: database inside watched directory
magellan watch --root . --db ./magellan.db --scan-initial

Recommended Database Locations

  • Linux/macOS: ~/.cache/magellan/ or ~/.local/share/magellan/
  • Windows: %LOCALAPPDATA%\magellan\
  • CI/CD: Use a cache directory outside the workspace

Path Traversal Protection

Magellan validates all file paths to prevent directory traversal attacks:

  • Paths with ../ patterns are validated before access
  • Symlinks pointing outside the project root are rejected
  • Absolute paths outside the watched directory are blocked

These protections are implemented in src/validation.rs and applied during:

  • Watcher event processing
  • Directory scanning
  • File indexing operations
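
One common way to implement this kind of check is to normalize the path lexically and then test that it still starts with the root. The sketch below shows that idea only; it is not the code in src/validation.rs. The function name `resolve_within` is hypothetical, the root is assumed to be absolute and already normalized, and symlinks are not resolved because the function never touches the filesystem.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolve `candidate` against `root` and return the normalized
/// path only if it stays inside `root`. No filesystem access: symlinks are
/// NOT resolved here and would need a separate check.
fn resolve_within(root: &Path, candidate: &Path) -> Option<PathBuf> {
    let joined = if candidate.is_absolute() {
        candidate.to_path_buf()
    } else {
        root.join(candidate)
    };

    let mut normalized = PathBuf::new();
    for part in joined.components() {
        match part {
            Component::CurDir => {}
            Component::ParentDir => {
                // Refuse to climb above the filesystem root.
                if !normalized.pop() {
                    return None;
                }
            }
            other => normalized.push(other.as_os_str()),
        }
    }

    // Component-wise prefix test, so "/project2" does not match "/project".
    if normalized.starts_with(root) {
        Some(normalized)
    } else {
        None
    }
}

fn main() {
    let root = Path::new("/project");
    println!("{:?}", resolve_within(root, Path::new("src/../lib.rs")));
    println!("{:?}", resolve_within(root, Path::new("../etc/passwd")));
    println!("{:?}", resolve_within(root, Path::new("/etc/passwd")));
}
```

Using `Path::starts_with` rather than a string prefix comparison matters here, because it compares whole path components.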

label

magellan label --db <FILE> [--label <LABEL>]... [--list] [--count] [--show-code]

Query symbols by labels. Labels are automatically assigned during indexing:

  • Language labels: rust, python, javascript, typescript, c, cpp, java
  • Symbol kind labels: fn, method, struct, class, enum, interface, module, union, namespace, typealias

Argument          Description
--db <FILE>       Path to database (required)
--label <LABEL>   Label to query (can be specified multiple times for AND semantics)
--list            List all available labels with counts
--count           Count entities with specified label(s)
--show-code       Show actual source code for each symbol

$ magellan label --db ./magellan.db --list
12 labels in use:
  rust (349)
  fn (120)
  struct (45)
  method (89)
  ...

$ magellan label --db ./magellan.db --label rust --label fn
120 symbols with labels [rust, fn]:
  main (fn) in src/main.rs [0-36]
  new (fn) in src/user.rs [91-138]
  ...

$ magellan label --db ./magellan.db --label rust --label fn --show-code
120 symbols with labels [rust, fn]:
  main (fn) in src/main.rs [0-36]
    fn main() {
        println!("Hello");
    }
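
The AND semantics of repeated --label flags mean a symbol matches only if it carries every requested label. The sketch below shows that filter over an in-memory list. The `Labeled` struct and `with_all_labels` function are hypothetical and stand in for whatever query Magellan runs against its database.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory symbol with its assigned labels.
struct Labeled {
    name: &'static str,
    labels: BTreeSet<&'static str>,
}

/// Return the names of symbols that carry every requested label (AND semantics).
fn with_all_labels<'a>(symbols: &'a [Labeled], wanted: &[&str]) -> Vec<&'a str> {
    symbols
        .iter()
        .filter(|s| wanted.iter().all(|w| s.labels.contains(*w)))
        .map(|s| s.name)
        .collect()
}

fn main() {
    let symbols = vec![
        Labeled { name: "main", labels: ["rust", "fn"].iter().copied().collect() },
        Labeled { name: "User", labels: ["rust", "struct"].iter().copied().collect() },
        Labeled { name: "run", labels: ["python", "fn"].iter().copied().collect() },
    ];
    println!("{:?}", with_all_labels(&symbols, &["rust", "fn"]));
}
```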

get

magellan get --db <FILE> --file <PATH> --symbol <NAME>

Get code chunks for a specific symbol. Uses stored code chunks so you don't need to re-read source files.

Argument          Description
--db <FILE>       Path to database (required)
--file <PATH>     File path (required)
--symbol <NAME>   Symbol name (required)

get-file

magellan get-file --db <FILE> --file <PATH>

Get all code chunks from a file. Useful for getting complete file contents without re-reading the source.

Argument        Description
--db <FILE>     Path to database (required)
--file <PATH>   File path (required)

Supported Languages

Language     Extensions                   Parser
Rust         .rs                          tree-sitter-rust
C            .c, .h                       tree-sitter-c
C++          .cpp, .cc, .cxx, .hpp, .h    tree-sitter-cpp
Java         .java                        tree-sitter-java
JavaScript   .js, .mjs                    tree-sitter-javascript
TypeScript   .ts, .tsx                    tree-sitter-typescript
Python       .py                          tree-sitter-python
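
Extension-based detection over this table can be sketched as a simple lookup. Note that .h appears under both C and C++; the README does not say how Magellan resolves that, so the hypothetical `candidate_languages` function below returns both candidates rather than guessing. The returned names reuse the language labels listed in the label section.

```rust
use std::path::Path;

/// Candidate languages for a path, based only on the extension table above.
/// `.h` yields two candidates because the table lists it under C and C++.
fn candidate_languages(path: &Path) -> &'static [&'static str] {
    match path.extension().and_then(|e| e.to_str()) {
        Some("rs") => &["rust"],
        Some("c") => &["c"],
        Some("h") => &["c", "cpp"],
        Some("cpp") | Some("cc") | Some("cxx") | Some("hpp") => &["cpp"],
        Some("java") => &["java"],
        Some("js") | Some("mjs") => &["javascript"],
        Some("ts") | Some("tsx") => &["typescript"],
        Some("py") => &["python"],
        _ => &[], // unsupported or missing extension
    }
}

fn main() {
    for p in ["src/main.rs", "include/util.h", "README.md"] {
        println!("{} -> {:?}", p, candidate_languages(Path::new(p)));
    }
}
```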

Database Schema

Nodes:

  • File - path, hash, timestamps
  • Symbol - name, kind, byte spans, line/column
  • Reference - file, referenced symbol, location
  • Call - file, caller, callee, location

Edges:

  • DEFINES - File -> Symbol
  • REFERENCES - Reference -> Symbol
  • CALLER - Symbol -> Call
  • CALLS - Call -> Symbol

Symbol Kinds: Function, Method, Class, Interface, Enum, Module, Union, Namespace, TypeAlias, Unknown
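
The CALLER and CALLS edges form a two-hop path, Symbol -> Call -> Symbol, and following it in one direction or the other gives outgoing or incoming calls. The sketch below models that in memory. `CallNode`, `Direction`, and `related` are hypothetical names for illustration and are not the types in src/graph.

```rust
/// Minimal in-memory model of a Call node and its two edges:
/// Symbol -CALLER-> Call -CALLS-> Symbol.
struct CallNode {
    caller: &'static str, // source of the CALLER edge
    callee: &'static str, // target of the CALLS edge
    line: u32,            // location of the call site
}

#[derive(Clone, Copy)]
enum Direction {
    In,  // who calls `symbol`?
    Out, // what does `symbol` call?
}

/// Return (other symbol, line) pairs reachable from `symbol` in the given direction.
fn related(calls: &[CallNode], symbol: &str, dir: Direction) -> Vec<(&'static str, u32)> {
    calls
        .iter()
        .filter_map(|c| match dir {
            Direction::Out if c.caller == symbol => Some((c.callee, c.line)),
            Direction::In if c.callee == symbol => Some((c.caller, c.line)),
            _ => None,
        })
        .collect()
}

fn main() {
    let calls = vec![
        CallNode { caller: "main", callee: "parse_args", line: 237 },
        CallNode { caller: "main", callee: "print_usage", line: 240 },
    ];
    println!("{:?}", related(&calls, "parse_args", Direction::In));
    println!("{:?}", related(&calls, "main", Direction::Out));
}
```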

Error Handling

Magellan continues processing even when individual files fail:

  • Permission errors are logged and skipped
  • Files with invalid syntax are skipped
  • Database write errors cause exit (requires manual intervention)
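
The per-file policy above amounts to matching on each result inside the loop instead of propagating the first error. A minimal sketch, assuming a hypothetical `index_file` step that stands in for reading and parsing one file:

```rust
use std::io;

/// Hypothetical per-file indexing step; stands in for read + parse.
/// Fails for one specific name to simulate an unreadable file.
fn index_file(path: &str) -> io::Result<usize> {
    if path.ends_with("secret.rs") {
        Err(io::Error::new(io::ErrorKind::PermissionDenied, "permission denied"))
    } else {
        Ok(1)
    }
}

/// Index every path, logging and skipping per-file failures.
/// Returns (files indexed, files skipped).
fn index_all(paths: &[&str]) -> (usize, usize) {
    let mut indexed = 0;
    let mut skipped = 0;
    for path in paths {
        match index_file(path) {
            Ok(n) => indexed += n,
            Err(e) => {
                // Log and continue; one bad file must not stop the run.
                eprintln!("skipping {}: {}", path, e);
                skipped += 1;
            }
        }
    }
    (indexed, skipped)
}

fn main() {
    let (indexed, skipped) = index_all(&["src/a.rs", "src/secret.rs", "src/b.rs"]);
    println!("indexed: {}, skipped: {}", indexed, skipped);
}
```

A database write error would be handled differently, by returning the error to the caller so the process can exit, which this sketch does not model.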

Architecture

src/
├── main.rs              # CLI entry point
├── lib.rs               # Public API
├── watcher.rs           # Filesystem watcher
├── indexer.rs           # Event coordination
├── references.rs        # Reference/Call fact types
├── verify.rs            # Database verification logic
├── ingest/
│   ├── mod.rs           # Parser dispatcher & Rust parser
│   ├── detect.rs        # Language detection
│   ├── pool.rs          # Thread-local parser pool
│   ├── c.rs             # C parser
│   ├── cpp.rs           # C++ parser
│   ├── java.rs          # Java parser
│   ├── javascript.rs    # JavaScript parser
│   ├── typescript.rs    # TypeScript parser
│   └── python.rs        # Python parser
├── query_cmd.rs         # Query command
├── find_cmd.rs          # Find command
├── refs_cmd.rs          # Refs command
├── verify_cmd.rs        # Verify CLI handler
├── watch_cmd.rs         # Watch CLI handler
├── output/              # Output formatting
├── common.rs            # Shared utilities
├── validation.rs        # Path validation
└── graph/
    ├── mod.rs           # CodeGraph API
    ├── schema.rs        # Node/edge types
    ├── files.rs         # File operations
    ├── symbols.rs       # Symbol operations
    ├── references.rs    # Reference node operations
    ├── calls.rs         # Call edge operations
    ├── call_ops.rs      # Call node operations
    ├── ops.rs           # Graph indexing operations
    ├── query.rs         # Query operations
    ├── count.rs         # Count operations
    ├── export.rs        # JSON export
    ├── scan.rs          # Scanning operations
    ├── freshness.rs     # Freshness checking
    ├── cache.rs         # LRU cache
    └── tests.rs         # Graph tests

Testing

cargo test

Test coverage:

  • Path validation tests: 24 tests for traversal protection, symlink handling, cross-platform paths
  • Orphan detection tests: 12 tests verifying clean state after delete operations
  • SCIP export tests: 7 round-trip tests verifying parseable protobuf output
  • Call graph tests: 5 tests for cross-file method call resolution
  • Symbol extraction tests: per-language tests for Rust, Python, Java, JavaScript, TypeScript, C, C++
  • Graph operations tests: insert, delete, query operations across all node types

Tests pass on Linux (primary development platform). Other platforms are not regularly tested.

Thread Safety Testing (v1.7)

Magellan uses thread-safe synchronization for concurrent access to shared state. The v1.7 migration from RefCell<T> to Arc<Mutex<T>> puts every piece of shared mutable state behind a mutex, so it can be accessed from multiple threads without data races.

TSAN Test Suite:

# Run TSAN thread safety tests
cargo test --test tsan_thread_safety_tests

What TSAN Detects:

  • Data races from unsynchronized concurrent access
  • Missing mutexes around shared mutable state
  • Lock ordering violations that can cause deadlocks

Modules Tested:

  • FileSystemWatcher - Concurrent batch access, legacy pending state
  • PipelineSharedState - Dirty path insertion, lock ordering

Current Status:

The TSAN test suite is in place and all tests pass. However, running it with actual ThreadSanitizer instrumentation (-Zsanitizer=thread) is currently blocked by Rust toolchain limitations (ABI mismatch errors in dependencies). See TEST-01-TSAN-RESULTS.md for details.

Manual Verification:

All concurrent state uses Arc<Mutex<T>>:

  • FileSystemWatcher::legacy_pending_batch: Arc<Mutex<Option<WatcherBatch>>>
  • FileSystemWatcher::legacy_pending_index: Arc<Mutex<usize>>
  • PipelineSharedState::dirty_paths: Arc<Mutex<BTreeSet<PathBuf>>>

Lock ordering is enforced to prevent deadlocks:

  1. Acquire dirty_paths lock first
  2. Send wakeup signal while holding lock
  3. Release lock
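
The three steps above can be sketched with the standard library as follows. Only the dirty_paths type comes from this README. The `mark_dirty` and `drain` functions and the use of an mpsc channel as the wakeup signal are assumptions made for illustration.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};

/// Insert a dirty path and send the wakeup while the lock is still held.
fn mark_dirty(dirty: &Arc<Mutex<BTreeSet<PathBuf>>>, wakeup: &Sender<()>, path: PathBuf) {
    let mut guard = dirty.lock().unwrap(); // 1. acquire dirty_paths lock first
    guard.insert(path);
    let _ = wakeup.send(()); // 2. send wakeup signal while holding the lock
} // 3. guard goes out of scope here, releasing the lock

/// Worker side: take the whole set under the lock, leaving it empty.
fn drain(dirty: &Arc<Mutex<BTreeSet<PathBuf>>>) -> BTreeSet<PathBuf> {
    std::mem::take(&mut *dirty.lock().unwrap())
}

fn main() {
    let dirty = Arc::new(Mutex::new(BTreeSet::new()));
    let (tx, rx) = channel();

    // Several producers mark paths dirty concurrently.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let dirty = Arc::clone(&dirty);
            let tx = tx.clone();
            std::thread::spawn(move || {
                mark_dirty(&dirty, &tx, PathBuf::from(format!("src/f{}.rs", i)))
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    rx.recv().unwrap(); // at least one wakeup has been sent
    println!("{} dirty paths", drain(&dirty).len());
}
```

Because the channel here is unbounded, sending while holding the lock cannot block. Sending inside the critical section means a worker that wakes up and takes the lock always sees the path that triggered the wakeup.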

Known Limitations

  • FQN collisions (addressed in v1.5): Symbols with identical names in different files or modules may share the same display FQN. This is common for main functions across binaries, test functions, and methods with generic names (new, default, etc.) in impl blocks. The collisions command (v1.5) identifies these cases, and --symbol-id provides stable BLAKE3-based identifiers for unambiguous symbol references.
  • No semantic analysis: AST-level only; no type checking or cross-module resolution
  • No incremental parsing: File changes trigger full re-parse of that file
  • Cross-crate resolution: Rust symbols across crates are resolved by name only
  • Testing: Primary development and testing on Linux; Windows and macOS not regularly tested in CI

License

GPL-3.0-or-later

Dependencies

  • notify - Filesystem watching
  • tree-sitter - AST parsing
  • sqlitegraph - Graph persistence
  • signal-hook - Signal handling
  • walkdir - Directory scanning
  • rayon - Parallel processing
