A deterministic codebase mapping tool. Watches source files, extracts AST-level facts, and builds a searchable graph database of symbols and references.
- Watches directories for file changes (Create/Modify/Delete)
- Extracts AST-level facts: functions, classes, methods, enums, modules
- Tracks symbol references: function calls and type references (7 languages)
- Builds call graphs: caller → callee relationships across indexed files (7 languages)
- Persists everything to a sqlitegraph database
- Handles errors gracefully: keeps running even when files are unreadable
- Shuts down cleanly on SIGINT/SIGTERM
What Magellan does not do:
- No semantic analysis or type checking
- No LSP server or language features
- No async runtimes or background thread pools
- No config files
- No web APIs or network services
- No automatic database cleanup
cargo install magellan
Or build from source:
git clone https://github.com/oldnordic/magellan
cd magellan
cargo build --release
# Binary will be at target/release/magellan
Requirements:
- Rust 1.70+
- Linux/macOS (signal handling uses Unix signals)
- SQLite 3 (via sqlitegraph dependency)
- Help: Use `--help` or `-h` with any command to see usage information
- Native-v2 Backend: Build with `--features native-v2` for improved performance
# Start watching a project with initial scan
magellan watch --root /path/to/project --db ~/.cache/magellan/project.db --scan-initial
# Check status
magellan status --db /path/to/magellan.db
# List all indexed files
magellan files --db /path/to/magellan.db
# Query symbols in a file (or run --explain for selector help)
magellan query --db /path/to/magellan.db --file /path/to/file.rs
# Print the selector cheat sheet
magellan query --db /path/to/magellan.db --explain
# Show the byte/line span for a specific symbol
magellan query --db /path/to/magellan.db --file src/lib.rs --symbol main --show-extent
# Find a symbol by name (v1.5: use --symbol-id for precise lookup)
magellan find --db /path/to/magellan.db --name main
magellan find --db /path/to/magellan.db --symbol-id <SYMBOL_ID>
# Show all candidates for an ambiguous name
magellan find --db /path/to/magellan.db --ambiguous main
# List all symbols that match a glob pattern
magellan find --db /path/to/magellan.db --list-glob "handler_*"
# Show call references
magellan refs --db /path/to/magellan.db --name main --path /path/to/file.rs --direction out
# Query by labels
magellan label --db /path/to/magellan.db --list
magellan label --db /path/to/magellan.db --label rust --label fn
magellan label --db /path/to/magellan.db --label struct --show-code
# Get code chunks without re-reading files
magellan get --db /path/to/magellan.db --file /path/to/file.rs --symbol main
magellan get-file --db /path/to/magellan.db --file /path/to/file.rs
# List ambiguous symbols (v1.5)
magellan collisions --db /path/to/magellan.db
# Export to various formats (v1.5: jsonl, csv, scip, dot)
magellan export --db /path/to/magellan.db > codegraph.json
magellan export --db /path/to/magellan.db --format jsonl > codegraph.jsonl
magellan export --db /path/to/magellan.db --format scip --output codegraph.scip
magellan export --db /path/to/magellan.db --format dot | dot -Tpng -o graph.png
# Migrate database to latest schema (v1.5)
magellan migrate --db /path/to/magellan.db
magellan watch --root <DIR> --db <FILE> [--debounce-ms <N>] [--scan-initial]
Watch a directory for source file changes and index them into the database.
| Argument | Description |
|---|---|
| `--root <DIR>` | Directory to watch recursively (required) |
| `--db <FILE>` | Path to sqlitegraph database (required) |
| `--debounce-ms <N>` | Debounce delay in milliseconds (default: 500) |
| `--scan-initial` | Scan directory for source files on startup |
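The `--debounce-ms` window coalesces bursts of filesystem events into one batch. The Rust sketch below illustrates the idea under stated assumptions: `Debouncer` is a hypothetical type (not Magellan's `watcher.rs`), events are de-duplicated in a `BTreeSet<PathBuf>`, and a batch is released only after `debounce_ms` has elapsed with no new event. Timestamps are passed in explicitly so the behaviour is deterministic and testable.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::time::{Duration, Instant};

/// Hypothetical debouncer (not Magellan's watcher.rs): coalesces file events
/// and releases them as one sorted, de-duplicated batch after a quiet period.
struct Debouncer {
    window: Duration,
    pending: BTreeSet<PathBuf>,
    last_event: Option<Instant>,
}

impl Debouncer {
    fn new(debounce_ms: u64) -> Self {
        Debouncer {
            window: Duration::from_millis(debounce_ms),
            pending: BTreeSet::new(),
            last_event: None,
        }
    }

    /// Record an event; repeated events for the same path collapse into one
    /// entry, and every event restarts the quiet-period timer.
    fn push(&mut self, path: PathBuf, now: Instant) {
        self.pending.insert(path);
        self.last_event = Some(now);
    }

    /// Return the pending batch once `window` has passed since the last event.
    fn poll(&mut self, now: Instant) -> Option<Vec<PathBuf>> {
        match self.last_event {
            Some(t) if now.duration_since(t) >= self.window => {
                self.last_event = None;
                Some(std::mem::take(&mut self.pending).into_iter().collect())
            }
            _ => None,
        }
    }
}
```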
magellan status --db <FILE>
Show database statistics.
$ magellan status --db ./magellan.db
files: 30
symbols: 349
references: 262
magellan files --db <FILE>
List all indexed files.
$ magellan files --db ./magellan.db
30 indexed files:
/path/to/src/main.rs
/path/to/src/lib.rs
...
magellan query --db <FILE> --file <PATH> [--kind <KIND>] [--symbol <NAME>] [--show-extent]
magellan query --db <FILE> --explain
List symbols in a file, optionally filtered by kind or symbol name. `--symbol <NAME>` narrows output to a specific identifier, and `--show-extent` prints byte/line spans plus node IDs when used with `--symbol`. `--explain` prints a selector cheat sheet covering the available filters and their syntax. Each result line shows both the human-friendly kind and a normalized tag in square brackets (e.g., `[fn]`, `[struct]`) so automation can ingest the output deterministically.
| Argument | Description |
|---|---|
| `--db <FILE>` | Path to database (required) |
| `--file <PATH>` | File path to query (required) |
| `--kind <KIND>` | Filter by symbol kind (optional) |
| `--symbol <NAME>` | Limit results to a specific symbol (optional) |
| `--show-extent` | Print byte and line ranges for the selected symbol (requires `--symbol`) |
| `--explain` | Show selector documentation instead of querying |
Valid kinds: Function, Method, Class, Interface, Enum, Module, Union, Namespace, TypeAlias
$ magellan query --db ./magellan.db --file src/main.rs --kind Function
/path/to/src/main.rs:
Line 13: Function print_usage
Line 64: Function parse_args
magellan find --db <FILE> --name <NAME> [--path <PATH>] [--symbol-id <ID>] [--ambiguous <NAME>] [--first]
magellan find --db <FILE> --list-glob "<PATTERN>"
Find a symbol by name, or preview all symbols that match a glob expression. Glob listings include node IDs for deterministic scripting (e.g., feeding results to refactoring tooling).
| Argument | Description |
|---|---|
| `--db <FILE>` | Path to database (required) |
| `--name <NAME>` | Symbol name to find |
| `--symbol-id <ID>` | Stable SymbolId for precise lookup (v1.5) |
| `--ambiguous <NAME>` | Show all candidates for an ambiguous name (v1.5) |
| `--path <PATH>` | Limit search to a specific file (optional) |
| `--list-glob <PATTERN>` | List all symbol names that match the glob (mutually exclusive with `--name`) |
| `--first` | Use first match when ambiguous (deprecated; use `--symbol-id`) |
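`--list-glob` matches symbol names against a pattern such as `"handler_*"`. As a std-only illustration of glob matching, and not necessarily the matcher Magellan uses, `glob_match` below supports `*` (any run of characters) and `?` (exactly one character), backtracking to the most recent `*` on a mismatch.

```rust
/// Minimal glob matcher: `*` matches any run of characters (including none),
/// `?` matches exactly one character, everything else matches literally.
/// Hypothetical stand-in; Magellan's own glob implementation may differ.
fn glob_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    let (mut pi, mut ni) = (0, 0);
    // Most recent `*` position in the pattern and the name index it resumed from.
    let mut star: Option<(usize, usize)> = None;
    while ni < n.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ni));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if let Some((sp, sn)) = star {
            // Mismatch: backtrack and let the last `*` absorb one more character.
            pi = sp + 1;
            ni = sn + 1;
            star = Some((sp, sn + 1));
        } else {
            return false;
        }
    }
    // Leftover pattern must be all `*`, which can match the empty string.
    p[pi..].iter().all(|&c| c == '*')
}
```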
$ magellan find --db ./magellan.db --name main
Found "main":
File: /path/to/src/main.rs
Kind: Function
Location: Line 229, Column 0
$ magellan find --db ./magellan.db --ambiguous main
Ambiguous name "main" has 3 candidates:
[1] a1b2c3d4e5f67890123456789012ab - src/bin/main.rs::Function main
[2] b2c3d4e5f678901234567890123cd - src/lib.rs::Function main
[3] c3d4e5f6789012345678901234de - tests/integration_test.rs::Function main
magellan refs --db <FILE> --name <NAME> --path <PATH> [--direction <in|out>]
Show incoming or outgoing calls for a symbol. Incoming calls include callers from other indexed files when the target symbol name is unique in the database.
| Argument | Description |
|---|---|
| `--db <FILE>` | Path to database (required) |
| `--name <NAME>` | Symbol name (required) |
| `--path <PATH>` | File path containing the symbol (required) |
| `--direction <in\|out>` | Call direction: `in` for callers, `out` for callees (optional) |
$ magellan refs --db ./magellan.db --name parse_args --path src/main.rs --direction in
Calls TO "parse_args":
From: main (Function) at /path/to/src/main.rs:237
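The cross-file rule described for `refs` (callers from other files are included only when the target name is unique) can be sketched as a filter over call facts. The `CallFact` type, the `defs` map, and the assumption that same-file calls always resolve are illustrative, not Magellan's internal API.

```rust
use std::collections::HashMap;

/// Hypothetical call fact: `caller` invokes a function named `callee` from `file`.
struct CallFact<'a> {
    caller: &'a str,
    callee: &'a str,
    file: &'a str,
}

/// Callers of `name` (defined in `def_file`). Assumption: same-file calls always
/// resolve; calls from other files resolve only when `name` has exactly one
/// definition in `defs` (symbol name -> files that define it).
fn incoming<'a>(
    calls: &[CallFact<'a>],
    defs: &HashMap<&str, Vec<&str>>,
    name: &str,
    def_file: &str,
) -> Vec<&'a str> {
    let unique = defs.get(name).map_or(false, |files| files.len() == 1);
    calls
        .iter()
        .filter(|c| c.callee == name && (c.file == def_file || unique))
        .map(|c| c.caller)
        .collect()
}
```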
magellan verify --root <DIR> --db <FILE>
Compare database state against the filesystem and report differences.
Exit codes: 0 = up to date, 1 = issues found
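Conceptually, verification compares the indexed `File` nodes (path and hash) with a fresh scan of the filesystem. This sketch assumes both sides are available as path-to-hash maps and classifies files as missing, new, or stale; the `Report` type and the hash strings are hypothetical, while the exit-code mapping follows the convention above.

```rust
use std::collections::BTreeMap;

/// Hypothetical report of differences between indexed state and disk.
#[derive(Debug, Default, PartialEq)]
struct Report {
    missing: Vec<String>, // indexed, but no longer on disk
    new: Vec<String>,     // on disk, but never indexed
    stale: Vec<String>,   // content hash changed since indexing
}

/// Compare path -> content-hash maps from the database and from a fresh scan.
fn verify(db: &BTreeMap<String, String>, disk: &BTreeMap<String, String>) -> Report {
    let mut r = Report::default();
    for (path, hash) in db {
        match disk.get(path) {
            None => r.missing.push(path.clone()),
            Some(h) if h != hash => r.stale.push(path.clone()),
            Some(_) => {}
        }
    }
    for path in disk.keys() {
        if !db.contains_key(path) {
            r.new.push(path.clone());
        }
    }
    r
}

/// Exit code convention: 0 = up to date, 1 = issues found.
fn exit_code(r: &Report) -> i32 {
    if r.missing.is_empty() && r.new.is_empty() && r.stale.is_empty() { 0 } else { 1 }
}
```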
magellan export --db <FILE> [--format json|jsonl|csv|scip|dot] [--output <PATH>] [--minify] [--no-symbols] [--no-references] [--no-calls] [--include-collisions]
Export all graph data to various formats.
| Argument | Description |
|---|---|
| `--db <FILE>` | Path to database (required) |
| `--format <FORMAT>` | Export format: `json` (default), `jsonl`, `csv`, `scip`, `dot` |
| `--output <PATH>` | Write to file instead of stdout |
| `--minify` | Use compact JSON (no pretty-printing) |
| `--no-symbols` | Exclude symbols from export |
| `--no-references` | Exclude references from export |
| `--no-calls` | Exclude calls from export |
| `--include-collisions` | Include collision groups (JSON only) |
Export Versions:
| Version | Changes |
|---|---|
| 2.0.0 | Added symbol_id, canonical_fqn, display_fqn fields to SymbolExport |
Format-Specific Version Encoding:
- JSON: Top-level `version` field
- JSONL: First line is `{"type":"Version","version":"2.0.0"}`
- CSV: Header comment `# Magellan Export Version: 2.0.0`
- SCIP: Metadata includes version information
- DOT: No version field (graphviz format)
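A consumer can check the version marker before parsing the rest of a JSONL or CSV export. The helper below is a std-only sketch that assumes the exact first-line shapes listed above (no extra whitespace inside the JSON record); production code should use a real JSON parser such as `serde_json` rather than string matching.

```rust
/// Extract the export version from the first line of a JSONL or CSV export.
/// Assumes the exact documented shapes; returns None for anything else.
fn export_version(first_line: &str) -> Option<&str> {
    let line = first_line.trim();
    // CSV: header comment `# Magellan Export Version: X.Y.Z`
    if let Some(rest) = line.strip_prefix("# Magellan Export Version: ") {
        return Some(rest);
    }
    // JSONL: version record `{"type":"Version","version":"X.Y.Z"}`
    let rest = line.strip_prefix(r#"{"type":"Version","version":""#)?;
    rest.strip_suffix(r#""}"#)
}
```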
Examples:
# JSON export (default)
magellan export --db ./magellan.db > codegraph.json
# JSON Lines (one JSON object per line)
magellan export --db ./magellan.db --format jsonl > codegraph.jsonl
# CSV export
magellan export --db ./magellan.db --format csv > codegraph.csv
# SCIP export (binary, requires --output)
magellan export --db ./magellan.db --format scip --output codegraph.scip
# DOT graph format (pipe to graphviz)
magellan export --db ./magellan.db --format dot | dot -Tpng -o graph.png
# Include collision information in JSON
magellan export --db ./magellan.db --include-collisions > codegraph.json
magellan collisions --db <FILE> [--field <FIELD>] [--limit <N>]
List ambiguous symbols that share the same FQN or display FQN (v1.5).
| Argument | Description |
|---|---|
| `--db <FILE>` | Path to database (required) |
| `--field <FIELD>` | Field to check: `fqn`, `display_fqn`, `canonical_fqn` (default: `display_fqn`) |
| `--limit <N>` | Maximum groups to show (default: 50) |
$ magellan collisions --db ./magellan.db
Collisions by display_fqn:
main (3)
[1] a1b2c3d4e5f67890123456789012ab src/bin/main.rs
my_crate::src/bin/main.rs::Function main
[2] b2c3d4e5f678901234567890123cd src/lib.rs
my_crate::src/lib.rs::Function main
[3] c3d4e5f6789012345678901234de tests/integration_test.rs
my_crate::tests/integration_test.rs::Function main
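Collision detection amounts to grouping symbols by the chosen field and keeping groups with more than one member. The sketch below groups by `display_fqn` (the documented default) using a hypothetical `Sym` record; the IDs in the test are placeholders, not real SymbolIds.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal symbol record; field names follow the export fields.
struct Sym<'a> {
    symbol_id: &'a str,
    display_fqn: &'a str,
}

/// Group symbol IDs by display FQN, keep only groups with more than one member,
/// and return at most `limit` groups (compare `--limit`, default 50).
fn collisions<'a>(syms: &[Sym<'a>], limit: usize) -> Vec<(&'a str, Vec<&'a str>)> {
    let mut groups: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for s in syms {
        groups.entry(s.display_fqn).or_default().push(s.symbol_id);
    }
    groups
        .into_iter()
        .filter(|(_, ids)| ids.len() > 1)
        .take(limit)
        .collect()
}
```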
magellan migrate --db <FILE> [--dry-run] [--no-backup]
Upgrade a Magellan database to the current schema version (v1.5).
| Argument | Description |
|---|---|
| `--db <FILE>` | Path to database (required) |
| `--dry-run` | Check version without migrating |
| `--no-backup` | Skip backup creation (not recommended) |
Migration Behavior:
- Creates a timestamped backup before migration (`<db>.v<timestamp>.bak`)
- Uses an SQLite transaction for atomicity (rollback on error)
- Shows the old and new versions before running
- No-op if the database is already at the current version
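The backup naming and no-op rule for `migrate` can be sketched as two small functions. The timestamp format is not documented here, so this sketch assumes Unix seconds; `plan_migration` is a hypothetical helper, and the SQLite transaction itself is not shown.

```rust
use std::path::{Path, PathBuf};

/// Build the documented backup name `<db>.v<timestamp>.bak`.
/// Assumption: the timestamp is Unix seconds.
fn backup_path(db: &Path, timestamp: u64) -> PathBuf {
    let mut name = db.as_os_str().to_os_string();
    name.push(format!(".v{}.bak", timestamp));
    PathBuf::from(name)
}

/// Hypothetical planning step: None means "already current, nothing to do";
/// otherwise return (old, new) so both versions can be shown before running.
fn plan_migration(current: u32, target: u32) -> Option<(u32, u32)> {
    if current >= target {
        None
    } else {
        Some((current, target))
    }
}
```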
Schema Version 4 (v1.5 BLAKE3 SymbolId):
Version 4 introduces BLAKE3-based SymbolId and canonical_fqn/display_fqn fields:
- New symbols get 32-character BLAKE3 hash IDs (128 bits)
- Existing symbols have `symbol_id: null` in exports
- To get BLAKE3 IDs for all symbols, re-index after migration
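A 32-character hex ID encodes 16 bytes, i.e. 128 bits. The sketch below assumes the ID is the first 16 bytes of a 32-byte BLAKE3 digest (for example from the `blake3` crate, left out so the example stays std-only) rendered as lowercase hex; which fields Magellan feeds into the hash is not specified in this document.

```rust
use std::fmt::Write;

/// Render a 128-bit ID as 32 lowercase hex characters from the first 16 bytes
/// of a 32-byte digest. Assumption: the digest is BLAKE3 output.
fn symbol_id_hex(digest: &[u8; 32]) -> String {
    let mut out = String::with_capacity(32);
    for byte in &digest[..16] {
        // Two hex digits per byte: 16 bytes -> 32 characters -> 128 bits.
        write!(out, "{:02x}", byte).expect("writing to a String cannot fail");
    }
    out
}
```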
Magellan's database (--db <FILE>) stores all indexed code information.
Recommended: Place .db files outside watched directories.
Placing the database inside a watched directory can cause:
- The watcher to process the database as if it's a source file
- Export operations to include binary database content
- Circular file system events
Examples:
# Recommended: database outside watched directory
magellan watch --root /path/to/project --db ~/.cache/magellan/project.db --scan-initial
# Discouraged: database inside watched directory
magellan watch --root . --db ./magellan.db --scan-initial
Recommended locations:
- Linux/macOS: `~/.cache/magellan/` or `~/.local/share/magellan/`
- Windows: `%LOCALAPPDATA%\magellan\`
- CI/CD: Use a cache directory outside the workspace
Magellan validates all file paths to prevent directory traversal attacks:
- Paths with `../` patterns are validated before access
- Symlinks pointing outside the project root are rejected
- Absolute paths outside the watched directory are blocked
These protections are implemented in src/validation.rs and applied during:
- Watcher event processing
- Directory scanning
- File indexing operations
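A minimal lexical version of the traversal check might look like this. It is a sketch, not the code in `src/validation.rs`: it resolves `.` and `..` components against the watch root and rejects anything that ends up outside it, but it does not follow symlinks, so a complete check would also compare canonicalized paths (`std::fs::canonicalize`).

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolve `candidate` against `root` (assumed absolute and already
/// normalized) and return None if the result escapes the root.
fn resolve_within(root: &Path, candidate: &Path) -> Option<PathBuf> {
    let joined = if candidate.is_absolute() {
        candidate.to_path_buf()
    } else {
        root.join(candidate)
    };
    let mut out = PathBuf::new();
    for comp in joined.components() {
        match comp {
            // `..` removes the last component; popping past `/` is rejected.
            Component::ParentDir => {
                if !out.pop() {
                    return None;
                }
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    // Component-wise prefix check, so `/projectX` does not count as inside `/project`.
    if out.starts_with(root) { Some(out) } else { None }
}
```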
magellan label --db <FILE> [--label <LABEL>]... [--list] [--count] [--show-code]
Query symbols by labels. Labels are automatically assigned during indexing:
- Language labels: `rust`, `python`, `javascript`, `typescript`, `c`, `cpp`, `java`
- Symbol kind labels: `fn`, `method`, `struct`, `class`, `enum`, `interface`, `module`, `union`, `namespace`, `typealias`
| Argument | Description |
|---|---|
| `--db <FILE>` | Path to database (required) |
| `--label <LABEL>` | Label to query (can be specified multiple times for AND semantics) |
| `--list` | List all available labels with counts |
| `--count` | Count entities with the specified label(s) |
| `--show-code` | Show the actual source code for each symbol |
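Repeating `--label` narrows results with AND semantics, which is a set intersection over per-label entity sets. The `LabelIndex` type below is a hypothetical in-memory stand-in for the label data stored in the database.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical label index: label -> set of entity IDs carrying that label.
type LabelIndex = BTreeMap<&'static str, BTreeSet<u64>>;

/// AND semantics: an entity matches only if it carries every requested label.
fn query_labels(index: &LabelIndex, labels: &[&str]) -> BTreeSet<u64> {
    // An unknown label contributes an empty set, so the result is empty too.
    let mut sets = labels
        .iter()
        .map(|l| index.get(*l).cloned().unwrap_or_default());
    let first = match sets.next() {
        Some(s) => s,
        None => return BTreeSet::new(),
    };
    // Intersect the remaining sets into the first one.
    sets.fold(first, |acc, s| acc.intersection(&s).copied().collect())
}
```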
$ magellan label --db ./magellan.db --list
12 labels in use:
rust (349)
fn (120)
struct (45)
method (89)
...
$ magellan label --db ./magellan.db --label rust --label fn
120 symbols with labels [rust, fn]:
main (fn) in src/main.rs [0-36]
new (fn) in src/user.rs [91-138]
...
$ magellan label --db ./magellan.db --label rust --label fn --show-code
120 symbols with labels [rust, fn]:
main (fn) in src/main.rs [0-36]
fn main() {
println!("Hello");
}
magellan get --db <FILE> --file <PATH> --symbol <NAME>
Get code chunks for a specific symbol. Uses stored code chunks, so you don't need to re-read source files.
| Argument | Description |
|---|---|
| `--db <FILE>` | Path to database (required) |
| `--file <PATH>` | File path (required) |
| `--symbol <NAME>` | Symbol name (required) |
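Because chunks are addressed by the byte spans stored on `Symbol` nodes (for example `main (fn) in src/main.rs [0-36]` in the `label` output above), retrieving one is a slice of the stored text. The sketch assumes an exclusive end offset, as tree-sitter byte ranges use, and also shows converting a byte offset to a 1-based line number; both helpers are hypothetical.

```rust
/// Slice a stored source string by a symbol's byte span `[start, end)`.
/// Returns None when the span is out of bounds or not on UTF-8 boundaries.
fn chunk_for_span(source: &str, start: usize, end: usize) -> Option<&str> {
    source.get(start..end)
}

/// 1-based line number of a byte offset, counting the newlines before it.
fn line_of(source: &str, byte: usize) -> usize {
    let end = byte.min(source.len());
    source.as_bytes()[..end].iter().filter(|&&b| b == b'\n').count() + 1
}
```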
magellan get-file --db <FILE> --file <PATH>
Get all code chunks from a file. Useful for retrieving complete file contents without re-reading the source.
| Argument | Description |
|---|---|
| `--db <FILE>` | Path to database (required) |
| `--file <PATH>` | File path (required) |
| Language | Extensions | Parser |
|---|---|---|
| Rust | .rs | tree-sitter-rust |
| C | .c, .h | tree-sitter-c |
| C++ | .cpp, .cc, .cxx, .hpp, .h | tree-sitter-cpp |
| Java | .java | tree-sitter-java |
| JavaScript | .js, .mjs | tree-sitter-javascript |
| TypeScript | .ts, .tsx | tree-sitter-typescript |
| Python | .py | tree-sitter-python |
Nodes:
- `File`: path, hash, timestamps
- `Symbol`: name, kind, byte spans, line/column
- `Reference`: file, referenced symbol, location
- `Call`: file, caller, callee, location
Edges:
- `DEFINES`: File -> Symbol
- `REFERENCES`: Reference -> Symbol
- `CALLER`: Symbol -> Call
- `CALLS`: Call -> Symbol
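With this schema, a symbol's callers are two hops away: `Symbol -CALLER-> Call -CALLS-> Symbol`. The sketch below walks those edges backwards over a hypothetical in-memory edge list with integer node indices; sqlitegraph's actual query API is not shown.

```rust
/// Hypothetical in-memory mirror of the documented edge types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EdgeKind {
    Defines,    // File -> Symbol
    References, // Reference -> Symbol
    Caller,     // Symbol -> Call
    Calls,      // Call -> Symbol
}

struct Edge {
    kind: EdgeKind,
    from: usize, // node index
    to: usize,
}

/// Callers of the symbol node `target`: walk CALLS edges backwards to Call nodes,
/// then CALLER edges backwards to the Symbol nodes that make those calls.
fn callers_of(edges: &[Edge], target: usize) -> Vec<usize> {
    let call_nodes: Vec<usize> = edges
        .iter()
        .filter(|e| e.kind == EdgeKind::Calls && e.to == target)
        .map(|e| e.from)
        .collect();
    edges
        .iter()
        .filter(|e| e.kind == EdgeKind::Caller && call_nodes.contains(&e.to))
        .map(|e| e.from)
        .collect()
}
```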
Symbol Kinds: Function, Method, Class, Interface, Enum, Module, Union, Namespace, TypeAlias, Unknown
Magellan continues processing even when individual files fail:
- Permission errors are logged and skipped
- Files with invalid syntax are skipped
- Database write errors cause exit (requires manual intervention)
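The error policy above distinguishes per-file failures (log and continue) from database write failures (stop). A sketch of that split, with a hypothetical `IndexError` enum and an injected `index_one` closure standing in for the real parser and database writer:

```rust
/// Hypothetical error split mirroring the documented policy.
#[derive(Debug)]
enum IndexError {
    Unreadable(String), // e.g. permission denied: log and skip
    Syntax(String),     // unparseable file: log and skip
    DbWrite(String),    // database failure: stop and report
}

/// Index each file; per-file errors are logged and skipped, while a database
/// write error aborts the run and is returned to the caller.
fn index_all<F>(files: &[&str], mut index_one: F) -> Result<usize, IndexError>
where
    F: FnMut(&str) -> Result<(), IndexError>,
{
    let mut indexed = 0;
    for &f in files {
        match index_one(f) {
            Ok(()) => indexed += 1,
            Err(IndexError::Unreadable(msg)) | Err(IndexError::Syntax(msg)) => {
                eprintln!("skipping {f}: {msg}");
            }
            Err(e @ IndexError::DbWrite(_)) => return Err(e),
        }
    }
    Ok(indexed)
}
```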
src/
├── main.rs # CLI entry point
├── lib.rs # Public API
├── watcher.rs # Filesystem watcher
├── indexer.rs # Event coordination
├── references.rs # Reference/Call fact types
├── verify.rs # Database verification logic
├── ingest/
│ ├── mod.rs # Parser dispatcher & Rust parser
│ ├── detect.rs # Language detection
│ ├── pool.rs # Thread-local parser pool
│ ├── c.rs # C parser
│ ├── cpp.rs # C++ parser
│ ├── java.rs # Java parser
│ ├── javascript.rs # JavaScript parser
│ ├── typescript.rs # TypeScript parser
│ └── python.rs # Python parser
├── query_cmd.rs # Query command
├── find_cmd.rs # Find command
├── refs_cmd.rs # Refs command
├── verify_cmd.rs # Verify CLI handler
├── watch_cmd.rs # Watch CLI handler
├── output/ # Output formatting
├── common.rs # Shared utilities
├── validation.rs # Path validation
└── graph/
├── mod.rs # CodeGraph API
├── schema.rs # Node/edge types
├── files.rs # File operations
├── symbols.rs # Symbol operations
├── references.rs # Reference node operations
├── calls.rs # Call edge operations
├── call_ops.rs # Call node operations
├── ops.rs # Graph indexing operations
├── query.rs # Query operations
├── count.rs # Count operations
├── export.rs # JSON export
├── scan.rs # Scanning operations
├── freshness.rs # Freshness checking
├── cache.rs # LRU cache
└── tests.rs # Graph tests
cargo test
Test coverage:
- Path validation tests: 24 tests for traversal protection, symlink handling, cross-platform paths
- Orphan detection tests: 12 tests verifying clean state after delete operations
- SCIP export tests: 7 round-trip tests verifying parseable protobuf output
- Call graph tests: 5 tests for cross-file method call resolution
- Symbol extraction tests: per-language tests for Rust, Python, Java, JavaScript, TypeScript, C, C++
- Graph operations tests: insert, delete, query operations across all node types
Tests pass on Linux (the primary development platform). Other platforms are not regularly tested.
Magellan uses thread-safe synchronization for concurrent access to shared state. The v1.7 migration replaced `RefCell<T>` with `Arc<Mutex<T>>` for this state, so it can be shared and mutated safely across threads.
TSAN Test Suite:
# Run TSAN thread safety tests
cargo test --test tsan_thread_safety_tests
What TSAN Detects:
- Data races from unsynchronized concurrent access
- Missing mutexes around shared mutable state
- Lock ordering violations that can cause deadlocks
Modules Tested:
- `FileSystemWatcher`: concurrent batch access, legacy pending state
- `PipelineSharedState`: dirty path insertion, lock ordering
Current Status:
The TSAN test suite exists and all of its tests pass under a regular `cargo test` run. However, running them with actual ThreadSanitizer instrumentation (`-Zsanitizer=thread`) is currently blocked by Rust toolchain limitations (ABI mismatch errors in dependencies). See TEST-01-TSAN-RESULTS.md for details.
Manual Verification:
All concurrent state uses Arc<Mutex<T>>:
- `FileSystemWatcher::legacy_pending_batch`: `Arc<Mutex<Option<WatcherBatch>>>`
- `FileSystemWatcher::legacy_pending_index`: `Arc<Mutex<usize>>`
- `PipelineSharedState::dirty_paths`: `Arc<Mutex<BTreeSet<PathBuf>>>`
Lock ordering is enforced to prevent deadlocks:
- Acquire the `dirty_paths` lock first
- Send the wakeup signal while holding the lock
- Release the lock
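The lock ordering above can be sketched with standard-library types. `Shared` is a hypothetical stand-in for `PipelineSharedState`, and an `mpsc` channel is assumed as the wakeup mechanism; the real signal type may differ.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for PipelineSharedState: a shared dirty-path set plus
/// a wakeup channel to the indexing thread.
#[derive(Clone)]
struct Shared {
    dirty_paths: Arc<Mutex<BTreeSet<PathBuf>>>,
    wakeup: Sender<()>,
}

impl Shared {
    fn new() -> (Self, Receiver<()>) {
        let (tx, rx) = channel();
        let shared = Shared { dirty_paths: Arc::new(Mutex::new(BTreeSet::new())), wakeup: tx };
        (shared, rx)
    }

    /// Follows the documented order: lock, insert, signal while locked, release.
    fn mark_dirty(&self, path: PathBuf) {
        let mut dirty = self.dirty_paths.lock().unwrap(); // 1. acquire dirty_paths first
        dirty.insert(path);
        let _ = self.wakeup.send(()); // 2. wake the consumer while still holding the lock
    } // 3. `dirty` is dropped here, releasing the lock

    /// Consumer side: take every pending path (sorted, de-duplicated) at once.
    fn drain(&self) -> Vec<PathBuf> {
        std::mem::take(&mut *self.dirty_paths.lock().unwrap()).into_iter().collect()
    }
}
```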
- FQN collisions (addressed in v1.5): Symbols with identical names in different files or modules may share the same display FQN. The `collisions` command (v1.5) identifies these cases, and `--symbol-id` provides stable BLAKE3-based identifiers for unambiguous symbol reference. Collisions are common in `main` functions across binaries, test functions, and methods with generic names (`new`, `default`, etc.) in impl blocks.
- No semantic analysis: AST-level only; no type checking or cross-module resolution
- No incremental parsing: File changes trigger full re-parse of that file
- Cross-crate resolution: Rust symbols across crates are resolved by name only
- Testing: Primary development and testing happen on Linux; Windows and macOS are not regularly tested in CI
GPL-3.0-or-later
- notify - Filesystem watching
- tree-sitter - AST parsing
- sqlitegraph - Graph persistence (repository)
- signal-hook - Signal handling
- walkdir - Directory scanning
- rayon - Parallel processing
- Repository: https://github.com/oldnordic/magellan
- Documentation: MANUAL.md
- Crates.io: https://crates.io/crates/magellan