A minimal Git implementation in Rust designed for learning Git internals and understanding how version control systems work under the hood.
This project implements core Git functionality from scratch to understand:
- How Git stores objects (blobs, trees, commits) using content-addressed storage
- How the staging area (index) works as a three-way merge preparation
- How references and branches are managed through the filesystem
- Git's object model and SHA-1 hash-based storage system
- How Git tracks file changes across working directory, staging area, and commits
Git-rs supports two modes for different learning and testing needs:
When you run git-rs init
, it creates:
.git-rs/
directory (not.git/
).git-rs/git-rs-index
file (not.git/index
)
This allows you to:
- Run git-rs commands in existing Git repositories without conflicts
- Compare git-rs behavior with real Git side-by-side
- Learn safely without affecting your actual Git workflow
When you run git-rs --git-compat init
, it creates:
.git/
directory (standard Git structure).git/index
file (standard Git index)
This enables you to:
- Test git-rs output with real Git commands
- Verify compatibility with existing Git tools
- Switch between git-rs and git seamlessly
Examples:
# Safe learning mode (default)
git-rs init # Creates .git-rs/
git-rs add file.txt # Uses .git-rs/git-rs-index
# Git compatibility mode
git-rs --git-compat init # Creates .git/
git-rs --git-compat add file.txt # Uses .git/index
git status # Can use real Git to check!
This project follows DDD principles with clean separation of concerns:
src/
├── main.rs # CLI entry point with clap
├── lib.rs # Library exports and error handling
├── domain/ # 🧠 Core business logic
│ ├── repository.rs # Repository aggregate root
│ ├── objects.rs # Git objects (Blob, Tree, Commit)
│ ├── references.rs # HEAD, branches, tags
│ └── index.rs # Staging area model
├── infrastructure/ # 💾 Persistence layer
│ ├── object_store.rs # File-based object database
│ ├── ref_store.rs # Reference file management
│ └── index_store.rs # Index file serialization
├── application/ # 🎯 Use cases (commands)
│ ├── init.rs # ✅ Repository initialization
│ ├── add.rs # ✅ File staging
│ ├── status.rs # ✅ Working tree status
│ ├── commit.rs # ✅ Commit creation
│ ├── diff.rs # ✅ Content comparison
│ └── clone.rs # ✅ Repository cloning
└── cli/ # 🖥️ Command line interface
└── commands.rs # Command handlers and user interaction
Layer Responsibilities:
- Domain: Pure business logic, no I/O dependencies
- Infrastructure: File system operations, serialization
- Application: Orchestrates domain and infrastructure
- CLI: User interface and command parsing
.git-rs/
├── objects/ # Content-addressed object database
│ ├── 5a/
│ │ └── 1b2c3d... # Blob object (file content)
│ ├── ab/
│ │ └── cd1234... # Tree object (directory listing)
│ └── ef/
│ └── 567890... # Commit object (snapshot + metadata)
├── refs/ # Reference storage
│ ├── heads/ # Branch references
│ │ ├── main # Contains: "5abc123def..."
│ │ └── feature-x # Contains: "7def456ghi..."
│ └── tags/ # Tag references
├── HEAD # Current branch pointer
├── git-rs-index # Staging area (JSON format)
├── config # Repository configuration
└── description # Repository description
Working Directory → Staging Area → Repository
(files) (git-rs-index) (objects/)
│ │ │
│── git add ─────────▶ │
│ │── commit ───▶
│◀────────── checkout ──────────────│
Object Content → SHA-1 Hash → Storage Path
"Hello World" → a1b2c3... → .git-rs/objects/a1/b2c3...
Object Format:
"<type> <size>\0<content>"
"blob 11\0Hello World"
What it does:
- Creates
.git-rs/
directory structure - Initializes object database with proper subdirectories
- Sets up reference system (HEAD pointing to refs/heads/main)
- Creates configuration files
Educational Insights:
- How Git creates a repository from scratch
- Directory structure and file organization
- Reference initialization and HEAD management
Example:
git-rs init
# Creates: .git-rs/{objects,refs/{heads,tags},HEAD,config,description}
What it does:
- Reads file content and calculates SHA-1 hash
- Creates blob objects in object database with zlib compression
- Updates staging area (git-rs-index) with file paths and hashes
- Handles multiple files and directory recursion
Educational Insights:
- Content-addressed storage: identical content = same hash = same object
- How Git tracks file changes through content hashing
- The role of the staging area in preparing commits
- Object creation and compression techniques
Example:
git-rs add README.md src/
# Creates blob objects and updates git-rs-index
Internal Process:
- Read file content:
"Hello World"
- Create blob:
"blob 11\0Hello World"
- Calculate hash:
SHA-1("blob 11\0Hello World") = 5ab2c3d...
- Store compressed object:
.git-rs/objects/5a/b2c3d...
- Update index:
{"README.md": {"hash": "5ab2c3d...", ...}}
What it does:
- Compares working directory against staging area and last commit
- Categorizes file changes: staged, modified, deleted, untracked
- Shows current branch and commit information
- Respects
.gitignore
patterns
Educational Insights:
- Git's "three trees" concept: working directory, index, HEAD
- How Git determines file status through hash comparison
- The relationship between different file states
- Gitignore pattern matching
Status Categories:
Changes to be committed: # In index, different from HEAD
new file: README.md
modified: src/main.rs
Changes not staged: # In working dir, different from index
modified: README.md
deleted: old_file.txt
Untracked files: # In working dir, not in index
new_feature.rs
What it does:
- Creates tree objects from staged files in index
- Generates commit objects with metadata (author, timestamp, message)
- Updates branch references to point to new commit
- Handles both root commits and commits with parents
- Validates commit messages and detects empty commits
Educational Insights:
- How Git creates immutable snapshots from staged changes
- Tree object construction and hierarchical file organization
- Commit object format with parent relationships
- Reference management and branch pointer updates
- The difference between root commits and regular commits
Example:
git-rs commit -m "Initial implementation"
# Creates tree object, commit object, and updates branch ref
Internal Process:
- Load staged files from index:
git-rs-index
- Create tree entries:
{name: "README.md", mode: 100644, hash: "5ab2c3d..."}
- Store tree object:
tree 42\0<tree-content>
→7def456ghi...
- Create commit object with:
- Tree hash:
7def456ghi...
- Parent commits (if any)
- Author/committer signatures
- Commit message
- Tree hash:
- Store commit object:
commit 156\0<commit-content>
→9abc123def...
- Update branch reference:
.git-rs/refs/heads/main
→9abc123def...
Commit Object Format:
tree 7def456ghi789...
parent 1abc234def567... (if not root commit)
author John Doe <john@example.com> 1692000000 +0000
committer John Doe <john@example.com> 1692000000 +0000
Initial implementation
What it does:
- Generates unified diff format output showing line-by-line changes
- Compares working directory vs staging area (default)
- Compares staging area vs last commit (with
--cached
) - Detects binary files and handles them appropriately
- Shows file headers and line context for all changes
Educational Insights:
- How diff algorithms work to compare file contents
- Understanding unified diff format used by patch tools
- The difference between staged and unstaged changes
- Binary file detection and handling strategies
Example:
# Show unstaged changes
git-rs diff
# Show staged changes
git-rs diff --cached
Output Format:
diff --git a/README.md b/README.md
index 1234567..abcdefg 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,4 @@
# Git-RS
-Old line
+New line
+Added line
Complete HTTP-based repository cloning with educational insights.
# Clone to directory with same name as repository
git-rs clone https://github.com/user/repo.git
# Clone to custom directory name
git-rs clone https://github.com/user/repo.git my-project
# Clone specific branch
git-rs clone --branch develop https://github.com/user/repo.git
Features:
- HTTP Git protocol implementation
- Pack file transfer and processing
- Remote reference discovery and mapping
- Working directory checkout
- Educational wire protocol documentation
Status: Not yet implemented (placeholder exists in CLI)
git-rs log # Show all commit history
git-rs log -n 5 # Show last 5 commits
Will implement:
- Commit graph traversal and display
- Parent relationship following
- Chronological sorting with metadata
- Formatted history output
git-rs status
# Shows comprehensive file state analysis
Git uses SHA-1 content addressing for all objects:
// Object format: "<type> <size>\0<content>"
let blob_content = b"Hello World";
let object_content = format!("blob {}\0", blob_content.len());
let full_content = [object_content.as_bytes(), blob_content].concat();
let hash = sha1::digest(&full_content); // "5ab2c3d4e5f6..."
Why this matters:
-
Identical content produces identical hashes
-
Deduplication: same file content stored only once
-
Integrity: any corruption changes the hash
-
Distributed: objects can be safely shared between repositories Our test suite covers:
-
Unit tests: Individual component behavior
-
Integration tests: Command workflows
-
Property tests: Hash consistency, object integrity
-
Cross-platform tests: macOS, Linux compatibility (Windows not supported)
cargo test # Run all tests
cargo test --test integration # Integration tests only
cargo test domain:: # Domain layer tests
# Initialize repository
git-rs init
# Add files to staging
git-rs add README.md src/
# Check status
git-rs status
# Create commit
git-rs commit -m "Initial implementation"
# View differences (when implemented)
git-rs diff
git-rs diff --staged
# Examine object database
find .git-rs/objects -type f
file .git-rs/objects/5a/b2c3d4...
# View staging area
cat .git-rs/git-rs-index | jq .
# Check references
cat .git-rs/HEAD
cat .git-rs/refs/heads/main
After exploring this implementation, you'll understand:
- Git's Object Model: How blobs, trees, and commits form a directed acyclic graph
- Content Addressing: Why identical content produces identical hashes
- Three Trees: Working directory, index, and HEAD relationships
- Reference System: How branches and tags are just pointers to commits
- Staging Process: Why the index exists and how it enables powerful workflows
- File System Integration: How Git maps its abstract model to disk storage
Use these commands to explore git-rs internals:
# Object inspection
hexdump -C .git-rs/objects/5a/b2c3d4...
# Decompression (requires zlib tools)
zpipe -d < .git-rs/objects/5a/b2c3d4...
# Index inspection
jq . .git-rs/git-rs-index
# Reference tracking
find .git-rs/refs -type f -exec echo {} \; -exec cat {} \;
Each command implementation includes:
- Comprehensive documentation: What, why, and how
- Visual diagrams: ASCII art showing data flow
- Code examples: Real-world usage patterns
- Test cases: Behavior verification
- Comparison with Git: How our implementation differs/matches
- 🏗️ Architecture Guide - System design and Git internals deep dive
- ⚡ Command Reference - Complete reference for all implemented commands
- 🔍 Git Internals Explained - Educational exploration of Git concepts
- 📊 Project Status - Development roadmap and contribution guidelines
- 🦀 API Documentation - Generated Rust API docs
For Git Beginners:
- Start with this README for project overview
- Read Git Internals Explained to understand core concepts
- Try hands-on examples in Command Reference
- Explore Architecture Guide for implementation details
For Developers:
- Review Project Status for current state and roadmap
- Study Architecture Guide for system design
- Check API Documentation for code reference
- Follow contribution guidelines below
For Rust Learners:
- Examine Architecture Guide for Domain-Driven Design patterns
- Browse API Documentation for Rust idioms
- Look at GitHub Actions workflows in
.github/workflows/
for CI/CD examples
- Git Object Model: How blobs, trees, and commits form a directed acyclic graph
- Content Addressing: SHA-1 hashing and object identification system
- Three Trees: Working directory, index, and HEAD relationships
- Reference System: How branches and tags are pointers to commits
- Domain-Driven Design: Clean architecture with separated concerns in Rust
- Testing Strategies: Unit tests, integration tests, and property-based testing
- Git concepts questions: Check Git Internals Explained
- Usage questions: See Command Reference
- Implementation questions: Review Architecture Guide
- Bug reports: Open an issue on GitHub
- Feature requests: Check Project Status first, then open an issue
This is primarily an educational project, but contributions are welcome:
- Bug fixes and improvements
- Additional test cases
- Documentation enhancements
- Performance optimizations
- New command implementations
Before committing changes, ensure code quality with our formatting script:
# Run all formatting and checks
./scripts/format.sh
# Or manually run individual tools:
cargo fmt # Rust code formatting
markdownlint-cli2 --fix "**/*.md" "!target/**" "!node_modules/**" # Markdown formatting
cargo clippy --all-targets --all-features -- -D warnings # Linting
cargo test # Test suite
Remember: This implementation uses .git-rs/
directories to avoid conflicts with real Git repositories, making it safe to experiment with in existing projects!