A comprehensive implementation of a Distributed Hash Table (DHT) using consistent hashing, featuring automatic node joining/leaving, failure tolerance, and data redistribution. This project demonstrates advanced distributed systems concepts including peer-to-peer networking, consistent hashing algorithms, and fault-tolerant distributed storage.
- 16-bit hash space (65,536 positions) using MD5 hashing
- Automatic load balancing as nodes join and leave
- Minimal data movement during topology changes
- Ring-based architecture for efficient key-to-node mapping
- Seamless node joining with automatic successor/predecessor updates
- Graceful node leaving with data migration to appropriate nodes
- Failure detection and recovery mechanisms
- Ring maintenance ensuring network connectivity
- Automatic file placement based on consistent hashing
- File replication for fault tolerance
- Dynamic file redistribution when nodes join/leave
- Efficient file retrieval with O(log N) lookup complexity
- Node failure detection and automatic recovery
- Data preservation during node failures
- Network partition handling
- Automatic ring repair mechanisms
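The "minimal data movement" property listed above can be checked with a small standalone sketch. This is not the project's `src/DHT.py`. It hashes hypothetical node addresses and file names onto the same 16-bit MD5 ring, assigns each key to the first node at or after it (wrapping around), and records which keys change owner when a fourth node joins. The `localhost:<port>` node naming and this ownership rule are assumptions made for illustration.

```python
import bisect
import hashlib
from typing import Dict, List

RING_SIZE = 2 ** 16  # 16-bit hash space: 65,536 positions


def ring_hash(key: str) -> int:
    """Place a string on the ring with MD5, like the project's hasher."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


def owner(key_id: int, node_ids: List[int]) -> int:
    """First node ID at or after key_id, wrapping around the ring."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_id)
    return ids[i % len(ids)]


def assign(keys: List[str], node_ids: List[int]) -> Dict[str, int]:
    return {k: owner(ring_hash(k), node_ids) for k in keys}


keys = [f"file{i}.txt" for i in range(1000)]
nodes = [ring_hash(f"localhost:{port}") for port in (5000, 5001, 5002)]
before = assign(keys, nodes)

new_node = ring_hash("localhost:5003")  # a fourth node joins
after = assign(keys, nodes + [new_node])

moved = [k for k in keys if before[k] != after[k]]
# Every key that changed owner now lives on the joining node
assert all(after[k] == new_node for k in moved)
print(f"{len(moved)} of {len(keys)} keys moved")
```

Only keys in the arc between the new node and its predecessor change owner, which is about 1/(N+1) of the keys on average. A plain `hash % N` scheme would instead reassign most keys whenever N changes.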
```
┌─────────────────────────────────────────────────────────────┐
│                    DHT Ring Architecture                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Node A (Key: 1000) ──► Node B (Key: 15000) ──► Node C      │
│       ▲                                           │         │
│       └─────────── Node D (Key: 50000) ◄──────────┘         │
│                                                             │
│  • Each node maintains successor/predecessor pointers       │
│  • Files stored based on hash(filename) → responsible node  │
│  • Consistent hashing ensures minimal data movement         │
└─────────────────────────────────────────────────────────────┘
```
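The ownership rule behind the diagram can be made concrete using the three node keys it shows. Node C's key is not given, so it is left out. This sketch assumes the usual Chord convention that a node owns the keys in (predecessor, node]. In other words, a key belongs to the first node ID at or after it, wrapping past 65535 back to the start of the ring. `build_ring` and `responsible` are hypothetical helpers, not functions from `src/DHT.py`.

```python
import bisect
from typing import Dict, List, Tuple

RING_SIZE = 2 ** 16


def build_ring(node_ids: List[int]) -> Dict[int, Tuple[int, int]]:
    """Return {node_id: (predecessor_id, successor_id)} for a sorted ring."""
    ids = sorted(node_ids)
    n = len(ids)
    # ids[i - 1] wraps to the last node when i == 0
    return {ids[i]: (ids[i - 1], ids[(i + 1) % n]) for i in range(n)}


def responsible(key_id: int, node_ids: List[int]) -> int:
    """A node owns keys in (predecessor, node]; wrap past the top of the ring."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_id % RING_SIZE)
    return ids[i % len(ids)]


nodes = [1000, 15000, 50000]  # Nodes A, B and D from the diagram
ring = build_ring(nodes)
print(ring[15000])                # (1000, 50000)
print(responsible(12000, nodes))  # 15000
print(responsible(60000, nodes))  # 1000, wrapping past 65535
```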
```bash
# Python 3.7 or higher
python3 --version

# No third-party packages are required; only standard-library modules are used:
# socket, threading, hashlib, os, time
```
```python
from src.DHT import Node

# Create the first node (bootstrap node)
node1 = Node("localhost", 5000)

# Create additional nodes and join the network
node2 = Node("localhost", 5001)
node2.join(("localhost", 5000))
node3 = Node("localhost", 5002)
node3.join(("localhost", 5000))

# Store a file in the DHT
node1.put("example.txt")  # File automatically placed on the correct node

# Retrieve a file from the DHT
retrieved_file = node2.get("example.txt")  # Returns the filename or None

# Gracefully leave the network
node2.leave()  # Transfers files to the appropriate nodes

# Force-kill a node (simulates failure)
node3.kill()  # The network recovers automatically
```
```bash
cd distributed-hash-table
python3 src/check.py 5000
```
- Initialization Test (1 point): Node creation and setup
- Join Operations (9 points): Single-node, two-node, and multi-node scenarios
- Put/Get Operations (10 points): File storage and retrieval
- File Rehashing (5 points): Data redistribution when nodes join
- Graceful Leave (5 points): Clean node departure with data transfer
- Failure Tolerance (10 points): Recovery from unexpected node failures
Total: 40 points maximum
```python
import hashlib
def hasher(self, key):
    # Map a string key onto the 16-bit ring (0 to 65535) using MD5
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**16)
```
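When each node knows only its successor, locating a key means forwarding the request node by node until the key falls between some node and its successor. The sketch below reuses the MD5 hasher above as a plain function and counts the hops. `find_successor`, `in_arc` and the `localhost:<port>` node naming are illustrative assumptions, not the project's code. It shows the cost of a successor-only walk: in the worst case a lookup visits N - 1 nodes. Chord's finger tables (Stoica et al.) are what reduce this to O(log N) hops.

```python
import hashlib
from typing import Dict, Tuple

RING_SIZE = 2 ** 16  # 16-bit identifier space


def hasher(key: str) -> int:
    # Same mapping as the method above, written as a plain function
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


def in_arc(x: int, start: int, end: int) -> bool:
    """True if x lies in the ring interval (start, end], wrapping past 0."""
    if start < end:
        return start < x <= end
    return x > start or x <= end


def find_successor(start: int, key_id: int, succ: Dict[int, int]) -> Tuple[int, int]:
    """Follow successor pointers from `start`; return (responsible node, hops)."""
    node, hops = start, 0
    while not in_arc(key_id, node, succ[node]):
        node = succ[node]  # forward the request one node clockwise
        hops += 1
    return succ[node], hops


# Eight hypothetical nodes named by address, each pointing at the next ID clockwise
ids = sorted({hasher(f"localhost:{5000 + i}") for i in range(8)})
succ = {ids[i]: ids[(i + 1) % len(ids)] for i in range(len(ids))}

key_id = hasher("example.txt")
results = [find_successor(n, key_id, succ) for n in ids]
owners = {owner for owner, _ in results}
worst = max(hops for _, hops in results)
print(owners, worst)  # a single owner; worst == len(ids) - 1
```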
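- O(log N) lookup hops assume Chord-style finger tables; with successor pointers alone, a lookup can take up to N - 1 hops (O(N))
- Ring traversal for key location
- Successor-based routing to the node responsible for each key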
- Find successor using existing network
- Update predecessor of successor node
- Transfer relevant files from successor
- Update ring pointers for consistency
- Detect node failure through socket errors
- Update ring structure to bypass failed node
- Redistribute files to maintain availability
- Restore redundancy through replication
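The join steps listed above can be simulated in memory without sockets. The node finds its successor, splices itself into the ring, and takes over the successor's files that now fall in its own arc. `SimNode` is a hypothetical stand-in, not the project's `Node` class. Key IDs are passed in directly instead of being hashed, and a graceful `leave` is included to show files flowing back to the successor. Failure recovery is not modelled here, because bypassing a dead successor would also need backup successor pointers or replicas.

```python
from typing import Dict


def in_arc(x: int, start: int, end: int) -> bool:
    """True if x lies in the ring interval (start, end]; start == end means the whole ring."""
    if start < end:
        return start < x <= end
    return x > start or x <= end


class SimNode:
    """Hypothetical in-memory stand-in for a DHT node (no sockets, no hashing)."""

    def __init__(self, node_id: int) -> None:
        self.id = node_id
        self.successor: "SimNode" = self    # a lone node is its own neighbour
        self.predecessor: "SimNode" = self
        self.files: Dict[int, str] = {}     # key_id -> filename

    def find_successor(self, key_id: int) -> "SimNode":
        # Walk successor pointers until key_id falls in (node, node.successor]
        node = self
        while not in_arc(key_id, node.id, node.successor.id):
            node = node.successor
        return node.successor

    def put(self, key_id: int, filename: str) -> None:
        self.find_successor(key_id).files[key_id] = filename

    def join(self, bootstrap: "SimNode") -> None:
        # 1. Find our successor using the existing network
        succ = bootstrap.find_successor(self.id)
        pred = succ.predecessor
        # 2. Update the successor's predecessor (and the old predecessor's successor)
        self.successor, self.predecessor = succ, pred
        pred.successor = self
        succ.predecessor = self
        # 3. Transfer the files that now fall in our arc (pred, self]
        for key_id in [k for k in succ.files if in_arc(k, pred.id, self.id)]:
            self.files[key_id] = succ.files.pop(key_id)

    def leave(self) -> None:
        # Hand all files to the successor, which inherits our arc
        self.successor.files.update(self.files)
        self.files.clear()
        # Relink our neighbours around us, then become a lone node again
        self.predecessor.successor = self.successor
        self.successor.predecessor = self.predecessor
        self.successor = self.predecessor = self


a = SimNode(1000)
for key_id in (500, 12000, 40000, 60000):
    a.put(key_id, f"file-{key_id}.txt")  # a alone owns the whole ring

b = SimNode(15000)
b.join(a)  # b takes over (1000, 15000], i.e. key 12000
d = SimNode(50000)
d.join(a)  # d takes over (15000, 50000], i.e. key 40000
print(sorted(a.files), sorted(b.files), sorted(d.files))
# [500, 60000] [12000] [40000]

b.leave()  # b hands key 12000 to its successor d
print(sorted(a.files), sorted(d.files))
# [500, 60000] [12000, 40000]
```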
```
distributed-hash-table/
├── src/
│   ├── DHT.py                    # Main DHT implementation
│   └── check.py                  # Comprehensive test suite
├── examples/
│   ├── basic_usage.py            # Simple DHT operations
│   ├── multi_node_demo.py        # Complex network scenarios
│   └── failure_simulation.py     # Fault tolerance testing
├── docs/
│   ├── ALGORITHMS.md             # Detailed algorithm explanations
│   ├── API_REFERENCE.md          # Complete API documentation
│   └── PERFORMANCE.md            # Performance analysis and benchmarks
├── tests/
│   ├── unit_tests.py             # Individual component tests
│   └── integration_tests.py      # End-to-end system tests
├── config/
│   └── network_configs.json      # Sample network configurations
├── README.md                     # This file
├── LICENSE                       # MIT License
└── requirements.txt              # Python dependencies
```
- Content Distribution Networks (CDN)
- Peer-to-peer file sharing
- Distributed databases
- Cloud storage backends
- Web server load distribution
- Database sharding
- Microservice discovery
- Cache distribution
- Distributed systems coursework
- Algorithm implementation studies
- Network programming tutorials
- Fault tolerance research
- TCP socket communication for reliability
- Multi-threaded connection handling
- Asynchronous message processing
- Connection pooling and management
- File-based storage with directory organization
- Atomic file operations for consistency
- Metadata tracking for file locations
- Garbage collection for orphaned files
- Comprehensive logging of all operations
- Network topology visualization
- Performance metrics collection
- Debug utilities for troubleshooting
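For the "atomic file operations" bullet, one common pattern for a directory-backed store works in two steps. First, write the new contents to a temporary file in the same directory. Then rename that file over the target with `os.replace`, so a reader sees either the old or the new contents and never a partial write. `atomic_write` is a hypothetical helper shown as a sketch; whether `src/DHT.py` uses this exact approach is not stated here.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers never observe a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the half-written temp file, then re-raise the original error
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as storage_dir:
    target = os.path.join(storage_dir, "example.txt")
    atomic_write(target, b"version 1")
    atomic_write(target, b"version 2")  # overwrites in one step
    with open(target, "rb") as f:
        print(f.read())             # b'version 2'
    print(os.listdir(storage_dir))  # ['example.txt']: no temp files left behind
```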
Operation | Time Complexity | Space Complexity | Network Hops |
---|---|---|---|
Lookup | O(log N) | O(1) | O(log N) |
Insert | O(log N) | O(1) | O(log N) |
Delete | O(log N) | O(1) | O(log N) |
Join | O(log N) | O(K) | O(log N) |
Leave | O(K) | O(K) | O(1) |
Where N = number of nodes, K = number of files per node
We welcome contributions! Please see our Contributing Guidelines for details.
```bash
git clone <repository-url>
cd distributed-hash-table
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
This project is licensed under the MIT License - see the LICENSE file for details.
- Consistent Hashing Algorithm - Karger et al.
- Chord Protocol - Stoica et al.
- Distributed Systems Principles - Various academic sources
Built with ❤️ for distributed systems education and research