Skip to content

A Python-based implementation of a Distributed Hash Table (DHT) utilizing consistent hashing, supporting dynamic node joining and leaving, fault tolerance, and data redistribution.

License

Notifications You must be signed in to change notification settings

sdannyzaidi/distributed-hash-table

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

7 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ”— Distributed Hash Table (DHT) Implementation

Python License Network Distributed

A comprehensive implementation of a Distributed Hash Table (DHT) using consistent hashing, featuring automatic node joining/leaving, failure tolerance, and data redistribution. This project demonstrates advanced distributed systems concepts including peer-to-peer networking, consistent hashing algorithms, and fault-tolerant distributed storage.

🌟 Key Features

πŸ”„ Consistent Hashing

  • 16-bit hash space (65,536 positions) using MD5 hashing
  • Automatic load balancing as nodes join and leave
  • Minimal data movement during topology changes
  • Ring-based architecture for efficient key-to-node mapping

🀝 Dynamic Node Management

  • Seamless node joining with automatic successor/predecessor updates
  • Graceful node leaving with data migration to appropriate nodes
  • Failure detection and recovery mechanisms
  • Ring maintenance ensuring network connectivity

πŸ“ Distributed File Storage

  • Automatic file placement based on consistent hashing
  • File replication for fault tolerance
  • Dynamic file redistribution when nodes join/leave
  • Efficient file retrieval with O(log N) lookup complexity

πŸ›‘οΈ Fault Tolerance

  • Node failure detection and automatic recovery
  • Data preservation during node failures
  • Network partition handling
  • Automatic ring repair mechanisms

πŸ—οΈ Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    DHT Ring Architecture                    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                             β”‚
β”‚    Node A (Key: 1000) ←→ Node B (Key: 15000) ←→ Node C     β”‚
β”‚           ↑                                        ↓        β”‚
β”‚           └────────── Node D (Key: 50000) β†β”€β”€β”€β”€β”€β”€β”€β”˜        β”‚
β”‚                                                             β”‚
β”‚  β€’ Each node maintains successor/predecessor pointers      β”‚
β”‚  β€’ Files stored based on hash(filename) β†’ responsible node β”‚
β”‚  β€’ Consistent hashing ensures minimal data movement        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸš€ Quick Start

Prerequisites

# Python 3.7 or higher
python3 --version

# Required packages (built-in)
# - socket, threading, hashlib, os, time

Basic Usage

1. Create and Start Nodes

from src.DHT import Node

# Create first node (bootstrap node)
node1 = Node("localhost", 5000)

# Create additional nodes and join the network
node2 = Node("localhost", 5001)
node2.join(("localhost", 5000))

node3 = Node("localhost", 5002)
node3.join(("localhost", 5000))

2. Store and Retrieve Files

# Store a file in the DHT
node1.put("example.txt")  # File automatically placed on correct node

# Retrieve a file from the DHT
retrieved_file = node2.get("example.txt")  # Returns filename or None

3. Node Management

# Gracefully leave the network
node2.leave()  # Transfers files to appropriate nodes

# Force kill a node (simulates failure)
node3.kill()  # Network automatically recovers

πŸ“Š Testing and Validation

Run Comprehensive Tests

cd distributed-hash-table
python3 src/check.py 5000

Test Categories

  • βœ… Initialization Test (1 point): Node creation and setup
  • πŸ”— Join Operations (9 points): Single node, two nodes, and multi-node scenarios
  • πŸ“ Put/Get Operations (10 points): File storage and retrieval
  • πŸ”„ File Rehashing (5 points): Data redistribution when nodes join
  • πŸ‘‹ Graceful Leave (5 points): Clean node departure with data transfer
  • πŸ’₯ Failure Tolerance (10 points): Recovery from unexpected node failures

Total: 40 points maximum

πŸ”§ Implementation Details

Hash Function

def hasher(self, key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**16)

Key Algorithms

Consistent Hashing Lookup

  • O(log N) average case complexity
  • Ring traversal for key location
  • Successor-based routing for efficiency

Node Join Protocol

  1. Find successor using existing network
  2. Update predecessor of successor node
  3. Transfer relevant files from successor
  4. Update ring pointers for consistency

Failure Recovery

  1. Detect node failure through socket errors
  2. Update ring structure to bypass failed node
  3. Redistribute files to maintain availability
  4. Restore redundancy through replication

πŸ“ Project Structure

distributed-hash-table/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ DHT.py              # Main DHT implementation
β”‚   └── check.py            # Comprehensive test suite
β”œβ”€β”€ examples/
β”‚   β”œβ”€β”€ basic_usage.py      # Simple DHT operations
β”‚   β”œβ”€β”€ multi_node_demo.py  # Complex network scenarios
β”‚   └── failure_simulation.py # Fault tolerance testing
β”œβ”€β”€ docs/
β”‚   β”œβ”€β”€ ALGORITHMS.md       # Detailed algorithm explanations
β”‚   β”œβ”€β”€ API_REFERENCE.md    # Complete API documentation
β”‚   └── PERFORMANCE.md      # Performance analysis and benchmarks
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ unit_tests.py       # Individual component tests
β”‚   └── integration_tests.py # End-to-end system tests
β”œβ”€β”€ config/
β”‚   └── network_configs.json # Sample network configurations
β”œβ”€β”€ README.md               # This file
β”œβ”€β”€ LICENSE                 # MIT License
└── requirements.txt        # Python dependencies

🎯 Use Cases

Distributed Storage Systems

  • Content Distribution Networks (CDN)
  • Peer-to-peer file sharing
  • Distributed databases
  • Cloud storage backends

Load Balancing

  • Web server load distribution
  • Database sharding
  • Microservice discovery
  • Cache distribution

Research and Education

  • Distributed systems coursework
  • Algorithm implementation studies
  • Network programming tutorials
  • Fault tolerance research

πŸ”¬ Advanced Features

Network Protocols

  • TCP socket communication for reliability
  • Multi-threaded connection handling
  • Asynchronous message processing
  • Connection pooling and management

Data Management

  • File-based storage with directory organization
  • Atomic file operations for consistency
  • Metadata tracking for file locations
  • Garbage collection for orphaned files

Monitoring and Debugging

  • Comprehensive logging of all operations
  • Network topology visualization
  • Performance metrics collection
  • Debug utilities for troubleshooting

πŸš€ Performance Characteristics

Operation Time Complexity Space Complexity Network Hops
Lookup O(log N) O(1) O(log N)
Insert O(log N) O(1) O(log N)
Delete O(log N) O(1) O(log N)
Join O(log N) O(K) O(log N)
Leave O(K) O(K) O(1)

Where N = number of nodes, K = number of files per node

🀝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

git clone <repository-url>
cd distributed-hash-table
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

πŸ“œ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Consistent Hashing Algorithm - Karger et al.
  • Chord Protocol - Stoica et al.
  • Distributed Systems Principles - Various academic sources

Built with ❀️ for distributed systems education and research

About

A Python-based implementation of a Distributed Hash Table (DHT) utilizing consistent hashing, supporting dynamic node joining and leaving, fault tolerance, and data redistribution.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages