Hashing Agenda

    What is Hashing? Why use Hashing? 
    Hash Functions 
    Hash Collision Resolution Techniques 
    Implement Hash Table - Python Dictionary
    Hashing Operations - Insert, Search, Delete 

Hashing:

    Hashing is a technique used to map data of arbitrary size 
    to fixed-size values. 
    It's commonly used in data structures 
    like hash tables to provide fast data retrieval.

What is Hashing? Why Use Hashing?

    Hashing is a process that converts input data into a fixed-size integer, 
    known as a hash code, using a hash function.

    Purpose: Hashing is used to efficiently store and retrieve data, typically in hash tables. It gives average-case constant time for operations like insertion, deletion, and search, assuming a good hash function and collision resolution strategy.

Key Components of Hashing:

    1.Hash Function:

    A function that takes a key and returns an integer, usually the index where the corresponding value is stored in the hash table.

    Example: hash(key) = index

    2.Hash Table:

    A data structure that stores key-value pairs, where each key is unique and maps to exactly one value.

    The size of the hash table is typically much smaller than the number of possible keys, so the hash function compresses the key space into a smaller space of possible indices.

    3.Collisions:

    A collision occurs when two different keys hash to the same index in the hash table. This is handled using techniques like chaining or open addressing.
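
    The three components above can be seen together in a minimal sketch. The toy_hash function below is a hypothetical, deliberately weak hash (it sums character codes), chosen only so that the collision is reproducible; it is not how Python's built-in hash() works.

```python
def toy_hash(key: str, table_size: int) -> int:
    """Hypothetical toy hash: sum of character codes, reduced modulo the table size."""
    return sum(ord(ch) for ch in key) % table_size

TABLE_SIZE = 8
for key in ["ab", "ba", "cat"]:
    print(f"{key!r} -> index {toy_hash(key, TABLE_SIZE)}")

# "ab" and "ba" contain the same characters, so they land on the same index
collision = toy_hash("ab", TABLE_SIZE) == toy_hash("ba", TABLE_SIZE)
print("Collision between 'ab' and 'ba':", collision)
```

    Because the sum ignores character order, any two anagrams collide, which is one reason real hash functions mix in the position of each character.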

Why Use Hashing?

    Hashing is used to provide fast access to data. It is widely used in scenarios where efficient data retrieval is required. The main advantages of using hashing are:

    1.Fast Lookups:

    Hashing gives average-case constant time for common operations like search, insert, and delete.

    Retrieving a value by its key typically takes O(1) time, which is faster than scanning an unsorted list, O(n), or searching a balanced tree, O(log n).

    2. Efficient Memory Usage:

    A hash table stores its entries in an array whose size is proportional to the number of stored entries, not to the number of possible keys. The hash function maps the large key space onto this much smaller set of array indices.

    Hashing is especially useful when dealing with large amounts of data because it organizes the data efficiently for quick access.


    3. Associative Mapping:

    Hashing provides a way to map keys to values. This makes it easy to build associative data structures like hash maps or dictionaries where each key is unique, and it is used to quickly access the corresponding value.


    4. Collision Handling:

    Even when two keys map to the same index, hashing uses collision resolution techniques (like chaining, open addressing) to manage the conflicts, ensuring efficient access even in these cases.

Applications of Hashing

    Hashing is widely used across various domains due to its ability to provide fast data retrieval and management. 
    Below are some of the key applications of hashing:

Hash Tables/Hash Maps

    A hash table (or hash map) is a data structure that implements an associative array, which can store key-value pairs and retrieve values using keys. Hash tables are the most common application of hashing.

    Use Case: Python dictionaries (dict), Java HashMap, C++ unordered_map are all built using hashing.

Example:

    Store and retrieve student records using a student ID as the key.
    Contact list where names (or phone numbers) act as keys and details are the values.

Database Indexing

    Hashing is used in databases to index data efficiently. Indexing allows for fast retrieval of records using a key, such as a customer ID or a product ID.

Use Case: 

    When a database needs to quickly find a record based on a key, hashing ensures O(1) lookup time on average.

Example:

    In a customer database, customer IDs can be hashed to store and retrieve records quickly.

Password Storage and Verification

    Hashing is used for securely storing passwords. When a user sets a password, it is hashed and stored. During login, the entered password is hashed again and compared to the stored hash.

Use Case: 
    
    Almost all modern authentication systems use hashing to store passwords, ensuring that even if the database is compromised, attackers cannot easily retrieve the plain-text passwords.

Example:

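    Passwords are hashed together with a per-user random salt, using a deliberately slow password-hashing function such as bcrypt, scrypt, Argon2, or PBKDF2, before being stored in a database. A single pass of a fast general-purpose hash like SHA-256 is not recommended for passwords, because it makes brute-force guessing cheap.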
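
    The store-then-verify flow described above can be sketched with the standard library's hashlib.pbkdf2_hmac. The helper names hash_password and verify_password are hypothetical, and the iteration count of 100_000 is an illustrative assumption rather than a recommendation.

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 100_000):
    """Return (salt, derived_key) for a password using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # a fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes,
                    iterations: int = 100_000) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(key, expected_key)

salt, stored_key = hash_password("s3cret-example")
print(verify_password("s3cret-example", salt, stored_key))  # correct password
print(verify_password("wrong-guess", salt, stored_key))     # wrong password
```

    Only the salt and the derived key are stored; the plain-text password is never written anywhere.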


Data Deduplication

    Hashing is used to identify duplicate data blocks or files by comparing their hash values. If two files or data blocks have the same hash value, they are likely identical, and one of them can be discarded.

Use Case:

    Used in file storage systems, backup systems, and cloud storage to reduce redundant data and save storage space.

Example:

    In a cloud storage system, hashing is used to avoid storing multiple copies of the same file. Files are compared based on their hash values to check for duplicates.
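
    A minimal sketch of this idea, assuming the data blocks are available as in-memory bytes objects (the deduplicate helper and the sample contents are hypothetical):

```python
import hashlib

def deduplicate(blobs):
    """Keep the first blob for each distinct SHA-256 digest."""
    seen = {}  # digest -> first blob seen with that digest
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in seen:
            seen[digest] = blob
    return list(seen.values())

files = [b"report v1", b"holiday photo", b"report v1"]
unique = deduplicate(files)
print(f"{len(files)} blobs in, {len(unique)} unique blobs kept")
```

    A system that cannot tolerate any risk of a false match would additionally compare the bytes of two blobs whose digests are equal.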


Digital Signatures and Cryptographic Hashing

    Hashing is a fundamental part of digital signature schemes: the message is hashed, and the hash is signed with the sender's private key to produce the digital signature. The digital signature is then used to verify the integrity and authenticity of the message.

Use Case: 
    
    Widely used in securing communications, digital certificates, and software distribution.

Example:

    Digital signatures for secure transactions in blockchain technologies, such as Bitcoin.
    Verifying that software or data has not been tampered with during transmission by comparing hash values (checksum verification).


Load Balancing in Distributed Systems

    Hashing is used to distribute tasks or requests across multiple servers in a load-balanced distributed system. By using a consistent hash function, the system can evenly distribute the load without needing central coordination.

    Use Case: 
    
    Ensures efficient resource usage in distributed systems like web servers, databases, and cloud environments.

Example:

    Assigning client requests to a server in a web server farm by hashing the client’s IP address or request parameters.

URL Shortening Services

    Hashing is used to create short versions of long URLs. A hash function is applied to the original URL, and the resulting hash is used as the shortened URL identifier.

Use Case

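    URL shortening services like bit.ly or TinyURL map each long web address to a short, unique identifier. Hashing the URL is one way to generate such an identifier, although a service may instead use a counter or a random code.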

Example

    The long URL "https://www.example.com/some/long/path" can be hashed into a shortened version, such as "https://bit.ly/abc123".
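
    One possible way to derive a short identifier from a URL is sketched below. The short_code helper, the 7-character length, and the base-62 alphabet are all assumptions made for illustration; this is not a description of how any particular service works.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def short_code(url: str, length: int = 7) -> str:
    """Derive a base-62 code from the first 8 bytes of the URL's SHA-256 digest."""
    number = int.from_bytes(hashlib.sha256(url.encode()).digest()[:8], "big")
    chars = []
    for _ in range(length):
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(chars)

code = short_code("https://www.example.com/some/long/path")
print(code)
```

    Because the code is much shorter than the digest, two different URLs can produce the same code, so a real service would still need to store the mapping and handle collisions.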


File Integrity Verification (Checksums)

    Hashing is used to verify the integrity of files during transmission or storage. A file is hashed before transmission, and the hash value (checksum) is compared after transmission to ensure the file was not altered.

    Use Case: 
    
    Used in software downloads, data backups, and network transmission to ensure data integrity.

Example:

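    When downloading software, a checksum (for example SHA-256) is often provided so that you can check the file was not corrupted or tampered with. MD5 checksums still catch accidental corruption, but they do not protect against deliberate tampering, because MD5 collisions can be constructed.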


Message Digest (Cryptographic Hash Functions)

    Hashing is used to generate message digests or hash codes for data. These digests are used to verify the integrity of the data and detect any tampering.

Use Case: 
    
    Used in digital signatures, file integrity checks, and secure communications.

Example:

    A digital signature is created by hashing a document and signing the hash with a private key. The receiver verifies the signature with the matching public key and checks it against a hash computed from the received document.


Advantages of Hashing:

    1.Efficiency:
    
    Hashing allows for O(1) average time complexity for search, insert, and delete operations, making it ideal for scenarios where fast 
    lookups are essential.

    2.Scalability:

    Hashing works well even as the size of the dataset grows, as long as a good hash function is used and the hash table is resized appropriately to avoid excessive collisions.

    3.Flexibility:
    
    Hashing can be applied to any type of key (integers, strings, etc.) by defining an appropriate hash function.


Disadvantages of Hashing:

    1.Collisions:

    Hashing is susceptible to collisions, and poor collision resolution strategies can lead to performance degradation.

    2.Memory Overhead:

    Hash tables require additional memory for maintaining an array, especially if the table is sparsely populated (i.e., not many elements relative to the table size).
    
    3.Not Sorted:
    
    Hash tables do not maintain any order of elements. If ordering is required, another data structure (e.g., a balanced tree) might be more appropriate.

Hashing is a powerful technique that allows for fast data retrieval, making it ideal for tasks like creating hash maps, storing passwords securely, or building indices in databases. 

By using a hash function, we can quickly map keys to values, making hashing essential for building efficient data structures and algorithms.

In [4]:
# Example of a Hash Map in Python using a dictionary
# Creating a hash map
hash_map = {}

# Inserting key-value pairs into the hash map
hash_map['name'] = 'Harish'
hash_map['age'] = 25
hash_map['city'] = 'New York'

# Retrieving values using the key
print("Name:", hash_map['name'])   # Output: Harish
print("Age:", hash_map['age'])     # Output: 25
print("City:", hash_map['city'])   # Output: New York

# Updating a value
hash_map['age'] = 26

# Deleting a key-value pair
del hash_map['city']

print("Updated Hash Map:", hash_map)  # Output: {'name': 'Harish', 'age': 26}

Name: Harish
Age: 25
City: New York
Updated Hash Map: {'name': 'Harish', 'age': 26}


Hash Functions

    A hash function takes an input (or "key") and returns a fixed-size string or integer, which represents the input.

    Properties:

    Deterministic: The same input will always produce the same output.

    Fast: The function should compute the hash code quickly.

    Uniform Distribution: The hash codes should be distributed uniformly across the output range to minimize collisions.

In [17]:
# Using Python's built-in hash() function

# Hashing an integer
num_hash = hash(25)
print(f"Hash of 25: {num_hash}")

# Hashing a string
string_hash = hash("Hello")
print(f"Hash of 'Hello': {string_hash}")

# Hashing a tuple
tuple_hash = hash((10, 20, 30))
print(f"Hash of (10, 20, 30): {tuple_hash}")


# Hashing a tuple of strings
tuple_string_hash = hash(('apple','banana'))

print(tuple_string_hash)


# Hashing a list
#list_hash = hash(['apple','banana'])

#print(list_hash)
# TypeError: unhashable type: 'list'

Hash of 25: 25
Hash of 'Hello': 3726879484534751429
Hash of (10, 20, 30): 3952409569436607343
8808222693234927697


Note:

    In Python 3.3 and later, hash() returns different values for str and bytes objects in different runs of the program, because hash randomization is enabled by default (it can be controlled with the PYTHONHASHSEED environment variable). Hashes of small integers, such as hash(25) above, stay the same.
    This is done to protect against hash collision (denial-of-service) attacks.

    If you want hash values that are consistent across different runs and machines,
    you should use a hash function from hashlib, such as SHA-256.
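
    A minimal sketch of a run-independent hash built on hashlib. The stable_hash helper is hypothetical, and truncating the digest to 64 bits is an arbitrary choice made here for readability.

```python
import hashlib

def stable_hash(text: str) -> int:
    """Hypothetical helper: a 64-bit integer hash that is identical on every run."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

print(stable_hash("Hello"))        # same value in every process
print(stable_hash("Hello") % 10)   # usable as a repeatable bucket index
```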

Cryptographic Hash Functions with hashlib

    Python's hashlib library provides several cryptographic hash functions that can generate fixed-size hash values (also called digests). Common cryptographic hash functions include:

    • MD5 (128-bit hash)
    • SHA-1 (160-bit hash)
    • SHA-256 (256-bit hash)

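    Cryptographic hash functions are designed so that it is computationally infeasible to recover an input from its hash or to find two inputs with the same hash. A small change in the input data also results in a completely different hash value (the avalanche effect).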
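
    The effect of a small input change can be observed directly. The sketch below compares the SHA-256 digests of two strings that differ by one trailing character; differing_bits is a hypothetical helper that counts how many of the 256 output bits changed.

```python
import hashlib

def differing_bits(a: str, b: str) -> int:
    """Count the bit positions at which the SHA-256 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a.encode()).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b.encode()).digest(), "big")
    return bin(da ^ db).count("1")

print(hashlib.sha256(b"Hello World").hexdigest())
print(hashlib.sha256(b"Hello World!").hexdigest())
print("Differing bits out of 256:", differing_bits("Hello World", "Hello World!"))
```

    For a well-behaved hash function, roughly half of the output bits are expected to flip.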

In [22]:
#Example Using SHA-256:
import hashlib

# Example of hashing a string using SHA-256
def hash_sha256(data):
    # Encode the string to bytes, as hashlib requires bytes input
    encoded_data = data.encode()
    print("Encoded data", encoded_data)

    # Create a SHA-256 hash object
    sha256_hash = hashlib.sha256()

    # Update the hash object with the encoded data
    sha256_hash.update(encoded_data)

    # Return the hexadecimal representation of the hash
    return sha256_hash.hexdigest()

# Input data
data = "Hello World"
hashed_value = hash_sha256(data)
print(f"SHA-256 Hash of 'Hello World': {hashed_value}")

hash_number = hash_sha256('9975072320')
print("Secure number '9975072320' ",hash_number)


Encoded data b'Hello World'
SHA-256 Hash of 'Hello World': a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
Encoded data b'9975072320'
Secure number '9975072320'  93f780392638afeeca8592e73b87b20836c2f31498b2ea26ae6b3f9634d7a674


MD5 Hash Function Example:

    MD5 is an older hash function that produces a 128-bit hash value.

    It is considered cryptographically broken and unsuitable for further use in secure applications but can still be used for non-secure purposes (like checksums).

In [3]:
import hashlib

# Example of hashing a string using MD5
def hash_md5(data):
    # Encode the string to bytes, as hashlib requires bytes input
    encoded_data = data.encode()
    # Create an MD5 hash object
    md5_hash = hashlib.md5()
    # Update the hash object with the encoded data
    md5_hash.update(encoded_data)
    # Return the hexadecimal representation of the hash
    return md5_hash.hexdigest()

# Input data
data = "Hello World"
hashed_value = hash_md5(data)
print(f"MD5 Hash of 'Hello World': {hashed_value}")

MD5 Hash of 'Hello World': b10a8db164e0754105b7a99be72e3fe5


SHA-1 Hash Function Example

    SHA-1 (Secure Hash Algorithm 1) produces a 160-bit hash value, represented as 40 hexadecimal digits. It is stronger than MD5, but it is also considered insecure for cryptographic purposes because practical collision attacks against it have been demonstrated.

In [5]:
import hashlib

# Example of hashing a string using SHA-1
def hash_sha1(data):
    # Encode the string to bytes, as hashlib requires bytes input
    encoded_data = data.encode()
    # Create a SHA-1 hash object
    sha1_hash = hashlib.sha1()
    # Update the hash object with the encoded data
    sha1_hash.update(encoded_data)
    # Return the hexadecimal representation of the hash
    return sha1_hash.hexdigest()

# Input data
data = "Hello World"
hashed_value = hash_sha1(data)
print(f"SHA-1 Hash of 'Hello World': {hashed_value}")

SHA-1 Hash of 'Hello World': 0a4d55a8d778e5022fab701977c5d840bbc486d0


File Integrity Check Using Hash Functions:

    One common use of hashing is to verify the integrity of files. A hash of the file's contents is computed, and any changes in the file will result in a completely different hash value.

    Example: Generating an SHA-256 Hash for a File

In [25]:
with open('example1.txt','w') as file1:
    file1.write('This is sample data file.')

with open('example1.txt') as fileobject:
    print(fileobject.read())

This is sample data file.


In [14]:
import hashlib

# Function to calculate the hash of a file using SHA-256

def file_sha256(filepath):
    sha256_hash = hashlib.sha256()
    # Open the file in binary mode
    with open(filepath, "rb") as f:
        # Read the file in chunks
        for byte_block in iter(lambda: f.read(4096), b""):
            sha256_hash.update(byte_block)
    # Return the hexadecimal representation of the file hash
    return sha256_hash.hexdigest()

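# Example file path: 'example.txt' must already exist in the working directory
# (the previous cell creates 'example1.txt', which is a different file)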
file_path = 'example.txt'
file_hash = file_sha256(file_path)

print(f"SHA-256 Hash of the file '{file_path}': {file_hash}")

SHA-256 Hash of the file 'example.txt': 5cb05fd84119edbd36c69bee5e45a77abba471ecbf8263948ebe0cf11ceb6409
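
    To turn the digest into an actual integrity check, it has to be compared with a previously recorded value. The sketch below does this on a temporary file so that it runs on its own; file_digest is a hypothetical helper that mirrors the chunked reading used above.

```python
import hashlib
import os
import tempfile

def file_digest(path: str) -> str:
    """SHA-256 of a file, read in 4096-byte chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            h.update(chunk)
    return h.hexdigest()

# Write a temporary file and record its digest as the known-good value
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("This is sample data file.")
    path = tmp.name
expected = file_digest(path)

# An untouched file still matches the recorded digest
unchanged = file_digest(path) == expected

# Simulate corruption by appending one character, then check again
with open(path, "a") as f:
    f.write("!")
modified_detected = file_digest(path) != expected

os.remove(path)  # clean up the temporary file
print("Unchanged file matches:", unchanged)
print("Modification detected:", modified_detected)
```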


Consistent Hashing for Distributed Systems:

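    Consistent hashing is used in distributed systems to map data to servers/nodes. 
    Servers and keys are hashed onto the same circular range, and each key is stored on the next server around the circle, so adding or removing a server only moves the keys adjacent to it.

    The example below uses plain modulo hashing (hash % number of servers) to show the basic idea of assigning keys to servers. It is not true consistent hashing: changing the number of servers would reassign most keys.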

In [7]:
import hashlib

# Assign data to servers with plain modulo hashing
# (despite the function name, this is not a consistent-hashing ring)
def consistent_hashing(key):
    # Use SHA-256 to create a hash value of the key
    hash_value = hashlib.sha256(key.encode()).hexdigest()
    # Convert the hash value to an integer and return the server ID (mod 4 to simulate 4 servers)
    return int(hash_value, 16) % 4

# Keys representing data items
keys = ["data1", "data2", "data3", "data4", "data5"]

# Simulate assigning data to servers
for key in keys:
    server_id = consistent_hashing(key)
    print(f"Data item '{key}' is assigned to Server {server_id}")


Data item 'data1' is assigned to Server 1
Data item 'data2' is assigned to Server 0
Data item 'data3' is assigned to Server 0
Data item 'data4' is assigned to Server 2
Data item 'data5' is assigned to Server 2
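
    The cell above assigns keys with hash % 4. A ring-based version of consistent hashing might look like the sketch below. HashRing and ring_position are hypothetical names, the server names are made up, and the sketch omits virtual nodes and replication, which practical systems usually add.

```python
import bisect
import hashlib

def ring_position(name: str) -> int:
    """Map a string to a position on the ring using SHA-256."""
    return int(hashlib.sha256(name.encode()).hexdigest(), 16)

class HashRing:
    """Hypothetical minimal hash ring (no virtual nodes or replication)."""

    def __init__(self, servers):
        self.ring = sorted((ring_position(s), s) for s in servers)

    def server_for(self, key: str) -> str:
        # First server clockwise from the key's position, wrapping around at the end
        positions = [pos for pos, _ in self.ring]
        i = bisect.bisect_right(positions, ring_position(key)) % len(self.ring)
        return self.ring[i][1]

keys = [f"data{i}" for i in range(1, 101)]
before = HashRing(["server-0", "server-1", "server-2", "server-3"])
after = HashRing(["server-0", "server-1", "server-2"])  # server-3 removed

moved = [k for k in keys if before.server_for(k) != after.server_for(k)]
# Only keys that lived on the removed server need a new home
print(all(before.server_for(k) == "server-3" for k in moved))
print(f"{len(moved)} of {len(keys)} keys moved")
```

    With modulo hashing, going from 4 to 3 servers changes the result of hash % n for most keys; on the ring, keys on the three remaining servers stay where they were.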


Implement Hash Table - Python Dictionary
Python's built-in dict type implements a hash table and provides efficient operations for insertion, search, and deletion.

In [3]:
# Example of using Python Dictionary as a Hash Table
hash_table = {}

# Insert
hash_table['key1'] = 'value1'
hash_table['key2'] = 'value2'

# Search
value = hash_table.get('key1')  # 'value1'

# Delete
del hash_table['key2']

# Check existence
exists = 'key1' in hash_table  # True

Operations
Insert: Add a new key-value pair to the hash table.
Search: Retrieve the value associated with a given key.
Delete: Remove the key-value pair from the hash table.

In [5]:
#Insert Operation:
def insert(hash_table, key, value):
    hash_table[key] = value

#Search Operation:
def search(hash_table, key):
    return hash_table.get(key, None)  # Returns None if key is not found

#Delete Operation:
def delete(hash_table, key):
    if key in hash_table:
        del hash_table[key]

In [8]:
# Initialize hash table
hash_table = {}

# Insert elements
insert(hash_table, 'name', 'Surendra')

insert(hash_table, 'age', 47)

# Search for elements
print(search(hash_table, 'name'))  # Output: Surendra
print(search(hash_table, 'age'))   # Output: 47

# Delete an element
delete(hash_table, 'age')

# Verify deletion
print(search(hash_table, 'age'))   # Output: None

Surendra
47
None
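
Because dict is a hash table, its keys must be hashable. A user-defined class can serve as a key if it defines __eq__ and __hash__ consistently. The StudentId class below is a hypothetical example of this:

```python
class StudentId:
    """Hypothetical key type: equal objects must produce equal hashes."""

    def __init__(self, department: str, number: int):
        self.department = department
        self.number = number

    def __eq__(self, other):
        return (isinstance(other, StudentId)
                and (self.department, self.number) == (other.department, other.number))

    def __hash__(self):
        # Hash the same fields that __eq__ compares
        return hash((self.department, self.number))

records = {StudentId("CS", 101): "Harish"}
# A distinct but equal object finds the same entry
print(records.get(StudentId("CS", 101)))   # Harish
print(records.get(StudentId("CS", 102)))   # None
```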


Collision Resolution Techniques

Collision: Occurs when two or more different keys hash to the same index in the hash table.

Techniques:
Chaining: Stores all items that hash to the same index in a linked list (or other collection) at that index.

Open Addressing: Searches for the next available slot within the hash table itself. Common methods include linear probing, quadratic probing, and double hashing.

The sections below cover these two primary collision resolution techniques and demonstrate them using Python:

1.	Chaining (Separate Chaining)
2.	Open Addressing

Chaining (Separate Chaining)

    In chaining, each index in the hash table points to a linked list (or another data structure like a list or set). 
    
    If multiple keys hash to the same index, their key-value pairs are added to the list at that index.

    Python Example: Chaining

In [37]:
class HashTableChaining:
    def __init__(self, size):
        """
        Initializes a hash table with the given size. Each index contains an empty list to store key-value pairs.
        """
        self.size = size
        self.table = [[] for _ in range(size)]  # Initialize the table with empty lists

    def hash_function(self, key):
        """
        Simple hash function that uses the modulo operator to map keys to table indices.
        """
        return hash(key) % self.size

    def insert(self, key, value):
        """
        Inserts a key-value pair into the hash table using chaining for collision resolution.
        """
        index = self.hash_function(key)
        # Check if the key already exists and update it
        for pair in self.table[index]:
            if pair[0] == key:
                pair[1] = value
                return
        # If the key does not exist, add a new pair to the list at the hashed index
        self.table[index].append([key, value])

    def search(self, key):
        """
        Searches for the value associated with the key. Returns None if the key is not found.
        """
        index = self.hash_function(key)
        for pair in self.table[index]:
            if pair[0] == key:
                return pair[1]
        return None

    def delete(self, key):
        """
        Deletes the key-value pair associated with the key, if it exists.
        """
        index = self.hash_function(key)
        for i,pair in enumerate(self.table[index]):
            if pair[0] == key:
                del self.table[index][i]
                return

    def display(self):
        """
        Displays the hash table.
        """
        for i, chain in enumerate(self.table):
            print(f"Index {i}: {chain}")

# Example usage of HashTableChaining
hash_table = HashTableChaining(10)

# Insert key-value pairs
hash_table.insert("apple", 10)
hash_table.insert("banana", 20)
hash_table.insert("grapes", 30)
hash_table.insert("orange", 40)

# Insert more keys. Whether any of them collide depends on the run,
# because Python randomizes str hashes per process.
hash_table.insert("mango", 50)
hash_table.insert("watermelon", 60)
hash_table.insert("grapes", 70)  # 'grapes' already exists, so its value is updated to 70
hash_table.insert("date", 78)

# Display the hash table
hash_table.display()

# Search for a key
print("\nSearch for 'banana':", hash_table.search("banana"))

# Delete a key
hash_table.delete("grapes")
hash_table.display()

# Delete a key
hash_table.delete("date")
hash_table.display()


Index 0: [['mango', 50]]
Index 1: [['orange', 40]]
Index 2: []
Index 3: [['grapes', 70]]
Index 4: []
Index 5: [['watermelon', 60]]
Index 6: [['banana', 20]]
Index 7: [['date', 78]]
Index 8: [['apple', 10]]
Index 9: []

Search for 'banana': 20
Index 0: [['mango', 50]]
Index 1: [['orange', 40]]
Index 2: []
Index 3: []
Index 4: []
Index 5: [['watermelon', 60]]
Index 6: [['banana', 20]]
Index 7: [['date', 78]]
Index 8: [['apple', 10]]
Index 9: []
Index 0: [['mango', 50]]
Index 1: [['orange', 40]]
Index 2: []
Index 3: []
Index 4: []
Index 5: [['watermelon', 60]]
Index 6: [['banana', 20]]
Index 7: []
Index 8: [['apple', 10]]
Index 9: []


In [39]:
hashtable1 = [[['apple',89],['banana',80]]]

print(hashtable1[0][0])


['apple', 89]


Explanation

    Each index in the hash table is a list. When two keys hash to the same index (collision), they are appended to the list at that index.

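    In the run shown above, every key happened to land on a different index, so each chain holds at most one pair. If two keys, say "apple" and "mango", hashed to the same index, both pairs would reside in the list at that index.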
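
    To see a chain with more than one entry regardless of hash randomization, the sketch below uses small integer keys. It relies on the CPython behavior that hash() of a small non-negative int is the int itself; the put helper and the sample values are hypothetical.

```python
# Minimal chaining table: a list of buckets, each bucket a list of [key, value] pairs
size = 10
table = [[] for _ in range(size)]

def put(key, value):
    bucket = table[hash(key) % size]
    for pair in bucket:
        if pair[0] == key:      # key already present: update in place
            pair[1] = value
            return
    bucket.append([key, value])

# 1, 11 and 21 all leave remainder 1 when divided by 10, so they share bucket 1
for key in (1, 11, 21, 5):
    put(key, f"value-{key}")

print(table[1])  # [[1, 'value-1'], [11, 'value-11'], [21, 'value-21']]
print(table[5])  # [[5, 'value-5']]
```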


Open Addressing

In open addressing, all keys are stored in the hash table itself (no external structures like lists). When a collision occurs, the algorithm looks for another empty spot in the hash table using a probing method. 

Common probing techniques include:

    Linear Probing: Looks for the next available slot in a linear manner.

    Quadratic Probing: Uses a quadratic function to find the next available slot.

    Double Hashing: Uses a second hash function to compute a new index when a collision occurs.
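
The three strategies differ only in how the next index is computed. The hypothetical helpers below list the first few probe positions for a key whose home index is 3 in a table of size 11; the step of 4 used for double hashing stands in for the result of a second hash function.

```python
def linear_probes(home, size, count):
    return [(home + i) % size for i in range(count)]

def quadratic_probes(home, size, count):
    return [(home + i * i) % size for i in range(count)]

def double_hash_probes(home, step, size, count):
    # step comes from a second hash function and must be non-zero
    return [(home + i * step) % size for i in range(count)]

size, home = 11, 3
print(linear_probes(home, size, 5))          # [3, 4, 5, 6, 7]
print(quadratic_probes(home, size, 5))       # [3, 4, 7, 1, 8]
print(double_hash_probes(home, 4, size, 5))  # [3, 7, 0, 4, 8]
```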

Linear Probing Example:

In [40]:
class HashTableLinearProbing:
    def __init__(self, size):
        """
        Initializes a hash table with the given size. Each index contains None initially.
        """
        self.size = size
        self.table = [None] * size  # Initialize the table with None

    def hash_function(self, key):
        """
        Simple hash function that uses the modulo operator to map keys to table indices.
        """
        return hash(key) % self.size

    def linear_probe(self, key, index):
        """
        Linear probing: starting at index, returns the first slot that is empty
        or already holds key. Returns None if no such slot exists.
        """
        original_index = index
        while self.table[index] is not None and self.table[index][0] != key:
            index = (index + 1) % self.size
            # If we've cycled through the entire table without success, the table is full
            if index == original_index:
                return None
        return index

    def insert(self, key, value):
        """
        Inserts a key-value pair into the hash table using linear probing for collision resolution.
        If the key is already present, its value is updated.
        """
        index = self.linear_probe(key, self.hash_function(key))
        if index is None:
            print("Hash table is full!")
            return
        self.table[index] = (key, value)

    def search(self, key):
        """
        Searches for the value associated with the key. Returns None if the key is not found.
        """
        index = self.hash_function(key)
        original_index = index
        while self.table[index] is not None:
            if self.table[index][0] == key:
                return self.table[index][1]
            index = (index + 1) % self.size
            if index == original_index:  # Full circle, key not found
                break
        return None

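    def delete(self, key):
        """
        Deletes the key-value pair associated with the key, if it exists.
        Entries that follow the deleted slot in the same cluster are re-inserted,
        so that later searches do not stop early at the emptied slot.
        """
        index = self.hash_function(key)
        original_index = index
        while self.table[index] is not None:
            if self.table[index][0] == key:
                self.table[index] = None
                # Re-insert every entry up to the next empty slot
                index = (index + 1) % self.size
                while self.table[index] is not None:
                    moved_key, moved_value = self.table[index]
                    self.table[index] = None
                    self.insert(moved_key, moved_value)
                    index = (index + 1) % self.size
                return
            index = (index + 1) % self.size
            if index == original_index:  # Full circle, key not found
                break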

    def display(self):
        """
        Displays the hash table.
        """
        for i, pair in enumerate(self.table):
            print(f"Index {i}: {pair}")

# Example usage of HashTableLinearProbing
hash_table_lp = HashTableLinearProbing(10)

# Insert key-value pairs
hash_table_lp.insert("apple", 10)
hash_table_lp.insert("banana", 20)
hash_table_lp.insert("grapes", 30)
hash_table_lp.insert("orange", 40)

# Insert more keys; whether they collide depends on the run, because str hashes are randomized per process
hash_table_lp.insert("mango", 50)
hash_table_lp.insert("watermelon", 60)

# Display the hash table
hash_table_lp.display()

# Search for a key
print("\nSearch for 'banana':", hash_table_lp.search("banana"))

# Delete a key
hash_table_lp.delete("grapes")
hash_table_lp.display()

Index 0: ('mango', 50)
Index 1: ('orange', 40)
Index 2: None
Index 3: ('grapes', 30)
Index 4: None
Index 5: ('watermelon', 60)
Index 6: ('banana', 20)
Index 7: None
Index 8: ('apple', 10)
Index 9: None

Search for 'banana': 20
Index 0: ('mango', 50)
Index 1: ('orange', 40)
Index 2: None
Index 3: None
Index 4: None
Index 5: ('watermelon', 60)
Index 6: ('banana', 20)
Index 7: None
Index 8: ('apple', 10)
Index 9: None


Explanation

    When a key hashes to an index that is already occupied, linear probing steps forward one slot at a time until it finds a free slot. 
    In the run shown above, every key landed on its own index, so no probing was needed.
    "grapes" is deleted, and the corresponding slot is set to None.

Quadratic Probing for Hash Collision Resolution

    Quadratic Probing is a collision resolution technique used in hash tables to handle situations where two or more keys hash to the same index (collision). 

    Instead of searching for the next available slot linearly (as in linear probing), quadratic probing uses a quadratic function to determine the interval between probes.

    In quadratic probing, if a collision occurs at the key's home index, the algorithm computes the i-th probe position with the following quadratic formula:

Quadratic Probing Formula:

    New Index = (hash(key) + i²) % Table Size

    Where:

    •i is the number of probes (starts from 1 and increments by 1 on each collision).

    •hash(key) is the original hash value of the key.

    •Table Size is the size of the hash table.
    
    This means that instead of moving sequentially (like in linear probing), the index jumps based on a quadratic function (i.e., 1², 2², 3², etc.), thus reducing the primary clustering that linear probing can cause.

Advantages of Quadratic Probing:

    Reduces Clustering: Quadratic probing helps reduce primary clustering, which is a problem in linear probing where contiguous blocks of filled slots lead to more collisions.

    Simple to Implement: It's easy to implement and avoids the need for chaining or more complex methods like double hashing.


Disadvantages of Quadratic Probing:

    Secondary Clustering: Quadratic probing still suffers from secondary clustering, where keys that hash to the same initial index follow the same sequence of probing positions.

    Limited Slot Coverage: The probe sequence does not necessarily visit every slot. With a prime table size, the first (size + 1) / 2 probes are guaranteed to hit distinct slots, so an insertion is guaranteed to succeed only while the table is at most half full.
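
    How many slots the formula can actually reach is easy to check by enumeration. The quadratic_coverage helper below is hypothetical and simply collects the distinct indices produced by (home + i²) % size:

```python
def quadratic_coverage(size: int, home: int = 0) -> set:
    """Distinct slots reached by (home + i*i) % size for i = 0 .. size-1."""
    return {(home + i * i) % size for i in range(size)}

for size in (7, 11, 16):
    slots = quadratic_coverage(size)
    print(f"size {size}: {len(slots)} of {size} slots reachable -> {sorted(slots)}")
```

    For the prime sizes 7 and 11 the sequence reaches 4 and 6 slots respectively, just over half the table, while for size 16 it reaches only 4 slots.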


In [26]:
class HashTableQuadraticProbing:
    def __init__(self, size):
        """
        Initializes the hash table with a fixed size.
        """
        self.size = size
        self.table = [None] * size  # Initialize the table with None values
        self.item_count = 0  # To keep track of the number of inserted items

    def hash_function(self, key):
        """
        A simple hash function that returns an index based on the key.
        :param key: The key to hash.
        :return: The hash value (index in the table).
        """
        return hash(key) % self.size

    def insert(self, key, value):
        """
        Inserts a key-value pair into the hash table using quadratic probing for collision resolution.
        If the key is already present, its value is updated.
        :param key: The key to insert.
        :param value: The value associated with the key.
        """
        original_index = self.hash_function(key)

        # Probe original_index, original_index + 1², original_index + 2², ... (mod size)
        for i in range(self.size):
            index = (original_index + i ** 2) % self.size
            if self.table[index] is None:
                self.table[index] = (key, value)
                self.item_count += 1  # Count only newly added keys
                print(f"Inserted ({key}: {value}) at index {index}")
                return
            if self.table[index][0] == key:
                self.table[index] = (key, value)
                print(f"Updated ({key}: {value}) at index {index}")
                return

        # The probe sequence found no free slot (the table may not be completely full)
        print("No free slot found!")

    def search(self, key):
        """
        Searches for a key in the hash table and returns its associated value.
        :param key: The key to search for.
        :return: The value associated with the key, or None if the key is not found.
        """
        index = self.hash_function(key)
        i = 1  # Quadratic probing starts with i = 1
        original_index = index

        # Quadratic probing to search for the key
        while self.table[index] is not None:
            if self.table[index][0] == key:
                return self.table[index][1]  # Return the value associated with the key
            index = (original_index + i ** 2) % self.size  # Apply quadratic probing formula
            i += 1
            if i > self.size:  # If we loop beyond the table size
                return None

        return None  # Key not found

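    def delete(self, key):
        """
        Deletes a key-value pair from the hash table.
        Note: resetting the slot to None can cut the probe sequence of other keys
        that collided with this one, making them unreachable by search().
        A common remedy is to mark the slot with a special "deleted" marker (tombstone) instead.
        :param key: The key to delete.
        """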
        index = self.hash_function(key)
        i = 1  # Quadratic probing starts with i = 1
        original_index = index

        # Quadratic probing to find the key
        while self.table[index] is not None:
            if self.table[index][0] == key:
                self.table[index] = None  # Set the slot to None to delete the key
                self.item_count -= 1
                print(f"Deleted key '{key}' from index {index}")
                return
            index = (original_index + i ** 2) % self.size  # Apply quadratic probing formula
            i += 1
            if i > self.size:  # If we loop beyond the table size
                print(f"Key '{key}' not found")
                return

        print(f"Key '{key}' not found")

    def display(self):
        """
        Displays the contents of the hash table.
        """
        for i, item in enumerate(self.table):
            print(f"Index {i}: {item}")


# Example Usage of Hash Table with Quadratic Probing
hash_table = HashTableQuadraticProbing(7)  # Table size 7 (a prime number)

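# Insert key-value pairs (str hashes are randomized per Python process,
# so the indices printed below can differ from run to run)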
hash_table.insert("apple", "fruit")
hash_table.insert("carrot", "vegetable")
hash_table.insert("banana", "fruit")
hash_table.insert("spinach", "vegetable")
hash_table.insert("mango", "fruit")

# Display the hash table
hash_table.display()

# Search for keys
print("\nSearch for 'carrot':", hash_table.search("carrot"))
print("Search for 'banana':", hash_table.search("banana"))

# Delete a key
hash_table.delete("banana")
hash_table.display()

Inserted (apple: fruit) at index 6
Inserted (carrot: vegetable) at index 0
Inserted (banana: fruit) at index 5
Inserted (spinach: vegetable) at index 1
Inserted (mango: fruit) at index 2
Index 0: ('carrot', 'vegetable')
Index 1: ('spinach', 'vegetable')
Index 2: ('mango', 'fruit')
Index 3: None
Index 4: None
Index 5: ('banana', 'fruit')
Index 6: ('apple', 'fruit')

Search for 'carrot': vegetable
Search for 'banana': fruit
Deleted key 'banana' from index 5
Index 0: ('carrot', 'vegetable')
Index 1: ('spinach', 'vegetable')
Index 2: ('mango', 'fruit')
Index 3: None
Index 4: None
Index 5: None
Index 6: ('apple', 'fruit')
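
    One limitation of the delete() method above is that it resets the slot to None, and search() stops at the first None it meets, so a key stored further along the same probe chain can become unreachable. A minimal sketch of one common remedy, a "tombstone" marker, is shown below. The class name TombstoneProbingTable and the sentinel _DELETED are hypothetical names chosen for this illustration, and integer keys are used because small integers hash to themselves in CPython, which keeps the example deterministic.

```python
_DELETED = object()  # Hypothetical sentinel (a "tombstone") marking a removed entry


class TombstoneProbingTable:
    """Quadratic-probing table whose delete() leaves a tombstone instead of None."""

    def __init__(self, size=7):
        self.size = size
        self.table = [None] * size

    def _probe(self, key):
        """Yield the indices visited by the quadratic probe sequence for key."""
        start = hash(key) % self.size
        for i in range(self.size):
            yield (start + i * i) % self.size

    def insert(self, key, value):
        first_free = None  # First reusable slot (tombstone or empty) seen so far
        for index in self._probe(key):
            slot = self.table[index]
            if slot is None:
                if first_free is None:
                    first_free = index
                break  # An empty slot ends the probe chain
            if slot is _DELETED:
                if first_free is None:
                    first_free = index  # Remember it, but keep looking for the key
            elif slot[0] == key:
                self.table[index] = (key, value)  # Key already present: update it
                return True
        if first_free is None:
            return False  # No usable slot along the probe sequence
        self.table[first_free] = (key, value)
        return True

    def search(self, key):
        for index in self._probe(key):
            slot = self.table[index]
            if slot is None:
                return None  # Truly empty slot: the key cannot be further along
            if slot is not _DELETED and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for index in self._probe(key):
            slot = self.table[index]
            if slot is None:
                return False
            if slot is not _DELETED and slot[0] == key:
                self.table[index] = _DELETED  # Keep the probe chain intact
                return True
        return False


table = TombstoneProbingTable(7)
for key, value in [(0, "a"), (7, "b"), (14, "c")]:  # All three keys hash to index 0
    table.insert(key, value)

table.delete(7)          # Leaves a tombstone at index 1
print(table.search(14))  # c  (still reachable past the tombstone)
```

    With a plain None in place of the tombstone, the search for 14 would stop at index 1 and report the key as missing.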


Implement Hash Table - Python Dictionary 

    Python's built-in dictionary (dict) is itself implemented as a hash table. 
    
    A dict stores and retrieves key-value pairs efficiently, with an average time complexity 
    of O(1) for insertions, deletions, and lookups. 
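
    As a quick illustration of the built-in type, the sketch below performs insert, search, and delete on a dict. The variable name inventory and its contents are made up for this example.

```python
# Hypothetical example data; any hashable keys would work
inventory = {}

# Insert (or update) by assigning to a key
inventory["apple"] = 100
inventory["banana"] = 200
inventory["apple"] = 150  # Same key again: the old value is overwritten

# Search: get() returns None instead of raising KeyError for a missing key
print(inventory.get("banana"))  # 200
print(inventory.get("cherry"))  # None

# Delete: remove the key-value pair
del inventory["banana"]
print("banana" in inventory)  # False
print(inventory)              # {'apple': 150}
```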
    
    However, to see how hash functions, collision resolution, and key-value storage work together, it helps to build a hash table from scratch. Here's how to implement one in Python.

    We will implement a simple hash table with:

    Separate chaining for collision resolution using linked lists.

    Basic operations: Insert, Search, Delete.

    Implementation of Hash Table in Python

In [10]:
class HashNode:
    def __init__(self, key, value):
        """
        A node in the linked list used for separate chaining.
        Stores a key-value pair.
        """
        self.key = key
        self.value = value
        self.next = None  # Pointer to the next node (for collision resolution)


class HashTable:
    def __init__(self, size=10):
        """
        Initializes a hash table with the given size. Uses separate chaining for collision resolution.
        :param size: The number of buckets in the hash table.
        """
        self.size = size
        self.table = [None] * size  # Initialize the table with 'None' for each bucket

    def hash_function(self, key):
        """
        A simple hash function that computes the hash value of a key using the modulo operator.
        :param key: The key to be hashed.
        :return: The index in the hash table (bucket) where the key-value pair should be stored.
        """
        return hash(key) % self.size

    def insert(self, key, value):
        """
        Inserts a key-value pair into the hash table.
        If the key already exists, updates the value.
        :param key: The key to be inserted.
        :param value: The value associated with the key.
        """
        index = self.hash_function(key)
        head = self.table[index]

        # If the key already exists, update the value
        while head is not None:
            if head.key == key:
                head.value = value
                return
            head = head.next

        # If the key does not exist, insert the new node at the beginning of the list
        new_node = HashNode(key, value)
        new_node.next = self.table[index]
        self.table[index] = new_node

    def search(self, key):
        """
        Searches for a key in the hash table and returns its associated value.
        :param key: The key to search for.
        :return: The value associated with the key, or None if the key is not found.
        """
        index = self.hash_function(key)
        head = self.table[index]

        # Traverse the linked list at the index
        while head is not None:
            if head.key == key:
                return head.value
            head = head.next
        return None  # Key not found

    def delete(self, key):
        """
        Deletes a key-value pair from the hash table.
        :param key: The key to be deleted.
        """
        index = self.hash_function(key)
        head = self.table[index]
        prev = None

        # Traverse the linked list at the index
        while head is not None:
            if head.key == key:
                # If the key is found, remove the node
                if prev is None:
                    # If it's the first node in the list
                    self.table[index] = head.next
                else:
                    prev.next = head.next
                return
            prev = head
            head = head.next

    def display(self):
        """
        Displays the contents of the hash table.
        """
        for i in range(self.size):
            print(f"Bucket {i}: ", end="")
            head = self.table[i]
            while head:
                print(f"({head.key}, {head.value}) -> ", end="")
                head = head.next
            print("None")


# Example Usage
hash_table = HashTable(10)

# Insert key-value pairs
hash_table.insert("apple", 100)
hash_table.insert("banana", 200)
hash_table.insert("orange", 300)

# Insert more keys; whether they collide depends on hash(), which varies between runs for str keys
hash_table.insert("grapes", 400)
hash_table.insert("watermelon", 500)

# Display the hash table
hash_table.display()

# Search for keys
print("\nSearch for 'banana':", hash_table.search("banana"))
print("Search for 'grapes':", hash_table.search("grapes"))

# Delete a key
hash_table.delete("banana")
hash_table.display()

Bucket 0: None
Bucket 1: (orange, 300) -> None
Bucket 2: None
Bucket 3: (grapes, 400) -> None
Bucket 4: None
Bucket 5: (watermelon, 500) -> None
Bucket 6: (banana, 200) -> None
Bucket 7: None
Bucket 8: (apple, 100) -> None
Bucket 9: None

Search for 'banana': 200
Search for 'grapes': 400
Bucket 0: None
Bucket 1: (orange, 300) -> None
Bucket 2: None
Bucket 3: (grapes, 400) -> None
Bucket 4: None
Bucket 5: (watermelon, 500) -> None
Bucket 6: None
Bucket 7: None
Bucket 8: (apple, 100) -> None
Bucket 9: None


Explanation of Key Components

    1.Hash Function:

    The hash function computes the index by applying the hash() function to the key and taking the result modulo the size of the hash table. This ensures that the index falls within the bounds of the table.
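
    The index computation can be tried on its own, as in the small sketch below. The helper name bucket_index is hypothetical and simply mirrors hash_function() without the class. Integer keys are used for the printed values because small integers hash to themselves in CPython, whereas str hashes are randomized per process, so a string's index can change between runs.

```python
def bucket_index(key, size):
    """Map any hashable key to a bucket index in the range 0 .. size - 1."""
    return hash(key) % size


size = 10

print(bucket_index(42, size))  # 2
print(bucket_index(12, size))  # 2  (collides with 42)

# Python's % returns a non-negative result for a positive modulus,
# so keys with negative hashes still map to a valid index
print(bucket_index(-7, size))  # 3

# The index of a str key varies between runs, but it always stays in range
assert 0 <= bucket_index("apple", size) < size
```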

    2.Chaining for Collision Resolution:

    Each bucket in the hash table points to the head of a linked list (represented by HashNode). When a collision occurs (i.e., two keys hash to the same index), the new key-value pair is added to the linked list at that index.

    3.Insertion:

    When inserting, if the key already exists in the hash table, its value is updated. Otherwise, a new node is added to the beginning of the linked list at the appropriate index.

    4.Search:

    The search() method traverses the linked list at the computed index to find the key. If the key is found, its associated value is returned; otherwise, None is returned.

    5.Deletion:

    The delete() method removes the node associated with the given key from the hash table. It handles cases where the node is the first in the list or is in the middle/end. If the key is absent, the table is left unchanged.

    6.Display:

    The display() method prints the contents of the hash table, showing each bucket and its linked list of key-value pairs.

Key Operations:

    Operation    Time Complexity (Average)    Time Complexity (Worst)
    Insert       O(1)                         O(n)
    Search       O(1)                         O(n)
    Delete       O(1)                         O(n)
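
    The O(n) worst case arises when many keys land in the same bucket, so each operation has to walk one long chain. The sketch below makes this visible by comparing chain lengths for two sets of integer keys. The helper build_buckets is a hypothetical name, and the example relies on small integers hashing to themselves in CPython.

```python
def build_buckets(keys, size):
    """Distribute keys into size buckets using hash(key) % size (separate chaining)."""
    buckets = [[] for _ in range(size)]
    for key in keys:
        buckets[hash(key) % size].append(key)
    return buckets


size = 10

# Keys 0..9 spread evenly: one key per bucket
spread = build_buckets(range(10), size)

# Keys 0, 10, 20, ..., 90 all satisfy key % 10 == 0, so they share bucket 0
clustered = build_buckets(range(0, 100, 10), size)

print(max(len(bucket) for bucket in spread))     # 1
print(max(len(bucket) for bucket in clustered))  # 10
```

    In the clustered case a search may compare against all 10 keys, which is the linear behaviour shown in the worst-case column.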

Hashing Operations like Insert, Search, Delete 

    In a hash table, the key operations include Insert, Search, and Delete. 
    
    These operations are crucial for managing and retrieving key-value pairs efficiently. 

    Below is a breakdown of how these operations work in a hash table along with Python implementations for each operation.

1.Insert Operation

    The insert operation adds a key-value pair to the hash table. The key is hashed to determine the index in the hash table, and the value is stored at that index. If a collision occurs (i.e., two keys map to the same index), a collision resolution technique is used.
    
    Steps for Insert:
    1. Compute the hash value of the key using a hash function.
    2. If the slot is empty, insert the key-value pair at the computed index.
    3. If the slot is already occupied (collision), resolve it using a technique like chaining or open addressing (in this example, we’ll use chaining).

In [11]:
class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]  # Each slot is initialized to an empty list for chaining

    def hash_function(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        """
        Insert a key-value pair into the hash table.
        """
        index = self.hash_function(key)
        # Check if the key already exists in the list
        for pair in self.table[index]:
            if pair[0] == key:
                pair[1] = value  # If key exists, update the value
                return
        # If key does not exist, append a new pair
        self.table[index].append([key, value])
        print(f"Inserted ({key}: {value}) at index {index}")


2.Search Operation

    The search operation retrieves the value associated with a given key. The key is hashed to find the index in the table where it should be stored, and then we check if the key exists at that index.

    Steps for Search:
    1. Compute the hash value of the key using the hash function.
    2. Look for the key in the list stored at the computed index.
    3. If found, return the corresponding value; otherwise, return None.


    Python Implementation (a method of the HashTable class above):

    def search(self, key):
        """
        Search for a key in the hash table and return its value.
        """
        index = self.hash_function(key)
        # Look for the key in the list at the index
        for pair in self.table[index]:
            if pair[0] == key:
                return pair[1]  # Return the value associated with the key
        return None  # Key not found

3.Delete Operation

    The delete operation removes the key-value pair from the hash table. The key is hashed to find the index in the table, and then we remove the key-value pair if it exists.

    Steps for Delete:
    1. Compute the hash value of the key using the hash function.
    2. Look for the key in the list stored at the computed index.
    3. If found, remove the key-value pair; otherwise, report that the key was not found.

    Python Implementation (a method of the HashTable class above):

    def delete(self, key):
        """
        Delete a key-value pair from the hash table.
        """
        index = self.hash_function(key)
        # Look for the key in the list at the index
        for i, pair in enumerate(self.table[index]):
            if pair[0] == key:
                del self.table[index][i]  # Remove the key-value pair
                print(f"Deleted ({key}) from index {index}")
                return
        print(f"Key ({key}) not found")

Full Hash Table Implementation with All Operations

    Below is the full implementation of a hash table with insert, search, and delete operations using separate chaining for collision resolution:

In [12]:
class HashTable:
    def __init__(self, size=10):
        """
        Initializes the hash table with the given size.
        Each bucket in the table contains an empty list for separate chaining.
        """
        self.size = size
        self.table = [[] for _ in range(size)]  # Initialize each bucket as an empty list

    def hash_function(self, key):
        """
        Computes the hash index for a given key.
        :param key: The key to be hashed.
        :return: The index in the hash table.
        """
        return hash(key) % self.size

    def insert(self, key, value):
        """
        Inserts a key-value pair into the hash table.
        If the key already exists, updates its value.
        :param key: The key to insert.
        :param value: The value associated with the key.
        """
        index = self.hash_function(key)
        # Check if the key already exists and update the value
        for pair in self.table[index]:
            if pair[0] == key:
                pair[1] = value
                print(f"Updated ({key}: {value}) at index {index}")
                return
        # If key does not exist, append it to the list at the computed index
        self.table[index].append([key, value])
        print(f"Inserted ({key}: {value}) at index {index}")

    def search(self, key):
        """
        Searches for a key in the hash table and returns its value.
        :param key: The key to search for.
        :return: The value associated with the key, or None if the key is not found.
        """
        index = self.hash_function(key)
        # Look for the key in the list at the computed index
        for pair in self.table[index]:
            if pair[0] == key:
                return pair[1]  # Return the associated value
        return None  # Key not found

    def delete(self, key):
        """
        Deletes a key-value pair from the hash table.
        :param key: The key to delete.
        """
        index = self.hash_function(key)
        # Look for the key in the list at the computed index
        for i, pair in enumerate(self.table[index]):
            if pair[0] == key:
                del self.table[index][i]  # Remove the key-value pair
                print(f"Deleted ({key}) from index {index}")
                return
        print(f"Key ({key}) not found")

    def display(self):
        """
        Displays the contents of the hash table.
        """
        for i, bucket in enumerate(self.table):
            print(f"Index {i}: {bucket}")


# Example Usage
hash_table = HashTable(10)

# Insert key-value pairs
hash_table.insert("apple", 100)
hash_table.insert("banana", 200)
hash_table.insert("orange", 300)

# Insert more keys; whether they collide depends on hash(), which varies between runs for str keys
hash_table.insert("grapes", 400)
hash_table.insert("watermelon", 500)

# Display the hash table
hash_table.display()

# Search for keys
print("\nSearch for 'banana':", hash_table.search("banana"))
print("Search for 'grapes':", hash_table.search("grapes"))

# Delete a key
hash_table.delete("banana")
hash_table.display()

Inserted (apple: 100) at index 8
Inserted (banana: 200) at index 6
Inserted (orange: 300) at index 1
Inserted (grapes: 400) at index 3
Inserted (watermelon: 500) at index 5
Index 0: []
Index 1: [['orange', 300]]
Index 2: []
Index 3: [['grapes', 400]]
Index 4: []
Index 5: [['watermelon', 500]]
Index 6: [['banana', 200]]
Index 7: []
Index 8: [['apple', 100]]
Index 9: []

Search for 'banana': 200
Search for 'grapes': 400
Deleted (banana) from index 6
Index 0: []
Index 1: [['orange', 300]]
Index 2: []
Index 3: [['grapes', 400]]
Index 4: []
Index 5: [['watermelon', 500]]
Index 6: []
Index 7: []
Index 8: [['apple', 100]]
Index 9: []
