# Hashing

## Properties
- hashing is a method of indexing data
- hashing lets large amounts of data be indexed and looked up in average-case constant time, using hash values generated by hash functions
- the hash value of a key acts as the index of its value in an array or table
- hash functions map an input to a hash value deterministically; the result is not guaranteed to be unique, since different inputs can collide
- hash tables let us search in best / avg case constant time
    - worst case linear time when there are many collisions
    
- terminology:
    - hash function: function that maps data of arbitrary size to a value of fixed size
        - a good hash function distributes keys uniformly across the slots of a hash table
    - key: the input data supplied by the user
    - hash value: the value returned by a hash function for a key
        - many keys can share one hash value, so in general the key cannot be recovered from the hash value; only the hash value can be computed from the key
    - hash table: data structure which maps keys to values
    - collision: when two different keys passed to a hash function produce the same output
        - impossible to avoid in general due to the pigeonhole principle: there are more possible keys than hash values
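
The terminology above can be made concrete with a small sketch. The function `toy_hash` and the table size of 8 are assumptions chosen for illustration; this is not a hash function anyone should use in practice, only a deterministic one that makes collisions easy to see.

```python
def toy_hash(key: str, table_size: int) -> int:
    """Hypothetical toy hash: sum of character codes, reduced modulo the table size."""
    return sum(ord(ch) for ch in key) % table_size


table_size = 8  # assumed number of slots in the table
keys = ["ab", "ba", "cat", "dog"]
indices = {key: toy_hash(key, table_size) for key in keys}

# "ab" and "ba" contain the same characters, so this toy hash sends
# both keys to the same slot: a collision.
print(indices["ab"] == indices["ba"])  # True

# Pigeonhole principle: 9 distinct keys cannot occupy 9 distinct slots
# when only 8 slots exist, so at least two of them must collide.
many_keys = [f"key{i}" for i in range(table_size + 1)]
slots_used = {toy_hash(k, table_size) for k in many_keys}
print(len(slots_used) < len(many_keys))  # True
```

Summing character codes ignores character order, which is one reason real hash functions mix their input far more thoroughly.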

- collision resolution:
    - direct chaining: use when we have frequent deletions
        - hash table is implemented as an array of linked lists
        - each slot in the array holds a linked list and corresponds to one hash value (index)
        - if two keys collide, the new entry is appended to the linked list at that slot
        - once the avg chain length (the load factor: entries / slots) rises above a chosen threshold, the array is enlarged and all entries are rehashed into the new array
        - this helps prevent the avg case time complexity from approaching O(n)
    - open addressing: use when input size is known and fixed
        - linear probing
            - collision --> check the following cells one at a time (wrapping around) and place the key in the first empty one
        - quadratic probing
            - successive values of a quadratic polynomial in the attempt number are added to the original index until an empty cell is found
        - double hashing
            - pass the key to a second hash function and add multiples of the result to the original index until an empty cell is found
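
Both collision resolution strategies can be sketched in a few lines. Everything named here is hypothetical: `ChainedHashTable` uses Python lists in place of linked lists, `max_load=0.75` is an assumed resize threshold, and `probe_sequence` only lists the slots an open-addressing table would try, using the polynomial (i² + i) / 2 as one possible choice for quadratic probing.

```python
class ChainedHashTable:
    """Toy hash table with chaining; Python lists stand in for linked lists."""

    def __init__(self, capacity=4, max_load=0.75):
        self.capacity = capacity
        self.max_load = max_load  # assumed threshold for entries / slots
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        return hash(key) % self.capacity

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # a collision simply extends the chain
        self.size += 1
        if self.size / self.capacity > self.max_load:
            self._resize(2 * self.capacity)

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)

    def _resize(self, new_capacity):
        # Rehash every entry, because indices depend on the capacity.
        entries = [entry for bucket in self.buckets for entry in bucket]
        self.capacity = new_capacity
        self.buckets = [[] for _ in range(new_capacity)]
        for key, value in entries:
            self.buckets[self._index(key)].append((key, value))


def probe_sequence(start, capacity, strategy, step=1):
    """Return the slots an open-addressing table would try, in order."""
    slots = []
    for i in range(capacity):
        if strategy == "linear":
            offset = i
        elif strategy == "quadratic":
            offset = (i * i + i) // 2  # one possible quadratic polynomial
        elif strategy == "double":
            offset = i * step  # step plays the role of the second hash value
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        slots.append((start + offset) % capacity)
    return slots


table = ChainedHashTable()
for n in range(20):
    table.put(n, n * n)
print(table.get(7), table.capacity)              # 49 32

print(probe_sequence(3, 8, "linear")[:4])        # [3, 4, 5, 6]
print(probe_sequence(3, 8, "quadratic")[:4])     # [3, 4, 6, 1]
print(probe_sequence(3, 8, "double", step=3)[:4])  # [3, 6, 1, 4]
```

In double hashing the step should share no common factor with the capacity (here 3 and 8), otherwise the probe sequence revisits slots before it has covered the whole table.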

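- practical applications of hashing
    - security:
        - passwords are hashed before they are stored on servers, so the plain-text password is never kept
        - hashing is one-way, unlike encryption, which can be reversed with the right key
    - file systems
        - some storage systems locate or identify files on a hard disk or cloud drive by hash values
    - cryptocurrency and blockchain
        - sha-256 is a cryptographic hash function, used for example by Bitcoin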
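
As a rough illustration of the password use case, the sketch below stores a salted hash instead of the password itself. The function names, the 16-byte salt, and the iteration count are assumptions made for this example; `hashlib.pbkdf2_hmac`, `hmac.compare_digest`, and `os.urandom` are standard library calls.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); only these two values would be stored."""
    salt = os.urandom(16)  # random salt so equal passwords get different digests
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare the digests."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

The server never needs to reverse the digest: it only repeats the computation on each login attempt and compares the results.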

### In Python, hash tables are the underlying data structure of dictionaries and sets
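
A short sketch of what this means in everyday use; the variable names are made up for the example. Lookups and membership tests go through the key's hash, which is why keys must be hashable.

```python
# Dictionary: keys are hashed to find where each value is stored.
ages = {"alice": 30, "bob": 25}
ages["carol"] = 41
print(ages["bob"])     # 25
print("dave" in ages)  # False

# Set: the same mechanism, storing keys only, so duplicates disappear.
seen = {3, 1, 3, 2}
print(len(seen))  # 3

# Equal immutable objects have equal hashes.
print(hash((1, 2)) == hash((1, 2)))  # True

# Mutable objects such as lists are not hashable and cannot be keys.
try:
    {[1, 2]: "list as key"}
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```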


```python
import hashlib

hashlib.sha224(b"this text will be scrambled into a hash value").hexdigest()
# 'c49a3b6786db1e960aa0b9959fa7e6370265ebaec24d071a5953edb2'
```
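
A few properties of the digest can be checked directly. This is a small sketch using only `hashlib`; the variable names are chosen for the example.

```python
import hashlib

text = b"this text will be scrambled into a hash value"

first = hashlib.sha224(text).hexdigest()
second = hashlib.sha224(text).hexdigest()
changed = hashlib.sha224(text + b"!").hexdigest()

# Deterministic: the same input always gives the same hash value.
print(first == second)  # True

# Fixed size: SHA-224 yields 224 bits, which is 56 hex characters,
# no matter how long the input is.
print(len(first))                             # 56
print(len(hashlib.sha224(b"").hexdigest()))   # 56

# A small change in the input gives a different digest.
print(first == changed)  # False
```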