# Hash Functions

## What is a Hash Function?

A hash function takes a key (a value) in the range [0, N - 1] and maps it to a value in the range [0, m - 1].

* Typically m is less than N (m < N)

A key or value is a digital object. This could be a string, a photo of your grandma, the spec of a car, a book, or your address.

If two distinct objects or keys hash to the same value, they collide.

## Why do Hash Functions exist?

To simplify comparisons

To prove that a digital object has not been damaged or tampered with

To locate objects in a table

## Basics of keys

The simplest way to think of a key is as a list of values (EX. [9♡, 8♡, 1♡, 2♡, A♡]).

Back in the day, we would normally just add all of the values to get a key value to hash (EX. 9♡ + 8♡ + 1♡ + 2♡ + A♡). In this instance, imagine that 9♡ is 9 * 3, since hearts is the third-highest suit in cards.

This old way **is a bad idea** because it often gives small key values that cluster together. It also loses ordering information: the same values in a different order produce the same key value.
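As a rough illustration of the lost ordering, here is a minimal sketch. The integer encodings of the cards are hypothetical stand-ins, not the suit-weighted values described above.

```python
def additive_key_value(values):
    """Old approach: add all of the values together."""
    return sum(values)

# Hypothetical numeric encodings of a five-card hand (assumption).
hand = [9, 8, 1, 2, 13]
reordered = [13, 2, 1, 8, 9]  # same values, different order

print(additive_key_value(hand))
# True: both orderings give the same key value, so ordering is lost
print(additive_key_value(hand) == additive_key_value(reordered))
```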

Today we try to make key values that are large, specifically larger than m. Thus we might do something like this: (EX. 9♡ * 52<sup>0</sup> + 8♡ * 52<sup>1</sup> + 1♡ * 52<sup>2</sup> + 2♡ * 52<sup>3</sup> + A♡ * 52<sup>4</sup>)

We chose 52 because that's the number of cards in a deck. **This value 52 is called a radix** and it should be declared as a constant.

Treat keys as a sequence of values c<sub>i</sub>, where each c<sub>i</sub> is in the range 0...r-1 and r is the radix, and **compute the key value as the sum of r<sup>i</sup> multiplied by c<sub>i</sub> over every position i**:

$$ KV = \sum_{i} c_i \, r^i $$
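The summation can be sketched in a few lines of Python. The names `RADIX` and `key_value` are made up for this example, and the input values are arbitrary integers in the range 0...r-1.

```python
RADIX = 52  # number of cards in a deck, declared as a constant

def key_value(values, radix=RADIX):
    """Return the sum of c_i * radix**i over a sequence of values c_i."""
    total = 0
    for i, c in enumerate(values):
        if not 0 <= c < radix:
            raise ValueError(f"value {c} is outside 0...{radix - 1}")
        total += c * radix ** i
    return total

# Unlike plain addition, reordering the values changes the key value.
print(key_value([1, 2, 3]))
print(key_value([3, 2, 1]))
```

Because each position is weighted by a different power of the radix, two different sequences of in-range values with the same length never share a key value.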

## Hash Functions

Given a key **K** and the associated key value **KV**

Designing a hash function h() is tricky because it really depends on how we want to use it. The goal is to avoid collisions as far as possible when we distribute the key values over the range [0, m - 1].

* For comparisons, locating objects, and checking errors, we use one type of hash
* For cryptography and confirming data hasn't been tampered with, we use another type

### Default Hash Function (Comparisons, Locating, Errors)

The hash function h() is KV % m, and this spreads key values over 0,...,m - 1
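A minimal sketch of this modulo hash follows. The table size `M = 13` and the sample key values are arbitrary choices for illustration.

```python
def division_hash(key_value, m):
    """Map a key value into 0,...,m - 1 using the remainder after division."""
    return key_value % m

M = 13  # hypothetical table size (assumption)

for kv in [8217, 2811, 52, 104]:
    print(kv, "->", division_hash(kv, M))
```

Note that 52 and 104 are both multiples of 13, so they land in the same slot. Collisions like this are still possible whenever m is smaller than the range of key values.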

m should be a prime number, but sometimes you want m to be non-prime. When this happens, you often want m to be a power of 2.

In that case, the hash function uses multiplication instead:

$$ h(k) = \lfloor m \{k\theta\} \rfloor $$

* Theta is just some irrational number
* {x} denotes the fractional part of x
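Here is a hedged sketch of hashing by multiplication. It assumes the braces in the formula mean "fractional part" and the outer delimiters mean "round down", and it picks theta = (√5 - 1) / 2, a commonly used irrational constant. The notes do not prescribe a specific theta, so treat that choice as an assumption.

```python
import math

# Assumed choice of irrational constant; any irrational number fits the notes.
THETA = (math.sqrt(5) - 1) / 2

def multiplication_hash(key_value, m, theta=THETA):
    """Return floor(m * frac(key_value * theta)) for a non-negative key value."""
    fractional = (key_value * theta) % 1.0  # fractional part, in [0, 1)
    return math.floor(m * fractional)

M = 16  # a power of 2, where division by a prime is not an option

for kv in [0, 1, 2, 3, 4]:
    print(kv, "->", multiplication_hash(kv, M))
```

This version relies on floating-point arithmetic, so it loses precision for very large key values. Practical implementations usually use fixed-width integer multiplication instead.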

## Universal Hash Function

**If you know the size of the set of keys you will be hashing, you can use a universal hash function**

A universal hash function's goal is to give hash values that minimize collisions. Specifically, given two distinct keys k1 and k2 in a set of keys K and a hash function h() that maps to the values 0,...,m - 1, the chance that h(k1) == h(k2) is less than or equal to 1/m.

$$ h(x) = ((ax + b) \% N) \% m $$ 

* N in this equation is a prime number that is larger than any key in the set K
* x is a key
* a is a value from {1,...,N - 1}
* b is a value from {0,...,N - 1}

If a = 1 and b = 0 you get (x % N) % m

**Benefits of this function:**

* You can scale to 0,...,m - 1 for various values of m
* If you need to change the hash periodically, this function allows for that since you can just modify a and b
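The formula and the "just modify a and b" idea can be sketched as follows. The helper names, the prime `N = 101`, and the table size `m = 10` are assumptions for illustration. The sketch draws b from 0,...,N - 1 so that the b = 0 case mentioned above is possible.

```python
import random

def universal_hash(x, a, b, N, m):
    """Compute ((a*x + b) % N) % m for a key x."""
    return ((a * x + b) % N) % m

def make_universal_hash(N, m, rng=random):
    """Pick a and b at random and return the resulting hash function."""
    a = rng.randrange(1, N)  # a in {1,...,N - 1}
    b = rng.randrange(0, N)  # b in {0,...,N - 1} (assumption, allows b = 0)
    return lambda x: universal_hash(x, a, b, N, m)

N = 101  # hypothetical prime larger than every key we plan to hash
m = 10   # hypothetical table size

h1 = make_universal_hash(N, m)
h2 = make_universal_hash(N, m)  # "changing the hash" is just drawing a new a and b
print([h1(x) for x in range(5)])
print([h2(x) for x in range(5)])
```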

## Cryptographic Hash Functions

A cryptographic hash is designed to protect data from tampering.

Just like a regular hash, the function takes a message in the range [0, N - 1] and maps that value into a hash value in the range [0, m - 1].

**Warning:** terminology shift: because cryptographic hashes are typically used in secure communication, which uses the word “key” for inputs to an encryption algorithm, we will now refer to a “message” instead of a key.

The idea is that a cryptographic hash h() takes a message M and gives a hash value h<sub>m</sub> that can be used to confirm that M has not been changed (that is, if I send you M and h<sub>m</sub>, you can recompute h(M) and confirm it equals h<sub>m</sub>).
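The recompute-and-compare check might look like the sketch below. It uses SHA-256 from Python's standard `hashlib` module as the cryptographic hash. That choice is an assumption, since the notes do not name a specific algorithm.

```python
import hashlib
import hmac

def digest(message: bytes) -> str:
    """Return the SHA-256 hash of a message as a hex string."""
    return hashlib.sha256(message).hexdigest()

def verify(message: bytes, expected_digest: str) -> bool:
    """Recompute h(M) and confirm it equals the hash value that was sent."""
    return hmac.compare_digest(digest(message), expected_digest)

# Sender side: compute the hash value alongside the message.
M = b"meet at noon"
h_m = digest(M)

# Receiver side: an unchanged message verifies, a modified one does not.
print(verify(M, h_m))
print(verify(b"meet at nine", h_m))
```

A plain hash like this only detects changes if the hash value itself arrives through a channel the attacker cannot modify.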

### Benefits of a Cryptographic Hash

Given a message m1, it should be difficult to find a different message m2 such that h(m1) == h(m2). This is called second pre-image resistance, and it means I can’t cause a message you send to be replaced with another message that has the same hash.

It should be difficult to find any two different messages m1 and m2 that have the same hash value. This is called collision resistance. (The difference is that second pre-image resistance starts with a specific message, while collision resistance is more general.)

Cryptographic hashes typically involve dividing the input into chunks of 32, 64, 256, or 512 bits and then putting the chunks through multiple (often 64) rounds of XORs and bit rotations while mixing with each other and/or specially chosen constants.
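To give a feel for what "rounds of XORs and bit rotations" means, here is a toy sketch. It is not a real or secure hash. The function names, the rotation amount, and the constant are all arbitrary choices made for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits

def rotl32(x, r):
    """Rotate a 32-bit value left by r bits (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK32

def toy_mix(chunks, rounds=64, constant=0x9E3779B9):
    """Toy illustration only: NOT a secure hash."""
    state = 0
    for chunk in chunks:
        for _ in range(rounds):
            # XOR the chunk into the state, then rotate the bits
            state = rotl32(state ^ (chunk & MASK32), 5)
            # mix in a constant, wrapping around at 32 bits
            state = (state + constant) & MASK32
    return state

print(hex(toy_mix([0x68656C6C, 0x6F000000])))
```

Real designs use carefully analyzed round functions and constants. This sketch only shows the shape of the computation.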

## Overview of Lecture

Hashing takes a digital object, a key or message, in the range 0,...,N - 1 and reduces it to a value in the range 0,...,m - 1

* Good hash functions distribute hash values evenly across 0,...,m - 1, even if the key values are not evenly distributed in 0,...,N - 1

Division by a prime usually works well

If you need m to be non-prime (and especially a power of 2), then you can hash using multiplication, or use universal hashing

Cryptographic hashes are a specialized type of hash with properties that protect against tampering
