# Integer Overflow and Hash Codes

## Major Problem: Integer Overflow

In Java, the largest possible `int` is $2,147,483,647$ (that is, $2^{31} - 1$).
* If we go over this limit, an overflow occurs: the value wraps around to the smallest `int`, $-2,147,483,648$
* In other words, the next number after `2,147,483,647` is `-2,147,483,648`

In [None]:
int x = 2147483647;
System.out.println(x);
System.out.println(x+1);

// Output:
// 2147483647
// -2147483648
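
The wraparound can also be checked programmatically. The following is a minimal sketch: the class name `OverflowDemo` and both helper methods are made up for illustration, while `Math.addExact` is the standard library method that throws an `ArithmeticException` instead of wrapping.

```java
public class OverflowDemo {
    /** Plain int addition: silently wraps around on overflow. */
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    /** Returns true if a + b does not fit in an int. */
    static boolean additionOverflows(int a, int b) {
        try {
            Math.addExact(a, b); // throws ArithmeticException on overflow
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The largest int plus one wraps to the smallest int.
        System.out.println(wrappingAdd(Integer.MAX_VALUE, 1) == Integer.MIN_VALUE); // true
        System.out.println(additionOverflows(Integer.MAX_VALUE, 1)); // true
        System.out.println(additionOverflows(1, 2)); // false
    }
}
```

Ordinary `+` never reports the overflow, which is why the problem is easy to miss.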

## Consequence of Overflow: Collisions

With base 126, we will run into overflow even for short strings.

* For example, $omens_{126} = 28,196,917,171$, which is larger than the biggest `int`
* `asciiToInt("omens")` = -1,867,853,901
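
The two numbers above can be reproduced with a short sketch. The body of `asciiToInt` is not shown in this section, so the version below is an assumption: it treats each character code as a digit in base 126 and accumulates into an `int`. The helper `asciiToLong` and the class name are hypothetical, added only to show the exact value before wraparound.

```java
public class AsciiToIntDemo {
    /** Assumed implementation: base-126 value of s, accumulated in an int (wraps on overflow). */
    static int asciiToInt(String s) {
        int intRep = 0;
        for (int i = 0; i < s.length(); i += 1) {
            intRep = intRep * 126 + s.charAt(i);
        }
        return intRep;
    }

    /** Same computation in a long, which is wide enough for a 5-character string. */
    static long asciiToLong(String s) {
        long longRep = 0;
        for (int i = 0; i < s.length(); i += 1) {
            longRep = longRep * 126 + s.charAt(i);
        }
        return longRep;
    }

    public static void main(String[] args) {
        System.out.println(asciiToLong("omens")); // 28196917171
        System.out.println(asciiToInt("omens"));  // -1867853901
        // Keeping only the low 32 bits of the exact value gives the wrapped result.
        System.out.println((int) asciiToLong("omens") == asciiToInt("omens")); // true
    }
}
```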

Overflow can result in **collisions**: two different strings can end up with the same integer.

In [None]:
public void moo(){
    DataIndexedStringSet disi = new DataIndexedStringSet();
    disi.add("melt banana");
    disi.contains("subterrestrial anticosmetic"); // returns true!
    // asciiToInt for both of the strings above is 839099497
}
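
To see why the set reports a string that was never added, here is a minimal sketch of a set that remembers only each string's integer representation. It is not the actual `DataIndexedStringSet`: the class `SketchStringSet`, its backing `HashSet<Integer>`, and the `asciiToInt` body are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class SketchStringSet {
    // Only the int representation is stored, so two strings that
    // map to the same int are indistinguishable.
    private final Set<Integer> present = new HashSet<>();

    /** Assumed implementation: base-126 value of s, wrapping on overflow. */
    static int asciiToInt(String s) {
        int intRep = 0;
        for (int i = 0; i < s.length(); i += 1) {
            intRep = intRep * 126 + s.charAt(i);
        }
        return intRep;
    }

    public void add(String s) {
        present.add(asciiToInt(s));
    }

    public boolean contains(String s) {
        return present.contains(asciiToInt(s));
    }

    public static void main(String[] args) {
        SketchStringSet set = new SketchStringSet();
        set.add("melt banana");
        System.out.println(asciiToInt("melt banana"));                  // 839099497
        System.out.println(asciiToInt("subterrestrial anticosmetic"));  // 839099497
        System.out.println(set.contains("subterrestrial anticosmetic")); // true
    }
}
```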

## Hash Codes and the Pigeonhole Principle

The official term for the number we're computing is `hash code`.
* Wolfram Alpha definition: a hash code "projects a value
    * from a set with many (or even an infinite number of) members
    * to a value from a set with a fixed number of (fewer) members."
* In our case, the target set is the set of Java integers, which has size 4294967296 ($2^{32}$)

The `pigeonhole principle` tells us that if there are more than 4294967296 possible items, at least two of them must share the same hash code.
* There are more than 4294967296 planets
    * Each has `mass`, `xPos`, `yPos`, `xVel`, `yVel`, `imgName`
* There are more than 4294967296 strings
    * "one", "two", ..., "nineteen quadrillion", ...
    
**Bottom line: Collisions are inevitable**
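
The counting argument can be made concrete with a small sketch. The class and method names are hypothetical, and the choice of lowercase-only strings of length 7 is just one convenient example of a set that is already too large.

```java
public class PigeonholeDemo {
    /** Number of distinct Java int values, computed in a long to avoid overflow. */
    static long countJavaInts() {
        return (long) Integer.MAX_VALUE - (long) Integer.MIN_VALUE + 1;
    }

    /** Number of strings of exactly the given length over the 26 lowercase letters. */
    static long countLowercaseStrings(int length) {
        long count = 1;
        for (int i = 0; i < length; i += 1) {
            count *= 26;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countJavaInts());          // 4294967296
        System.out.println(countLowercaseStrings(7)); // 8031810176
        // More strings than ints: some two of them must share a hash code.
        System.out.println(countLowercaseStrings(7) > countJavaInts()); // true
    }
}
```

Even this restricted set of 7-letter strings has more members than there are `int` values, so no integer-valued hash function can keep them all apart.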

## 2 Fundamental Challenges

How do we resolve hashCode collisions ("melt banana" vs. "subterrestrial anticosmetic")?
* We call this **collision handling**.

How do we compute a hash code for arbitrary objects?
* We'll call this **computing a hashCode**
    * Example: our hashCode for "melt banana" was 839099497
* For Strings, this was relatively straightforward (treat as a base 27 or base 126 or base 40959 number).
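
The "treat the string as a number in some base" idea can be sketched with the base as a parameter. Everything below is an illustrative assumption rather than code from these notes: `charsToInt` uses each character's numeric code as its digit (a natural fit for base 126), while `lettersToInt` assumes the base 27 scheme maps `a` to 1 through `z` to 26.

```java
public class BaseHashDemo {
    /** Reads s as a number in the given base, using each char code as a digit. */
    static int charsToInt(String s, int base) {
        int result = 0;
        for (int i = 0; i < s.length(); i += 1) {
            result = result * base + s.charAt(i); // wraps on overflow
        }
        return result;
    }

    /** Reads a lowercase word as a base-27 number with a = 1, ..., z = 26 (assumed mapping). */
    static int lettersToInt(String s) {
        int result = 0;
        for (int i = 0; i < s.length(); i += 1) {
            int digit = s.charAt(i) - 'a' + 1;
            result = result * 27 + digit;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lettersToInt("cat"));            // 2234 = 3*27^2 + 1*27 + 20
        System.out.println(charsToInt("melt banana", 126)); // 839099497
    }
}
```

A larger base covers more characters, but it also makes overflow set in for even shorter strings.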