# Lab 2: Entropy and Information Theory

Welcome to your second lab! In this lab we will cover entropy and information theory. It will help solidify your understanding of entropy, surprisal, and related concepts. We highly recommend finishing Lab 1 first, because information theory builds on probability theory. Be sure to have the `numpy` package installed; it will assist with some of the computations in this and future labs.

### Grading Breakdown:
- Problem 1 - 12 Points
- Problem 2 - 12 Points
- Problem 3 - 12 Points
- Problem 4 - 4 Points

Total: 40 Points

In [1]:
# Packages used in this lab
import numpy as np
import random

### Problem 1: Surprise!

Surprisal is a term that tells you how surprised you should be when observing a specific value of a variable, and as such is inversely related to probability.

In layman's terms, if a specific outcome in a sample space has a particularly low probability, the surprise should be high! For example, the probability of winning the lottery is very low, so if a person randomly won the lottery, they would be very surprised.

Surprisal is a way to quantify how "surprised" a person should be when observing an event: the lower the probability, the higher the surprisal. We use $\log_2$ to calculate surprisal because bits are a common unit for measuring information.

It is given by the following equation:

$S(X=x) = -\log_2(p(x))$

where $p(x)$ is the probability that the random variable $X$ is equal to $x$.
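
As a quick sanity check of the formula, the sketch below evaluates it for a few familiar events. The helper name `surprisal_bits` and the example events are illustrative assumptions, not part of the lab's graded code.

```python
import math

def surprisal_bits(p):
    """Surprisal in bits of an event with probability p (0 < p <= 1)."""
    # log2(1 / p) is the same as -log2(p); written this way so that a
    # certain event (p = 1) yields 0.0 rather than -0.0.
    return math.log2(1 / p)

# Hypothetical example events and their probabilities
events = {
    "fair coin lands heads": 1 / 2,
    "fair die shows a six": 1 / 6,
    "certain event": 1.0,
}

for name, p in events.items():
    print(f"{name}: p = {p:.4f}, surprisal = {surprisal_bits(p):.3f} bits")
```

The rarer the event, the larger the value, and an event that is certain to happen carries no surprise at all.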

Your task: Complete the function for calculating surprisal.

In [2]:
def calculate_surprisal(p_x):
    surprisal = -np.log2(p_x)
    return surprisal

In [6]:
# Grading Code: Do not edit this cell.
assert calculate_surprisal(.125) == 3.0
assert calculate_surprisal(.25) == 2.0
assert calculate_surprisal(.5) == 1.0
print("All Tests Passed!")

All Tests Passed!


Pedro's friend Frank the Lizard is trying to catch some crickets for dinner. The probability of catching a cricket during the day is 10%. How surprised would Frank be if he caught one? How surprised would Frank be if he didn't catch a cricket?

In [7]:
cricket_surprise = calculate_surprisal(.10)
no_cricket_surprise = calculate_surprisal(.90)

In [10]:
assert round(cricket_surprise, 2) == 3.32
assert round(no_cricket_surprise, 2) == 0.15
print("All Tests Passed!")

### Problem 2: Entropy

Before we get into the concept of entropy, we need to cover what an expected value is. The expected value of a random variable is given by the following formula:

$E[X] = \sum_{i=0}^{n} p(x_i)x_i$

This equation states that the expected value of a random variable is the sum of each outcome multiplied by its probability, in other words a probability-weighted average. You'll see why this is important in the next part of the problem.

Your task: Fill in the function for calculating the expected value.

Note: You can get an extra credit point if you don't use any for loops and only use numpy functions. This is called vectorization, and it can really speed up your Python code.
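
To illustrate what vectorization means here, the sketch below computes the same expected value twice, once with a loop and once with numpy operations only. The fair six-sided die and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical example: a fair six-sided die
outcomes = np.arange(1, 7)      # faces 1, 2, ..., 6
probs = np.full(6, 1 / 6)       # each face is equally likely

# Loop version: accumulate p(x_i) * x_i one term at a time
loop_result = 0.0
for x_i, p_i in zip(outcomes, probs):
    loop_result += p_i * x_i

# Vectorized version: elementwise product, then a single sum
vectorized_result = np.sum(outcomes * probs)

print(loop_result, vectorized_result)
```

Both versions agree up to floating-point rounding; the vectorized one pushes the loop into numpy's compiled code, which is where the speedup on large arrays comes from.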

In [12]:
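def calculate_expected_value(x, p_x):
    # Convert to arrays so plain Python lists are accepted as well
    x, p_x = np.asarray(x), np.asarray(p_x)
    # Vectorized sum of p(x_i) * x_i
    expected_value = (x * p_x).sum()
    return expected_value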

In [21]:
# Grading Code, do not edit this cell.
x = np.array([10,20,25,30])
p_x = np.array([0.1,0.2,0.3,0.4])

assert calculate_expected_value(x,p_x) == 24.5
print("All tests passed!")

All tests passed!


Now that you know what an expected value is, we can introduce entropy. Entropy is the expected surprisal of a random variable. We can take the equation for surprisal and substitute it in for $x_i$ in the expected value equation, giving us the following formula:

$H(X) = -\sum_{i=0}^n p(x_i) \cdot \log_2 p(x_i)$
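
To make the substitution explicit, here is the same step written out term by term, followed by a fair coin as a worked case. The coin is an illustrative example and not part of the lab's data.

$H(X) = E[S(X)] = \sum_{i=0}^n p(x_i) \cdot S(X = x_i) = -\sum_{i=0}^n p(x_i) \cdot \log_2 p(x_i)$

$H(\text{fair coin}) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1 \text{ bit}$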

Your task: Fill out the equation to calculate the entropy.
Note: If you can do this without for loops, you can get an extra credit point!

In [23]:
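def calculate_entropy(p_x):
    # Convert to an array so plain Python lists are accepted as well
    p_x = np.asarray(p_x, dtype=float)
    # Expected surprisal; assumes every probability is strictly positive
    entropy = -(p_x * np.log2(p_x)).sum()
    return entropy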

In [28]:
# Grading Code, do not edit this cell.
probabilities = np.array([0.05 for i in range(20)])
assert round(calculate_entropy(probabilities),2) == 4.32

probabilities = [0.5, 0.25, 0.125, 0.0625,0.0625]
assert round(calculate_entropy(probabilities),2) == 1.88
print("All tests passed!")

All tests passed!


Frank has a large variety of insects that he likes to catch, and he wants to know how surprised he would be, on average, when he catches one. The probabilities of catching each insect type are: cricket, 50%; grasshopper, 20%; roly-poly, 25%; spider, 5%.

Your task: Calculate the expected surprise of catching an insect using your entropy function.

In [31]:
insect_probs = [.5, .2, .25, .05]
insect_entropy = calculate_entropy(insect_probs)
insect_entropy

1.6804820237218405


In [32]:
# Grading Code, do not edit this cell.
assert round(insect_entropy, 2) == 1.68
print("All tests passed!")

All tests passed!


Questions to Answer:
- What would happen to the entropy if the random variable is uniform?
- What would happen to the entropy if the random variable is skewed?

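If the random variable is uniform, the entropy is at its maximum, $\log_2 n$ bits for $n$ outcomes, because every outcome is equally likely and the uncertainty is as high as it can be.

If the random variable is skewed, the entropy is lower, because a few outcomes dominate and the result is more predictable.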
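
The comparison can be checked numerically. The sketch below is a minimal illustration under assumed example distributions; `entropy_bits` is a hypothetical helper that also applies the usual convention that zero-probability outcomes contribute nothing to the sum.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability outcomes contribute 0."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]  # drop zeros to avoid evaluating log2(0)
    # sum of p * log2(1 / p), which equals -sum of p * log2(p)
    return float(np.sum(nonzero * np.log2(1 / nonzero)))

# Hypothetical distributions over four outcomes
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.85, 0.05, 0.05, 0.05]
certain = [1.0, 0.0, 0.0, 0.0]

for name, dist in [("uniform", uniform), ("skewed", skewed), ("certain", certain)]:
    print(f"{name}: {entropy_bits(dist):.3f} bits")
```

The uniform case comes out to 2 bits ($\log_2 4$), the skewed case is smaller, and a distribution with a single certain outcome has zero entropy.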

### Problem 3: We all live in a Yellow Submarine

Pedro the Raccoon is waging war against his friend Frank! Someone stole his bag of marbles and he thinks it was Frank. He's decided to do all of his planning and attacking from a submarine. Please help Frank find Pedro in the Gulf of Mexico before Pedro does any more damage to the coastal cities!

Submarine is a classic game used to illustrate information theory. We recommend watching this [video](https://www.youtube.com/watch?v=bkLHszLlH34) before starting this problem.

Your task: Fill out the following class functions, which will allow the user to play the submarine game.

- information_correct(remaining_guesses: int) -> float: Calculate the information gained from a correct guess.
- information_incorrect(remaining_guesses: int) -> float: Calculate the information gained from an incorrect guess.
- play() -> None: Fill out the play loop with the following requirements:
    - Prompt the user to input an x and y location.
    - If they found the submarine, confirm they found the submarine.
    - If they did not find the submarine, prompt them to guess again.
    - After every guess:
        - Log their guess.
        - Display the board.
        - Display how much information they gained from the guess.
        - Display the total information they've gathered about the submarine.

In [40]:
# This is just one way that the Submarine Game can be coded up.
# As long as they can play the game, and it gives a really small number
# for incorrect guesses, give them points.

class SubmarineGame():
    def __init__(self, grid_size):
        # Randomly place the submarine (Pedro's location) on the grid
        self.grid_size = grid_size
        self.sub_coords = (random.randint(0,grid_size-1), random.randint(0,grid_size-1))
        self.found = False
        self.display_guesses = np.zeros((grid_size, grid_size))
    
    def information_correct(self, remaining_guesses):
        return np.log2(remaining_guesses)

    def information_incorrect(self, remaining_guesses):
        return np.log2(remaining_guesses / (remaining_guesses - 1))
    
    def play(self):
        total_cells = self.grid_size ** 2
        guesses = set()
        information = 0
        while not self.found:
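            print(f"Please enter an integer 1-{self.grid_size}")
            x = int(input("Enter an x location:")) - 1
            y = int(input("Enter a y location:")) - 1

            # x and y are zero-indexed here, so valid values are 0..grid_size-1
            if not (0 <= x < self.grid_size and 0 <= y < self.grid_size):
                print(f"Please enter valid values (1 <= x, y <= {self.grid_size})")
                continue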

            if (x,y) in guesses:
                print("You've already guessed this, please choose a separate cell")
                continue

            remaining_guesses = total_cells - len(guesses)

            if (x, y) == self.sub_coords:
                self.found = True
                new_information = self.information_correct(remaining_guesses)
                self.display_guesses[x, y] = -1
                print("Congrats! You found the submarine!")
            else:
                self.display_guesses[x, y] = 1
                new_information = self.information_incorrect(remaining_guesses)
                print("You have not found the submarine, please guess again")
            
            information += new_information
            print(f"Information Gained: {new_information}")
            print(f"Total Information Gathered: {information}")
            self.print_board()
            guesses.add((x,y))


    def print_board(self):
        m, n = self.display_guesses.shape
        # Column headers are 1-indexed to match the user's input
        columns = [str(j+1) for j in range(n)]

        print(f"  {' '.join(columns)}")
        
        for i in range(m):
            row = [str(int(self.display_guesses[i,j])) for j in range(n)]
            print(f"{i+1} {' '.join(row)}")

In [41]:
# Play the game
game = SubmarineGame(6)
game.play()

Please enter an integer 1-6
You have not found the submarine, please guess again
Information Gained: 0.04064198449734577
Total Information Gathered: 0.04064198449734577
  1 2 3 4 5 6
1 1 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
6 0 0 0 0 0 0
Please enter an integer 1-6
You have not found the submarine, please guess again
Information Gained: 0.0418201756946269
Total Information Gathered: 0.08246216019197267
  1 2 3 4 5 6
1 1 1 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
6 0 0 0 0 0 0
Please enter an integer 1-6
You have not found the submarine, please guess again
Information Gained: 0.04306872189188593
Total Information Gathered: 0.1255308820838586
  1 2 3 4 5 6
1 1 1 1 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
6 0 0 0 0 0 0
Please enter an integer 1-6
Congrats! You found the submarine!
Information Gained: 5.044394119358453
Total Information Gathered: 5.169925001442312
  1 2 3 4 5 6
1 1 1 1 -1 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
6 0 0 0 0 0 0

Questions to Answer:
- How much information is gained by each incorrect guess?
- How is that different from the amount of information gained from the correct guess?

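Each incorrect guess gains only a very small amount of information: $\log_2\frac{n}{n-1}$ bits when $n$ cells remain, which is about 0.04 bits for the first guess on a 6x6 grid.
The correct guess gains a very large amount: $\log_2 n$ bits, about 5.04 bits in the run above. However many guesses it takes, the total always comes to $\log_2 36 \approx 5.17$ bits.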
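
The fact that the total does not depend on how many guesses it takes can be verified with a short simulation of the bookkeeping. This is a sketch under the same formulas used in the class above; `total_information` is a hypothetical helper, not part of the graded solution.

```python
import numpy as np

def total_information(total_cells, misses):
    """Bits gathered after `misses` incorrect guesses followed by one hit.

    Assumes 0 <= misses <= total_cells - 1.
    """
    bits = 0.0
    remaining = total_cells
    for _ in range(misses):
        # An incorrect guess rules out one of the remaining cells
        bits += np.log2(remaining / (remaining - 1))
        remaining -= 1
    # The correct guess picks the submarine out of the remaining cells
    bits += np.log2(remaining)
    return bits

for misses in (0, 3, 35):
    print(misses, total_information(36, misses))
```

The per-guess terms telescope, so every path ends at $\log_2 36$ bits: many small gains followed by a slightly smaller large one, or one large gain straight away.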

### Problem 4: Write Up

Your task: Answer the following questions
1. What were some things you learned from this lab?
2. What did you like about this lab?
3. What would you improve?

*Enter Your Answer Here*