# Information, Order, and Randomness Activity
In this notebook, we'll explore how entropy and randomness reveal patterns (or the lack of them) in text and data.
Specifically, we will:
- Calculate the entropy of a text sequence
- Simulate and visualize a random walk

## Entropy of Text

As discussed, entropy can be used as a measure of the information in our data. Here we simplify things by treating each character as a symbol and estimating its probability from how often it appears in the text.

In [1]:
from collections import Counter
import numpy as np

To compute the entropy, we will use Shannon's formula:
$$
H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i)
$$

- Measured in **bits** (if log base 2)
- Higher entropy = more unpredictability
- Lower entropy = more structure/predictability
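As a worked example, consider the string `abracadabra` (11 characters), where `a` appears 5 times, `b` and `r` twice each, and `c` and `d` once each. Writing each term $-P(x_i)\log_2 P(x_i)$ as $P(x_i)\log_2 \frac{1}{P(x_i)}$ and grouping equal terms:

$$
H = \frac{5}{11}\log_2\frac{11}{5} + 2\cdot\frac{2}{11}\log_2\frac{11}{2} + 2\cdot\frac{1}{11}\log_2 11 \approx 0.517 + 0.894 + 0.629 \approx 2.04 \text{ bits}
$$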

In [2]:
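def shannon_entropy(text):
    # Count how often each character appears
    freqs = Counter(text)
    total = len(text)
    # Convert counts to probabilities P(x_i)
    probs = [f / total for f in freqs.values()]
    # H(X) = -sum P(x_i) log2 P(x_i), in bits; a string of one repeated
    # character gives -0.0 (floating-point negative zero), which equals 0
    return -sum(p * np.log2(p) for p in probs)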

Let's try it on different sequences:

In [3]:
text1 = "aaaaaa"
text2 = "abcdef"
text3 = "abracadabra"

print("Entropy of text1:", shannon_entropy(text1))
print("Entropy of text2:", shannon_entropy(text2))
print("Entropy of text3:", shannon_entropy(text3))

Entropy of text1: -0.0
Entropy of text2: 2.584962500721156
Entropy of text3: 2.0403733936884962


#### Questions:
1. How does the entropy change between the text samples?
2. Where do you think this could be applied?
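One way to check your answer to the first question: for a string with $k$ distinct characters, entropy is at most $\log_2 k$ bits, reached only when every character appears equally often. The sketch below is a stdlib-only variant of `shannon_entropy` (using `math.log2` instead of NumPy, and summing $P \log_2(1/P)$ so a single repeated character gives `0.0` rather than `-0.0`) that prints each sample's entropy next to that bound.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Entropy in bits per character, estimated from character frequencies."""
    total = len(text)
    counts = Counter(text)
    # sum of P * log2(1/P) equals -sum of P * log2(P), without producing -0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())

for text in ["aaaaaa", "abcdef", "abracadabra"]:
    k = len(set(text))  # number of distinct characters
    h = shannon_entropy(text)
    print(f"{text!r}: H = {h:.3f} bits, upper bound log2({k}) = {math.log2(k):.3f}")
```

`abcdef` sits exactly at its bound because all six characters are equally frequent, while `abracadabra` falls below its bound of $\log_2 5$ because `a` dominates.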

#### Compression as a Proxy for Structure

Recall that entropy is a measure of information. We can check this intuition by comparing the compressed sizes of the text samples.

In [None]:
import zlib

def compressed_size(text):
    return len(zlib.compress(text.encode()))

print("Compressed size of text1:", compressed_size(text1))
print("Compressed size of text2:", compressed_size(text2))
print("Compressed size of text3:", compressed_size(text3))

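In general, the more information a text carries (roughly, its entropy per character times its length), the larger its compressed size will be. For strings as short as these, zlib's fixed overhead (a 2-byte header and a 4-byte Adler-32 checksum) makes up a large share of each result, so the differences between the samples are small.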
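To see the relationship more clearly than six- to eleven-character samples allow, the sketch below builds three longer synthetic strings (invented here for illustration, not part of the original activity) and compares zlib's output with $n \cdot H / 8$, a rough entropy-based estimate of the size in bytes for text whose characters are drawn independently. It redefines stdlib-only versions of `shannon_entropy` and `compressed_size` so it runs on its own.

```python
import math
import random
import zlib
from collections import Counter

def shannon_entropy(text):
    """Entropy in bits per character, estimated from character frequencies."""
    total = len(text)
    counts = Counter(text)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def compressed_size(text):
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode()))

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 10_000
samples = {
    "one letter": "a" * n,
    "two letters": "".join(rng.choice("ab") for _ in range(n)),
    "26 letters": "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(n)),
}

for name, text in samples.items():
    h = shannon_entropy(text)
    estimate = n * h / 8  # entropy-based size estimate in bytes
    print(f"{name:>11}: H = {h:.3f} bits/char, n*H/8 = {estimate:8.1f} B, "
          f"zlib = {compressed_size(text)} B")
```

The zlib sizes should follow the same order as the entropies, though they will not match the estimate exactly, since zlib adds headers and its codes are not perfectly efficient.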

How does this affect the transmission of information?

Entropy can also be "localized": instead of measuring it over the whole text, we can measure it over a sliding window, and the result depends on the window we choose. Can you think of a use case where a system's entropy would vary from one window to another?
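A minimal sliding-window sketch: `windowed_entropy` is a hypothetical helper that applies a stdlib-only `shannon_entropy` to each slice of the text. The input string, window size of 8, and step of 4 are arbitrary choices for illustration.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Entropy in bits per character, estimated from character frequencies."""
    total = len(text)
    counts = Counter(text)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def windowed_entropy(text, window, step=1):
    """Entropy of each length-`window` slice of `text`, sliding by `step` characters.

    Returns a list of (start_index, entropy) pairs.
    """
    return [(i, shannon_entropy(text[i:i + window]))
            for i in range(0, len(text) - window + 1, step)]

# Illustrative input: a repetitive stretch followed by ordinary English
signal = "a" * 20 + "the quick brown fox"
for start, h in windowed_entropy(signal, window=8, step=4):
    print(f"chars {start:2d}-{start + 7:2d}: {h:.3f} bits")
```

Windows inside the run of `a`s have zero entropy, and the value rises once the window reaches the varied text.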

## The Random Walk

In this section, we will simulate a one-dimensional random walk: at each time step, we move either up or down by one step with equal probability. We will then look at the long-term behavior of the system.

In [None]:
import matplotlib.pyplot as plt
import random

Let's first simulate one random walker:

In [None]:
steps = 1000
x = [0]
for _ in range(steps):
    move = random.choice([-1, 1])
    x.append(x[-1] + move)

plt.figure(figsize=(10, 4))
plt.plot(x)
plt.title("1D Random Walk")
plt.xlabel("Time")
plt.ylabel("Position")
plt.grid(True)
plt.show()

Try running the code multiple times and see how the pattern changes across runs.
Specifically, look at:
1) The maximum position reached
2) The final direction (whether the walk ends at a positive or negative position)
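Instead of re-running the cell by hand, a sketch like the one below records both quantities for several runs. `walk_summary` is a hypothetical helper; giving each run its own `random.Random(seed)` makes the results reproducible, and the seeds 0 to 4 are arbitrary.

```python
import random

def walk_summary(steps, seed):
    """Simulate one +/-1 random walk from 0; return (max position, final position)."""
    rng = random.Random(seed)  # separate seeded generator per run
    position, highest = 0, 0   # the starting position 0 counts toward the maximum
    for _ in range(steps):
        position += rng.choice([-1, 1])
        highest = max(highest, position)
    return highest, position

for seed in range(5):
    highest, final = walk_summary(1000, seed)
    sign = "positive" if final > 0 else "negative" if final < 0 else "zero"
    print(f"run {seed}: max position = {highest:4d}, final position = {final:4d} ({sign})")
```

With an even number of steps the final position is always even, so it can land exactly on zero.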

What do you think the long-term behavior of this system is? To find out, let's implement a Monte Carlo simulation: we run many independent walks and study their statistics at each time step.

In [None]:
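num_walks = 500
steps = 1000
# Each row is one walk; column t holds its position at time step t (all start at 0)
walks = np.zeros((num_walks, steps))

for i in range(num_walks):
    position = 0
    for t in range(1, steps):
        # Step up or down by one with equal probability
        move = np.random.choice([-1, 1])
        position += move
        walks[i, t] = position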

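The nested loop above calls `np.random.choice` once per step, which gets slow as the number of walks grows. A vectorized sketch draws all moves at once and takes a cumulative sum, producing an array with the same layout as `walks` (row = walk, column = time step, starting at 0). It uses NumPy's `default_rng` with an arbitrary seed of 42; the name `walks_vec` is chosen here to avoid overwriting `walks`.

```python
import numpy as np

num_walks = 500
steps = 1000
rng = np.random.default_rng(42)  # seeded generator for reproducibility

# Draw all moves at once: one row per walk, steps - 1 moves per row
moves = rng.choice([-1, 1], size=(num_walks, steps - 1))

# The cumulative sum of moves gives the positions; column 0 stays at the start (0)
walks_vec = np.zeros((num_walks, steps))
walks_vec[:, 1:] = np.cumsum(moves, axis=1)
```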
Now let's look at two graphs: the standard deviation and the mean of the position over time, both computed across all walks at each time step:

In [None]:
std_dev = np.std(walks, axis=0)

plt.figure(figsize=(10, 4))
plt.plot(std_dev, label='Standard Deviation')
plt.title("Standard Deviation of Position Over Time Across Random Walks")
plt.xlabel("Time Step")
plt.ylabel("Standard Deviation")
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()

In [None]:
mean_displacement = np.mean(walks, axis=0)

plt.figure(figsize=(10, 4))
plt.plot(mean_displacement, label='Mean Displacement', color='orange')
plt.axhline(0, color='black', linestyle='--', linewidth=1)
plt.title("Mean Displacement Over Time Across Random Walks")
plt.xlabel("Time Step")
plt.ylabel("Mean Displacement")
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()

How would you interpret the two plots?
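One way to check your interpretation against theory: each position $X_t$ is a sum of $t$ independent steps $s_k \in \{-1, +1\}$ with $E[s_k] = 0$ and $\mathrm{Var}(s_k) = 1$, so $E[X_t] = 0$ and $\mathrm{Var}(X_t) = t$. The mean should therefore stay near zero while the standard deviation grows like $\sqrt{t}$. The sketch below uses a fresh, seeded, vectorized simulation (named `sim` so it does not overwrite `walks`) and prints both statistics next to $\sqrt{t}$ at a few arbitrary time steps.

```python
import numpy as np

num_walks, steps = 500, 1000
rng = np.random.default_rng(0)  # seeded so the comparison is reproducible

# Same layout as `walks`: row = walk, column = time step, positions start at 0
sim = np.zeros((num_walks, steps))
sim[:, 1:] = np.cumsum(rng.choice([-1, 1], size=(num_walks, steps - 1)), axis=1)

sim_mean = sim.mean(axis=0)  # mean position across walks at each time step
sim_std = sim.std(axis=0)    # spread of positions across walks at each time step

for t in [10, 100, 500, 999]:
    print(f"t = {t:3d}: mean = {sim_mean[t]:6.2f}, "
          f"std = {sim_std[t]:6.2f}, sqrt(t) = {np.sqrt(t):6.2f}")
```

To see the same comparison on the standard deviation plot above, one could add `plt.plot(np.sqrt(np.arange(steps)), '--', label='sqrt(t)')` before `plt.legend()`.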

How do you think this would translate to a two-dimensional random walker?
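One way to explore this question is to simulate it. The sketch below assumes the simplest lattice version (one of several possible 2D generalizations): each step moves one unit left, right, down, or up with equal probability. Since a 2D position has no single standard deviation, it tracks the mean squared distance from the origin, the 2D counterpart of the variance. The seed and sampled time steps are arbitrary.

```python
import numpy as np

num_walks, steps = 500, 1000
rng = np.random.default_rng(1)  # seeded for reproducibility

# Assumed 2D rule: each step is one of four unit moves, chosen uniformly
directions = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
choices = rng.integers(0, 4, size=(num_walks, steps - 1))
moves = directions[choices]  # shape (num_walks, steps - 1, 2)

# Positions over time, starting at the origin (0, 0)
paths = np.zeros((num_walks, steps, 2))
paths[:, 1:, :] = np.cumsum(moves, axis=1)

# Mean squared distance from the origin at each time step
msd = np.mean(np.sum(paths**2, axis=2), axis=0)

for t in [10, 100, 500, 999]:
    print(f"t = {t:3d}: mean squared distance = {msd[t]:7.1f} (compare with t = {t})")
```

For this rule, each unit step adds 1 to the expected squared distance, so the mean squared distance should grow roughly like $t$.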

### On Information and Randomness
In this notebook, we quantified information and order using entropy, and we observed how randomness evolves over time in a random walk. The idea is to integrate these topics into our subsequent models to capture possible emergent behavior. Order and randomness represent two ends of a spectrum that shapes how we interpret, compress, and predict data.

- Order allows us to find patterns, build structured models, and make accurate forecasts.
- Randomness helps us capture uncertainty, variability, and noise, all of which are essential features of real-world data.

Understanding this balance:

- Guides feature selection (which variables carry real signal?)
- Shapes modeling decisions (linear vs probabilistic vs chaotic systems)

Ultimately, learning to quantify and leverage both helps us build smarter, more robust, and more interpretable systems.