Entropy Theory
Entropy is a fundamental concept that appears across multiple scientific disciplines, from information theory to thermodynamics to statistical mechanics. At its core, entropy measures the amount of uncertainty, randomness, or disorder in a system. This page focuses primarily on Shannon entropy from information theory, while also exploring connections to other forms of entropy in science.
Shannon entropy, introduced by Claude Shannon in 1948, quantifies the average amount of information contained in a message or the uncertainty in a random variable. It answers the question: "How much information do we gain, on average, when we learn the outcome of a random event?"
Think of entropy as measuring surprise:
- If you flip a fair coin, you're equally likely to get heads or tails. The outcome is highly uncertain, so the entropy is high.
- If you have a biased coin that lands heads 99% of the time, the outcome is very predictable. The entropy is low because there's little surprise.
- If you roll a fair six-sided die, there's more uncertainty than a coin flip, so the entropy is higher.
For a discrete random variable X with possible outcomes x₁, x₂, ..., xₙ and corresponding probabilities p₁, p₂, ..., pₙ, the Shannon entropy H(X) is:
H(X) = -∑ᵢ pᵢ log₂(pᵢ)
Where:
- The sum is over all possible outcomes
- log₂ gives entropy in units of bits
- By convention, 0 log(0) = 0 (justified because p log(p) → 0 as p → 0)
Example: fair coin
- P(heads) = 0.5, P(tails) = 0.5
- H(X) = -(0.5 × log₂(0.5) + 0.5 × log₂(0.5))
- H(X) = -(0.5 × (-1) + 0.5 × (-1)) = 1 bit
Example: biased coin
- P(heads) = 0.9, P(tails) = 0.1
- H(X) = -(0.9 × log₂(0.9) + 0.1 × log₂(0.1))
- H(X) ≈ -(0.9 × (-0.152) + 0.1 × (-3.322)) ≈ 0.137 + 0.332 ≈ 0.47 bits
Example: fair six-sided die
- Each outcome has probability 1/6
- H(X) = -6 × (1/6 × log₂(1/6)) = log₂(6) ≈ 2.58 bits
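The formula and the three worked examples can be checked numerically. The following is a minimal Python sketch, not a reference implementation; the helper name `shannon_entropy` and its input validation are illustrative choices. Zero probabilities are skipped, which applies the 0 log(0) = 0 convention.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float], base: float = 2.0) -> float:
    """Shannon entropy H = -sum(p * log_base(p)) of a discrete distribution.

    Illustrative helper (hypothetical name). Outcomes with p = 0 are skipped,
    which applies the convention 0 log(0) = 0.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log(p, base) for p in probs if p > 0)


fair_coin = shannon_entropy([0.5, 0.5])
biased_coin = shannon_entropy([0.9, 0.1])
fair_die = shannon_entropy([1 / 6] * 6)

print(f"fair coin:   {fair_coin:.2f} bits")    # fair coin:   1.00 bits
print(f"biased coin: {biased_coin:.2f} bits")  # biased coin: 0.47 bits
print(f"fair die:    {fair_die:.2f} bits")     # fair die:    2.58 bits
```

Passing `base=math.e` or `base=10` to the same helper gives the result in other units.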
Key properties of Shannon entropy:
- Non-negative: H(X) ≥ 0 always
- Maximum entropy: For n possible outcomes, H(X) ≤ log₂(n), with equality exactly when all outcomes are equally likely
- Minimum entropy: H(X) = 0 when one outcome has probability 1 (no uncertainty)
- Additive: For independent variables X and Y, H(X,Y) = H(X) + H(Y)
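The additivity and maximum-entropy properties can be spot-checked numerically. This Python sketch reuses the biased coin (0.9/0.1) and the fair die from the worked examples, builds their joint distribution under an assumed independence, p(x, y) = p(x)p(y), and compares a few four-outcome distributions. The helper name `entropy` is illustrative, and a numerical check like this illustrates the properties rather than proving them.

```python
import math
from itertools import product


def entropy(probs):
    """Shannon entropy in bits (illustrative helper); p = 0 terms contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


coin = [0.9, 0.1]   # biased coin from the worked example
die = [1 / 6] * 6   # fair six-sided die

# Assumed independence: p(x, y) = p(x) * p(y) for every pair of outcomes.
joint = [px * py for px, py in product(coin, die)]

print(f"H(X,Y)      = {entropy(joint):.6f} bits")
print(f"H(X) + H(Y) = {entropy(coin) + entropy(die):.6f} bits")

# Over 4 outcomes the bound is log2(4) = 2 bits, reached only by the uniform case.
for dist in ([0.25] * 4, [0.4, 0.3, 0.2, 0.1], [1.0, 0.0, 0.0, 0.0]):
    print(dist, round(entropy(dist), 3))
```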
Claude Shannon introduced the concept in his groundbreaking 1948 paper "A Mathematical Theory of Communication," in which he set out to quantify the fundamental limits of data compression and transmission. Shannon chose the term "entropy" because:
- The mathematical form matched that of entropy in statistical mechanics
- According to a widely repeated anecdote, John von Neumann suggested the name, remarking that "no one really knows what entropy really is," so Shannon would always have the advantage in a debate
Earlier milestones in the history of entropy and probability:
- Rudolf Clausius (1850s and 1860s): Introduced the thermodynamic entropy concept and coined the term "entropy" in 1865
- Ludwig Boltzmann (1870s): Developed the statistical interpretation of thermodynamic entropy
- Andrey Kolmogorov (1930s): Laid the axiomatic foundations of modern probability theory (1933)
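In statistical mechanics, the thermodynamic entropy S of a system whose microstates are equally likely is proportional to the logarithm of the number of microscopic arrangements (microstates) consistent with its macroscopic state:
S = k ln(Ω)
Where k is Boltzmann's constant (exactly 1.380649 × 10⁻²³ J/K in the SI) and Ω is the number of microstates.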
Both Shannon and thermodynamic entropy measure "spreading out":
- Shannon: Information spread across possible messages
- Thermodynamic: Energy spread across possible microscopic states
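The link between the two forms can be made concrete. For Ω equally likely microstates, the Shannon entropy is log₂(Ω) bits, so Boltzmann's S = k ln(Ω) equals k ln(2) times the Shannon entropy in bits. The Python sketch below uses a hypothetical toy system of 100 two-state particles with exactly 50 in the "up" state (so Ω is the binomial coefficient C(100, 50)); the system and the helper names are illustrative assumptions.

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant k in J/K (exact value in the 2019 SI)


def boltzmann_entropy(omega: int) -> float:
    """S = k ln(Omega), assuming all Omega microstates are equally likely."""
    return K_B * math.log(omega)


def uniform_shannon_bits(omega: int) -> float:
    """Shannon entropy in bits of a uniform distribution over Omega states."""
    return math.log2(omega)


# Hypothetical toy system: 100 two-state particles with exactly 50 "up",
# so the number of microstates is the binomial coefficient C(100, 50).
omega = math.comb(100, 50)

s = boltzmann_entropy(omega)
h = uniform_shannon_bits(omega)

print(f"Omega = {omega:.3e} microstates")
print(f"S = {s:.3e} J/K")
print(f"H = {h:.2f} bits")
# Same quantity in different units: S = k ln(2) * H.
print(math.isclose(s, K_B * math.log(2) * h))
```

The tiny value of k is why thermodynamic entropies of everyday objects correspond to astronomically many bits.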
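Applications of entropy span many fields:
Computer science and communications:
- Data compression: Entropy is the lower bound on the average number of bits per symbol for lossless compression (Shannon's source coding theorem); entropy coders such as Huffman and arithmetic coding approach this bound
- Cryptography: Measuring randomness in keys and passwords
- Channel capacity: The maximum rate at which information can be transmitted reliably over a noisy channel
Machine learning:
- Decision trees: Information gain for choosing which feature to split on
- Cross-entropy loss: A common loss function for classification in neural networks
- Feature selection: Identifying the most informative variables
Biology:
- DNA sequence analysis: Measuring genetic diversity
- Protein folding: Understanding structural complexity
- Ecology and evolutionary biology: Quantifying species diversity (for example, the Shannon diversity index)
Physics:
- Statistical mechanics: Connecting microscopic and macroscopic properties
- Black hole physics: Bekenstein-Hawking entropy, proportional to the area of the event horizon
- Quantum information: Von Neumann entropy, the quantum generalization of Shannon entropy
Economics and finance:
- Market efficiency: Measuring information content in prices
- Risk analysis: Quantifying uncertainty in portfolios
- Econometrics: Model selection and information criteria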
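Of these applications, information gain in decision trees is easy to show in a few lines. The Python sketch below scores a candidate split as H(label) - H(label | feature), as ID3-style tree learners do, using plug-in frequency estimates. The toy data set and the helper names are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Plug-in Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature):
    """H(labels) - H(labels | feature): entropy reduction from splitting on feature."""
    n = len(labels)
    remaining = 0.0
    for value in set(feature):
        subset = [y for y, x in zip(labels, feature) if x == value]
        # Weight each branch's entropy by the fraction of samples it receives.
        remaining += len(subset) / n * label_entropy(subset)
    return label_entropy(labels) - remaining


# Hypothetical toy data: whether someone plays outside, with two candidate features.
play = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
weather = ["sun", "sun", "rain", "rain", "sun", "rain", "sun", "sun"]
weekday = ["mon", "tue", "mon", "tue", "mon", "tue", "mon", "tue"]

print(f"gain(weather) = {information_gain(play, weather):.3f} bits")
print(f"gain(weekday) = {information_gain(play, weekday):.3f} bits")
```

On this made-up data, splitting on `weather` yields about 0.549 bits of gain versus about 0.189 bits for `weekday`, so a learner using this criterion would split on weather first.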
Conditional entropy is the uncertainty that remains in X once Y is known, where the double sum runs over all pairs (x, y):
H(X|Y) = -∑∑ p(x,y) log₂(p(x|y))
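The double sum can be evaluated directly from a joint probability table using p(x|y) = p(x,y)/p(y). This Python sketch assumes the table is given as nested lists with rows indexed by x and columns by y; the weather/forecast distribution and the helper name are hypothetical.

```python
import math


def conditional_entropy(joint):
    """H(X|Y) in bits from a joint table joint[x][y] = p(x, y).

    Uses p(x|y) = p(x, y) / p(y) and skips zero-probability cells.
    """
    n_y = len(joint[0])
    p_y = [sum(row[j] for row in joint) for j in range(n_y)]  # marginal of Y
    h = 0.0
    for row in joint:
        for j, p_xy in enumerate(row):
            if p_xy > 0:
                h -= p_xy * math.log2(p_xy / p_y[j])
    return h


# Hypothetical joint distribution: X = weather (rows: sun, rain),
# Y = forecast (columns: "sun", "rain").
joint = [
    [0.45, 0.05],
    [0.10, 0.40],
]
print(f"H(X|Y) = {conditional_entropy(joint):.3f} bits")
```

For this table H(X|Y) ≈ 0.603 bits, down from H(X) = 1 bit when the forecast is not known.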
Mutual information is the amount of information shared between two variables, that is, how much knowing one reduces uncertainty about the other:
I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
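Both expressions for I(X;Y) can be evaluated from a joint table and compared. The Python sketch below obtains the conditional entropies from the standard chain rule H(X|Y) = H(X,Y) - H(Y); the joint distribution is a hypothetical example and the helper name is illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Hypothetical joint distribution p(x, y): rows are values of X, columns values of Y.
joint = [
    [0.45, 0.05],
    [0.10, 0.40],
]

p_x = [sum(row) for row in joint]                  # marginal of X
p_y = [sum(col) for col in zip(*joint)]            # marginal of Y
h_xy = entropy([p for row in joint for p in row])  # joint entropy H(X, Y)

# Chain rule: H(X|Y) = H(X, Y) - H(Y) and H(Y|X) = H(X, Y) - H(X).
h_x_given_y = h_xy - entropy(p_y)
h_y_given_x = h_xy - entropy(p_x)

i_from_x = entropy(p_x) - h_x_given_y
i_from_y = entropy(p_y) - h_y_given_x
print(f"I(X;Y) = {i_from_x:.3f} bits (via X), {i_from_y:.3f} bits (via Y)")
```

Both routes give about 0.397 bits for this table, illustrating that mutual information is symmetric.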
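Cross-entropy measures the average number of bits needed to encode outcomes drawn from a distribution p when using a code optimized for a different distribution q:
H(p,q) = -∑ p(x) log₂(q(x))
Cross-entropy is not itself a difference between distributions: it equals H(p) + D_KL(p‖q), where the Kullback-Leibler divergence D_KL(p‖q) = ∑ p(x) log₂(p(x)/q(x)) is zero only when q = p and measures how much q differs from p.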
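The Python sketch below computes cross-entropy and the Kullback-Leibler divergence D_KL(p‖q) = ∑ p(x) log₂(p(x)/q(x)) separately, then checks the standard identity H(p,q) = H(p) + D_KL(p‖q). The distributions are hypothetical (p is a fair coin, q a model that wrongly expects 90% heads), and the helper names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in bits; p = 0 terms contribute 0."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum p(x) log2 q(x); infinite if q(x) = 0 where p(x) > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total -= pi * math.log2(qi)
    return total


def kl_divergence(p, q):
    """D_KL(p || q) = sum p(x) log2(p(x) / q(x)), computed directly."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


p = [0.5, 0.5]  # hypothetical true distribution: a fair coin
q = [0.9, 0.1]  # hypothetical model that expects heads 90% of the time

print(f"H(p)         = {entropy(p):.3f} bits")
print(f"H(p, q)      = {cross_entropy(p, q):.3f} bits")
print(f"D_KL(p || q) = {kl_divergence(p, q):.3f} bits")
print(math.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q)))
```

Here H(p,q) ≈ 1.737 bits against H(p) = 1 bit; the extra ≈ 0.737 bits is the KL divergence. Cross-entropy losses in neural-network libraries usually use the natural logarithm, so they report nats rather than bits.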
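The base of the logarithm determines the unit:
- Base 2: Entropy in bits (most common in computer science)
- Base e: Entropy in nats (natural units)
- Base 10: Entropy in dits, bans, or hartleys
Changing the base only rescales the value by a constant; for example, 1 nat = 1/ln(2) ≈ 1.443 bits.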
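Because log_b(x) = log₂(x) × log_b(2), changing the base rescales entropy by a constant factor. This Python sketch evaluates the fair-die entropy from the worked example in all three units; the helper name is illustrative.

```python
import math


def entropy(probs, base=2.0):
    """Shannon entropy with log base `base`: 2 -> bits, e -> nats, 10 -> dits/bans."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)


die = [1 / 6] * 6  # fair six-sided die
bits = entropy(die, 2)
nats = entropy(die, math.e)
dits = entropy(die, 10)

print(f"{bits:.4f} bits = {nats:.4f} nats = {dits:.4f} dits")
# 2.5850 bits = 1.7918 nats = 0.7782 dits

# A change of base is a constant rescaling: H_b = H_2 * log_b(2).
print(math.isclose(nats, bits * math.log(2)))
print(math.isclose(dits, bits * math.log10(2)))
```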
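When estimating entropy from samples:
- Plug-in estimator: Substitute observed frequencies for the true probabilities; for finite samples this estimate is biased downward
- Bias correction: Compensate for finite-sample bias, for example with the Miller-Madow correction, which adds (m - 1)/(2N) nats, where m is the number of distinct outcomes observed and N is the sample size
- Smoothing: Assign nonzero probability to outcomes that were never observed, for example by adding pseudocounts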
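The three approaches can be compared on a small sample. This Python sketch works in nats, the units in which the Miller-Madow term is usually stated, and uses ten hypothetical rolls of a die assumed to be fair, whose true entropy is ln(6) ≈ 1.792 nats. The sample and helper names are illustrative, and a single small sample shows only the direction of the bias, not its typical size.

```python
import math
from collections import Counter


def plug_in_entropy(samples):
    """Plug-in (maximum-likelihood) entropy estimate in nats from observed frequencies."""
    n = len(samples)
    return sum(-(c / n) * math.log(c / n) for c in Counter(samples).values())


def miller_madow_entropy(samples):
    """Plug-in estimate plus the Miller-Madow bias correction (m - 1) / (2n), in nats.

    m is the number of distinct outcomes actually observed in the sample.
    """
    n = len(samples)
    m = len(set(samples))
    return plug_in_entropy(samples) + (m - 1) / (2 * n)


def add_pseudocount_entropy(samples, alphabet, pseudocount=1.0):
    """Entropy in nats after adding `pseudocount` to every symbol's count
    (add-one/Laplace smoothing when pseudocount = 1), so unseen symbols
    still receive nonzero probability."""
    counts = Counter(samples)
    total = len(samples) + pseudocount * len(alphabet)
    probs = [(counts[a] + pseudocount) / total for a in alphabet]
    return sum(-p * math.log(p) for p in probs)


# Ten hypothetical rolls of a die assumed fair; the face 4 never appears.
rolls = [1, 3, 3, 6, 2, 3, 5, 1, 1, 6]
faces = range(1, 7)

print(f"true (fair die): {math.log(6):.3f} nats")
print(f"plug-in:         {plug_in_entropy(rolls):.3f} nats")
print(f"Miller-Madow:    {miller_madow_entropy(rolls):.3f} nats")
print(f"add-one:         {add_pseudocount_entropy(rolls, faces):.3f} nats")
```

On this sample the plug-in estimate is about 1.505 nats, while the Miller-Madow (≈ 1.705) and add-one (≈ 1.700) estimates land closer to ln(6).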
Related pages:
- Information Theory: Broader mathematical framework
- Kolmogorov Complexity: Alternative measure of information content
- Maximum Entropy Principle: Method for probability assignment
- Thermodynamic Entropy: Physical entropy concept
- Data Compression: Practical applications of entropy
References:
- Shannon, C. E. (1948). "A Mathematical Theory of Communication." Bell System Technical Journal, 27(3), 379–423, and 27(4), 623–656.
- Cover, T. M., & Thomas, J. A. (2012). Elements of Information Theory. John Wiley & Sons.
- MacKay, D. J. C. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.
- Boltzmann, L. (1877). "Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung."
This page provides an introduction to entropy theory with emphasis on Shannon entropy. For specific applications or advanced topics, see the referenced materials and related pages.