### Self-Information and Entropy (in bits: the amount of information)

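```julia
p₁ = 0.99; @show -log2(p₁);
p₂ = 0.01; @show -log2(p₂);
```

```
-(log2(p₁)) = 0.014499569695115089
-(log2(p₂)) = 6.643856189774724
```

A likely outcome ($p_1 = 0.99$) carries only about 0.0145 bits of information, while a rare one ($p_2 = 0.01$) carries about 6.64 bits.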
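Entropy is the expected self-information, $H(X) = -\sum_x p(x) \log_2 p(x)$. The sketch below is a minimal plain-Julia version (the helper `entropy_bits` is a hypothetical name, not a library function) applied to the same two-outcome source; it averages the two self-information values printed above, weighted by their probabilities.

```julia
# Shannon entropy in bits of a probability vector p (hypothetical helper, not a library function)
function entropy_bits(p::AbstractVector{<:Real})
    @assert all(p .>= 0) && isapprox(sum(p), 1) "p must be a probability vector"
    # Outcomes with p(x) = 0 contribute nothing (convention 0 log 0 = 0)
    return -sum(pᵢ * log2(pᵢ) for pᵢ in p if pᵢ > 0)
end

p = [0.99, 0.01]      # the same p₁ and p₂ as above
H = entropy_bits(p)   # ≈ 0.0808 bits: 0.99 · 0.0145 + 0.01 · 6.64
@show H
```

Most of the entropy here comes from the rare outcome: it is unlikely, but each occurrence carries a lot of information.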


### Joint Entropy

$H(X,Y) = -E[\log p(X,Y)] = -\sum\limits_{x \in \mathcal{X}} \sum\limits_{y \in \mathcal{Y}} p(x,y) \log p(x,y)$
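The double sum can be evaluated directly from a joint pmf stored as a matrix. A minimal sketch, assuming `P[x, y] = p(x, y)` with rows indexing $x$ and columns indexing $y$; the helper name and the example pmf are hypothetical:

```julia
# Joint entropy H(X,Y) in bits from a joint pmf matrix P with P[x, y] = p(x, y)
# (hypothetical helper for illustration)
function joint_entropy_bits(P::AbstractMatrix{<:Real})
    @assert all(P .>= 0) && isapprox(sum(P), 1) "P must be a joint pmf"
    # Double sum over x ∈ 𝒳 and y ∈ 𝒴, skipping zero-probability pairs (0 log 0 = 0)
    return -sum(pxy * log2(pxy) for pxy in P if pxy > 0)
end

# Example joint pmf: rows index x, columns index y
P = [1/2 1/4;
     1/8 1/8]
@show joint_entropy_bits(P)   # 1/2·1 + 1/4·2 + 2·(1/8·3) = 1.75 bits
```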

### Proof of the DPI (Data Processing Inequality) using Markov Chains

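Assume $X \to Y \to Z$ forms a Markov chain, i.e. $P(Z|Y,X) = P(Z|Y)$. Then

$P(X,Z|Y) \overset{\tiny\text{(br)}}{=} \frac{P(X,Y,Z)}{P(Y)} \overset{\tiny\text{(cr)}}{=} \frac{P(Z|Y,X) \, P(X,Y)}{P(Y)} \overset{\tiny\text{(mc+br)}}{=} P(Z|Y) P(X|Y)$

so $X$ and $Z$ are conditionally independent given $Y$.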

#### Rule Labels
- (br) => Bayes' rule
- (cr) => Chain rule
- (mc) => Markov chain property
- (cre) => Conditioning reduces entropy
- (maxH) => Maximum entropy
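The factorization can also be checked numerically. This sketch builds a small Markov chain $X \to Y \to Z$ from hypothetical transition matrices, verifies $P(X,Z|Y) = P(X|Y)\,P(Z|Y)$ entry by entry, and then compares $I(X;Y)$ with $I(X;Z)$, which the data processing inequality says must satisfy $I(X;Y) \ge I(X;Z)$. All numbers and helper names are illustrative assumptions.

```julia
# Hypothetical Markov chain X → Y → Z on binary alphabets
px  = [0.3, 0.7]                # p(x)
pyx = [0.9 0.1; 0.2 0.8]        # pyx[x, y] = p(y | x)
pzy = [0.6 0.4; 0.25 0.75]      # pzy[y, z] = p(z | y)

# Joint pmf from the Markov factorization p(x, y, z) = p(x) p(y | x) p(z | y)
pxyz = [px[x] * pyx[x, y] * pzy[y, z] for x in 1:2, y in 1:2, z in 1:2]
py = [sum(pxyz[:, y, :]) for y in 1:2]   # marginal p(y)

# Check P(x, z | y) = P(x | y) P(z | y) for every (x, y, z)
for x in 1:2, y in 1:2, z in 1:2
    pxz_y = pxyz[x, y, z] / py[y]
    px_y  = sum(pxyz[x, y, :]) / py[y]
    pz_y  = sum(pxyz[:, y, z]) / py[y]
    @assert isapprox(pxz_y, px_y * pz_y)
end

# Mutual information in bits of a 2-D joint pmf
function mutual_info_bits(P::AbstractMatrix{<:Real})
    pa = vec(sum(P; dims=2))    # row marginal
    pb = vec(sum(P; dims=1))    # column marginal
    mi = 0.0
    for i in axes(P, 1), j in axes(P, 2)
        if P[i, j] > 0
            mi += P[i, j] * log2(P[i, j] / (pa[i] * pb[j]))
        end
    end
    return mi
end

Ixy = mutual_info_bits(dropdims(sum(pxyz; dims=3); dims=3))   # joint pmf of (X, Y)
Ixz = mutual_info_bits(dropdims(sum(pxyz; dims=2); dims=2))   # joint pmf of (X, Z)
@show Ixy Ixz   # the DPI says Ixy ≥ Ixz
```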

$E[\mathcal{L}(\xi(X^n))]$
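Reading $\mathcal{L}(\cdot)$ as codeword length and $\xi$ as a source code (an assumption, since the notation is not defined here), the expected length is a probability-weighted sum of codeword lengths. The sketch uses a hypothetical four-symbol source and prefix code with $n = 1$; for this dyadic source the expected length equals the entropy, which is a lower bound on the expected length of any uniquely decodable code.

```julia
# Hypothetical source distribution and prefix code ξ (block length n = 1 for simplicity)
probs = [0.5, 0.25, 0.125, 0.125]
ξ = Dict(1 => "0", 2 => "10", 3 => "110", 4 => "111")   # no codeword is a prefix of another

# Expected codeword length E[ℒ(ξ(X))] = Σₓ p(x) · ℒ(ξ(x))
expected_length = sum(probs[x] * length(ξ[x]) for x in eachindex(probs))

# Entropy H(X) = -Σₓ p(x) log2 p(x), a lower bound on the expected length
H = -sum(p * log2(p) for p in probs)

@show expected_length H   # both equal 1.75 bits for this dyadic source
```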

### Shannon's Channel Coding Theorem

#### Proof:

- Let us build a random code.  
- We generate $M$ codewords $X^n(1), X^n(2), \dots, X^n(M)$.  
  > Each codeword is drawn independently according to $P(X^n) = \prod_{i=1}^n P(X_i)$, where $X^n = (X_1, X_2, \dots, X_n)$.  
- Assume that message $i$ is transmitted (i.e., $X^n(i)$ is sent over the channel).  
- The decoder receives $Y^n$.  
- The decoder looks for the unique message $\hat{i}$ such that $(X^n(\hat{i}), Y^n)$ is jointly typical; if there is no such message, or more than one, it declares an error.  
  > Let's see what joint typicality (J.T.) means.  
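A pair $(x^n, y^n)$ is jointly $\epsilon$-typical when the empirical quantities $-\frac{1}{n}\log_2 p(x^n)$, $-\frac{1}{n}\log_2 p(y^n)$ and $-\frac{1}{n}\log_2 p(x^n, y^n)$ are each within $\epsilon$ of $H(X)$, $H(Y)$ and $H(X,Y)$, where $p(x^n, y^n) = \prod_{i=1}^n p(x_i, y_i)$. The sketch below checks these three conditions for a discrete pair; the function names, the binary symmetric channel with crossover probability 0.1, and the values of $n$ and $\epsilon$ are illustrative assumptions.

```julia
using Random

# Entropy in bits of an array of probabilities (zero entries skipped, 0 log 0 = 0)
H2(p) = -sum(q * log2(q) for q in p if q > 0)

# Check the three joint-typicality conditions for discrete (X, Y) with joint pmf P[x, y].
# Symbols are represented by their indices 1, 2, ...
function jointly_typical(xs, ys, P, ε)
    n = length(xs)
    px = vec(sum(P; dims=2))   # marginal p(x)
    py = vec(sum(P; dims=1))   # marginal p(y)
    # Empirical values of -(1/n) log2 p(xⁿ), -(1/n) log2 p(yⁿ), -(1/n) log2 p(xⁿ, yⁿ)
    hx  = -sum(log2(px[x]) for x in xs) / n
    hy  = -sum(log2(py[y]) for y in ys) / n
    hxy = -sum(log2(P[x, y]) for (x, y) in zip(xs, ys)) / n
    return abs(hx - H2(px)) < ε && abs(hy - H2(py)) < ε && abs(hxy - H2(P)) < ε
end

Random.seed!(1)
n, ε = 10_000, 0.05
P = [0.45 0.05; 0.05 0.45]   # hypothetical: uniform input through a BSC with crossover 0.1
xs = rand(1:2, n)            # transmitted sequence
flip = rand(n) .< 0.1        # the channel flips each symbol with probability 0.1
ys = [f ? 3 - x : x for (x, f) in zip(xs, flip)]   # swap 1 ↔ 2 where flipped
x_wrong = rand(1:2, n)       # independent sequence, like a wrong codeword

@show jointly_typical(xs, ys, P, ε)        # expected true with high probability
@show jointly_typical(x_wrong, ys, P, ε)   # expected false with high probability
```

The transmitted sequence passes because its empirical joint log-probability concentrates around $H(X,Y)$; the independent sequence disagrees with $Y^n$ about half the time, which pushes that quantity well above $H(X,Y)$.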

```julia
using Random, Statistics, Distributions

Random.seed!(42)  # fix the RNG so that runs are reproducible

# Random codebook: M codewords of length n, each symbol drawn i.i.d. from p_dist
function generate_codewords(M, n, p_dist)
    return [rand(p_dist, n) for _ in 1:M]
end

# Additive-noise channel: Yⁿ = Xⁿ + Nⁿ, with Nᵢ drawn i.i.d. from noise_dist
function simulate_channel(Xn, noise_dist)
    return Xn .+ rand(noise_dist, length(Xn))
end

# Simplified joint-typicality check for Y = X + N, N ~ Normal(0, σ), X uniform on {0, 1}.
# For such pairs, -(1/n) ln p(xⁿ, yⁿ) - H(X,Y) = (mean((yᵢ - xᵢ)²)/σ² - 1)/2 (in nats),
# so the joint condition reduces to comparing the empirical noise power with σ².
# The marginal typicality of Yⁿ is not checked.
function is_jointly_typical(Xn, Yn, σ, threshold)
    noise_power = mean((Yn .- Xn) .^ 2)
    return abs(noise_power / σ^2 - 1) < threshold
end

# Decoder: return the unique index i such that (Xⁿ(i), Yⁿ) is jointly typical;
# return nothing (a decoding error) if no codeword or more than one codeword qualifies
function decode(Yn, codebook, σ, threshold)
    matches = [i for (i, Xn) in enumerate(codebook) if is_jointly_typical(Xn, Yn, σ, threshold)]
    return length(matches) == 1 ? matches[1] : nothing
end

# Parameters
M = 4                      # number of codewords (messages)
n = 100                    # codeword length; typicality needs n large enough to concentrate
σ = 0.5                    # noise standard deviation
p_dist = Bernoulli(0.5)    # input distribution P(X) used to draw the codebook
noise_dist = Normal(0, σ)  # Gaussian noise (Distributions.jl parameterizes Normal by its std)
threshold = 0.5            # tolerance on the normalized noise power

# Generate random codewords
codebook = generate_codewords(M, n, p_dist)

# Transmit message i
i_transmitted = 2
Xn_transmitted = codebook[i_transmitted]
Yn_received = simulate_channel(Xn_transmitted, noise_dist)

# Decode the received message
i_decoded = decode(Yn_received, codebook, σ, threshold)

# Print results
println("Transmitted message index: ", i_transmitted)
println("Decoded message index: ", i_decoded)
```

With these parameters, the true codeword's normalized noise power concentrates around 1 (standard deviation about $\sqrt{2/n} \approx 0.14$), while an independently drawn wrong codeword differs from the transmitted one in about half of its positions and gives a value near $(\sigma^2 + 1/2)/\sigma^2 = 3$, so the decoder should recover index 2 with high probability.


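The following calculation relates the sizes of the typical sets to the probability that an independently drawn pair of sequences is jointly typical.

Explanation:

1. Typical sets:
   - $|A_\epsilon^{(n)}(X)|$ and $|A_\epsilon^{(n)}(Y)|$ are the sizes of the $\epsilon$-typical sets for $X^n$ and $Y^n$, respectively.
   - Here, $|A_\epsilon^{(n)}(X)| \approx 2^{nH(X)}$ and $|A_\epsilon^{(n)}(Y)| \approx 2^{nH(Y)}$.
   - $|A_\epsilon^{(n)}(X,Y)|$ is the size of the jointly $\epsilon$-typical set for the pair $(X^n, Y^n)$.
   - Here, $|A_\epsilon^{(n)}(X,Y)| \approx 2^{nH(X,Y)}$.
2. Joint probability:
   - If $\tilde{X}^n$ and $\tilde{Y}^n$ are drawn independently, with the same marginals as $X^n$ and $Y^n$, the probability that the pair lies in the jointly $\epsilon$-typical set is roughly the fraction of pairs of typical sequences that are jointly typical:
     $P\big((\tilde{X}^n, \tilde{Y}^n) \in A_\epsilon^{(n)}\big) \approx \frac{|A_\epsilon^{(n)}(X,Y)|}{|A_\epsilon^{(n)}(X)| \, |A_\epsilon^{(n)}(Y)|}$
   - Substituting the sizes above:
     $P\big((\tilde{X}^n, \tilde{Y}^n) \in A_\epsilon^{(n)}\big) \approx \frac{2^{nH(X,Y)}}{2^{nH(X)} \, 2^{nH(Y)}} = 2^{-n(H(X) + H(Y) - H(X,Y))} = 2^{-nI(X;Y)}$
     Keeping track of the $\epsilon$ terms turns this into the bound $P \le 2^{-n(I(X;Y) - 3\epsilon)}$.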

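Context:

This calculation is part of the random-coding argument for the channel coding theorem, where:
- $X^n$ and $Y^n$ are sequences: the transmitted codeword and the received sequence.
- Typical sets are used to analyze the performance of the code, in particular its error probability and decoding regions.
- Joint typicality lets the decoder identify the correct message: the transmitted codeword is jointly typical with $Y^n$ with probability close to 1, while each wrong codeword, being drawn independently of $Y^n$, is jointly typical with it with probability only about $2^{-nI(X;Y)}$.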
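The rate condition can be made concrete. Each of the $M - 1 < 2^{nR}$ wrong codewords is jointly typical with $Y^n$ with probability at most $2^{-n(I(X;Y) - 3\epsilon)}$, so a union bound gives $P(\text{some wrong codeword is J.T.}) \le 2^{-n(I(X;Y) - 3\epsilon - R)}$, which vanishes as $n$ grows whenever $R < I(X;Y) - 3\epsilon$. The sketch evaluates the exponent for a binary symmetric channel with uniform input, where $I(X;Y) = 1 - h_b(p)$; the channel, the rate $R$ and $\epsilon$ are illustrative assumptions.

```julia
# Binary entropy function h_b(p) in bits
hb(p) = p == 0 || p == 1 ? 0.0 : -p * log2(p) - (1 - p) * log2(1 - p)

# I(X;Y) for a binary symmetric channel with crossover probability p and uniform input
I_bsc(p) = 1 - hb(p)

# Exponent of the union bound: log2 of 2^{-n(I - 3ε - R)}
log2_union_bound(n, R, mi, ε) = -n * (mi - 3ε - R)

Ixy = I_bsc(0.1)   # ≈ 0.531 bits per channel use
R, ε = 0.4, 0.01   # hypothetical rate and typicality tolerance
for n in (100, 500, 1000)
    println("n = $n: log2(union bound) = ", log2_union_bound(n, R, Ixy, ε))
end
```

Because the exponent is linear in $n$, any rate below $I(X;Y) - 3\epsilon$ drives the bound to zero, while for $R > I(X;Y)$ the bound exceeds 1 and says nothing.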
