Laura Boltà Ballesteros, NIU: 1705130

## __Lab 1: Correlation Attacks in Stream Ciphers__
---

In [1]:
import scipher
plaintext = scipher.read_string_from_file("07-known-plaintext.txt")
ciphertext = scipher.read_string_from_file("07-known-ciphertext.txt")
secret_ciphertext = scipher.read_string_from_file("07-secret-ciphertext.txt")

### __Exercise 1: Find potential correlations__
We want to analyze our stream cipher looking for possible vulnerabilities. Determine whether there is a correlation between the individual outputs of the LFSRs and the keystream. That is, determine if there is a correlation between any of the LFSRs separately and the keystream.

__(a) Explain how you find these possible correlations analytically.__

> The combining function can be analyzed directly to find correlations between each individual LFSR output and the keystream.
>
> From the given equation $s(t)=(x_1(t) ∧ x_2(t)) ⊕ (¬x_1(t) ∧ x_3(t))$ we can deduce that:
> - When $x_1=1$, $s=x_2$
> - When $x_1=0$, $s=x_3$
>
> Thus, if $x_1$ takes both values equally often, $s(t)$ copies $x_2$ half of the time and $x_3$ the other half.
> To verify whether these inputs are correlated with the output, we analyze the truth table:
>
> | $x_1$ | $x_2$ | $x_3$ | ($x_1 ∧ x_2$) | ($¬x_1 ∧ x_3$) | $s(t)$ |
> |----|----|----|------------|-------------|------|
> | 0  | 0  | 0  | 0          | 0           | 0    |
> | 0  | 0  | 1  | 0          | 1           | 1    |
> | 0  | 1  | 0  | 0          | 0           | 0    |
> | 0  | 1  | 1  | 0          | 1           | 1    |
> | 1  | 0  | 0  | 0          | 0           | 0    |
> | 1  | 0  | 1  | 0          | 0           | 0    |
> | 1  | 1  | 0  | 1          | 0           | 1    |
> | 1  | 1  | 1  | 1          | 0           | 1    |
>
> From the table we can count, for each input, the fraction of the 8 rows in which it equals $s$:
>
> - $P(s = x_1) = \frac{4}{8} = \frac{1}{2}$
>
> - $P(s = x_2) = \frac{6}{8} = \frac{3}{4}$
>
> - $P(s = x_3) = \frac{6}{8} = \frac{3}{4}$
>
> So the keystream equals the output of LFSR2 75% of the time, and the same holds for LFSR3. Since this is well above the 50% expected by chance, we can conclude that both are correlated with the keystream.
>
> For LFSR1, the probability is exactly $\frac{1}{2}$: the keystream agrees with $x_1$ no more often than a random sequence would, so there is no correlation.
>
> So, LFSR2 and LFSR3 are correlated with the keystream and can be potential points of attack, while LFSR1 shows no exploitable correlation.
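
The row counting above can be double-checked with a short script. This is a minimal sketch that enumerates the eight input combinations of the combining function; the names `combiner` and `agreement_probabilities` are illustrative and not part of `scipher`.

```python
from fractions import Fraction
from itertools import product

def combiner(x1, x2, x3):
    """s = (x1 AND x2) XOR (NOT x1 AND x3)."""
    return (x1 & x2) ^ ((1 - x1) & x3)

def agreement_probabilities():
    """For each input, return the fraction of truth-table rows where it equals s."""
    rows = list(product((0, 1), repeat=3))
    probs = []
    for i in range(3):
        matches = sum(1 for row in rows if combiner(*row) == row[i])
        probs.append(Fraction(matches, len(rows)))
    return probs

print(agreement_probabilities())  # [Fraction(1, 2), Fraction(3, 4), Fraction(3, 4)]
```

Counting rows this way treats all eight input combinations as equally likely, which is the same assumption made in the table.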

__(b) Explain how you validate them empirically.__

> To validate this empirically, we can:
>
> 1. Generate random bit sequences for $x_1$, $x_2$, and $x_3$.
>
> 2. For each time step, compute $s(t)=(x_1∧x_2) ⊕ (¬x_1∧x_3)$.
>
> 3. Compare $s$ with each input, and count how often $s=x_1$, $s=x_2$, and $s=x_3$.
>
> 4. Over many samples, compute the match rate for each input ($\frac{\text{matches}}{\text{number of time steps}}$) and check whether it is close to the analytical value ($\frac{1}{2}$ for $x_1$ and $\frac{3}{4}$ for $x_2$ and $x_3$).

__(c) Provide or identify the code used to do it.__

In [2]:
from scipher import Cipher
import random
def rand_nonzero_bits(n):
    """Return n random bits, retrying until at least one of them is 1."""
    while True:
        rand_bit_seq = [random.getrandbits(1) for _ in range(n)]
        if any(rand_bit_seq):  # an all-zero state would lock the LFSR at zero
            return rand_bit_seq

# 1) generate random bit sequences
s1 = rand_nonzero_bits(Cipher.deg1) # as many bits as the degree of the polynomial
s2 = rand_nonzero_bits(Cipher.deg2)
s3 = rand_nonzero_bits(Cipher.deg3)

# 2) build the cipher: this validates the initial states and starts the LFSRs
cipher = Cipher((s1, s2, s3))

# 3) generate bits and compare (match rates)
N = 100000
m1 = m2 = m3 = 0
for _ in range(N):
    x1 = cipher.l1.clock()
    x2 = cipher.l2.clock()
    x3 = cipher.l3.clock()
    s = (x1 & x2) ^ ((1 - x1) & x3)
    # no 'elif': s can match more than one of x1, x2, x3 at the same time step
    if s == x1: 
        m1 += 1
    if s == x2: 
        m2 += 1
    if s == x3: 
        m3 += 1


rate1 = m1 / N   # expect 0.5
rate2 = m2 / N   # expect 0.75
rate3 = m3 / N   # expect 0.75
print(rate1, rate2, rate3)

0.47531 0.76775 0.7346


---

### __Exercise 2: Recover the keystream of the known ciphertext__

Attempt to reconstruct the maximal portion of the keystream employed to encrypt `NN-known-ciphertext.txt`.

__(a) Explain how you recovered the keystream.__
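
A possible starting point, sketched under two assumptions that are not confirmed here: that the cipher XORs the keystream with the plaintext bit by bit (the usual construction for this kind of stream cipher), and that both texts can be turned into equal-length lists of bits. The helper `bits_from_string` assumes a string of `'0'`/`'1'` characters, which may not match how `scipher` stores the files.

```python
def bits_from_string(bit_string):
    """Hypothetical helper: assumes the text is a string of '0'/'1' characters."""
    return [int(ch) for ch in bit_string if ch in "01"]

def recover_keystream(plaintext_bits, ciphertext_bits):
    """If c(t) = p(t) XOR k(t), then k(t) = p(t) XOR c(t)."""
    n = min(len(plaintext_bits), len(ciphertext_bits))
    return [plaintext_bits[t] ^ ciphertext_bits[t] for t in range(n)]

# Toy data, not taken from the lab files.
p = bits_from_string("1011001")
c = bits_from_string("0110101")
k = recover_keystream(p, c)
print(k)  # [1, 1, 0, 1, 1, 0, 0]
```

Only as many keystream bits as there are known plaintext bits can be recovered this way, which is why the function stops at the shorter of the two inputs.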

__(b) Provide or identify the code used to do it.__

---

### __Exercise 3: Find the initial state of a correlated LFSR__

If an LFSR output is correlated with the keystream, find its initial state.

__(a) Explain how you find such an initial state and show the result.__
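
One way to approach this is an exhaustive search over the initial states of a single correlated LFSR, keeping the state whose output agrees most often with the recovered keystream. The sketch below is self-contained and uses a toy LFSR: `lfsr_stream`, `best_initial_state`, the tap positions, and the initial states are all illustrative assumptions, not the polynomials or the API of `scipher`.

```python
from itertools import product

def lfsr_stream(state, taps, n):
    """Toy Fibonacci LFSR: output the first bit, append the XOR of the tapped bits."""
    state = list(state)
    out = []
    for _ in range(n):
        out.append(state[0])
        fb = 0
        for t in taps:
            fb ^= state[t]
        state = state[1:] + [fb]
    return out

def best_initial_state(keystream, degree, taps):
    """Try every non-zero state; keep the one whose output matches the keystream most."""
    best_state, best_rate = None, 0.0
    for cand in product((0, 1), repeat=degree):
        if not any(cand):
            continue  # the all-zero state only produces zeros
        out = lfsr_stream(cand, taps, len(keystream))
        rate = sum(a == b for a, b in zip(out, keystream)) / len(keystream)
        if rate > best_rate:
            best_state, best_rate = list(cand), rate
    return best_state, best_rate

# Toy demo: three small LFSRs combined as s = (x1 AND x2) XOR (NOT x1 AND x3).
TAPS1, TAPS2, TAPS3 = [0, 1], [0, 2], [0, 1]  # assumed tap positions
n = 300
x1 = lfsr_stream([1, 0, 1], TAPS1, n)
x2 = lfsr_stream([1, 0, 0, 1, 1], TAPS2, n)
x3 = lfsr_stream([0, 1, 1, 0, 1, 0, 1], TAPS3, n)
keystream = [(a & b) ^ ((1 - a) & c) for a, b, c in zip(x1, x2, x3)]

state, rate = best_initial_state(keystream, 5, TAPS2)
print(state, rate)
```

The search costs about $2^{d}$ trials for an LFSR of degree $d$, instead of the product of all three state spaces. A match rate near $\frac{3}{4}$ for the winning state, with the others near $\frac{1}{2}$, would be the expected signature; fewer keystream bits make the two cases harder to tell apart.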

__(b) How many bits from the keystream did you use? What are the implications of using more or less bits from the keystream to find such correlations?__

__(c) Provide or identify the code used to do it.__

---

### __Exercise 4: Define a strategy for a non-correlated LFSR__

If an LFSR’s output is not correlated to the keystream, define how you can find its initial state.
This strategy should improve on a brute-force search by leveraging the information obtained from correlated LFSR(s).

__(a) Explain how you can find the initial state for the LFSR(s) that are not correlated.__
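
A possible strategy follows from the combining function itself: once $x_2$ and $x_3$ are known, every time step where $x_2(t) \neq x_3(t)$ reveals $x_1(t)$, because $s(t) = x_2(t)$ implies $x_1(t) = 1$ and $s(t) = x_3(t)$ implies $x_1(t) = 0$. The sketch below illustrates this on toy bits; `known_x1_bits` and `is_consistent` are hypothetical helper names.

```python
def known_x1_bits(keystream, x2, x3):
    """Where x2 and x3 differ, s == x2 means x1 = 1 and s == x3 means x1 = 0."""
    known = []
    for s, b2, b3 in zip(keystream, x2, x3):
        if b2 != b3:
            known.append(1 if s == b2 else 0)
        else:
            known.append(None)  # x2 == x3: s carries no information about x1
    return known

def is_consistent(candidate_output, known):
    """A candidate LFSR1 output survives only if it matches every revealed bit."""
    return all(k is None or k == c for c, k in zip(candidate_output, known))

# Toy bits, not taken from the lab files.
x1 = [1, 0, 1, 1, 0, 0]
x2 = [1, 1, 0, 0, 1, 0]
x3 = [0, 1, 1, 0, 0, 1]
s = [(a & b) ^ ((1 - a) & c) for a, b, c in zip(x1, x2, x3)]

known = known_x1_bits(s, x2, x3)
print(known)  # [1, None, 1, None, 0, 0]
```

Candidate initial states for LFSR1 can then be filtered with `is_consistent`, so LFSR1 is searched on its own and a wrong candidate is rejected at the first revealed bit it contradicts.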


---

### __Exercise 5: Find the key__

Find the key of the stream cipher.

__(a) Find the key for the stream cipher using the previous results and strategies.__


__(b) Provide or identify the code used to do it.__

__(c) Give an estimate of the computation time required to find it.__

### __Exercise 6: Decrypt the secret message__

Decrypt the secret message `NN-secret-ciphertext.txt` with the key obtained earlier. Success or failure should be obvious.

__(a) Retrieve and show the secret message.__
