# Lempel-Ziv complexity for the Simons game

### **Author:** Quilee Simeon (qsimeon@mit.edu)

The Lempel–Ziv complexity is the number of **different sub-strings (or sub-words) encountered as the binary sequence is viewed as a stream (from left to right)**.

Could **this** be a definition of what a *chunk* is?

The core algorithm scans for substrings that have not been seen before and advances its indices accordingly. It is the same regardless of the sequence's alphabet size.

For a sequence over an alphabet with more than two symbols, it is enough that each symbol is encoded distinctly (for example, one character per color). No conversion to binary is needed, because the algorithm depends only on the structure and repetition of patterns, not on what the symbols are.
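As a quick check of that claim, the sketch below relabels the colors as integers and confirms that the count does not change. `count_phrases` and the `codes` mapping are hypothetical names introduced here for illustration. The helper uses the same parsing loop as `lempel_ziv_complexity`, but first converts its input to a tuple, since slices of a list are not hashable and cannot be stored in a set.

```python
def count_phrases(symbols):
    """Hypothetical helper: count new phrases in any sequence of hashable symbols."""
    symbols = tuple(symbols)  # tuple slices are hashable, so they can be stored in a set
    seen = set()
    start, length, count = 0, 1, 0
    while start + length <= len(symbols):
        phrase = symbols[start:start + length]
        if phrase not in seen:
            # New phrase: record it and restart just past its end.
            seen.add(phrase)
            start += length
            length = 1
            count += 1
        else:
            # Already seen: extend the candidate by one symbol.
            length += 1
    return count


colors = "YBGGRRGBBYY"
codes = {"Y": 0, "B": 1, "G": 2, "R": 3}  # arbitrary relabelling, assumed for illustration
as_integers = [codes[c] for c in colors]

print(count_phrases(colors))       # 7
print(count_phrases(as_integers))  # 7
```

Any one-to-one relabelling of the symbols gives the same count, because only the pattern of repeats matters.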

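```python
def lempel_ziv_complexity(sequence):
    """Count the new substrings found while scanning `sequence` left to right."""
    i, l = 0, 1  # start index and length of the current candidate substring
    s = 0  # complexity: number of new substrings found so far
    sub_strings = set()  # substrings already seen

    while i + l <= len(sequence):
        w = sequence[i:i+l]
        if w not in sub_strings:
            # New substring: record it and restart just past its end.
            sub_strings.add(w)
            i = i + l
            l = 1
            s += 1
        else:
            # Already seen: extend the candidate by one symbol.
            l += 1

    return s

# Example usage:
sequence = "BRRBRBRBB"  # binary (two symbols: B, R)
lz_complexity = lempel_ziv_complexity(sequence)
print(f"The Lempel-Ziv complexity of the sequence {sequence} is: {lz_complexity}")

# Example usage:
sequence = "YBGGRRGBBYY"  # quaternary (four symbols: Y, B, G, R)
lz_complexity = lempel_ziv_complexity(sequence)
print(f"The Lempel-Ziv complexity of the sequence {sequence} is: {lz_complexity}")
```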


```
The Lempel-Ziv complexity of the sequence BRRBRBRBB is: 5
The Lempel-Ziv complexity of the sequence YBGGRRGBBYY is: 7
```
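To see which substrings are being counted, and so what a candidate *chunk* would look like under this definition, the following sketch returns the phrases themselves instead of only their number. `lz_phrases` is a hypothetical variant of the function above, not part of the original notebook. It also returns any trailing symbols left over when the sequence ends in the middle of an already-seen substring, which the count ignores.

```python
def lz_phrases(sequence):
    """Hypothetical variant: return (phrases in order found, uncounted trailing remainder)."""
    phrases = []
    seen = set()
    start, length = 0, 1
    while start + length <= len(sequence):
        candidate = sequence[start:start + length]
        if candidate not in seen:
            # New phrase: keep it and restart just past its end.
            seen.add(candidate)
            phrases.append(candidate)
            start += length
            length = 1
        else:
            # Already seen: extend the candidate by one symbol.
            length += 1
    # Whatever is left was still matching an old phrase when the input ran out.
    remainder = sequence[start:]
    return phrases, remainder


for seq in ("BRRBRBRBB", "YBGGRRGBBYY"):
    phrases, remainder = lz_phrases(seq)
    print(seq, "->", " | ".join(phrases), f"(uncounted remainder: {remainder!r})")

# BRRBRBRBB -> B | R | RB | RBR | BB (uncounted remainder: '')
# YBGGRRGBBYY -> Y | B | G | GR | R | GB | BY (uncounted remainder: 'Y')
```

The number of phrases matches the complexities printed above (5 and 7). In the second sequence the final `Y` repeats an earlier phrase and is not counted.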
