## A coding scheme for Simon game sequences

### **Author:** Sol Markman (smarkman@mit.edu)

#### Assumption: This scheme is hierarchical, so some chunk types take priority over others. Repeats of a single color are chunked first, then repeats of multiple colors, and finally alternations are chunked from the remaining items.
- RRRGRGRG is compressed to [R]^3 [GR]^2 [G], size=6
- RRRGRG is compressed to [R]^3 [GRG], size=4

#### Types of chunks/compression and their sizes:

#### 1) Repeats
Size = (size of the repeated subsequence) + 1. This assumes that the size does not grow with the number of repeats.
A single-color run is only worth chunking when it has more than 2 items, since [R]^2 has size 2, the same as RR.

a) Single color repeats
- RRRR is compressed to [R]^4, size=2

b) Multicolor repeats
- RGRGRG is compressed to [RG]^3, size=3
    
c) Nested repeats
- RRRGGGRRRGGG is compressed to [ [R]^3 [G]^3 ]^2, size=5
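
The size rule for repeats can be sketched as a small recursive function. This is only an illustration: the nested representation used here (a color is a string, a repeat is a `(body, count)` tuple) and the names `chunk_size` and `total_size` are assumptions made for this example, not part of the scheme itself.

```python
def chunk_size(chunk):
    """Size of one chunk: a single color (str) or a repeat given as (body, count)."""
    if isinstance(chunk, str):
        return 1
    body, _count = chunk  # the count does not affect the size
    return sum(chunk_size(c) for c in body) + 1


def total_size(chunks):
    """Size of a whole chunked sequence."""
    return sum(chunk_size(c) for c in chunks)


print(total_size([(["R"], 4)]))                          # [R]^4
print(total_size([(["R", "G"], 3)]))                     # [RG]^3
print(total_size([([(["R"], 3), (["G"], 3)], 2)]))       # [ [R]^3 [G]^3 ]^2
print(total_size([(["R"], 3), (["G", "R"], 2), "G"]))    # [R]^3 [GR]^2 [G]
```

The four calls reproduce the sizes 2, 3, 5, and 6 quoted in the examples above.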

#### 2) Alternations

#### 3) Cycles

#### ADDITIONAL: 4) Exposure compression



In [107]:
# 1a) ONLY: Chunking only repeats of a single color

test_sequences = ['BBBRRRYYYGGG', 'RRRZGZRRR', 'BBBBGRY', 'RRRYGYGYG', 'GBGBB', 
                  'BBBBBBBrrrBBBBBB', 'GGGGGGGG', 'ABCD', 'BGGGR']
answers = [8, 7, 5, 8, 5, 6, 2, 4, 4]

s_idx = 8
sequence = test_sequences[s_idx]
print(sequence, answers[s_idx])

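def single_color_repeats(seq, return_chunked=False):
    """Chunk runs of 3+ identical items; return the size (and optionally the chunks)."""
    # the loop below never runs for sequences shorter than 2 items
    if len(seq) < 2:
        return (len(seq), list(seq)) if return_chunked else len(seq)
    num_rep = 1  # length of the current run
    size = 1     # starts at 1 to count the final item
    i = 0        # start index of the current run
    l = 1        # offset from i to the item being compared
    chunked_seq = []
    while i + l < len(seq):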
        if seq[i+l] == seq[i+l-1]:
            num_rep += 1
            l += 1
            if i+l == len(seq):
                size += 1
                if num_rep > 2:
                    chunked_seq.append(f'[{seq[i]}]^{num_rep}')
                else:
                    chunked_seq.append(seq[-2])
                    chunked_seq.append(seq[-1])
        else:
            if num_rep > 2:
                chunked_seq.append(f'[{seq[i]}]^{num_rep}')
                size += 2
                i += num_rep
                l, num_rep = 1, 1
            else:
                chunked_seq.append(seq[i+l-1])
                size += 1
                i += 1
                l, num_rep = 1, 1
            if i+l == len(seq):
                chunked_seq.append(seq[-1])

    if return_chunked:
        return size, chunked_seq
    return size

print(single_color_repeats(sequence, return_chunked=True))

BGGGR 4
(4, ['B', '[G]^3', 'R'])
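
As a cross-check on the index-based loop above, the same 1a) rule can be sketched with `itertools.groupby`, which splits a sequence into runs of identical items. The function name `single_color_size` and the `min_run` parameter are assumptions introduced for this sketch.

```python
from itertools import groupby


def single_color_size(seq, min_run=3):
    """Chunk runs of at least min_run identical items; return (size, chunks)."""
    chunks, size = [], 0
    for color, run in groupby(seq):
        n = len(list(run))
        if n >= min_run:
            chunks.append(f"[{color}]^{n}")
            size += 2  # 1 for the color + 1 for the repeat
        else:
            chunks.extend([color] * n)  # short runs stay unchunked
            size += n
    return size, chunks


test_sequences = ['BBBRRRYYYGGG', 'RRRZGZRRR', 'BBBBGRY', 'RRRYGYGYG', 'GBGBB',
                  'BBBBBBBrrrBBBBBB', 'GGGGGGGG', 'ABCD', 'BGGGR']
answers = [8, 7, 5, 8, 5, 6, 2, 4, 4]

for s, a in zip(test_sequences, answers):
    assert single_color_size(s)[0] == a, (s, a)
```

Grouping by runs avoids the separate end-of-sequence branches, because every run, including the last one, is handled by the same two cases.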


In [108]:
# 1a) + 1b) Chunking repeats of any number of colors
# Greedy, left to right, shortest pattern first; nested repeats (1c) are not handled

test_sequences = ['BBBRRRYYYGGG', 'RRRZGZRRR', 'BBBBGRY', 'RRRYGYGYG', 'GBGBB', 
                  'BBBBBBBrrrBBBBBB', 'GRGRGRZDZDZDZTTTTTTTGGG']
answers = [8, 7, 5, 5, 4, 6, 11]

s_idx = 6
sequence = test_sequences[s_idx]
print(sequence, answers[s_idx])

def find_multicolor_repeats(subseq):
    """Chunk adjacent repeats of a multicolor pattern in a list of unchunked items.

    Returns (chunks, size); each repeat chunk costs len(pattern) + 1.
    """
    chunks, size = [], 0
    j = 0
    while j < len(subseq):
        found = False
        # a pattern of this length must fit at least twice in what remains
        for length in range(2, (len(subseq) - j) // 2 + 1):
            pattern = subseq[j:j+length]
            count = 1
            k = j + length
            while subseq[k:k+length] == pattern:
                count += 1
                k += length
            if count >= 2:
                chunks.append(f"[{''.join(pattern)}]^{count}")
                size += length + 1
                j = k  # continue after the repeat so the remainder is kept
                found = True
                break
        if not found:
            chunks.append(subseq[j])
            size += 1
            j += 1
    return chunks, size

def hyp3(seq, return_chunked=True):

    # first, chunk simple repeats using hyp1 code
    _, seq_hyp1 = single_color_repeats(seq, return_chunked=True)

    # join together the parts that are not yet chunked and look for multicolor repeats
    chunked_seq, size = [], 0
    subseq = []
    for item in seq_hyp1 + [None]: # None is a sentinel that flushes the last subsequence
        if item is not None and '[' not in item: # element is not yet chunked
            subseq.append(item)
            continue
        if subseq:
            chunks, sub_size = find_multicolor_repeats(subseq)
            chunked_seq.extend(chunks)
            size += sub_size
            subseq = [] # reset
        if item is not None:
            chunked_seq.append(item) # the existing simple repeat chunk
            size += 2 # 1 for the color + 1 for the repeat

    if return_chunked:
        return size, chunked_seq
    return size

# every test sequence should reach its expected size
for s, a in zip(test_sequences, answers):
    assert hyp3(s)[0] == a, (s, a)



GRGRGRZDZDZDZTTTTTTTGGG 11
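
One possible way to reach the expected sizes listed above is a regex-based sketch of 1a) + 1b). It follows the hierarchical assumption: single-color runs of 3 or more are chunked first, and multicolor repeats are then searched in the stretches between them. The names `SINGLE`, `MULTI`, `chunk_multicolor`, and `chunk_repeats` are assumptions made for this example, and the sketch is greedy (leftmost match, shortest pattern first), so it does not handle nested repeats.

```python
import re

SINGLE = re.compile(r"(.)\1{2,}")    # one color repeated 3 or more times
MULTI = re.compile(r"(.{2,}?)\1+")   # shortest pattern of 2+ items, repeated 2+ times


def chunk_multicolor(run):
    """Chunk multicolor repeats in a stretch that has no single-color chunk."""
    chunks, size, pos = [], 0, 0
    for m in MULTI.finditer(run):
        chunks += list(run[pos:m.start()])  # unchunked items before the repeat
        size += m.start() - pos
        pattern = m.group(1)
        chunks.append(f"[{pattern}]^{len(m.group(0)) // len(pattern)}")
        size += len(pattern) + 1
        pos = m.end()
    chunks += list(run[pos:])               # unchunked items after the last repeat
    size += len(run) - pos
    return chunks, size


def chunk_repeats(seq):
    """Return (size, chunks) after single-color and then multicolor chunking."""
    chunks, size, pos = [], 0, 0
    for m in SINGLE.finditer(seq):
        sub_chunks, sub_size = chunk_multicolor(seq[pos:m.start()])
        chunks += sub_chunks + [f"[{m.group(1)}]^{len(m.group(0))}"]
        size += sub_size + 2
        pos = m.end()
    sub_chunks, sub_size = chunk_multicolor(seq[pos:])
    return size + sub_size, chunks + sub_chunks


test_sequences = ['BBBRRRYYYGGG', 'RRRZGZRRR', 'BBBBGRY', 'RRRYGYGYG', 'GBGBB',
                  'BBBBBBBrrrBBBBBB', 'GRGRGRZDZDZDZTTTTTTTGGG']
answers = [8, 7, 5, 5, 4, 6, 11]

for s, a in zip(test_sequences, answers):
    assert chunk_repeats(s)[0] == a, (s, a)
```

The backreference `\1` does the pattern comparison, and the lazy quantifier `{2,}?` makes the shortest repeating pattern win, so GRGRGR becomes [GR]^3 rather than a longer pattern with fewer repeats.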
