## The Circle Game

        We're captive on the carousel of time
        We can't return we can only look
        Behind from where we came
        And go round and round and round
        In the circle game
        -- Joni Mitchell, `The Circle Game`
 

Suppose you want to compare a string, character-by-character, to itself and see how many matches there are.


In [3]:
def matches(seq: str) -> int:
    """Count matches of seq with itself."""
    _matches = 0
    for char in seq:
        if char == char:  # trivially true: each character is compared to itself
            _matches += 1
    return _matches

You know the answer already: they all match, so it's just the number of characters in the string.

In [4]:
seq = "hello, world"
matches(seq) == len(seq)

True

If you think `"hello, world"` is special, change `seq` above to something else and run the cell again.

But what if you shift the string a little, like this?

        hello, world
         hello, world

Now almost nothing matches -- only the second `l` in the original `hello` lines up with the first `l` in the shifted version below it.

You also have to special-case the characters at the beginning of the original and end of the shifted version, which don't have anything to compare to. You can get rid of that nuisance by treating the string as a circle, moving the letters shifted off the end around to the beginning.

        hello, world
        dhello, worl

Let's rewrite `matches()` to do just that.

In [5]:
def matches(seq: str, shift: int) -> int:
    """Count matches of seq to a circular permutation, shifted by shift."""
    _matches = 0  # local counter; avoids shadowing the function name
    length = len(seq)
    for pos, char in enumerate(seq):
        # use modular arithmetic to wrap around and find the character to compare to
        shifted_char = seq[(pos + shift) % length]
        if char == shifted_char:
            _matches += 1
    return _matches

A shift of 0 should give the length of the string,

In [6]:
matches(seq, 0) == len(seq)

True

and a shift of 1 should just leave a single, matching `l`.

In [7]:
matches(seq, 1)

1
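
To see where that single match falls, the sketch below lists the matching positions instead of counting them. `match_positions` is a hypothetical helper introduced here for illustration; it assumes the same modular indexing as `matches()`.

```python
def match_positions(seq: str, shift: int) -> list[int]:
    """Return positions where seq matches its circular shift (hypothetical helper)."""
    length = len(seq)
    return [
        pos
        for pos, char in enumerate(seq)
        # same modular lookup as matches(): wrap around past the end
        if char == seq[(pos + shift) % length]
    ]


seq = "hello, world"
print(match_positions(seq, 1))  # [2]: the first `l` of `hello` is followed by another `l`
```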

The rotated version of the string is `seq[shift:] + seq[:shift]`,
so we can see the matches for each rotation like this:


In [8]:
for shift in range(len(seq)):
    n_match = matches(seq, shift)
    print(f"{seq[shift:] + seq[:shift]}: {n_match}")

hello, world: 12
ello, worldh: 1
llo, worldhe: 0
lo, worldhel: 0
o, worldhell: 2
, worldhello: 1
 worldhello,: 0
worldhello, : 1
orldhello, w: 2
rldhello, wo: 0
ldhello, wor: 0
dhello, worl: 1


After the string rotates halfway around, we start getting the same counts in reverse.
Of course! It's a circle, so the number of matches for a shift of one to the right

    hello, world
    dhello, worl

will be the same as the number of matches for a shift of one to the left
 
    hello, world
    ello, worldh
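
That symmetry can be checked directly. The sketch below is self-contained, so it restates the counting logic in a hypothetical helper, `circular_matches`, and asserts that a shift of `k` and a shift of `len(seq) - k` give the same count.

```python
def circular_matches(seq: str, shift: int) -> int:
    """Count positions where seq matches itself rotated by shift (hypothetical helper)."""
    length = len(seq)
    return sum(seq[pos] == seq[(pos + shift) % length] for pos in range(length))


seq = "hello, world"
n = len(seq)
for k in range(n):
    # a left shift of n - k is a right shift of k; (n - k) % n maps k == 0 to 0
    assert circular_matches(seq, k) == circular_matches(seq, (n - k) % n)
print("symmetric")
```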

Shifts `1` through `len(seq) // 2` give all the distinct mis-alignments. Even with shift `0` (the original sequence aligned with itself) included, the loop needs only about half as many iterations as the previous example:

In [9]:
for shift in range(len(seq)//2 + 1):
    n_match = matches(seq, shift)
    print(f"{seq[shift:] + seq[:shift]} : {n_match=}")

hello, world : n_match=12
ello, worldh : n_match=1
llo, worldhe : n_match=0
lo, worldhel : n_match=0
o, worldhell : n_match=2
, worldhello : n_match=1
 worldhello, : n_match=0


Surprisingly, the fourth mis-alignment has a pair of matches, more than the first! Its second `o`, from `world`, matches the original `o` in `hello`, and its penultimate letter, `l`, matches the `l` in the original `world`. 

        hello, world
        o, worldhell

A longer shift gives a better match.

It's worth wrapping that into a function.

In [10]:
def shift_matches(seq: str) -> list[int]:
    """Return number of matches of seq with each shift of itself, 0 through len(seq)//2."""
    _shift_matches = []
    for shift in range(len(seq)//2 + 1):
        _shift_matches.append(matches(seq, shift))
    return _shift_matches

In [11]:
shift_matches(seq)

[12, 1, 0, 0, 2, 1, 0]

Rearranging that a little makes it easy to display the shifts with the best matches.

In [12]:
from collections import defaultdict

def best_matches(seq: str) -> list[tuple[int, list[int]]]:
    """Return list of pairs: (# of matches, [shift(s) with that many matches]).
    
    Sort the list, reporting positions with most matches first.
    """
    positions = defaultdict(list)
    for shift, n_match in enumerate(shift_matches(seq)):
        positions[n_match].append(shift)
    return sorted(positions.items(), reverse=True)

In [13]:
best_matches(seq)

[(12, [0]), (2, [4]), (1, [1, 5]), (0, [2, 3, 6])]

As we've seen already, the most matches are with the unshifted string (no surprise), and the second most are at a shift of 4.
Shifts of 1 and 5 each find only one match, and the rest have none.
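
As a further illustration, here is a sketch that applies the same grouping to a string that repeats itself. `best_shifts` is a hypothetical, self-contained condensation of `matches()`, `shift_matches()`, and `best_matches()` above, and the repeating input is made up for the example.

```python
from collections import defaultdict


def best_shifts(seq: str) -> list[tuple[int, list[int]]]:
    """Group shifts 0..len(seq)//2 by match count, most matches first (hypothetical)."""
    length = len(seq)
    shifts_by_count = defaultdict(list)
    for shift in range(length // 2 + 1):
        # count positions that agree with the circularly shifted copy
        n_match = sum(seq[pos] == seq[(pos + shift) % length] for pos in range(length))
        shifts_by_count[n_match].append(shift)
    return sorted(shifts_by_count.items(), reverse=True)


print(best_shifts("hello, world"))  # [(12, [0]), (2, [4]), (1, [1, 5]), (0, [2, 3, 6])]
print(best_shifts("abcabcabcabc"))  # [(12, [0, 3, 6]), (0, [1, 2, 4, 5])]
```

For the repeating string, every shift that is a multiple of the repeat length matches as well as no shift at all.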